Command Line Interface

All Schema Automator functionality is available via the schemauto command.

Preamble

Warning

Previous versions had standalone commands such as tsv2linkml; these are now deprecated. They have been renamed and are now subcommands of the main schemauto command.

Note

We follow the CLIG guidelines as far as possible.

Main commands

schemauto

Run the LinkML Schema Automator Command Line.

A subcommand must be passed, for example:

schemauto SUBCOMMAND [OPTIONS] ARGUMENTS

To see logging or debugging info, the verbosity flag should be specified BEFORE the subcommand:

schemauto -vv SUBCOMMAND [OPTIONS] ARGUMENTS
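As a rough illustration of the ordering rule above, the following Python sketch assembles an argument list with the verbosity flag placed before the subcommand. The helper name build_schemauto_argv is hypothetical and not part of Schema Automator; it only builds the list and does not run anything.

```python
import shlex


def build_schemauto_argv(subcommand, args, verbosity=0):
    """Build an argv list with the global verbosity flag before the subcommand.

    Hypothetical helper for illustration only; not part of Schema Automator.
    """
    argv = ["schemauto"]
    if verbosity > 0:
        # One 'v' per verbosity level (e.g. -vv), placed BEFORE the subcommand.
        argv.append("-" + "v" * verbosity)
    argv.append(subcommand)
    argv.extend(args)
    return argv


argv = build_schemauto_argv(
    "generalize-tsv", ["my/data/persons.tsv", "-o", "my.yaml"], verbosity=2
)
print(shlex.join(argv))
# prints: schemauto -vv generalize-tsv my/data/persons.tsv -o my.yaml
```

The resulting list can be handed to subprocess.run on a machine where schemauto is installed.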

Usage

schemauto [OPTIONS] COMMAND [ARGS]...

Options

-v, --verbose

Set the level of verbosity

-q, --quiet <quiet>

Silence all diagnostics

-V, --version

Show the version and exit.

adapt-dbgap

Translate a dbGaP variable digest (data_dict.xml, optionally with the matching var_report.xml) into the canonical schema-automator data dictionary format.

DATA_DICT_XML is the path to a dbGaP *.data_dict.xml file. The optional --var-report enriches the output with empirical signals (numeric min/max, calculated_type fallback). Output defaults to YAML on stdout; --tsv emits the canonical TSV serialization; -o writes to a file.

Usage

schemauto adapt-dbgap [OPTIONS] DATA_DICT_XML

Options

--var-report <var_report_path>

Optional path to the matching var_report.xml. Provides empirical min/max for numeric variables and the calculated_type fallback used when the data_dict <type> element is empty or ambiguous.

-o, --output <output>

path to output file or directory.

--tsv, --yaml

Output format. Default is YAML (lossless structured codes). --tsv emits the canonical DD TSV serialization grammar.

Arguments

DATA_DICT_XML

Required argument

adapt-frictionless

Translate between Frictionless Table Schema and the canonical schema-automator data dictionary format.

INPUT is a path to a JSON or YAML file. By default the input is a Frictionless Table Schema and the output is a canonical data dictionary (YAML). With --reverse, the input is a data dictionary and the output is a Frictionless Table Schema (JSON).
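To make the two directions concrete, the sketch below writes a small Frictionless Table Schema to a temporary file and builds the forward and reverse command lines. The field names, file names, and temporary directory are made up for illustration, and the commands are only printed, not executed.

```python
import json
import tempfile
from pathlib import Path

# A minimal Frictionless Table Schema: a list of field descriptors.
# The field names are invented for this example.
table_schema = {
    "fields": [
        {"name": "id", "type": "integer", "constraints": {"required": True}},
        {"name": "name", "type": "string"},
        {"name": "age", "type": "integer", "constraints": {"minimum": 0}},
    ],
    "primaryKey": "id",
}

workdir = Path(tempfile.mkdtemp())
schema_path = workdir / "persons.schema.json"
schema_path.write_text(json.dumps(table_schema, indent=2), encoding="utf-8")

# Forward direction: Frictionless Table Schema in, data dictionary (YAML) out.
dd_path = workdir / "persons.dd.yaml"
forward_cmd = ["schemauto", "adapt-frictionless", str(schema_path), "-o", str(dd_path)]

# Reverse direction: data dictionary in, Frictionless Table Schema (JSON) out.
reverse_cmd = [
    "schemauto", "adapt-frictionless", "--reverse", str(dd_path),
    "-o", str(workdir / "roundtrip.schema.json"),
]

print(" ".join(forward_cmd))
print(" ".join(reverse_cmd))
```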

Usage

schemauto adapt-frictionless [OPTIONS] INPUT

Options

-o, --output <output>

path to output file or directory.

--reverse, --no-reverse

Reverse direction: read a canonical DD and emit a Frictionless Table Schema. Default reads Frictionless and emits DD.

--from-package, --from-schema

Treat input as a full Frictionless Data Package (datapackage.json) and extract the first resource’s schema. Default treats input as a standalone Table Schema.

Arguments

INPUT

Required argument

annotate-schema

Annotate all elements of a schema.

This uses OAK (https://incatools.github.io/ontology-access-kit), and you can provide any OAK backend that supports text annotation.

At this time, the best choice is likely the bioportal backend.

Example:

schemauto annotate-schema -i bioportal: my-schema.yaml -o annotated.yaml

This requires setting the API key via OAK; see the OAK docs.

You can also specify a particular ontology:

schemauto annotate-schema -i bioportal:ncbitaxon my-schema.yaml -o annotated.yaml

In future OAK will support a much wider variety of annotators including:

  • OLS

  • SciSpacy

  • NLTK

  • OGER

To see all possible selectors, see the OAK docs.

Usage

schemauto annotate-schema [OPTIONS] SCHEMA

Options

--curie-only, --no-curie-only

if set, only use results that are mapped to CURIEs

Default:

False

-i, --input <input>

OAK input ontology selector

-o, --output <output>

path to output file or directory.

Arguments

SCHEMA

Required argument

annotate-using-jsonld

Annotates a schema using a JSON-LD context file.

Usage

schemauto annotate-using-jsonld [OPTIONS] SCHEMA

Options

-o, --output <output>

path to output file or directory.

Arguments

SCHEMA

Required argument

enrich-using-llm

Enrich a schema using an LLM.

Example:

schemauto enrich-using-llm -m gpt-4-turbo my-schema.yaml -o my-enriched.yaml

This will enrich the schema by adding missing description fields. In future other enrichments may be possible.

Note: for this to work, you need to install the LLM dependencies as an extra.

Example:

pip install schema-automator[llm]

Usage

schemauto enrich-using-llm [OPTIONS] SCHEMA

Options

-m, --model <model>

Name of model

-o, --output <output>

path to output file or directory.

Arguments

SCHEMA

Required argument

enrich-using-ontology

Enrich a schema using an ontology.

Here, “enrich” means copying over metadata from the ontology to the schema. For example, if the schema has a class “Gene” that is mapped to a SO class for “gene”, then calling this command will copy the SO class definition to the schema class.

This uses OAK to add metadata based on the URIs and mappings in the schema.

See the OAK docs for options for which annotators to use; examples include:

  • bioportal: # (include the colon) any ontology in bioportal

  • bioportal:umls # a specific ontology in bioportal

  • my.obo # any local OBO file

  • sqlite:obo:cl # a specific OBO file or semsql registered ontology

For example, if your schema has a class with a mapping to a SO class, then the definition of that will be copied to the class description.

Example:

schemauto enrich-using-ontology -i bioportal: my-schema.yaml -o my-enriched.yaml
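For context, an input schema that this command could enrich might look like the sketch below: a class Gene with no description, mapped to the Sequence Ontology term for gene (SO:0000704). The schema id, schema name, and file name are made up, and using exact_mappings is one plausible way to record the mapping, not necessarily the only one the command reads.

```python
import tempfile
from pathlib import Path

# Minimal LinkML schema: the class Gene has a mapping to SO:0000704 ("gene")
# but no description yet. Schema id and name are invented for this example.
SCHEMA_YAML = """\
id: https://example.org/my-schema
name: my_schema
prefixes:
  linkml: https://w3id.org/linkml/
  SO: http://purl.obolibrary.org/obo/SO_
imports:
  - linkml:types
classes:
  Gene:
    exact_mappings:
      - SO:0000704
"""

workdir = Path(tempfile.mkdtemp())
schema_path = workdir / "my-schema.yaml"
schema_path.write_text(SCHEMA_YAML, encoding="utf-8")

# Command line matching the example above; it is printed here, not executed.
cmd = [
    "schemauto", "enrich-using-ontology", "-i", "bioportal:",
    str(schema_path), "-o", str(workdir / "my-enriched.yaml"),
]
print(" ".join(cmd))
```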

If your schema has no mappings, you can use --annotate to add them.

Example:

schemauto enrich-using-ontology -i so.obo --annotate my-schema.yaml -o my-enriched.yaml

Usage

schemauto enrich-using-ontology [OPTIONS] SCHEMA

Options

-i, --input <input>

OAK input ontology selector

--annotate, --no-annotate

If true, annotate the schema

-o, --output <output>

path to output file or directory.

Arguments

SCHEMA

Required argument

generalize-htmltable

Generalizes from a table parsed from a URL

Uses pandas and Beautiful Soup.

Note: if the website cannot be accessed directly, you can download the HTML and pass in an argument of the form file:///absolute/path/to/file.html
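One way to produce a URL of that form is Python's pathlib, as in the sketch below. The HTML content and file names are invented for illustration, and the command is only printed.

```python
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
html_path = workdir / "tables.html"

# Stand-in for a page that was downloaded by hand.
html_path.write_text(
    "<html><body><table>"
    "<tr><th>id</th><th>name</th></tr>"
    "<tr><td>1</td><td>Ann</td></tr>"
    "</table></body></html>",
    encoding="utf-8",
)

# as_uri() needs an absolute path and returns a file:/// URL.
url = html_path.resolve().as_uri()

cmd = ["schemauto", "generalize-htmltable", url, "-o", str(workdir / "table.yaml")]
print(url)
print(" ".join(cmd))
```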

Usage

schemauto generalize-htmltable [OPTIONS] URL

Options

-o, --output <output>

path to output file or directory.

-n, --schema-name <schema_name>

Schema name

Default:

'MySchema'

-s, --column-separator <column_separator>

separator

--downcase-header, --no-downcase-header

if true make headers lowercase

--snakecase-header, --no-snakecase-header

if true make headers snakecase

-E, --enum-columns <enum_columns>

column(s) forced to be an enum

--enum-threshold <enum_threshold>

if the number of distinct values / rows is less than this, do not make an enum

--max-enum-size <max_enum_size>

do not create an enum if more than max distinct members

--infer-optional, --no-infer-optional

mark slots as not required when columns have null or empty values (ignored in pandera mode)

--infer-mixed-types, --no-infer-mixed-types

use any_of to represent columns with mixed types

--infer-enum-from-integers, --no-infer-enum-from-integers

treat low-cardinality integer columns as enum candidates

-c, --class-name <class_name>

Core class name in schema

--pandera, --no-pandera

set to use pandera as the inference engine

--data-output <data_output>

Path to file of downloaded data

--table-number <table_number>

If URL has multiple tables, use this one (zero-based)

Default:

0

Arguments

URL

Required argument

generalize-json

Generalizes from a JSON file to a schema

See Generalizers for more on the generalization framework

Example:

schemauto generalize-json my/data/persons.json -o my.yaml
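The sketch below prepares a small JSON input of the kind this command could generalize from and builds a matching command line. The data, the file names, and the container class name Container are invented for illustration; the command is printed rather than run.

```python
import json
import tempfile
from pathlib import Path

# Invented sample data: a top-level object holding a list of person records.
data = {
    "persons": [
        {"id": "P1", "name": "Ann", "age": 33},
        {"id": "P2", "name": "Bob", "age": 41},
    ]
}

workdir = Path(tempfile.mkdtemp())
json_path = workdir / "persons.json"
json_path.write_text(json.dumps(data, indent=2), encoding="utf-8")

# --container-class-name names the root class, per the options listed below.
cmd = [
    "schemauto", "generalize-json", str(json_path),
    "--container-class-name", "Container",
    "-o", str(workdir / "my.yaml"),
]
print(" ".join(cmd))
```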

Usage

schemauto generalize-json [OPTIONS] INPUT

Options

-o, --output <output>

path to output file or directory.

-n, --schema-name <schema_name>

Schema name

Default:

'MySchema'

--container-class-name <container_class_name>

name of root class

-f, --format <format>

json or yaml (or json.gz or yaml.gz) or frontmatter

-E, --enum-columns <enum_columns>

column(s) forced to be an enum

--enum-mask-columns <enum_mask_columns>

column(s) that are excluded from being enums

--max-enum-size <max_enum_size>

do not create an enum if more than max distinct members

--enum-threshold <enum_threshold>

if the number of distinct values / rows is less than this, do not make an enum

--omit-null, --no-omit-null

if true, ignore null values

--inlined-map <inlined_map>

SLOT_NAME.KEY pairs indicating which slots are inlined as dict

--depluralize, --no-depluralized

Auto-depluralize class names to singular form

Default:

True

Arguments

INPUT

Required argument

generalize-rdf

Generalizes from an RDF file to a schema

See Generalizers for more on the generalization framework

The input must be in Turtle format.

Example:

schemauto generalize-rdf my/data/persons.ttl

Usage

schemauto generalize-rdf [OPTIONS] RDFFILE

Options

-o, --output <output>

path to output file or directory.

-d, --dir <dir>

Required

Arguments

RDFFILE

Required argument

generalize-toml

Generalizes from a TOML file to a schema

See Generalizers for more on the generalization framework

Example:

schemauto generalize-toml my/data/conf.toml -o my.yaml

Usage

schemauto generalize-toml [OPTIONS] INPUT

Options

-o, --output <output>

path to output file or directory.

-n, --schema-name <schema_name>

Schema name

Default:

'MySchema'

--container-class-name <container_class_name>

name of root class

-E, --enum-columns <enum_columns>

column(s) forced to be an enum

--enum-mask-columns <enum_mask_columns>

column(s) that are excluded from being enums

--max-enum-size <max_enum_size>

do not create an enum if more than max distinct members

--enum-threshold <enum_threshold>

if the number of distinct values / rows is less than this, do not make an enum

--omit-null, --no-omit-null

if true, ignore null values

Arguments

INPUT

Required argument

generalize-tsv

Generalizes from a single TSV file to a single-class schema

See Generalizers for more on the generalization framework

Example:

schemauto generalize-tsv --class-name Person --schema-name PersonInfo my/data/persons.tsv
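To try the example end to end, a small TSV file is needed. The sketch below writes one with Python's csv module and builds the corresponding command line; the rows, column names, and file names are made up, and the command is only printed.

```python
import csv
import tempfile
from pathlib import Path

# Invented rows; the repeated "status" values are the kind of column
# that enum inference is meant to pick up.
rows = [
    {"id": "P1", "name": "Ann", "status": "ACTIVE"},
    {"id": "P2", "name": "Bob", "status": "INACTIVE"},
    {"id": "P3", "name": "Cy", "status": "ACTIVE"},
]

workdir = Path(tempfile.mkdtemp())
tsv_path = workdir / "persons.tsv"

with open(tsv_path, "w", newline="", encoding="utf-8") as handle:
    writer = csv.DictWriter(handle, fieldnames=["id", "name", "status"], delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)

cmd = [
    "schemauto", "generalize-tsv",
    "--class-name", "Person",
    "--schema-name", "PersonInfo",
    str(tsv_path),
    "-o", str(workdir / "personinfo.yaml"),
]
print(" ".join(cmd))
```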

Usage

schemauto generalize-tsv [OPTIONS] TSVFILE

Options

-o, --output <output>

path to output file or directory.

-n, --schema-name <schema_name>

Schema name

Default:

'MySchema'

-A, --annotator <annotator>

name of annotator to use for auto-annotating results. Must be an OAK selector

-c, --class-name <class_name>

Core class name in schema

-s, --column-separator <column_separator>

separator

--downcase-header, --no-downcase-header

if true make headers lowercase

--snakecase-header, --no-snakecase-header

if true make headers snakecase

-E, --enum-columns <enum_columns>

column(s) forced to be an enum

--enum-threshold <enum_threshold>

if the number of distinct values / rows is less than this, do not make an enum

--max-enum-size <max_enum_size>

do not create an enum if more than max distinct members

--infer-optional, --no-infer-optional

mark slots as not required when columns have null or empty values (ignored in pandera mode)

--infer-mixed-types, --no-infer-mixed-types

use any_of to represent columns with mixed types

--infer-enum-from-integers, --no-infer-enum-from-integers

treat low-cardinality integer columns as enum candidates

--data-dictionary-row-count <data_dictionary_row_count>

rows that provide metadata about columns

--robot, --no-robot

set if the TSV is a ROBOT template

--pandera, --no-pandera

set to use pandera as the inference engine

Arguments

TSVFILE

Required argument

generalize-tsvs

Generalizes from multiple TSV files to a multi-class schema

See Generalizers for more on the generalization framework

This uses CsvDataGeneralizer.convert_multiple

Example:

schemauto generalize-tsvs --class-name Person --schema-name PersonInfo my/data/*.tsv

Usage

schemauto generalize-tsvs [OPTIONS] [TSVFILES]...

Options

-o, --output <output>

path to output file or directory.

-n, --schema-name <schema_name>

Schema name

Default:

'MySchema'

-s, --column-separator <column_separator>

separator

--downcase-header, --no-downcase-header

if true make headers lowercase

--snakecase-header, --no-snakecase-header

if true make headers snakecase

-E, --enum-columns <enum_columns>

column(s) forced to be an enum

--enum-threshold <enum_threshold>

if the number of distinct values / rows is less than this, do not make an enum

--max-enum-size <max_enum_size>

do not create an enum if more than max distinct members

--infer-optional, --no-infer-optional

mark slots as not required when columns have null or empty values (ignored in pandera mode)

--infer-mixed-types, --no-infer-mixed-types

use any_of to represent columns with mixed types

--infer-enum-from-integers, --no-infer-enum-from-integers

treat low-cardinality integer columns as enum candidates

--robot, --no-robot

set if the TSV is a ROBOT template

Arguments

TSVFILES

Optional argument(s)

import-cadsr

Imports from CADSR CDE JSON API output to LinkML

See Importers for more on the importer framework

Example:

schemauto import-cadsr "cdes/*.json"

Usage

schemauto import-cadsr [OPTIONS] INPUT

Options

-o, --output <output>

path to output file or directory.

-n, --schema-name <schema_name>

Schema name

Default:

'MySchema'

--schema-id <schema_id>

Schema id

Arguments

INPUT

Required argument

import-dosdps

Imports DOSDP pattern YAML to a LinkML schema

See Importers for more on the importers framework

Example:

schemauto import-dosdps --range-as-enums patterns/*.yaml -o my-schema.yaml

Usage

schemauto import-dosdps [OPTIONS] [DPFILES]...

Options

-o, --output <output>

path to output file or directory.

-n, --schema-name <schema_name>

Schema name

Default:

'MySchema'

--range-as-enums, --no-range-as-enums

Model range ontology classes as enums

Arguments

DPFILES

Optional argument(s)

import-frictionless

Imports from Frictionless data package to LinkML

See Importers for more on the importer framework

Example:

schemauto import-frictionless cfde.package.json

Usage

schemauto import-frictionless [OPTIONS] INPUT

Options

-o, --output <output>

path to output file or directory.

-n, --schema-name <schema_name>

Schema name

Default:

'MySchema'

--schema-id <schema_id>

Schema id

Arguments

INPUT

Required argument

import-json-schema

Imports from JSON Schema to LinkML

See Importers for more on the importer framework

Example:

schemauto import-json-schema my/schema/personinfo.schema.json
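As an illustration of a possible input, the sketch below writes a small JSON Schema document and builds a matching command line. The schema content and file names are invented, the choice of draft-07 is an assumption (this reference does not say which drafts the importer supports), and the command is printed rather than run.

```python
import json
import tempfile
from pathlib import Path

# Invented JSON Schema describing a person record.
# Draft-07 is assumed here; supported drafts are not stated in this reference.
person_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Person",
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "name": {"type": "string"},
        "age": {"type": "integer", "minimum": 0},
    },
    "required": ["id"],
}

workdir = Path(tempfile.mkdtemp())
schema_path = workdir / "personinfo.schema.json"
schema_path.write_text(json.dumps(person_schema, indent=2), encoding="utf-8")

cmd = [
    "schemauto", "import-json-schema", str(schema_path),
    "-n", "PersonInfo",
    "-o", str(workdir / "personinfo.yaml"),
]
print(" ".join(cmd))
```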

Usage

schemauto import-json-schema [OPTIONS] INPUT

Options

-o, --output <output>

path to output file or directory.

-n, --schema-name <schema_name>

Schema name

Default:

'MySchema'

--use-attributes, --no-use-attributes

If true, use attributes over slots/slot_usage

--is-openapi, --no-is-openapi

If true, use OpenAPI schema style

Default:

False

--import-project, --no-import-project

If true, then the input path should be a directory with multiple schema files

-f, --format <format>

JSON Schema format - yaml or json

Arguments

INPUT

Required argument

import-kwalify

Imports from Kwalify Schema to LinkML

See Importers for more on the importer framework

Example:

schemauto import-kwalify my/schema/personinfo.kwalify.yaml

Usage

schemauto import-kwalify [OPTIONS] INPUT

Options

-o, --output <output>

path to output file or directory.

-n, --schema-name <schema_name>

Schema name

Default:

'MySchema'

--use-attributes, --no-use-attributes

If true, use attributes over slots/slot_usage

Arguments

INPUT

Required argument

import-owl

Import an OWL ontology to LinkML

Note:
  • this works best for “schema-style” ontologies

  • input must be in functional syntax

See Importers for more on the importer framework

For a list of caveats on LinkML to OWL mapping, see:

Example:

schemauto import-owl prov.ofn -o my.yaml
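Since the input must be in functional syntax, a small "schema-style" ontology in that syntax may help as a test input. The sketch below writes one to a temporary file; the ontology IRI, class names, and property names are invented, and the command is only printed.

```python
import tempfile
from pathlib import Path

# Invented ontology in OWL functional syntax: two classes, one object
# property, and one data property, each with a domain and range.
OFN = """\
Prefix(:=<https://example.org/people#>)
Prefix(owl:=<http://www.w3.org/2002/07/owl#>)
Prefix(xsd:=<http://www.w3.org/2001/XMLSchema#>)

Ontology(<https://example.org/people>
Declaration(Class(:Person))
Declaration(Class(:Organization))
Declaration(ObjectProperty(:memberOf))
Declaration(DataProperty(:name))
ObjectPropertyDomain(:memberOf :Person)
ObjectPropertyRange(:memberOf :Organization)
DataPropertyDomain(:name :Person)
DataPropertyRange(:name xsd:string)
)
"""

workdir = Path(tempfile.mkdtemp())
ofn_path = workdir / "people.ofn"
ofn_path.write_text(OFN, encoding="utf-8")

cmd = ["schemauto", "import-owl", str(ofn_path), "-o", str(workdir / "people.yaml")]
print(" ".join(cmd))
```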

Usage

schemauto import-owl [OPTIONS] OWLFILE

Options

-o, --output <output>

path to output file or directory.

-n, --schema-name <schema_name>

Schema name

Default:

'MySchema'

-I, --identifier <identifier>

Slot to use as identifier

--model-uri <model_uri>

Model URI prefix

-o, --output <output>

Path to saved yaml schema

Arguments

OWLFILE

Required argument

import-rdfs

Import an RDFS schema to LinkML

Example:

schemauto import-rdfs prov.rdfs.ttl -o prov.yaml

Usage

schemauto import-rdfs [OPTIONS] RDFSFILE

Options

-o, --output <output>

path to output file or directory.

-n, --schema-name <schema_name>

Schema name

Default:

'MySchema'

-f, --format <format>

Input format, e.g. turtle

-I, --identifier <identifier>

Slot to use as identifier

--model-uri <model_uri>

Model URI prefix

--metamodel-mappings <metamodel_mappings>

Path to metamodel mappings YAML dictionary

-o, --output <output>

Path to saved yaml schema

Arguments

RDFSFILE

Required argument

import-sql

Imports a schema by introspecting a relational database

See Importers for more on the importers framework
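This reference does not say what form the DB argument takes. The sketch below assumes it can be a path to a SQLite database file, which is an assumption to check against schemauto import-sql --help. It creates a small database with Python's sqlite3 module so there is something to introspect; table and column names are invented, and the command is only printed.

```python
import sqlite3
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
db_path = workdir / "people.db"

# Create an invented two-table database with a foreign key between them.
con = sqlite3.connect(db_path)
con.executescript(
    """
    CREATE TABLE organization (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        organization_id INTEGER REFERENCES organization(id)
    );
    """
)
con.commit()
con.close()

# ASSUMPTION: DB is passed as a plain path to the SQLite file.
cmd = ["schemauto", "import-sql", str(db_path), "-o", str(workdir / "people.yaml")]
print(" ".join(cmd))
```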

Usage

schemauto import-sql [OPTIONS] DB

Options

-o, --output <output>

path to output file or directory.

-n, --schema-name <schema_name>

Schema name

Default:

'MySchema'

Arguments

DB

Required argument

import-xsd

Import an XML Schema Definition Language (XSD) schema to LinkML

Example:

schemauto import-xsd schema.xml -o prov.yaml

Usage

schemauto import-xsd [OPTIONS] XSD

Options

-o, --output <output>

path to output file or directory.

-n, --schema-name <schema_name>

Schema name

Default:

'MySchema'

-o, --output <output>

Path to saved yaml schema

Arguments

XSD

Required argument