Command Line Interface¶

All Schema Automator functionality is available via the schemauto command.

Preamble¶

Warning

Previous versions had standalone commands such as tsv2linkml. These are now deprecated; the same functionality is available as renamed subcommands of the main schemauto command.

Note

We follow the CLIG guidelines as far as possible.

Main commands¶

schemauto¶

Run the LinkML Schema Automator Command Line.

A subcommand must be passed, for example:

schemauto SUBCOMMAND [OPTIONS] ARGUMENTS

To see logging or debugging info, the verbosity flag should be specified BEFORE the subcommand:

schemauto -vv SUBCOMMAND [OPTIONS] ARGUMENTS

schemauto [OPTIONS] COMMAND [ARGS]...

Options

-v, --verbose¶: Set the level of verbosity

-q, --quiet <quiet>¶: Silence all diagnostics

-V, --version¶: Show the version and exit.
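
As a rough sketch of the flag ordering described above, the helper below assembles a command line with -vv placed before the subcommand and prints it instead of running it. The function name schemauto_debug and the file paths are hypothetical and not part of Schema Automator; only -vv, --version, and the generalize-tsv subcommand come from this page.

```shell
# Hypothetical helper (not part of Schema Automator): prints a schemauto
# command line with the verbosity flag placed BEFORE the subcommand.
schemauto_debug() {
  subcommand="$1"
  shift
  echo "schemauto -vv $subcommand $*"
}

# Global options such as -vv come first; subcommand options follow.
cmd="$(schemauto_debug generalize-tsv my/data/persons.tsv -o my.yaml)"
echo "$cmd"

# Only query the real tool if it is installed.
if command -v schemauto >/dev/null 2>&1; then
  schemauto --version
fi
```

Printing the command first is a cheap way to check the ordering before running it for real.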

annotate-schema¶

Annotate all elements of a schema.

This uses OAK (https://incatools.github.io/ontology-access-kit), and you can provide any OAK backend that supports text annotation.

At this time, the best choice is likely the bioportal backend.

Example:

schemauto annotate-schema -i bioportal: my-schema.yaml -o annotated.yaml

This requires setting the BioPortal API key via OAK; see the OAK docs.

You can also restrict annotation to a specific ontology:

schemauto annotate-schema -i bioportal:ncbitaxon my-schema.yaml -o annotated.yaml

In future OAK will support a much wider variety of annotators including:

OLS

SciSpacy

NLTK

OGER

To see all possible selectors, see the OAK docs:

https://incatools.github.io/ontology-access-kit/selectors.html

schemauto annotate-schema [OPTIONS] SCHEMA

Options

--curie-only, --no-curie-only¶

if set, only use results that are mapped to CURIEs

Default:: False

-i, --input <input>¶: OAK input ontology selector

-o, --output <output>¶: path to output file or directory.

Arguments

SCHEMA¶: Required argument

annotate-using-jsonld¶

Annotates a schema using a JSON-LD context file.

schemauto annotate-using-jsonld [OPTIONS] SCHEMA

Options

-o, --output <output>¶: path to output file or directory.

Arguments

SCHEMA¶: Required argument

enrich-using-llm¶

Enrich a schema using an LLM.

Example:

schemauto enrich-using-llm -m gpt-4-turbo my-schema.yaml -o my-enriched.yaml

This will enrich the schema by adding missing description fields. In future other enrichments may be possible.

Note: for this to work, you need to install Schema Automator with the llm extra.

Example:

pip install schema-automator[llm]

schemauto enrich-using-llm [OPTIONS] SCHEMA

Options

-m, --model <model>¶: Name of model

-o, --output <output>¶: path to output file or directory.

Arguments

SCHEMA¶: Required argument

enrich-using-ontology¶

Enrich a schema using an ontology.

Here, “enrich” means copying over metadata from the ontology to the schema. For example, if the schema has a class “Gene” that is mapped to a SO class for “gene”, then calling this command will copy the SO class definition to the schema class.

This will use OAK to add additional metadata using URIs and mappings in the schema.

See the OAK docs for options for which annotators to use; examples include:

bioportal: # (include the colon) any ontology in bioportal

bioportal:umls # a specific ontology in bioportal

my.obo # any local OBO file

sqlite:obo:cl # a specific OBO file or semsql registered ontology

For example, if your schema has a class with a mapping to a SO class, then the definition of that will be copied to the class description.

Example:

schemauto enrich-using-ontology -i bioportal: my-schema.yaml -o my-enriched.yaml

If your schema has no mappings, you can use --annotate to add them.

Example:

schemauto enrich-using-ontology -i so.obo --annotate my-schema.yaml -o my-enriched.yaml

schemauto enrich-using-ontology [OPTIONS] SCHEMA

Options

-i, --input <input>¶: OAK input ontology selector

--annotate, --no-annotate¶: If true, annotate the schema

-o, --output <output>¶: path to output file or directory.

Arguments

SCHEMA¶: Required argument
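
To make the “Gene mapped to SO” scenario concrete, here is a minimal sketch of an input schema. It assumes the mapping is expressed with exact_mappings and that a local so.obo file is available; the schema id, name, and file paths are illustrative only, and the enrichment step is skipped when the tool or the ontology file is missing.

```shell
workdir="$(mktemp -d)"

# Minimal illustrative LinkML schema: one class, Gene, mapped to the SO term
# for "gene" (SO:0000704) and deliberately left without a description.
cat > "$workdir/my-schema.yaml" <<'EOF'
id: https://example.org/my-schema
name: my_schema
prefixes:
  linkml: https://w3id.org/linkml/
  SO: http://purl.obolibrary.org/obo/SO_
imports:
  - linkml:types
default_range: string
classes:
  Gene:
    exact_mappings:
      - SO:0000704
EOF

# Requires schemauto plus a local so.obo file; skipped otherwise.
if command -v schemauto >/dev/null 2>&1 && [ -f so.obo ]; then
  schemauto enrich-using-ontology -i so.obo "$workdir/my-schema.yaml" \
    -o "$workdir/my-enriched.yaml" \
    || echo "enrich-using-ontology failed" >&2
else
  echo "schemauto or so.obo not available; skipping enrichment" >&2
fi
```

If the run succeeds, comparing my-schema.yaml with my-enriched.yaml should show which metadata was copied over.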

generalize-htmltable¶

Generalizes from a table parsed from a URL

Uses pandas and Beautiful Soup.

Note: if the website cannot be accessed directly, you can download the HTML and pass in an argument of the form file:///absolute/path/to/file.html

schemauto generalize-htmltable [OPTIONS] URL

Options

-o, --output <output>¶: path to output file or directory.

-n, --schema-name <schema_name>¶

Schema name

Default:: 'MySchema'

-s, --column-separator <column_separator>¶: separator

--downcase-header, --no-downcase-header¶: if true make headers lowercase

--snakecase-header, --no-snakecase-header¶: if true make headers snakecase

-E, --enum-columns <enum_columns>¶: column(s) to force to be enums

--enum-threshold <enum_threshold>¶: if the number of distinct values / rows is less than this, do not make an enum

--max-enum-size <max_enum_size>¶: do not create an enum if more than max distinct members

-c, --class-name <class_name>¶: Core class name in schema

--pandera, --no-pandera¶: set to use pandera as the inference engine

--data-output <data_output>¶: Path to file of downloaded data

--table-number <table_number>¶

If URL has multiple tables, use this one (zero-based)

Default:: 0

Arguments

URL¶: Required argument
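
The file:/// form mentioned in the note above can be tried with a locally saved page. The following is a sketch under assumptions: the HTML content, class name, and paths are made up for illustration, and the schemauto call only runs if the tool is installed.

```shell
workdir="$(mktemp -d)"

# A small stand-in for a downloaded web page containing one table.
cat > "$workdir/page.html" <<'EOF'
<html><body>
<table>
  <tr><th>Name</th><th>Status</th></tr>
  <tr><td>Alice</td><td>active</td></tr>
  <tr><td>Bob</td><td>inactive</td></tr>
</table>
</body></html>
EOF

# mktemp -d normally yields an absolute path, so prefixing file:// gives
# a URL of the form file:///absolute/path/to/page.html
url="file://$workdir/page.html"
echo "$url"

if command -v schemauto >/dev/null 2>&1; then
  # --table-number is zero-based; 0 selects the first table on the page.
  schemauto generalize-htmltable --table-number 0 --class-name Person \
    -o "$workdir/person.yaml" "$url" \
    || echo "generalize-htmltable failed" >&2
fi
```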

generalize-json¶

Generalizes from a JSON file to a schema

See Generalizers for more on the generalization framework

Example:

schemauto generalize-json my/data/persons.json -o my.yaml

schemauto generalize-json [OPTIONS] INPUT

Options

-o, --output <output>¶: path to output file or directory.

-n, --schema-name <schema_name>¶

Schema name

Default:: 'MySchema'

--container-class-name <container_class_name>¶: name of root class

-f, --format <format>¶: json or yaml (or json.gz or yaml.gz) or frontmatter

-E, --enum-columns <enum_columns>¶: column(s) to force to be enums

--enum-mask-columns <enum_mask_columns>¶: column(s) that are excluded from being enums

--max-enum-size <max_enum_size>¶: do not create an enum if more than max distinct members

--enum-threshold <enum_threshold>¶: if the number of distinct values / rows is less than this, do not make an enum

--omit-null, --no-omit-null¶: if true, ignore null values

--inlined-map <inlined_map>¶: SLOT_NAME.KEY pairs indicating which slots are inlined as dict

--depluralize, --no-depluralized¶

Auto-depluralize class names to singular form

Default:: True

Arguments

INPUT¶: Required argument
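
For readers who want something to run the command against, the sketch below writes a tiny JSON file and passes it to generalize-json. The data, the Container class name, and the paths are illustrative assumptions; only the options shown in the list above are used, and the call is skipped if schemauto is not installed.

```shell
workdir="$(mktemp -d)"

# Illustrative input: a top-level object holding a list of person records.
cat > "$workdir/persons.json" <<'EOF'
{
  "persons": [
    {"id": "P1", "name": "Alice", "status": "active"},
    {"id": "P2", "name": "Bob", "status": "inactive"}
  ]
}
EOF

if command -v schemauto >/dev/null 2>&1; then
  schemauto generalize-json --schema-name PersonInfo \
    --container-class-name Container \
    -o "$workdir/personinfo.yaml" "$workdir/persons.json" \
    || echo "generalize-json failed" >&2
fi
```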

generalize-rdf¶

Generalizes from an RDF file to a schema

See Generalizers for more on the generalization framework

The input must be in Turtle format.

Example:

schemauto generalize-rdf my/data/persons.ttl

schemauto generalize-rdf [OPTIONS] RDFFILE

Options

-o, --output <output>¶: path to output file or directory.

-d, --dir <dir>¶: Required

Arguments

RDFFILE¶: Required argument

generalize-toml¶

Generalizes from a TOML file to a schema

See Generalizers for more on the generalization framework

Example:

schemauto generalize-toml my/data/conf.toml -o my.yaml

schemauto generalize-toml [OPTIONS] INPUT

Options

-o, --output <output>¶: path to output file or directory.

-n, --schema-name <schema_name>¶

Schema name

Default:: 'MySchema'

--container-class-name <container_class_name>¶: name of root class

-E, --enum-columns <enum_columns>¶: column(s) to force to be enums

--enum-mask-columns <enum_mask_columns>¶: column(s) that are excluded from being enums

--max-enum-size <max_enum_size>¶: do not create an enum if more than max distinct members

--enum-threshold <enum_threshold>¶: if the number of distinct values / rows is less than this, do not make an enum

--omit-null, --no-omit-null¶: if true, ignore null values

Arguments

INPUT¶: Required argument

generalize-tsv¶

Generalizes from a single TSV file to a single-class schema

See Generalizers for more on the generalization framework

Example:

schemauto generalize-tsv --class-name Person --schema-name PersonInfo my/data/persons.tsv

schemauto generalize-tsv [OPTIONS] TSVFILE

Options

-o, --output <output>¶: path to output file or directory.

-n, --schema-name <schema_name>¶

Schema name

Default:: 'MySchema'

-A, --annotator <annotator>¶: name of annotator to use for auto-annotating results. Must be an OAK selector

-c, --class-name <class_name>¶: Core class name in schema

-s, --column-separator <column_separator>¶: separator

--downcase-header, --no-downcase-header¶: if true make headers lowercase

--snakecase-header, --no-snakecase-header¶: if true make headers snakecase

-E, --enum-columns <enum_columns>¶: column(s) to force to be enums

--enum-threshold <enum_threshold>¶: if the number of distinct values / rows is less than this, do not make an enum

--max-enum-size <max_enum_size>¶: do not create an enum if more than max distinct members

--data-dictionary-row-count <data_dictionary_row_count>¶: rows that provide metadata about columns

--robot, --no-robot¶: set if the TSV is a ROBOT template

--pandera, --no-pandera¶: set to use pandera as the inference engine

Arguments

TSVFILE¶: Required argument
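
The --enum-threshold option is phrased in terms of the ratio of distinct values to rows. As a worked illustration of that ratio only (not of how the tool applies it internally), the sketch below computes it with awk for one column of a made-up TSV, then forces that column to be an enum with -E. The data and paths are assumptions.

```shell
workdir="$(mktemp -d)"

# Illustrative data: 4 rows, and the status column has 2 distinct values.
printf 'id\tname\tstatus\n' > "$workdir/persons.tsv"
printf '1\tAlice\tactive\n2\tBob\tinactive\n' >> "$workdir/persons.tsv"
printf '3\tCarol\tactive\n4\tDave\tactive\n' >> "$workdir/persons.tsv"

# distinct values / rows for column 3 (status), skipping the header line.
ratio="$(awk -F'\t' '
  NR > 1 { rows++; if (!($3 in seen)) { seen[$3] = 1; distinct++ } }
  END { printf "%.2f", distinct / rows }
' "$workdir/persons.tsv")"
echo "status: distinct/rows = $ratio"

if command -v schemauto >/dev/null 2>&1; then
  # -E forces the status column to be treated as an enum.
  schemauto generalize-tsv --class-name Person --schema-name PersonInfo \
    -E status -o "$workdir/personinfo.yaml" "$workdir/persons.tsv" \
    || echo "generalize-tsv failed" >&2
fi
```

Here the ratio is 2 / 4, so it is the number to compare against whatever value is passed to --enum-threshold.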

generalize-tsvs¶

Generalizes from multiple TSV files to a multi-class schema

See Generalizers for more on the generalization framework

This uses CsvDataGeneralizer.convert_multiple

Example:

schemauto generalize-tsvs --schema-name PersonInfo my/data/*.tsv

schemauto generalize-tsvs [OPTIONS] [TSVFILES]...

Options

-o, --output <output>¶: path to output file or directory.

-n, --schema-name <schema_name>¶

Schema name

Default:: 'MySchema'

-s, --column-separator <column_separator>¶: separator

--downcase-header, --no-downcase-header¶: if true make headers lowercase

--snakecase-header, --no-snakecase-header¶: if true make headers snakecase

-E, --enum-columns <enum_columns>¶: column(s) to force to be enums

--enum-threshold <enum_threshold>¶: if the number of distinct values / rows is less than this, do not make an enum

--max-enum-size <max_enum_size>¶: do not create an enum if more than max distinct members

--robot, --no-robot¶: set if the TSV is a ROBOT template

Arguments

TSVFILES¶: Optional argument(s)
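
A small sketch of the multi-file case: two made-up TSV files are written to a temporary directory and passed together via a glob. File names, contents, and the output path are assumptions for illustration, and the call is skipped when schemauto is not installed.

```shell
workdir="$(mktemp -d)"

# Two illustrative input tables, one per file.
printf 'id\tname\n1\tAlice\n2\tBob\n' > "$workdir/persons.tsv"
printf 'id\tlabel\nO1\tAcme\n' > "$workdir/organizations.tsv"

# Count the files the glob will expand to.
file_count="$(ls "$workdir"/*.tsv | wc -l | tr -d ' ')"
echo "input files: $file_count"

if command -v schemauto >/dev/null 2>&1; then
  schemauto generalize-tsvs --schema-name PersonInfo \
    -o "$workdir/personinfo.yaml" "$workdir"/*.tsv \
    || echo "generalize-tsvs failed" >&2
fi
```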

import-cadsr¶

Imports from CADSR CDE JSON API output to LinkML

See Importers for more on the importer framework

Example:

schemauto import-cadsr "cdes/*.json"

schemauto import-cadsr [OPTIONS] INPUT

Options

-o, --output <output>¶: path to output file or directory.

-n, --schema-name <schema_name>¶

Schema name

Default:: 'MySchema'

--schema-id <schema_id>¶: Schema id

Arguments

INPUT¶: Required argument

import-dosdps¶

Imports DOSDP pattern YAML to a LinkML schema

See Importers for more on the importers framework

Example:

schemauto import-dosdps --range-as-enums patterns/*.yaml -o my-schema.yaml

schemauto import-dosdps [OPTIONS] [DPFILES]...

Options

-o, --output <output>¶: path to output file or directory.

-n, --schema-name <schema_name>¶

Schema name

Default:: 'MySchema'

--range-as-enums, --no-range-as-enums¶: Model range ontology classes as enums

Arguments

DPFILES¶: Optional argument(s)

import-frictionless¶

Imports from Frictionless data package to LinkML

See Importers for more on the importer framework

Example:

schemauto import-frictionless cfde.package.json

schemauto import-frictionless [OPTIONS] INPUT

Options

-o, --output <output>¶: path to output file or directory.

-n, --schema-name <schema_name>¶

Schema name

Default:: 'MySchema'

--schema-id <schema_id>¶: Schema id

Arguments

INPUT¶: Required argument

import-htmltable¶

Imports from a table parsed from a URL using SchemaSheets

Uses pandas and Beautiful Soup.

schemauto import-htmltable [OPTIONS] URL

Options

-o, --output <output>¶: path to output file or directory.

-n, --schema-name <schema_name>¶

Schema name

Default:: 'MySchema'

-c, --class-name <class_name>¶: Core class name in schema

--data-output <data_output>¶: Path to file of downloaded data

--element-type <element_type>¶: E.g. class, enum

--parent <parent>¶: parent ID

--columns <columns>¶: Required comma-separated schemasheets descriptors of each column. Must be in same order

--table-number <table_number>¶

If URL has multiple tables, use this one (zero-based)

Default:: 0

Arguments

URL¶: Required argument

import-json-schema¶

Imports from JSON Schema to LinkML

See Importers for more on the importer framework

Example:

schemauto import-json-schema my/schema/personinfo.schema.json

schemauto import-json-schema [OPTIONS] INPUT

Options

-o, --output <output>¶: path to output file or directory.

-n, --schema-name <schema_name>¶

Schema name

Default:: 'MySchema'

--use-attributes, --no-use-attributes¶: If true, use attributes over slots/slot_usage

--is-openapi, --no-is-openapi¶

If true, use OpenAPI schema style

Default:: False

--import-project, --no-import-project¶: If true, then the input path should be a directory with multiple schema files

-f, --format <format>¶: JSON Schema format - yaml or json

Arguments

INPUT¶: Required argument
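
As a minimal sketch of an input for this command, the block below writes a small JSON Schema document and imports it with --use-attributes. The schema content and file paths are illustrative assumptions; the call only runs if schemauto is installed.

```shell
workdir="$(mktemp -d)"

# Illustrative JSON Schema describing a single Person object.
# The quoted 'EOF' keeps the shell from expanding "$schema".
cat > "$workdir/personinfo.schema.json" <<'EOF'
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Person",
  "type": "object",
  "properties": {
    "id": {"type": "string"},
    "name": {"type": "string"},
    "age": {"type": "integer"}
  },
  "required": ["id"]
}
EOF

if command -v schemauto >/dev/null 2>&1; then
  schemauto import-json-schema --schema-name PersonInfo --use-attributes \
    -o "$workdir/personinfo.yaml" "$workdir/personinfo.schema.json" \
    || echo "import-json-schema failed" >&2
fi
```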

import-kwalify¶

Imports from Kwalify Schema to LinkML

See Importers for more on the importer framework

Example:

schemauto import-kwalify my/schema/personinfo.kwalify.yaml

schemauto import-kwalify [OPTIONS] INPUT

Options

-o, --output <output>¶: path to output file or directory.

-n, --schema-name <schema_name>¶

Schema name

Default:: 'MySchema'

--use-attributes, --no-use-attributes¶: If true, use attributes over slots/slot_usage

Arguments

INPUT¶: Required argument

import-owl¶

Import an OWL ontology to LinkML

Note:

this works best for “schema-style” ontologies

input must be in functional syntax

See Importers for more on the importer framework

For a list of caveats on LinkML to OWL mapping, see:

https://linkml.io/linkml/generators/owl.html

Example:

schemauto import-owl prov.ofn -o my.yaml

schemauto import-owl [OPTIONS] OWLFILE

Options

-o, --output <output>¶: path to output file or directory.

-n, --schema-name <schema_name>¶

Schema name

Default:: 'MySchema'

-I, --identifier <identifier>¶: Slot to use as identifier

--model-uri <model_uri>¶: Model URI prefix

Arguments

OWLFILE¶: Required argument

import-rdfs¶

Import an RDFS schema to LinkML

Example:

schemauto import-rdfs prov.rdfs.ttl -o prov.yaml

schemauto import-rdfs [OPTIONS] RDFSFILE

Options

-o, --output <output>¶: path to output file or directory.

-n, --schema-name <schema_name>¶

Schema name

Default:: 'MySchema'

-f, --format <format>¶: Input format, e.g. turtle

-I, --identifier <identifier>¶: Slot to use as identifier

--model-uri <model_uri>¶: Model URI prefix

--metamodel-mappings <metamodel_mappings>¶: Path to metamodel mappings YAML dictionary

Arguments

RDFSFILE¶: Required argument

import-sql¶

Imports a schema by introspecting a relational database

See Importers for more on the importers framework

schemauto import-sql [OPTIONS] DB

Options

-o, --output <output>¶: path to output file or directory.

-n, --schema-name <schema_name>¶

Schema name

Default:: 'MySchema'

Arguments

DB¶: Required argument
