Command Line

All Schema Automator functionality is available via the schemauto command.

Preamble

Warning

Previous versions had dedicated commands such as tsv2linkml; these are now deprecated. They have been renamed and turned into subcommands of the main schemauto command.

Note

We follow the Command Line Interface Guidelines (CLIG) as far as possible.

Main commands

schemauto

Run the LinkML Schema Automator Command Line.

A subcommand must be passed, for example:

schemauto SUBCOMMAND [OPTIONS] ARGUMENTS

To see logging or debugging info, the verbosity flag should be specified BEFORE the subcommand:

schemauto -vv SUBCOMMAND [OPTIONS] ARGUMENTS

schemauto [OPTIONS] COMMAND [ARGS]...

Options

-v, --verbose

Set the level of verbosity

-q, --quiet <quiet>

Silence all diagnostics

annotate-schema

Annotate all elements of a schema.

This uses OAK (https://incatools.github.io/ontology-access-kit), and you can provide any OAK backend that supports text annotation.

At this time, the best choice is likely the bioportal backend.

Example:

schemauto annotate-schema -i bioportal: my-schema.yaml -o annotated.yaml

This requires setting the API key via OAK - see the OAK docs.
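
For example, a BioPortal API key can be stored via the OAK command line (a sketch; see the OAK docs for details):

runoak set-apikey -e bioportal YOUR-API-KEY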

You can also restrict annotation to a specific ontology:

schemauto annotate-schema -i bioportal:ncbitaxon my-schema.yaml -o annotated.yaml

In the future, OAK will support a much wider variety of annotators, including:

  • OLS

  • SciSpacy

  • NLTK

  • OGER

To see all possible selectors, see the OAK docs.

schemauto annotate-schema [OPTIONS] SCHEMA

Options

--curie-only, --no-curie-only

if set, only use results that are mapped to CURIEs

Default: False

-i, --input <input>

OAK input ontology selector

-o, --output <output>

path to output file or directory.

Arguments

SCHEMA

Required argument

annotate-using-jsonld

Annotates a schema using a JSON-LD context file.
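
Example (a minimal sketch using only the options documented below; file names are illustrative):

schemauto annotate-using-jsonld my-schema.yaml -o annotated.yaml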

schemauto annotate-using-jsonld [OPTIONS] SCHEMA

Options

-o, --output <output>

path to output file or directory.

Arguments

SCHEMA

Required argument

enrich-using-llm

Enrich a schema using an LLM.

Example:

schemauto enrich-using-llm -m gpt-4-turbo my-schema.yaml -o my-enriched.yaml

This will enrich the schema by adding missing description fields. In the future, other enrichments may be possible.

Note: for this to work, you will need to install the llm extra.

Example:

pip install "schema-automator[llm]"

schemauto enrich-using-llm [OPTIONS] SCHEMA

Options

-m, --model <model>

Name of model

-o, --output <output>

path to output file or directory.

Arguments

SCHEMA

Required argument

enrich-using-ontology

Enrich a schema using an ontology.

Here, “enrich” means copying over metadata from the ontology to the schema. For example, if the schema has a class “Gene” that is mapped to a SO class for “gene”, then calling this command will copy the SO class definition to the schema class.

This will use OAK to add additional metadata using URIs and mappings in the schema.

See the OAK docs for options for which annotators to use; examples include:

  • bioportal: # (include the colon) any ontology in bioportal

  • bioportal:umls # a specific ontology in bioportal

  • my.obo # any local OBO file

  • sqlite:obo:cl # a specific semsql-registered ontology

For example, if your schema has a class with a mapping to a SO class, then the definition of that will be copied to the class description.

Example:

schemauto enrich-using-ontology -i bioportal: my-schema.yaml -o my-enriched.yaml

If your schema has no mappings, you can use --annotate to add them.

Example:

schemauto enrich-using-ontology -i so.obo --annotate my-schema.yaml -o my-enriched.yaml

schemauto enrich-using-ontology [OPTIONS] SCHEMA

Options

-i, --input <input>

OAK input ontology selector

--annotate, --no-annotate

If true, annotate the schema

-o, --output <output>

path to output file or directory.

Arguments

SCHEMA

Required argument

generalize-htmltable

Generalizes from a table parsed from a URL

Uses pandas/Beautiful Soup.

Note: if the website cannot be accessed directly, you can download the HTML and pass in an argument of the form file:///absolute/path/to/file.html
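
Example (a sketch; the URL and class name are illustrative):

schemauto generalize-htmltable -c Person https://example.org/people.html -o my.yaml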

schemauto generalize-htmltable [OPTIONS] URL

Options

-o, --output <output>

path to output file or directory.

-n, --schema-name <schema_name>

Schema name

Default: MySchema

-s, --column-separator <column_separator>

separator

--downcase-header, --no-downcase-header

if true, make headers lowercase

--snakecase-header, --no-snakecase-header

if true, make headers snake_case

-E, --enum-columns <enum_columns>

column(s) that are forced to be enums

--enum-threshold <enum_threshold>

if the number of distinct values / rows is less than this, do not make an enum

--max-enum-size <max_enum_size>

do not create an enum if more than max distinct members

-c, --class-name <class_name>

Core class name in schema

--pandera, --no-pandera

set to use pandera as the inference engine

--data-output <data_output>

Path to file of downloaded data

--table-number <table_number>

If the URL has multiple tables, use this one (zero-based)

Default: 0

Arguments

URL

Required argument

generalize-json

Generalizes from a JSON file to a schema

See Generalizers for more on the generalization framework

Example:

schemauto generalize-json my/data/persons.json -o my.yaml

schemauto generalize-json [OPTIONS] INPUT

Options

-o, --output <output>

path to output file or directory.

-n, --schema-name <schema_name>

Schema name

Default: MySchema

--container-class-name <container_class_name>

name of root class

-f, --format <format>

json or yaml (or json.gz or yaml.gz) or frontmatter

-E, --enum-columns <enum_columns>

column(s) that are forced to be enums

--enum-mask-columns <enum_mask_columns>

column(s) that are excluded from being enums

--max-enum-size <max_enum_size>

do not create an enum if more than max distinct members

--enum-threshold <enum_threshold>

if the number of distinct values / rows is less than this, do not make an enum

--omit-null, --no-omit-null

if true, ignore null values

--inlined-map <inlined_map>

SLOT_NAME.KEY pairs indicating which slots are inlined as dict (see the example after this list)

--depluralize, --no-depluralized

Auto-depluralize class names to singular form

Default: True
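
Example for --inlined-map (a sketch; the slot and key names are illustrative): if the input JSON has a persons slot that should be inlined as a dict keyed by id:

schemauto generalize-json my/data/container.json --inlined-map persons.id -o my.yaml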

Arguments

INPUT

Required argument

generalize-rdf

Generalizes from an RDF file to a schema

See Generalizers for more on the generalization framework

The input must be in Turtle

Example:

schemauto generalize-rdf my/data/persons.ttl

schemauto generalize-rdf [OPTIONS] RDFFILE

Options

-o, --output <output>

path to output file or directory.

-d, --dir <dir>

Required

Arguments

RDFFILE

Required argument

generalize-toml

Generalizes from a TOML file to a schema

See Generalizers for more on the generalization framework

Example:

schemauto generalize-toml my/data/conf.toml -o my.yaml

schemauto generalize-toml [OPTIONS] INPUT

Options

-o, --output <output>

path to output file or directory.

-n, --schema-name <schema_name>

Schema name

Default: MySchema

--container-class-name <container_class_name>

name of root class

-E, --enum-columns <enum_columns>

column(s) that are forced to be enums

--enum-mask-columns <enum_mask_columns>

column(s) that are excluded from being enums

--max-enum-size <max_enum_size>

do not create an enum if more than max distinct members

--enum-threshold <enum_threshold>

if the number of distinct values / rows is less than this, do not make an enum

--omit-null, --no-omit-null

if true, ignore null values

Arguments

INPUT

Required argument

generalize-tsv

Generalizes from a single TSV file to a single-class schema

See Generalizers for more on the generalization framework

Example:

schemauto generalize-tsv --class-name Person --schema-name PersonInfo my/data/persons.tsv

schemauto generalize-tsv [OPTIONS] TSVFILE

Options

-o, --output <output>

path to output file or directory.

-n, --schema-name <schema_name>

Schema name

Default: MySchema

-A, --annotator <annotator>

name of annotator to use for auto-annotating results. Must be an OAK selector

-c, --class-name <class_name>

Core class name in schema

-s, --column-separator <column_separator>

separator

--downcase-header, --no-downcase-header

if true, make headers lowercase

--snakecase-header, --no-snakecase-header

if true, make headers snake_case

-E, --enum-columns <enum_columns>

column(s) that are forced to be enums

--enum-threshold <enum_threshold>

if the number of distinct values / rows is less than this, do not make an enum

--max-enum-size <max_enum_size>

do not create an enum if more than max distinct members

--data-dictionary-row-count <data_dictionary_row_count>

number of rows that provide metadata about columns

--robot, --no-robot

set if the TSV is a ROBOT template

--pandera, --no-pandera

set to use pandera as the inference engine

Arguments

TSVFILE

Required argument

generalize-tsvs

Generalizes from multiple TSV files to a multi-class schema

See Generalizers for more on the generalization framework

This uses CsvDataGeneralizer.convert_multiple

Example:

schemauto generalize-tsvs --class-name Person --schema-name PersonInfo my/data/*.tsv

schemauto generalize-tsvs [OPTIONS] [TSVFILES]...

Options

-o, --output <output>

path to output file or directory.

-n, --schema-name <schema_name>

Schema name

Default: MySchema

-s, --column-separator <column_separator>

separator

--downcase-header, --no-downcase-header

if true, make headers lowercase

--snakecase-header, --no-snakecase-header

if true, make headers snake_case

-E, --enum-columns <enum_columns>

column(s) that are forced to be enums

--enum-threshold <enum_threshold>

if the number of distinct values / rows is less than this, do not make an enum

--max-enum-size <max_enum_size>

do not create an enum if more than max distinct members

--robot, --no-robot

set if the TSV is a ROBOT template

Arguments

TSVFILES

Optional argument(s)

import-cadsr

Imports from caDSR CDE JSON API output to LinkML

See Importers for more on the importer framework

Example:

schemauto import-cadsr "cdes/*.json"

schemauto import-cadsr [OPTIONS] INPUT

Options

-o, --output <output>

path to output file or directory.

-n, --schema-name <schema_name>

Schema name

Default: MySchema

--schema-id <schema_id>

Schema id

Arguments

INPUT

Required argument

import-dosdps

Imports DOSDP pattern YAML to a LinkML schema

See Importers for more on the importer framework

Example:

schemauto import-dosdps --range-as-enums patterns/*.yaml -o my-schema.yaml

schemauto import-dosdps [OPTIONS] [DPFILES]...

Options

-o, --output <output>

path to output file or directory.

-n, --schema-name <schema_name>

Schema name

Default: MySchema

--range-as-enums, --no-range-as-enums

Model range ontology classes as enums

Arguments

DPFILES

Optional argument(s)

import-frictionless

Imports from a Frictionless data package to LinkML

See Importers for more on the importer framework

Example:

schemauto import-frictionless cfde.package.json

schemauto import-frictionless [OPTIONS] INPUT

Options

-o, --output <output>

path to output file or directory.

-n, --schema-name <schema_name>

Schema name

Default: MySchema

--schema-id <schema_id>

Schema id

Arguments

INPUT

Required argument

import-htmltable

Imports from a table parsed from a URL using SchemaSheets

Uses pandas/Beautiful Soup.
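
Example (a sketch; the URL and the schemasheets column descriptors are illustrative):

schemauto import-htmltable --element-type class --columns name,description https://example.org/terms.html -o my.yaml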

schemauto import-htmltable [OPTIONS] URL

Options

-o, --output <output>

path to output file or directory.

-n, --schema-name <schema_name>

Schema name

Default: MySchema

-c, --class-name <class_name>

Core class name in schema

--data-output <data_output>

Path to file of downloaded data

--element-type <element_type>

E.g. class, enum

--parent <parent>

parent ID

--columns <columns>

Comma-separated schemasheets descriptors, one per column, in the same order as the columns. Required.

--table-number <table_number>

If the URL has multiple tables, use this one (zero-based)

Default: 0

Arguments

URL

Required argument

import-json-schema

Imports from JSON Schema to LinkML

See Importers for more on the importer framework

Example:

schemauto import-json-schema my/schema/personinfo.schema.json

schemauto import-json-schema [OPTIONS] INPUT

Options

-o, --output <output>

path to output file or directory.

-n, --schema-name <schema_name>

Schema name

Default: MySchema

--use-attributes, --no-use-attributes

If true, use attributes over slots/slot_usage

--is-openapi, --no-is-openapi

If true, use OpenAPI schema style

Default: False

--import-project, --no-import-project

If true, then the input path should be a directory with multiple schema files

-f, --format <format>

JSON Schema format - yaml or json

Arguments

INPUT

Required argument

import-kwalify

Imports from Kwalify Schema to LinkML

See Importers for more on the importer framework

Example:

schemauto import-kwalify my/schema/personinfo.kwalify.yaml

schemauto import-kwalify [OPTIONS] INPUT

Options

-o, --output <output>

path to output file or directory.

-n, --schema-name <schema_name>

Schema name

Default: MySchema

--use-attributes, --no-use-attributes

If true, use attributes over slots/slot_usage

Arguments

INPUT

Required argument

import-owl

Import an OWL ontology to LinkML

Note:
  • this works best for “schema-style” ontologies

  • input must be in OWL functional syntax (see the conversion sketch below)
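
If your ontology is in another syntax, one option is to convert it first, for example with ROBOT (a sketch; assumes ROBOT is installed):

robot convert -i prov.ttl -o prov.ofn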

See Importers for more on the importer framework

Example:

schemauto import-owl prov.ofn -o my.yaml

schemauto import-owl [OPTIONS] OWLFILE

Options

-o, --output <output>

path to output file or directory.

-n, --schema-name <schema_name>

Schema name

Default: MySchema

-I, --identifier <identifier>

Slot to use as identifier

--model-uri <model_uri>

Model URI prefix

Arguments

OWLFILE

Required argument

import-rdfs

Import an RDFS schema to LinkML

Example:

schemauto import-rdfs prov.rdfs.ttl -o prov.yaml

schemauto import-rdfs [OPTIONS] RDFSFILE

Options

-o, --output <output>

path to output file or directory.

-n, --schema-name <schema_name>

Schema name

Default: MySchema

-I, --input-type <input_type>

Input format, e.g. turtle

-I, --identifier <identifier>

Slot to use as identifier

--model-uri <model_uri>

Model URI prefix

--metamodel-mappings <metamodel_mappings>

Path to metamodel mappings YAML dictionary

Arguments

RDFSFILE

Required argument

import-sql

Imports a schema by introspecting a relational database

See Importers for more on the importer framework
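
Example (a sketch; the connection string is illustrative and depends on your database):

schemauto import-sql sqlite:///my-database.db -o my-schema.yaml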

schemauto import-sql [OPTIONS] DB

Options

-o, --output <output>

path to output file or directory.

-n, --schema-name <schema_name>

Schema name

Default: MySchema

Arguments

DB

Required argument