Command Line¶
All Schema Automator functionality is available via the schemauto command.
Preamble¶
Warning
Previous versions had format-specific commands such as tsv2linkml; these are now deprecated.
They have been renamed and are now subcommands of the main schemauto command.
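For example, the old standalone TSV command now corresponds (approximately) to a schemauto subcommand; the exact old flag spellings may have differed, so treat this as a sketch:

```shell
# previously (deprecated standalone command):
tsv2linkml my-data.tsv -o my-schema.yaml

# now (subcommand of schemauto):
schemauto generalize-tsv my-data.tsv -o my-schema.yaml
```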
Note
We follow the Command Line Interface Guidelines (CLIG) as far as possible.
Main commands¶
schemauto¶
Run the LinkML Schema Automator Command Line.
A subcommand must be passed, for example:
schemauto SUBCOMMAND [OPTIONS] ARGUMENTS
To see logging or debugging info, the verbosity flag should be specified BEFORE the subcommand:
schemauto -vv SUBCOMMAND [OPTIONS] ARGUMENTS
schemauto [OPTIONS] COMMAND [ARGS]...
Options
- -v, --verbose¶
Set the level of verbosity
- -q, --quiet <quiet>¶
Silence all diagnostics
annotate-schema¶
Annotate all elements of a schema.
This uses OAK (https://incatools.github.io/ontology-access-kit), and you can provide any OAK backend that supports text annotation.
At this time, the best choice is likely the bioportal backend.
Example:
schemauto annotate-schema -i bioportal: my-schema.yaml -o annotated.yaml
This requires setting an API key via OAK; see the OAK docs.
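As a sketch of the one-time setup (the command and flag are from the OAK docs; verify against your installed OAK version):

```shell
# store your BioPortal API key via OAK's runoak CLI;
# YOUR-API-KEY is a placeholder
runoak set-apikey -e bioportal YOUR-API-KEY
```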
You can also specify a specific ontology:
schemauto annotate-schema -i bioportal:ncbitaxon my-schema.yaml -o annotated.yaml
In the future, OAK will support a much wider variety of annotators, including:
OLS
SciSpacy
NLTK
OGER
To see all possible selectors, see the OAK docs.
schemauto annotate-schema [OPTIONS] SCHEMA
Options
- --curie-only, --no-curie-only¶
if set, only use results that are mapped to CURIEs
- Default:
False
- -i, --input <input>¶
OAK input ontology selector
- -o, --output <output>¶
path to output file or directory.
Arguments
- SCHEMA¶
Required argument
annotate-using-jsonld¶
Annotates a schema using a JSON-LD context file.
schemauto annotate-using-jsonld [OPTIONS] SCHEMA
Options
- -o, --output <output>¶
path to output file or directory.
Arguments
- SCHEMA¶
Required argument
enrich-using-llm¶
Enrich a schema using an LLM.
Example:
schemauto enrich-using-llm -m gpt-4-turbo my-schema.yaml -o my-enriched.yaml
This will enrich the schema by adding missing description fields. In the future, other enrichments may be possible.
Note: for this to work, you will need to install the llm extra.
Example:
pip install schema-automator[llm]
schemauto enrich-using-llm [OPTIONS] SCHEMA
Options
- -m, --model <model>¶
Name of model
- -o, --output <output>¶
path to output file or directory.
Arguments
- SCHEMA¶
Required argument
enrich-using-ontology¶
Enrich a schema using an ontology.
Here, “enrich” means copying over metadata from the ontology to the schema. For example, if the schema has a class “Gene” that is mapped to a SO class for “gene”, then calling this command will copy the SO class definition to the schema class.
This will use OAK to add additional metadata using uris and mappings in the schema.
See the OAK docs for options for which annotators to use; examples include:
bioportal: # (include the colon) any ontology in bioportal
bioportal:umls # a specific ontology in bioportal
my.obo # any local OBO file
sqlite:obo:cl # a specific OBO file or semsql registered ontology
For example, if your schema has a class with a mapping to a SO class, the definition of that SO class will be copied to the class description.
Example:
schemauto enrich-using-ontology -i bioportal: my-schema.yaml -o my-enriched.yaml
If your schema has no mappings, you can use --annotate to add them.
Example:
schemauto enrich-using-ontology -i so.obo --annotate my-schema.yaml -o my-enriched.yaml
schemauto enrich-using-ontology [OPTIONS] SCHEMA
Options
- -i, --input <input>¶
OAK input ontology selector
- --annotate, --no-annotate¶
If true, annotate the schema
- -o, --output <output>¶
path to output file or directory.
Arguments
- SCHEMA¶
Required argument
generalize-htmltable¶
Generalizes from a table parsed from a URL
Uses pandas/Beautiful Soup.
Note: if the website cannot be accessed directly, you can download the HTML and pass in an argument of the form file:///absolute/path/to/file.html
schemauto generalize-htmltable [OPTIONS] URL
Options
- -o, --output <output>¶
path to output file or directory.
- -n, --schema-name <schema_name>¶
Schema name
- Default:
MySchema
- -s, --column-separator <column_separator>¶
separator
- --downcase-header, --no-downcase-header¶
if true make headers lowercase
- --snakecase-header, --no-snakecase-header¶
if true make headers snakecase
- -E, --enum-columns <enum_columns>¶
column(s) that are forced to be enums
- --enum-threshold <enum_threshold>¶
if the ratio of distinct values to rows is less than this, do not make an enum
- --max-enum-size <max_enum_size>¶
do not create an enum if more than max distinct members
- -c, --class-name <class_name>¶
Core class name in schema
- --pandera, --no-pandera¶
set to use pandera as the inference engine
- --data-output <data_output>¶
Path to file of downloaded data
- --table-number <table_number>¶
If URL has multiple tables, use this one (zero-based)
- Default:
0
Arguments
- URL¶
Required argument
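No example is given above; here is a minimal sketch using only the documented options, with a hypothetical URL and class name:

```shell
# generalize the first table on a (hypothetical) page into a schema,
# saving the downloaded table data alongside the schema
schemauto generalize-htmltable \
  -c Country -n CountryInfo \
  --table-number 0 \
  --data-output countries.tsv \
  https://example.org/countries.html -o countries.yaml
```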
generalize-json¶
Generalizes from a JSON file to a schema
See Generalizers for more on the generalization framework
Example:
schemauto generalize-json my/data/persons.json -o my.yaml
schemauto generalize-json [OPTIONS] INPUT
Options
- -o, --output <output>¶
path to output file or directory.
- -n, --schema-name <schema_name>¶
Schema name
- Default:
MySchema
- --container-class-name <container_class_name>¶
name of root class
- -f, --format <format>¶
json or yaml (or json.gz or yaml.gz) or frontmatter
- -E, --enum-columns <enum_columns>¶
column(s) that are forced to be enums
- --enum-mask-columns <enum_mask_columns>¶
column(s) that are excluded from being enums
- --max-enum-size <max_enum_size>¶
do not create an enum if more than max distinct members
- --enum-threshold <enum_threshold>¶
if the ratio of distinct values to rows is less than this, do not make an enum
- --omit-null, --no-omit-null¶
if true, ignore null values
- --inlined-map <inlined_map>¶
SLOT_NAME.KEY pairs indicating which slots are inlined as dict
- --depluralize, --no-depluralized¶
Auto-depluralize class names to singular form
- Default:
True
Arguments
- INPUT¶
Required argument
generalize-rdf¶
Generalizes from an RDF file to a schema
See Generalizers for more on the generalization framework
The input must be in Turtle format.
Example:
schemauto generalize-rdf my/data/persons.ttl
schemauto generalize-rdf [OPTIONS] RDFFILE
Options
- -o, --output <output>¶
path to output file or directory.
- -d, --dir <dir>¶
Required
Arguments
- RDFFILE¶
Required argument
generalize-toml¶
Generalizes from a TOML file to a schema
See Generalizers for more on the generalization framework
Example:
schemauto generalize-toml my/data/conf.toml -o my.yaml
schemauto generalize-toml [OPTIONS] INPUT
Options
- -o, --output <output>¶
path to output file or directory.
- -n, --schema-name <schema_name>¶
Schema name
- Default:
MySchema
- --container-class-name <container_class_name>¶
name of root class
- -E, --enum-columns <enum_columns>¶
column(s) that are forced to be enums
- --enum-mask-columns <enum_mask_columns>¶
column(s) that are excluded from being enums
- --max-enum-size <max_enum_size>¶
do not create an enum if more than max distinct members
- --enum-threshold <enum_threshold>¶
if the ratio of distinct values to rows is less than this, do not make an enum
- --omit-null, --no-omit-null¶
if true, ignore null values
Arguments
- INPUT¶
Required argument
generalize-tsv¶
Generalizes from a single TSV file to a single-class schema
See Generalizers for more on the generalization framework
Example:
schemauto generalize-tsv --class-name Person --schema-name PersonInfo my/data/persons.tsv
schemauto generalize-tsv [OPTIONS] TSVFILE
Options
- -o, --output <output>¶
path to output file or directory.
- -n, --schema-name <schema_name>¶
Schema name
- Default:
MySchema
- -A, --annotator <annotator>¶
name of annotator to use for auto-annotating results. Must be an OAK selector
- -c, --class-name <class_name>¶
Core class name in schema
- -s, --column-separator <column_separator>¶
separator
- --downcase-header, --no-downcase-header¶
if true make headers lowercase
- --snakecase-header, --no-snakecase-header¶
if true make headers snakecase
- -E, --enum-columns <enum_columns>¶
column(s) that are forced to be enums
- --enum-threshold <enum_threshold>¶
if the ratio of distinct values to rows is less than this, do not make an enum
- --max-enum-size <max_enum_size>¶
do not create an enum if more than max distinct members
- --data-dictionary-row-count <data_dictionary_row_count>¶
rows that provide metadata about columns
- --robot, --no-robot¶
set if the TSV is a ROBOT template
- --pandera, --no-pandera¶
set to use pandera as the inference engine
Arguments
- TSVFILE¶
Required argument
generalize-tsvs¶
Generalizes from multiple TSV files to a multi-class schema
See Generalizers for more on the generalization framework
This uses CsvDataGeneralizer.convert_multiple
Example:
schemauto generalize-tsvs --class-name Person --schema-name PersonInfo my/data/*.tsv
schemauto generalize-tsvs [OPTIONS] [TSVFILES]...
Options
- -o, --output <output>¶
path to output file or directory.
- -n, --schema-name <schema_name>¶
Schema name
- Default:
MySchema
- -s, --column-separator <column_separator>¶
separator
- --downcase-header, --no-downcase-header¶
if true make headers lowercase
- --snakecase-header, --no-snakecase-header¶
if true make headers snakecase
- -E, --enum-columns <enum_columns>¶
column(s) that are forced to be enums
- --enum-threshold <enum_threshold>¶
if the ratio of distinct values to rows is less than this, do not make an enum
- --max-enum-size <max_enum_size>¶
do not create an enum if more than max distinct members
- --robot, --no-robot¶
set if the TSV is a ROBOT template
Arguments
- TSVFILES¶
Optional argument(s)
import-cadsr¶
Imports from CADSR CDE JSON API output to LinkML
See Importers for more on the importer framework
Example:
schemauto import-cadsr "cdes/*.json"
schemauto import-cadsr [OPTIONS] INPUT
Options
- -o, --output <output>¶
path to output file or directory.
- -n, --schema-name <schema_name>¶
Schema name
- Default:
MySchema
- --schema-id <schema_id>¶
Schema id
Arguments
- INPUT¶
Required argument
import-dosdps¶
Imports DOSDP pattern YAML to a LinkML schema
See Importers for more on the importers framework
Example:
schemauto import-dosdps --range-as-enums patterns/*.yaml -o my-schema.yaml
schemauto import-dosdps [OPTIONS] [DPFILES]...
Options
- -o, --output <output>¶
path to output file or directory.
- -n, --schema-name <schema_name>¶
Schema name
- Default:
MySchema
- --range-as-enums, --no-range-as-enums¶
Model range ontology classes as enums
Arguments
- DPFILES¶
Optional argument(s)
import-frictionless¶
Imports from Frictionless data package to LinkML
See Importers for more on the importer framework
Example:
schemauto import-frictionless cfde.package.json
schemauto import-frictionless [OPTIONS] INPUT
Options
- -o, --output <output>¶
path to output file or directory.
- -n, --schema-name <schema_name>¶
Schema name
- Default:
MySchema
- --schema-id <schema_id>¶
Schema id
Arguments
- INPUT¶
Required argument
import-htmltable¶
Imports from a table parsed from a URL using SchemaSheets
Uses pandas/Beautiful Soup.
schemauto import-htmltable [OPTIONS] URL
Options
- -o, --output <output>¶
path to output file or directory.
- -n, --schema-name <schema_name>¶
Schema name
- Default:
MySchema
- -c, --class-name <class_name>¶
Core class name in schema
- --data-output <data_output>¶
Path to file of downloaded data
- --element-type <element_type>¶
E.g. class, enum
- --parent <parent>¶
parent ID
- --columns <columns>¶
Required. Comma-separated SchemaSheets descriptors for each column, in the same order as the columns.
- --table-number <table_number>¶
If URL has multiple tables, use this one (zero-based)
- Default:
0
Arguments
- URL¶
Required argument
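No example is given above; a minimal sketch with a hypothetical URL and column descriptors (the values passed to --columns must be valid SchemaSheets descriptors matching your table):

```shell
# import a (hypothetical) two-column table of class names and descriptions
schemauto import-htmltable \
  --element-type class \
  --columns "name,description" \
  --table-number 0 \
  https://example.org/terms.html -o my-schema.yaml
```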
import-json-schema¶
Imports from JSON Schema to LinkML
See Importers for more on the importer framework
Example:
schemauto import-json-schema my/schema/personinfo.schema.json
schemauto import-json-schema [OPTIONS] INPUT
Options
- -o, --output <output>¶
path to output file or directory.
- -n, --schema-name <schema_name>¶
Schema name
- Default:
MySchema
- --use-attributes, --no-use-attributes¶
If true, use attributes over slots/slot_usage
- --is-openapi, --no-is-openapi¶
If true, use OpenAPI schema style
- Default:
False
- --import-project, --no-import-project¶
If true, then the input path should be a directory with multiple schema files
- -f, --format <format>¶
JSON Schema format - yaml or json
Arguments
- INPUT¶
Required argument
import-kwalify¶
Imports from Kwalify Schema to LinkML
See Importers for more on the importer framework
Example:
schemauto import-kwalify my/schema/personinfo.kwalify.yaml
schemauto import-kwalify [OPTIONS] INPUT
Options
- -o, --output <output>¶
path to output file or directory.
- -n, --schema-name <schema_name>¶
Schema name
- Default:
MySchema
- --use-attributes, --no-use-attributes¶
If true, use attributes over slots/slot_usage
Arguments
- INPUT¶
Required argument
import-owl¶
Import an OWL ontology to LinkML
- Note:
this works best for "schema-style" ontologies
the input must be in OWL functional syntax
See Importers for more on the importer framework
Example:
schemauto import-owl prov.ofn -o my.yaml
schemauto import-owl [OPTIONS] OWLFILE
Options
- -o, --output <output>¶
path to output file or directory.
- -n, --schema-name <schema_name>¶
Schema name
- Default:
MySchema
- -I, --identifier <identifier>¶
Slot to use as identifier
- --model-uri <model_uri>¶
Model URI prefix
- -o, --output <output>¶
Path to saved yaml schema
Arguments
- OWLFILE¶
Required argument
import-rdfs¶
Import an RDFS schema to LinkML
Example:
schemauto import-rdfs prov.rdfs.ttl -o prov.yaml
schemauto import-rdfs [OPTIONS] RDFSFILE
Options
- -o, --output <output>¶
path to output file or directory.
- -n, --schema-name <schema_name>¶
Schema name
- Default:
MySchema
- -I, --input-type <input_type>¶
Input format, e.g. turtle
- -I, --identifier <identifier>¶
Slot to use as identifier
- --model-uri <model_uri>¶
Model URI prefix
- --metamodel-mappings <metamodel_mappings>¶
Path to metamodel mappings YAML dictionary
- -o, --output <output>¶
Path to saved yaml schema
Arguments
- RDFSFILE¶
Required argument
import-sql¶
Imports a schema by introspecting a relational database
See Importers for more on the importers framework
schemauto import-sql [OPTIONS] DB
Options
- -o, --output <output>¶
path to output file or directory.
- -n, --schema-name <schema_name>¶
Schema name
- Default:
MySchema
Arguments
- DB¶
Required argument
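No example is given above; a minimal sketch, assuming the DB argument accepts an SQLAlchemy-style connection string (check your installed version for the accepted form):

```shell
# introspect a local SQLite database into a LinkML schema
schemauto import-sql sqlite:///my-database.db -n MySchema -o my-schema.yaml
```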