Tutorial: Using the Command Line Interface
This tutorial walks through usage of LinkML-Store via the Command Line Interface (CLI).
This tutorial is a Jupyter notebook: you can execute it as a notebook in a command line environment, or try it for yourself by running the commands directly.
Note that %%bash is a directive for Jupyter itself; you don’t need to type it.
Top level command
The top-level command is linkml-store. This command doesn’t do anything itself; instead, there are various subcommands.
The top-level command has a few global options to specify the configuration, database, and collection:
linkml-store --help
Usage: linkml-store [OPTIONS] COMMAND [ARGS]...
A CLI for interacting with the linkml-store.
-d, --database TEXT Database name
-c, --collection TEXT Collection name
-i, --input TEXT Input file (alternative to
-C, --config PATH Path to the configuration file
--set TEXT Metadata settings in the form PATHEXPR=value
-v, --verbose
-q, --quiet / --no-quiet
-B, --base-dir TEXT Base directory for the client configuration
--stacktrace / --no-stacktrace If set then show full stacktrace on error
[default: no-stacktrace]
--help Show this message and exit.
apply Apply a patch to a collection.
describe Describe the collection schema.
diff Diffs two collectoons to create a patch.
export Exports a database to a standard dump format.
fq Query facets from the specified collection.
import Imports a database from a dump.
index Create an index over a collection.
indexes Show the indexes for a collection.
infer Predict a complete object from a partial object.
insert Insert objects from files (JSON, YAML, TSV) into the...
query Query objects from the specified collection.
schema Show the schema for a database
search Search objects in the specified collection.
store Store objects from files (JSON, YAML, TSV) into the...
validate Validate objects in the specified collection.
Inserting objects from a file
Next we’ll explore the insert command:
linkml-store --stacktrace insert --help
Usage: linkml-store insert [OPTIONS] [FILES]...
Insert objects from files (JSON, YAML, TSV) into the specified collection.
Using a configuration:
linkml-store -C config.yaml -c genes insert data/genes/*.json
Note: if you don't provide a schema this will be inferred, but it is usually
better to provide an explicit schema
-f, --format [json|jsonl|yaml|yamll|tsv|csv|python|parquet|formatted|table|duckdb|postgres|mongodb]
Input format
-i, --object TEXT Input object as YAML
--help Show this message and exit.
We’ll insert a small test file (in JSON Lines format) into a fresh database.
head ../../tests/input/countries/countries.jsonl
{"name": "United States", "code": "US", "capital": "Washington, D.C.", "continent": "North America", "languages": ["English"]}
{"name": "Canada", "code": "CA", "capital": "Ottawa", "continent": "North America", "languages": ["English", "French"]}
{"name": "Mexico", "code": "MX", "capital": "Mexico City", "continent": "North America", "languages": ["Spanish"]}
{"name": "Brazil", "code": "BR", "capital": "Brasília", "continent": "South America", "languages": ["Portuguese"]}
{"name": "Argentina", "code": "AR", "capital": "Buenos Aires", "continent": "South America", "languages": ["Spanish"]}
{"name": "United Kingdom", "code": "GB", "capital": "London", "continent": "Europe", "languages": ["English"]}
{"name": "France", "code": "FR", "capital": "Paris", "continent": "Europe", "languages": ["French"]}
{"name": "Germany", "code": "DE", "capital": "Berlin", "continent": "Europe", "languages": ["German"]}
{"name": "Italy", "code": "IT", "capital": "Rome", "continent": "Europe", "languages": ["Italian"]}
{"name": "Spain", "code": "ES", "capital": "Madrid", "continent": "Europe", "languages": ["Spanish"]}
To make sure we have a fresh setup, we’ll create a temporary directory tmp (if it doesn’t already exist), and be sure to remove any copy of the database we intend to create.
We’ll then insert the objects:
mkdir -p tmp
rm -rf tmp/countries.db
linkml-store --database duckdb:///tmp/countries.db --collection countries insert ../../tests/input/countries/countries.jsonl
Inserted 20 objects from ../../tests/input/countries/countries.jsonl into collection 'countries'.
Note that the --database and --collection options come before the insert subcommand.
With LinkML-Store, everything must go into a collection, so we specified countries as the name.
Next we’ll explore the query command:
linkml-store query --help
Usage: linkml-store query [OPTIONS]
Query objects from the specified collection.
Leave the query field blank to return all objects in the collection.
linkml-store -d duckdb:///countries.db -c countries query
Queries can be specified in YAML, as basic key-value pairs
linkml-store -d duckdb:///countries.db -c countries query -w 'code: NZ'
More complex queries can be specified using MongoDB-style query syntax
linkml-store -d file:. -c persons query -w 'occupation: {$ne:
Finds all people who are not architects.
-w, --where TEXT WHERE clause for the query, as YAML
-s, --select TEXT SELECT clause for the query, as YAML
-l, --limit INTEGER Maximum number of results to return
-O, --output-type [json|jsonl|yaml|yamll|tsv|csv|python|parquet|formatted|table|duckdb|postgres|mongodb]
Output format
-o, --output PATH Output file path
--help Show this message and exit.
Let’s query for all objects that have code="GB", and get the results back as a table. The argument for the --where (or -w) option is a YAML object with a MongoDB-style query.
linkml-store --database duckdb:///tmp/countries.db -c countries query -w "code: GB" -O table
| | name | code | capital | continent | languages |
| 0 | United Kingdom | GB | London | Europe | ['English'] |
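To make the MongoDB-style semantics concrete, here is a minimal Python sketch of how such a WHERE clause could be evaluated against in-memory objects. It is not LinkML-Store's implementation: the `matches` helper is hypothetical and only handles plain equality plus the `$ne` and `$in` operators. The two records are copied from the sample file shown earlier.

```python
from typing import Any, Dict, List

# Two records copied from the countries.jsonl sample shown earlier.
COUNTRIES: List[Dict[str, Any]] = [
    {"name": "United Kingdom", "code": "GB", "capital": "London",
     "continent": "Europe", "languages": ["English"]},
    {"name": "France", "code": "FR", "capital": "Paris",
     "continent": "Europe", "languages": ["French"]},
]


def matches(obj: Dict[str, Any], where: Dict[str, Any]) -> bool:
    """Return True if obj satisfies every condition in where (hypothetical helper)."""
    for key, cond in where.items():
        value = obj.get(key)
        if isinstance(cond, dict):
            # Operator form, e.g. {"$ne": "GB"}
            for op, operand in cond.items():
                if op == "$ne":
                    if value == operand:
                        return False
                elif op == "$in":
                    if value not in operand:
                        return False
                else:
                    raise ValueError(f"unsupported operator: {op}")
        elif value != cond:
            # Plain key-value form means equality.
            return False
    return True


# The YAML string 'code: GB' parses to the dict {"code": "GB"}.
equal_hits = [c["name"] for c in COUNTRIES if matches(c, {"code": "GB"})]
not_equal_hits = [c["name"] for c in COUNTRIES if matches(c, {"code": {"$ne": "GB"}})]
print(equal_hits)
print(not_equal_hits)
```

A plain key-value pair is treated as an equality test, while a nested mapping whose keys start with `$` is treated as an operator expression.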
We can get the output in different formats:
linkml-store --database duckdb:///tmp/countries.db -c countries query -w "code: GB" -O yaml
name: United Kingdom
code: GB
capital: London
continent: Europe
- English
Formats include csv, tsv, yaml, json, jsonl, table, and formatted (a human-readable format).
Describing the data set
The describe command gives a high-level overview of the data set:
linkml-store describe --help
Usage: linkml-store describe [OPTIONS]
Describe the collection schema.
-w, --where TEXT WHERE clause for the query
-O, --output-type [json|jsonl|yaml|yamll|tsv|csv|python|parquet|formatted|table|duckdb|postgres|mongodb]
Output format
-o, --output PATH Output file path
-l, --limit INTEGER Maximum number of results to return
[default: -1]
--help Show this message and exit.
Let’s try with the countries dataset:
linkml-store -d duckdb:///tmp/countries.db -c countries describe -O formatted
count unique top freq
capital 20 20 Washington, D.C. 1
code 20 20 US 1
continent 20 6 Europe 5
languages 20 15 [English] 4
name 20 20 United States 1
Note this command is more useful for numeric data…
Facet Counts
You can combine any query (including an empty query, for fetching the whole database) with a facet query which fetches counts for numbers of objects broken down by some specified slot or slots.
linkml-store fq --help
Usage: linkml-store fq [OPTIONS]
Query facets from the specified collection.
:param ctx: :param where: :param limit: :param columns: :param output_type:
:param output: :return:
-w, --where TEXT WHERE clause for the query
-l, --limit INTEGER Maximum number of results to return
-O, --output-type [json|jsonl|yaml|yamll|tsv|csv|python|parquet|formatted|table|duckdb|postgres|mongodb]
Output format
-o, --output PATH Output file path
-S, --columns TEXT Columns to facet on
-U, --wide / --no-wide, --no-U Wide table [default: no-wide]
--help Show this message and exit.
linkml-store -d duckdb:///tmp/countries.db -c countries fq -S continent
"continent": {
"Asia": 5,
"Europe": 5,
"Africa": 3,
"North America": 3,
"South America": 2,
"Oceania": 2
linkml-store --stacktrace -d duckdb:///tmp/countries.db -c countries fq -S continent,languages -O table
| | continent | languages |
| Europe | 5 | nan |
| Asia | 5 | nan |
| North America | 3 | nan |
| Africa | 3 | nan |
| South America | 2 | nan |
| Oceania | 2 | nan |
| English | nan | 8 |
| Spanish | nan | 3 |
| French | nan | 2 |
| Italian | nan | 1 |
| Standard Chinese | nan | 1 |
| Tswana | nan | 1 |
| Southern Sotho | nan | 1 |
| Portuguese | nan | 1 |
| Māori | nan | 1 |
| Xhosa | nan | 1 |
| Zulu | nan | 1 |
| Tsonga | nan | 1 |
| German | nan | 1 |
| Korean | nan | 1 |
| Northern Sotho | nan | 1 |
| Venda | nan | 1 |
| Southern Ndebele | nan | 1 |
| Hindi | nan | 1 |
| Swazi | nan | 1 |
| Japanese | nan | 1 |
| Indonesian | nan | 1 |
| Arabic | nan | 1 |
| Afrikaans | nan | 1 |
Remember this is a deliberately reduced test dataset, so we don’t expect to see all countries there!
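The idea behind facet counts can be sketched in a few lines of Python. This is only an illustration of the concept, not how LinkML-Store computes facets; the `facet_counts` function is a hypothetical helper, and the data is the ten-row excerpt printed by the head command earlier (so the counts differ from the full 20-row collection).

```python
from collections import Counter
from typing import Any, Dict, List

# (code, continent, languages) for the ten rows shown by `head` earlier.
ROWS = [
    ("US", "North America", ["English"]),
    ("CA", "North America", ["English", "French"]),
    ("MX", "North America", ["Spanish"]),
    ("BR", "South America", ["Portuguese"]),
    ("AR", "South America", ["Spanish"]),
    ("GB", "Europe", ["English"]),
    ("FR", "Europe", ["French"]),
    ("DE", "Europe", ["German"]),
    ("IT", "Europe", ["Italian"]),
    ("ES", "Europe", ["Spanish"]),
]
RECORDS: List[Dict[str, Any]] = [
    {"code": code, "continent": continent, "languages": languages}
    for code, continent, languages in ROWS
]


def facet_counts(objects: List[Dict[str, Any]], slots: List[str]) -> Dict[str, Dict[Any, int]]:
    """Count objects per distinct value of each slot (hypothetical helper)."""
    result: Dict[str, Dict[Any, int]] = {}
    for slot in slots:
        counter: Counter = Counter()
        for obj in objects:
            value = obj.get(slot)
            if value is None:
                continue
            # A multivalued slot contributes one count per element.
            values = value if isinstance(value, list) else [value]
            counter.update(values)
        result[slot] = dict(counter.most_common())
    return result


facets = facet_counts(RECORDS, ["continent", "languages"])
print(facets)
```

Note how a multivalued slot such as languages is unpacked, so a country with two languages contributes to two facet values.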
LinkML-Store is intended to allow for a flexible range of search strategies. Some of these may come from the underlying data store (for example, Solr and Elasticsearch are backed by Lucene indexing), while others may be layered on top independently of the store.
A key search mechanism that is supported is text embedding via Large Language Models (LLMs). Note these are not enabled by default.
Currently the default mechanism (which works regardless of the underlying store) is a highly naive trigram-based vector embedding. This requires no external model. It is intended primarily for demonstration purposes, and should be swapped out for something else.
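As a rough intuition for what a trigram-based embedding does, the following Python sketch turns text into a bag of character trigrams and compares two texts by cosine similarity. This is an assumption-laden illustration, not the indexer that LinkML-Store ships: the function names and the exact tokenization are made up for the example.

```python
import math
from collections import Counter
from typing import List


def trigrams(text: str) -> List[str]:
    """Split lower-cased text into overlapping three-character windows."""
    text = text.lower()
    return [text[i:i + 3] for i in range(len(text) - 2)]


def embed(text: str) -> Counter:
    """Represent text as a sparse vector of trigram counts."""
    return Counter(trigrams(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


query = embed("english and french")
canada_score = cosine(query, embed("Canada English French"))
argentina_score = cosine(query, embed("Argentina Spanish"))
print(canada_score > argentina_score)
```

Because the comparison only looks at shared character sequences, it has no notion of meaning: "Spanish" still scores slightly against "English" because both contain the trigram "ish".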
Indexing a collection
First we will explore the index command:
linkml-store index --help
Usage: linkml-store index [OPTIONS]
Create an index over a collection.
By default a simple trigram index is used.
-t, --index-type TEXT Type of index to create. Values: simple, llm
[default: simple]
-E, --cached-embeddings-database TEXT
Path to the database where embeddings are
-T, --text-template TEXT Template for text embeddings
--help Show this message and exit.
Next we’ll make a (default) index
linkml-store -d duckdb:///tmp/countries.db -c countries index
Searching a collection using an index
Let’s explore the search command:
linkml-store search --help
Usage: linkml-store search [OPTIONS] SEARCH_TERM
Search objects in the specified collection.
-w, --where TEXT WHERE clause for the search
-l, --limit INTEGER Maximum number of search results
-O, --output-type [json|jsonl|yaml|yamll|tsv|csv|python|parquet|formatted|table|duckdb|postgres|mongodb]
Output format
-o, --output PATH Output file path
--auto-index / --no-auto-index Automatically index the collection
[default: no-auto-index]
-t, --index-type TEXT Type of index to create. Values: simple, llm
[default: simple]
--help Show this message and exit.
Now we’ll search for countries in the North where both English and French are spoken. We’ll pose this as a natural language query, but the default index is only picking up on trigram tokens in the strings.
linkml-store -d duckdb:///tmp/countries.db -c countries search "countries in the North where both english and french spoken" --limit 5 -O csv
0.15670402880167877,Canada,CA,Ottawa,North America,"['English', 'French']"
0.14806601565681218,South Africa,ZA,Pretoria,Africa,"['Zulu', 'Xhosa', 'Afrikaans', 'English', 'Northern Sotho', 'Tswana', 'Southern Sotho', 'Tsonga', 'Swazi', 'Venda', 'Southern Ndebele']"
0.13749236361227862,United States,US,"Washington, D.C.",North America,['English']
0.09860812114511587,Argentina,AR,Buenos Aires,South America,['Spanish']
0.09765536333140983,Mexico,MX,Mexico City,North America,['Spanish']
By default, all fields in the object are indexed. Canada comes out top because the strings for English and French are present (or rather, trigrams from those words). But remember the default method is just for illustration!
Indexing using an LLM (OPTIONAL)
Note that for this to work, you need to have installed this package with the llm extra, like this:
pip install "linkml-store[llm]"
Or if you have this repo checked out and are using Poetry:
poetry install --all-extras
You will also need an OpenAI account.
If this is too much, you can just skip this section!
linkml-store -d duckdb:///tmp/countries.db -c countries index -t llm -E tmp/llm_countries_cache.db
linkml-store -d duckdb:///tmp/countries.db -c countries search -t llm "countries in the North where both english and french spoken" --limit 5 -O csv
0.7927589434263863,Canada,CA,Ottawa,North America,"['English', 'French']"
0.7546847140878102,United States,US,"Washington, D.C.",North America,['English']
0.741656789495497,United Kingdom,GB,London,Europe,['English']
The results are not particularly meaningful, but the idea is that this could be used in a RAG-style system.
Note in the above we did not explicitly specify a schema; instead it is induced.
We can use the schema command to see the induced schema in LinkML YAML.
linkml-store -d duckdb:///tmp/countries.db schema
name: test-schema
id: http://example.org/test-schema
- linkml:types
prefix_prefix: linkml
prefix_reference: https://w3id.org/linkml/
prefix_prefix: test_schema
prefix_reference: http://example.org/test-schema/
default_prefix: test_schema
default_range: string
name: countries
name: name
range: string
required: false
multivalued: false
name: code
range: string
required: false
multivalued: false
name: capital
range: string
required: false
multivalued: false
name: continent
range: string
required: false
multivalued: false
name: languages
range: string
required: false
multivalued: true
name: internal__index__countries__llm
name: name
range: string
required: false
multivalued: false
name: code
range: string
required: false
multivalued: false
name: capital
range: string
required: false
multivalued: false
name: continent
range: string
required: false
multivalued: false
name: languages
range: string
required: false
multivalued: true
name: __index__
range: string
required: false
multivalued: true
name: internal__index__countries__simple
name: name
range: string
required: false
multivalued: false
name: code
range: string
required: false
multivalued: false
name: capital
range: string
required: false
multivalued: false
name: continent
range: string
required: false
multivalued: false
name: languages
range: string
required: false
multivalued: true
name: __index__
range: string
required: false
multivalued: true
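To give a feel for what schema induction involves, here is a small Python sketch that derives a range and a multivalued flag for each slot from example objects. The real induction logic in LinkML-Store may differ considerably; `induce_slots` and `range_for` are hypothetical names used only for this illustration.

```python
from typing import Any, Dict, List

# One record copied from the countries.jsonl sample shown earlier.
SAMPLE: List[Dict[str, Any]] = [
    {"name": "Canada", "code": "CA", "capital": "Ottawa",
     "continent": "North America", "languages": ["English", "French"]},
]


def range_for(value: Any) -> str:
    """Map a Python value to a LinkML type name (simplified)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    return "string"


def induce_slots(objects: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Build a slot description for every key seen in the objects."""
    slots: Dict[str, Dict[str, Any]] = {}
    for obj in objects:
        for key, value in obj.items():
            multivalued = isinstance(value, list)
            # For a non-empty list, inspect its first element to pick a range.
            sample = value[0] if multivalued and value else value
            slot = slots.setdefault(
                key, {"range": "string", "required": False, "multivalued": False}
            )
            slot["multivalued"] = slot["multivalued"] or multivalued
            slot["range"] = range_for(sample)
    return slots


induced = induce_slots(SAMPLE)
for slot_name, definition in induced.items():
    print(slot_name, definition)
```

As in the induced schema above, languages comes out as a multivalued string slot, and nothing is marked as required because example data alone cannot tell us that.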
Configuration Files and Explicit Schemas
Rather than repeat --database and --collection each time, we can make use of YAML config files.
These can also package useful information and schemas.
First we will create a fresh copy of a directory with both configuration files and schemas:
cp -pr ../../tests/input/countries tmp
rm tmp/countries/countries.db
The configuration YAML is fairly minimal - it specifies a single database with a single collection, and a pointer to a schema
cat tmp/countries/countries.config.yaml
handle: "duckdb:///{base_dir}/countries.db"
schema_location: "{base_dir}/countries.linkml.yaml"
type: Country
The schema itself is fairly basic: a single class (whose name matches the type in the configuration), with some slots. Note that the slots have some constraints, e.g. regular expressions:
cat tmp/countries/countries.linkml.yaml
id: https://example.org/countries
name: countries
description: A schema for representing countries
license: https://creativecommons.org/publicdomain/zero/1.0/
countries: https://example.org/countries/
linkml: https://w3id.org/linkml/
default_prefix: countries
default_range: string
- linkml:types
description: A sovereign state
- name
- code
- capital
- continent
- languages
- origin
- destination
- method
description: The name of the country
required: true
# identifier: true
description: The ISO 3166-1 alpha-2 code of the country
required: true
pattern: '^[A-Z]{2}$'
identifier: true
description: The capital city of the country
required: true
description: The continent where the country is located
required: true
description: The main languages spoken in the country
range: Language
multivalued: true
range: Country
range: Country
range: MethodEnum
typeof: string
description: A human language
linkml-store -B tmp/countries -C tmp/countries/countries.config.yaml -d countries_db schema
name: countries
description: A schema for representing countries
id: https://example.org/countries
- linkml:types
license: https://creativecommons.org/publicdomain/zero/1.0/
prefix_prefix: countries
prefix_reference: https://example.org/countries/
prefix_prefix: linkml
prefix_reference: https://w3id.org/linkml/
default_prefix: countries
default_range: string
name: Language
description: A human language
typeof: string
name: MethodEnum
text: rail
text: air
text: road
name: name
description: The name of the country
required: true
name: code
description: The ISO 3166-1 alpha-2 code of the country
identifier: true
required: true
pattern: ^[A-Z]{2}$
name: capital
description: The capital city of the country
required: true
name: continent
description: The continent where the country is located
required: true
name: languages
description: The main languages spoken in the country
range: Language
multivalued: true
name: origin
range: Country
name: destination
range: Country
name: method
range: MethodEnum
name: Country
description: A sovereign state
- name
- code
- capital
- continent
- languages
name: Route
- origin
- destination
- method
source_file: tmp/countries/countries.linkml.yaml
linkml-store -B tmp/countries -C tmp/countries/countries.config.yaml -d countries_db -c countries insert tmp/countries/countries.jsonl
Inserted 20 objects from tmp/countries/countries.jsonl into collection 'countries'.
linkml-store -B tmp/countries -C tmp/countries/countries.config.yaml -d countries_db list-collections
alias: countries
type: Country
additional_properties: null
attributes: null
indexers: null
hidden: false
is_prepopulated: false
source: null
derived_from: null
page_size: null
graph_projection: null
validate_modifications: false
linkml-store --stacktrace -B tmp/countries -C tmp/countries/countries.config.yaml -d countries_db -c countries query -w "code: GB"
"name": "United Kingdom",
"code": "GB",
"capital": "London",
"continent": "Europe",
"languages": [
LinkML-Store is designed to allow for rich validation, regardless of the underlying database store used.
For validation to work, we need to specify an explicit schema, as we have done with the configuration above.
To test it, we will insert some fake data:
linkml-store -B tmp/countries -C tmp/countries/countries.config.yaml -d countries_db insert --object '{name: Foolandia, code: "X Y", languages: ["Fooish"]}'
Inserted 3 objects from {name: Foolandia, code: "X Y", languages: ["Fooish"]} into collection 'countries'.
Let’s check that the data is there:
linkml-store -B tmp/countries -C tmp/countries/countries.config.yaml -d countries_db query -w 'name: Foolandia'
"name": "Foolandia",
"code": "X Y",
"capital": null,
"continent": null,
"languages": [
Note that by default, validation is deferred. You can insert whatever you like, and then validate later.
Other configurations may be more suited to your project, including strict/prospective validation.
Next let’s examine the schema:
linkml-store -B tmp/countries -C tmp/countries/countries.config.yaml -d countries_db schema
name: countries
description: A schema for representing countries
id: https://example.org/countries
- linkml:types
license: https://creativecommons.org/publicdomain/zero/1.0/
prefix_prefix: countries
prefix_reference: https://example.org/countries/
prefix_prefix: linkml
prefix_reference: https://w3id.org/linkml/
default_prefix: countries
default_range: string
name: Language
description: A human language
typeof: string
name: MethodEnum
text: rail
text: air
text: road
name: name
description: The name of the country
required: true
name: code
description: The ISO 3166-1 alpha-2 code of the country
identifier: true
required: true
pattern: ^[A-Z]{2}$
name: capital
description: The capital city of the country
required: true
name: continent
description: The continent where the country is located
required: true
name: languages
description: The main languages spoken in the country
range: Language
multivalued: true
name: origin
range: Country
name: destination
range: Country
name: method
range: MethodEnum
name: Country
description: A sovereign state
- name
- code
- capital
- continent
- languages
name: Route
- origin
- destination
- method
source_file: tmp/countries/countries.linkml.yaml
Run validation
Next we will run the validate command:
linkml-store -B tmp/countries -C tmp/countries/countries.config.yaml -d countries_db validate -O table
| | type | severity | message | instance | instance_index | instantiates | context |
| 0 | jsonschema validation | ERROR | 'X Y' does not match '^[A-Z]{2}$' in /code | {'name': 'Foolandia', 'code': 'X Y', 'languages': ['Fooish']} | 0 | Country | [] |
| 1 | jsonschema validation | ERROR | 'capital' is a required property in / | {'name': 'Foolandia', 'code': 'X Y', 'languages': ['Fooish']} | 0 | Country | [] |
| 2 | jsonschema validation | ERROR | 'continent' is a required property in / | {'name': 'Foolandia', 'code': 'X Y', 'languages': ['Fooish']} | 0 | Country | [] |
Here we can see 3 issues with the data we added:
the code doesn’t match the regexp we provided (it has a space)
the capital is missing
the continent is missing
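The three checks reported above can be reproduced with a short Python sketch. In the tutorial the report comes from JSON Schema validation derived from the LinkML schema; the code below is only a hand-written approximation using the same pattern and required slots, and the `validate` function is a hypothetical helper.

```python
import re
from typing import Any, Dict, List

# Constraints taken from countries.linkml.yaml as shown earlier.
REQUIRED = ["name", "code", "capital", "continent"]
PATTERNS = {"code": r"^[A-Z]{2}$"}


def validate(obj: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems found in obj."""
    errors: List[str] = []
    for slot, pattern in PATTERNS.items():
        value = obj.get(slot)
        if value is not None and not re.search(pattern, str(value)):
            errors.append(f"{value!r} does not match {pattern!r} in /{slot}")
    for slot in REQUIRED:
        if obj.get(slot) is None:
            errors.append(f"{slot!r} is a required property")
    return errors


foolandia = {"name": "Foolandia", "code": "X Y", "languages": ["Fooish"]}
problems = validate(foolandia)
for problem in problems:
    print(problem)
```

Deferred validation means a function like this runs over data that is already stored, rather than rejecting the insert up front.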
LinkML-Store implements the “CRUDSI” pattern: in addition to Create, Read, Update, and Delete, it supports Search and Inference.
Inference is a procedure for filling in missing attribute values, or for correcting or repairing existing attribute values.
Different inference strategies include:
procedural or rule-based inference
projection or transformation of data
statistical inference or machine learning (ML), for example by inferring decision trees or regression models
inference using generative AI and Large Language Models (LLMs)
We will demonstrate the use of LLM inference, via the RAGInferenceEngine. This works by fetching the most relevant rows from the collection at the time of inference (based on supplied input), presenting these as example input-output pairs to the LLM, and then asking the LLM to complete the supplied input.
Our countries collection is (intentionally) incomplete. Let’s fill in some missing rows:
linkml-store -d duckdb:///tmp/countries.db -c countries infer -t rag -q 'name: Uruguay'
capital: Montevideo
code: UY
continent: South America
- Spanish
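The retrieval and prompting steps described above can be sketched in plain Python. This is a simplified illustration under several assumptions: it is not the RAGInferenceEngine code, the prompt wording is invented, the similarity measure is a basic trigram overlap, and no LLM is actually called. The names `build_prompt`, `similarity`, and `trigram_set` are hypothetical.

```python
import json
from typing import Any, Dict, List, Set

# Three records copied from the countries.jsonl sample shown earlier.
EXAMPLES: List[Dict[str, Any]] = [
    {"name": "Argentina", "code": "AR", "capital": "Buenos Aires",
     "continent": "South America", "languages": ["Spanish"]},
    {"name": "Canada", "code": "CA", "capital": "Ottawa",
     "continent": "North America", "languages": ["English", "French"]},
    {"name": "Germany", "code": "DE", "capital": "Berlin",
     "continent": "Europe", "languages": ["German"]},
]


def trigram_set(text: str) -> Set[str]:
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of character trigrams (stand-in for a real embedding)."""
    ta, tb = trigram_set(a), trigram_set(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def build_prompt(query: Dict[str, Any], examples: List[Dict[str, Any]],
                 input_slots: List[str], k: int = 2) -> str:
    """Pick the k most similar examples and lay them out as input-output pairs."""
    query_text = json.dumps(query, sort_keys=True)
    ranked = sorted(
        examples,
        key=lambda ex: similarity(query_text, json.dumps(ex, sort_keys=True)),
        reverse=True,
    )
    lines = ["Complete the output object for the final input."]
    for ex in ranked[:k]:
        ex_in = {s: ex[s] for s in input_slots if s in ex}
        ex_out = {s: v for s, v in ex.items() if s not in input_slots}
        lines.append(f"Input: {json.dumps(ex_in)}")
        lines.append(f"Output: {json.dumps(ex_out)}")
    # The query goes last, with the output left for the model to complete.
    lines.append(f"Input: {json.dumps(query)}")
    lines.append("Output:")
    return "\n".join(lines)


prompt = build_prompt({"name": "Uruguay"}, EXAMPLES, input_slots=["name"])
print(prompt)
```

The resulting text would then be sent to whichever model is configured, and the model's completion parsed back into an object.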
llm install llm-claude-3
linkml-store -d duckdb:///tmp/countries.db -c countries infer -t rag:llm_config.model_name=claude-3-opus -q 'name: Uruguay'
capital: Montevideo
code: UY
continent: South America
- Spanish
We can also restrict the predictions to a specific attribute:
linkml-store -d duckdb:///tmp/countries.db -c countries infer -t rag -q 'name: Uruguay' -T continent
continent: South America
Note that LLMs are particularly suited to this kind of inference, where we supply an out-of-distribution input (one not present in our existing collection); here we are relying on pre-trained knowledge in the model.
This is not expected to work with a traditional ML model - in this case it will complain that it has no data on the provided feature column:
linkml-store -d duckdb:///tmp/countries.db -c countries infer -t sklearn -T continent -q 'name: Uruguay' || echo "Failed as expected"
KeyError: 'Uruguay'
During handling of the above exception, another exception occurred:
ValueError: y contains previously unseen labels: 'Uruguay'
Failed as expected
Inference using statistical models
A more appropriate dataset for a traditional ML model would be the Iris dataset. Let’s first explore it:
linkml-store -i ../../tests/input/iris.jsonl describe
count unique top freq mean std min 25% 50% 75% max
petal_length 100.0 NaN NaN NaN 2.861 1.449549 1.0 1.5 2.45 4.325 5.1
petal_width 100.0 NaN NaN NaN 0.786 0.565153 0.1 0.2 0.8 1.3 1.8
sepal_length 100.0 NaN NaN NaN 5.471 0.641698 4.3 5.0 5.4 5.9 7.0
sepal_width 100.0 NaN NaN NaN 3.099 0.478739 2.0 2.8 3.05 3.4 4.4
species 100 2 setosa 50 NaN NaN NaN NaN NaN NaN NaN
linkml-store -i ../../tests/input/iris.jsonl infer -t sklearn -q '{"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}'
species: setosa
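The internals of the sklearn inference type are not shown in this tutorial. As a rough stand-in for the idea of learning from numeric feature columns, here is a nearest-centroid classifier in plain Python, which is a different and much simpler technique than a decision tree or regression model. The four training rows are illustrative values in the style of the Iris dataset, not the contents of iris.jsonl, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List

FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# Illustrative rows only: (sepal_length, sepal_width, petal_length, petal_width, species)
ROWS = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (7.0, 3.2, 4.7, 1.4, "versicolor"),
    (6.4, 3.2, 4.5, 1.5, "versicolor"),
]


def fit_centroids(rows) -> Dict[str, List[float]]:
    """Average the feature vectors of each species."""
    grouped = defaultdict(list)
    for *features, label in rows:
        grouped[label].append(features)
    return {
        label: [sum(column) / len(column) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids: Dict[str, List[float]], query: Dict[str, float]) -> str:
    """Return the species whose centroid is closest to the query point."""
    point = [query[name] for name in FEATURES]
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


centroids = fit_centroids(ROWS)
query = {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}
predicted = predict(centroids, query)
print(predicted)
```

Unlike the LLM-based approach, a model like this can only predict from the numeric columns it was trained on, which is why it suits the Iris data and not the countries collection.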