API

class Client(handle=None, metadata=None)[source]

Bases: object

A client is the top-level object for interacting with databases.

  • A client has access to one or more Database objects.

  • Each database consists of a number of Collection objects.

Creating a client

>>> client = Client()

Attaching a database

>>> db = client.attach_database("duckdb", alias="test")

Note that a handle is normally specified by a locator such as duckdb:///<PATH>. For convenience, an in-memory DuckDB database can be requested with the bare name duckdb instead of a full locator.

We can check the actual handle:

>>> db.handle
'duckdb:///:memory:'
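The shorthand expansion shown above can be pictured with a minimal sketch. The helper `expand_handle` is hypothetical and not part of linkml-store. It assumes only the behavior documented here, namely that the bare name duckdb resolves to duckdb:///:memory:.

```python
def expand_handle(handle: str) -> str:
    """Expand a bare database-type name into a full locator (hypothetical helper)."""
    # Assumption: only the "duckdb" shorthand documented above is covered.
    shorthands = {"duckdb": "duckdb:///:memory:"}
    return shorthands.get(handle, handle)


print(expand_handle("duckdb"))                 # shorthand is expanded
print(expand_handle("duckdb:///tmp/test.db"))  # full locators pass through unchanged
```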

Creating a new collection

>>> collection = db.create_collection("Person")
>>> objs = [{"id": "P1", "name": "John", "age_in_years": 30}, {"id": "P2", "name": "Alice", "age_in_years": 25}]
>>> collection.insert(objs)
>>> qr = collection.find()
>>> len(qr.rows)
2
>>> qr.rows[0]["id"]
'P1'
>>> qr.rows[1]["name"]
'Alice'
>>> qr = collection.find({"name": "John"})
>>> len(qr.rows)
1
>>> qr.rows[0]["name"]
'John'
__init__(handle=None, metadata=None)[source]

Initialize a client.

Parameters:
  • handle (Optional[str])

  • metadata (Optional[ClientConfig])

metadata: Optional[ClientConfig] = None
property handle: str | None
property base_dir: str | None

Get the base directory for the client.

Wraps metadata.base_dir.

Returns:

from_config(config, base_dir=None, auto_attach=False, **kwargs)[source]

Create a client from a configuration.

Examples

>>> from linkml_store.api.config import ClientConfig
>>> client = Client().from_config(ClientConfig(databases={"test": {"handle": "duckdb:///:memory:"}}))
>>> len(client.databases)
0
>>> client = Client().from_config(ClientConfig(databases={"test": {"handle": "duckdb:///:memory:"}}),
...                                auto_attach=True)
>>> len(client.databases)
1
>>> "test" in client.databases
True
>>> client.databases["test"].handle
'duckdb:///:memory:'
Parameters:
  • config (Union[ClientConfig, dict, str, Path])

  • base_dir

  • auto_attach

  • kwargs

Returns:

attach_database(handle, alias=None, schema_view=None, recreate_if_exists=False, **kwargs)[source]

Associate a database with a handle.

Examples

>>> client = Client()
>>> db = client.attach_database("duckdb", alias="memory")
>>> "memory" in client.databases
True
>>> db = client.attach_database("duckdb:///tmp/another.db", alias="disk")
>>> len(client.databases)
2
>>> "disk" in client.databases
True
Parameters:
  • handle (str) – handle for the database, e.g. duckdb:///foo.db

  • alias (Optional[str]) – alias for the database, e.g. foo

  • schema_view (Optional[SchemaView]) – schema view to associate with the database

  • kwargs

Return type:

Database

Returns:

get_database(name=None, create_if_not_exists=True, **kwargs)[source]

Get a named database.

Examples

>>> client = Client()
>>> db = client.attach_database("duckdb:///test.db", alias="test")
>>> retrieved_db = client.get_database("test")
>>> db == retrieved_db
True
Parameters:
  • name (Optional[str]) – if None, there must be a single database attached

  • create_if_not_exists

  • kwargs

Return type:

Database

Returns:

property databases: Dict[str, Database]

Return all attached databases

Examples

>>> client = Client()
>>> _ = client.attach_database("duckdb", alias="test1")
>>> _ = client.attach_database("duckdb", alias="test2")
>>> len(client.databases)
2
>>> "test1" in client.databases
True
>>> "test2" in client.databases
True
>>> client.databases["test1"].handle
'duckdb:///:memory:'
>>> client.databases["test2"].handle
'duckdb:///:memory:'
Returns:

drop_database(name, missing_ok=False, **kwargs)[source]

Drop a database.

Example (in-memory):

>>> client = Client()
>>> db1 = client.attach_database("duckdb", alias="test1")
>>> db2 = client.attach_database("duckdb", alias="test2")
>>> len(client.databases)
2
>>> client.drop_database("test1")
>>> len(client.databases)
1

Databases that persist on disk:

>>> client = Client()
>>> from pathlib import Path
>>> path = Path("tmp/test.db")
>>> path.parent.mkdir(parents=True, exist_ok=True)
>>> db = client.attach_database(f"duckdb:///{path}", alias="test")
>>> len(client.databases)
1
>>> db.store({"persons": [{"id": "P1", "name": "John"}]})
>>> db.commit()
>>> Path("tmp/test.db").exists()
True
>>> client.drop_database("test")
>>> len(client.databases)
0
>>> Path("tmp/test.db").exists()
False

Dropping a non-existent database:

>>> client = Client()
>>> client.drop_database("duckdb:///tmp/made-up1", missing_ok=True)
>>> client.drop_database("duckdb:///tmp/made-up2", missing_ok=False)
Traceback (most recent call last):
...
ValueError: Database duckdb:///tmp/made-up2 not found
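The missing_ok behavior in the last example follows a common pattern, sketched below with a plain dictionary standing in for the client's set of attached databases. The function `drop_from_registry` is hypothetical. Only the error message format is taken from the traceback above.

```python
from typing import Any, Dict


def drop_from_registry(registry: Dict[str, Any], name: str, missing_ok: bool = False) -> None:
    """Remove `name` from `registry`, tolerating absence only if `missing_ok` is set."""
    if name not in registry:
        if missing_ok:
            return  # silently ignore unknown names
        raise ValueError(f"Database {name} not found")
    del registry[name]


registry = {"test1": object(), "test2": object()}
drop_from_registry(registry, "test1")
drop_from_registry(registry, "made-up", missing_ok=True)
print(sorted(registry))
```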
Parameters:
  • name (str)

  • missing_ok

Returns:

drop_all_databases(**kwargs)[source]

Drop all databases.

Example (in-memory):

>>> client = Client()
>>> db1 = client.attach_database("duckdb", alias="test1")
>>> assert "test1" in client.databases
>>> db2 = client.attach_database("duckdb", alias="test2")
>>> assert "test2" in client.databases
>>> client.drop_all_databases()
>>> len(client.databases)
0
Parameters:

missing_ok

Returns:

class Database(handle=None, metadata=None, **kwargs)[source]

Bases: ABC, Generic[CollectionType]

A Database provides access to named collections of data.

A database object is owned by a Client. The database object uses a handle to know what kind of external database system to connect to (e.g. duckdb, mongodb). The handle is a string of the form <DatabaseType>:<LocalLocator>.

The database object may also have an alias that is mapped to the handle.
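As an illustration of the <DatabaseType>:<LocalLocator> structure, the sketch below splits a handle at its first colon. The function `split_handle` is a hypothetical helper. How linkml-store itself parses handles is not stated here, so treat this as an assumption.

```python
from typing import Tuple


def split_handle(handle: str) -> Tuple[str, str]:
    """Split '<DatabaseType>:<LocalLocator>' at the first colon."""
    database_type, sep, locator = handle.partition(":")
    if not sep:
        raise ValueError(f"Handle {handle!r} has no ':' separator")
    return database_type, locator


for handle in ("duckdb:///:memory:", "duckdb:///tmp/test.db", "mongodb://localhost:27017/"):
    print(split_handle(handle))
```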

Attaching a database

>>> from linkml_store.api.client import Client
>>> client = Client()
>>> db = client.attach_database("duckdb:///:memory:", alias="test")

We can check the value of the handle:

>>> db.handle
'duckdb:///:memory:'

The alias can be used to retrieve the database object from the client

>>> assert db == client.get_database("test")

Creating a collection

>>> collection = db.create_collection("Person")
>>> len(db.list_collections())
1
>>> db.get_collection("Person") == collection
True
>>> objs = [{"id": "P1", "name": "John", "age_in_years": 30}, {"id": "P2", "name": "Alice", "age_in_years": 25}]
>>> collection.insert(objs)
>>> qr = collection.find()
>>> len(qr.rows)
2
>>> qr.rows[0]["id"]
'P1'
>>> qr.rows[1]["name"]
'Alice'
>>> qr = collection.find({"name": "John"})
>>> len(qr.rows)
1
>>> qr.rows[0]["name"]
'John'
parent: Optional[Client] = None
collection_class: ClassVar[Optional[Type[Collection]]] = None
listeners: Optional[List[Callable[[Collection, List[PatchDict]], None]]] = None
__init__(handle=None, metadata=None, **kwargs)[source]
metadata: Optional[DatabaseConfig] = None
from_config(db_config, **kwargs)[source]

Initialize a database from a configuration.

TODO: DEPRECATE

Parameters:
  • db_config (DatabaseConfig) – database configuration

  • kwargs – additional arguments

property recreate_if_exists: bool

Return whether to recreate the database if it already exists.

Returns:

property handle: str

Return the database handle.

Examples:

  • duckdb:///:memory:

  • duckdb:///tmp/test.db

  • mongodb://localhost:27017/

Returns:

property alias
store(obj, **kwargs)[source]

Store an object in the database.

The object is assumed to be a Dictionary of Collections.

>>> from linkml_store.api.client import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> db.store({"persons": [{"id": "P1", "name": "John", "age_in_years": 30}]})
>>> collection = db.get_collection("persons")
>>> qr = collection.find()
>>> qr.num_rows
1
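The "dictionary of collections" shape can be sketched without a database: each top-level key names a collection and each value holds that collection's objects. The function `store_sketch` and its in-memory `collections` dictionary are hypothetical stand-ins. Wrapping a non-list value in a list is an assumption made for this illustration.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def store_sketch(collections: Dict[str, List[Row]], obj: Dict[str, Any]) -> None:
    """Route each top-level key of `obj` to the collection of the same name."""
    for name, value in obj.items():
        # Assumption: a single object is treated like a one-element list.
        rows = value if isinstance(value, list) else [value]
        collections.setdefault(name, []).extend(rows)


collections: Dict[str, List[Row]] = {}
store_sketch(collections, {"persons": [{"id": "P1", "name": "John", "age_in_years": 30}]})
print(len(collections["persons"]))
```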
Parameters:
  • obj (Dict[str, Any]) – object to store

  • kwargs – additional arguments

commit(**kwargs)[source]

Commit pending changes to the database.

Parameters:

kwargs

Returns:

close(**kwargs)[source]

Close the database.

Parameters:

kwargs

Returns:

create_collection(name, alias=None, metadata=None, recreate_if_exists=False, **kwargs)[source]

Create a new collection in the current database.

The collection must have a Type, and may have an Alias.

Examples:

>>> from linkml_store.api.client import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> collection = db.create_collection("Person", alias="persons")
>>> collection.alias
'persons'
>>> collection.target_class_name
'Person'

If alias is not provided, it defaults to the name of the type.

>>> collection = db.create_collection("Organization")
>>> collection.alias
'Organization'
Parameters:
  • name (str) – name of the collection

  • alias (Optional[str]) – alias for the collection

  • metadata (Optional[CollectionConfig]) – metadata for the collection

  • recreate_if_exists – recreate the collection if it already exists

  • kwargs – additional arguments

Return type:

Collection

list_collections(include_internal=False)[source]

List all collections.

Examples

>>> from linkml_store.api.client import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> c1 = db.create_collection("Person")
>>> c2 = db.create_collection("Product")
>>> collections = db.list_collections()
>>> len(collections)
2
>>> [c.target_class_name for c in collections]
['Person', 'Product']
Parameters:

include_internal – include internal collections

Return type:

Sequence[Collection]

Returns:

list of collections

list_collection_names(**kwargs)[source]

List all collection names.

Return type:

Sequence[str]

Examples

>>> from linkml_store.api.client import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> c1 = db.create_collection("Person")
>>> c2 = db.create_collection("Product")
>>> collection_names = db.list_collection_names()
>>> len(collection_names)
2
>>> collection_names
['Person', 'Product']
get_collection(name, type=None, create_if_not_exists=True, **kwargs)[source]

Get a named collection.

Return type:

Collection

Examples

>>> from linkml_store.api.client import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> collection = db.create_collection("Person")
>>> db.get_collection("Person") == collection
True
>>> db.get_collection("NonExistent", create_if_not_exists=False)
Traceback (most recent call last):
    ...
KeyError: 'Collection NonExistent does not exist'
Parameters:
  • name (str) – name of the collection

  • type (Optional[str]) – target class name

  • create_if_not_exists – create the collection if it does not exist

init_collections()[source]

Initialize collections.

TODO: Not typically called directly; consider making this private.

query(query, **kwargs)[source]

Run a query against the database.

Examples

>>> from linkml_store.api.client import Client
>>> from linkml_store.api.queries import Query
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> collection = db.create_collection("Person")
>>> collection.insert([{"id": "P1", "name": "John"}, {"id": "P2", "name": "Alice"}])
>>> query = Query(from_table="Person", where_clause={"name": "John"})
>>> result = db.query(query)
>>> len(result.rows)
1
>>> result.rows[0]["id"]
'P1'
Parameters:
  • query (Query)

  • kwargs

Return type:

QueryResult

Returns:

property schema_view: SchemaView

Return a schema view for the database.

If no explicit schema is provided, one is induced from the data.

Induced schema example:

>>> from linkml_store.api.client import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> collection = db.create_collection("Person", alias="persons")
>>> collection.insert([{"id": "P1", "name": "John", "age_in_years": 25}])
>>> schema_view = db.schema_view
>>> cd = schema_view.get_class("Person")
>>> cd.attributes["id"].range
'string'
>>> cd.attributes["age_in_years"].range
'integer'

We can reuse the same class:

>>> collection2 = db.create_collection("Person", alias="other_persons")
>>> collection2.class_definition().attributes["age_in_years"].range
'integer'
set_schema_view(schema_view)[source]

Set the schema view for the database.

>>> from linkml_store.api.client import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> from linkml_runtime.utils.schemaview import SchemaView
>>> sv = SchemaView("tests/input/countries/countries.linkml.yaml")
>>> db.set_schema_view(sv)
>>> cd = db.schema_view.schema.classes["Country"]
>>> sorted(cd.slots)
['capital', 'code', 'continent', 'languages', 'name']
>>> induced_slots = {s.name: s for s in sv.class_induced_slots("Country")}
>>> sorted(induced_slots.keys())
['capital', 'code', 'continent', 'languages', 'name']
>>> induced_slots["code"].identifier
True

Creating a new collection will align with the schema view:

>>> collection = db.create_collection("Country", "all_countries")
>>> sorted(collection.class_definition().slots)
['capital', 'code', 'continent', 'languages', 'name']
Parameters:

schema_view (Union[str, Path, SchemaView]) – can be either a path to the schema, or a SchemaView object

Returns:

load_schema_view(path)[source]

Load a schema view from a file.

>>> from linkml_store.api.client import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> db.load_schema_view("tests/input/countries/countries.linkml.yaml")
>>> sv = db.schema_view
>>> cd = sv.schema.classes["Country"]
>>> sorted(cd.slots)
['capital', 'code', 'continent', 'languages', 'name']
>>> induced_slots = {s.name: s for s in sv.class_induced_slots("Country")}
>>> sorted(induced_slots.keys())
['capital', 'code', 'continent', 'languages', 'name']
>>> induced_slots["code"].identifier
True

Creating a new collection will align with the schema view:

>>> collection = db.create_collection("Country", "all_countries")
>>> sorted(collection.class_definition().slots)
['capital', 'code', 'continent', 'languages', 'name']
Parameters:

path (Union[str, Path])

Returns:

induce_schema_view()[source]

Induce a schema view from a schema definition.

>>> from linkml_store.api.client import Client
>>> from linkml_store.api.queries import Query
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> collection = db.create_collection("Person")
>>> collection.insert([{"id": "P1", "name": "John", "age_in_years": 25},
...                 {"id": "P2", "name": "Alice", "age_in_years": 25}])
>>> schema_view = db.induce_schema_view()
>>> cd = schema_view.get_class("Person")
>>> cd.attributes["id"].range
'string'
>>> cd.attributes["age_in_years"].range
'integer'
Return type:

SchemaView

Returns:

A schema view

iter_validate_database(**kwargs)[source]

Validate the contents of the database.

As an example, let’s create a database with a predefined schema from the countries.linkml.yaml file:

>>> from linkml_store.api.client import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> db.load_schema_view("tests/input/countries/countries.linkml.yaml")

Let’s introspect the schema to see what slots are applicable for the class “Country”:

>>> sv = db.schema_view
>>> for slot in sv.class_induced_slots("Country"):
...     print(slot.name, slot.range, slot.required)
name string True
code string True
capital string True
continent string True
languages Language None

Next we’ll create a collection, binding it to the target class “Country”, and insert valid data:

>>> collection = db.create_collection("Country", "all_countries")
>>> obj = {"code": "US", "name": "United States", "continent": "North America", "capital": "Washington, D.C."}
>>> collection.insert([obj])
>>> list(db.iter_validate_database())
[]

Now let’s insert some invalid data (missing required fields):

>>> collection.insert([{"code": "FR", "name": "France"}])
>>> for r in db.iter_validate_database():
...    print(r.message[0:32])
'capital' is a required property
'continent' is a required proper
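The required-field check in the example above can be approximated in a few lines. This is a sketch only: `iter_missing_required` is a hypothetical function, the list of required slots is copied from the introspection output above, and real validation covers far more than missing fields.

```python
from typing import Any, Dict, Iterable, Iterator, List


def iter_missing_required(objs: Iterable[Dict[str, Any]], required: List[str]) -> Iterator[str]:
    """Yield one message per required slot that is absent or None."""
    for obj in objs:
        for slot in required:
            if obj.get(slot) is None:
                yield f"'{slot}' is a required property"


required_slots = ["name", "code", "capital", "continent"]
countries = [
    {"code": "US", "name": "United States", "continent": "North America", "capital": "Washington, D.C."},
    {"code": "FR", "name": "France"},
]
for message in iter_missing_required(countries, required_slots):
    print(message)
```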
Parameters:

kwargs

Return type:

Iterator[ValidationResult]

Returns:

iterator over validation results

drop(**kwargs)[source]

Drop the database and all collections.

>>> from linkml_store.api.client import Client
>>> client = Client()
>>> from pathlib import Path
>>> path = Path("/tmp/test.db")
>>> path.parent.mkdir(exist_ok=True, parents=True)
>>> db = client.attach_database(f"duckdb:///{path}")
>>> db.store({"persons": [{"id": "P1", "name": "John", "age_in_years": 30}]})
>>> coll = db.get_collection("persons")
>>> coll.find({}).num_rows
1
>>> db.drop()
>>> db = client.attach_database("duckdb:///tmp/test.db", alias="test")
>>> coll = db.get_collection("persons")
>>> coll.find({}).num_rows
0
Parameters:

kwargs – additional arguments

import_database(location, source_format=None, collection_name=None, **kwargs)[source]

Import a database from a file or location.

>>> from linkml_store.api.client import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> from linkml_store.utils.format_utils import Format
>>> db.import_database("tests/input/iris.csv", Format.CSV, collection_name="iris")
>>> db.list_collection_names()
['iris']
>>> collection = db.get_collection("iris")
>>> collection.find({}).num_rows
150
Parameters:
  • location (str) – location of the file

  • source_format (Union[str, Format, None]) – source format

  • collection_name (Optional[str]) – (Optional) name of the collection, for data that is flat

  • kwargs – additional arguments

export_database(location, target_format=None, **kwargs)[source]

Export a database to a file or location.

>>> from linkml_store.api.client import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> from linkml_store.utils.format_utils import Format
>>> db.import_database("tests/input/iris.csv", Format.CSV, collection_name="iris")
>>> db.export_database("/tmp/iris.yaml", Format.YAML)
Parameters:
  • location (str) – location of the file

  • target_format (Union[str, Format, None]) – target format

  • kwargs – additional arguments

broadcast(source, patches)[source]
class Collection(name, parent=None, metadata=None, **kwargs)[source]

Bases: Generic[DatabaseType]

A collection is an organized set of objects of the same or similar type.

  • For relational databases, a collection is typically a table

  • For document databases such as MongoDB, a collection is the native type

  • For a file system, a collection could be a single tabular file such as Parquet or CSV.

Collection objects are typically not created directly - instead they are generated from a parent Database object:

>>> from linkml_store import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> collection = db.create_collection("Person")
>>> objs = [{"id": "P1", "name": "John", "age_in_years": 30}, {"id": "P2", "name": "Alice", "age_in_years": 25}]
>>> collection.insert(objs)
default_index_name: ClassVar[str] = 'simple'
__init__(name, parent=None, metadata=None, **kwargs)[source]
parent: Optional[TypeVar(DatabaseType, bound= Database)] = None
metadata: Optional[CollectionConfig] = None
property hidden: bool

True if the collection is hidden.

An example of a hidden collection is a collection that indexes another collection

Returns:

True if the collection is hidden

property target_class_name

Return the name of the class that this collection represents

This MUST be a LinkML class name

>>> from linkml_store import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> collection = db.create_collection("Person", alias="persons")
>>> collection.target_class_name
'Person'
>>> collection = db.create_collection("Organization")
>>> collection.target_class_name
'Organization'
>>> collection.alias
'Organization'
Returns:

name of the class which members of this collection instantiate

property alias

Return the primary name/alias used for the collection.

This MAY be the name of the LinkML class, but it may be desirable to have an alias, for example “persons” which collects all instances of class Person.

>>> from linkml_store import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> collection = db.create_collection("Person", alias="persons")
>>> collection.alias
'persons'

If no explicit alias is provided, then the target class name is used:

>>> from linkml_store import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> collection = db.create_collection("Person")
>>> collection.alias
'Person'

The alias SHOULD be used for Table names in SQL.

For nested data, the alias SHOULD be used as the key, e.g.:

{ "persons": [ { "name": "Alice" }, { "name": "Bob" } ] }
Returns:

replace(objs, **kwargs)[source]

Replace entire collection with objects.

>>> from linkml_store import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> collection = db.create_collection("Person")
>>> objs = [{"id": "P1", "name": "John", "age_in_years": 30}, {"id": "P2", "name": "Alice", "age_in_years": 25}]
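>>> collection.insert(objs)

Replacing the contents leaves only the new objects:

>>> collection.replace([{"id": "P3", "name": "Carol", "age_in_years": 40}])
>>> collection.find({}).num_rows
1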
Parameters:
  • objs (Union[Dict[str, Any], BaseModel, Type, List[Union[Dict[str, Any], BaseModel, Type]]])

  • kwargs

Returns:

insert(objs, **kwargs)[source]

Add one or more objects to the collection.

>>> from linkml_store import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> collection = db.create_collection("Person")
>>> objs = [{"id": "P1", "name": "John", "age_in_years": 30}, {"id": "P2", "name": "Alice", "age_in_years": 25}]
>>> collection.insert(objs)
Parameters:
  • objs (Union[Dict[str, Any], BaseModel, Type, List[Union[Dict[str, Any], BaseModel, Type]]])

  • kwargs

Returns:

delete(objs, **kwargs)[source]

Delete one or more objects from the collection.

First let’s set up a collection:

>>> from linkml_store import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> collection = db.create_collection("Person")
>>> objs = [{"id": "P1", "name": "John", "age_in_years": 30}, {"id": "P2", "name": "Alice", "age_in_years": 25}]
>>> collection.insert(objs)
>>> collection.find({}).num_rows
2

Now let’s delete an object:

>>> collection.delete(objs[0])
>>> collection.find({}).num_rows
1

Deleting the same object again should have no effect:

>>> collection.delete(objs[0])
>>> collection.find({}).num_rows
1
Parameters:
  • objs (Union[Dict[str, Any], BaseModel, Type, List[Union[Dict[str, Any], BaseModel, Type]]])

  • kwargs

Return type:

Optional[int]

Returns:

delete_where(where=None, missing_ok=True, **kwargs)[source]

Delete objects that match a query.

First let’s set up a collection:

>>> from linkml_store import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> collection = db.create_collection("Person")
>>> objs = [{"id": "P1", "name": "John", "age_in_years": 30}, {"id": "P2", "name": "Alice", "age_in_years": 25}]
>>> collection.insert(objs)

Now let’s delete an object:

>>> collection.delete_where({"id": "P1"})
>>> collection.find({}).num_rows
1

Match everything:

>>> collection.delete_where({})
>>> collection.find({}).num_rows
0
Parameters:
  • where (Optional[Dict[str, Any]]) – where conditions

  • missing_ok – if True, do not raise an error if the collection does not exist

  • kwargs

Return type:

Optional[int]

Returns:

number of objects deleted (or -1 if unsupported)

update(objs, **kwargs)[source]

Update one or more objects in the collection.

Parameters:
  • objs (Union[Dict[str, Any], BaseModel, Type, List[Union[Dict[str, Any], BaseModel, Type]]])

  • kwargs

Returns:

query(query, **kwargs)[source]

Run a query against the collection.

First let’s load a collection:

>>> from linkml_store import Client
>>> from linkml_store.utils.format_utils import load_objects
>>> client = Client()
>>> db = client.attach_database("duckdb")
>>> collection = db.create_collection("Country")
>>> objs = load_objects("tests/input/countries/countries.jsonl")
>>> collection.insert(objs)

Now let’s run a query:

TODO

Parameters:
  • query (Query)

  • kwargs

Return type:

QueryResult

Returns:

query_facets(where=None, facet_columns=None, facet_limit=100, **kwargs)[source]

Run a query to get facet counts for one or more columns.

This function takes a database connection, a Query object, and a list of column names. It generates and executes a facet count query for each specified column and returns the results as a dictionary where the keys are the column names and the values are pandas DataFrames containing the facet counts.

The facet count query is generated by modifying the original query’s WHERE clause to exclude conditions directly related to the facet column. This allows for counting the occurrences of each unique value in the facet column while still applying the other filtering conditions.
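The exclusion rule described above can be sketched over plain Python dictionaries. The function `facet_counts` is a hypothetical illustration. It assumes that where conditions are simple equality tests and it does not generate SQL.

```python
from collections import Counter
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def facet_counts(
    rows: List[Row],
    where: Optional[Dict[str, Any]] = None,
    facet_columns: Optional[List[str]] = None,
    facet_limit: int = 100,
) -> Dict[str, List[Tuple[Any, int]]]:
    """Count values per facet column, ignoring conditions on that same column."""
    where = where or {}
    if facet_columns is None:
        facet_columns = sorted({key for row in rows for key in row})
    results: Dict[str, List[Tuple[Any, int]]] = {}
    for column in facet_columns:
        # Drop the condition on the facet column itself; keep all others.
        other = {k: v for k, v in where.items() if k != column}
        counter = Counter(
            row[column]
            for row in rows
            if column in row and all(row.get(k) == v for k, v in other.items())
        )
        results[column] = counter.most_common(facet_limit)
    return results


rows = [
    {"code": "FR", "continent": "Europe"},
    {"code": "DE", "continent": "Europe"},
    {"code": "JP", "continent": "Asia"},
]
print(facet_counts(rows, where={"continent": "Europe"}, facet_columns=["continent", "code"]))
```

Because the condition on continent is dropped when faceting on continent, that facet still reports every continent, while the code facet only counts European rows.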

Parameters:
  • con – A DuckDB database connection.

  • query – A Query object representing the base query.

  • facet_columns (Optional[List[str]]) – A list of column names to get facet counts for.

  • facet_limit

Return type:

Dict[str, List[Tuple[Any, int]]]

Returns:

A dictionary where keys are column names and values are lists of (value, count) tuples, one tuple per unique value in the respective column.

get(ids, **kwargs)[source]

Get one or more objects by ID.

Parameters:
  • ids (Optional[List[str]])

  • kwargs

Return type:

QueryResult

Returns:

get_one(id, **kwargs)[source]

Get one object by ID.

Parameters:
  • id (str)

  • kwargs

Return type:

Union[Dict[str, Any], BaseModel, Type, None]

Returns:

find(where=None, **kwargs)[source]

Find objects in the collection using a where query.

As an example, first load a collection:

>>> from linkml_store import Client
>>> from linkml_store.utils.format_utils import load_objects
>>> client = Client()
>>> db = client.attach_database("duckdb")
>>> collection = db.create_collection("Country")
>>> objs = load_objects("tests/input/countries/countries.jsonl")
>>> collection.insert(objs)

Now let’s find all objects:

>>> qr = collection.find({})
>>> qr.num_rows
20

We can do a more restrictive query:

>>> qr = collection.find({"code": "FR"})
>>> qr.num_rows
1
>>> qr.rows[0]["name"]
'France'
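The where argument in these examples behaves like a set of equality conditions, with an empty dictionary matching everything. A minimal sketch of that matching rule over a list of dictionaries follows. The function `find_sketch` is hypothetical, and any richer operators the real method may support are not modeled.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def find_sketch(rows: List[Row], where: Optional[Dict[str, Any]] = None) -> List[Row]:
    """Return rows whose fields equal every key/value pair in `where`."""
    where = where or {}
    return [row for row in rows if all(row.get(k) == v for k, v in where.items())]


countries = [
    {"code": "FR", "name": "France"},
    {"code": "DE", "name": "Germany"},
]
print(len(find_sketch(countries, {})))
print(find_sketch(countries, {"code": "FR"})[0]["name"])
```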
Parameters:
  • where (Optional[Any])

  • kwargs

Return type:

QueryResult

Returns:

find_iter(where=None, page_size=100, **kwargs)[source]

Find objects in the collection using a where query.
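The page_size parameter suggests that results are fetched in batches and yielded one object at a time. The sketch below shows that general pattern. Both `iter_pages` and the `fetch_page(offset, limit)` callback are hypothetical, and whether find_iter pages in exactly this way is an assumption.

```python
from typing import Any, Callable, Dict, Iterator, List

Row = Dict[str, Any]


def iter_pages(fetch_page: Callable[[int, int], List[Row]], page_size: int = 100) -> Iterator[Row]:
    """Yield rows one by one while fetching them `page_size` at a time."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            break  # a short (or empty) page means there is nothing left
        offset += page_size


data = [{"n": i} for i in range(250)]


def fetch_page(offset: int, limit: int) -> List[Row]:
    # Stand-in for a backend query with LIMIT/OFFSET semantics.
    return data[offset:offset + limit]


print(sum(1 for _ in iter_pages(fetch_page, page_size=100)))
```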

Parameters:
  • where (Optional[Any])

  • kwargs

Return type:

Iterator[Union[Dict[str, Any], BaseModel, Type]]

Returns:

search(query, where=None, index_name=None, limit=None, mmr_relevance_factor=None, **kwargs)[source]

Search the collection using a text-based index.

Example:

>>> from linkml_store import Client
>>> from linkml_store.utils.format_utils import load_objects
>>> client = Client()
>>> db = client.attach_database("duckdb")
>>> collection = db.create_collection("Country")
>>> objs = load_objects("tests/input/countries/countries.jsonl")
>>> collection.insert(objs)

Now let’s index, using the simple trigram-based index:

>>> from linkml_store.index import get_indexer
>>> index = get_indexer("simple")
>>> _ = collection.attach_indexer(index)

Now let’s search for objects matching “France”:

>>> qr = collection.search("France")
>>> score, top_obj = qr.ranked_rows[0]
>>> assert score > 0.1
>>> top_obj["code"]
'FR'
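To give a feel for trigram-based ranking, the sketch below scores texts by the cosine similarity of their character-trigram counts. The functions `trigrams`, `cosine` and `rank` are hypothetical. How the simple indexer actually builds and compares its vectors is not specified here, so the padding and scoring choices are assumptions.

```python
import math
from collections import Counter
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


def trigrams(text: str) -> Counter:
    """Count overlapping 3-character windows of the padded, lower-cased text."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank(query: str, objs: List[Row], text_of: Callable[[Row], str]) -> List[Tuple[float, Row]]:
    """Return (score, object) pairs, best match first."""
    q = trigrams(query)
    scored = [(cosine(q, trigrams(text_of(obj))), obj) for obj in objs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


countries = [{"code": "FR", "name": "France"}, {"code": "DE", "name": "Germany"}]
score, top_obj = rank("France", countries, lambda obj: obj["name"])[0]
print(top_obj["code"])
```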
Parameters:
  • query (str)

  • where (Optional[Any])

  • index_name (Optional[str])

  • limit (Optional[int])

  • kwargs

Return type:

QueryResult

Returns:

property is_internal: bool

Check if the collection is internal.

Internal collections are hidden by default. Examples of internal collections include shadow “index” collections

Returns:

exists()[source]

Check if the collection exists.

Return type:

Optional[bool]

Returns:

load_from_source(load_if_exists=False)[source]

Load objects from the source location.

Parameters:

load_if_exists

Returns:

size()[source]

Return the number of objects in the collection.

Return type:

int

Returns:

The number of objects in the collection.

rows_iter()[source]

Return an iterator over the objects in the collection.

Return type:

Iterable[Union[Dict[str, Any], BaseModel, Type]]

Returns:

rows()[source]

Return a list of objects in the collection.

Return type:

List[Union[Dict[str, Any], BaseModel, Type]]

Returns:

ranked_rows()[source]

Return a list of objects in the collection, with scores.

Return type:

List[Tuple[float, Union[Dict[str, Any], BaseModel, Type]]]

attach_indexer(index, name=None, auto_index=True, **kwargs)[source]

Attach an index to the collection.

As an example, first let’s create a collection in a database:

>>> from linkml_store import Client
>>> from linkml_store.utils.format_utils import load_objects
>>> client = Client()
>>> db = client.attach_database("duckdb")
>>> collection = db.create_collection("Country")
>>> objs = load_objects("tests/input/countries/countries.jsonl")
>>> collection.insert(objs)

We will create two indexes: one that indexes the whole object (the default behavior), and one that indexes the name only:

>>> from linkml_store.index import get_indexer
>>> full_index = get_indexer("simple")
>>> full_index.name = "full"
>>> name_index = get_indexer("simple", text_template="{name}")
>>> name_index.name = "name"
>>> _ = collection.attach_indexer(full_index)
>>> _ = collection.attach_indexer(name_index)

Now let’s find objects using the full index, using the string “France”. We expect the country France to be the top hit, but the score will be less than one because we did not match all fields in the object:

>>> qr = collection.search("France", index_name="full")
>>> score, top_obj = qr.ranked_rows[0]
>>> assert score > 0.1
>>> assert score < 0.5
>>> top_obj["code"]
'FR'

Now using the name index:

>>> qr = collection.search("France", index_name="name")
>>> score, top_obj = qr.ranked_rows[0]
>>> assert score > 0.99
>>> top_obj["code"]
'FR'
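The difference between the two scores comes down to which text gets indexed. The sketch below shows one plausible way a text template could select that text. The function `object_to_text` is hypothetical, and both the use of str.format and the fallback rendering of all fields are assumptions rather than the library's documented behavior.

```python
from typing import Any, Dict, Optional


def object_to_text(obj: Dict[str, Any], text_template: Optional[str] = None) -> str:
    """Render the text that would be indexed for `obj`."""
    if text_template is None:
        # Assumption: with no template, every field contributes to the text.
        return " ".join(f"{key}: {value}" for key, value in obj.items())
    return text_template.format(**obj)


country = {"code": "FR", "name": "France", "capital": "Paris"}
print(object_to_text(country))
print(object_to_text(country, text_template="{name}"))
```

With the name-only template, the indexed text is exactly the query string “France”, which is consistent with a score close to one. The whole-object text contains other fields as well, so the match is only partial.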
Parameters:
  • index (Union[Indexer, str])

  • name (Optional[str])

  • auto_index – Automatically index all objects in the collection

  • kwargs

Return type:

Indexer

Returns:

get_index_collection_name(indexer)[source]
Return type:

str

index_objects(objs, index_name, replace=False, **kwargs)[source]

Index a list of objects using a specified index.

By default, the indexed objects will be stored in a shadow collection in the same database, with additional fields for the index vector

Parameters:
  • objs (List[Union[Dict[str, Any], BaseModel, Type]])

  • index_name (str) – e.g. simple, llm

  • replace

  • kwargs

Returns:

list_index_names()[source]

Return a list of index names

Return type:

List[str]

Returns:

property indexers: Dict[str, Indexer]

Return the indexers attached to the collection, as a dictionary.

Returns:

peek(limit=None)[source]

Return the first N objects in the collection

Parameters:

limit (Optional[int])

Return type:

QueryResult

Returns:

class_definition()[source]

Return the class definition for the collection.

If no schema has been explicitly set, and the native database does not have a schema, then a schema will be induced from the objects in the collection.

Return type:

Optional[ClassDefinition]

Returns:

property identifier_attribute_name: str | None

Return the name of the identifier attribute for the collection.

AKA the primary key.

Returns:

The name of the identifier attribute, if one exists.

set_identifier_attribute_name(name)[source]

Set the name of the identifier attribute for the collection.

AKA the primary key.

Parameters:

name (str) – The name of the identifier attribute.

object_identifier(obj, auto=True)[source]

Return the identifier for an object.

Parameters:
  • obj (Union[Dict[str, Any], BaseModel, Type])

  • auto – If True, generate an identifier if one does not exist.

Return type:

Optional[str]

Returns:

induce_class_definition_from_objects(objs, max_sample_size=None)[source]

Induce a class definition from a list of objects.

This uses a heuristic procedure to infer the class definition from a list of objects. In general it is recommended you explicitly provide a schema.
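A heuristic of this kind can be sketched by mapping Python value types to range names. The functions `infer_range` and `induce_ranges` are hypothetical, and the first-value-wins rule is an assumption. The range names string and integer match those shown in the induced schema examples elsewhere on this page.

```python
from typing import Any, Dict, List, Optional


def infer_range(value: Any) -> str:
    """Map a Python value to a range name."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    return "string"


def induce_ranges(objs: List[Dict[str, Any]], max_sample_size: Optional[int] = None) -> Dict[str, str]:
    """Infer one range per attribute from a sample of the objects."""
    sample = objs if max_sample_size is None else objs[:max_sample_size]
    ranges: Dict[str, str] = {}
    for obj in sample:
        for key, value in obj.items():
            if value is None:
                continue  # a missing value says nothing about the range
            # Assumption: the first non-null value seen decides the range.
            ranges.setdefault(key, infer_range(value))
    return ranges


objs = [
    {"id": "P1", "name": "John", "age_in_years": 25},
    {"id": "P2", "name": "Alice", "age_in_years": 25},
]
print(induce_ranges(objs))
```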

Parameters:
  • objs (List[Union[Dict[str, Any], BaseModel, Type]])

  • max_sample_size (Optional[int])

Return type:

ClassDefinition

Returns:

import_data(location, **kwargs)[source]

Import data from a file or stream

Parameters:
  • location (Union[Path, str, TextIO])

  • kwargs

Returns:

export_data(location, **kwargs)[source]

Export data to a file or stream

Parameters:
  • location (Union[Path, str, TextIO])

  • kwargs

Returns:

apply_patches(patches, **kwargs)[source]

Apply a patch to the collection.

Patches conform to the JSON Patch format.
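For readers unfamiliar with JSON Patch (RFC 6902), the sketch below applies the add, remove and replace operations to a list of objects. The function `apply_json_patches` is hypothetical and covers only that subset. It does not handle the move, copy and test operations or the root path, and it is not the library's implementation.

```python
from typing import Any, Dict, List


def apply_json_patches(doc: List[Any], patches: List[Dict[str, Any]]) -> List[Any]:
    """Apply add/remove/replace operations to `doc` in place and return it."""
    for patch in patches:
        # Split the JSON Pointer into tokens and undo the ~1 and ~0 escapes.
        tokens = [t.replace("~1", "/").replace("~0", "~") for t in patch["path"].split("/")[1:]]
        parent: Any = doc
        for token in tokens[:-1]:
            parent = parent[int(token)] if isinstance(parent, list) else parent[token]
        last, op = tokens[-1], patch["op"]
        if op not in ("add", "remove", "replace"):
            raise ValueError(f"Unsupported operation: {op}")
        if isinstance(parent, list):
            if op == "add":
                # "-" means "append to the end of the array".
                parent.insert(len(parent) if last == "-" else int(last), patch["value"])
            elif op == "remove":
                del parent[int(last)]
            else:
                parent[int(last)] = patch["value"]
        elif op == "remove":
            del parent[last]
        else:
            parent[last] = patch["value"]
    return doc


rows = [{"id": "P1", "name": "John"}]
apply_json_patches(rows, [
    {"op": "replace", "path": "/0/name", "value": "Jon"},
    {"op": "add", "path": "/-", "value": {"id": "P2", "name": "Alice"}},
])
print(rows)
```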

Parameters:
  • patches (List[PatchDict])

  • kwargs

Returns:

diff(other, **kwargs)[source]

Diff two collections.

Parameters:
  • other (Collection) – The collection to diff against

  • kwargs

Return type:

List[PatchDict]

Returns:

iter_validate_collection(objects=None, **kwargs)[source]

Validate the contents of the collection

Parameters:
  • kwargs

  • objects (Optional[Iterable[Union[Dict[str, Any], BaseModel, Type]]]) – objects to validate

Return type:

Iterator[ValidationResult]

Returns:

iterator over validation results

commit()[source]

Commit changes to the collection.

Returns: