API
- class Client(handle=None, metadata=None)
Bases: object
A client is the top-level object for interacting with databases.
A client has access to one or more Database objects. Each database consists of a number of Collection objects.
Creating a client
>>> client = Client()
Attaching a database
>>> db = client.attach_database("duckdb", alias="test")
Note that normally a handle would be specified by a locator such as duckdb:///<PATH>, but for convenience, an in-memory duckdb object can be specified without a full locator. We can check the actual handle:
>>> db.handle
'duckdb:///:memory:'
Creating a new collection
>>> collection = db.create_collection("Person")
>>> objs = [{"id": "P1", "name": "John", "age_in_years": 30}, {"id": "P2", "name": "Alice", "age_in_years": 25}]
>>> collection.insert(objs)
>>> qr = collection.find()
>>> len(qr.rows)
2
>>> qr.rows[0]["id"]
'P1'
>>> qr.rows[1]["name"]
'Alice'
>>> qr = collection.find({"name": "John"})
>>> len(qr.rows)
1
>>> qr.rows[0]["name"]
'John'
- __init__(handle=None, metadata=None)
Initialize a client.
- Parameters:
handle (Optional[str])
metadata (Optional[ClientConfig])
- metadata: Optional[ClientConfig] = None
- property handle: str | None
- property base_dir: str | None
Get the base directory for the client.
Wraps metadata.base_dir.
- Returns:
- from_config(config, base_dir=None, auto_attach=False, **kwargs)
Create a client from a configuration.
Examples
>>> from linkml_store.api.config import ClientConfig
>>> client = Client().from_config(ClientConfig(databases={"test": {"handle": "duckdb:///:memory:"}}))
>>> len(client.databases)
0
>>> client = Client().from_config(ClientConfig(databases={"test": {"handle": "duckdb:///:memory:"}}),
...                               auto_attach=True)
>>> len(client.databases)
1
>>> "test" in client.databases
True
>>> client.databases["test"].handle
'duckdb:///:memory:'
- type config:
Union[ClientConfig, dict, str, Path]
- param config:
- type base_dir:
- param base_dir:
- type auto_attach:
- param auto_attach:
- type kwargs:
- param kwargs:
- return:
- attach_database(handle, alias=None, schema_view=None, recreate_if_exists=False, **kwargs)
Associate a database with a handle.
Examples
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="memory")
>>> "memory" in client.databases
True
>>> db = client.attach_database("duckdb:///tmp/another.db", alias="disk")
>>> len(client.databases)
2
>>> "disk" in client.databases
True
- type handle:
str
- param handle:
handle for the database, e.g. duckdb:///foo.db
- type alias:
Optional[str]
- param alias:
alias for the database, e.g. foo
- type schema_view:
Optional[SchemaView]
- param schema_view:
schema view to associate with the database
- type kwargs:
- param kwargs:
- rtype:
- return:
- get_database(name=None, create_if_not_exists=True, **kwargs)
Get a named database.
Examples
>>> client = Client()
>>> db = client.attach_database("duckdb:///test.db", alias="test")
>>> retrieved_db = client.get_database("test")
>>> db == retrieved_db
True
- type name:
Optional[str]
- param name:
if None, there must be a single database attached
- type create_if_not_exists:
- param create_if_not_exists:
- type kwargs:
- param kwargs:
- rtype:
- return:
- property databases: Dict[str, Database]
Return all attached databases
Examples
>>> client = Client()
>>> _ = client.attach_database("duckdb", alias="test1")
>>> _ = client.attach_database("duckdb", alias="test2")
>>> len(client.databases)
2
>>> "test1" in client.databases
True
>>> "test2" in client.databases
True
>>> client.databases["test1"].handle
'duckdb:///:memory:'
>>> client.databases["test2"].handle
'duckdb:///:memory:'
- Returns:
- drop_database(name, missing_ok=False, **kwargs)
Drop a database.
Example (in-memory):
>>> client = Client()
>>> db1 = client.attach_database("duckdb", alias="test1")
>>> db2 = client.attach_database("duckdb", alias="test2")
>>> len(client.databases)
2
>>> client.drop_database("test1")
>>> len(client.databases)
1
Databases that persist on disk:
>>> from pathlib import Path
>>> client = Client()
>>> path = Path("tmp/test.db")
>>> path.parent.mkdir(parents=True, exist_ok=True)
>>> db = client.attach_database(f"duckdb:///{path}", alias="test")
>>> len(client.databases)
1
>>> db.store({"persons": [{"id": "P1", "name": "John"}]})
>>> db.commit()
>>> Path("tmp/test.db").exists()
True
>>> client.drop_database("test")
>>> len(client.databases)
0
>>> Path("tmp/test.db").exists()
False
Dropping a non-existent database:
>>> client = Client()
>>> client.drop_database("duckdb:///tmp/made-up1", missing_ok=True)
>>> client.drop_database("duckdb:///tmp/made-up2", missing_ok=False)
Traceback (most recent call last):
...
ValueError: Database duckdb:///tmp/made-up2 not found
- Parameters:
name (str)
missing_ok
- Returns:
- drop_all_databases(**kwargs)
Drop all databases.
Example (in-memory):
>>> client = Client()
>>> db1 = client.attach_database("duckdb", alias="test1")
>>> assert "test1" in client.databases
>>> db2 = client.attach_database("duckdb", alias="test2")
>>> assert "test2" in client.databases
>>> client.drop_all_databases()
>>> len(client.databases)
0
- Parameters:
missing_ok
- Returns:
- class Database(handle=None, metadata=None, **kwargs)
Bases: ABC, Generic[CollectionType]
A Database provides access to named collections of data.
A database object is owned by a Client. The database object uses a handle to know what kind of external database system to connect to (e.g. duckdb, mongodb). The handle is a string of the form:
<DatabaseType>:<LocalLocator>
The database object may also have an alias that is mapped to the handle.
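To make the handle convention concrete, here is a minimal sketch that splits a handle into its database type and locator. The function `parse_handle` is hypothetical and not part of linkml-store; it also assumes, generalizing from the `Client` examples where a bare `duckdb` becomes `duckdb:///:memory:`, that any bare type name means an in-memory database.

```python
from typing import Tuple


def parse_handle(handle: str) -> Tuple[str, str]:
    """Hypothetical helper: split a handle into (database type, locator).

    Assumption: a bare type such as "duckdb" is shorthand for an
    in-memory database, mirroring the Client examples.
    """
    if ":" not in handle:
        # Bare type, e.g. "duckdb" -> "duckdb:///:memory:"
        handle = f"{handle}:///:memory:"
    # Everything before the first ":" is the database type
    db_type, _, locator = handle.partition(":")
    return db_type, locator
```

For example, `parse_handle("mongodb://localhost:27017/")` yields `("mongodb", "//localhost:27017/")`.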
Attaching a database
>>> from linkml_store.api.client import Client
>>> client = Client()
>>> db = client.attach_database("duckdb:///:memory:", alias="test")
We can check the value of the handle:
>>> db.handle
'duckdb:///:memory:'
The alias can be used to retrieve the database object from the client:
>>> assert db == client.get_database("test")
Creating a collection
>>> collection = db.create_collection("Person")
>>> len(db.list_collections())
1
>>> db.get_collection("Person") == collection
True
>>> objs = [{"id": "P1", "name": "John", "age_in_years": 30}, {"id": "P2", "name": "Alice", "age_in_years": 25}]
>>> collection.insert(objs)
>>> qr = collection.find()
>>> len(qr.rows)
2
>>> qr.rows[0]["id"]
'P1'
>>> qr.rows[1]["name"]
'Alice'
>>> qr = collection.find({"name": "John"})
>>> len(qr.rows)
1
>>> qr.rows[0]["name"]
'John'
- collection_class: ClassVar[Optional[Type[Collection]]] = None
- listeners: Optional[List[Callable[[Collection, List[PatchDict]], None]]] = None
- metadata: Optional[DatabaseConfig] = None
- from_config(db_config, **kwargs)
Initialize a database from a configuration.
TODO: DEPRECATE
- Parameters:
db_config (DatabaseConfig) – database configuration
kwargs – additional arguments
- property recreate_if_exists: bool
Return whether to recreate the database if it already exists.
- Returns:
- property handle: str
Return the database handle.
Examples:
duckdb:///:memory:
duckdb:///tmp/test.db
mongodb://localhost:27017/
- Returns:
- property alias
- store(obj, **kwargs)
Store an object in the database.
The object is assumed to be a dictionary of collections, mapping each collection name to a list of objects.
>>> from linkml_store.api.client import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> db.store({"persons": [{"id": "P1", "name": "John", "age_in_years": 30}]})
>>> collection = db.get_collection("persons")
>>> qr = collection.find()
>>> qr.num_rows
1
- Parameters:
obj (Dict[str, Any]) – object to store
kwargs – additional arguments
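As an illustration of the dictionary-of-collections convention, here is a minimal plain-Python sketch in which a dict stands in for the database. The name `store_nested` is hypothetical and this is not linkml-store's implementation; the docs above do not say what happens when a value is not a list, so the sketch simply rejects that case.

```python
from typing import Any, Dict, List

# Hypothetical in-memory stand-in for a database: alias -> list of objects
Store = Dict[str, List[Dict[str, Any]]]


def store_nested(db: Store, obj: Dict[str, List[Dict[str, Any]]]) -> None:
    """Sketch: treat each top-level key as a collection alias.

    Each value is the list of objects to add to that collection,
    which is created on demand if it does not exist yet.
    """
    for alias, objs in obj.items():
        if not isinstance(objs, list):
            raise ValueError(f"Expected a list of objects for {alias!r}")
        db.setdefault(alias, []).extend(objs)
```

Calling `store_nested(db, {"persons": [...]})` twice appends to the same `persons` list, which mirrors how repeated inserts accumulate in a collection.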
- create_collection(name, alias=None, metadata=None, recreate_if_exists=False, **kwargs)
Create a new collection in the current database.
The collection must have a type, and may have an alias.
Examples:
>>> from linkml_store.api.client import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> collection = db.create_collection("Person", alias="persons")
>>> collection.alias
'persons'
>>> collection.target_class_name
'Person'
If alias is not provided, it defaults to the name of the type.
>>> collection = db.create_collection("Organization")
>>> collection.alias
'Organization'
- Parameters:
name (str) – name of the collection
alias (Optional[str]) – alias for the collection
metadata (Optional[CollectionConfig]) – metadata for the collection
recreate_if_exists – recreate the collection if it already exists
kwargs – additional arguments
- Return type:
- list_collections(include_internal=False)
List all collections.
Examples
>>> from linkml_store.api.client import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> c1 = db.create_collection("Person")
>>> c2 = db.create_collection("Product")
>>> collections = db.list_collections()
>>> len(collections)
2
>>> [c.target_class_name for c in collections]
['Person', 'Product']
- type include_internal:
- param include_internal:
include internal collections
- rtype:
Sequence[Collection]
- return:
list of collections
- list_collection_names(**kwargs)
List all collection names.
- Return type:
Sequence[str]
Examples
>>> from linkml_store.api.client import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> c1 = db.create_collection("Person")
>>> c2 = db.create_collection("Product")
>>> collection_names = db.list_collection_names()
>>> len(collection_names)
2
>>> collection_names
['Person', 'Product']
- get_collection(name, type=None, create_if_not_exists=True, **kwargs)
Get a named collection.
- Return type:
Examples
>>> from linkml_store.api.client import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> collection = db.create_collection("Person")
>>> db.get_collection("Person") == collection
True
>>> db.get_collection("NonExistent", create_if_not_exists=False)
Traceback (most recent call last):
...
KeyError: 'Collection NonExistent does not exist'
- type name:
str
- param name:
name of the collection
- type type:
Optional[str]
- param type:
target class name
- type create_if_not_exists:
- param create_if_not_exists:
create the collection if it does not exist
- init_collections()
Initialize collections.
TODO: Not typically called directly; consider making this private.
- query(query, **kwargs)
Run a query against the database.
Examples
>>> from linkml_store.api.client import Client
>>> from linkml_store.api.queries import Query
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> collection = db.create_collection("Person")
>>> collection.insert([{"id": "P1", "name": "John"}, {"id": "P2", "name": "Alice"}])
>>> query = Query(from_table="Person", where_clause={"name": "John"})
>>> result = db.query(query)
>>> len(result.rows)
1
>>> result.rows[0]["id"]
'P1'
- type query:
Query
- param query:
- type kwargs:
- param kwargs:
- rtype:
QueryResult
- return:
- property schema_view: SchemaView
Return a schema view for the database.
If no explicit schema is provided, this will induce one from the data.
Induced schema example:
>>> from linkml_store.api.client import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> collection = db.create_collection("Person", alias="persons")
>>> collection.insert([{"id": "P1", "name": "John", "age_in_years": 25}])
>>> schema_view = db.schema_view
>>> cd = schema_view.get_class("Person")
>>> cd.attributes["id"].range
'string'
>>> cd.attributes["age_in_years"].range
'integer'
We can reuse the same class:
>>> collection2 = db.create_collection("Person", alias="other_persons")
>>> collection2.class_definition().attributes["age_in_years"].range
'integer'
- set_schema_view(schema_view)
Set the schema view for the database.
>>> from linkml_runtime.utils.schemaview import SchemaView
>>> from linkml_store.api.client import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> sv = SchemaView("tests/input/countries/countries.linkml.yaml")
>>> db.set_schema_view(sv)
>>> cd = db.schema_view.schema.classes["Country"]
>>> sorted(cd.slots)
['capital', 'code', 'continent', 'languages', 'name']
>>> induced_slots = {s.name: s for s in sv.class_induced_slots("Country")}
>>> sorted(induced_slots.keys())
['capital', 'code', 'continent', 'languages', 'name']
>>> induced_slots["code"].identifier
True
Creating a new collection will align with the schema view:
>>> collection = db.create_collection("Country", "all_countries")
>>> sorted(collection.class_definition().slots)
['capital', 'code', 'continent', 'languages', 'name']
- Parameters:
schema_view (Union[str, Path, SchemaView]) – can be either a path to the schema, or a SchemaView object
- Returns:
- load_schema_view(path)
Load a schema view from a file.
>>> from linkml_store.api.client import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> db.load_schema_view("tests/input/countries/countries.linkml.yaml")
>>> sv = db.schema_view
>>> cd = sv.schema.classes["Country"]
>>> sorted(cd.slots)
['capital', 'code', 'continent', 'languages', 'name']
>>> induced_slots = {s.name: s for s in sv.class_induced_slots("Country")}
>>> sorted(induced_slots.keys())
['capital', 'code', 'continent', 'languages', 'name']
>>> induced_slots["code"].identifier
True
Creating a new collection will align with the schema view:
>>> collection = db.create_collection("Country", "all_countries")
>>> sorted(collection.class_definition().slots)
['capital', 'code', 'continent', 'languages', 'name']
- Parameters:
path (Union[str, Path])
- Returns:
- induce_schema_view()
Induce a schema view from the objects stored in the database's collections.
>>> from linkml_store.api.client import Client
>>> from linkml_store.api.queries import Query
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> collection = db.create_collection("Person")
>>> collection.insert([{"id": "P1", "name": "John", "age_in_years": 25},
...                    {"id": "P2", "name": "Alice", "age_in_years": 25}])
>>> schema_view = db.induce_schema_view()
>>> cd = schema_view.get_class("Person")
>>> cd.attributes["id"].range
'string'
>>> cd.attributes["age_in_years"].range
'integer'
- Return type:
SchemaView
- Returns:
A schema view
- iter_validate_database(**kwargs)
Validate the contents of the database.
As an example, let’s create a database with a predefined schema from the countries.linkml.yaml file:
>>> from linkml_store.api.client import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> db.load_schema_view("tests/input/countries/countries.linkml.yaml")
Let’s introspect the schema to see what slots are applicable for the class “Country”:
>>> sv = db.schema_view
>>> for slot in sv.class_induced_slots("Country"):
...     print(slot.name, slot.range, slot.required)
name string True
code string True
capital string True
continent string True
languages Language None
Next we’ll create a collection, binding it to the target class “Country”, and insert valid data:
>>> collection = db.create_collection("Country", "all_countries")
>>> obj = {"code": "US", "name": "United States", "continent": "North America", "capital": "Washington, D.C."}
>>> collection.insert([obj])
>>> list(db.iter_validate_database())
[]
Now let’s insert some invalid data (missing required fields):
>>> collection.insert([{"code": "FR", "name": "France"}])
>>> for r in db.iter_validate_database():
...     print(r.message[0:32])
'capital' is a required property
'continent' is a required proper
- Parameters:
kwargs
- Return type:
Iterator[ValidationResult]
- Returns:
iterator over validation results
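The two messages above are "required property" errors. As a rough illustration of just that one check, here is a minimal sketch over plain dicts, assuming the required slot names are already known (for example from `class_induced_slots`). `iter_missing_required` is a hypothetical helper, not linkml-store's validator, and it covers required slots only.

```python
from typing import Any, Dict, Iterator, List


def iter_missing_required(
    objs: List[Dict[str, Any]], required: List[str]
) -> Iterator[str]:
    """Sketch: yield one message per missing required slot, per object."""
    for obj in objs:
        for slot in required:
            # Treat an absent key and an explicit None the same way
            if obj.get(slot) is None:
                yield f"'{slot}' is a required property"
```

Run against the US and France objects from the example with `required=["name", "code", "capital", "continent"]`, only the France object produces messages.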
- drop(**kwargs)
Drop the database and all collections.
>>> from pathlib import Path
>>> from linkml_store.api.client import Client
>>> client = Client()
>>> path = Path("/tmp/test.db")
>>> path.parent.mkdir(exist_ok=True, parents=True)
>>> db = client.attach_database(f"duckdb:///{path}")
>>> db.store({"persons": [{"id": "P1", "name": "John", "age_in_years": 30}]})
>>> coll = db.get_collection("persons")
>>> coll.find({}).num_rows
1
>>> db.drop()
>>> db = client.attach_database("duckdb:///tmp/test.db", alias="test")
>>> coll = db.get_collection("persons")
>>> coll.find({}).num_rows
0
- Parameters:
kwargs – additional arguments
- import_database(location, source_format=None, collection_name=None, **kwargs)
Import a database from a file or location.
>>> from linkml_store.api.client import Client
>>> from linkml_store.utils.format_utils import Format
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> db.import_database("tests/input/iris.csv", Format.CSV, collection_name="iris")
>>> db.list_collection_names()
['iris']
>>> collection = db.get_collection("iris")
>>> collection.find({}).num_rows
150
- Parameters:
location (str) – location of the file
source_format (Union[str, Format, None]) – source format
collection_name (Optional[str]) – (Optional) name of the collection, for data that is flat
kwargs – additional arguments
- export_database(location, target_format=None, **kwargs)
Export a database to a file or location.
>>> from linkml_store.api.client import Client
>>> from linkml_store.utils.format_utils import Format
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> db.import_database("tests/input/iris.csv", Format.CSV, collection_name="iris")
>>> db.export_database("/tmp/iris.yaml", Format.YAML)
- Parameters:
location (str) – location of the file
target_format (Union[str, Format, None]) – target format
kwargs – additional arguments
- class Collection(name, parent=None, metadata=None, **kwargs)
Bases: Generic[DatabaseType]
A collection is an organized set of objects of the same or similar type.
For relational databases, a collection is typically a table.
For document databases such as MongoDB, a collection is the native type.
For a file system, a collection could be a single tabular file such as Parquet or CSV.
Collection objects are typically not created directly; instead they are generated from a parent Database object:
>>> from linkml_store import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> collection = db.create_collection("Person")
>>> objs = [{"id": "P1", "name": "John", "age_in_years": 30}, {"id": "P2", "name": "Alice", "age_in_years": 25}]
>>> collection.insert(objs)
- default_index_name: ClassVar[str] = 'simple'
- parent: Optional[TypeVar(DatabaseType, bound=Database)] = None
- metadata: Optional[CollectionConfig] = None
True if the collection is hidden.
An example of a hidden collection is a collection that indexes another collection
- Returns:
True if the collection is hidden
- property target_class_name
Return the name of the class that this collection represents
This MUST be a LinkML class name
>>> from linkml_store import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> collection = db.create_collection("Person", alias="persons")
>>> collection.target_class_name
'Person'
>>> collection = db.create_collection("Organization")
>>> collection.target_class_name
'Organization'
>>> collection.alias
'Organization'
- Returns:
name of the class which members of this collection instantiate
- property alias
Return the primary name/alias used for the collection.
This MAY be the name of the LinkML class, but it may be desirable to have an alias, for example “persons” which collects all instances of class Person.
>>> from linkml_store import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> collection = db.create_collection("Person", alias="persons")
>>> collection.alias
'persons'
If no explicit alias is provided, then the target class name is used:
>>> from linkml_store import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> collection = db.create_collection("Person")
>>> collection.alias
'Person'
The alias SHOULD be used for table names in SQL.
For nested data, the alias SHOULD be used as the key; e.g.
{ "persons": [ { "name": "Alice" }, { "name": "Bob" } ] }
- Returns:
- replace(objs, **kwargs)
Replace the entire collection with the given objects.
>>> from linkml_store import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> collection = db.create_collection("Person")
>>> objs = [{"id": "P1", "name": "John", "age_in_years": 30}, {"id": "P2", "name": "Alice", "age_in_years": 25}]
>>> collection.insert(objs)
- Parameters:
objs (Union[Dict[str, Any], BaseModel, Type, List[Union[Dict[str, Any], BaseModel, Type]]])
kwargs
- Returns:
- insert(objs, **kwargs)
Add one or more objects to the collection.
>>> from linkml_store import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> collection = db.create_collection("Person")
>>> objs = [{"id": "P1", "name": "John", "age_in_years": 30}, {"id": "P2", "name": "Alice", "age_in_years": 25}]
>>> collection.insert(objs)
- Parameters:
objs (Union[Dict[str, Any], BaseModel, Type, List[Union[Dict[str, Any], BaseModel, Type]]])
kwargs
- Returns:
- delete(objs, **kwargs)
Delete one or more objects from the collection.
First let’s set up a collection:
>>> from linkml_store import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> collection = db.create_collection("Person")
>>> objs = [{"id": "P1", "name": "John", "age_in_years": 30}, {"id": "P2", "name": "Alice", "age_in_years": 25}]
>>> collection.insert(objs)
>>> collection.find({}).num_rows
2
Now let’s delete an object:
>>> collection.delete(objs[0])
>>> collection.find({}).num_rows
1
Deleting the same object again should have no effect:
>>> collection.delete(objs[0])
>>> collection.find({}).num_rows
1
- Parameters:
objs (Union[Dict[str, Any], BaseModel, Type, List[Union[Dict[str, Any], BaseModel, Type]]])
kwargs
- Return type:
Optional[int]
- Returns:
- delete_where(where=None, missing_ok=True, **kwargs)
Delete objects that match a query.
First let’s set up a collection:
>>> from linkml_store import Client
>>> client = Client()
>>> db = client.attach_database("duckdb", alias="test")
>>> collection = db.create_collection("Person")
>>> objs = [{"id": "P1", "name": "John", "age_in_years": 30}, {"id": "P2", "name": "Alice", "age_in_years": 25}]
>>> collection.insert(objs)
Now let’s delete an object:
>>> collection.delete_where({"id": "P1"})
>>> collection.find({}).num_rows
1
Match everything:
>>> collection.delete_where({})
>>> collection.find({}).num_rows
0
- Parameters:
where (Optional[Dict[str, Any]]) – where conditions
missing_ok – if True, do not raise an error if the collection does not exist
kwargs
- Return type:
Optional[int]
- Returns:
number of objects deleted (or -1 if unsupported)
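The matching semantics in the example (`{"id": "P1"}` removes one object, `{}` removes everything) can be sketched over plain dicts. This is a minimal illustration assuming simple equality conditions; `delete_where_sketch` and `matches` are hypothetical names, and treating `where=None` like `{}` is an assumption. A real backend would presumably translate `where` into its own query language rather than scanning in Python.

```python
from typing import Any, Dict, List, Optional


def matches(obj: Dict[str, Any], where: Optional[Dict[str, Any]]) -> bool:
    """True if every key/value in `where` equals the object's value.

    An empty (or None) `where` matches every object.
    """
    return all(obj.get(k) == v for k, v in (where or {}).items())


def delete_where_sketch(
    objs: List[Dict[str, Any]], where: Optional[Dict[str, Any]] = None
) -> int:
    """Remove matching objects in place and return how many were removed."""
    keep = [o for o in objs if not matches(o, where)]
    removed = len(objs) - len(keep)
    objs[:] = keep  # mutate the caller's list, like deleting from a collection
    return removed
```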
- update(objs, **kwargs)
Update one or more objects in the collection.
- Parameters:
objs (Union[Dict[str, Any], BaseModel, Type, List[Union[Dict[str, Any], BaseModel, Type]]])
kwargs
- Returns:
- query(query, **kwargs)
Run a query against the collection.
First let’s load a collection:
>>> from linkml_store import Client
>>> from linkml_store.utils.format_utils import load_objects
>>> client = Client()
>>> db = client.attach_database("duckdb")
>>> collection = db.create_collection("Country")
>>> objs = load_objects("tests/input/countries/countries.jsonl")
>>> collection.insert(objs)
Now let’s run a query:
TODO
- Parameters:
query (Query)
kwargs
- Return type:
QueryResult
- Returns:
- query_facets(where=None, facet_columns=None, facet_limit=100, **kwargs)
Run a query to get facet counts for one or more columns.
For each specified column, this generates and executes a facet count query, and returns the results as a dictionary where the keys are the column names and the values are lists of (value, count) tuples.
The facet count query is generated by modifying the original query’s WHERE clause to exclude conditions directly related to the facet column. This allows for counting the occurrences of each unique value in the facet column while still applying the other filtering conditions.
- Parameters:
where – where conditions for the base query
facet_columns (Optional[List[str]]) – A list of column names to get facet counts for.
facet_limit
- Return type:
Dict[str, List[Tuple[Any, int]]]
- Returns:
A dictionary where keys are column names and values are lists of (value, count) tuples, one for each unique value in the respective column.
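The key idea above is that each facet ignores the filter on its own column. Here is a minimal in-memory sketch of that idea over plain dicts with equality-only conditions. `facet_counts` is a hypothetical helper, not the library's SQL-based implementation, and it assumes `facet_limit` caps the number of (value, count) pairs per column, most frequent first.

```python
from collections import Counter
from typing import Any, Dict, List, Optional, Tuple


def facet_counts(
    objs: List[Dict[str, Any]],
    where: Optional[Dict[str, Any]],
    facet_columns: List[str],
    facet_limit: int = 100,
) -> Dict[str, List[Tuple[Any, int]]]:
    """In-memory sketch of facet counting.

    For each facet column, the condition on that same column is dropped
    from `where`, so the counts show every value the column could take
    given the *other* filters.
    """
    where = where or {}
    results: Dict[str, List[Tuple[Any, int]]] = {}
    for col in facet_columns:
        # Keep all conditions except the one on the facet column itself
        other = {k: v for k, v in where.items() if k != col}
        counter = Counter(
            obj.get(col)
            for obj in objs
            if all(obj.get(k) == v for k, v in other.items())
        )
        # most_common returns (value, count) pairs, highest count first
        results[col] = counter.most_common(facet_limit)
    return results
```

With `where={"continent": "Europe"}`, the `continent` facet still reports non-European values, while every other facet is restricted to European rows.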
- get(ids, **kwargs)
Get one or more objects by ID.
- Parameters:
ids (Optional[List[str]])
kwargs
- Return type:
QueryResult
- Returns:
- get_one(id, **kwargs)
Get one object by ID.
- Parameters:
id (str)
kwargs
- Return type:
Union[Dict[str, Any], BaseModel, Type, None]
- Returns:
- find(where=None, **kwargs)
Find objects in the collection using a where query.
As an example, first load a collection:
>>> from linkml_store import Client
>>> from linkml_store.utils.format_utils import load_objects
>>> client = Client()
>>> db = client.attach_database("duckdb")
>>> collection = db.create_collection("Country")
>>> objs = load_objects("tests/input/countries/countries.jsonl")
>>> collection.insert(objs)
Now let’s find all objects:
>>> qr = collection.find({})
>>> qr.num_rows
20
We can do a more restrictive query:
>>> qr = collection.find({"code": "FR"})
>>> qr.num_rows
1
>>> qr.rows[0]["name"]
'France'
- Parameters:
where (Optional[Any])
kwargs
- Return type:
QueryResult
- Returns:
- find_iter(where=None, page_size=100, **kwargs)
Find objects in the collection using a where query, returning an iterator over the matching objects.
- Parameters:
where (Optional[Any])
kwargs
- Return type:
Iterator[Union[Dict[str, Any], BaseModel, Type]]
- Returns:
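The docs do not explain `page_size`; judging by the name, it presumably controls how many objects are fetched per round trip while the caller sees a flat stream. Here is a minimal sketch of that pattern under that assumption. `iter_pages` and the `fetch(offset, limit)` callback are hypothetical, not linkml-store APIs.

```python
from typing import Any, Callable, Dict, Iterator, List

# Hypothetical page fetcher: (offset, limit) -> list of objects
FetchPage = Callable[[int, int], List[Dict[str, Any]]]


def iter_pages(fetch: FetchPage, page_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield objects one at a time, fetching page_size objects per request."""
    offset = 0
    while True:
        page = fetch(offset, page_size)
        yield from page
        if len(page) < page_size:
            # A short (or empty) page means there is nothing more to fetch
            return
        offset += page_size
```

The design choice here is to stop on the first short page, which avoids one extra empty request in most cases but still costs one when the total is an exact multiple of `page_size`.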
- search(query, where=None, index_name=None, limit=None, mmr_relevance_factor=None, **kwargs)
Search the collection using a text-based index.
Example:
>>> from linkml_store import Client
>>> from linkml_store.utils.format_utils import load_objects
>>> client = Client()
>>> db = client.attach_database("duckdb")
>>> collection = db.create_collection("Country")
>>> objs = load_objects("tests/input/countries/countries.jsonl")
>>> collection.insert(objs)
Now let’s index, using the simple trigram-based index:
>>> index = get_indexer("simple")
>>> _ = collection.attach_indexer(index)
Now let’s search for “France”:
>>> qr = collection.search("France")
>>> score, top_obj = qr.ranked_rows[0]
>>> assert score > 0.1
>>> top_obj["code"]
'FR'
- Parameters:
query (str)
where (Optional[Any])
index_name (Optional[str])
limit (Optional[int])
kwargs
- Return type:
QueryResult
- Returns:
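The `simple` index is described above as trigram-based, but its scoring is not spelled out. As an illustration only, here is a sketch that scores text by Jaccard overlap of character trigrams, one common trigram measure; the actual `simple` indexer may compute vectors or scores differently. The function names are hypothetical.

```python
from typing import List, Set, Tuple


def trigrams(text: str) -> Set[str]:
    """Character trigrams of a lower-cased, space-padded string."""
    padded = f"  {text.lower()} "
    return {padded[i : i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two strings' trigram sets (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def rank(query: str, docs: List[str]) -> List[Tuple[float, str]]:
    """Score every document against the query, best match first."""
    return sorted(((trigram_similarity(query, d), d) for d in docs), reverse=True)
```

This measure also suggests why, in the `attach_indexer` example, a whole-object index scores “France” lower than a name-only index: the extra fields add trigrams to the union without adding to the overlap.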
- property is_internal: bool
Check if the collection is internal.
Internal collections are hidden by default. Examples of internal collections include shadow “index” collections
- Returns:
- load_from_source(load_if_exists=False)
Load objects from the source location.
- Parameters:
load_if_exists
- Returns:
- size()
Return the number of objects in the collection.
- Return type:
int
- Returns:
The number of objects in the collection.
- rows_iter()
Return an iterator over the objects in the collection.
- Return type:
Iterable[Union[Dict[str, Any], BaseModel, Type]]
- Returns:
- rows()
Return a list of objects in the collection.
- Return type:
List[Union[Dict[str, Any], BaseModel, Type]]
- Returns:
- ranked_rows()
Return a list of objects in the collection, with scores.
- Return type:
List[Tuple[float, Union[Dict[str, Any], BaseModel, Type]]]
- attach_indexer(index, name=None, auto_index=True, **kwargs)
Attach an index to the collection.
As an example, first let’s create a collection in a database:
>>> from linkml_store import Client
>>> from linkml_store.utils.format_utils import load_objects
>>> client = Client()
>>> db = client.attach_database("duckdb")
>>> collection = db.create_collection("Country")
>>> objs = load_objects("tests/input/countries/countries.jsonl")
>>> collection.insert(objs)
We will create two indexes: one that indexes the whole object (the default behavior), and one that indexes the name only:
>>> full_index = get_indexer("simple")
>>> full_index.name = "full"
>>> name_index = get_indexer("simple", text_template="{name}")
>>> name_index.name = "name"
>>> _ = collection.attach_indexer(full_index)
>>> _ = collection.attach_indexer(name_index)
Now let’s find objects using the full index, using the string “France”. We expect the country France to be the top hit, but the score will be well below 1 because we did not match all fields in the object.
>>> qr = collection.search("France", index_name="full")
>>> score, top_obj = qr.ranked_rows[0]
>>> assert score > 0.1
>>> assert score < 0.5
>>> top_obj["code"]
'FR'
Now using the name index:
>>> qr = collection.search("France", index_name="name")
>>> score, top_obj = qr.ranked_rows[0]
>>> assert score > 0.99
>>> top_obj["code"]
'FR'
- Parameters:
index (Union[Indexer, str])
name (Optional[str])
auto_index – Automatically index all objects in the collection
kwargs
- Return type:
Indexer
- Returns:
- index_objects(objs, index_name, replace=False, **kwargs)
Index a list of objects using a specified index.
By default, the indexed objects will be stored in a shadow collection in the same database, with additional fields for the index vector.
- Parameters:
objs (List[Union[Dict[str, Any], BaseModel, Type]])
index_name (str) – e.g. simple, llm
replace
kwargs
- Returns:
- property indexers: Dict[str, Indexer]
Return a dictionary of the attached indexers, keyed by name.
- Returns:
- peek(limit=None)
Return the first N objects in the collection, where N is given by limit.
- Parameters:
limit (Optional[int])
- Return type:
QueryResult
- Returns:
- class_definition()
Return the class definition for the collection.
If no schema has been explicitly set, and the native database does not have a schema, then a schema will be induced from the objects in the collection.
- Return type:
Optional[ClassDefinition]
- Returns:
- property identifier_attribute_name: str | None
Return the name of the identifier attribute for the collection.
AKA the primary key.
- Returns:
The name of the identifier attribute, if one exists.
- set_identifier_attribute_name(name)
Set the name of the identifier attribute for the collection.
AKA the primary key.
- Parameters:
name (str) – The name of the identifier attribute.
- object_identifier(obj, auto=True)
Return the identifier for an object.
- Parameters:
obj (Union[Dict[str, Any], BaseModel, Type])
auto – If True, generate an identifier if one does not exist.
- Return type:
Optional[str]
- Returns:
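The docs do not say how an identifier is generated when `auto=True`. One possible strategy, assumed here purely for illustration and not necessarily what linkml-store does, is to hash a canonical serialization of the object so that equal objects always get the same generated identifier. `object_identifier_sketch` is a hypothetical helper that works on plain dicts, with the identifier attribute name passed in explicitly.

```python
import hashlib
import json
from typing import Any, Dict, Optional


def object_identifier_sketch(
    obj: Dict[str, Any], id_attr: Optional[str] = "id", auto: bool = True
) -> Optional[str]:
    """Return obj[id_attr] if present; otherwise optionally derive one.

    Hypothetical strategy: SHA-256 of the object's JSON serialization with
    sorted keys, so key order does not change the generated identifier.
    """
    if id_attr and obj.get(id_attr) is not None:
        return str(obj[id_attr])
    if not auto:
        return None
    canonical = json.dumps(obj, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

A content hash has a trade-off: it is stable and needs no counter, but editing the object would change its generated identifier.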
- induce_class_definition_from_objects(objs, max_sample_size=None)
Induce a class definition from a list of objects.
This uses a heuristic procedure to infer the class definition from a list of objects. In general it is recommended you explicitly provide a schema.
- Parameters:
objs (List[Union[Dict[str, Any], BaseModel, Type]])
max_sample_size (Optional[int])
- Return type:
ClassDefinition
- Returns:
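To give a feel for what such a heuristic might look like, here is a minimal sketch that maps each attribute seen in a sample of plain-dict objects to a LinkML built-in range name, consistent with the `'string'` and `'integer'` ranges in the `schema_view` examples. `infer_range` and `induce_attributes` are hypothetical helpers that return a plain dict rather than a `ClassDefinition`, and the type rules are assumptions, not linkml-store's procedure.

```python
from typing import Any, Dict, List, Optional


def infer_range(values: List[Any]) -> str:
    """Pick a LinkML-style range name for a list of sample values."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return "string"
    # bool is a subclass of int in Python, so check it first
    if all(isinstance(v, bool) for v in non_null):
        return "boolean"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in non_null):
        return "integer"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null):
        return "float"
    return "string"


def induce_attributes(
    objs: List[Dict[str, Any]], max_sample_size: Optional[int] = None
) -> Dict[str, str]:
    """Map each attribute name seen in the sample to an inferred range."""
    sample = objs if max_sample_size is None else objs[:max_sample_size]
    # dict.fromkeys keeps first-seen order while removing duplicates
    names = dict.fromkeys(k for obj in sample for k in obj)
    return {n: infer_range([obj.get(n) for obj in sample]) for n in names}
```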
- import_data(location, **kwargs)
Import data from a file or stream.
- Parameters:
location (Union[Path, str, TextIO])
kwargs
- Returns:
- export_data(location, **kwargs)
Export data to a file or stream.
- Parameters:
location (Union[Path, str, TextIO])
kwargs
- Returns:
- apply_patches(patches, **kwargs)
Apply a list of patches to the collection.
Patches conform to the JSON Patch format.
- Parameters:
patches (List[PatchDict])
kwargs
- Returns:
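JSON Patch is specified in RFC 6902, with paths written as JSON Pointers (RFC 6901). As an illustration of the format, here is a minimal sketch that applies the `add`, `remove` and `replace` operations to a plain JSON-like document. It assumes each patch is a dict with `op`, `path` and (where needed) `value` keys as in RFC 6902. How linkml-store maps pointer paths onto collection objects is not described above, so the document layout in the test is hypothetical, as is the name `apply_patch_sketch`.

```python
import copy
from typing import Any, Dict, List


def _split_pointer(path: str) -> List[str]:
    """Split an RFC 6901 JSON Pointer into unescaped reference tokens."""
    if path == "":
        return []
    if not path.startswith("/"):
        raise ValueError(f"Invalid JSON Pointer: {path!r}")
    # Unescape "~1" before "~0", as RFC 6901 requires
    return [t.replace("~1", "/").replace("~0", "~") for t in path[1:].split("/")]


def apply_patch_sketch(doc: Any, patch: List[Dict[str, Any]]) -> Any:
    """Apply add/remove/replace (a subset of RFC 6902) to a copy of doc."""
    doc = copy.deepcopy(doc)  # leave the caller's document untouched
    for op in patch:
        tokens = _split_pointer(op["path"])
        if not tokens:
            raise ValueError("Whole-document operations are not supported here")
        parent = doc
        for tok in tokens[:-1]:
            parent = parent[int(tok)] if isinstance(parent, list) else parent[tok]
        last, kind = tokens[-1], op["op"]
        if isinstance(parent, list):
            if kind == "add":
                # "-" means "append to the end of the array"
                parent.insert(len(parent) if last == "-" else int(last), op["value"])
            elif kind == "remove":
                del parent[int(last)]
            elif kind == "replace":
                parent[int(last)] = op["value"]
            else:
                raise ValueError(f"Unsupported op: {kind}")
        else:
            if kind == "add":
                parent[last] = op["value"]
            elif kind == "remove":
                del parent[last]
            elif kind == "replace":
                if last not in parent:
                    raise KeyError(last)  # replace requires an existing target
                parent[last] = op["value"]
            else:
                raise ValueError(f"Unsupported op: {kind}")
    return doc
```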
- diff(other, **kwargs)
Diff two collections.
- Parameters:
other (Collection) – The collection to diff against
kwargs
- Return type:
List[PatchDict]
- Returns:
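Since `diff` returns a list of `PatchDict`, here is a minimal sketch of one way two collections could be compared: key both sides by an identifier attribute and emit JSON-Patch-style `remove`, `replace` and `add` operations. The `"id"` attribute, the whole-object granularity and the `"/<identifier>"` path form are all assumptions for illustration, and `diff_by_id` is a hypothetical name, not linkml-store's algorithm.

```python
from typing import Any, Dict, List


def diff_by_id(
    old: List[Dict[str, Any]], new: List[Dict[str, Any]], id_attr: str = "id"
) -> List[Dict[str, Any]]:
    """Compare two object lists keyed by id_attr; return JSON-Patch-style ops.

    Paths have the hypothetical form "/<identifier>"; "~" and "/" in
    identifiers are escaped as RFC 6901 requires.
    """

    def ptr(key: Any) -> str:
        # Escape "~" first, then "/", per RFC 6901
        return "/" + str(key).replace("~", "~0").replace("/", "~1")

    old_by_id = {o[id_attr]: o for o in old}
    new_by_id = {o[id_attr]: o for o in new}
    ops: List[Dict[str, Any]] = []
    for key, obj in old_by_id.items():
        if key not in new_by_id:
            ops.append({"op": "remove", "path": ptr(key)})
        elif new_by_id[key] != obj:
            ops.append({"op": "replace", "path": ptr(key), "value": new_by_id[key]})
    for key, obj in new_by_id.items():
        if key not in old_by_id:
            ops.append({"op": "add", "path": ptr(key), "value": obj})
    return ops
```

Replacing whole objects keeps the sketch short; a finer-grained diff would instead emit one operation per changed attribute.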