{ "cells": [ { "cell_type": "markdown", "source": [ "# How to use Semantic Search\n", "\n", "This tutorial will show you how to use indexing and semantic search.\n", "\n", "## Background\n", "\n", "LinkML-Store allows you to *compose* different indexing strategies with any backend. Currently there are two\n", "indexing strategies:\n", "\n", "- Simple trigram-based\n", "- LLM text or image embedding based (using models from OpenAI, HuggingFace, and others)\n", "\n", "These indexes can be added to any backend (duckdb, mongo, ...).\n", "\n", "Additionally, some backends have their own indexing strategies:\n", "\n", "- Solr has a number of text-based indexing strategies\n", "- ChromaDB can use text-based vector embeddings\n", "\n", "LinkML-Store allows for maximum flexibility.\n", "\n", "This tutorial shows how to use an OpenAI-based embedding strategy in combination with DuckDB." ], "metadata": { "collapsed": false }, "id": "315813bcb5f486a4" }, { "cell_type": "markdown", "source": [ "## Obtaining upstream files\n", "\n", "We will use the OBO Graphs encoding of the Enzyme Commission (EC) database, via [biopragmatics](https://w3id.org/biopragmatics).\n", "\n", "We will use the pystow library to cache the upstream file. " ], "metadata": { "collapsed": false }, "id": "2559f81507693d24" }, { "cell_type": "code", "execution_count": 21, "outputs": [], "source": [ "import pystow\n", "path = pystow.ensure(\"tmp\", \"eccode.json\", url=\"https://w3id.org/biopragmatics/resources/eccode/eccode.json\")" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-05-09T21:34:38.097147Z", "start_time": "2024-05-09T21:34:38.095093Z" } }, "id": "5a0030a3e24b1545" }, { "cell_type": "markdown", "source": [ "Let's examine the structure of the JSON. 
There is a top-level `graphs` list; each graph in it holds a set of `nodes` and `edges`:" ], "metadata": { "collapsed": false }, "id": "8e046de9070b82c1" }, { "cell_type": "code", "execution_count": 22, "outputs": [], "source": [ "import json\n", "\n", "with open(path) as f:\n", "    graphdoc = json.load(f)\n", "graph = graphdoc[\"graphs\"][0]" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-05-09T21:34:38.447475Z", "start_time": "2024-05-09T21:34:38.097539Z" } }, "id": "793566db96fc1cd9" }, { "cell_type": "code", "execution_count": 23, "outputs": [ { "data": { "text/plain": "(7177, 506022)" }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(graph[\"nodes\"]), len(graph[\"edges\"])" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-05-09T21:34:38.453170Z", "start_time": "2024-05-09T21:34:38.450138Z" } }, "id": "b3c3cf57c2d9aeed" }, { "cell_type": "markdown", "source": [ "## Storing the JSON\n", "\n", "We will create a DuckDB database to hold the JSON objects. We'll put this in a `tmp/` folder." ], "metadata": { "collapsed": false }, "id": "9f11d6d2b7a2c602" }, { "cell_type": "code", "execution_count": 24, "outputs": [], "source": [ "!mkdir -p tmp" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-05-09T21:34:38.582813Z", "start_time": "2024-05-09T21:34:38.462457Z" } }, "id": "cbae2c783889c9b3" }, { "cell_type": "code", "execution_count": 25, "outputs": [], "source": [ "from linkml_store import Client\n", "\n", "client = Client()\n", "db = client.attach_database(\"duckdb:///tmp/eccode.db\", \"eccode\", recreate_if_exists=True)" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-05-09T21:34:38.587860Z", "start_time": "2024-05-09T21:34:38.585172Z" } }, "id": "6a8adce3d3ec93c6" }, { "cell_type": "markdown", "source": [ "We will create a collection for nodes. 
(we could make a separate collection for edges, but this is less relevant\n", "for this tutorial)" ], "metadata": { "collapsed": false }, "id": "1dcfaf6e8d4b3bf6" }, { "cell_type": "code", "execution_count": 26, "outputs": [], "source": [ "nodes_collection = db.create_collection(\"Node\", \"nodes\")" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-05-09T21:34:38.591738Z", "start_time": "2024-05-09T21:34:38.588796Z" } }, "id": "4fa95b75cd1f19cf" }, { "cell_type": "markdown", "source": [ "For demonstration purposes we'll only store the first 200 entries (it can be slow to index everything via the OpenAI API)" ], "metadata": { "collapsed": false }, "id": "ecc62533a80b1175" }, { "cell_type": "code", "execution_count": 27, "outputs": [], "source": [ "nodes_collection.insert(graph[\"nodes\"][0:200])" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-05-09T21:34:38.657942Z", "start_time": "2024-05-09T21:34:38.594941Z" } }, "id": "aee6a95b4a86432d" }, { "cell_type": "code", "execution_count": 28, "outputs": [ { "data": { "text/plain": "[{'id': 'http://purl.obolibrary.org/obo/RO_0002327',\n 'lbl': 'enables',\n 'type': 'PROPERTY',\n 'meta': None},\n {'id': 'http://purl.obolibrary.org/obo/RO_0002351',\n 'lbl': 'has member',\n 'type': 'PROPERTY',\n 'meta': None},\n {'id': 'http://purl.obolibrary.org/obo/eccode_1',\n 'lbl': 'Oxidoreductases',\n 'type': 'CLASS',\n 'meta': None},\n {'id': 'http://purl.obolibrary.org/obo/eccode_1.1',\n 'lbl': 'Acting on the CH-OH group of donors',\n 'type': 'CLASS',\n 'meta': None},\n {'id': 'http://purl.obolibrary.org/obo/eccode_1.1.1',\n 'lbl': 'With NAD(+) or NADP(+) as acceptor',\n 'type': 'CLASS',\n 'meta': None},\n {'id': 'http://purl.obolibrary.org/obo/eccode_1.1.1.1',\n 'lbl': 'alcohol dehydrogenase',\n 'type': 'CLASS',\n 'meta': ['synonyms']}]" }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nodes_collection.find(limit=6).rows" ], "metadata": { 
"collapsed": false, "ExecuteTime": { "end_time": "2024-05-09T21:34:38.677242Z", "start_time": "2024-05-09T21:34:38.658654Z" } }, "id": "95c06a10ecfcfb67" }, { "cell_type": "markdown", "source": [ "## Creating an LLMIndexer\n", "\n", "We will create an indexer, and configure it to cache calls. This means that the 2nd time we run this notebook\n", "it will be much faster, since all the embeddings will be cached.\n", "\n", "The indexer will index using the `lbl` field. In OBO Graphs JSON, this is the name/label of the concept." ], "metadata": { "collapsed": false }, "id": "80d64d4e6570d69c" }, { "cell_type": "code", "execution_count": 29, "outputs": [], "source": [ "from linkml_store.index.implementations.llm_indexer import LLMIndexer\n", "\n", "index = LLMIndexer(name=\"test\", cached_embeddings_database=\"tmp/llm_cache.db\", index_attributes=[\"lbl\"])" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-05-09T21:34:38.681471Z", "start_time": "2024-05-09T21:34:38.675793Z" } }, "id": "8e75156bfdafe7b" }, { "cell_type": "code", "execution_count": 30, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/cjm/Library/Caches/pypoetry/virtualenvs/linkml-store-8ZYO4kTy-py3.10/lib/python3.10/site-packages/duckdb_engine/__init__.py:580: SAWarning: Did not recognize type 'list' of column 'embedding'\n", " columns = self._get_columns_info(rows, domains, enums, schema) # type: ignore[attr-defined]\n", "/Users/cjm/Library/Caches/pypoetry/virtualenvs/linkml-store-8ZYO4kTy-py3.10/lib/python3.10/site-packages/duckdb_engine/__init__.py:173: DuckDBEngineWarning: duckdb-engine doesn't yet support reflection on indices\n", " warnings.warn(\n" ] } ], "source": [ "nodes_collection.attach_indexer(index)" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-05-09T21:36:41.479112Z", "start_time": "2024-05-09T21:34:38.679704Z" } }, "id": "993c0360941dbb3b" }, { "cell_type": "markdown", "source": [ "## Searching using the index\n", 
"\n", "Now that we have attached an index, we can use it for semantic search. We'll search our EC subset nodes collection for the string `sugar transporters`. Note that this exact string doesn't occur in the indexed labels, but we can still rank entries by closeness in semantic space.\n", "\n", "When using `search`, the field `ranked_rows` is populated in the result object. This is a list of `(score, object)` tuples, which we will inspect by converting it into a pandas DataFrame:" ], "metadata": { "collapsed": false }, "id": "47c1cc117f379d63" }, { "cell_type": "code", "execution_count": 31, "outputs": [], "source": [ "qr = nodes_collection.search(\"sugar transporters\")" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-05-09T21:36:41.829516Z", "start_time": "2024-05-09T21:36:41.479725Z" } }, "id": "e3abb5d529063c6e" }, { "cell_type": "code", "execution_count": 32, "outputs": [], "source": [ "results = [{\"sim\": r[0], \"id\": r[1][\"id\"], \"name\": r[1][\"lbl\"]} for r in qr.ranked_rows]" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-05-09T21:36:41.834697Z", "start_time": "2024-05-09T21:36:41.830216Z" } }, "id": "cc3bce09e81d66a" }, { "cell_type": "code", "execution_count": 33, "outputs": [], "source": [ "import pandas as pd\n", "df = pd.DataFrame(results)" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-05-09T21:36:41.839351Z", "start_time": "2024-05-09T21:36:41.834571Z" } }, "id": "5b9d6d63561b72db" }, { "cell_type": "code", "execution_count": 34, "outputs": [ { "data": { "text/plain": " sim id \\\n0 0.811244 http://purl.obolibrary.org/obo/eccode_1.1.1.22 \n1 0.811244 http://purl.obolibrary.org/obo/eccode_1.1.1.22 \n2 0.809951 http://purl.obolibrary.org/obo/eccode_1.1.1.124 \n3 0.809951 http://purl.obolibrary.org/obo/eccode_1.1.1.124 \n4 0.808296 http://purl.obolibrary.org/obo/eccode_1.1.1.10 \n.. ... ... 
\n395 0.738170 http://purl.obolibrary.org/obo/RO_0002351 \n396 0.730031 http://purl.obolibrary.org/obo/eccode_1.1.1.104 \n397 0.730031 http://purl.obolibrary.org/obo/eccode_1.1.1.104 \n398 0.722299 http://purl.obolibrary.org/obo/eccode_1.1.1.223 \n399 0.722299 http://purl.obolibrary.org/obo/eccode_1.1.1.223 \n\n name \n0 UDP-glucose 6-dehydrogenase \n1 UDP-glucose 6-dehydrogenase \n2 fructose 5-dehydrogenase (NADP(+)) \n3 fructose 5-dehydrogenase (NADP(+)) \n4 L-xylulose reductase \n.. ... \n395 has member \n396 4-oxoproline reductase \n397 4-oxoproline reductase \n398 isopiperitenol dehydrogenase \n399 isopiperitenol dehydrogenase \n\n[400 rows x 3 columns]", "text/html": "
" }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-05-09T21:36:41.844362Z", "start_time": "2024-05-09T21:36:41.840551Z" } }, "id": "dc81533a0b074360" }, { "cell_type": "markdown", "source": [ "Even though our dataset had no actual sugar transporters, there are still ranked results, with the top results scoring highly because they concern sugars (even though they are not transporters).\n", "\n", "Note that if we had indexed all of EC, we would see sugar transporters." ], "metadata": { "collapsed": false }, "id": "f0b5eadd3dd4885a" }, { "cell_type": "code", "execution_count": 41, "outputs": [ { "data": { "text/plain": "[(0.8111048198475599,\n {'id': 'http://purl.obolibrary.org/obo/eccode_1.1.1.22',\n 'lbl': 'UDP-glucose 6-dehydrogenase',\n 'type': 'CLASS',\n 'meta': None}),\n (0.8111048198475599,\n {'id': 'http://purl.obolibrary.org/obo/eccode_1.1.1.22',\n 'lbl': 'UDP-glucose 6-dehydrogenase',\n 'type': 'CLASS',\n 'meta': None}),\n (0.8098110004639347,\n {'id': 'http://purl.obolibrary.org/obo/eccode_1.1.1.124',\n 'lbl': 'fructose 5-dehydrogenase (NADP(+))',\n 'type': 'CLASS',\n 'meta': ['synonyms']})]" }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "qr = nodes_collection.search(\"sugar transporters\", where={\"type\": \"CLASS\"})\n", "qr.ranked_rows[0:3]" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-05-09T21:46:24.525183Z", "start_time": "2024-05-09T21:46:24.126159Z" } }, "id": "c4904a4d788a4241" }, { "cell_type": "markdown", "source": [ "## How it works\n", "\n", "Let's peek under the hood to see how this is all implemented in DuckDB.\n", "\n", "To do this, we'll connect to the DuckDB instance directly using the `sql` extension in Jupyter." ], "metadata": { "collapsed": false }, "id": "69357d61c2eea997" }, { "cell_type": "markdown", "source": [ "Load extension:" ], 
"metadata": { "collapsed": false }, "id": "8d140543ebaed624" }, { "cell_type": "code", "execution_count": 36, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The sql extension is already loaded. To reload it, use:\n", " %reload_ext sql\n" ] } ], "source": [ "%load_ext sql" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-05-09T21:36:42.209182Z", "start_time": "2024-05-09T21:36:42.205831Z" } }, "id": "c1df3a0598731367" }, { "cell_type": "code", "execution_count": 37, "outputs": [], "source": [ "%config SqlMagic.autopandas = True\n", "%config SqlMagic.feedback = False\n", "%config SqlMagic.displaycon = False" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-05-09T21:36:42.214533Z", "start_time": "2024-05-09T21:36:42.211784Z" } }, "id": "21d66e04be33e641" }, { "cell_type": "markdown", "source": [ "Connect to the duckdb database\n", "\n", "__NOTE__ in general you don't need to do this - we are just doing this here to show the internals." ], "metadata": { "collapsed": false }, "id": "26e3dcd952672b18" }, { "cell_type": "code", "execution_count": 38, "outputs": [], "source": [ "%sql duckdb:///tmp/eccode.db" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-05-09T21:36:42.304962Z", "start_time": "2024-05-09T21:36:42.221503Z" } }, "id": "7a747b441ebfe095" }, { "cell_type": "markdown", "source": [ "Query the `nodes` table (no index)" ], "metadata": { "collapsed": false }, "id": "884ec42c3c80e6e0" }, { "cell_type": "code", "execution_count": 39, "outputs": [ { "data": { "text/plain": " id \\\n0 http://purl.obolibrary.org/obo/RO_0002327 \n1 http://purl.obolibrary.org/obo/RO_0002351 \n2 http://purl.obolibrary.org/obo/eccode_1 \n3 http://purl.obolibrary.org/obo/eccode_1.1 \n4 http://purl.obolibrary.org/obo/eccode_1.1.1 \n.. ... 
\n195 http://purl.obolibrary.org/obo/eccode_1.1.1.284 \n196 http://purl.obolibrary.org/obo/eccode_1.1.1.285 \n197 http://purl.obolibrary.org/obo/eccode_1.1.1.286 \n198 http://purl.obolibrary.org/obo/eccode_1.1.1.287 \n199 http://purl.obolibrary.org/obo/eccode_1.1.1.288 \n\n lbl type meta \n0 enables PROPERTY NaN \n1 has member PROPERTY NaN \n2 Oxidoreductases CLASS NaN \n3 Acting on the CH-OH group of donors CLASS NaN \n4 With NAD(+) or NADP(+) as acceptor CLASS NaN \n.. ... ... ... \n195 S-(hydroxymethyl)glutathione dehydrogenase CLASS [synonyms] \n196 3''-deamino-3''-oxonicotianamine reductase CLASS NaN \n197 isocitrate--homoisocitrate dehydrogenase CLASS [synonyms] \n198 D-arabinitol dehydrogenase (NADP(+)) CLASS [synonyms] \n199 xanthoxin dehydrogenase CLASS [synonyms] \n\n[200 rows x 4 columns]", "text/html": "
" }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "SELECT * FROM nodes;" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-05-09T21:36:42.489264Z", "start_time": "2024-05-09T21:36:42.306059Z" } }, "id": "262c07565b1cd356" }, { "cell_type": "markdown", "source": [ "Query the index. Behind the scenes, linkml-store will create a table to cache each index for each collection. These currently start with `internal__index__` and are followed by the type of the objects, followed by the name of the index." ], "metadata": { "collapsed": false }, "id": "4aacb2992bc49785" }, { "cell_type": "code", "execution_count": 40, "outputs": [ { "data": { "text/plain": " id \\\n0 http://purl.obolibrary.org/obo/RO_0002327 \n1 http://purl.obolibrary.org/obo/RO_0002351 \n2 http://purl.obolibrary.org/obo/eccode_1 \n3 http://purl.obolibrary.org/obo/eccode_1.1 \n4 http://purl.obolibrary.org/obo/eccode_1.1.1 \n.. ... \n195 http://purl.obolibrary.org/obo/eccode_1.1.1.284 \n196 http://purl.obolibrary.org/obo/eccode_1.1.1.285 \n197 http://purl.obolibrary.org/obo/eccode_1.1.1.286 \n198 http://purl.obolibrary.org/obo/eccode_1.1.1.287 \n199 http://purl.obolibrary.org/obo/eccode_1.1.1.288 \n\n lbl type meta \\\n0 enables PROPERTY NaN \n1 has member PROPERTY NaN \n2 Oxidoreductases CLASS NaN \n3 Acting on the CH-OH group of donors CLASS NaN \n4 With NAD(+) or NADP(+) as acceptor CLASS NaN \n.. ... ... ... \n195 S-(hydroxymethyl)glutathione dehydrogenase CLASS [synonyms] \n196 3''-deamino-3''-oxonicotianamine reductase CLASS NaN \n197 isocitrate--homoisocitrate dehydrogenase CLASS [synonyms] \n198 D-arabinitol dehydrogenase (NADP(+)) CLASS [synonyms] \n199 xanthoxin dehydrogenase CLASS [synonyms] \n\n __index__ \n0 [-0.021716245, -0.024930306, -0.015913868, -0.... \n1 [-0.03492431, -0.015462456, 0.002913293, -0.02... \n2 [-0.031664208, -0.026391044, 8.377296e-05, -0.... \n3 [-0.023240522, -0.019391688, -0.006624823, -0.... 
\n4 [0.00993415, -0.039508518, 0.023213472, -0.016... \n.. ... \n195 [-0.020920139, -0.0042932644, 0.0039249077, -0... \n196 [-0.02360331, -0.025488937, -0.010397324, -0.0... \n197 [-0.019758105, -0.016041763, 0.017833093, -0.0... \n198 [-0.012204822, -0.034410186, 0.007919805, -0.0... \n199 [-0.012760352, -0.016459865, 0.011843718, -0.0... \n\n[200 rows x 5 columns]", "text/html": "
" }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "SELECT * FROM internal__index__Node__test;" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-05-09T21:36:42.638425Z", "start_time": "2024-05-09T21:36:42.486738Z" } }, "id": "c9424eca0ae2ebbd" }, { "cell_type": "markdown", "source": [ "We can see that the index duplicates the content of the main table, and adds an additional vector column with the embedding." ], "metadata": { "collapsed": false }, "id": "cccee7d6d1f46b9e" }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [], "metadata": { "collapsed": false }, "id": "2a53092d82802b01" } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 5 }