{ "cells": [ { "cell_type": "markdown", "source": [ "# How to index GO-CAMs with LinkML-Store\n", "\n", "\n", "\n" ], "metadata": { "collapsed": false }, "id": "fc4794dd116ed21" }, { "cell_type": "code", "execution_count": 1, "outputs": [], "source": [ "import pandas as pd\n", "import yaml\n", "\n", "path = \"input/gocam-models.yaml\"" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T14:58:13.287137Z", "start_time": "2024-07-01T14:58:12.914420Z" } }, "id": "158d589d95a155e5" }, { "cell_type": "code", "execution_count": 2, "outputs": [], "source": [ "models = list(yaml.safe_load_all(open(path)))" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T14:58:24.625955Z", "start_time": "2024-07-01T14:58:13.264364Z" } }, "id": "ae009995ea5e535a" }, { "cell_type": "markdown", "source": [ "## Creating a client and attaching to a database\n", "\n", "First we will create a client as normal:" ], "metadata": { "collapsed": false }, "id": "493c7599d2f40c27" }, { "cell_type": "code", "execution_count": 3, "outputs": [], "source": [ "from linkml_store import Client\n", "\n", "client = Client()" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T14:58:25.741214Z", "start_time": "2024-07-01T14:58:24.626407Z" } }, "id": "initial_id" }, { "cell_type": "markdown", "source": [], "metadata": { "collapsed": false }, "id": "9687f29e4d54f5fa" }, { "cell_type": "markdown", "source": [ "Next we'll attach to a MongoDB instance. 
This assumes you have one running already.\n", "\n", "We will make a database called \"gocams\" and recreate it if it already exists.\n", "\n", "(Note for people running this notebook locally: if you happen to have a database with this name in your current MongoDB instance, it will be deleted!)" ], "metadata": { "collapsed": false }, "id": "470f1cb70bf3641b" }, { "cell_type": "code", "execution_count": 4, "outputs": [], "source": [ "db = client.attach_database(\"mongodb://localhost:27017/gocams\", \"gocams\", recreate_if_exists=True)" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T14:58:25.746459Z", "start_time": "2024-07-01T14:58:25.741102Z" } }, "id": "cc164c0acbe4c39d" }, { "cell_type": "markdown", "source": [ "## Creating a collection\n", "\n", "We'll create a simple test collection. The concept of a collection in linkml-store maps directly to MongoDB collections." ], "metadata": { "collapsed": false }, "id": "334ea2ced79828f7" }, { "cell_type": "markdown", "source": [], "metadata": { "collapsed": false }, "id": "a0a98c5a5c9f0072" }, { "cell_type": "code", "execution_count": 5, "outputs": [], "source": [ "collection = db.create_collection(\"main\", recreate_if_exists=True)" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T14:58:25.774518Z", "start_time": "2024-07-01T14:58:25.745435Z" } }, "id": "c3a79013f9359a9" }, { "cell_type": "markdown", "source": [ "## Inserting objects into the store\n", "\n", "We'll use the standard `insert` method to insert the GO-CAMs into the collection. At this stage there is no explicit schema." 
], "metadata": { "collapsed": false }, "id": "207f35ee61edc14d" }, { "cell_type": "code", "execution_count": 6, "outputs": [], "source": [ "collection.insert(models)" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T14:58:26.308712Z", "start_time": "2024-07-01T14:58:25.794460Z" } }, "id": "4a09a78fe3c8dc33" }, { "cell_type": "markdown", "source": [ "## Check contents\n", "\n", "We can check the number of rows in the collection, to ensure everything was inserted correctly:" ], "metadata": { "collapsed": false }, "id": "47f933e901372da8" }, { "cell_type": "code", "execution_count": 7, "outputs": [ { "data": { "text/plain": "793" }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "collection.find({}, limit=1).num_rows" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T14:58:26.317654Z", "start_time": "2024-07-01T14:58:26.312615Z" } }, "id": "f505fdc8cc20196e" }, { "cell_type": "code", "execution_count": 8, "outputs": [], "source": [ "assert collection.find({}, limit=1).num_rows == len(models)" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T14:58:26.324094Z", "start_time": "2024-07-01T14:58:26.317394Z" } }, "id": "e6ae22c986b9ba5b" }, { "cell_type": "markdown", "source": [], "metadata": { "collapsed": false }, "id": "adc134486070cf0d" }, { "cell_type": "code", "execution_count": 9, "outputs": [ { "data": { "text/plain": " id \\\n0 gomodel:568b0f9600000284 \n1 gomodel:5b528b1100000489 \n2 gomodel:5b91dbd100002057 \n\n title taxon \\\n0 Antibacterial innate immune response in the in... NCBITaxon:6239 \n1 XBP-1 is a cell-nonautonomous regulator of str... NCBITaxon:6239 \n2 Antifungal innate immune response in the hypod... NCBITaxon:6239 \n\n status comments \\\n0 production [Automated change 2023-03-16: RO:0002212 repla... \n1 production [Automated change 2023-03-16: RO:0002213 repla... 
\n2 production NaN \n\n activities \\\n0 [{'id': 'gomodel:568b0f9600000284/57ec3a7e0000... \n1 [{'id': 'gomodel:5b528b1100000489/5b528b110000... \n2 [{'id': 'gomodel:5b91dbd100002057/5b91dbd10000... \n\n objects \n0 [{'id': 'WB:WBGene00006599', 'label': 'tpa-1 C... \n1 [{'id': 'WB:WBGene00006959', 'label': 'xbp-1 C... \n2 [{'id': 'WB:WBGene00010700', 'label': 'nipi-3 ... ", "text/html": "
" }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "qr = collection.find({\"taxon\": \"NCBITaxon:6239\"}, limit=3)\n", "qr.rows_dataframe" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T14:58:26.377528Z", "start_time": "2024-07-01T14:58:26.325173Z" } }, "id": "2cc4ee9738cec27d" }, { "cell_type": "markdown", "source": [ "We can also query on nested fields using path notation; here we'll look for models with an activity enabled by a specific gene (`WB:WBGene00006575`, tir-1):" ], "metadata": { "collapsed": false }, "id": "90e2e9793375431f" }, { "cell_type": "code", "execution_count": 10, "outputs": [ { "data": { "text/plain": " id \\\n0 gomodel:568b0f9600000284 \n1 gomodel:5b91dbd100002057 \n\n title taxon \\\n0 Antibacterial innate immune response in the in... NCBITaxon:6239 \n1 Antifungal innate immune response in the hypod... NCBITaxon:6239 \n\n status comments \\\n0 production [Automated change 2023-03-16: RO:0002212 repla... \n1 production NaN \n\n activities \\\n0 [{'id': 'gomodel:568b0f9600000284/57ec3a7e0000... \n1 [{'id': 'gomodel:5b91dbd100002057/5b91dbd10000... \n\n objects \n0 [{'id': 'WB:WBGene00006599', 'label': 'tpa-1 C... \n1 [{'id': 'WB:WBGene00010700', 'label': 'nipi-3 ... ", "text/html": "
" }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "qr = collection.find({\"activities.enabled_by\": \"WB:WBGene00006575\"}, limit=3)\n", "qr.rows_dataframe" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T14:58:26.385318Z", "start_time": "2024-07-01T14:58:26.356377Z" } }, "id": "e763fe6cd50022e2" }, { "cell_type": "markdown", "source": [ "The query returns the two models that have an activity enabled by `WB:WBGene00006575`." ], "metadata": { "collapsed": false }, "id": "4a266efbcb405673" }, { "cell_type": "markdown", "source": [ "## Query faceting\n", "\n", "We will now demonstrate faceted queries, allowing us to count the number of instances of different categorical values or categorical value combinations.\n", "\n", "First we'll facet on the `taxon` field:" ], "metadata": { "collapsed": false }, "id": "d4749758585df35c" }, { "cell_type": "code", "execution_count": 11, "outputs": [ { "data": { "text/plain": "{'taxon': [('NCBITaxon:9606', 541),\n ('NCBITaxon:10090', 185),\n ('NCBITaxon:4896', 15),\n ('NCBITaxon:7955', 14),\n ('NCBITaxon:7227', 13),\n ('NCBITaxon:559292', 6),\n ('NCBITaxon:9823', 4),\n ('NCBITaxon:6239', 4),\n ('NCBITaxon:5074', 1),\n ('NCBITaxon:1735992', 1),\n ('NCBITaxon:229533', 1),\n ('NCBITaxon:1403190', 1),\n ('NCBITaxon:8355', 1),\n ('NCBITaxon:425011', 1),\n ('NCBITaxon:28576', 1),\n ('NCBITaxon:602072', 1),\n ('NCBITaxon:8364', 1),\n ('NCBITaxon:227321', 1),\n ('NCBITaxon:99287', 1)]}" }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "collection.query_facets({}, facet_columns=[\"taxon\"])" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T14:58:26.464344Z", "start_time": "2024-07-01T14:58:26.386316Z" } }, "id": "9b7f01f14d36958b" }, { "cell_type": "code", "execution_count": 12, "outputs": [ { "data": { "text/plain": "{('taxon',\n 'status'): [({'taxon': 'NCBITaxon:9606', 
'status': 'production'},\n 541), ({'taxon': 'NCBITaxon:10090',\n 'status': 'production'}, 185), ({'taxon': 'NCBITaxon:4896', 'status': 'production'},\n 15), ({'taxon': 'NCBITaxon:7955', 'status': 'production'},\n 14), ({'taxon': 'NCBITaxon:7227',\n 'status': 'production'}, 13), ({'taxon': 'NCBITaxon:559292', 'status': 'production'},\n 6), ({'taxon': 'NCBITaxon:6239', 'status': 'production'},\n 4), ({'taxon': 'NCBITaxon:9823',\n 'status': 'production'}, 4), ({'taxon': 'NCBITaxon:227321', 'status': 'production'},\n 1), ({'taxon': 'NCBITaxon:8364', 'status': 'production'},\n 1), ({'taxon': 'NCBITaxon:1735992',\n 'status': 'production'}, 1), ({'taxon': 'NCBITaxon:8355', 'status': 'production'},\n 1), ({'taxon': 'NCBITaxon:602072', 'status': 'production'},\n 1), ({'taxon': 'NCBITaxon:229533',\n 'status': 'production'}, 1), ({'taxon': 'NCBITaxon:1403190', 'status': 'production'},\n 1), ({'taxon': 'NCBITaxon:425011', 'status': 'production'},\n 1), ({'taxon': 'NCBITaxon:5074',\n 'status': 'production'}, 1), ({'taxon': 'NCBITaxon:28576', 'status': 'production'},\n 1), ({'taxon': 'NCBITaxon:99287', 'status': 'production'}, 1)]}" }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "collection.query_facets({}, facet_columns=[(\"taxon\", \"status\")])" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T14:58:26.484073Z", "start_time": "2024-07-01T14:58:26.399497Z" } }, "id": "7c6a6718d9f96194" }, { "cell_type": "markdown", "source": [ "We can also facet on nested fields using path notation, e.g. `activities.molecular_function.term`. 
We'll restrict this to the top 20:" ], "metadata": { "collapsed": false }, "id": "ea6e13f82ec50e62" }, { "cell_type": "code", "execution_count": 13, "outputs": [ { "data": { "text/plain": "{'activities.molecular_function.term': [(['GO:0004930',\n 'GO:0019706',\n 'GO:0004674'],\n 2),\n (['GO:0048018', 'GO:0038023', 'GO:0004252', 'GO:0004252'], 2),\n (['GO:0008083', 'GO:0016167', 'GO:0004714'], 2),\n (['GO:0004714', 'GO:0004713', 'GO:0048018'], 2),\n (['GO:0005179', 'GO:0016500', 'GO:0005125'], 2),\n (['GO:0061630', 'GO:0004879', 'GO:0030374'], 2),\n (['GO:0140311', 'GO:0003700', 'GO:0140311', 'GO:1990756'], 1),\n (['GO:0004674', 'GO:0061665', 'GO:1990931', 'GO:0070139', 'GO:0004674'], 1),\n (['GO:0060090',\n 'GO:0004674',\n 'GO:0060090',\n 'GO:0060090',\n 'GO:0061630',\n 'GO:0060090',\n 'GO:0061630'],\n 1),\n (['GO:0061630', 'GO:0170011', 'GO:0061630', 'GO:0061630'], 1),\n (['GO:0140463', 'GO:0140463', 'GO:0004674', 'GO:0004674', 'GO:0003887'], 1),\n (['GO:0038023', 'GO:0060090', 'GO:0060090', 'GO:0140311'], 1),\n (['GO:0003846', 'GO:0004144', 'GO:0005488'], 1),\n (['GO:0019706', 'GO:0140693', 'GO:0003690'], 1),\n (['GO:0004810', 'GO:0004521', 'GO:0004549', 'GO:1904678'], 1),\n (['GO:0003700', 'GO:0004715', 'GO:0005125', 'GO:0004900'], 1),\n (['GO:0004888', 'GO:0030674', 'GO:0004713', 'GO:0048018'], 1),\n (['GO:0003674',\n 'GO:0003674',\n 'GO:0000511',\n 'GO:0003674',\n 'GO:0140683',\n 'GO:0140463',\n 'GO:0003674',\n 'GO:0003674',\n 'GO:0030674',\n 'GO:0008428',\n 'GO:0003674',\n 'GO:0004525',\n 'GO:0003674',\n 'GO:0003968',\n 'GO:0032129',\n 'GO:0031078',\n 'GO:0000511',\n 'GO:0140566',\n 'GO:0031078',\n 'GO:0003724',\n 'GO:0016891',\n 'GO:0140750',\n 'GO:0046974',\n 'GO:0140683',\n 'GO:0042393',\n 'GO:0003674',\n 'GO:0005515',\n 'GO:0060090',\n 'GO:0140566'],\n 1),\n (['GO:0004714',\n 'GO:0005125',\n 'GO:0008201',\n 'GO:0005125',\n 'GO:0005125',\n 'GO:0004714'],\n 1),\n (['GO:0061891', 'GO:0061891', 'GO:0005262', 'GO:0030674'], 1)]}" }, "execution_count": 13, 
"metadata": {}, "output_type": "execute_result" } ], "source": [ "collection.query_facets({}, facet_columns=[\"activities.molecular_function.term\"], facet_limit=20)\n" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T14:58:26.541133Z", "start_time": "2024-07-01T14:58:26.406831Z" } }, "id": "27857349279abc41" }, { "cell_type": "markdown", "source": [ "## Indexing for semantic search\n", "\n", "We will index GO-CAMs using a template that extracts key elements.\n", "\n", "First we will create a textualization template for a GO-CAM. We will keep it minimal for simplicity - this doesn't include the activities, comments, etc." ], "metadata": { "collapsed": false }, "id": "648f05e75f250221" }, { "cell_type": "code", "execution_count": 14, "outputs": [], "source": [ "template = \"\"\"\n", "id: {{id}}\n", "title: {{title}}\n", "taxon: {{taxon}}\n", "status: {{status}}\n", "objects: {% for o in objects %} {{o.id}} \"{{o.label}}\"; {% endfor %}\n", "\"\"\"" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T14:58:26.542585Z", "start_time": "2024-07-01T14:58:26.486193Z" } }, "id": "976095541027ce9e" }, { "cell_type": "markdown", "source": [ "Next we will create an indexer using the template. This will use the Jinja2 syntax for templating.\n", "We will also cache LLM embedding queries, so if we want to incrementally add new GO-CAMs we can avoid re-running the LLM embeddings calls." 
], "metadata": { "collapsed": false }, "id": "76a71f8590bd5602" }, { "cell_type": "code", "execution_count": 15, "outputs": [], "source": [ "from linkml_store.index.implementations.llm_indexer import LLMIndexer\n", "\n", "index = LLMIndexer(\n", " name=\"gocam\", \n", " cached_embeddings_database=\"tmp/llm_gocam_cache.db\",\n", " text_template=template,\n", " text_template_syntax=\"jinja2\",\n", ")" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T14:58:26.542687Z", "start_time": "2024-07-01T14:58:26.489005Z" } }, "id": "e98f9d6eb4a5e385" }, { "cell_type": "markdown", "source": [ "We can test the template on the first row of the collection:" ], "metadata": { "collapsed": false }, "id": "e6c28d4d95b920ba" }, { "cell_type": "code", "execution_count": 16, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "id: gomodel:568b0f9600000284\n", "title: Antibacterial innate immune response in the intestine via MAPK cascade (C. elegans)\n", "taxon: NCBITaxon:6239\n", "status: production\n", "objects: WB:WBGene00006599 \"tpa-1 Cele\"; GO:0004674 \"protein serine/threonine kinase activity\"; GO:0002225 \"positive regulation of antimicrobial peptide production\"; ECO:0000501 \"evidence used in automatic assertion\"; ECO:0000315 \"mutant phenotype evidence used in manual assertion\"; ECO:0000318 \"biological aspect of ancestor evidence used in manual assertion\"; WB:WBGene00006923 \"vhp-1 Cele\"; GO:0017017 \"MAP kinase tyrosine/serine/threonine phosphatase activity\"; GO:1900425 \"negative regulation of defense response to bacterium\"; ECO:0000314 \"direct assay evidence used in manual assertion\"; ECO:0000316 \"genetic interaction evidence used in manual assertion\"; WB:WBGene00002187 \"kgb-1 Cele\"; GO:0005515 \"protein binding\"; GO:1900181 \"negative regulation of protein localization to nucleus\"; ECO:0000353 \"physical interaction evidence used in manual assertion\"; ECO:0000250 \"sequence similarity evidence used in 
manual assertion\"; ECO:0000307 \"no evidence data found used in manual assertion\"; GO:0005737 \"cytoplasm\"; GO:0005829 \"cytosol\"; GO:0009898 \"cytoplasmic side of plasma membrane\"; WB:WBGene00000223 \"atf-7 Cele\"; WB:WBGene00004055 \"pmk-1 Cele\"; WB:WBGene00004758 \"sek-1 Cele\"; GO:0004707 \"MAP kinase activity\"; GO:0000981 \"DNA-binding transcription factor activity, RNA polymerase II-specific\"; GO:0005634 \"nucleus\"; GO:0016045 \"detection of bacterium\"; GO:0003674 \"molecular_function\"; GO:0035591 \"signaling adaptor activity\"; WB:WBGene00006575 \"tir-1 Cele\"; GO:0004709 \"MAP kinase kinase kinase activity\"; WB:WBGene00003822 \"nsy-1 Cele\"; GO:0004708 \"MAP kinase kinase activity\"; WB:WBGene00012019 \"dkf-2 Cele\"; GO:0140367 \"antibacterial innate immune response\"; WB:WBGene00011979 \"sysm-1 Cele\"; ECO:0000270 \"expression pattern evidence used in manual assertion\"; GO:0110165 \"cellular anatomical entity\"; \n" ] } ], "source": [ "print(index.object_to_text(qr.rows[0]))" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T14:58:26.543012Z", "start_time": "2024-07-01T14:58:26.493491Z" } }, "id": "16dce837e31c88f6" }, { "cell_type": "markdown", "source": [ "That looks as expected. 
We can now attach the indexer to the collection and index the collection:" ], "metadata": { "collapsed": false }, "id": "4fbd1fc091c4c7b" }, { "cell_type": "code", "execution_count": 17, "outputs": [], "source": [ "collection.attach_indexer(index, auto_index=True)" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T14:58:51.118555Z", "start_time": "2024-07-01T14:58:26.496515Z" } }, "id": "18a0bd86de7f1d81" }, { "cell_type": "markdown", "source": [ "## Semantic Search\n", "\n", "Let's query based on text criteria:" ], "metadata": { "collapsed": false }, "id": "f49056b209918a9" }, { "cell_type": "code", "execution_count": 18, "outputs": [ { "data": { "text/plain": " score id \\\n0 0.821816 gomodel:64e7eefa00001233 \n1 0.814937 gomodel:62b4ffe300000240 \n2 0.810594 gomodel:62b4ffe300000335 \n3 0.809383 gomodel:663d668500001246 \n4 0.805333 gomodel:62b4ffe300001804 \n\n title taxon \\\n0 Extrinsic apoptotic signaling pathway via deat... NCBITaxon:10090 \n1 Perforin maturation leading to granzyme-mediat... NCBITaxon:10090 \n2 Perforin maturation leading to granzyme-mediat... NCBITaxon:9606 \n3 Pyroptotic cell death mediated by GSDMD and NI... NCBITaxon:9606 \n4 Cleavage and inactivation of PARP1 by CASP3 an... NCBITaxon:9606 \n\n status activities \\\n0 production [{'id': 'gomodel:64e7eefa00001233/64e7eefa0000... \n1 production [{'id': 'gomodel:62b4ffe300000240/62b4ffe30000... \n2 production [{'id': 'gomodel:62b4ffe300000335/62b4ffe30000... \n3 production [{'id': 'gomodel:663d668500001246/663d66850000... \n4 production [{'id': 'gomodel:62b4ffe300001804/62b4ffe30000... \n\n objects \\\n0 [{'id': 'GO:0035591', 'label': 'signaling adap... \n1 [{'id': 'GO:0005509', 'label': 'calcium ion bi... \n2 [{'id': 'GO:0140375', 'label': 'immune recepto... \n3 [{'id': 'GO:0004197', 'label': 'cysteine-type ... \n4 [{'id': 'GO:0097200', 'label': 'cysteine-type ... \n\n comments \n0 NaN \n1 [Automated change 2022-09-22: GO:0005887 repla... 
\n2 [Automated change 2022-09-22: GO:0005887 repla... \n3 NaN \n4 [Automated change 2023-03-16: RO:0002212 repla... ", "text/html": "
" }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "qr = collection.search(\"pathways involving cell death\")\n", "qr.rows_dataframe[0:5]" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T14:58:51.705608Z", "start_time": "2024-07-01T14:58:51.120092Z" } }, "id": "1ddd4ac75719342d" }, { "cell_type": "markdown", "source": [ "Let's check the first one" ], "metadata": { "collapsed": false }, "id": "b54c088d3d69f8a3" }, { "cell_type": "code", "execution_count": 19, "outputs": [ { "data": { "text/plain": "(0.8218155512474502,\n {'id': 'gomodel:64e7eefa00001233',\n 'title': 'Extrinsic apoptotic signaling pathway via death domain receptors 1(Mouse)',\n 'taxon': 'NCBITaxon:10090',\n 'status': 'production',\n 'activities': [{'id': 'gomodel:64e7eefa00001233/64e7eefa00001250',\n 'enabled_by': 'MGI:MGI:109200',\n 'molecular_function': {'evidence': [{'term': 'ECO:0000266',\n 'reference': 'PMID:8565075',\n 'with_objects': ['UniProtKB:Q15628'],\n 'provenances': [{'contributor': 'https://orcid.org/0000-0001-7476-6306',\n 'date': '2023-09-14'}]}],\n 'provenances': [],\n 'term': 'GO:0035591'},\n 'occurs_in': {'evidence': [], 'term': 'GO:0005829'},\n 'part_of': {'evidence': [{'term': 'ECO:0000266',\n 'reference': 'PMID:8565075',\n 'with_objects': ['UniProtKB:Q15628'],\n 'provenances': [{'contributor': 'https://orcid.org/0000-0001-7476-6306',\n 'date': '2023-09-14'}]}],\n 'term': 'GO:1900119'},\n 'causal_associations': [{'evidence': [],\n 'predicate': 'RO:0002413',\n 'downstream_activity': 'gomodel:64e7eefa00001233/64e7eefa00001506'},\n {'evidence': [{'term': 'ECO:0000314',\n 'reference': 'PMID:8565075',\n 'provenances': [{'contributor': 'https://orcid.org/0000-0001-7476-6306',\n 'date': '2023-09-14'}]}],\n 'predicate': 'RO:0002413',\n 'downstream_activity': 'gomodel:64e7eefa00001233/64e7eefa00001241'}]},\n {'id': 'gomodel:64e7eefa00001233/64e7eefa00001249',\n 'enabled_by': 'MGI:MGI:1261423',\n 'molecular_function': 
{'evidence': [{'term': 'ECO:0000314',\n 'reference': 'PMID:9837723',\n 'provenances': [{'contributor': 'https://orcid.org/0000-0001-7476-6306',\n 'date': '2023-09-14'}]}],\n 'provenances': [],\n 'term': 'GO:0008656'},\n 'occurs_in': {'evidence': [], 'term': 'GO:0005829'},\n 'part_of': {'evidence': [{'term': 'ECO:0000315',\n 'reference': 'PMID:9729047',\n 'with_objects': ['MGI:MGI:2159369'],\n 'provenances': [{'contributor': 'https://orcid.org/0000-0001-7476-6306',\n 'date': '2023-09-14'}]},\n {'term': 'ECO:0000314',\n 'reference': 'PMID:9837723',\n 'provenances': [{'contributor': 'https://orcid.org/0000-0001-7476-6306',\n 'date': '2023-09-14'}]}],\n 'term': 'GO:1900119'}},\n {'id': 'gomodel:64e7eefa00001233/64e7eefa00001505',\n 'enabled_by': 'MGI:MGI:104798',\n 'molecular_function': {'evidence': [],\n 'provenances': [],\n 'term': 'GO:0048018'},\n 'occurs_in': {'evidence': [{'term': 'ECO:0000314',\n 'reference': 'PMID:8409382',\n 'provenances': [{'contributor': 'https://orcid.org/0000-0001-7476-6306',\n 'date': '2023-09-13'}]}],\n 'term': 'GO:0005886'},\n 'part_of': {'evidence': [{'term': 'ECO:0000314',\n 'reference': 'PMID:8409382',\n 'provenances': [{'contributor': 'https://orcid.org/0000-0001-7476-6306',\n 'date': '2023-09-13'}]}],\n 'term': 'GO:1900119'},\n 'causal_associations': [{'evidence': [{'term': 'ECO:0000314',\n 'reference': 'PMID:1647956',\n 'provenances': [{'contributor': 'https://orcid.org/0000-0001-7476-6306',\n 'date': '2023-09-14'}]}],\n 'predicate': 'RO:0002629',\n 'downstream_activity': 'gomodel:64e7eefa00001233/64e7eefa00001503'}]},\n {'id': 'gomodel:64e7eefa00001233/64e7eefa00001241',\n 'enabled_by': 'MGI:MGI:109324',\n 'molecular_function': {'evidence': [{'term': 'ECO:0000314',\n 'reference': 'PMID:8565075',\n 'provenances': [{'contributor': 'https://orcid.org/0000-0001-7476-6306',\n 'date': '2023-09-14'}]}],\n 'provenances': [],\n 'term': 'GO:0035591'},\n 'occurs_in': {'evidence': [], 'term': 'GO:0005829'},\n 'part_of': {'evidence': [{'term': 
'ECO:0000314',\n 'reference': 'PMID:8565075',\n 'provenances': [{'contributor': 'https://orcid.org/0000-0001-7476-6306',\n 'date': '2023-09-14'}]},\n {'term': 'ECO:0000315',\n 'reference': 'PMID:12887920',\n 'provenances': [{'contributor': 'https://orcid.org/0000-0001-7476-6306',\n 'date': '2023-09-14'}]}],\n 'term': 'GO:1900119'},\n 'causal_associations': [{'evidence': [],\n 'predicate': 'RO:0002629',\n 'downstream_activity': 'gomodel:64e7eefa00001233/64e7eefa00001249'}]},\n {'id': 'gomodel:64e7eefa00001233/64e7eefa00001503',\n 'enabled_by': 'MGI:MGI:1314884',\n 'molecular_function': {'evidence': [{'term': 'ECO:0000314',\n 'reference': 'PMID:1647956',\n 'provenances': [{'contributor': 'https://orcid.org/0000-0001-7476-6306',\n 'date': '2023-09-14'}]}],\n 'provenances': [],\n 'term': 'GO:0005031'},\n 'occurs_in': {'evidence': [{'term': 'ECO:0000314',\n 'reference': 'PMID:1647956',\n 'provenances': [{'contributor': 'https://orcid.org/0000-0001-7476-6306',\n 'date': '2023-09-14'}]}],\n 'term': 'GO:0005886'},\n 'part_of': {'evidence': [{'term': 'ECO:0000316',\n 'reference': 'PMID:10702415',\n 'with_objects': ['MGI:MGI:103290'],\n 'provenances': [{'contributor': 'https://orcid.org/0000-0001-7476-6306',\n 'date': '2023-09-14'}]},\n {'term': 'ECO:0000315',\n 'reference': 'PMID:10678933',\n 'with_objects': ['MGI:MGI:1857468'],\n 'provenances': [{'contributor': 'https://orcid.org/0000-0001-7476-6306',\n 'date': '2023-09-14'}]}],\n 'term': 'GO:1900119'},\n 'causal_associations': [{'evidence': [{'term': 'ECO:0000266',\n 'reference': 'PMID:8565075',\n 'with_objects': ['UniProtKB:Q15628'],\n 'provenances': [{'contributor': 'https://orcid.org/0000-0001-7476-6306',\n 'date': '2023-09-14'}]}],\n 'predicate': 'RO:0002413',\n 'downstream_activity': 'gomodel:64e7eefa00001233/64e7eefa00001250'}]},\n {'id': 'gomodel:64e7eefa00001233/64e7eefa00001506',\n 'enabled_by': 'MGI:MGI:108212',\n 'molecular_function': {'evidence': [{'term': 'ECO:0000315',\n 'reference': 'PMID:25015821',\n 
'with_objects': ['MGI:MGI:6272324'],\n 'provenances': [{'contributor': 'https://orcid.org/0000-0001-7476-6306',\n 'date': '2023-09-14'}]}],\n 'provenances': [],\n 'term': 'GO:0004672'},\n 'occurs_in': {'evidence': [], 'term': 'GO:0005829'},\n 'part_of': {'evidence': [{'term': 'ECO:0000315',\n 'reference': 'PMID:26555174',\n 'with_objects': ['MGI:MGI:6272324'],\n 'provenances': [{'contributor': 'https://orcid.org/0000-0001-7476-6306',\n 'date': '2023-09-14'}]}],\n 'term': 'GO:1900119'},\n 'causal_associations': [{'evidence': [],\n 'predicate': 'RO:0002413',\n 'downstream_activity': 'gomodel:64e7eefa00001233/64e7eefa00001241'}]}],\n 'objects': [{'id': 'GO:0035591', 'label': 'signaling adaptor activity'},\n {'id': 'GO:0008656',\n 'label': 'cysteine-type endopeptidase activator activity involved in apoptotic process'},\n {'id': 'GO:0005829', 'label': 'cytosol'},\n {'id': 'GO:1900119',\n 'label': 'positive regulation of execution phase of apoptosis'},\n {'id': 'MGI:MGI:109324', 'label': 'Fadd Mmus'},\n {'id': 'MGI:MGI:1261423', 'label': 'Casp8 Mmus'},\n {'id': 'MGI:MGI:109200', 'label': 'Tradd Mmus'},\n {'id': 'MGI:MGI:1314884', 'label': 'Tnfrsf1a Mmus'},\n {'id': 'GO:0005031', 'label': 'tumor necrosis factor receptor activity'},\n {'id': 'MGI:MGI:104798', 'label': 'Tnf Mmus'},\n {'id': 'GO:0048018', 'label': 'receptor ligand activity'},\n {'id': 'GO:0004672', 'label': 'protein kinase activity'},\n {'id': 'MGI:MGI:108212', 'label': 'Ripk1 Mmus'},\n {'id': 'GO:0005886', 'label': 'plasma membrane'},\n {'id': 'ECO:0000314',\n 'label': 'direct assay evidence used in manual assertion'},\n {'id': 'ECO:0000316',\n 'label': 'genetic interaction evidence used in manual assertion'},\n {'id': 'ECO:0000315',\n 'label': 'mutant phenotype evidence used in manual assertion'},\n {'id': 'ECO:0000266',\n 'label': 'sequence orthology evidence used in manual assertion'},\n {'id': 'GO:0008625',\n 'label': 'extrinsic apoptotic signaling pathway via death domain receptors'},\n {'id': 
'GO:0097194', 'label': 'execution phase of apoptosis'},\n {'id': 'ECO:0000304',\n 'label': 'author statement supported by traceable reference used in manual assertion'}]})" }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "qr.ranked_rows[0]" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T14:58:51.712103Z", "start_time": "2024-07-01T14:58:51.709563Z" } }, "id": "5a4fd8fe217fdf6b" }, { "cell_type": "markdown", "source": [ "We can combine semantic search with queries:" ], "metadata": { "collapsed": false }, "id": "4f38cf9889a15086" }, { "cell_type": "code", "execution_count": 20, "outputs": [ { "data": { "text/plain": " score id \\\n0 0.829536 gomodel:64e7eefa00001233 \n1 0.822032 gomodel:62b4ffe300000240 \n2 0.805320 gomodel:645d887900001077 \n3 0.801173 gomodel:5ce58dde00001215 \n4 0.795273 gomodel:6516135700000211 \n\n title taxon \\\n0 Extrinsic apoptotic signaling pathway via deat... NCBITaxon:10090 \n1 Perforin maturation leading to granzyme-mediat... NCBITaxon:10090 \n2 Cell type specific, p53-independent mitotic G2... NCBITaxon:10090 \n3 Mouse-Aatf-antiapoptosis NCBITaxon:10090 \n4 Tumor necrosis factor-mediated signaling pathw... NCBITaxon:10090 \n\n status activities \\\n0 production [{'id': 'gomodel:64e7eefa00001233/64e7eefa0000... \n1 production [{'id': 'gomodel:62b4ffe300000240/62b4ffe30000... \n2 production [{'id': 'gomodel:645d887900001077/645d88790000... \n3 production [{'id': 'gomodel:5ce58dde00001215/5ce58dde0000... \n4 production [{'id': 'gomodel:6516135700000211/651613570000... \n\n objects \\\n0 [{'id': 'GO:0035591', 'label': 'signaling adap... \n1 [{'id': 'GO:0005509', 'label': 'calcium ion bi... \n2 [{'id': 'GO:0004674', 'label': 'protein serine... \n3 [{'id': 'MGI:MGI:87986', 'label': 'Akt1 Mmus'}... \n4 [{'id': 'GO:0005125', 'label': 'cytokine activ... \n\n comments \n0 NaN \n1 [Automated change 2022-09-22: GO:0005887 repla... 
\n2 NaN \n3 [Automated change 2023-03-16: RO:0002213 repla... \n4 NaN ", "text/html": "
" }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "qr = collection.search(\"cell death pathways\", where={\"taxon\": \"NCBITaxon:10090\"})\n", "qr.rows_dataframe[0:5]" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T14:58:52.142597Z", "start_time": "2024-07-01T14:58:51.712661Z" } }, "id": "8a218f8f7688a2d3" }, { "cell_type": "markdown", "source": [ "## Validation\n", "\n", "Next we will demonstrate validation over a whole collection.\n", "\n", "Currently validating depends on a LinkML schema - we have previously copied this schema into the test folder.\n", "We will load the schema into the database object:" ], "metadata": { "collapsed": false }, "id": "41a14e7976a923b3" }, { "cell_type": "code", "execution_count": 21, "outputs": [], "source": [ "db.load_schema_view(\"input/gocam-models-schema.yaml\")" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T14:58:52.209803Z", "start_time": "2024-07-01T14:58:52.144816Z" } }, "id": "5294ee7927a372f1" }, { "cell_type": "markdown", "source": [ "Quick sanity check to ensure that worked:" ], "metadata": { "collapsed": false }, "id": "292d662d92bdfdb4" }, { "cell_type": "code", "execution_count": 22, "outputs": [ { "data": { "text/plain": "['Model',\n 'Activity',\n 'EvidenceItem',\n 'Association',\n 'CausalAssociation',\n 'TermAssociation',\n 'MolecularFunctionAssociation',\n 'BiologicalProcessAssociation',\n 'CellularAnatomicalEntityAssociation',\n 'MoleculeAssociation']" }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(db.schema_view.all_classes())[0:10]" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T14:58:52.214483Z", "start_time": "2024-07-01T14:58:52.204727Z" } }, "id": "c211d3ce33b05fd5" }, { "cell_type": "code", "execution_count": 23, "outputs": [], "source": [ "collection.metadata.type = \"Model\"" ], "metadata": { "collapsed": false, 
"ExecuteTime": { "end_time": "2024-07-01T14:58:52.214803Z", "start_time": "2024-07-01T14:58:52.206848Z" } }, "id": "7109f8da1228fe6a" }, { "cell_type": "code", "execution_count": 24, "outputs": [], "source": [ "from linkml_runtime.dumpers import yaml_dumper\n", "for r in db.iter_validate_database():\n", " # known issue - https://github.com/monarch-initiative/GO-CAM-store/issues/97\n", " if \"is not of type 'integer'\" in r.message:\n", " continue\n", " print(r.message[0:100])\n", " print(r)\n", " raise ValueError(\"Unexpected validation error\")" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T14:58:54.417557Z", "start_time": "2024-07-01T14:58:52.210053Z" } }, "id": "bce050193361ecf2" }, { "cell_type": "markdown", "source": [ "## Command Line Usage\n", "\n", "We can also use the command line for all of the above operations.\n", "\n", "For example, feceted queries:" ], "metadata": { "collapsed": false }, "id": "8ff5109280b990e0" }, { "cell_type": "code", "execution_count": 26, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\r\n", " \"taxon\": {\r\n", " \"NCBITaxon:9606\": 541,\r\n", " \"NCBITaxon:10090\": 185,\r\n", " \"NCBITaxon:4896\": 15,\r\n", " \"NCBITaxon:7955\": 14,\r\n", " \"NCBITaxon:7227\": 13,\r\n", " \"NCBITaxon:559292\": 6,\r\n", " \"NCBITaxon:6239\": 4,\r\n", " \"NCBITaxon:9823\": 4,\r\n", " \"NCBITaxon:227321\": 1,\r\n", " \"NCBITaxon:8355\": 1,\r\n", " \"NCBITaxon:1403190\": 1,\r\n", " \"NCBITaxon:1735992\": 1,\r\n", " \"NCBITaxon:229533\": 1,\r\n", " \"NCBITaxon:5074\": 1,\r\n", " \"NCBITaxon:602072\": 1,\r\n", " \"NCBITaxon:425011\": 1,\r\n", " \"NCBITaxon:28576\": 1,\r\n", " \"NCBITaxon:99287\": 1,\r\n", " \"NCBITaxon:8364\": 1\r\n", " }\r\n", "}\r\n" ] } ], "source": [ "!linkml-store -d mongodb://localhost:27017/gocams -c main fq -S taxon" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T22:39:35.169204Z", "start_time": "2024-07-01T22:39:31.690901Z" } }, "id": 
"92208567bec477fb" }, { "cell_type": "code", "execution_count": 29, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "activities.enabled_by:\r\n", " MGI:MGI:109482: 91\r\n", " MGI:MGI:98973: 55\r\n", " UniProtKB:P42345: 53\r\n", " UniProtKB:Q8N884: 43\r\n", " UniProtKB:P57764: 37\r\n", " UniProtKB:Q9UHD2: 31\r\n", " MGI:MGI:109349: 30\r\n", " UniProtKB:Q9HB90: 25\r\n", " UniProtKB:Q13315: 23\r\n", " UniProtKB:Q13501: 23\r\n", " MGI:MGI:95294: 23\r\n", " UniProtKB:P29466: 22\r\n", " UniProtKB:Q7L523: 22\r\n", " UniProtKB:Q86WV6: 22\r\n", " UniProtKB:P09874: 21\r\n", " UniProtKB:P62877: 21\r\n", " UniProtKB:Q15382: 19\r\n", " UniProtKB:Q7Z434: 17\r\n", " UniProtKB:O43318: 17\r\n", " UniProtKB:Q9Y4K3: 16\r\n", " UniProtKB:P62753: 16\r\n", " MGI:MGI:2686159: 16\r\n", " UniProtKB:P23443: 16\r\n", " UniProtKB:Q9UBS0: 15\r\n", " UniProtKB:P23458: 15\r\n", " UniProtKB:Q96P20: 15\r\n", " UniProtKB:P49662: 14\r\n", " UniProtKB:Q13541: 14\r\n", " UniProtKB:P31749: 14\r\n", " UniProtKB:P49959: 14\r\n", " UniProtKB:Q04206: 14\r\n", " UniProtKB:Q14653: 14\r\n", " MGI:MGI:1916396: 14\r\n", " MGI:MGI:3647519: 13\r\n", " MGI:MGI:98907: 13\r\n", " UniProtKB:Q9C000: 13\r\n", " MGI:MGI:1916142: 13\r\n", " UniProtKB:Q92993: 13\r\n", " MGI:MGI:97365: 13\r\n", " UniProtKB:Q9UBF6: 12\r\n", " UniProtKB:O60934: 12\r\n", " MGI:MGI:95410: 12\r\n", " UniProtKB:P25963: 12\r\n", " UniProtKB:P40763: 12\r\n", " UniProtKB:Q8N6T7: 12\r\n", " MGI:MGI:95654: 12\r\n", " UniProtKB:O75385: 12\r\n", " UniProtKB:P68400: 11\r\n", " UniProtKB:O43914: 11\r\n", " UniProtKB:P43405: 10\r\n", " MGI:MGI:95797: 10\r\n", " MGI:MGI:1924809: 10\r\n", " UniProtKB:Q8N122: 10\r\n", " UniProtKB:Q93034: 10\r\n", " UniProtKB:P42229: 9\r\n", " UniProtKB:O60674: 9\r\n", " UniProtKB:O94874: 9\r\n", " UniProtKB:Q99836: 9\r\n", " UniProtKB:P42224: 9\r\n", " UniProtKB:Q13131: 9\r\n", " UniProtKB:O00206: 9\r\n", " UniProtKB:P55210: 8\r\n", " UniProtKB:P05198: 8\r\n", " UniProtKB:Q7NWF2: 8\r\n", " 
UniProtKB:Q9NZC2: 8\r\n", " UniProtKB:O14920: 8\r\n", " UniProtKB:Q6IAA8: 8\r\n", " MGI:MGI:107700: 8\r\n", " UniProtKB:Q14790: 8\r\n", " MGI:MGI:95805: 8\r\n", " UniProtKB:P06493: 8\r\n", " MGI:MGI:96433: 8\r\n", " UniProtKB:P42574: 8\r\n", " UniProtKB:P0DP23: 8\r\n", " UniProtKB:Q13188: 8\r\n", " MGI:MGI:1351352: 8\r\n", " UniProtKB:Q13535: 7\r\n", " UniProtKB:Q5T4S7: 7\r\n", " UniProtKB:Q13616: 7\r\n", " UniProtKB:Q9H4B6: 7\r\n", " MGI:MGI:87916: 7\r\n", " UniProtKB:P59044: 7\r\n", " UniProtKB:A1A4Y4: 7\r\n", " UniProtKB:P50542: 7\r\n", " UniProtKB:P19838: 7\r\n", " MGI:MGI:1354954: 7\r\n", " MGI:MGI:1096335: 7\r\n", " UniProtKB:O95786: 7\r\n", " MGI:MGI:98797: 7\r\n", " MGI:MGI:96544: 7\r\n", " UniProtKB:Q09472: 7\r\n", " UniProtKB:Q16531: 7\r\n", " PomBase:SPAC664.01c: 7\r\n", " SGD:S000002674: 7\r\n", " UniProtKB:Q13619: 7\r\n", " MGI:MGI:97565: 7\r\n", " MGI:MGI:103202: 7\r\n", " UniProtKB:P14222: 7\r\n", " MGI:MGI:97592: 7\r\n", " MGI:MGI:97564: 7\r\n", "taxon:\r\n", " NCBITaxon:9606: 541\r\n", " NCBITaxon:10090: 185\r\n", " NCBITaxon:4896: 15\r\n", " NCBITaxon:7955: 14\r\n", " NCBITaxon:7227: 13\r\n", " NCBITaxon:559292: 6\r\n", " NCBITaxon:6239: 4\r\n", " NCBITaxon:9823: 4\r\n", " NCBITaxon:28576: 1\r\n", " NCBITaxon:1403190: 1\r\n", " NCBITaxon:8355: 1\r\n", " NCBITaxon:5074: 1\r\n", " NCBITaxon:1735992: 1\r\n", " NCBITaxon:229533: 1\r\n", " NCBITaxon:99287: 1\r\n", " NCBITaxon:8364: 1\r\n", " NCBITaxon:227321: 1\r\n", " NCBITaxon:602072: 1\r\n", " NCBITaxon:425011: 1\r\n", "\r\n" ] } ], "source": [ "!linkml-store -d mongodb://localhost:27017/gocams -c main fq -S activities.enabled_by,taxon -O yaml\n" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T22:40:26.211698Z", "start_time": "2024-07-01T22:40:23.599705Z" } }, "id": "db26d37f9e60283d" }, { "cell_type": "code", "execution_count": 30, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\r\n", " \"taxon+activities.molecular_function.term\": {\r\n", " 
\"('NCBITaxon:9606', 'GO:0004674')\": 280,\r\n", " \"('NCBITaxon:9606', 'GO:0061630')\": 167,\r\n", " \"('NCBITaxon:9606', 'GO:0030674')\": 88,\r\n", " \"('NCBITaxon:9606', 'GO:0003700')\": 86,\r\n", " \"('NCBITaxon:10090', 'GO:0003674')\": 82,\r\n", " \"('NCBITaxon:9606', 'GO:0060090')\": 76,\r\n", " \"('NCBITaxon:9606', 'GO:0005125')\": 73,\r\n", " \"('NCBITaxon:9606', 'GO:0004197')\": 68,\r\n", " \"('NCBITaxon:9606', 'GO:0140311')\": 65,\r\n", " \"('NCBITaxon:9606', 'GO:0035591')\": 64,\r\n", " \"('NCBITaxon:9606', 'GO:1990756')\": 54,\r\n", " \"('NCBITaxon:9606', 'GO:0004713')\": 54,\r\n", " \"('NCBITaxon:9606', 'GO:0043539')\": 51,\r\n", " \"('NCBITaxon:10090', 'GO:0048018')\": 48,\r\n", " \"('NCBITaxon:4896', 'GO:0003674')\": 41,\r\n", " \"('NCBITaxon:9606', 'GO:0003674')\": 39,\r\n", " \"('NCBITaxon:7955', 'GO:0003674')\": 38,\r\n", " \"('NCBITaxon:9606', 'GO:0043495')\": 37,\r\n", " \"('NCBITaxon:9606', 'GO:0022829')\": 37,\r\n", " \"('NCBITaxon:9606', 'GO:0140693')\": 37,\r\n", " \"('NCBITaxon:9606', 'GO:0048018')\": 32,\r\n", " \"('NCBITaxon:9606', 'GO:0005515')\": 31,\r\n", " \"('NCBITaxon:10090', 'GO:0004713')\": 31,\r\n", " \"('NCBITaxon:9606', 'GO:0140463')\": 28,\r\n", " \"('NCBITaxon:9606', 'GO:0038187')\": 28,\r\n", " \"('NCBITaxon:9606', 'GO:0160072')\": 27,\r\n", " \"('NCBITaxon:10090', 'GO:0004855')\": 26,\r\n", " \"('NCBITaxon:9606', 'GO:0038023')\": 23,\r\n", " \"('NCBITaxon:9606', 'GO:0015026')\": 23,\r\n", " \"('NCBITaxon:9606', 'GO:0004888')\": 23,\r\n", " \"('NCBITaxon:9606', 'GO:0003735')\": 23,\r\n", " \"('NCBITaxon:9606', 'GO:0005546')\": 21,\r\n", " \"('NCBITaxon:9606', 'GO:0019706')\": 20,\r\n", " \"('NCBITaxon:9606', 'GO:0004843')\": 19,\r\n", " \"('NCBITaxon:9606', 'GO:0140767')\": 18,\r\n", " \"('NCBITaxon:9606', 'GO:0000981')\": 18,\r\n", " \"('NCBITaxon:10090', 'GO:0008253')\": 18,\r\n", " \"('NCBITaxon:10090', 'GO:0035591')\": 18,\r\n", " \"('NCBITaxon:9606', 'GO:0003924')\": 18,\r\n", " \"('NCBITaxon:9606', 'GO:0061733')\": 
17,\r\n", " \"('NCBITaxon:9606', 'GO:0003690')\": 17,\r\n", " \"('NCBITaxon:9606', 'GO:0005179')\": 17,\r\n", " \"('NCBITaxon:9606', 'GO:0030371')\": 17,\r\n", " \"('NCBITaxon:10090', 'GO:0003925')\": 17,\r\n", " \"('NCBITaxon:9606', 'GO:0061501')\": 17,\r\n", " \"('NCBITaxon:9606', 'GO:0005096')\": 17,\r\n", " \"('NCBITaxon:9606', 'GO:0140313')\": 15,\r\n", " \"('NCBITaxon:10090', 'GO:0004674')\": 15,\r\n", " \"('NCBITaxon:9606', 'GO:0004252')\": 14,\r\n", " \"('NCBITaxon:9606', 'GO:0003712')\": 14,\r\n", " \"('NCBITaxon:10090', 'GO:0004614')\": 14,\r\n", " \"('NCBITaxon:10090', 'GO:0051997')\": 13,\r\n", " \"('NCBITaxon:9606', 'GO:0003925')\": 13,\r\n", " \"('NCBITaxon:10090', 'GO:0004846')\": 13,\r\n", " \"('NCBITaxon:10090', 'GO:0008331')\": 13,\r\n", " \"('NCBITaxon:10090', 'GO:0033971')\": 13,\r\n", " \"('NCBITaxon:10090', 'GO:0004731')\": 13,\r\n", " \"('NCBITaxon:10090', 'GO:0004854')\": 13,\r\n", " \"('NCBITaxon:9606', 'GO:0019003')\": 12,\r\n", " \"('NCBITaxon:9606', 'GO:0008320')\": 12,\r\n", " \"('NCBITaxon:9606', 'GO:0001228')\": 12,\r\n", " \"('NCBITaxon:9606', 'GO:0061507')\": 12,\r\n", " \"('NCBITaxon:9606', 'GO:0003743')\": 12,\r\n", " \"('NCBITaxon:10090', 'GO:0005085')\": 11,\r\n", " \"('NCBITaxon:9606', 'GO:0004707')\": 11,\r\n", " \"('NCBITaxon:9606', 'GO:0034979')\": 11,\r\n", " \"('NCBITaxon:9606', 'GO:0008384')\": 11,\r\n", " \"('NCBITaxon:10090', 'GO:0001228')\": 11,\r\n", " \"('NCBITaxon:9606', 'GO:0005525')\": 11,\r\n", " \"('NCBITaxon:9606', 'GO:0004715')\": 11,\r\n", " \"('NCBITaxon:10090', 'GO:0030674')\": 11,\r\n", " \"('NCBITaxon:10090', 'GO:0003876')\": 10,\r\n", " \"('NCBITaxon:9606', 'GO:0061891')\": 10,\r\n", " \"('NCBITaxon:10090', 'GO:0004714')\": 10,\r\n", " \"('NCBITaxon:10090', 'GO:0004347')\": 10,\r\n", " \"('NCBITaxon:10090', 'GO:0003938')\": 10,\r\n", " \"('NCBITaxon:9606', 'GO:0140036')\": 10,\r\n", " \"('NCBITaxon:9606', 'GO:0004842')\": 10,\r\n", " \"('NCBITaxon:10090', 'GO:0008184')\": 10,\r\n", " \"('NCBITaxon:9606', 
'GO:0140608')\": 10,\r\n", " \"('NCBITaxon:9606', 'GO:0016301')\": 10,\r\n", " \"('NCBITaxon:10090', 'GO:0005068')\": 9,\r\n", " \"('NCBITaxon:9606', 'GO:0005085')\": 9,\r\n", " \"('NCBITaxon:9606', 'GO:0061666')\": 9,\r\n", " \"('NCBITaxon:10090', 'GO:0005245')\": 9,\r\n", " \"('NCBITaxon:9606', 'GO:0004693')\": 9,\r\n", " \"('NCBITaxon:9606', 'GO:0003723')\": 9,\r\n", " \"('NCBITaxon:9606', 'GO:0004679')\": 9,\r\n", " \"('NCBITaxon:4896', 'GO:0004674')\": 9,\r\n", " \"('NCBITaxon:4896', 'GO:0140463')\": 9,\r\n", " \"('NCBITaxon:10090', 'GO:0004708')\": 9,\r\n", " \"('NCBITaxon:10090', 'GO:0004707')\": 9,\r\n", " \"('NCBITaxon:9606', 'GO:0004222')\": 9,\r\n", " \"('NCBITaxon:9606', 'GO:0061631')\": 8,\r\n", " \"('NCBITaxon:10090', 'GO:0038024')\": 8,\r\n", " \"('NCBITaxon:9606', 'GO:0061608')\": 8,\r\n", " \"('NCBITaxon:9606', 'GO:0005516')\": 8,\r\n", " \"('NCBITaxon:559292', 'GO:0061630')\": 8,\r\n", " \"('NCBITaxon:4896', 'GO:0005515')\": 8,\r\n", " \"('NCBITaxon:9606', 'GO:0004879')\": 8\r\n", " }\r\n", "}\r\n" ] } ], "source": [ "!linkml-store -d mongodb://localhost:27017/gocams -c main fq -S taxon+activities.molecular_function.term" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T22:57:10.561390Z", "start_time": "2024-07-01T22:57:07.692755Z" } }, "id": "2426dbc79e68031c" }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [], "metadata": { "collapsed": false }, "id": "8855d7c935244ec1" } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 5 }