{ "cells": [ { "cell_type": "markdown", "source": [ "# How to index Phenopackets with LinkML-Store\n", "\n", "\n", "\n" ], "metadata": { "collapsed": false }, "id": "fc4794dd116ed21" }, { "cell_type": "markdown", "source": [ "## Use pystow to download phenopackets\n", "\n", "We will download from the Monarch Initiative [phenopacket-store](https://github.com/monarch-initiative/phenopacket-store)." ], "metadata": { "collapsed": false }, "id": "e19f50e1b2fc5d89" }, { "cell_type": "code", "execution_count": 1, "outputs": [], "source": [ "import pandas as pd\n", "import pystow\n", "import yaml\n", "\n", "path = pystow.ensure_untar(\"tmp\", \"phenopackets\", url=\"https://github.com/monarch-initiative/phenopacket-store/releases/latest/download/all_phenopackets.tgz\")" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T17:44:16.319576Z", "start_time": "2024-07-01T17:44:15.847793Z" } }, "id": "158d589d95a155e5" }, { "cell_type": "code", "execution_count": 2, "outputs": [ { "data": { "text/plain": "4876" }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Iterate over all *.json files in the phenopackets directory and parse each into an object.\n", "# We walk the path recursively using os.walk (we don't worry about loading into the store yet).\n", "import os\n", "import json\n", "objs = []\n", "for root, dirs, files in os.walk(path):\n", "    for file in files:\n", "        if file.endswith(\".json\"):\n", "            with open(os.path.join(root, file)) as stream:\n", "                obj = json.load(stream)\n", "                objs.append(obj)\n", "len(objs)" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T17:44:17.206084Z", "start_time": "2024-07-01T17:44:16.320521Z" } }, "id": "142993c7e60551d1" }, { "cell_type": "markdown", "source": [ "## Creating a client and attaching to a database\n", "\n", "First we will create a client as normal:" ], "metadata": { "collapsed": false }, "id": "493c7599d2f40c27" }, { "cell_type": "code", 
"execution_count": 3, "outputs": [], "source": [ "from linkml_store import Client\n", "\n", "client = Client()" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T17:44:18.414269Z", "start_time": "2024-07-01T17:44:17.206497Z" } }, "id": "initial_id" }, { "cell_type": "markdown", "source": [ "Next we'll attach to a MongoDB instance. This assumes you have one running already.\n", "\n", "We will make a database called \"phenopackets\" and recreate it if it already exists.\n", "\n", "(Note for people running this notebook locally: if you already have a database with this name in your current MongoDB instance, it will be deleted!)" ], "metadata": { "collapsed": false }, "id": "470f1cb70bf3641b" }, { "cell_type": "code", "execution_count": 4, "outputs": [], "source": [ "db = client.attach_database(\"mongodb://localhost:27017\", \"phenopackets\", recreate_if_exists=True)" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T17:44:18.417829Z", "start_time": "2024-07-01T17:44:18.414991Z" } }, "id": "cc164c0acbe4c39d" }, { "cell_type": "markdown", "source": [ "## Creating a collection\n", "\n", "We'll create a simple test collection. The concept of a collection in linkml-store maps directly to MongoDB collections." ], "metadata": { "collapsed": false }, "id": "334ea2ced79828f7" }, { "cell_type": "markdown", "source": [], "metadata": { "collapsed": false }, "id": "a0a98c5a5c9f0072" }, { "cell_type": "code", "execution_count": 5, "outputs": [], "source": [ "collection = db.create_collection(\"main\", recreate_if_exists=True)" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T17:44:18.558922Z", "start_time": "2024-07-01T17:44:18.418674Z" } }, "id": "c3a79013f9359a9" }, { "cell_type": "markdown", "source": [ "## Inserting objects into the store\n", "\n", "We'll use the standard `insert` method to insert the phenopackets into the collection. At this stage there is no explicit schema." 
], "metadata": { "collapsed": false }, "id": "207f35ee61edc14d" }, { "cell_type": "code", "execution_count": 6, "outputs": [], "source": [ "collection.insert(objs)" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T17:44:20.683533Z", "start_time": "2024-07-01T17:44:18.608715Z" } }, "id": "4a09a78fe3c8dc33" }, { "cell_type": "markdown", "source": [ "## Check contents\n", "\n", "We can check the number of rows in the collection, to ensure everything was inserted correctly:" ], "metadata": { "collapsed": false }, "id": "47f933e901372da8" }, { "cell_type": "code", "execution_count": 7, "outputs": [ { "data": { "text/plain": "4876" }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "collection.find({}, limit=1).num_rows" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T17:44:20.691539Z", "start_time": "2024-07-01T17:44:20.683847Z" } }, "id": "f505fdc8cc20196e" }, { "cell_type": "code", "execution_count": 8, "outputs": [], "source": [ "assert collection.find({}, limit=1).num_rows == len(objs)" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T17:44:20.701250Z", "start_time": "2024-07-01T17:44:20.690885Z" } }, "id": "e6ae22c986b9ba5b" }, { "cell_type": "markdown", "source": [], "metadata": { "collapsed": false }, "id": "adc134486070cf0d" }, { "cell_type": "markdown", "source": [ "Let's check with pandas just to make sure it looks as expected; we'll query for a specific OMIM disease:" ], "metadata": { "collapsed": false }, "id": "90e2e9793375431f" }, { "cell_type": "code", "execution_count": 9, "outputs": [ { "data": { "text/plain": " id \\\n0 PMID_28289718_Higgins-Patient-1 \n1 PMID_31173466_Suzuki-Patient-1 \n2 PMID_28289718_Higgins-Patient-2 \n\n subject \\\n0 {'id': 'Higgins-Patient-1', 'timeAtLastEncount... \n1 {'id': 'Suzuki-Patient-1', 'timeAtLastEncounte... \n2 {'id': 'Higgins-Patient-2', 'timeAtLastEncount... 
\n\n phenotypicFeatures \\\n0 [{'type': {'id': 'HP:0001714', 'label': 'Ventr... \n1 [{'type': {'id': 'HP:0001714', 'label': 'Ventr... \n2 [{'type': {'id': 'HP:0001714', 'label': 'Ventr... \n\n interpretations \\\n0 [{'id': 'Higgins-Patient-1', 'progressStatus':... \n1 [{'id': 'Suzuki-Patient-1', 'progressStatus': ... \n2 [{'id': 'Higgins-Patient-2', 'progressStatus':... \n\n diseases \\\n0 [{'term': {'id': 'OMIM:618499', 'label': 'Noon... \n1 [{'term': {'id': 'OMIM:618499', 'label': 'Noon... \n2 [{'term': {'id': 'OMIM:618499', 'label': 'Noon... \n\n metaData \n0 {'created': '2024-03-28T11:11:48.590163946Z', ... \n1 {'created': '2024-03-28T11:11:48.594725131Z', ... \n2 {'created': '2024-03-28T11:11:48.592718124Z', ... ", "text/html": "
" }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "qr = collection.find({\"diseases.term.id\": \"OMIM:618499\"}, limit=3)\n", "qr.rows_dataframe" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T17:44:20.739844Z", "start_time": "2024-07-01T17:44:20.704008Z" } }, "id": "e763fe6cd50022e2" }, { "cell_type": "markdown", "source": [ "As expected, there are three rows with the OMIM disease 618499." ], "metadata": { "collapsed": false }, "id": "4a266efbcb405673" }, { "cell_type": "markdown", "source": [ "## Query faceting\n", "\n", "We will now demonstrate faceted queries, allowing us to count the number of instances of different categorical values or categorical value combinations.\n", "\n", "First we'll facet on the subject sex. We can use path notation, e.g. `subject.sex` here:" ], "metadata": { "collapsed": false }, "id": "d4749758585df35c" }, { "cell_type": "code", "execution_count": 10, "outputs": [ { "data": { "text/plain": "{'subject.sex': [('MALE', 1807), ('FEMALE', 1564)]}" }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "collection.query_facets({}, facet_columns=[\"subject.sex\"])" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T17:44:20.776482Z", "start_time": "2024-07-01T17:44:20.738573Z" } }, "id": "9b7f01f14d36958b" }, { "cell_type": "markdown", "source": [ "We can also facet by the disease name/label. 
We'll restrict this to the top 20" ], "metadata": { "collapsed": false }, "id": "ea6e13f82ec50e62" }, { "cell_type": "code", "execution_count": 11, "outputs": [ { "data": { "text/plain": "{'diseases.term.label': [('Developmental and epileptic encephalopathy 4', 463),\n ('Developmental and epileptic encephalopathy 11', 342),\n ('KBG syndrome', 337),\n ('Leber congenital amaurosis 6', 191),\n ('Glass syndrome', 158),\n ('Holt-Oram syndrome', 103),\n ('Mitochondrial DNA depletion syndrome 13 (encephalomyopathic type)', 95),\n ('Neurodevelopmental disorder with coarse facies and mild distal skeletal abnormalities',\n 73),\n ('Jacobsen syndrome', 69),\n ('Kabuki Syndrome 1', 65),\n ('Coffin-Siris syndrome 8', 65),\n ('Houge-Janssen syndrome 2', 60),\n ('ZTTK SYNDROME', 52),\n ('Seizures, benign familial infantile, 3', 51),\n ('Greig cephalopolysyndactyly syndrome', 51),\n ('Marfan syndrome', 50),\n ('Mitochondrial DNA depletion syndrome 6 (hepatocerebral type)', 50),\n ('Loeys-Dietz syndrome 3', 49),\n ('Developmental delay, dysmorphic facies, and brain anomalies', 49),\n ('Hypomagnesemia 3, renal', 46)]}" }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "collection.query_facets({}, facet_columns=[\"diseases.term.label\"], facet_limit=20)\n" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T17:44:20.868905Z", "start_time": "2024-07-01T17:44:20.760393Z" } }, "id": "27857349279abc41" }, { "cell_type": "code", "execution_count": 12, "outputs": [ { "data": { "text/plain": "{'subject.timeAtLastEncounter.age.iso8601duration': [('P4Y', 131),\n ('P3Y', 114),\n ('P6Y', 100),\n ('P5Y', 97),\n ('P2Y', 95),\n ('P7Y', 85),\n ('P10Y', 82),\n ('P9Y', 77),\n ('P8Y', 71)]}" }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "collection.query_facets({}, facet_columns=[\"subject.timeAtLastEncounter.age.iso8601duration\"], facet_limit=10)\n" ], "metadata": { "collapsed": false, 
"ExecuteTime": { "end_time": "2024-07-01T17:44:20.870288Z", "start_time": "2024-07-01T17:44:20.805339Z" } }, "id": "86eea02b6c25c2cd" }, { "cell_type": "code", "execution_count": 13, "outputs": [ { "data": { "text/plain": "{'interpretations.diagnosis.genomicInterpretations.variantInterpretation.variationDescriptor.geneContext.symbol': [('STXBP1',\n 463),\n ('SCN2A', 393),\n ('ANKRD11', 337),\n ('RPGRIP1', 273),\n ('SATB2', 158),\n ('FBN1', 151),\n ('LMNA', 127),\n ('FBXL4', 117),\n ('TBX5', 103),\n ('SPTAN1', 85)]}" }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "collection.query_facets({}, facet_columns=[\"interpretations.diagnosis.genomicInterpretations.variantInterpretation.variationDescriptor.geneContext.symbol\"], facet_limit=10)\n" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T17:44:20.946852Z", "start_time": "2024-07-01T17:44:20.813411Z" } }, "id": "10f2c971ed09c386" }, { "cell_type": "markdown", "source": [ "We can also facet on combinations:" ], "metadata": { "collapsed": false }, "id": "ee540382322111a9" }, { "cell_type": "code", "execution_count": 14, "outputs": [ { "ename": "AttributeError", "evalue": "'tuple' object has no attribute 'split'", "output_type": "error", "traceback": [ "\u001B[0;31m---------------------------------------------------------------------------\u001B[0m", "\u001B[0;31mAttributeError\u001B[0m Traceback (most recent call last)", "Cell \u001B[0;32mIn[14], line 1\u001B[0m\n\u001B[0;32m----> 1\u001B[0m fqr \u001B[38;5;241m=\u001B[39m \u001B[43mcollection\u001B[49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43mquery_facets\u001B[49m\u001B[43m(\u001B[49m\u001B[43m{\u001B[49m\u001B[43m}\u001B[49m\u001B[43m,\u001B[49m\u001B[43m 
\u001B[49m\u001B[43mfacet_columns\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43m[\u001B[49m\u001B[43m(\u001B[49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[38;5;124;43msubject.sex\u001B[39;49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[38;5;124;43mdiseases.term.label\u001B[39;49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[43m)\u001B[49m\u001B[43m]\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43mfacet_limit\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[38;5;241;43m20\u001B[39;49m\u001B[43m)\u001B[49m\n\u001B[1;32m 2\u001B[0m fqr\n", "File \u001B[0;32m~/repos/linkml-store/src/linkml_store/api/stores/mongodb/mongodb_collection.py:80\u001B[0m, in \u001B[0;36mMongoDBCollection.query_facets\u001B[0;34m(self, where, facet_columns, facet_limit, **kwargs)\u001B[0m\n\u001B[1;32m 77\u001B[0m logger\u001B[38;5;241m.\u001B[39mdebug(\u001B[38;5;124mf\u001B[39m\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mFaceting on \u001B[39m\u001B[38;5;132;01m{\u001B[39;00mcol\u001B[38;5;132;01m}\u001B[39;00m\u001B[38;5;124m\"\u001B[39m)\n\u001B[1;32m 79\u001B[0m \u001B[38;5;66;03m# Split the column into parts to handle nested fields\u001B[39;00m\n\u001B[0;32m---> 80\u001B[0m col_parts \u001B[38;5;241m=\u001B[39m \u001B[43mcol\u001B[49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43msplit\u001B[49m(\u001B[38;5;124m'\u001B[39m\u001B[38;5;124m.\u001B[39m\u001B[38;5;124m'\u001B[39m)\n\u001B[1;32m 82\u001B[0m \u001B[38;5;66;03m# Initial pipeline without unwinding\u001B[39;00m\n\u001B[1;32m 83\u001B[0m facet_pipeline \u001B[38;5;241m=\u001B[39m [\n\u001B[1;32m 84\u001B[0m {\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124m$match\u001B[39m\u001B[38;5;124m\"\u001B[39m: where} \u001B[38;5;28;01mif\u001B[39;00m where \u001B[38;5;28;01melse\u001B[39;00m {\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124m$match\u001B[39m\u001B[38;5;124m\"\u001B[39m: {}},\n\u001B[1;32m 85\u001B[0m 
{\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124m$group\u001B[39m\u001B[38;5;124m\"\u001B[39m: {\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124m_id\u001B[39m\u001B[38;5;124m\"\u001B[39m: \u001B[38;5;124mf\u001B[39m\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124m$\u001B[39m\u001B[38;5;132;01m{\u001B[39;00mcol\u001B[38;5;132;01m}\u001B[39;00m\u001B[38;5;124m\"\u001B[39m, \u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mcount\u001B[39m\u001B[38;5;124m\"\u001B[39m: {\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124m$sum\u001B[39m\u001B[38;5;124m\"\u001B[39m: \u001B[38;5;241m1\u001B[39m}}},\n\u001B[1;32m 86\u001B[0m {\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124m$sort\u001B[39m\u001B[38;5;124m\"\u001B[39m: {\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mcount\u001B[39m\u001B[38;5;124m\"\u001B[39m: \u001B[38;5;241m-\u001B[39m\u001B[38;5;241m1\u001B[39m}},\n\u001B[1;32m 87\u001B[0m {\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124m$limit\u001B[39m\u001B[38;5;124m\"\u001B[39m: facet_limit},\n\u001B[1;32m 88\u001B[0m ]\n", "\u001B[0;31mAttributeError\u001B[0m: 'tuple' object has no attribute 'split'" ] } ], "source": [ "fqr = collection.query_facets({}, facet_columns=[(\"subject.sex\", \"diseases.term.label\")], facet_limit=20)\n", "fqr\n" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T17:44:21.213068Z", "start_time": "2024-07-01T17:44:20.905949Z" } }, "id": "5eca26a67254d3d2" }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "import pandas as pd\n", "def fqr_as_dfs(fqr: dict):\n", "    dfs = []\n", "    for k, vs in fqr.items():\n", "        rows = []\n", "        for obj, count in vs:\n", "            row = {}\n", "            for col, val in zip(k, obj.values()):\n", "                row[col] = val[0] if isinstance(val, list) else val\n", "            row[\"count\"] = count\n", "            rows.append(row)\n", "        df = pd.DataFrame(columns=list(k) + [\"count\"], data=rows)\n", "        dfs.append(df)\n", "    return dfs\n", "\n", "fqr_as_dfs(fqr)[0]" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": 
"2024-07-01T17:44:21.214735Z", "start_time": "2024-07-01T17:44:21.213707Z" } }, "id": "854f55b91f350de2" }, { "cell_type": "markdown", "source": [ "## Semantic Search\n", "\n", "We will index phenopackets using a template that extracts the subject, phenotypic features and diseases.\n", "\n", "First we will create a textualization template for a phenopacket. We will keep it minimal for simplicity - this doesn't include treatments, families, etc." ], "metadata": { "collapsed": false }, "id": "648f05e75f250221" }, { "cell_type": "code", "execution_count": 15, "outputs": [], "source": [ "template = \"\"\"\n", "subject: {{subject}}\n", "phenotypes: {% for p in phenotypicFeatures %}{{p.type.label}}{% endfor %}\n", "diseases: {% for d in diseases %}{{d.term.label}}{% endfor %}\n", "\"\"\"" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T17:47:49.443591Z", "start_time": "2024-07-01T17:47:49.438727Z" } }, "id": "976095541027ce9e" }, { "cell_type": "markdown", "source": [ "Next we will create an indexer using the template. This will use the Jinja2 syntax for templating.\n", "We will also cache LLM embedding queries, so if we want to incrementally add new phenopackets we can avoid re-running the LLM embeddings calls." 
], "metadata": { "collapsed": false }, "id": "76a71f8590bd5602" }, { "cell_type": "code", "execution_count": 16, "outputs": [], "source": [ "from linkml_store.index.implementations.llm_indexer import LLMIndexer\n", "\n", "index = LLMIndexer(\n", " name=\"ppkt\", \n", " cached_embeddings_database=\"tmp/llm_pheno_cache.db\",\n", " text_template=template,\n", " text_template_syntax=\"jinja2\",\n", ")" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T17:47:53.522339Z", "start_time": "2024-07-01T17:47:53.491380Z" } }, "id": "e98f9d6eb4a5e385" }, { "cell_type": "markdown", "source": [ "We can test the template on the first row of the collection:" ], "metadata": { "collapsed": false }, "id": "e6c28d4d95b920ba" }, { "cell_type": "code", "execution_count": 17, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "subject: {'id': 'Higgins-Patient-1', 'timeAtLastEncounter': {'age': {'iso8601duration': 'P17Y'}}, 'sex': 'FEMALE'}\n", "phenotypes: Ventricular hypertrophyHeart murmurHypertrophic cardiomyopathyShort statureHypertelorismLow-set earsPosteriorly rotated earsGlobal developmental delayCognitive impairmentCardiac arrest\n", "diseases: Noonan syndrome-11\n" ] } ], "source": [ "print(index.object_to_text(qr.rows[0]))" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T17:47:54.695990Z", "start_time": "2024-07-01T17:47:54.689484Z" } }, "id": "16dce837e31c88f6" }, { "cell_type": "markdown", "source": [ "That looks as expected. 
We can now attach the indexer to the collection and index the collection:" ], "metadata": { "collapsed": false }, "id": "4fbd1fc091c4c7b" }, { "cell_type": "code", "execution_count": 18, "outputs": [], "source": [ "collection.attach_indexer(index, auto_index=True)" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T17:51:20.344998Z", "start_time": "2024-07-01T17:47:55.274689Z" } }, "id": "18a0bd86de7f1d81" }, { "cell_type": "markdown", "source": [ "## Semantic Search\n", "\n", "Let's query based on text criteria:" ], "metadata": { "collapsed": false }, "id": "f49056b209918a9" }, { "cell_type": "code", "execution_count": 19, "outputs": [ { "data": { "text/plain": " score id \\\n0 0.824664 PMID_30658709_patient \n1 0.813827 PMID_36932076_Patient_1 \n2 0.804126 PMID_37303127_6 \n3 0.799738 PMID_36932076_Patient_3 \n4 0.799243 PMID_27536553_27536553_P3 \n\n subject \\\n0 {'id': 'patient', 'timeAtLastEncounter': {'age... \n1 {'id': 'Patient 1', 'timeAtLastEncounter': {'a... \n2 {'id': '6', 'timeAtLastEncounter': {'age': {'i... \n3 {'id': 'Patient 3', 'timeAtLastEncounter': {'a... \n4 {'id': '27536553_P3', 'timeAtLastEncounter': {... \n\n phenotypicFeatures \\\n0 [{'type': {'id': 'HP:0031956', 'label': 'Eleva... \n1 [{'type': {'id': 'HP:0000979', 'label': 'Purpu... \n2 [{'type': {'id': 'HP:0001397', 'label': 'Hepat... \n3 [{'type': {'id': 'HP:0001511', 'label': 'Intra... \n4 [{'type': {'id': 'HP:0001396', 'label': 'Chole... \n\n interpretations \\\n0 [{'id': 'patient', 'progressStatus': 'SOLVED',... \n1 [{'id': 'Patient 1', 'progressStatus': 'SOLVED... \n2 [{'id': '6', 'progressStatus': 'SOLVED', 'diag... \n3 [{'id': 'Patient 3', 'progressStatus': 'SOLVED... \n4 [{'id': '27536553_P3', 'progressStatus': 'SOLV... \n\n diseases \\\n0 [{'term': {'id': 'OMIM:615878', 'label': 'Chol... \n1 [{'term': {'id': 'OMIM:620376', 'label': 'Auto... \n2 [{'term': {'id': 'OMIM:151660', 'label': 'Lipo... \n3 [{'term': {'id': 'OMIM:620376', 'label': 'Auto... 
\n4 [{'term': {'id': 'OMIM:256810', 'label': 'Mito... \n\n metaData \n0 {'created': '2024-05-05T09:03:25.388371944Z', ... \n1 {'created': '2024-04-19T06:07:57.188061952Z', ... \n2 {'created': '2024-03-23T17:41:42.999521017Z', ... \n3 {'created': '2024-04-19T06:07:57.190312862Z', ... \n4 {'created': '2024-03-23T19:28:35.688389062Z', ... ", "text/html": "
" }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "qr = collection.search(\"patients with liver diseases\")\n", "qr.rows_dataframe[0:5]" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T17:51:21.913118Z", "start_time": "2024-07-01T17:51:20.344407Z" } }, "id": "1ddd4ac75719342d" }, { "cell_type": "markdown", "source": [ "Let's check the first one" ], "metadata": { "collapsed": false }, "id": "b54c088d3d69f8a3" }, { "cell_type": "code", "execution_count": 20, "outputs": [ { "data": { "text/plain": "(0.8246637496927007,\n {'id': 'PMID_30658709_patient',\n 'subject': {'id': 'patient',\n 'timeAtLastEncounter': {'age': {'iso8601duration': 'P1Y11M'}},\n 'sex': 'FEMALE'},\n 'phenotypicFeatures': [{'type': {'id': 'HP:0031956',\n 'label': 'Elevated circulating aspartate aminotransferase concentration'},\n 'onset': {'age': {'iso8601duration': 'P1Y11M'}}},\n {'type': {'id': 'HP:0031964',\n 'label': 'Elevated circulating alanine aminotransferase concentration'},\n 'onset': {'age': {'iso8601duration': 'P1Y11M'}}},\n {'type': {'id': 'HP:0003573', 'label': 'Increased total bilirubin'},\n 'onset': {'age': {'iso8601duration': 'P6M'}}},\n {'type': {'id': 'HP:0012202',\n 'label': 'Increased serum bile acid concentration'},\n 'onset': {'age': {'iso8601duration': 'P6M'}}},\n {'type': {'id': 'HP:0002908', 'label': 'Conjugated hyperbilirubinemia'},\n 'onset': {'age': {'iso8601duration': 'P6M'}}},\n {'type': {'id': 'HP:0001433', 'label': 'Hepatosplenomegaly'},\n 'onset': {'age': {'iso8601duration': 'P6M'}}},\n {'type': {'id': 'HP:0001510', 'label': 'Growth delay'},\n 'onset': {'age': {'iso8601duration': 'P6M'}}},\n {'type': {'id': 'HP:0000989', 'label': 'Pruritus'},\n 'onset': {'age': {'iso8601duration': 'P6M'}}},\n {'type': {'id': 'HP:0000952', 'label': 'Jaundice'},\n 'onset': {'age': {'iso8601duration': 'P6M'}}},\n {'type': {'id': 'HP:0100810', 'label': 'Pointed helix'},\n 'onset': {'age': {'iso8601duration': 
'P6M'}}},\n {'type': {'id': 'HP:0002650', 'label': 'Scoliosis'}},\n {'type': {'id': 'HP:0003112',\n 'label': 'Abnormal circulating amino acid concentration'},\n 'excluded': True},\n {'type': {'id': 'HP:0001928', 'label': 'Abnormality of coagulation'},\n 'excluded': True},\n {'type': {'id': 'HP:0010701', 'label': 'Abnormal immunoglobulin level'},\n 'excluded': True},\n {'type': {'id': 'HP:0001627', 'label': 'Abnormal heart morphology'},\n 'excluded': True}],\n 'interpretations': [{'id': 'patient',\n 'progressStatus': 'SOLVED',\n 'diagnosis': {'disease': {'id': 'OMIM:615878',\n 'label': 'Cholestasis, progressive familial intrahepatic 4'},\n 'genomicInterpretations': [{'subjectOrBiosampleId': 'patient',\n 'interpretationStatus': 'CAUSATIVE',\n 'variantInterpretation': {'variationDescriptor': {'id': 'var_kKNGnjOxGXMbcoWzDGEJKVPIB',\n 'geneContext': {'valueId': 'HGNC:11828', 'symbol': 'TJP2'},\n 'expressions': [{'syntax': 'hgvs.c',\n 'value': 'NM_004817.4:c.2355+1G>C'},\n {'syntax': 'hgvs.g', 'value': 'NC_000009.12:g.69238790G>C'}],\n 'vcfRecord': {'genomeAssembly': 'hg38',\n 'chrom': 'chr9',\n 'pos': '69238790',\n 'ref': 'G',\n 'alt': 'C'},\n 'moleculeContext': 'genomic',\n 'allelicState': {'id': 'GENO:0000136', 'label': 'homozygous'}}}}]}}],\n 'diseases': [{'term': {'id': 'OMIM:615878',\n 'label': 'Cholestasis, progressive familial intrahepatic 4'},\n 'onset': {'ontologyClass': {'id': 'HP:0003593',\n 'label': 'Infantile onset'}}}],\n 'metaData': {'created': '2024-05-05T09:03:25.388371944Z',\n 'createdBy': 'ORCID:0000-0002-0736-9199',\n 'resources': [{'id': 'geno',\n 'name': 'Genotype Ontology',\n 'url': 'http://purl.obolibrary.org/obo/geno.owl',\n 'version': '2022-03-05',\n 'namespacePrefix': 'GENO',\n 'iriPrefix': 'http://purl.obolibrary.org/obo/GENO_'},\n {'id': 'hgnc',\n 'name': 'HUGO Gene Nomenclature Committee',\n 'url': 'https://www.genenames.org',\n 'version': '06/01/23',\n 'namespacePrefix': 'HGNC',\n 'iriPrefix': 
'https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/'},\n {'id': 'omim',\n 'name': 'An Online Catalog of Human Genes and Genetic Disorders',\n 'url': 'https://www.omim.org',\n 'version': 'January 4, 2023',\n 'namespacePrefix': 'OMIM',\n 'iriPrefix': 'https://www.omim.org/entry/'},\n {'id': 'so',\n 'name': 'Sequence types and features ontology',\n 'url': 'http://purl.obolibrary.org/obo/so.obo',\n 'version': '2021-11-22',\n 'namespacePrefix': 'SO',\n 'iriPrefix': 'http://purl.obolibrary.org/obo/SO_'},\n {'id': 'hp',\n 'name': 'human phenotype ontology',\n 'url': 'http://purl.obolibrary.org/obo/hp.owl',\n 'version': '2024-04-26',\n 'namespacePrefix': 'HP',\n 'iriPrefix': 'http://purl.obolibrary.org/obo/HP_'}],\n 'phenopacketSchemaVersion': '2.0',\n 'externalReferences': [{'id': 'PMID:30658709',\n 'reference': 'https://pubmed.ncbi.nlm.nih.gov/30658709',\n 'description': 'Novel compound heterozygote mutations of TJP2 in a Chinese child with progressive cholestatic liver disease'}]}})" }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "qr.ranked_rows[0]" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T17:51:21.921326Z", "start_time": "2024-07-01T17:51:21.914718Z" } }, "id": "5a4fd8fe217fdf6b" }, { "cell_type": "markdown", "source": [ "We can combine semantic search with queries:" ], "metadata": { "collapsed": false }, "id": "4f38cf9889a15086" }, { "cell_type": "code", "execution_count": 21, "outputs": [ { "data": { "text/plain": " score id \\\n0 0.813827 PMID_36932076_Patient_1 \n1 0.799738 PMID_36932076_Patient_3 \n2 0.799243 PMID_27536553_27536553_P3 \n3 0.798670 PMID_29321044_Patient_3 \n4 0.798010 PMID_36517554_patient_1 \n\n subject \\\n0 {'id': 'Patient 1', 'timeAtLastEncounter': {'a... \n1 {'id': 'Patient 3', 'timeAtLastEncounter': {'a... \n2 {'id': '27536553_P3', 'timeAtLastEncounter': {... \n3 {'id': 'Patient 3', 'timeAtLastEncounter': {'a... 
\n4 {'id': 'patient 1', 'timeAtLastEncounter': {'a... \n\n phenotypicFeatures \\\n0 [{'type': {'id': 'HP:0000979', 'label': 'Purpu... \n1 [{'type': {'id': 'HP:0001511', 'label': 'Intra... \n2 [{'type': {'id': 'HP:0001396', 'label': 'Chole... \n3 [{'type': {'id': 'HP:0031956', 'label': 'Eleva... \n4 [{'type': {'id': 'HP:0002240', 'label': 'Hepat... \n\n interpretations \\\n0 [{'id': 'Patient 1', 'progressStatus': 'SOLVED... \n1 [{'id': 'Patient 3', 'progressStatus': 'SOLVED... \n2 [{'id': '27536553_P3', 'progressStatus': 'SOLV... \n3 [{'id': 'Patient 3', 'progressStatus': 'SOLVED... \n4 [{'id': 'patient 1', 'progressStatus': 'SOLVED... \n\n diseases \\\n0 [{'term': {'id': 'OMIM:620376', 'label': 'Auto... \n1 [{'term': {'id': 'OMIM:620376', 'label': 'Auto... \n2 [{'term': {'id': 'OMIM:256810', 'label': 'Mito... \n3 [{'term': {'id': 'OMIM:616829', 'label': 'Cong... \n4 [{'term': {'id': 'OMIM:620603', 'label': 'Immu... \n\n metaData \n0 {'created': '2024-04-19T06:07:57.188061952Z', ... \n1 {'created': '2024-04-19T06:07:57.190312862Z', ... \n2 {'created': '2024-03-23T19:28:35.688389062Z', ... \n3 {'created': '2024-05-11T06:05:50.632786035Z', ... \n4 {'created': '2024-03-29T11:25:36.649104833Z', ... ", "text/html": "
" }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "qr = collection.search(\"patients with liver diseases\", where={\"subject.sex\": \"MALE\"})\n", "qr.rows_dataframe[0:5]" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T17:51:22.449068Z", "start_time": "2024-07-01T17:51:21.919329Z" } }, "id": "8a218f8f7688a2d3" }, { "cell_type": "markdown", "source": [ "## Validation\n", "\n", "Next we will demonstrate validation over a whole collection.\n", "\n", "Currently validating depends on a LinkML schema - we have previously copied this schema into the test folder.\n", "We will load the schema into the database object:" ], "metadata": { "collapsed": false }, "id": "41a14e7976a923b3" }, { "cell_type": "code", "execution_count": 23, "outputs": [], "source": [ "db.load_schema_view(\"../../tests/input/schemas/phenopackets_linkml/phenopackets.yaml\")" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T20:51:01.772229Z", "start_time": "2024-07-01T20:51:00.965187Z" } }, "id": "5294ee7927a372f1" }, { "cell_type": "markdown", "source": [ "Quick sanity check to ensure that worked:" ], "metadata": { "collapsed": false }, "id": "292d662d92bdfdb4" }, { "cell_type": "code", "execution_count": 24, "outputs": [ { "data": { "text/plain": "['Age',\n 'AgeRange',\n 'Dictionary',\n 'Evidence',\n 'ExternalReference',\n 'File',\n 'GestationalAge',\n 'OntologyClass',\n 'Procedure',\n 'TimeElement']" }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(db.schema_view.all_classes())[0:10]" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T20:51:03.700448Z", "start_time": "2024-07-01T20:51:03.693511Z" } }, "id": "c211d3ce33b05fd5" }, { "cell_type": "code", "execution_count": 25, "outputs": [], "source": [ "collection.metadata.type = \"Phenopacket\"" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": 
"2024-07-01T20:51:05.816063Z", "start_time": "2024-07-01T20:51:05.811240Z" } }, "id": "7109f8da1228fe6a" }, { "cell_type": "code", "execution_count": 26, "outputs": [], "source": [ "from linkml_runtime.dumpers import yaml_dumper\n", "for r in db.iter_validate_database():\n", " # known issue - https://github.com/monarch-initiative/phenopacket-store/issues/97\n", " if \"is not of type 'integer'\" in r.message:\n", " continue\n", " print(r.message[0:100])\n", " print(r)\n", " raise ValueError(\"Unexpected validation error\")" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T20:51:16.321396Z", "start_time": "2024-07-01T20:51:07.100302Z" } }, "id": "bce050193361ecf2" }, { "cell_type": "markdown", "source": [ "## Command Line Usage\n", "\n", "We can also use the command line for all of the above operations.\n", "\n", "For example, feceted queries:" ], "metadata": { "collapsed": false }, "id": "8ff5109280b990e0" }, { "cell_type": "code", "execution_count": 27, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\r\n", " \"subject.sex\": {\r\n", " \"MALE\": 1807,\r\n", " \"FEMALE\": 1564\r\n", " }\r\n", "}\r\n" ] } ], "source": [ "!linkml-store -d mongodb://localhost:27017 -c main fq -S subject.sex" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T22:36:40.568130Z", "start_time": "2024-07-01T22:36:37.908083Z" } }, "id": "92208567bec477fb" }, { "cell_type": "code", "execution_count": 31, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "phenotypicFeatures.type.label:\r\n", " Global developmental delay: 1705\r\n", " Hypotonia: 1056\r\n", " Intellectual disability: 1028\r\n", " Seizure: 950\r\n", " Hypertelorism: 925\r\n", " Delayed speech and language development: 829\r\n", " Short stature: 806\r\n", " Microcephaly: 780\r\n", " Scoliosis: 702\r\n", " Feeding difficulties: 678\r\n", " Low-set ears: 598\r\n", " Autistic behavior: 519\r\n", " Motor delay: 518\r\n", " Downslanted 
palpebral fissures: 505\r\n", " Strabismus: 504\r\n", " Long philtrum: 500\r\n", " Ptosis: 498\r\n", " Patent foramen ovale: 469\r\n", " Anteverted nares: 461\r\n", " Hearing impairment: 451\r\n", " Epicanthus: 447\r\n", " Ventricular septal defect: 435\r\n", " Thick eyebrow: 433\r\n", " Cleft palate: 423\r\n", " Joint hypermobility: 388\r\n", " High palate: 383\r\n", " Triangular face: 369\r\n", " Micrognathia: 364\r\n", " Posteriorly rotated ears: 350\r\n", " Failure to thrive: 345\r\n", " Prominent forehead: 343\r\n", " Thin upper lip vermilion: 338\r\n", " Sleep abnormality: 331\r\n", " Wide nasal bridge: 331\r\n", " Infantile spasms: 325\r\n", " Long eyelashes: 325\r\n", " Pectus excavatum: 322\r\n", " Ataxia: 319\r\n", " Pes planus: 315\r\n", " Bilateral tonic-clonic seizure: 314\r\n", " Bulbous nose: 311\r\n", " Intellectual disability, severe: 306\r\n", " Nystagmus: 298\r\n", " Absent speech: 294\r\n", " Midface retrusion: 290\r\n", " Bicuspid aortic valve: 288\r\n", " Deeply set eye: 283\r\n", " Delayed ability to walk: 282\r\n", " Pulmonic stenosis: 280\r\n", " Cryptorchidism: 279\r\n", " Talipes equinovarus: 277\r\n", " Attention deficit hyperactivity disorder: 275\r\n", " Macrocephaly: 275\r\n", " Recurrent otitis media: 275\r\n", " Depressed nasal bridge: 273\r\n", " Abnormality of the hand: 273\r\n", " Autism: 270\r\n", " Macrodontia: 266\r\n", " Dystonia: 265\r\n", " Narrow forehead: 261\r\n", " Smooth philtrum: 249\r\n", " Microtia: 248\r\n", " Inguinal hernia: 247\r\n", " Upslanted palpebral fissure: 246\r\n", " Ventriculomegaly: 240\r\n", " Synophrys: 236\r\n", " Ectopia lentis: 234\r\n", " Cerebellar atrophy: 234\r\n", " Thin corpus callosum: 231\r\n", " EEG abnormality: 230\r\n", " Short philtrum: 226\r\n", " Arachnodactyly: 224\r\n", " Short neck: 223\r\n", " Highly arched eyebrow: 221\r\n", " Epileptic encephalopathy: 219\r\n", " Developmental regression: 218\r\n", " Generalized tonic seizure: 218\r\n", " Protruding ear: 217\r\n", " Atrial 
septal defect: 213\r\n", " Umbilical hernia: 213\r\n", " Cerebral atrophy: 212\r\n", " Atrioventricular canal defect: 206\r\n", " Low anterior hairline: 203\r\n", " Focal impaired awareness seizure: 199\r\n", " Mitral valve prolapse: 199\r\n", " Hypsarrhythmia: 198\r\n", " Delayed skeletal maturation: 198\r\n", " Intrauterine growth retardation: 196\r\n", " Spasticity: 192\r\n", " Hypoplasia of the corpus callosum: 192\r\n", " Growth delay: 186\r\n", " Aortic root aneurysm: 181\r\n", " Severe global developmental delay: 173\r\n", " Multifocal epileptiform discharges: 169\r\n", " Mandibular prognathia: 167\r\n", " Dysarthria: 167\r\n", " Blue sclerae: 166\r\n", " Patent ductus arteriosus: 166\r\n", " Proptosis: 164\r\n", " Cataract: 162\r\n", "\r\n" ] } ], "source": [ "!linkml-store -d mongodb://localhost:27017 -c main fq -S phenotypicFeatures.type.label -O yaml\n" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T22:37:48.003015Z", "start_time": "2024-07-01T22:37:45.210046Z" } }, "id": "db26d37f9e60283d" }, { "cell_type": "code", "execution_count": 36, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "diseases.term.label+subject.sex:\r\n", " ('KBG syndrome', 'MALE'): 175\r\n", " ('KBG syndrome', 'FEMALE'): 143\r\n", " ('Glass syndrome', 'MALE'): 90\r\n", " ('Glass syndrome', 'FEMALE'): 62\r\n", " ('Mitochondrial DNA depletion syndrome 13 (encephalomyopathic type)', 'MALE'): 58\r\n", " ('Neurodevelopmental disorder with coarse facies and mild distal skeletal abnormalities', 'MALE'): 54\r\n", " ('Jacobsen syndrome', 'FEMALE'): 49\r\n", " ('Coffin-Siris syndrome 8', 'MALE'): 37\r\n", " ('Mitochondrial DNA depletion syndrome 13 (encephalomyopathic type)', 'FEMALE'): 37\r\n", " ('Kabuki Syndrome 1', 'FEMALE'): 35\r\n", " ('Houge-Janssen syndrome 2', 'MALE'): 32\r\n", " ('Kabuki Syndrome 1', 'MALE'): 30\r\n", " ('Developmental delay, dysmorphic facies, and brain anomalies', 'FEMALE'): 29\r\n", " ('Intellectual developmental 
disorder, autosomal dominant 21', 'MALE'): 28\r\n", " ('Cardiac, facial, and digital anomalies with developmental delay', 'MALE'): 28\r\n", " ('Holt-Oram syndrome', 'FEMALE'): 28\r\n", " ('Developmental and epileptic encephalopathy 28', 'FEMALE'): 27\r\n", " ('Loeys-Dietz syndrome 3', 'MALE'): 27\r\n", " ('Loeys-Dietz syndrome 4', 'MALE'): 26\r\n", " ('Hypomagnesemia 3, renal', 'MALE'): 26\r\n", " ('Marfan syndrome', 'MALE'): 26\r\n", " ('ZTTK SYNDROME', 'MALE'): 26\r\n", " ('ZTTK SYNDROME', 'FEMALE'): 26\r\n", " ('Mitochondrial DNA depletion syndrome 6 (hepatocerebral type)', 'MALE'): 26\r\n", " ('Intellectual developmental disorder, X-linked 112', 'MALE'): 26\r\n", " ('Marfan syndrome', 'FEMALE'): 24\r\n", " ('Houge-Janssen syndrome 2', 'FEMALE'): 24\r\n", " ('Mitochondrial DNA depletion syndrome 6 (hepatocerebral type)', 'FEMALE'): 24\r\n", " ('Ectopia lentis, familial', 'MALE'): 24\r\n", " ('Coffin-Siris syndrome 8', 'FEMALE'): 24\r\n", " ('Cardiomyopathy, dilated, 1A', 'MALE'): 23\r\n", " ('Loeys-Dietz syndrome 5', 'MALE'): 23\r\n", " ('Loeys-Dietz syndrome 3', 'FEMALE'): 22\r\n", " ('Holt-Oram syndrome', 'MALE'): 22\r\n", " ('Mitochondrial complex IV deficiency, nuclear type 2', 'MALE'): 22\r\n", " ('Kufor-Rakeb syndrome', 'MALE'): 21\r\n", " ('Cardiomyopathy, dilated, 1A', 'FEMALE'): 21\r\n", " ('Loeys-Dietz syndrome 5', 'FEMALE'): 20\r\n", " ('Ehlers-Danlos syndrome, vascular type', 'FEMALE'): 20\r\n", " ('Ectopia lentis, familial', 'FEMALE'): 20\r\n", " ('Jacobsen syndrome', 'MALE'): 20\r\n", " ('Developmental delay, dysmorphic facies, and brain anomalies', 'MALE'): 20\r\n", " ('Neurodevelopmental disorder with coarse facies and mild distal skeletal abnormalities', 'FEMALE'): 19\r\n", " ('Hypomagnesemia 3, renal', 'FEMALE'): 19\r\n", " ('Spastic ataxia 8, autosomal recessive, with hypomyelinating leukodystrophy', 'MALE'): 18\r\n", " ('Anemia, sideroblastic, and spinocerebellar ataxia', 'MALE'): 18\r\n", " ('LEOPARD syndrome 1', 'MALE'): 18\r\n", " 
('Acrofacial dysostosis 1, Nager type', 'FEMALE'): 18\r\n", " ('Intellectual developmental disorder, autosomal dominant 21', 'FEMALE'): 18\r\n", " ('Cardiac, facial, and digital anomalies with developmental delay', 'FEMALE'): 17\r\n", " ('Albinism, oculocutaneous, type IV', 'FEMALE'): 17\r\n", " ('Developmental and epileptic encephalopathy 28', 'MALE'): 16\r\n", " ('Developmental delay with or without epilepsy', 'MALE'): 16\r\n", " ('Aarskog-Scott syndrome', 'MALE'): 16\r\n", " ('Spastic paraplegia 91, autosomal dominant, with or without cerebellar ataxia', 'FEMALE'): 15\r\n", " ('Marfan lipodystrophy syndrome', 'FEMALE'): 15\r\n", " ('Ehlers-Danlos syndrome, vascular type', 'MALE'): 15\r\n", " ('Spastic ataxia 8, autosomal recessive, with hypomyelinating leukodystrophy', 'FEMALE'): 15\r\n", " ('Sulfite oxidase deficiency', 'MALE'): 14\r\n", " ('Noonan syndrome 1', 'MALE'): 14\r\n", " ('Albinism, oculocutaneous, type IV', 'MALE'): 13\r\n", " ('Developmental and epileptic encephalopathy 5', 'FEMALE'): 13\r\n", " ('Noonan syndrome 1', 'FEMALE'): 13\r\n", " ('Developmental and epileptic encephalopathy 112', 'FEMALE'): 13\r\n", " ('Neurodevelopmental disorder with motor and language delay, ocular defects, and brain abnormalities', 'FEMALE'): 13\r\n", " ('Loeys-Dietz syndrome 2', 'MALE'): 13\r\n", " ('LEOPARD syndrome 1', 'FEMALE'): 13\r\n", " ('Spastic paraplegia 91, autosomal dominant, with or without cerebellar ataxia', 'MALE'): 13\r\n", " ('Ataxia-pancytopenia syndrome', 'MALE'): 12\r\n", " ('Hypotonia, infantile, with psychomotor retardation and characteristic facies 3', 'FEMALE'): 12\r\n", " ('Autoinflammatory syndrome, familial, with or without immunodeficiency', 'FEMALE'): 12\r\n", " ('Neurodevelopmental disorder with or without anomalies of the brain, eye, or heart', 'MALE'): 12\r\n", " ('Kufor-Rakeb syndrome', 'FEMALE'): 12\r\n", " ('Neurodevelopmental disorder with or without variable brain abnormalities', 'MALE'): 11\r\n", " ('Hypotonia, infantile, with 
psychomotor retardation and characteristic facies 3', 'MALE'): 11\r\n", " ('Acrofacial dysostosis, Cincinnati type', 'MALE'): 11\r\n", " ('HMG-CoA synthase-2 deficiency', 'MALE'): 11\r\n", " ('Sulfite oxidase deficiency', 'FEMALE'): 11\r\n", " ('Noonan syndrome 2', 'FEMALE'): 11\r\n", " ('Autoimmune polyendocrinopathy syndrome , type I, with or without reversible metaphyseal dysplasia', 'FEMALE'): 11\r\n", " ('Noonan syndrome 6', 'MALE'): 10\r\n", " ('EZH1-related neurodevelopmental disorder', 'FEMALE'): 10\r\n", " ('Cornelia de Lange syndrome 6', 'MALE'): 10\r\n", " ('Neurodevelopmental disorder with progressive microcephaly, spasticity, and brain anomalies', 'MALE'): 10\r\n", " ('Loeys-Dietz syndrome 6', 'FEMALE'): 10\r\n", " ('Spastic paraplegia 76, autosomal recessive', 'FEMALE'): 10\r\n", " ('Coffin-Siris syndrome 3', 'FEMALE'): 10\r\n", " ('Parkinson disease 15, autosomal recessive', 'MALE'): 9\r\n", " ('Neurodevelopmental disorder with or without variable brain abnormalities', 'FEMALE'): 9\r\n", " ('Developmental and epileptic encephalopathy 5', 'MALE'): 9\r\n", " ('Noonan syndrome 2', 'MALE'): 9\r\n", " ('Multiple mitochondrial dysfunctions syndrome 4', 'FEMALE'): 9\r\n", " ('Contractural arachnodactyly, congenital', 'FEMALE'): 9\r\n", " ('Distal renal tubular acidosis 1', 'FEMALE'): 9\r\n", " ('Immunoskeletal dysplasia with neurodevelopmental abnormalitie', 'FEMALE'): 9\r\n", " ('Distal renal tubular acidosis 1', 'MALE'): 9\r\n", " ('Joubert syndrome 10', 'MALE'): 9\r\n", " ('Ataxia-pancytopenia syndrome', 'FEMALE'): 9\r\n", " ('Intellectual developmental disorder, autosomal dominant 70', 'MALE'): 9\r\n", " ('Muscular dystrophy, limb-girdle, autosomal recessive 28', 'MALE'): 9\r\n", "\r\n" ] } ], "source": [ "!linkml-store -d mongodb://localhost:27017 -c main fq -S diseases.term.label+subject.sex -O yaml\n" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-07-01T22:55:55.579209Z", "start_time": "2024-07-01T22:55:52.297365Z" } }, "id": 
"93d79d7857e40e34" }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [], "metadata": { "collapsed": false }, "id": "a5513e9619e0edd9" } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 5 }