How to index Phenopackets with LinkML-Store
Use pystow to download phenopackets
We will download the latest release from the Monarch Initiative phenopacket-store:
[1]:
import pandas as pd
import pystow
import yaml
path = pystow.ensure_untar("tmp", "phenopackets", url="https://github.com/monarch-initiative/phenopacket-store/releases/latest/download/all_phenopackets.tgz")
[2]:
# iterate over all *.json files in the phenopackets directory and parse each into an object
# we recursively walk the path using os.walk (we don't worry about loading into the store yet)
import os
import json
objs = []
for root, dirs, files in os.walk(path):
    for file in files:
        if file.endswith(".json"):
            with open(os.path.join(root, file)) as stream:
                obj = json.load(stream)
            objs.append(obj)
len(objs)
[2]:
4876
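The same walk can be written more compactly with pathlib, whose rglob also searches subdirectories recursively. This is a sketch of an alternative, not a change to the cell above; iter_phenopackets is our own hypothetical helper name. Sorting the paths makes the load order deterministic across machines, which helps when comparing runs:

```python
import json
from pathlib import Path
from typing import Iterator


def iter_phenopackets(root: Path) -> Iterator[dict]:
    """Yield the parsed contents of every *.json file found anywhere below root."""
    # rglob descends into subdirectories, like os.walk; sorting fixes the order
    for json_path in sorted(Path(root).rglob("*.json")):
        with json_path.open(encoding="utf-8") as stream:
            yield json.load(stream)


# Usage, with `path` as returned by pystow.ensure_untar above:
# objs = list(iter_phenopackets(path))
```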
Creating a client and attaching to a database
First we will create a client as normal:
[3]:
from linkml_store import Client
client = Client()
Next we’ll attach to a MongoDB instance. This assumes you already have one running on localhost:27017.
We will make a database called “phenopackets”, recreating it if it already exists.
(Note for people running this notebook locally: if you already have a database with this name in your MongoDB instance, it will be deleted!)
[4]:
db = client.attach_database("mongodb://localhost:27017", "phenopackets", recreate_if_exists=True)
Creating a collection
We’ll create a simple test collection. In linkml-store, a collection maps directly onto a MongoDB collection.
[5]:
collection = db.create_collection("main", recreate_if_exists=True)
Inserting objects into the store
We’ll use the standard insert method to insert the phenopackets into the collection. At this stage there is no explicit schema.
[6]:
collection.insert(objs)
Check contents
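We can check the number of rows in the collection to ensure everything was inserted correctly. Note that num_rows reports the total number of matching rows; limit=1 only caps how many rows are actually returned: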
[7]:
collection.find({}, limit=1).num_rows
[7]:
4876
[8]:
assert collection.find({}, limit=1).num_rows == len(objs)
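Row counts alone do not catch duplicated records. As an extra, library-independent check (our own addition, not part of linkml-store; duplicate_ids is a hypothetical helper), we can confirm that every phenopacket id in objs is unique:

```python
from collections import Counter
from typing import Dict, List


def duplicate_ids(records: List[dict]) -> Dict[object, int]:
    """Return each id that occurs more than once, mapped to its number of occurrences."""
    counts = Counter(record.get("id") for record in records)
    return {pid: n for pid, n in counts.items() if n > 1}


# Usage, assuming `objs` from the loading cell above:
# assert not duplicate_ids(objs), "some phenopacket ids are repeated"
```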
Let’s check with pandas just to make sure it looks as expected; we’ll query for a specific OMIM disease:
[9]:
qr = collection.find({"diseases.term.id": "OMIM:618499"}, limit=3)
qr.rows_dataframe
[9]:
| | id | subject | phenotypicFeatures | interpretations | diseases | metaData |
---|---|---|---|---|---|---|
0 | PMID_28289718_Higgins-Patient-1 | {'id': 'Higgins-Patient-1', 'timeAtLastEncount... | [{'type': {'id': 'HP:0001714', 'label': 'Ventr... | [{'id': 'Higgins-Patient-1', 'progressStatus':... | [{'term': {'id': 'OMIM:618499', 'label': 'Noon... | {'created': '2024-03-28T11:11:48.590163946Z', ... |
1 | PMID_31173466_Suzuki-Patient-1 | {'id': 'Suzuki-Patient-1', 'timeAtLastEncounte... | [{'type': {'id': 'HP:0001714', 'label': 'Ventr... | [{'id': 'Suzuki-Patient-1', 'progressStatus': ... | [{'term': {'id': 'OMIM:618499', 'label': 'Noon... | {'created': '2024-03-28T11:11:48.594725131Z', ... |
2 | PMID_28289718_Higgins-Patient-2 | {'id': 'Higgins-Patient-2', 'timeAtLastEncount... | [{'type': {'id': 'HP:0001714', 'label': 'Ventr... | [{'id': 'Higgins-Patient-2', 'progressStatus':... | [{'term': {'id': 'OMIM:618499', 'label': 'Noon... | {'created': '2024-03-28T11:11:48.592718124Z', ... |
The query returns three rows (the maximum allowed by limit=3), each with the disease OMIM:618499.
Query faceting
We will now demonstrate faceted queries, allowing us to count the number of instances of different categorical values or categorical value combinations.
First we’ll facet on the subject sex. We can use path notation here, e.g. subject.sex:
[10]:
collection.query_facets({}, facet_columns=["subject.sex"])
[10]:
{'subject.sex': [('MALE', 1807), ('FEMALE', 1564)]}
We can also facet by the disease name (label), restricting the output to the top 20 values with facet_limit:
[11]:
collection.query_facets({}, facet_columns=["diseases.term.label"], facet_limit=20)
[11]:
{'diseases.term.label': [('Developmental and epileptic encephalopathy 4', 463),
('Developmental and epileptic encephalopathy 11', 342),
('KBG syndrome', 337),
('Leber congenital amaurosis 6', 191),
('Glass syndrome', 158),
('Holt-Oram syndrome', 103),
('Mitochondrial DNA depletion syndrome 13 (encephalomyopathic type)', 95),
('Neurodevelopmental disorder with coarse facies and mild distal skeletal abnormalities',
73),
('Jacobsen syndrome', 69),
('Coffin-Siris syndrome 8', 65),
('Kabuki Syndrome 1', 65),
('Houge-Janssen syndrome 2', 60),
('ZTTK SYNDROME', 52),
('Greig cephalopolysyndactyly syndrome', 51),
('Seizures, benign familial infantile, 3', 51),
('Mitochondrial DNA depletion syndrome 6 (hepatocerebral type)', 50),
('Marfan syndrome', 50),
('Developmental delay, dysmorphic facies, and brain anomalies', 49),
('Loeys-Dietz syndrome 3', 49),
('Hypomagnesemia 3, renal', 46)]}
[12]:
collection.query_facets({}, facet_columns=["subject.timeAtLastEncounter.age.iso8601duration"], facet_limit=10)
[12]:
{'subject.timeAtLastEncounter.age.iso8601duration': [('P4Y', 131),
('P3Y', 114),
('P6Y', 100),
('P5Y', 97),
('P2Y', 95),
('P7Y', 85),
('P10Y', 82),
('P9Y', 77),
('P8Y', 71)]}
[13]:
collection.query_facets({}, facet_columns=["interpretations.diagnosis.genomicInterpretations.variantInterpretation.variationDescriptor.geneContext.symbol"], facet_limit=10)
[13]:
{'interpretations.diagnosis.genomicInterpretations.variantInterpretation.variationDescriptor.geneContext.symbol': [('STXBP1',
463),
('SCN2A', 393),
('ANKRD11', 337),
('RPGRIP1', 273),
('SATB2', 158),
('FBN1', 151),
('LMNA', 127),
('FBXL4', 117),
('TBX5', 103),
('SPTAN1', 85)]}
We can also facet on combinations:
[14]:
fqr = collection.query_facets({}, facet_columns=[("subject.sex", "diseases.term.label")], facet_limit=20)
fqr
[14]:
{('subject.sex', 'diseases.term.label'): [(('MALE', 'KBG syndrome'), 175),
(('FEMALE', 'KBG syndrome'), 143),
(('MALE', 'Glass syndrome'), 90),
(('FEMALE', 'Glass syndrome'), 62),
(('MALE',
'Mitochondrial DNA depletion syndrome 13 (encephalomyopathic type)'),
58),
(('MALE',
'Neurodevelopmental disorder with coarse facies and mild distal skeletal abnormalities'),
54),
(('FEMALE', 'Jacobsen syndrome'), 49),
(('MALE', 'Coffin-Siris syndrome 8'), 37),
(('FEMALE',
'Mitochondrial DNA depletion syndrome 13 (encephalomyopathic type)'),
37),
(('FEMALE', 'Kabuki Syndrome 1'), 35),
(('MALE', 'Houge-Janssen syndrome 2'), 32),
(('MALE', 'Kabuki Syndrome 1'), 30),
(('FEMALE', 'Developmental delay, dysmorphic facies, and brain anomalies'),
29),
(('FEMALE', 'Holt-Oram syndrome'), 28),
(('MALE', 'Intellectual developmental disorder, autosomal dominant 21'), 28),
(('MALE', 'Cardiac, facial, and digital anomalies with developmental delay'),
28),
(('FEMALE', 'Developmental and epileptic encephalopathy 28'), 27),
(('MALE', 'Loeys-Dietz syndrome 3'), 27),
(('MALE', 'ZTTK SYNDROME'), 26),
(('FEMALE', 'ZTTK SYNDROME'), 26)]}
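To make the path notation and the combination facets concrete, here is a small in-memory sketch of what they compute. It is our own illustration over plain dicts (for example, the objs list loaded earlier), not how linkml-store evaluates query_facets. Lists met along a path are traversed element by element, every occurrence of a value is counted, and a combination facet counts each pairing of values from the given paths. In this sketch, records without a value at a path contribute nothing; the subject.sex counts above (1807 + 1564 = 3371, fewer than the 4876 records) suggest the library behaves similarly:

```python
from collections import Counter
from itertools import product
from typing import Any, List, Sequence, Tuple


def path_values(obj: Any, path: str) -> List[Any]:
    """Collect every value reachable from obj along a dotted path like "diseases.term.label"."""
    values = [obj]
    for key in path.split("."):
        next_values = []
        for value in values:
            # Step into lists element by element, then look the key up in each dict
            items = value if isinstance(value, list) else [value]
            for item in items:
                if isinstance(item, dict) and key in item:
                    next_values.append(item[key])
        values = next_values
    # Flatten a list-valued leaf so each element is counted separately
    flat: List[Any] = []
    for value in values:
        flat.extend(value if isinstance(value, list) else [value])
    return flat


def facet_counts(records: Sequence[dict], path: str, limit: int = 20) -> List[Tuple[Any, int]]:
    """Count every occurrence of each value at path, most frequent first."""
    counter: Counter = Counter()
    for record in records:
        counter.update(path_values(record, path))
    return counter.most_common(limit)


def combo_facet_counts(records: Sequence[dict], paths: Tuple[str, ...], limit: int = 20) -> List[Tuple[Tuple[Any, ...], int]]:
    """Count combinations of values across several paths, most frequent first."""
    counter: Counter = Counter()
    for record in records:
        # Cartesian product of the values found at each path; a record
        # missing any of the paths yields no combinations at all
        counter.update(product(*(path_values(record, p) for p in paths)))
    return counter.most_common(limit)


# Usage, assuming `objs` from the loading cell:
# facet_counts(objs, "subject.sex")
# combo_facet_counts(objs, ("subject.sex", "diseases.term.label"))
```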
[17]:
from linkml_store.utils.pandas_utils import facet_summary_to_dataframe_unmelted
facet_summary_to_dataframe_unmelted(fqr)
[17]:
| | subject.sex | diseases.term.label | Value |
---|---|---|---|
0 | MALE | KBG syndrome | 175 |
1 | FEMALE | KBG syndrome | 143 |
2 | MALE | Glass syndrome | 90 |
3 | FEMALE | Glass syndrome | 62 |
4 | MALE | Mitochondrial DNA depletion syndrome 13 (encep... | 58 |
5 | MALE | Neurodevelopmental disorder with coarse facies... | 54 |
6 | FEMALE | Jacobsen syndrome | 49 |
7 | MALE | Coffin-Siris syndrome 8 | 37 |
8 | FEMALE | Mitochondrial DNA depletion syndrome 13 (encep... | 37 |
9 | FEMALE | Kabuki Syndrome 1 | 35 |
10 | MALE | Houge-Janssen syndrome 2 | 32 |
11 | MALE | Kabuki Syndrome 1 | 30 |
12 | FEMALE | Developmental delay, dysmorphic facies, and br... | 29 |
13 | FEMALE | Holt-Oram syndrome | 28 |
14 | MALE | Intellectual developmental disorder, autosomal... | 28 |
15 | MALE | Cardiac, facial, and digital anomalies with de... | 28 |
16 | FEMALE | Developmental and epileptic encephalopathy 28 | 27 |
17 | MALE | Loeys-Dietz syndrome 3 | 27 |
18 | MALE | ZTTK SYNDROME | 26 |
19 | FEMALE | ZTTK SYNDROME | 26 |
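The table above is a flattening of the facet dictionary into one row per value combination. A library-independent sketch of that flattening, producing plain dicts that could be handed to pandas.DataFrame; facet_summary_to_rows is our own hypothetical helper and is not claimed to match facet_summary_to_dataframe_unmelted exactly:

```python
from typing import Any, Dict, List


def facet_summary_to_rows(facet_summary: Dict[Any, list]) -> List[Dict[str, Any]]:
    """Flatten a query_facets()-style result into one dict per table row.

    Keys are a single column name or a tuple of column names; values are
    lists of (value_or_tuple_of_values, count) pairs.
    """
    rows = []
    for columns, pairs in facet_summary.items():
        # Normalize single-column facets to the tuple form used by combinations
        columns = columns if isinstance(columns, tuple) else (columns,)
        for values, count in pairs:
            values = values if isinstance(values, tuple) else (values,)
            row = dict(zip(columns, values))
            row["Value"] = count
            rows.append(row)
    return rows
```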
Semantic Search
We will index phenopackets using a template that extracts the subject, phenotypic features and diseases.
First we will create a textualization template for a phenopacket. We keep it minimal for simplicity; it doesn’t include treatments, families, etc.
[18]:
template = """
subject: {{subject}}
phenotypes: {% for p in phenotypicFeatures %}{{p.type.label}}{% endfor %}
diseases: {% for d in diseases %}{{d.term.label}}{% endfor %}
"""
Next we will create an indexer using the template, which is written in Jinja2 syntax. We will also cache embeddings in a local database (tmp/llm_pheno_cache.db), so that if we later add new phenopackets incrementally, we avoid re-computing embeddings we already have.
[19]:
from linkml_store.index.implementations.llm_indexer import LLMIndexer
index = LLMIndexer(
name="ppkt",
cached_embeddings_database="tmp/llm_pheno_cache.db",
text_template=template,
text_template_syntax="jinja2",
)
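The idea behind cached_embeddings_database can be sketched with the standard library's sqlite3: key each text by a content hash and call the embedding model only on a cache miss. The EmbeddingCache class and its table layout are hypothetical; this is not LLMIndexer's actual cache format. The embed function is passed in so the sketch runs without a real model:

```python
import hashlib
import json
import sqlite3
from typing import Callable, List


class EmbeddingCache:
    """Hypothetical sqlite-backed cache mapping a text's hash to its embedding vector."""

    def __init__(self, db_path: str, embed: Callable[[str], List[float]]):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS embeddings (key TEXT PRIMARY KEY, vector TEXT)"
        )
        self.embed = embed

    def get(self, text: str) -> List[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        row = self.conn.execute(
            "SELECT vector FROM embeddings WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return json.loads(row[0])  # cache hit: no embedding call
        vector = self.embed(text)  # cache miss: compute once, then store
        self.conn.execute(
            "INSERT INTO embeddings (key, vector) VALUES (?, ?)",
            (key, json.dumps(vector)),
        )
        self.conn.commit()
        return vector
```

Because the key is derived from the text itself, re-indexing unchanged phenopackets hits the cache, while any edit to a record's text produces a new key and a fresh embedding.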
We can test the template on the first row of the earlier query result (qr):
[20]:
print(index.object_to_text(qr.rows[0]))
subject: {'id': 'Higgins-Patient-1', 'timeAtLastEncounter': {'age': {'iso8601duration': 'P17Y'}}, 'sex': 'FEMALE'}
phenotypes: Ventricular hypertrophyHeart murmurHypertrophic cardiomyopathyShort statureHypertelorismLow-set earsPosteriorly rotated earsGlobal developmental delayCognitive impairmentCardiac arrest
diseases: Noonan syndrome-11
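Apart from the labels running together (the template’s for loops emit no separator between items; adding one, e.g. {% if not loop.last %}, {% endif %}, would fix this), that looks as expected. We can now attach the indexer to the collection and index the collection: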
[21]:
collection.attach_indexer(index, auto_index=True)
Searching the index
Let’s query based on text criteria:
[22]:
qr = collection.search("patients with liver diseases")
qr.rows_dataframe[0:5]
[22]:
| | score | id | subject | phenotypicFeatures | interpretations | diseases | metaData |
---|---|---|---|---|---|---|---|
0 | 0.824639 | PMID_30658709_patient | {'id': 'patient', 'timeAtLastEncounter': {'age... | [{'type': {'id': 'HP:0031956', 'label': 'Eleva... | [{'id': 'patient', 'progressStatus': 'SOLVED',... | [{'term': {'id': 'OMIM:615878', 'label': 'Chol... | {'created': '2024-05-05T09:03:25.388371944Z', ... |
1 | 0.824639 | PMID_30658709_patient | {'id': 'patient', 'timeAtLastEncounter': {'age... | [{'type': {'id': 'HP:0031956', 'label': 'Eleva... | [{'id': 'patient', 'progressStatus': 'SOLVED',... | [{'term': {'id': 'OMIM:615878', 'label': 'Chol... | {'created': '2024-05-05T09:03:25.388371944Z', ... |
2 | 0.813770 | PMID_36932076_Patient_1 | {'id': 'Patient 1', 'timeAtLastEncounter': {'a... | [{'type': {'id': 'HP:0000979', 'label': 'Purpu... | [{'id': 'Patient 1', 'progressStatus': 'SOLVED... | [{'term': {'id': 'OMIM:620376', 'label': 'Auto... | {'created': '2024-04-19T06:07:57.188061952Z', ... |
3 | 0.813770 | PMID_36932076_Patient_1 | {'id': 'Patient 1', 'timeAtLastEncounter': {'a... | [{'type': {'id': 'HP:0000979', 'label': 'Purpu... | [{'id': 'Patient 1', 'progressStatus': 'SOLVED... | [{'term': {'id': 'OMIM:620376', 'label': 'Auto... | {'created': '2024-04-19T06:07:57.188061952Z', ... |
4 | 0.804126 | PMID_37303127_6 | {'id': '6', 'timeAtLastEncounter': {'age': {'i... | [{'type': {'id': 'HP:0001397', 'label': 'Hepat... | [{'id': '6', 'progressStatus': 'SOLVED', 'diag... | [{'term': {'id': 'OMIM:151660', 'label': 'Lipo... | {'created': '2024-03-23T17:41:42.999521017Z', ... |
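The score column is a similarity between the query’s embedding and each record’s embedding. A minimal sketch of that ranking, assuming cosine similarity (a common choice; we have not confirmed it is exactly what LLMIndexer uses) and toy two-dimensional vectors. Keying documents by id means each record appears at most once, unlike the output above, where rows 0 and 1 (and rows 2 and 3) are the same record:

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec: Sequence[float], docs: Dict[str, List[float]], limit: int = 5) -> List[Tuple[float, str]]:
    """Score every document against the query and return the best (score, id) pairs."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in docs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:limit]
```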
Let’s look at the top-ranked hit in full:
[23]:
qr.ranked_rows[0]
[23]:
(0.824638728366563,
{'id': 'PMID_30658709_patient',
'subject': {'id': 'patient',
'timeAtLastEncounter': {'age': {'iso8601duration': 'P1Y11M'}},
'sex': 'FEMALE'},
'phenotypicFeatures': [{'type': {'id': 'HP:0031956',
'label': 'Elevated circulating aspartate aminotransferase concentration'},
'onset': {'age': {'iso8601duration': 'P1Y11M'}}},
{'type': {'id': 'HP:0031964',
'label': 'Elevated circulating alanine aminotransferase concentration'},
'onset': {'age': {'iso8601duration': 'P1Y11M'}}},
{'type': {'id': 'HP:0003573', 'label': 'Increased total bilirubin'},
'onset': {'age': {'iso8601duration': 'P6M'}}},
{'type': {'id': 'HP:0012202',
'label': 'Increased serum bile acid concentration'},
'onset': {'age': {'iso8601duration': 'P6M'}}},
{'type': {'id': 'HP:0002908', 'label': 'Conjugated hyperbilirubinemia'},
'onset': {'age': {'iso8601duration': 'P6M'}}},
{'type': {'id': 'HP:0001433', 'label': 'Hepatosplenomegaly'},
'onset': {'age': {'iso8601duration': 'P6M'}}},
{'type': {'id': 'HP:0001510', 'label': 'Growth delay'},
'onset': {'age': {'iso8601duration': 'P6M'}}},
{'type': {'id': 'HP:0000989', 'label': 'Pruritus'},
'onset': {'age': {'iso8601duration': 'P6M'}}},
{'type': {'id': 'HP:0000952', 'label': 'Jaundice'},
'onset': {'age': {'iso8601duration': 'P6M'}}},
{'type': {'id': 'HP:0100810', 'label': 'Pointed helix'},
'onset': {'age': {'iso8601duration': 'P6M'}}},
{'type': {'id': 'HP:0002650', 'label': 'Scoliosis'}},
{'type': {'id': 'HP:0003112',
'label': 'Abnormal circulating amino acid concentration'},
'excluded': True},
{'type': {'id': 'HP:0001928', 'label': 'Abnormality of coagulation'},
'excluded': True},
{'type': {'id': 'HP:0010701', 'label': 'Abnormal immunoglobulin level'},
'excluded': True},
{'type': {'id': 'HP:0001627', 'label': 'Abnormal heart morphology'},
'excluded': True}],
'interpretations': [{'id': 'patient',
'progressStatus': 'SOLVED',
'diagnosis': {'disease': {'id': 'OMIM:615878',
'label': 'Cholestasis, progressive familial intrahepatic 4'},
'genomicInterpretations': [{'subjectOrBiosampleId': 'patient',
'interpretationStatus': 'CAUSATIVE',
'variantInterpretation': {'variationDescriptor': {'id': 'var_kKNGnjOxGXMbcoWzDGEJKVPIB',
'geneContext': {'valueId': 'HGNC:11828', 'symbol': 'TJP2'},
'expressions': [{'syntax': 'hgvs.c',
'value': 'NM_004817.4:c.2355+1G>C'},
{'syntax': 'hgvs.g', 'value': 'NC_000009.12:g.69238790G>C'}],
'vcfRecord': {'genomeAssembly': 'hg38',
'chrom': 'chr9',
'pos': '69238790',
'ref': 'G',
'alt': 'C'},
'moleculeContext': 'genomic',
'allelicState': {'id': 'GENO:0000136', 'label': 'homozygous'}}}}]}}],
'diseases': [{'term': {'id': 'OMIM:615878',
'label': 'Cholestasis, progressive familial intrahepatic 4'},
'onset': {'ontologyClass': {'id': 'HP:0003593',
'label': 'Infantile onset'}}}],
'metaData': {'created': '2024-05-05T09:03:25.388371944Z',
'createdBy': 'ORCID:0000-0002-0736-9199',
'resources': [{'id': 'geno',
'name': 'Genotype Ontology',
'url': 'http://purl.obolibrary.org/obo/geno.owl',
'version': '2022-03-05',
'namespacePrefix': 'GENO',
'iriPrefix': 'http://purl.obolibrary.org/obo/GENO_'},
{'id': 'hgnc',
'name': 'HUGO Gene Nomenclature Committee',
'url': 'https://www.genenames.org',
'version': '06/01/23',
'namespacePrefix': 'HGNC',
'iriPrefix': 'https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/'},
{'id': 'omim',
'name': 'An Online Catalog of Human Genes and Genetic Disorders',
'url': 'https://www.omim.org',
'version': 'January 4, 2023',
'namespacePrefix': 'OMIM',
'iriPrefix': 'https://www.omim.org/entry/'},
{'id': 'so',
'name': 'Sequence types and features ontology',
'url': 'http://purl.obolibrary.org/obo/so.obo',
'version': '2021-11-22',
'namespacePrefix': 'SO',
'iriPrefix': 'http://purl.obolibrary.org/obo/SO_'},
{'id': 'hp',
'name': 'human phenotype ontology',
'url': 'http://purl.obolibrary.org/obo/hp.owl',
'version': '2024-04-26',
'namespacePrefix': 'HP',
'iriPrefix': 'http://purl.obolibrary.org/obo/HP_'}],
'phenopacketSchemaVersion': '2.0',
'externalReferences': [{'id': 'PMID:30658709',
'reference': 'https://pubmed.ncbi.nlm.nih.gov/30658709',
'description': 'Novel compound heterozygote mutations of TJP2 in a Chinese child with progressive cholestatic liver disease'}]}})
We can combine semantic search with a structured where filter, here restricting results to male subjects:
[24]:
qr = collection.search("patients with liver diseases", where={"subject.sex": "MALE"})
qr.rows_dataframe[0:5]
[24]:
| | score | id | subject | phenotypicFeatures | interpretations | diseases | metaData |
---|---|---|---|---|---|---|---|
0 | 0.813827 | PMID_36932076_Patient_1 | {'id': 'Patient 1', 'timeAtLastEncounter': {'a... | [{'type': {'id': 'HP:0000979', 'label': 'Purpu... | [{'id': 'Patient 1', 'progressStatus': 'SOLVED... | [{'term': {'id': 'OMIM:620376', 'label': 'Auto... | {'created': '2024-04-19T06:07:57.188061952Z', ... |
1 | 0.813827 | PMID_36932076_Patient_1 | {'id': 'Patient 1', 'timeAtLastEncounter': {'a... | [{'type': {'id': 'HP:0000979', 'label': 'Purpu... | [{'id': 'Patient 1', 'progressStatus': 'SOLVED... | [{'term': {'id': 'OMIM:620376', 'label': 'Auto... | {'created': '2024-04-19T06:07:57.188061952Z', ... |
2 | 0.799738 | PMID_36932076_Patient_3 | {'id': 'Patient 3', 'timeAtLastEncounter': {'a... | [{'type': {'id': 'HP:0001511', 'label': 'Intra... | [{'id': 'Patient 3', 'progressStatus': 'SOLVED... | [{'term': {'id': 'OMIM:620376', 'label': 'Auto... | {'created': '2024-04-19T06:07:57.190312862Z', ... |
3 | 0.799738 | PMID_36932076_Patient_3 | {'id': 'Patient 3', 'timeAtLastEncounter': {'a... | [{'type': {'id': 'HP:0001511', 'label': 'Intra... | [{'id': 'Patient 3', 'progressStatus': 'SOLVED... | [{'term': {'id': 'OMIM:620376', 'label': 'Auto... | {'created': '2024-04-19T06:07:57.190312862Z', ... |
4 | 0.799243 | PMID_27536553_27536553_P3 | {'id': '27536553_P3', 'timeAtLastEncounter': {... | [{'type': {'id': 'HP:0001396', 'label': 'Chole... | [{'id': '27536553_P3', 'progressStatus': 'SOLV... | [{'term': {'id': 'OMIM:256810', 'label': 'Mito... | {'created': '2024-03-23T19:28:35.688389062Z', ... |
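Conceptually, the where clause restricts the candidate records and the query text orders them: the top hit of the unfiltered search (PMID_30658709_patient, a female subject) is gone, while PMID_36932076_Patient_1 remains. A sketch with precomputed scores and exact-match filtering on dotted paths; matches and filtered_search are hypothetical helpers, and we make no claim about whether linkml-store filters before or after ranking:

```python
from typing import Any, Dict, List, Sequence, Tuple


def matches(record: dict, where: Dict[str, Any]) -> bool:
    """True if the value at every dotted path in where equals the expected value.

    Exact matching only; unlike a MongoDB query, lists along the path are not traversed.
    """
    for path, expected in where.items():
        value: Any = record
        for key in path.split("."):
            if not isinstance(value, dict) or key not in value:
                return False
            value = value[key]
        if value != expected:
            return False
    return True


def filtered_search(records: Sequence[dict], scores: Dict[str, float], where: Dict[str, Any]) -> List[Tuple[float, str]]:
    """Keep records that pass the filter, then order them by precomputed similarity score."""
    kept = [(scores[r["id"]], r["id"]) for r in records if matches(r, where)]
    return sorted(kept, reverse=True)
```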
Validation
Next we will demonstrate validation over a whole collection.
Validation currently requires a LinkML schema; we have previously copied the Phenopackets LinkML schema into the test folder. We load the schema into the database object:
[25]:
db.load_schema_view("../../tests/input/schemas/phenopackets_linkml/phenopackets.yaml")
A quick sanity check that the schema loaded:
[26]:
list(db.schema_view.all_classes())[0:10]
[26]:
['Age',
'AgeRange',
'Dictionary',
'Evidence',
'ExternalReference',
'File',
'GestationalAge',
'OntologyClass',
'Procedure',
'TimeElement']
[27]:
collection.metadata.type = "Phenopacket"
[28]:
# iterate over validation results, skipping one known issue and failing on anything else
for r in db.iter_validate_database():
    # known issue - https://github.com/monarch-initiative/phenopacket-store/issues/97
    if "is not of type 'integer'" in r.message:
        continue
    print(r.message[0:100])
    print(r)
    raise ValueError("Unexpected validation error")
Command Line Usage
We can also use the command line for all of the above operations.
For example, faceted queries:
[29]:
!linkml-store -d mongodb://localhost:27017 -c main fq -S subject.sex
{
"subject.sex": {
"MALE": 1807,
"FEMALE": 1564
}
}
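The CLI reports the same counts as the Python API, but as a mapping from value to count rather than a list of pairs. A small sketch of that reshaping; to_cli_shape is our own helper, not part of linkml-store, and we have not checked how the CLI actually builds its output. Combination keys are joined with "+", matching the -S diseases.term.label+subject.sex syntax used further below:

```python
import json
from typing import Any, Dict


def to_cli_shape(facet_summary: Dict[Any, list]) -> Dict[str, Dict[str, int]]:
    """Reshape {column: [(value, count), ...]} into {column: {value: count}}."""
    reshaped: Dict[str, Dict[str, int]] = {}
    for column, pairs in facet_summary.items():
        # Tuple keys (combination facets) become "a+b" column names
        name = "+".join(column) if isinstance(column, tuple) else column
        # str() renders tuple values as "('KBG syndrome', 'MALE')"
        reshaped[name] = {str(value): count for value, count in pairs}
    return reshaped


summary = {"subject.sex": [("MALE", 1807), ("FEMALE", 1564)]}
print(json.dumps(to_cli_shape(summary), indent=2))
```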
[30]:
!linkml-store -d mongodb://localhost:27017 -c main fq -S phenotypicFeatures.type.label -O yaml
phenotypicFeatures.type.label:
Global developmental delay: 1705
Hypotonia: 1056
Intellectual disability: 1028
Seizure: 950
Hypertelorism: 925
Delayed speech and language development: 829
Short stature: 806
Microcephaly: 780
Scoliosis: 702
Feeding difficulties: 678
Low-set ears: 598
Autistic behavior: 519
Motor delay: 518
Downslanted palpebral fissures: 505
Strabismus: 504
Long philtrum: 500
Ptosis: 498
Patent foramen ovale: 469
Anteverted nares: 461
Hearing impairment: 451
Epicanthus: 447
Ventricular septal defect: 435
Thick eyebrow: 433
Cleft palate: 423
Joint hypermobility: 388
High palate: 383
Triangular face: 369
Micrognathia: 364
Posteriorly rotated ears: 350
Failure to thrive: 345
Prominent forehead: 343
Thin upper lip vermilion: 338
Sleep abnormality: 331
Wide nasal bridge: 331
Infantile spasms: 325
Long eyelashes: 325
Pectus excavatum: 322
Ataxia: 319
Pes planus: 315
Bilateral tonic-clonic seizure: 314
Bulbous nose: 311
Intellectual disability, severe: 306
Nystagmus: 298
Absent speech: 294
Midface retrusion: 290
Bicuspid aortic valve: 288
Deeply set eye: 283
Delayed ability to walk: 282
Pulmonic stenosis: 280
Cryptorchidism: 279
Talipes equinovarus: 277
Attention deficit hyperactivity disorder: 275
Recurrent otitis media: 275
Macrocephaly: 275
Abnormality of the hand: 273
Depressed nasal bridge: 273
Autism: 270
Macrodontia: 266
Dystonia: 265
Narrow forehead: 261
Smooth philtrum: 249
Microtia: 248
Inguinal hernia: 247
Upslanted palpebral fissure: 246
Ventriculomegaly: 240
Synophrys: 236
Cerebellar atrophy: 234
Ectopia lentis: 234
Thin corpus callosum: 231
EEG abnormality: 230
Short philtrum: 226
Arachnodactyly: 224
Short neck: 223
Highly arched eyebrow: 221
Epileptic encephalopathy: 219
Developmental regression: 218
Generalized tonic seizure: 218
Protruding ear: 217
Atrial septal defect: 213
Umbilical hernia: 213
Cerebral atrophy: 212
Atrioventricular canal defect: 206
Low anterior hairline: 203
Mitral valve prolapse: 199
Focal impaired awareness seizure: 199
Delayed skeletal maturation: 198
Hypsarrhythmia: 198
Intrauterine growth retardation: 196
Hypoplasia of the corpus callosum: 192
Spasticity: 192
Growth delay: 186
Aortic root aneurysm: 181
Severe global developmental delay: 173
Multifocal epileptiform discharges: 169
Mandibular prognathia: 167
Dysarthria: 167
Patent ductus arteriosus: 166
Blue sclerae: 166
Proptosis: 164
Cataract: 162
[31]:
!linkml-store -d mongodb://localhost:27017 -c main fq -S diseases.term.label+subject.sex -O yaml
diseases.term.label+subject.sex:
('KBG syndrome', 'MALE'): 175
('KBG syndrome', 'FEMALE'): 143
('Glass syndrome', 'MALE'): 90
('Glass syndrome', 'FEMALE'): 62
('Mitochondrial DNA depletion syndrome 13 (encephalomyopathic type)', 'MALE'): 58
('Neurodevelopmental disorder with coarse facies and mild distal skeletal abnormalities', 'MALE'): 54
('Jacobsen syndrome', 'FEMALE'): 49
('Coffin-Siris syndrome 8', 'MALE'): 37
('Mitochondrial DNA depletion syndrome 13 (encephalomyopathic type)', 'FEMALE'): 37
('Kabuki Syndrome 1', 'FEMALE'): 35
('Houge-Janssen syndrome 2', 'MALE'): 32
('Kabuki Syndrome 1', 'MALE'): 30
('Developmental delay, dysmorphic facies, and brain anomalies', 'FEMALE'): 29
('Intellectual developmental disorder, autosomal dominant 21', 'MALE'): 28
('Holt-Oram syndrome', 'FEMALE'): 28
('Cardiac, facial, and digital anomalies with developmental delay', 'MALE'): 28
('Loeys-Dietz syndrome 3', 'MALE'): 27
('Developmental and epileptic encephalopathy 28', 'FEMALE'): 27
('ZTTK SYNDROME', 'FEMALE'): 26
('ZTTK SYNDROME', 'MALE'): 26
('Loeys-Dietz syndrome 4', 'MALE'): 26
('Marfan syndrome', 'MALE'): 26
('Hypomagnesemia 3, renal', 'MALE'): 26
('Intellectual developmental disorder, X-linked 112', 'MALE'): 26
('Mitochondrial DNA depletion syndrome 6 (hepatocerebral type)', 'MALE'): 26
('Marfan syndrome', 'FEMALE'): 24
('Ectopia lentis, familial', 'MALE'): 24
('Coffin-Siris syndrome 8', 'FEMALE'): 24
('Mitochondrial DNA depletion syndrome 6 (hepatocerebral type)', 'FEMALE'): 24
('Houge-Janssen syndrome 2', 'FEMALE'): 24
('Cardiomyopathy, dilated, 1A', 'MALE'): 23
('Loeys-Dietz syndrome 5', 'MALE'): 23
('Holt-Oram syndrome', 'MALE'): 22
('Mitochondrial complex IV deficiency, nuclear type 2', 'MALE'): 22
('Loeys-Dietz syndrome 3', 'FEMALE'): 22
('Cardiomyopathy, dilated, 1A', 'FEMALE'): 21
('Kufor-Rakeb syndrome', 'MALE'): 21
('Jacobsen syndrome', 'MALE'): 20
('Developmental delay, dysmorphic facies, and brain anomalies', 'MALE'): 20
('Ectopia lentis, familial', 'FEMALE'): 20
('Ehlers-Danlos syndrome, vascular type', 'FEMALE'): 20
('Loeys-Dietz syndrome 5', 'FEMALE'): 20
('Neurodevelopmental disorder with coarse facies and mild distal skeletal abnormalities', 'FEMALE'): 19
('Hypomagnesemia 3, renal', 'FEMALE'): 19
('Intellectual developmental disorder, autosomal dominant 21', 'FEMALE'): 18
('Acrofacial dysostosis 1, Nager type', 'FEMALE'): 18
('LEOPARD syndrome 1', 'MALE'): 18
('Anemia, sideroblastic, and spinocerebellar ataxia', 'MALE'): 18
('Spastic ataxia 8, autosomal recessive, with hypomyelinating leukodystrophy', 'MALE'): 18
('Albinism, oculocutaneous, type IV', 'FEMALE'): 17
('Cardiac, facial, and digital anomalies with developmental delay', 'FEMALE'): 17
('Developmental and epileptic encephalopathy 28', 'MALE'): 16
('Developmental delay with or without epilepsy', 'MALE'): 16
('Aarskog-Scott syndrome', 'MALE'): 16
('Ehlers-Danlos syndrome, vascular type', 'MALE'): 15
('Spastic paraplegia 91, autosomal dominant, with or without cerebellar ataxia', 'FEMALE'): 15
('Spastic ataxia 8, autosomal recessive, with hypomyelinating leukodystrophy', 'FEMALE'): 15
('Marfan lipodystrophy syndrome', 'FEMALE'): 15
('Noonan syndrome 1', 'MALE'): 14
('Sulfite oxidase deficiency', 'MALE'): 14
('Spastic paraplegia 91, autosomal dominant, with or without cerebellar ataxia', 'MALE'): 13
('Developmental and epileptic encephalopathy 112', 'FEMALE'): 13
('Noonan syndrome 1', 'FEMALE'): 13
('Albinism, oculocutaneous, type IV', 'MALE'): 13
('Neurodevelopmental disorder with motor and language delay, ocular defects, and brain abnormalities', 'FEMALE'): 13
('Developmental and epileptic encephalopathy 5', 'FEMALE'): 13
('LEOPARD syndrome 1', 'FEMALE'): 13
('Loeys-Dietz syndrome 2', 'MALE'): 13
('Kufor-Rakeb syndrome', 'FEMALE'): 12
('Ataxia-pancytopenia syndrome', 'MALE'): 12
('Autoinflammatory syndrome, familial, with or without immunodeficiency', 'FEMALE'): 12
('Neurodevelopmental disorder with or without anomalies of the brain, eye, or heart', 'MALE'): 12
('Hypotonia, infantile, with psychomotor retardation and characteristic facies 3', 'FEMALE'): 12
('Acrofacial dysostosis, Cincinnati type', 'MALE'): 11
('Noonan syndrome 2', 'FEMALE'): 11
('Sulfite oxidase deficiency', 'FEMALE'): 11
('HMG-CoA synthase-2 deficiency', 'MALE'): 11
('Hypotonia, infantile, with psychomotor retardation and characteristic facies 3', 'MALE'): 11
('Neurodevelopmental disorder with or without variable brain abnormalities', 'MALE'): 11
('Autoimmune polyendocrinopathy syndrome , type I, with or without reversible metaphyseal dysplasia', 'FEMALE'): 11
('Neurodevelopmental disorder with progressive microcephaly, spasticity, and brain anomalies', 'MALE'): 10
('Spastic paraplegia 76, autosomal recessive', 'FEMALE'): 10
('Coffin-Siris syndrome 3', 'FEMALE'): 10
('Noonan syndrome 6', 'MALE'): 10
('Loeys-Dietz syndrome 6', 'FEMALE'): 10
('Cornelia de Lange syndrome 6', 'MALE'): 10
('EZH1-related neurodevelopmental disorder', 'FEMALE'): 10
('Multiple mitochondrial dysfunctions syndrome 4', 'FEMALE'): 9
('Intellectual developmental disorder, autosomal dominant 70', 'MALE'): 9
('Neurodevelopmental disorder with or without variable brain abnormalities', 'FEMALE'): 9
('Developmental and epileptic encephalopathy 5', 'MALE'): 9
('Distal renal tubular acidosis 1', 'FEMALE'): 9
('Developmental and epileptic encephalopathy 112', 'MALE'): 9
('Noonan syndrome 2', 'MALE'): 9
('Parkinson disease 15, autosomal recessive', 'MALE'): 9
('Ataxia-pancytopenia syndrome', 'FEMALE'): 9
('Muscular dystrophy, limb-girdle, autosomal recessive 28', 'MALE'): 9
('Immunoskeletal dysplasia with neurodevelopmental abnormalitie', 'FEMALE'): 9
('Joubert syndrome 10', 'MALE'): 9
('Contractural arachnodactyly, congenital', 'FEMALE'): 9
Inference
[32]:
from linkml_store.inference import get_inference_engine
predictor = get_inference_engine("sklearn")
[33]:
predictor.load_and_split_data(collection)
[ ]:
predictor.config.target_attributes = ["diseases.term.label"]
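load_and_split_data is called with the collection; we assume it separates the records into training and test sets, though this notebook does not show its behavior. The splitting idea itself, independent of the library, is a seeded shuffle followed by a cut. split_records and its defaults are hypothetical, not the engine's actual implementation; a fixed seed keeps the split reproducible across notebook runs:

```python
import random
from typing import List, Sequence, Tuple


def split_records(records: Sequence, test_fraction: float = 0.2, seed: int = 42) -> Tuple[List, List]:
    """Shuffle a copy of records with a fixed seed and split it into (train, test)."""
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)  # never reorder the caller's data
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Usage, assuming `objs` from the loading cell:
# train, test = split_records(objs, test_fraction=0.2)
```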