How to index Phenopackets with LinkML-Store

Use pystow to download phenopackets

We will download the latest release of all phenopackets from the Monarch Initiative phenopacket-store.

[1]:
import pandas as pd
import pystow
import yaml

path = pystow.ensure_untar("tmp", "phenopackets", url="https://github.com/monarch-initiative/phenopacket-store/releases/latest/download/all_phenopackets.tgz")
[2]:
# Recursively walk the download directory with os.walk and parse every *.json file
# into a Python object (nothing is loaded into a database yet)
import os
import json
objs = []
for root, dirs, files in os.walk(path):
    for file in files:
        if file.endswith(".json"):
            with open(os.path.join(root, file)) as stream:
                obj = json.load(stream)
                objs.append(obj)
len(objs)
[2]:
4876

Creating a client and attaching to a database

First we will create a client as normal:

[3]:
from linkml_store import Client

client = Client()

Next we’ll attach to a MongoDB instance. This assumes you already have one running.

We will make a database called “phenopackets”, recreating it if it already exists.

(Note for people running this notebook locally: if your current MongoDB instance already has a database with this name, it will be deleted!)

[4]:
db = client.attach_database("mongodb://localhost:27017", "phenopackets", recreate_if_exists=True)
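
Because the cell above assumes a MongoDB server is already running, it can help to check first that something is listening at that address. The following is a minimal sketch using only the standard library; the helper name `mongo_port_is_open` is hypothetical, and the check only confirms that the TCP port accepts connections, not that the service behind it is MongoDB.

```python
import socket
from urllib.parse import urlparse


def mongo_port_is_open(uri: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the host and port in `uri` succeeds."""
    parsed = urlparse(uri)
    host = parsed.hostname or "localhost"
    port = parsed.port or 27017  # 27017 is MongoDB's default port
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out
        return False


print(mongo_port_is_open("mongodb://localhost:27017"))
```

If this prints `False`, start the server before running `attach_database`.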

Creating a collection

We’ll create a simple test collection. A linkml-store collection maps directly to a MongoDB collection.

[5]:
collection = db.create_collection("main", recreate_if_exists=True)

Inserting objects into the store

We’ll use the standard insert method to insert the phenopackets into the collection. At this stage there is no explicit schema.

[6]:
collection.insert(objs)
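
Since there is no explicit schema at this stage, one quick way to get a feel for the data is to count which top-level keys the objects actually carry. This is a plain-Python sketch, independent of linkml-store; the helper name `top_level_key_counts` and the two stand-in records are illustrative only.

```python
from collections import Counter


def top_level_key_counts(records):
    """Count how many records contain each top-level key."""
    counts = Counter()
    for record in records:
        counts.update(record.keys())
    return counts


# Tiny stand-in records; in the notebook you would pass `objs` instead.
sample = [
    {"id": "a", "subject": {"sex": "MALE"}, "diseases": []},
    {"id": "b", "subject": {"sex": "FEMALE"}},
]
print(top_level_key_counts(sample).most_common())
```

Keys whose count is lower than `len(objs)` are optional in practice, which is useful to know before faceting on them.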

Check contents

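We can check the number of rows in the collection to make sure everything was inserted. As the output shows, num_rows reports the total number of matching rows rather than the number fetched, so limit=1 keeps the check cheap: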

[7]:
collection.find({}, limit=1).num_rows
[7]:
4876
[8]:
assert collection.find({}, limit=1).num_rows == len(objs)

Let’s check with pandas just to make sure it looks as expected; we’ll query for a specific OMIM disease:

[9]:
qr = collection.find({"diseases.term.id": "OMIM:618499"}, limit=3)
qr.rows_dataframe
[9]:
id subject phenotypicFeatures interpretations diseases metaData
0 PMID_28289718_Higgins-Patient-1 {'id': 'Higgins-Patient-1', 'timeAtLastEncount... [{'type': {'id': 'HP:0001714', 'label': 'Ventr... [{'id': 'Higgins-Patient-1', 'progressStatus':... [{'term': {'id': 'OMIM:618499', 'label': 'Noon... {'created': '2024-03-28T11:11:48.590163946Z', ...
1 PMID_31173466_Suzuki-Patient-1 {'id': 'Suzuki-Patient-1', 'timeAtLastEncounte... [{'type': {'id': 'HP:0001714', 'label': 'Ventr... [{'id': 'Suzuki-Patient-1', 'progressStatus': ... [{'term': {'id': 'OMIM:618499', 'label': 'Noon... {'created': '2024-03-28T11:11:48.594725131Z', ...
2 PMID_28289718_Higgins-Patient-2 {'id': 'Higgins-Patient-2', 'timeAtLastEncount... [{'type': {'id': 'HP:0001714', 'label': 'Ventr... [{'id': 'Higgins-Patient-2', 'progressStatus':... [{'term': {'id': 'OMIM:618499', 'label': 'Noon... {'created': '2024-03-28T11:11:48.592718124Z', ...

As expected, the query returns three rows, each with the disease OMIM:618499. Because the query sets limit=3, there may be further matches.

Query faceting

We will now demonstrate faceted queries, allowing us to count the number of instances of different categorical values or categorical value combinations.

First we’ll facet on the subject sex. We can use path notation, e.g. subject.sex here:

[10]:
collection.query_facets({}, facet_columns=["subject.sex"])
[10]:
{'subject.sex': [('MALE', 1807), ('FEMALE', 1564)]}
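
To make the path notation concrete, here is a minimal pure-Python sketch of counting values along a dotted path, descending into lists on the way. It only illustrates the idea and is not how linkml-store or MongoDB computes facets; `values_at_path` and `facet_counts` are hypothetical helper names, and the records are made up.

```python
from collections import Counter


def values_at_path(obj, path):
    """Yield every value reached by following a dotted path, descending into lists."""
    if isinstance(obj, list):
        for item in obj:
            yield from values_at_path(item, path)
        return
    if not path:
        yield obj
        return
    if isinstance(obj, dict):
        key, _, rest = path.partition(".")
        if key in obj:
            yield from values_at_path(obj[key], rest)


def facet_counts(records, path):
    """Count the values found at `path` across all records, most frequent first."""
    counts = Counter()
    for record in records:
        counts.update(values_at_path(record, path))
    return counts.most_common()


records = [
    {"subject": {"sex": "MALE"}, "diseases": [{"term": {"label": "KBG syndrome"}}]},
    {"subject": {"sex": "FEMALE"}, "diseases": [{"term": {"label": "KBG syndrome"}}]},
    {"subject": {}, "diseases": [{"term": {"label": "Glass syndrome"}}]},
]
print(facet_counts(records, "subject.sex"))
print(facet_counts(records, "diseases.term.label"))
```

In this sketch a record with no value at the path contributes nothing, so the counts for a facet need not add up to the number of records.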

We can also facet by the disease name/label. We’ll restrict this to the top 20:

[11]:
collection.query_facets({}, facet_columns=["diseases.term.label"], facet_limit=20)

[11]:
{'diseases.term.label': [('Developmental and epileptic encephalopathy 4', 463),
  ('Developmental and epileptic encephalopathy 11', 342),
  ('KBG syndrome', 337),
  ('Leber congenital amaurosis 6', 191),
  ('Glass syndrome', 158),
  ('Holt-Oram syndrome', 103),
  ('Mitochondrial DNA depletion syndrome 13 (encephalomyopathic type)', 95),
  ('Neurodevelopmental disorder with coarse facies and mild distal skeletal abnormalities',
   73),
  ('Jacobsen syndrome', 69),
  ('Coffin-Siris syndrome 8', 65),
  ('Kabuki Syndrome 1', 65),
  ('Houge-Janssen syndrome 2', 60),
  ('ZTTK SYNDROME', 52),
  ('Greig cephalopolysyndactyly syndrome', 51),
  ('Seizures, benign familial infantile, 3', 51),
  ('Mitochondrial DNA depletion syndrome 6 (hepatocerebral type)', 50),
  ('Marfan syndrome', 50),
  ('Developmental delay, dysmorphic facies, and brain anomalies', 49),
  ('Loeys-Dietz syndrome 3', 49),
  ('Hypomagnesemia 3, renal', 46)]}
[12]:
collection.query_facets({}, facet_columns=["subject.timeAtLastEncounter.age.iso8601duration"], facet_limit=10)

[12]:
{'subject.timeAtLastEncounter.age.iso8601duration': [('P4Y', 131),
  ('P3Y', 114),
  ('P6Y', 100),
  ('P5Y', 97),
  ('P2Y', 95),
  ('P7Y', 85),
  ('P10Y', 82),
  ('P9Y', 77),
  ('P8Y', 71)]}
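
The age facet above counts raw ISO 8601 duration strings, so `P1Y11M` and `P2Y` land in separate buckets. If numeric ages are more convenient, the strings can be converted first. The sketch below is an approximation under stated assumptions: it handles only the date part (years, months, weeks, days), treats a month as 1/12 of a year, and `duration_to_years` is a hypothetical helper rather than part of linkml-store.

```python
import re
from collections import Counter

# Date part only (years, months, weeks, days); time components after "T" are not handled.
_DURATION = re.compile(r"^P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)W)?(?:(\d+)D)?$")


def duration_to_years(value: str) -> float:
    """Convert a date-only ISO 8601 duration such as 'P1Y11M' to approximate years."""
    match = _DURATION.match(value)
    if match is None or not any(match.groups()):
        raise ValueError(f"Unsupported duration: {value!r}")
    years, months, weeks, days = (int(g) if g else 0 for g in match.groups())
    return years + months / 12 + (weeks * 7 + days) / 365.25


# Bucket a few durations by completed years of age.
durations = ["P4Y", "P1Y11M", "P6M", "P10Y"]
by_whole_year = Counter(int(duration_to_years(d)) for d in durations)
print(sorted(by_whole_year.items()))
```
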
[13]:
collection.query_facets({}, facet_columns=["interpretations.diagnosis.genomicInterpretations.variantInterpretation.variationDescriptor.geneContext.symbol"], facet_limit=10)

[13]:
{'interpretations.diagnosis.genomicInterpretations.variantInterpretation.variationDescriptor.geneContext.symbol': [('STXBP1',
   463),
  ('SCN2A', 393),
  ('ANKRD11', 337),
  ('RPGRIP1', 273),
  ('SATB2', 158),
  ('FBN1', 151),
  ('LMNA', 127),
  ('FBXL4', 117),
  ('TBX5', 103),
  ('SPTAN1', 85)]}

We can also facet on combinations:

[14]:
fqr = collection.query_facets({}, facet_columns=[("subject.sex", "diseases.term.label")], facet_limit=20)
fqr

[14]:
{('subject.sex', 'diseases.term.label'): [(('MALE', 'KBG syndrome'), 175),
  (('FEMALE', 'KBG syndrome'), 143),
  (('MALE', 'Glass syndrome'), 90),
  (('FEMALE', 'Glass syndrome'), 62),
  (('MALE',
    'Mitochondrial DNA depletion syndrome 13 (encephalomyopathic type)'),
   58),
  (('MALE',
    'Neurodevelopmental disorder with coarse facies and mild distal skeletal abnormalities'),
   54),
  (('FEMALE', 'Jacobsen syndrome'), 49),
  (('MALE', 'Coffin-Siris syndrome 8'), 37),
  (('FEMALE',
    'Mitochondrial DNA depletion syndrome 13 (encephalomyopathic type)'),
   37),
  (('FEMALE', 'Kabuki Syndrome 1'), 35),
  (('MALE', 'Houge-Janssen syndrome 2'), 32),
  (('MALE', 'Kabuki Syndrome 1'), 30),
  (('FEMALE', 'Developmental delay, dysmorphic facies, and brain anomalies'),
   29),
  (('FEMALE', 'Holt-Oram syndrome'), 28),
  (('MALE', 'Intellectual developmental disorder, autosomal dominant 21'), 28),
  (('MALE', 'Cardiac, facial, and digital anomalies with developmental delay'),
   28),
  (('FEMALE', 'Developmental and epileptic encephalopathy 28'), 27),
  (('MALE', 'Loeys-Dietz syndrome 3'), 27),
  (('MALE', 'ZTTK SYNDROME'), 26),
  (('FEMALE', 'ZTTK SYNDROME'), 26)]}
[17]:
from linkml_store.utils.pandas_utils import facet_summary_to_dataframe_unmelted

facet_summary_to_dataframe_unmelted(fqr)
[17]:
subject.sex diseases.term.label Value
0 MALE KBG syndrome 175
1 FEMALE KBG syndrome 143
2 MALE Glass syndrome 90
3 FEMALE Glass syndrome 62
4 MALE Mitochondrial DNA depletion syndrome 13 (encep... 58
5 MALE Neurodevelopmental disorder with coarse facies... 54
6 FEMALE Jacobsen syndrome 49
7 MALE Coffin-Siris syndrome 8 37
8 FEMALE Mitochondrial DNA depletion syndrome 13 (encep... 37
9 FEMALE Kabuki Syndrome 1 35
10 MALE Houge-Janssen syndrome 2 32
11 MALE Kabuki Syndrome 1 30
12 FEMALE Developmental delay, dysmorphic facies, and br... 29
13 FEMALE Holt-Oram syndrome 28
14 MALE Intellectual developmental disorder, autosomal... 28
15 MALE Cardiac, facial, and digital anomalies with de... 28
16 FEMALE Developmental and epileptic encephalopathy 28 27
17 MALE Loeys-Dietz syndrome 3 27
18 MALE ZTTK SYNDROME 26
19 FEMALE ZTTK SYNDROME 26
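
For side-by-side comparison, the long table above can be pivoted so that each disease is a row and each sex is a column. This is a sketch with plain pandas on a handful of counts copied from the output above; the column name `count` is chosen here for illustration.

```python
import pandas as pd

# A few (sex, disease) counts copied from the combined facet output.
pairs = [
    (("MALE", "KBG syndrome"), 175),
    (("FEMALE", "KBG syndrome"), 143),
    (("MALE", "Glass syndrome"), 90),
    (("FEMALE", "Glass syndrome"), 62),
    (("FEMALE", "Jacobsen syndrome"), 49),
]

long_df = pd.DataFrame(
    [(sex, disease, count) for (sex, disease), count in pairs],
    columns=["subject.sex", "diseases.term.label", "count"],
)

# One row per disease, one column per sex. Missing combinations stay NaN.
wide_df = long_df.pivot(
    index="diseases.term.label", columns="subject.sex", values="count"
)
print(wide_df)
```

The missing cells are deliberately left as NaN rather than filled with 0: with `facet_limit=20`, a combination absent from the result may simply have fallen outside the top 20.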

Semantic Search

Let’s query based on text criteria:

[22]:
qr = collection.search("patients with liver diseases")
qr.rows_dataframe[0:5]
[22]:
score id subject phenotypicFeatures interpretations diseases metaData
0 0.824639 PMID_30658709_patient {'id': 'patient', 'timeAtLastEncounter': {'age... [{'type': {'id': 'HP:0031956', 'label': 'Eleva... [{'id': 'patient', 'progressStatus': 'SOLVED',... [{'term': {'id': 'OMIM:615878', 'label': 'Chol... {'created': '2024-05-05T09:03:25.388371944Z', ...
1 0.824639 PMID_30658709_patient {'id': 'patient', 'timeAtLastEncounter': {'age... [{'type': {'id': 'HP:0031956', 'label': 'Eleva... [{'id': 'patient', 'progressStatus': 'SOLVED',... [{'term': {'id': 'OMIM:615878', 'label': 'Chol... {'created': '2024-05-05T09:03:25.388371944Z', ...
2 0.813770 PMID_36932076_Patient_1 {'id': 'Patient 1', 'timeAtLastEncounter': {'a... [{'type': {'id': 'HP:0000979', 'label': 'Purpu... [{'id': 'Patient 1', 'progressStatus': 'SOLVED... [{'term': {'id': 'OMIM:620376', 'label': 'Auto... {'created': '2024-04-19T06:07:57.188061952Z', ...
3 0.813770 PMID_36932076_Patient_1 {'id': 'Patient 1', 'timeAtLastEncounter': {'a... [{'type': {'id': 'HP:0000979', 'label': 'Purpu... [{'id': 'Patient 1', 'progressStatus': 'SOLVED... [{'term': {'id': 'OMIM:620376', 'label': 'Auto... {'created': '2024-04-19T06:07:57.188061952Z', ...
4 0.804126 PMID_37303127_6 {'id': '6', 'timeAtLastEncounter': {'age': {'i... [{'type': {'id': 'HP:0001397', 'label': 'Hepat... [{'id': '6', 'progressStatus': 'SOLVED', 'diag... [{'term': {'id': 'OMIM:151660', 'label': 'Lipo... {'created': '2024-03-23T17:41:42.999521017Z', ...

Let’s check the first one:

[23]:
qr.ranked_rows[0]
[23]:
(0.824638728366563,
 {'id': 'PMID_30658709_patient',
  'subject': {'id': 'patient',
   'timeAtLastEncounter': {'age': {'iso8601duration': 'P1Y11M'}},
   'sex': 'FEMALE'},
  'phenotypicFeatures': [{'type': {'id': 'HP:0031956',
     'label': 'Elevated circulating aspartate aminotransferase concentration'},
    'onset': {'age': {'iso8601duration': 'P1Y11M'}}},
   {'type': {'id': 'HP:0031964',
     'label': 'Elevated circulating alanine aminotransferase concentration'},
    'onset': {'age': {'iso8601duration': 'P1Y11M'}}},
   {'type': {'id': 'HP:0003573', 'label': 'Increased total bilirubin'},
    'onset': {'age': {'iso8601duration': 'P6M'}}},
   {'type': {'id': 'HP:0012202',
     'label': 'Increased serum bile acid concentration'},
    'onset': {'age': {'iso8601duration': 'P6M'}}},
   {'type': {'id': 'HP:0002908', 'label': 'Conjugated hyperbilirubinemia'},
    'onset': {'age': {'iso8601duration': 'P6M'}}},
   {'type': {'id': 'HP:0001433', 'label': 'Hepatosplenomegaly'},
    'onset': {'age': {'iso8601duration': 'P6M'}}},
   {'type': {'id': 'HP:0001510', 'label': 'Growth delay'},
    'onset': {'age': {'iso8601duration': 'P6M'}}},
   {'type': {'id': 'HP:0000989', 'label': 'Pruritus'},
    'onset': {'age': {'iso8601duration': 'P6M'}}},
   {'type': {'id': 'HP:0000952', 'label': 'Jaundice'},
    'onset': {'age': {'iso8601duration': 'P6M'}}},
   {'type': {'id': 'HP:0100810', 'label': 'Pointed helix'},
    'onset': {'age': {'iso8601duration': 'P6M'}}},
   {'type': {'id': 'HP:0002650', 'label': 'Scoliosis'}},
   {'type': {'id': 'HP:0003112',
     'label': 'Abnormal circulating amino acid concentration'},
    'excluded': True},
   {'type': {'id': 'HP:0001928', 'label': 'Abnormality of coagulation'},
    'excluded': True},
   {'type': {'id': 'HP:0010701', 'label': 'Abnormal immunoglobulin level'},
    'excluded': True},
   {'type': {'id': 'HP:0001627', 'label': 'Abnormal heart morphology'},
    'excluded': True}],
  'interpretations': [{'id': 'patient',
    'progressStatus': 'SOLVED',
    'diagnosis': {'disease': {'id': 'OMIM:615878',
      'label': 'Cholestasis, progressive familial intrahepatic 4'},
     'genomicInterpretations': [{'subjectOrBiosampleId': 'patient',
       'interpretationStatus': 'CAUSATIVE',
       'variantInterpretation': {'variationDescriptor': {'id': 'var_kKNGnjOxGXMbcoWzDGEJKVPIB',
         'geneContext': {'valueId': 'HGNC:11828', 'symbol': 'TJP2'},
         'expressions': [{'syntax': 'hgvs.c',
           'value': 'NM_004817.4:c.2355+1G>C'},
          {'syntax': 'hgvs.g', 'value': 'NC_000009.12:g.69238790G>C'}],
         'vcfRecord': {'genomeAssembly': 'hg38',
          'chrom': 'chr9',
          'pos': '69238790',
          'ref': 'G',
          'alt': 'C'},
         'moleculeContext': 'genomic',
         'allelicState': {'id': 'GENO:0000136', 'label': 'homozygous'}}}}]}}],
  'diseases': [{'term': {'id': 'OMIM:615878',
     'label': 'Cholestasis, progressive familial intrahepatic 4'},
    'onset': {'ontologyClass': {'id': 'HP:0003593',
      'label': 'Infantile onset'}}}],
  'metaData': {'created': '2024-05-05T09:03:25.388371944Z',
   'createdBy': 'ORCID:0000-0002-0736-9199',
   'resources': [{'id': 'geno',
     'name': 'Genotype Ontology',
     'url': 'http://purl.obolibrary.org/obo/geno.owl',
     'version': '2022-03-05',
     'namespacePrefix': 'GENO',
     'iriPrefix': 'http://purl.obolibrary.org/obo/GENO_'},
    {'id': 'hgnc',
     'name': 'HUGO Gene Nomenclature Committee',
     'url': 'https://www.genenames.org',
     'version': '06/01/23',
     'namespacePrefix': 'HGNC',
     'iriPrefix': 'https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/'},
    {'id': 'omim',
     'name': 'An Online Catalog of Human Genes and Genetic Disorders',
     'url': 'https://www.omim.org',
     'version': 'January 4, 2023',
     'namespacePrefix': 'OMIM',
     'iriPrefix': 'https://www.omim.org/entry/'},
    {'id': 'so',
     'name': 'Sequence types and features ontology',
     'url': 'http://purl.obolibrary.org/obo/so.obo',
     'version': '2021-11-22',
     'namespacePrefix': 'SO',
     'iriPrefix': 'http://purl.obolibrary.org/obo/SO_'},
    {'id': 'hp',
     'name': 'human phenotype ontology',
     'url': 'http://purl.obolibrary.org/obo/hp.owl',
     'version': '2024-04-26',
     'namespacePrefix': 'HP',
     'iriPrefix': 'http://purl.obolibrary.org/obo/HP_'}],
   'phenopacketSchemaVersion': '2.0',
   'externalReferences': [{'id': 'PMID:30658709',
     'reference': 'https://pubmed.ncbi.nlm.nih.gov/30658709',
     'description': 'Novel compound heterozygote mutations of TJP2 in a Chinese child with progressive cholestatic liver disease'}]}})
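
The full record is long, and for judging a search hit the phenotype labels are often enough. The sketch below splits a phenopacket's features into observed and excluded labels, relying only on the structure visible in the record above (`phenotypicFeatures`, `type.label`, and the optional `excluded` flag). `split_phenotypes` is a hypothetical helper, and the example is an abbreviated copy of that record.

```python
def split_phenotypes(phenopacket):
    """Return (observed, excluded) lists of phenotype labels from a phenopacket dict."""
    observed, excluded = [], []
    for feature in phenopacket.get("phenotypicFeatures", []):
        label = feature.get("type", {}).get("label")
        # A feature counts as observed unless it is explicitly marked excluded.
        if feature.get("excluded", False):
            excluded.append(label)
        else:
            observed.append(label)
    return observed, excluded


# Abbreviated copy of the top-ranked record shown above.
example = {
    "id": "PMID_30658709_patient",
    "phenotypicFeatures": [
        {"type": {"id": "HP:0000952", "label": "Jaundice"}},
        {"type": {"id": "HP:0002650", "label": "Scoliosis"}},
        {"type": {"id": "HP:0001627", "label": "Abnormal heart morphology"},
         "excluded": True},
    ],
}
observed, excluded = split_phenotypes(example)
print("observed:", observed)
print("excluded:", excluded)
```

In the notebook, each entry of `qr.ranked_rows` is a `(score, row)` tuple as shown above, so the row can be passed to the helper after unpacking.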

We can combine semantic search with queries:

[24]:
qr = collection.search("patients with liver diseases", where={"subject.sex": "MALE"})
qr.rows_dataframe[0:5]
[24]:
score id subject phenotypicFeatures interpretations diseases metaData
0 0.813827 PMID_36932076_Patient_1 {'id': 'Patient 1', 'timeAtLastEncounter': {'a... [{'type': {'id': 'HP:0000979', 'label': 'Purpu... [{'id': 'Patient 1', 'progressStatus': 'SOLVED... [{'term': {'id': 'OMIM:620376', 'label': 'Auto... {'created': '2024-04-19T06:07:57.188061952Z', ...
1 0.813827 PMID_36932076_Patient_1 {'id': 'Patient 1', 'timeAtLastEncounter': {'a... [{'type': {'id': 'HP:0000979', 'label': 'Purpu... [{'id': 'Patient 1', 'progressStatus': 'SOLVED... [{'term': {'id': 'OMIM:620376', 'label': 'Auto... {'created': '2024-04-19T06:07:57.188061952Z', ...
2 0.799738 PMID_36932076_Patient_3 {'id': 'Patient 3', 'timeAtLastEncounter': {'a... [{'type': {'id': 'HP:0001511', 'label': 'Intra... [{'id': 'Patient 3', 'progressStatus': 'SOLVED... [{'term': {'id': 'OMIM:620376', 'label': 'Auto... {'created': '2024-04-19T06:07:57.190312862Z', ...
3 0.799738 PMID_36932076_Patient_3 {'id': 'Patient 3', 'timeAtLastEncounter': {'a... [{'type': {'id': 'HP:0001511', 'label': 'Intra... [{'id': 'Patient 3', 'progressStatus': 'SOLVED... [{'term': {'id': 'OMIM:620376', 'label': 'Auto... {'created': '2024-04-19T06:07:57.190312862Z', ...
4 0.799243 PMID_27536553_27536553_P3 {'id': '27536553_P3', 'timeAtLastEncounter': {... [{'type': {'id': 'HP:0001396', 'label': 'Chole... [{'id': '27536553_P3', 'progressStatus': 'SOLV... [{'term': {'id': 'OMIM:256810', 'label': 'Mito... {'created': '2024-03-23T19:28:35.688389062Z', ...
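
Both result tables above list some ids twice (for example PMID_36932076_Patient_1). The notebook does not say why, but one way to investigate is to check whether the loaded objects themselves contain repeated ids. The following is a plain-Python sketch; `repeated_ids` is a hypothetical helper and the sample records are made up.

```python
from collections import Counter


def repeated_ids(records, key="id"):
    """Return {id: occurrences} for every id that appears more than once."""
    counts = Counter(record[key] for record in records if key in record)
    return {value: n for value, n in counts.items() if n > 1}


# Stand-in records; in the notebook you would call repeated_ids(objs).
sample_records = [
    {"id": "PMID_1_patient_A"},
    {"id": "PMID_1_patient_B"},
    {"id": "PMID_1_patient_A"},
]
print(repeated_ids(sample_records))
```

An empty result for `objs` would suggest the repetition is introduced later, for instance at insert or search time, rather than by the download.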

Validation

Next we will demonstrate validation over a whole collection.

Validation currently depends on a LinkML schema. We previously copied the Phenopackets schema into the tests folder, so we can load it into the database object:

[25]:
db.load_schema_view("../../tests/input/schemas/phenopackets_linkml/phenopackets.yaml")

Quick sanity check to ensure that worked:

[26]:
list(db.schema_view.all_classes())[0:10]
[26]:
['Age',
 'AgeRange',
 'Dictionary',
 'Evidence',
 'ExternalReference',
 'File',
 'GestationalAge',
 'OntologyClass',
 'Procedure',
 'TimeElement']
[27]:
# Declare which schema class the objects in this collection instantiate
collection.metadata.type = "Phenopacket"
[28]:
from linkml_runtime.dumpers import yaml_dumper
for r in db.iter_validate_database():
    # known issue - https://github.com/monarch-initiative/phenopacket-store/issues/97
    if "is not of type 'integer'" in r.message:
        continue
    print(r.message[0:100])
    print(r)
    raise ValueError("Unexpected validation error")
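
The loop above stops at the first unexpected error. When exploring a new dataset it can be more informative to tally all messages first and then decide which ones matter. The sketch below assumes only that each result exposes a string `message` attribute, as the cell above uses; `tally_messages` is a hypothetical helper, and the stand-in results are built with `SimpleNamespace` so the example runs without a database.

```python
from collections import Counter
from types import SimpleNamespace


def tally_messages(results, width=60):
    """Count validation results by the first `width` characters of their message."""
    return Counter(r.message[:width] for r in results)


# Stand-in results; in the notebook you would pass db.iter_validate_database().
fake_results = [
    SimpleNamespace(message="'69238790' is not of type 'integer'"),
    SimpleNamespace(message="'69238790' is not of type 'integer'"),
    SimpleNamespace(message="'id' is a required property"),
]
for message, n in tally_messages(fake_results).most_common():
    print(n, message)
```

Truncating to `width` characters groups long messages by their prefix. Messages that embed different values near the start will still be counted separately.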

Command Line Usage

We can also use the command line for all of the above operations.

For example, faceted queries:

[29]:
!linkml-store -d mongodb://localhost:27017 -c main fq -S subject.sex
{
  "subject.sex": {
    "MALE": 1807,
    "FEMALE": 1564
  }
}
[30]:
!linkml-store -d mongodb://localhost:27017 -c main fq -S phenotypicFeatures.type.label -O yaml

phenotypicFeatures.type.label:
  Global developmental delay: 1705
  Hypotonia: 1056
  Intellectual disability: 1028
  Seizure: 950
  Hypertelorism: 925
  Delayed speech and language development: 829
  Short stature: 806
  Microcephaly: 780
  Scoliosis: 702
  Feeding difficulties: 678
  Low-set ears: 598
  Autistic behavior: 519
  Motor delay: 518
  Downslanted palpebral fissures: 505
  Strabismus: 504
  Long philtrum: 500
  Ptosis: 498
  Patent foramen ovale: 469
  Anteverted nares: 461
  Hearing impairment: 451
  Epicanthus: 447
  Ventricular septal defect: 435
  Thick eyebrow: 433
  Cleft palate: 423
  Joint hypermobility: 388
  High palate: 383
  Triangular face: 369
  Micrognathia: 364
  Posteriorly rotated ears: 350
  Failure to thrive: 345
  Prominent forehead: 343
  Thin upper lip vermilion: 338
  Sleep abnormality: 331
  Wide nasal bridge: 331
  Infantile spasms: 325
  Long eyelashes: 325
  Pectus excavatum: 322
  Ataxia: 319
  Pes planus: 315
  Bilateral tonic-clonic seizure: 314
  Bulbous nose: 311
  Intellectual disability, severe: 306
  Nystagmus: 298
  Absent speech: 294
  Midface retrusion: 290
  Bicuspid aortic valve: 288
  Deeply set eye: 283
  Delayed ability to walk: 282
  Pulmonic stenosis: 280
  Cryptorchidism: 279
  Talipes equinovarus: 277
  Attention deficit hyperactivity disorder: 275
  Recurrent otitis media: 275
  Macrocephaly: 275
  Abnormality of the hand: 273
  Depressed nasal bridge: 273
  Autism: 270
  Macrodontia: 266
  Dystonia: 265
  Narrow forehead: 261
  Smooth philtrum: 249
  Microtia: 248
  Inguinal hernia: 247
  Upslanted palpebral fissure: 246
  Ventriculomegaly: 240
  Synophrys: 236
  Cerebellar atrophy: 234
  Ectopia lentis: 234
  Thin corpus callosum: 231
  EEG abnormality: 230
  Short philtrum: 226
  Arachnodactyly: 224
  Short neck: 223
  Highly arched eyebrow: 221
  Epileptic encephalopathy: 219
  Developmental regression: 218
  Generalized tonic seizure: 218
  Protruding ear: 217
  Atrial septal defect: 213
  Umbilical hernia: 213
  Cerebral atrophy: 212
  Atrioventricular canal defect: 206
  Low anterior hairline: 203
  Mitral valve prolapse: 199
  Focal impaired awareness seizure: 199
  Delayed skeletal maturation: 198
  Hypsarrhythmia: 198
  Intrauterine growth retardation: 196
  Hypoplasia of the corpus callosum: 192
  Spasticity: 192
  Growth delay: 186
  Aortic root aneurysm: 181
  Severe global developmental delay: 173
  Multifocal epileptiform discharges: 169
  Mandibular prognathia: 167
  Dysarthria: 167
  Patent ductus arteriosus: 166
  Blue sclerae: 166
  Proptosis: 164
  Cataract: 162

[31]:
!linkml-store -d mongodb://localhost:27017 -c main fq -S diseases.term.label+subject.sex -O yaml

diseases.term.label+subject.sex:
  ('KBG syndrome', 'MALE'): 175
  ('KBG syndrome', 'FEMALE'): 143
  ('Glass syndrome', 'MALE'): 90
  ('Glass syndrome', 'FEMALE'): 62
  ('Mitochondrial DNA depletion syndrome 13 (encephalomyopathic type)', 'MALE'): 58
  ('Neurodevelopmental disorder with coarse facies and mild distal skeletal abnormalities', 'MALE'): 54
  ('Jacobsen syndrome', 'FEMALE'): 49
  ('Coffin-Siris syndrome 8', 'MALE'): 37
  ('Mitochondrial DNA depletion syndrome 13 (encephalomyopathic type)', 'FEMALE'): 37
  ('Kabuki Syndrome 1', 'FEMALE'): 35
  ('Houge-Janssen syndrome 2', 'MALE'): 32
  ('Kabuki Syndrome 1', 'MALE'): 30
  ('Developmental delay, dysmorphic facies, and brain anomalies', 'FEMALE'): 29
  ('Intellectual developmental disorder, autosomal dominant 21', 'MALE'): 28
  ('Holt-Oram syndrome', 'FEMALE'): 28
  ('Cardiac, facial, and digital anomalies with developmental delay', 'MALE'): 28
  ('Loeys-Dietz syndrome 3', 'MALE'): 27
  ('Developmental and epileptic encephalopathy 28', 'FEMALE'): 27
  ('ZTTK SYNDROME', 'FEMALE'): 26
  ('ZTTK SYNDROME', 'MALE'): 26
  ('Loeys-Dietz syndrome 4', 'MALE'): 26
  ('Marfan syndrome', 'MALE'): 26
  ('Hypomagnesemia 3, renal', 'MALE'): 26
  ('Intellectual developmental disorder, X-linked 112', 'MALE'): 26
  ('Mitochondrial DNA depletion syndrome 6 (hepatocerebral type)', 'MALE'): 26
  ('Marfan syndrome', 'FEMALE'): 24
  ('Ectopia lentis, familial', 'MALE'): 24
  ('Coffin-Siris syndrome 8', 'FEMALE'): 24
  ('Mitochondrial DNA depletion syndrome 6 (hepatocerebral type)', 'FEMALE'): 24
  ('Houge-Janssen syndrome 2', 'FEMALE'): 24
  ('Cardiomyopathy, dilated, 1A', 'MALE'): 23
  ('Loeys-Dietz syndrome 5', 'MALE'): 23
  ('Holt-Oram syndrome', 'MALE'): 22
  ('Mitochondrial complex IV deficiency, nuclear type 2', 'MALE'): 22
  ('Loeys-Dietz syndrome 3', 'FEMALE'): 22
  ('Cardiomyopathy, dilated, 1A', 'FEMALE'): 21
  ('Kufor-Rakeb syndrome', 'MALE'): 21
  ('Jacobsen syndrome', 'MALE'): 20
  ('Developmental delay, dysmorphic facies, and brain anomalies', 'MALE'): 20
  ('Ectopia lentis, familial', 'FEMALE'): 20
  ('Ehlers-Danlos syndrome, vascular type', 'FEMALE'): 20
  ('Loeys-Dietz syndrome 5', 'FEMALE'): 20
  ('Neurodevelopmental disorder with coarse facies and mild distal skeletal abnormalities', 'FEMALE'): 19
  ('Hypomagnesemia 3, renal', 'FEMALE'): 19
  ('Intellectual developmental disorder, autosomal dominant 21', 'FEMALE'): 18
  ('Acrofacial dysostosis 1, Nager type', 'FEMALE'): 18
  ('LEOPARD syndrome 1', 'MALE'): 18
  ('Anemia, sideroblastic, and spinocerebellar ataxia', 'MALE'): 18
  ('Spastic ataxia 8, autosomal recessive, with hypomyelinating leukodystrophy', 'MALE'): 18
  ('Albinism, oculocutaneous, type IV', 'FEMALE'): 17
  ('Cardiac, facial, and digital anomalies with developmental delay', 'FEMALE'): 17
  ('Developmental and epileptic encephalopathy 28', 'MALE'): 16
  ('Developmental delay with or without epilepsy', 'MALE'): 16
  ('Aarskog-Scott syndrome', 'MALE'): 16
  ('Ehlers-Danlos syndrome, vascular type', 'MALE'): 15
  ('Spastic paraplegia 91, autosomal dominant, with or without cerebellar ataxia', 'FEMALE'): 15
  ('Spastic ataxia 8, autosomal recessive, with hypomyelinating leukodystrophy', 'FEMALE'): 15
  ('Marfan lipodystrophy syndrome', 'FEMALE'): 15
  ('Noonan syndrome 1', 'MALE'): 14
  ('Sulfite oxidase deficiency', 'MALE'): 14
  ('Spastic paraplegia 91, autosomal dominant, with or without cerebellar ataxia', 'MALE'): 13
  ('Developmental and epileptic encephalopathy 112', 'FEMALE'): 13
  ('Noonan syndrome 1', 'FEMALE'): 13
  ('Albinism, oculocutaneous, type IV', 'MALE'): 13
  ('Neurodevelopmental disorder with motor and language delay, ocular defects, and brain abnormalities', 'FEMALE'): 13
  ('Developmental and epileptic encephalopathy 5', 'FEMALE'): 13
  ('LEOPARD syndrome 1', 'FEMALE'): 13
  ('Loeys-Dietz syndrome 2', 'MALE'): 13
  ('Kufor-Rakeb syndrome', 'FEMALE'): 12
  ('Ataxia-pancytopenia syndrome', 'MALE'): 12
  ('Autoinflammatory syndrome, familial, with or without immunodeficiency', 'FEMALE'): 12
  ('Neurodevelopmental disorder with or without anomalies of the brain, eye, or heart', 'MALE'): 12
  ('Hypotonia, infantile, with psychomotor retardation and characteristic facies 3', 'FEMALE'): 12
  ('Acrofacial dysostosis, Cincinnati type', 'MALE'): 11
  ('Noonan syndrome 2', 'FEMALE'): 11
  ('Sulfite oxidase deficiency', 'FEMALE'): 11
  ('HMG-CoA synthase-2 deficiency', 'MALE'): 11
  ('Hypotonia, infantile, with psychomotor retardation and characteristic facies 3', 'MALE'): 11
  ('Neurodevelopmental disorder with or without variable brain abnormalities', 'MALE'): 11
  ('Autoimmune polyendocrinopathy syndrome , type I, with or without reversible metaphyseal dysplasia', 'FEMALE'): 11
  ('Neurodevelopmental disorder with progressive microcephaly, spasticity, and brain anomalies', 'MALE'): 10
  ('Spastic paraplegia 76, autosomal recessive', 'FEMALE'): 10
  ('Coffin-Siris syndrome 3', 'FEMALE'): 10
  ('Noonan syndrome 6', 'MALE'): 10
  ('Loeys-Dietz syndrome 6', 'FEMALE'): 10
  ('Cornelia de Lange syndrome 6', 'MALE'): 10
  ('EZH1-related neurodevelopmental disorder', 'FEMALE'): 10
  ('Multiple mitochondrial dysfunctions syndrome 4', 'FEMALE'): 9
  ('Intellectual developmental disorder, autosomal dominant 70', 'MALE'): 9
  ('Neurodevelopmental disorder with or without variable brain abnormalities', 'FEMALE'): 9
  ('Developmental and epileptic encephalopathy 5', 'MALE'): 9
  ('Distal renal tubular acidosis 1', 'FEMALE'): 9
  ('Developmental and epileptic encephalopathy 112', 'MALE'): 9
  ('Noonan syndrome 2', 'MALE'): 9
  ('Parkinson disease 15, autosomal recessive', 'MALE'): 9
  ('Ataxia-pancytopenia syndrome', 'FEMALE'): 9
  ('Muscular dystrophy, limb-girdle, autosomal recessive 28', 'MALE'): 9
  ('Immunoskeletal dysplasia with neurodevelopmental abnormalitie', 'FEMALE'): 9
  ('Joubert syndrome 10', 'MALE'): 9
  ('Contractural arachnodactyly, congenital', 'FEMALE'): 9

Inference

[32]:
from linkml_store.inference import get_inference_engine

predictor = get_inference_engine("sklearn")
[33]:
predictor.load_and_split_data(collection)

[ ]:
predictor.config.target_attributes = ["diseases.term.label"]