{ "cells": [ { "cell_type": "markdown", "id": "113e1f5d2f048e03", "metadata": {}, "source": [ "# Perform LLM Inference\n", "\n", "This notebook demonstrates how to perform inference using LLMs.\n", "\n", "Whereas the [previous RAG example](Perform-RAG-Inference.ipynb) used existing examples,\n", "this will perform de-novo inference using a schema.\n", "\n", "Note that linkml-store is a data-first framework, the main emphasis is not on AI or LLMs. However, it does support a pluggable **Inference** framework, and one of the integrations is a simple LLM-based inference engine.\n", "\n", "For this notebook, we will be using the command line interface, but the same can be done programmatically using the Python API." ] }, { "cell_type": "markdown", "id": "966de1b52f388b87", "metadata": {}, "source": [ "## Loading the data into duckdb\n", "\n", "For this we will take all uniprot \"caution\" free text comments for human proteins and load them into a duckdb database." ] }, { "cell_type": "code", "execution_count": 1, "id": "da1ed3b6811477ee", "metadata": { "jupyter": { "is_executing": true } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Inserted 2390 objects from ../../tests/input/uniprot/uniprot-comments.tsv into collection 'Entry'.\n" ] } ], "source": [ "%%bash\n", "mkdir -p tmp\n", "rm -rf tmp/up.ddb\n", "linkml-store -d duckdb:///tmp/up.ddb -c Entry insert ../../tests/input/uniprot/uniprot-comments.tsv" ] }, { "cell_type": "markdown", "id": "88191ea890186dc9", "metadata": {}, "source": [ "Let's check what this looks like by using `describe` and examining the first entry:" ] }, { "cell_type": "code", "execution_count": 2, "id": "af9d9160e75afed4", "metadata": { "jupyter": { "is_executing": true } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " count unique top freq\n", "category 2390 1 2390\n", "id 2390 2284 EFC2_HUMAN 4\n", "text 2390 1383 Could be the product of a pseudogene 259\n" ] } ], "source": [ "%%bash\n", "linkml-store -d tmp/up.ddb describe" ] }, { "cell_type": "markdown", "id": "8a5c5a6e-c327-412f-9fa7-3720528f7ce9", "metadata": {}, "source": [ "## Introspecting the schema\n", "\n", "Here we will use a ready-made LinkML schema that has the categories we want to assign as a LinkML enum, with\n", "examples (examples in schemas help humans *and* LLMs)" ] }, { "cell_type": "code", "execution_count": 4, "id": "9d75e40f-37b5-45b9-9bb7-ff69486348b7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "name: uniprot-comments\n", "id: http://example.org/uniprot-comments\n", "imports:\n", "- linkml:types\n", "prefixes:\n", " linkml:\n", " prefix_prefix: linkml\n", " prefix_reference: https://w3id.org/linkml/\n", " up:\n", " prefix_prefix: up\n", " prefix_reference: http://example.org/tuniprot-comments\n", "default_prefix: up\n", "default_range: string\n", "enums:\n", " CommentCategory:\n", " name: CommentCategory\n", " permissible_values:\n", " FUNCTION_DISPUTED:\n", " text: FUNCTION_DISPUTED\n", " description: A caution indicating that a previously reported function has\n", " been challenged or disproven in subsequent studies; may warrant GO NOT annotation\n", " examples:\n", " - value: FUNCTION_DISPUTED\n", " description: Originally described for its in vitro hydrolytic activity towards\n", " dGMP, dAMP and dIMP. However, this was not confirmed in vivo\n", " FUNCTION_PREDICTION_ONLY:\n", " text: FUNCTION_PREDICTION_ONLY\n", " description: A caution indicating function is based only on computational\n", " prediction or sequence similarity\n", " examples:\n", " - value: FUNCTION_PREDICTION_ONLY\n", " description: Predicted to be involved in X based on sequence similarity\n", " FUNCTION_LACKS_EVIDENCE:\n", " text: FUNCTION_LACKS_EVIDENCE\n", " description: A caution indicating insufficient experimental evidence to support\n", " predicted function\n", " examples:\n", " - value: FUNCTION_LACKS_EVIDENCE\n", " description: In contrast to other Macro-domain containing proteins, lacks\n", " ADP-ribose glycohydrolase activity\n", " FUNCTION_DEBATED:\n", " text: FUNCTION_DEBATED\n", " description: A caution about ongoing scientific debate regarding function;\n", " differs from DISPUTED in lacking clear evidence against\n", " examples:\n", " - value: FUNCTION_DEBATED\n", " description: Was initially thought to act as a major regulator of cardiac\n", " hypertrophy... However, while PDE5A regulates nitric-oxide-generated cGMP,\n", " nitric oxide signaling is often depressed by heart disease, limiting its\n", " effect\n", " LOCALIZATION_DISPUTED:\n", " text: LOCALIZATION_DISPUTED\n", " description: A caution about conflicting or uncertain cellular localization\n", " evidence\n", " examples:\n", " - value: LOCALIZATION_DISPUTED\n", " description: Cellular localization remains to be finally defined. While\n", " most authors have deduced a localization at the basolateral side, other\n", " studies demonstrated an apical localization\n", " NAMING_CONFUSION:\n", " text: NAMING_CONFUSION\n", " description: A caution about potential confusion with similarly named proteins\n", " or historical naming issues\n", " examples:\n", " - value: NAMING_CONFUSION\n", " description: This protein should not be confused with the conventional myosin-1\n", " (MYH1);Was termed importin alpha-4\n", " GENE_COPY_NUMBER:\n", " text: GENE_COPY_NUMBER\n", " description: A caution about gene duplication or copy number that might affect\n", " interpretation\n", " examples:\n", " - value: GENE_COPY_NUMBER\n", " description: Maps to a duplicated region on chromosome 15; the gene is present\n", " in at least 3 almost identical copies\n", " EXPRESSION_MECHANISM_UNCLEAR:\n", " text: EXPRESSION_MECHANISM_UNCLEAR\n", " description: A caution about unclear or unusual mechanisms of gene expression\n", " or protein production\n", " examples:\n", " - value: EXPRESSION_MECHANISM_UNCLEAR\n", " description: This peptide has been shown to be biologically active but is\n", " the product of a mitochondrial gene. The mechanisms allowing the production\n", " and secretion of the peptide remain unclear\n", " SEQUENCE_FEATURE_MISSING:\n", " text: SEQUENCE_FEATURE_MISSING\n", " description: A caution about missing or unexpected sequence features that\n", " might affect function\n", " examples:\n", " - value: SEQUENCE_FEATURE_MISSING\n", " description: No predictable signal peptide\n", " SPECIES_DIFFERENCE:\n", " text: SPECIES_DIFFERENCE\n", " description: A caution about significant functional or property differences\n", " between orthologs\n", " examples:\n", " - value: SPECIES_DIFFERENCE\n", " description: Affinity and capacity of the transporter for endogenous substrates\n", " vary among orthologs. For endogenous compounds such as dopamine, histamine,\n", " serotonin and thiamine, mouse ortholog display higher affinity\n", " PUBLICATION_CONFLICT:\n", " text: PUBLICATION_CONFLICT\n", " description: A caution about conflicting published evidence or interpretation\n", " examples:\n", " - value: PUBLICATION_CONFLICT\n", " description: Although initially reported to transport carnitine across the\n", " hepatocyte membrane, another study was unable to verify this finding\n", " CLAIMS_RETRACTED:\n", " text: CLAIMS_RETRACTED\n", " description: A caution about function claims that were retracted or withdrawn\n", " examples:\n", " - value: CLAIMS_RETRACTED\n", " description: Has been reported to enhance netrin-induced phosphorylation\n", " of PAK1 and FYN... This article has been withdrawn by the authors\n", " PROTEIN_IDENTITY:\n", " text: PROTEIN_IDENTITY\n", " description: A caution about uncertainty in the identity or existence of distinct\n", " protein products\n", " examples:\n", " - value: PROTEIN_IDENTITY\n", " description: It is not known whether the so-called human ASE1 and human\n", " CAST proteins represent two sides of a single gene product\n", " FUNCTION_UNCERTAIN_INITIATION:\n", " text: FUNCTION_UNCERTAIN_INITIATION\n", " description: A caution about uncertainty in translation initiation site\n", " examples:\n", " - value: FUNCTION_UNCERTAIN_INITIATION\n", " description: It is uncertain whether Met-1 or Met-37 is the initiator\n", " PSEUDOGENE_STATUS:\n", " text: PSEUDOGENE_STATUS\n", " description: A caution about whether the gene encodes a protein\n", " examples:\n", " - value: PSEUDOGENE_STATUS\n", " description: Could be the product of a pseudogene\n", "classes:\n", " Entry:\n", " name: Entry\n", " attributes:\n", " id:\n", " name: id\n", " identifier: true\n", " category:\n", " name: category\n", " range: CommentCategory\n", " text:\n", " name: text\n", " description: The text of the comment\n", "source_file: ../../tests/input/uniprot/schema.yaml\n", "\n" ] } ], "source": [ "%%bash\n", "linkml-store -S ../../tests/input/uniprot/schema.yaml -d tmp/up.ddb schema" ] }, { "cell_type": "code", "execution_count": 52, "id": "079bba40-cb67-47b0-ba06-b2a0d43e6a86", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Inserted 15 objects from ../../tests/input/uniprot/uniprot-caution-cv.csv into collection 'cv'.\n" ] } ], "source": [ "%%bash\n", "linkml-store -d tmp/up.ddb -c cv insert ../../tests/input/uniprot/uniprot-caution-cv.csv" ] }, { "cell_type": "code", "execution_count": 53, "id": "e87bfb30-9488-45d3-a133-6fac124a842e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "TERM: FUNCTION_DISPUTED\n", "DEFINITION: A caution indicating that a previously reported function has been challenged\n", " or disproven in subsequent studies; may warrant GO NOT annotation\n", "EXAMPLES: Originally described for its in vitro hydrolytic activity towards dGMP,\n", " dAMP and dIMP. However, this was not confirmed in vivo\n", "---\n", "TERM: FUNCTION_PREDICTION_ONLY\n", "DEFINITION: A caution indicating function is based only on computational prediction\n", " or sequence similarity\n", "EXAMPLES: Predicted to be involved in X based on sequence similarity\n", "---\n", "TERM: FUNCTION_LACKS_EVIDENCE\n", "DEFINITION: A caution indicating insufficient experimental evidence to support predicted\n", " function\n", "EXAMPLES: In contrast to other Macro-domain containing proteins, lacks ADP-ribose\n", " glycohydrolase activity\n", "\n" ] } ], "source": [ "%%bash\n", "linkml-store -d tmp/up.ddb::cv query --limit 3 -O yaml" ] }, { "cell_type": "code", "execution_count": 54, "id": "45da9e5fd1353ccb", "metadata": { "ExecuteTime": { "end_time": "2025-02-04T14:34:37.309647Z", "start_time": "2024-08-21T22:53:31.595517Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "TERM: FUNCTION_DISPUTED\n", "DEFINITION: A caution indicating that a previously reported function has been challenged\n", " or disproven in subsequent studies; may warrant GO NOT annotation\n", "EXAMPLES: Originally described for its in vitro hydrolytic activity towards dGMP,\n", " dAMP and dIMP. However, this was not confirmed in vivo\n", "---\n", "TERM: FUNCTION_PREDICTION_ONLY\n", "DEFINITION: A caution indicating function is based only on computational prediction\n", " or sequence similarity\n", "EXAMPLES: Predicted to be involved in X based on sequence similarity\n", "---\n", "TERM: FUNCTION_LACKS_EVIDENCE\n", "DEFINITION: A caution indicating insufficient experimental evidence to support predicted\n", " function\n", "EXAMPLES: In contrast to other Macro-domain containing proteins, lacks ADP-ribose\n", " glycohydrolase activity\n", "\n" ] } ], "source": [ "%%bash\n", "linkml-store -d tmp/up.ddb query --limit 3 -O yaml" ] }, { "cell_type": "code", "execution_count": 55, "id": "7e75f8d0-5fda-4846-a979-416ffc2c7ab7", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2025-02-04 11:08:15,383 - linkml_store.api.client - INFO - Initializing database: phenopackets, base_dir: /Users/cjm\n", "2025-02-04 11:08:15,383 - linkml_store.api.client - INFO - Initializing database: mgi, base_dir: /Users/cjm\n", "2025-02-04 11:08:15,383 - linkml_store.api.client - INFO - Initializing database: nmdc, base_dir: /Users/cjm\n", "2025-02-04 11:08:15,383 - linkml_store.api.client - INFO - Initializing database: amigo, base_dir: /Users/cjm\n", "2025-02-04 11:08:15,383 - linkml_store.api.client - INFO - Initializing database: gocams, base_dir: /Users/cjm\n", "2025-02-04 11:08:15,383 - linkml_store.api.client - INFO - Initializing database: cadsr, base_dir: /Users/cjm\n", "2025-02-04 11:08:15,384 - linkml_store.api.client - INFO - Initializing database: mixs, base_dir: /Users/cjm\n", "2025-02-04 11:08:15,384 - linkml_store.api.client - INFO - Initializing database: mondo, base_dir: /Users/cjm\n", "2025-02-04 11:08:15,384 - linkml_store.api.client - INFO - Initializing database: hpoa, base_dir: /Users/cjm\n", "2025-02-04 11:08:15,384 - linkml_store.api.client - INFO - Initializing database: hpoa_mongo, base_dir: /Users/cjm\n", "2025-02-04 11:08:15,384 - linkml_store.api.client - INFO - Initializing database: hpoa_kg, base_dir: /Users/cjm\n", "2025-02-04 11:08:15,384 - linkml_store.api.client - INFO - Initializing database: maxoa, base_dir: /Users/cjm\n", "2025-02-04 11:08:15,384 - linkml_store.api.client - INFO - Initializing database: refmet, base_dir: /Users/cjm\n", "2025-02-04 11:08:15,384 - linkml_store.api.client - INFO - Initializing database: neo4j, base_dir: /Users/cjm\n", "2025-02-04 11:08:15,384 - linkml_store.api.client - INFO - Initializing database: gold, base_dir: /Users/cjm\n", "2025-02-04 11:08:15,384 - linkml_store.api.client - INFO - Initializing database: nmdc_duckdb, base_dir: /Users/cjm\n", "2025-02-04 11:08:15,384 - linkml_store.api.client - INFO - Creating/attaching database: tmp/up.ddb\n", "2025-02-04 11:08:15,385 - linkml_store.api.client - INFO - Initializing databases\n", "2025-02-04 11:08:15,385 - linkml_store.api.client - INFO - Attaching tmp/up.ddb\n", "2025-02-04 11:08:15,388 - linkml_store.api.database - INFO - Setting schema view for duckdb:///tmp/up.ddb\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "name: uniprot-comments\n", "id: http://example.org/uniprot-comments\n", "imports:\n", "- linkml:types\n", "prefixes:\n", " linkml:\n", " prefix_prefix: linkml\n", " prefix_reference: https://w3id.org/linkml/\n", " up:\n", " prefix_prefix: up\n", " prefix_reference: http://example.org/tuniprot-comments\n", "default_prefix: up\n", "default_range: string\n", "enums:\n", " CommentCategory:\n", " name: CommentCategory\n", " permissible_values:\n", " FUNCTION_DISPUTED:\n", " text: FUNCTION_DISPUTED\n", " description: A caution indicating that a previously reported function has\n", " been challenged or disproven in subsequent studies; may warrant GO NOT annotation\n", " examples:\n", " - value: FUNCTION_DISPUTED\n", " description: Originally described for its in vitro hydrolytic activity towards\n", " dGMP, dAMP and dIMP. However, this was not confirmed in vivo\n", " FUNCTION_PREDICTION_ONLY:\n", " text: FUNCTION_PREDICTION_ONLY\n", " description: A caution indicating function is based only on computational\n", " prediction or sequence similarity\n", " examples:\n", " - value: FUNCTION_PREDICTION_ONLY\n", " description: Predicted to be involved in X based on sequence similarity\n", " FUNCTION_LACKS_EVIDENCE:\n", " text: FUNCTION_LACKS_EVIDENCE\n", " description: A caution indicating insufficient experimental evidence to support\n", " predicted function\n", " examples:\n", " - value: FUNCTION_LACKS_EVIDENCE\n", " description: In contrast to other Macro-domain containing proteins, lacks\n", " ADP-ribose glycohydrolase activity\n", " FUNCTION_DEBATED:\n", " text: FUNCTION_DEBATED\n", " description: A caution about ongoing scientific debate regarding function;\n", " differs from DISPUTED in lacking clear evidence against\n", " examples:\n", " - value: FUNCTION_DEBATED\n", " description: Was initially thought to act as a major regulator of cardiac\n", " hypertrophy... However, while PDE5A regulates nitric-oxide-generated cGMP,\n", " nitric oxide signaling is often depressed by heart disease, limiting its\n", " effect\n", " LOCALIZATION_DISPUTED:\n", " text: LOCALIZATION_DISPUTED\n", " description: A caution about conflicting or uncertain cellular localization\n", " evidence\n", " examples:\n", " - value: LOCALIZATION_DISPUTED\n", " description: Cellular localization remains to be finally defined. While\n", " most authors have deduced a localization at the basolateral side, other\n", " studies demonstrated an apical localization\n", " NAMING_CONFUSION:\n", " text: NAMING_CONFUSION\n", " description: A caution about potential confusion with similarly named proteins\n", " or historical naming issues\n", " examples:\n", " - value: NAMING_CONFUSION\n", " description: This protein should not be confused with the conventional myosin-1\n", " (MYH1);Was termed importin alpha-4\n", " GENE_COPY_NUMBER:\n", " text: GENE_COPY_NUMBER\n", " description: A caution about gene duplication or copy number that might affect\n", " interpretation\n", " examples:\n", " - value: GENE_COPY_NUMBER\n", " description: Maps to a duplicated region on chromosome 15; the gene is present\n", " in at least 3 almost identical copies\n", " EXPRESSION_MECHANISM_UNCLEAR:\n", " text: EXPRESSION_MECHANISM_UNCLEAR\n", " description: A caution about unclear or unusual mechanisms of gene expression\n", " or protein production\n", " examples:\n", " - value: EXPRESSION_MECHANISM_UNCLEAR\n", " description: This peptide has been shown to be biologically active but is\n", " the product of a mitochondrial gene. The mechanisms allowing the production\n", " and secretion of the peptide remain unclear\n", " SEQUENCE_FEATURE_MISSING:\n", " text: SEQUENCE_FEATURE_MISSING\n", " description: A caution about missing or unexpected sequence features that\n", " might affect function\n", " examples:\n", " - value: SEQUENCE_FEATURE_MISSING\n", " description: No predictable signal peptide\n", " SPECIES_DIFFERENCE:\n", " text: SPECIES_DIFFERENCE\n", " description: A caution about significant functional or property differences\n", " between orthologs\n", " examples:\n", " - value: SPECIES_DIFFERENCE\n", " description: Affinity and capacity of the transporter for endogenous substrates\n", " vary among orthologs. For endogenous compounds such as dopamine, histamine,\n", " serotonin and thiamine, mouse ortholog display higher affinity\n", " PUBLICATION_CONFLICT:\n", " text: PUBLICATION_CONFLICT\n", " description: A caution about conflicting published evidence or interpretation\n", " examples:\n", " - value: PUBLICATION_CONFLICT\n", " description: Although initially reported to transport carnitine across the\n", " hepatocyte membrane, another study was unable to verify this finding\n", " CLAIMS_RETRACTED:\n", " text: CLAIMS_RETRACTED\n", " description: A caution about function claims that were retracted or withdrawn\n", " examples:\n", " - value: CLAIMS_RETRACTED\n", " description: Has been reported to enhance netrin-induced phosphorylation\n", " of PAK1 and FYN... This article has been withdrawn by the authors\n", " PROTEIN_IDENTITY:\n", " text: PROTEIN_IDENTITY\n", " description: A caution about uncertainty in the identity or existence of distinct\n", " protein products\n", " examples:\n", " - value: PROTEIN_IDENTITY\n", " description: It is not known whether the so-called human ASE1 and human\n", " CAST proteins represent two sides of a single gene product\n", " FUNCTION_UNCERTAIN_INITIATION:\n", " text: FUNCTION_UNCERTAIN_INITIATION\n", " description: A caution about uncertainty in translation initiation site\n", " examples:\n", " - value: FUNCTION_UNCERTAIN_INITIATION\n", " description: It is uncertain whether Met-1 or Met-37 is the initiator\n", " PSEUDOGENE_STATUS:\n", " text: PSEUDOGENE_STATUS\n", " description: A caution about whether the gene encodes a protein\n", " examples:\n", " - value: PSEUDOGENE_STATUS\n", " description: Could be the product of a pseudogene\n", "classes:\n", " Entry:\n", " name: Entry\n", " attributes:\n", " id:\n", " name: id\n", " identifier: true\n", " category:\n", " name: category\n", " range: CommentCategory\n", " text:\n", " name: text\n", " description: The text of the comment\n", "source_file: ../../tests/input/uniprot/schema.yaml\n", "\n" ] } ], "source": [ "%%bash\n", "linkml-store -S ../../tests/input/uniprot/schema.yaml -d tmp/up.ddb schema" ] }, { "cell_type": "markdown", "id": "5723b14db6ae067f", "metadata": {}, "source": [ "## Inferring a specific field" ] }, { "cell_type": "code", "execution_count": 5, "id": "e3b5b54814c56690", "metadata": { "ExecuteTime": { "end_time": "2025-02-04T14:34:37.310628Z", "start_time": "2024-08-21T22:55:41.459988Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "id: MOTSC_HUMAN\n", "category: EXPRESSION_MECHANISM_UNCLEAR\n", "text: This peptide has been shown to be biologically active but is the product of\n", " a mitochondrial gene. Usage of the mitochondrial genetic code yields tandem start\n", " and stop codons so translation must occur in the cytoplasm. The mechanisms allowing\n", " the production and secretion of the peptide remain unclear\n", "\n" ] } ], "source": [ "%%bash\n", "linkml-store -S ../../tests/input/uniprot/schema.yaml -d tmp/up.ddb -c Entry infer -t llm -T category --where \"id: MOTSC_HUMAN\"" ] }, { "cell_type": "markdown", "id": "c3826927-2022-451a-9cd9-ccf10e303bc3", "metadata": {}, "source": [ "## Inferring all rows\n", "\n", "Here we use a `--where` clause to query all rows in our collection and pass them through the inference engine" ] }, { "cell_type": "code", "execution_count": 63, "id": "970d3e14-b0d5-4468-90f3-0c347c9fe4e5", "metadata": {}, "outputs": [], "source": [ "%%bash\n", "linkml-store -S ../../tests/input/uniprot/schema.yaml -d tmp/up.ddb -c Entry infer -t llm -T category --where \"{}\" -O csv -o tmp/up.predicted.csv" ] }, { "cell_type": "code", "execution_count": 6, "id": "684afb71-1737-4dbc-a7b2-d209fc56f1cb", "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 7, "id": "1b82be5f-8c6b-4b51-b41c-5905c936a84d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | id | \n", "category | \n", "text | \n", "
---|---|---|---|
0 | \n", "MOTSC_HUMAN | \n", "EXPRESSION_MECHANISM_UNCLEAR | \n", "This peptide has been shown to be biologically... | \n", "
1 | \n", "POTB3_HUMAN | \n", "GENE_COPY_NUMBER | \n", "Maps to a duplicated region on chromosome 15; ... | \n", "
2 | \n", "MYO1C_HUMAN | \n", "NAMING_CONFUSION | \n", "Represents an unconventional myosin. This prot... | \n", "
3 | \n", "IMA4_HUMAN | \n", "NAMING_CONFUSION | \n", "Was termed importin alpha-4 | \n", "
4 | \n", "S22A1_HUMAN | \n", "LOCALIZATION_DISPUTED | \n", "Cellular localization of OCT1 in the intestine... | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "
95 | \n", "POK9_HUMAN | \n", "NaN | \n", "Truncated; frameshift leads to premature stop ... | \n", "
96 | \n", "UB2L3_HUMAN | \n", "PSEUDOGENE_STATUS | \n", "PubMed:10760570 reported that UBE2L1, UBE2L2 a... | \n", "
97 | \n", "CBX1_HUMAN | \n", "CLAIMS_RETRACTED | \n", "Was previously reported to interact with ASXL1... | \n", "
98 | \n", "H33_HUMAN | \n", "CLAIMS_RETRACTED | \n", "The original paper reporting lysine deaminatio... | \n", "
99 | \n", "RELB_HUMAN | \n", "FUNCTION_DISPUTED | \n", "Was originally thought to inhibit the transcri... | \n", "
100 rows × 3 columns
\n", "