Sample Data¶
Let's create a simple LinkML schema and some test data. First, the schema:
%%bash
cat > /tmp/disease_schema.yaml << 'EOF'
id: https://example.org/disease
name: disease_schema
prefixes:
  linkml: https://w3id.org/linkml/
imports:
  - linkml:types
default_range: string
classes:
  Disease:
    attributes:
      id:
        identifier: true
      name:
        required: true
      description:
        recommended: true
      synonyms:
        multivalued: true
        recommended: true
      ontology_id:
        recommended: true
EOF
echo "Schema created!"
Schema created!
Now let's create two data files - one with good compliance and one with poor compliance:
%%bash
# Good compliance - all recommended fields populated
cat > /tmp/disease_good.yaml << 'EOF'
id: DISEASE:001
name: Asthma
description: A chronic respiratory condition characterized by inflammation of the airways
synonyms:
- bronchial asthma
- reactive airway disease
ontology_id: MONDO:0004979
EOF
# Poor compliance - missing recommended fields
cat > /tmp/disease_poor.yaml << 'EOF'
id: DISEASE:002
name: Unknown Disease
EOF
echo "Data files created!"
Data files created!
Basic Usage¶
Run linkml-data-qc on a single file to see compliance scores:
%%bash
linkml-data-qc /tmp/disease_good.yaml \
-s /tmp/disease_schema.yaml \
-t Disease
Compliance Report: /tmp/disease_good.yaml
Target Class: Disease
Global Compliance: 100.0% (3/3)
Weighted Compliance: 100.0%
Summary by Slot:
  description: 100.0%
  ontology_id: 100.0%
  synonyms: 100.0%
Detailed Path Scores:
  (root) (Disease): 100.0%
    - description: OK
    - synonyms: OK
    - ontology_id: OK
Now let's check the poor compliance file:
%%bash
linkml-data-qc /tmp/disease_poor.yaml \
-s /tmp/disease_schema.yaml \
-t Disease
Compliance Report: /tmp/disease_poor.yaml
Target Class: Disease
Global Compliance: 0.0% (0/3)
Weighted Compliance: 100.0%
Summary by Slot:
  description: 0.0%
  ontology_id: 0.0%
  synonyms: 0.0%
Detailed Path Scores:
  (root) (Disease): 0.0%
    - description: MISSING
    - synonyms: MISSING
    - ontology_id: MISSING
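The (3/3) and (0/3) figures in the two reports are counts of populated recommended slots out of the recommended slots declared on Disease. The sketch below reproduces that arithmetic by hand for the two sample records. It is not the tool's implementation: the `is_populated` rule (treating `None`, empty strings and empty lists as missing) and the function names are assumptions made for illustration.

```python
# Recommended slots declared on Disease in the schema above.
RECOMMENDED_SLOTS = ["description", "synonyms", "ontology_id"]

good = {
    "id": "DISEASE:001",
    "name": "Asthma",
    "description": "A chronic respiratory condition characterized by inflammation of the airways",
    "synonyms": ["bronchial asthma", "reactive airway disease"],
    "ontology_id": "MONDO:0004979",
}
poor = {"id": "DISEASE:002", "name": "Unknown Disease"}


def is_populated(value):
    """Assumption: None, empty strings and empty lists count as missing."""
    return value not in (None, "", [])


def recommended_compliance(record, slots=RECOMMENDED_SLOTS):
    """Return (populated, total, percentage) for a single record."""
    populated = sum(1 for slot in slots if is_populated(record.get(slot)))
    total = len(slots)
    return populated, total, round(100.0 * populated / total, 1)


print(recommended_compliance(good))  # (3, 3, 100.0)
print(recommended_compliance(poor))  # (0, 3, 0.0)
```

Required slots such as name and the identifier do not enter the count; only slots marked recommended appear in the reports above.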
Use -f json for machine-readable JSON output:
%%bash
linkml-data-qc /tmp/disease_good.yaml \
-s /tmp/disease_schema.yaml \
-t Disease \
-f json
{
  "file_path": "/tmp/disease_good.yaml",
  "target_class": "Disease",
  "schema_path": "/tmp/disease_schema.yaml",
  "global_compliance": 100.0,
  "weighted_compliance": 100.0,
  "total_checks": 3,
  "total_populated": 3,
  "path_scores": [
    {
      "path": "(root)",
      "parent_class": "Disease",
      "item_count": 1,
      "slot_scores": [
        {
          "path": "(root)",
          "slot_name": "description",
          "populated": 1,
          "total": 1,
          "percentage": 100.0
        },
        {
          "path": "(root)",
          "slot_name": "synonyms",
          "populated": 1,
          "total": 1,
          "percentage": 100.0
        },
        {
          "path": "(root)",
          "slot_name": "ontology_id",
          "populated": 1,
          "total": 1,
          "percentage": 100.0
        }
      ],
      "overall_percentage": 100.0
    }
  ],
  "aggregated_scores": [],
  "threshold_violations": [],
  "summary_by_slot": {
    "description": 100.0,
    "synonyms": 100.0,
    "ontology_id": 100.0
  },
  "recommended_slots": [
    "synonyms",
    "description",
    "ontology_id"
  ],
  "config_path": null,
  "timestamp": "2025-12-06T20:07:47.969074"
}
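A JSON report like this one can be consumed with any JSON parser. As a minimal sketch, the snippet below loads an abbreviated copy of the report (only fields that appear in the output above) and lists the slots whose score falls below a cutoff. The helper name `slots_below` is hypothetical, and the key names are taken from this one sample, so they may differ in other versions of the tool.

```python
import json

# Abbreviated copy of the report above; only keys seen in that output are used.
REPORT_JSON = """
{
  "file_path": "/tmp/disease_good.yaml",
  "target_class": "Disease",
  "global_compliance": 100.0,
  "total_checks": 3,
  "total_populated": 3,
  "summary_by_slot": {
    "description": 100.0,
    "synonyms": 100.0,
    "ontology_id": 100.0
  }
}
"""


def slots_below(report, threshold):
    """Return the sorted names of slots whose percentage is under threshold."""
    return sorted(
        slot
        for slot, percentage in report["summary_by_slot"].items()
        if percentage < threshold
    )


report = json.loads(REPORT_JSON)
print(report["global_compliance"])  # 100.0
print(slots_below(report, 100.0))   # []
```

For the poor-compliance file, where every slot scored 0.0%, the same helper would return all three slot names.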
CSV Output¶
Use -f csv for spreadsheet-friendly output:
%%bash
linkml-data-qc /tmp/disease_good.yaml \
-s /tmp/disease_schema.yaml \
-t Disease \
-f csv
file,path,class,slot,populated,total,percentage
/tmp/disease_good.yaml,(root),Disease,description,1,1,100.0
/tmp/disease_good.yaml,(root),Disease,synonyms,1,1,100.0
/tmp/disease_good.yaml,(root),Disease,ontology_id,1,1,100.0
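Besides opening it in a spreadsheet, the CSV can be read with Python's standard `csv` module. The sketch below parses the exact rows shown above from an in-memory string and converts the numeric columns; `load_rows` is a hypothetical helper, and the column names are assumed to stay as they appear in this output.

```python
import csv
import io

# The CSV output shown above, inlined so the example is self-contained.
CSV_TEXT = """file,path,class,slot,populated,total,percentage
/tmp/disease_good.yaml,(root),Disease,description,1,1,100.0
/tmp/disease_good.yaml,(root),Disease,synonyms,1,1,100.0
/tmp/disease_good.yaml,(root),Disease,ontology_id,1,1,100.0
"""


def load_rows(text):
    """Parse the report CSV into dicts with numeric columns converted."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["populated"] = int(row["populated"])
        row["total"] = int(row["total"])
        row["percentage"] = float(row["percentage"])
        rows.append(row)
    return rows


rows = load_rows(CSV_TEXT)
for row in rows:
    print(row["slot"], row["populated"], row["total"], row["percentage"])
```

Each row is one slot at one path in one file, so rows from several reports can be concatenated and grouped by the file or slot column.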
Analyzing Multiple Files¶
You can analyze multiple files at once:
%%bash
linkml-data-qc /tmp/disease_good.yaml /tmp/disease_poor.yaml \
-s /tmp/disease_schema.yaml \
-t Disease
Multi-File Compliance Report
Files Analyzed: 2
Global Compliance: 50.0%
Weighted Compliance: 100.0%
Summary by Slot (across all files):
  description: 50.0%
  ontology_id: 50.0%
  synonyms: 50.0%
Summary by Path (across all files):
  (root).description: 50.0%
  (root).ontology_id: 50.0%
  (root).synonyms: 50.0%
Per-File Compliance:
  /tmp/disease_good.yaml: 100.0%
  /tmp/disease_poor.yaml: 0.0%
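The 50.0% global figure is consistent with pooling the per-file counts: 3 populated checks out of 6 in total. The sketch below shows that calculation. Whether the tool pools counts or averages per-file percentages is an assumption here; the two approaches give the same answer whenever every file has the same number of checks, as in this example.

```python
# (populated, total) per file, taken from the single-file reports above.
per_file = {
    "/tmp/disease_good.yaml": (3, 3),
    "/tmp/disease_poor.yaml": (0, 3),
}


def pooled_compliance(counts):
    """Pool populated/total across files and return a percentage."""
    populated = sum(p for p, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return round(100.0 * populated / total, 1) if total else 0.0


print(pooled_compliance(per_file))  # 50.0
```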
Analyzing a Directory¶
Pass a directory together with a --pattern glob to analyze every matching file in that directory:
%%bash
linkml-data-qc /tmp \
-s /tmp/disease_schema.yaml \
-t Disease \
--pattern "disease_*.yaml"
Multi-File Compliance Report
Files Analyzed: 3
Global Compliance: 33.3%
Weighted Compliance: 100.0%
Summary by Slot (across all files):
  description: 33.3%
  ontology_id: 33.3%
  synonyms: 33.3%
Summary by Path (across all files):
  (root).description: 33.3%
  (root).ontology_id: 33.3%
  (root).synonyms: 33.3%
Per-File Compliance:
  /tmp/disease_good.yaml: 100.0%
  /tmp/disease_poor.yaml: 0.0%
  /tmp/disease_schema.yaml: 0.0%
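Note that three files were analyzed: the schema file disease_schema.yaml also matches disease_*.yaml, so it was scored as data and pulled the global figure down to 33.3%. The sketch below uses Python's `fnmatch` to preview which names a glob would select. It assumes `--pattern` follows ordinary shell-style glob rules; the narrower pattern is only an illustration and has not been checked against the tool.

```python
from fnmatch import fnmatch

# The three files created in /tmp earlier in this walkthrough.
names = ["disease_good.yaml", "disease_poor.yaml", "disease_schema.yaml"]


def matching(names, pattern):
    """Return the names that a shell-style glob pattern would select."""
    return [name for name in names if fnmatch(name, pattern)]


# The pattern used above also catches the schema file.
print(matching(names, "disease_*.yaml"))

# A hypothetical narrower pattern that skips the schema file.
print(matching(names, "disease_[gp]*.yaml"))
```

Keeping schemas and data in separate directories avoids the problem without relying on a carefully tuned pattern.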
Setting Compliance Thresholds¶
Use --min-compliance to set a minimum acceptable compliance level. The command will exit with code 1 if the data falls below the threshold:
%%bash
# This should pass (100% >= 50%)
linkml-data-qc /tmp/disease_good.yaml \
-s /tmp/disease_schema.yaml \
-t Disease \
--min-compliance 50
echo "Exit code: $?"
Compliance Report: /tmp/disease_good.yaml
Target Class: Disease
Global Compliance: 100.0% (3/3)
Weighted Compliance: 100.0%
Summary by Slot:
  description: 100.0%
  ontology_id: 100.0%
  synonyms: 100.0%
Detailed Path Scores:
  (root) (Disease): 100.0%
    - description: OK
    - synonyms: OK
    - ontology_id: OK
Exit code: 0
%%bash
# This should fail (0% < 50%)
linkml-data-qc /tmp/disease_poor.yaml \
-s /tmp/disease_schema.yaml \
-t Disease \
--min-compliance 50 || echo "Exit code: $?"
Compliance Report: /tmp/disease_poor.yaml
Target Class: Disease
Global Compliance: 0.0% (0/3)
Weighted Compliance: 100.0%
Summary by Slot:
  description: 0.0%
  ontology_id: 0.0%
  synonyms: 0.0%
Detailed Path Scores:
  (root) (Disease): 0.0%
    - description: MISSING
    - synonyms: MISSING
    - ontology_id: MISSING
Compliance 0.0% is below threshold 50.0%
Exit code: 1
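Because the threshold result is reported through the exit code, a script can gate on it without parsing the report. The sketch below wraps a command with `subprocess.run` and treats exit code 0 as a pass. So that it runs without the tool installed, it uses stand-in Python commands that exit with 0 and 1; the commented-out `qc_cmd` shows where the invocation from the cell above would go. The name `passes_gate` is hypothetical.

```python
import subprocess
import sys


def passes_gate(command):
    """Run a command and report whether it exited with code 0."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0


# In practice this would be the invocation shown above, for example:
# qc_cmd = ["linkml-data-qc", "/tmp/disease_poor.yaml",
#           "-s", "/tmp/disease_schema.yaml", "-t", "Disease",
#           "--min-compliance", "50"]

# Stand-in commands that mimic a passing and a failing run.
ok_cmd = [sys.executable, "-c", "raise SystemExit(0)"]
fail_cmd = [sys.executable, "-c", "raise SystemExit(1)"]

print(passes_gate(ok_cmd))    # True
print(passes_gate(fail_cmd))  # False
```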
Saving Output to a File¶
Use -o to write the report to a file:
%%bash
linkml-data-qc /tmp/disease_good.yaml \
-s /tmp/disease_schema.yaml \
-t Disease \
-f json \
-o /tmp/compliance_report.json
echo "Report saved. Contents:"
cat /tmp/compliance_report.json
Report written to /tmp/compliance_report.json
Report saved. Contents:
{
  "file_path": "/tmp/disease_good.yaml",
  "target_class": "Disease",
  "schema_path": "/tmp/disease_schema.yaml",
  "global_compliance": 100.0,
  "weighted_compliance": 100.0,
  "total_checks": 3,
  "total_populated": 3,
  "path_scores": [
    {
      "path": "(root)",
      "parent_class": "Disease",
      "item_count": 1,
      "slot_scores": [
        {
          "path": "(root)",
          "slot_name": "description",
          "populated": 1,
          "total": 1,
          "percentage": 100.0
        },
        {
          "path": "(root)",
          "slot_name": "synonyms",
          "populated": 1,
          "total": 1,
          "percentage": 100.0
        },
        {
          "path": "(root)",
          "slot_name": "ontology_id",
          "populated": 1,
          "total": 1,
          "percentage": 100.0
        }
      ],
      "overall_percentage": 100.0
    }
  ],
  "aggregated_scores": [],
  "threshold_violations": [],
  "summary_by_slot": {
    "description": 100.0,
    "synonyms": 100.0,
    "ontology_id": 100.0
  },
  "recommended_slots": [
    "description",
    "ontology_id",
    "synonyms"
  ],
  "config_path": null,
  "timestamp": "2025-12-06T20:07:49.935050"
}
Next Steps¶
- Learn about configuration files for weights and thresholds
- Set up CI/CD integration
- Explore the Python API for programmatic access