Python API Tutorial

While linkml-data-qc is primarily a CLI tool, it also provides a Python API for programmatic access.

Core Classes

The main classes are:

ComplianceAnalyzer - Analyzes data files for compliance
SchemaIntrospector - Extracts recommended slots from schemas
QCConfig - Configuration for weights and thresholds
Formatters (JSONFormatter, CSVFormatter, TextFormatter) - Output formatting

Setup

First, create the same test schema and data files used in the CLI tutorial:

In [1]:

%%bash
# Create test schema
cat > /tmp/disease_schema.yaml << 'EOF'
id: https://example.org/disease
name: disease_schema
prefixes:
  linkml: https://w3id.org/linkml/
imports:
  - linkml:types
default_range: string

classes:
  Disease:
    attributes:
      id:
        identifier: true
      name:
        required: true
      description:
        recommended: true
      synonyms:
        multivalued: true
        recommended: true
      ontology_id:
        recommended: true
EOF

# Create test data
cat > /tmp/disease_good.yaml << 'EOF'
id: DISEASE:001
name: Asthma
description: A chronic respiratory condition
synonyms:
  - bronchial asthma
ontology_id: MONDO:0004979
EOF

cat > /tmp/disease_poor.yaml << 'EOF'
id: DISEASE:002
name: Unknown Disease
EOF

echo "Test files created!"

Test files created!

Basic Analysis

In [2]:

from linkml_data_qc import ComplianceAnalyzer

# Create an analyzer with your schema
analyzer = ComplianceAnalyzer("/tmp/disease_schema.yaml")

# Analyze a file
report = analyzer.analyze_file("/tmp/disease_good.yaml", "Disease")

print(f"Global Compliance: {report.global_compliance}%")
print(f"Total Checks: {report.total_checks}")
print(f"Total Populated: {report.total_populated}")

Global Compliance: 100.0%
Total Checks: 3
Total Populated: 3
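
The three numbers above appear to be related in a simple way: 3 of the 3 recommended-slot checks are populated, which gives 100.0%. The sketch below reproduces that arithmetic in plain Python. The helper name `compliance_percentage` is hypothetical, and the formula is an assumption inferred from the printed output rather than taken from the library's source.

```python
def compliance_percentage(populated: int, total: int) -> float:
    """Return populated/total as a percentage (assumed formula, see lead-in)."""
    if total <= 0:
        # How the library treats an empty set of checks is not shown in this
        # tutorial, so this sketch refuses to guess.
        raise ValueError("total must be a positive number of checks")
    return populated / total * 100


# Counts printed above for /tmp/disease_good.yaml: 3 populated out of 3 checks.
print(compliance_percentage(3, 3))
```

Under the same assumption, a record that fills none of its three recommended slots would score `compliance_percentage(0, 3)`, that is, 0.0.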

Accessing Detailed Results

In [3]:

# Summary by slot shows compliance for each recommended field
print("Summary by Slot:")
for slot, percentage in report.summary_by_slot.items():
    print(f"  {slot}: {percentage}%")

Summary by Slot:
  description: 100.0%
  synonyms: 100.0%
  ontology_id: 100.0%

In [4]:

# Path scores show per-object compliance
print("Path Scores:")
for ps in report.path_scores:
    print(f"  {ps.path}: {ps.overall_percentage}%")
    for ss in ps.slot_scores:
        print(f"    {ss.slot_name}: {ss.populated}/{ss.total} ({ss.percentage}%)")

Path Scores:
  (root): 100.0%
    description: 1/1 (100.0%)
    synonyms: 1/1 (100.0%)
    ontology_id: 1/1 (100.0%)
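
Because each slot score carries `populated` and `total` counts, the path scores can be flattened into a list of gaps. The sketch below does this with stand-in dataclasses that mirror only the attribute names printed above. `SlotScore`, `PathScore`, and `missing_slots` here are hypothetical illustrations, not the library's own classes, and the example scores are made up.

```python
from dataclasses import dataclass, field


@dataclass
class SlotScore:
    # Stand-in mirroring attribute names used above; not the library's class.
    slot_name: str
    populated: int
    total: int


@dataclass
class PathScore:
    # Stand-in mirroring attribute names used above; not the library's class.
    path: str
    slot_scores: list[SlotScore] = field(default_factory=list)


def missing_slots(path_scores: list[PathScore]) -> list[tuple[str, str, int]]:
    """Return (path, slot_name, missing_count) for slots not fully populated."""
    gaps = []
    for ps in path_scores:
        for ss in ps.slot_scores:
            missing = ss.total - ss.populated
            if missing > 0:
                gaps.append((ps.path, ss.slot_name, missing))
    return gaps


# Hypothetical scores for a record in which only synonyms is populated.
example = [
    PathScore("(root)", [
        SlotScore("description", 0, 1),
        SlotScore("synonyms", 1, 1),
        SlotScore("ontology_id", 0, 1),
    ])
]
print(missing_slots(example))
```

With a real report, `missing_slots(report.path_scores)` should behave the same way, provided the objects expose these attribute names.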

Schema Introspection

In [5]:

from linkml_data_qc import SchemaIntrospector

introspector = SchemaIntrospector("/tmp/disease_schema.yaml")

# Get all recommended slots in schema
print(f"Recommended slots: {introspector.recommended_slots}")

# Get class-specific info
class_info = introspector.get_class_slots("Disease")
print(f"\nDisease class recommended: {class_info.recommended_slots}")

Recommended slots: {'ontology_id', 'synonyms', 'description'}

Disease class recommended: ['description', 'synonyms', 'ontology_id']
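
To make the result above concrete, the sketch below collects the attributes flagged `recommended: true` from a plain dictionary that mirrors the test schema. It is only an illustration of the idea: the function `recommended_attributes` is hypothetical, it is not how `SchemaIntrospector` is implemented, and it ignores inheritance and top-level slot definitions that a full implementation would need to handle.

```python
# Stand-in for the parsed /tmp/disease_schema.yaml (only the parts used here).
schema = {
    "classes": {
        "Disease": {
            "attributes": {
                "id": {"identifier": True},
                "name": {"required": True},
                "description": {"recommended": True},
                "synonyms": {"multivalued": True, "recommended": True},
                "ontology_id": {"recommended": True},
            }
        }
    }
}


def recommended_attributes(schema: dict, class_name: str) -> list[str]:
    """Return attribute names flagged as recommended, in declaration order."""
    attributes = schema["classes"][class_name].get("attributes", {})
    return [
        name
        for name, spec in attributes.items()
        if (spec or {}).get("recommended")  # spec may be empty for bare attributes
    ]


print(recommended_attributes(schema, "Disease"))
# ['description', 'synonyms', 'ontology_id']
```

The schema-wide value in the output above prints as a set, so its order can differ between runs; wrapping it in `sorted()` gives stable output.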

Using Configuration

In [6]:

from linkml_data_qc import QCConfig, SlotQCConfig

# Create configuration with weights and thresholds
config = QCConfig(
    default_weight=1.0,
    slots={
        "ontology_id": SlotQCConfig(weight=2.0, min_compliance=80.0),
        "description": SlotQCConfig(weight=0.5),
    }
)

# Create analyzer with configuration
analyzer = ComplianceAnalyzer("/tmp/disease_schema.yaml", config)
report = analyzer.analyze_file("/tmp/disease_good.yaml", "Disease")

print(f"Global Compliance: {report.global_compliance}%")
print(f"Weighted Compliance: {report.weighted_compliance}%")

Global Compliance: 100.0%
Weighted Compliance: 100.0%
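
Both scores are 100.0% here because every recommended slot is populated, so the weights have nothing to separate. The sketch below shows how weights could change the picture for a less complete record. It assumes weighted compliance is a weight-averaged mean of per-slot percentages, which is a guess about the formula and not confirmed by this tutorial; `weighted_compliance` and the `partial` record are hypothetical.

```python
def weighted_compliance(slot_percentages: dict, weights: dict,
                        default_weight: float = 1.0) -> float:
    """Weight-averaged mean of per-slot percentages (assumed formula)."""
    weighted_sum = 0.0
    total_weight = 0.0
    for slot, pct in slot_percentages.items():
        weight = weights.get(slot, default_weight)  # unlisted slots use the default
        weighted_sum += weight * pct
        total_weight += weight
    if total_weight == 0:
        raise ValueError("at least one slot must have a non-zero weight")
    return weighted_sum / total_weight


# Same weights as the QCConfig above; synonyms falls back to default_weight=1.0.
weights = {"ontology_id": 2.0, "description": 0.5}

good = {"description": 100.0, "synonyms": 100.0, "ontology_id": 100.0}
print(weighted_compliance(good, weights))

# Hypothetical record in which only ontology_id is populated.
partial = {"description": 0.0, "synonyms": 0.0, "ontology_id": 100.0}
print(round(weighted_compliance(partial, weights), 1))
```

Under this assumption the hypothetical record scores about 57.1% weighted, against 33.3% unweighted, because the one populated slot carries the largest weight.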

Checking for Violations

In [7]:

# Analyze the low-compliance file against a 50% minimum for description
config = QCConfig(
    slots={
        "description": SlotQCConfig(min_compliance=50.0),
    }
)

analyzer = ComplianceAnalyzer("/tmp/disease_schema.yaml", config)
report = analyzer.analyze_file("/tmp/disease_poor.yaml", "Disease")

if report.threshold_violations:
    print("Threshold Violations:")
    for v in report.threshold_violations:
        print(f"  {v.path}.{v.slot_name}: {v.actual_compliance}% < {v.min_required}%")
else:
    print("No violations!")

No violations!
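
In a script, the violation list can be turned into a pass/fail status. The sketch below uses a stand-in `Violation` dataclass with the same attribute names as the loop above; the class, the `exit_code_for` helper, and the example violation are all hypothetical and are not part of the library.

```python
from dataclasses import dataclass


@dataclass
class Violation:
    # Stand-in with the attribute names used above; not the library's class.
    path: str
    slot_name: str
    actual_compliance: float
    min_required: float


def exit_code_for(violations: list) -> int:
    """Print each violation and return 1 if there are any, otherwise 0."""
    for v in violations:
        print(f"FAIL {v.path}.{v.slot_name}: {v.actual_compliance}% < {v.min_required}%")
    return 1 if violations else 0


# An empty list, as in the run above, maps to a passing status.
print(exit_code_for([]))

# A made-up violation, for illustration only.
hypothetical = [Violation("(root)", "description", 0.0, 50.0)]
print(exit_code_for(hypothetical))
```

Calling `sys.exit(exit_code_for(report.threshold_violations))` at the end of a script would then make the process fail whenever a threshold is missed.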

Formatting Output

In [8]:

from linkml_data_qc import JSONFormatter, TextFormatter

analyzer = ComplianceAnalyzer("/tmp/disease_schema.yaml")
report = analyzer.analyze_file("/tmp/disease_good.yaml", "Disease")

# Text format
print(TextFormatter.format(report))

Compliance Report: /tmp/disease_good.yaml
Target Class: Disease
Global Compliance: 100.0% (3/3)
Weighted Compliance: 100.0%

Summary by Slot:
  description: 100.0%
  ontology_id: 100.0%
  synonyms: 100.0%

Detailed Path Scores:
  (root) (Disease): 100.0%
    - description: OK
    - synonyms: OK
    - ontology_id: OK

In [9]:

# JSON format
import json
json_output = JSONFormatter.format(report)
print(json.dumps(json.loads(json_output), indent=2))

{
  "file_path": "/tmp/disease_good.yaml",
  "target_class": "Disease",
  "schema_path": "/tmp/disease_schema.yaml",
  "global_compliance": 100.0,
  "weighted_compliance": 100.0,
  "total_checks": 3,
  "total_populated": 3,
  "path_scores": [
    {
      "path": "(root)",
      "parent_class": "Disease",
      "item_count": 1,
      "slot_scores": [
        {
          "path": "(root)",
          "slot_name": "description",
          "populated": 1,
          "total": 1,
          "percentage": 100.0
        },
        {
          "path": "(root)",
          "slot_name": "synonyms",
          "populated": 1,
          "total": 1,
          "percentage": 100.0
        },
        {
          "path": "(root)",
          "slot_name": "ontology_id",
          "populated": 1,
          "total": 1,
          "percentage": 100.0
        }
      ],
      "overall_percentage": 100.0
    }
  ],
  "aggregated_scores": [],
  "threshold_violations": [],
  "summary_by_slot": {
    "description": 100.0,
    "synonyms": 100.0,
    "ontology_id": 100.0
  },
  "recommended_slots": [
    "ontology_id",
    "synonyms",
    "description"
  ],
  "config_path": null,
  "timestamp": "2025-12-06T20:10:19.589874"
}
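
Since the JSON output is a plain object, downstream tools can consume it with the standard `json` module alone. The sketch below loads an abridged copy of the report shown above and lists slots that fall under a chosen minimum. The helper `slots_below` and the 80% cut-off are hypothetical choices for illustration.

```python
import json

# Abridged copy of the JSON report shown above.
json_output = """
{
  "file_path": "/tmp/disease_good.yaml",
  "target_class": "Disease",
  "global_compliance": 100.0,
  "total_checks": 3,
  "total_populated": 3,
  "threshold_violations": [],
  "summary_by_slot": {
    "description": 100.0,
    "synonyms": 100.0,
    "ontology_id": 100.0
  }
}
"""


def slots_below(report: dict, minimum: float) -> list[str]:
    """Return slot names whose summary compliance is below `minimum` percent."""
    return [slot for slot, pct in report["summary_by_slot"].items() if pct < minimum]


report_dict = json.loads(json_output)
print(report_dict["target_class"], report_dict["global_compliance"])
print(slots_below(report_dict, 80.0))  # empty list: every slot is at 100.0
```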

Multi-File Analysis

In [10]:

from linkml_data_qc import analyze_directory, create_multi_file_report

# Analyze all matching files in a directory
reports = analyze_directory(
    schema_path="/tmp/disease_schema.yaml",
    data_dir="/tmp",
    target_class="Disease",
    pattern="disease_*.yaml"
)

# Create aggregated report
multi_report = create_multi_file_report(reports)

print(f"Files Analyzed: {multi_report.files_analyzed}")
print(f"Overall Compliance: {multi_report.global_compliance}%")
print("\nSummary by Slot:")
for slot, pct in multi_report.summary_by_slot.items():
    print(f"  {slot}: {pct}%")

Files Analyzed: 3
Overall Compliance: 33.33333333333333%

Summary by Slot:
  description: 33.33333333333333%
  synonyms: 33.33333333333333%
  ontology_id: 33.33333333333333%
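
The count of 3 deserves a second look: the setup step created only two data files, but the pattern `disease_*.yaml` also matches `disease_schema.yaml`, which lives in the same directory. Assuming `analyze_directory` applies the pattern as a shell-style glob (not confirmed here), the schema file would be analyzed as if it were a Disease record, which is consistent with 3 populated checks out of 9. The sketch below checks the pattern with the standard `fnmatch` module and filters the schema out; the file list is written out by hand rather than read from disk.

```python
from fnmatch import fnmatch

schema_name = "disease_schema.yaml"
pattern = "disease_*.yaml"

# The three files the setup cell wrote to /tmp.
names = ["disease_good.yaml", "disease_poor.yaml", "disease_schema.yaml"]

# Everything the glob pattern selects, including the schema itself.
matched = [name for name in names if fnmatch(name, pattern)]

# The same selection with the schema file excluded.
data_only = [name for name in matched if name != schema_name]

print(matched)
print(data_only)
```

Keeping the schema in a different directory from the data files avoids the overlap altogether.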

When to Use the CLI vs Python API

Use the CLI when:

Running one-off compliance checks
Integrating with CI/CD pipelines
Generating reports for external tools

Use the Python API when:

Building custom analysis pipelines
Integrating with other Python tools
Needing programmatic access to detailed results
Building dashboards or visualizations