QC Dashboards and Visualizations

This tutorial shows how to generate visual QC dashboards from the command line.

Dashboard Options

--dashboard PATH - Generate a single PNG dashboard image
--dashboard-dir DIR - Generate a full HTML dashboard site (ideal for GitHub Pages)

Prerequisites

Dashboard generation requires optional visualization dependencies:

pip install "linkml-data-qc[viz]"

This installs matplotlib and seaborn for creating charts.
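
Before generating a dashboard, it can help to confirm that the optional packages are importable. The following is a minimal sketch that checks only the two package names mentioned above; the helper name `missing_viz_deps` is hypothetical and not part of linkml-data-qc.

```python
from importlib.util import find_spec

# Packages named in the prerequisites above.
VIZ_PACKAGES = ("matplotlib", "seaborn")


def missing_viz_deps(packages=VIZ_PACKAGES):
    """Return the names in `packages` that cannot be imported (hypothetical helper)."""
    return [name for name in packages if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_viz_deps()
    if missing:
        print("Missing optional dependencies:", ", ".join(missing))
        print('Try: pip install "linkml-data-qc[viz]"')
    else:
        print("Visualization dependencies are available.")
```

An empty list means both packages were found in the current environment.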

Setup: Create Sample Data

%%bash
# Create a test schema
cat > /tmp/dashboard_schema.yaml << 'EOF'
id: https://example.org/dashboard-demo
name: dashboard_demo
prefixes:
  linkml: https://w3id.org/linkml/
imports:
  - linkml:types
default_range: string

classes:
  Dataset:
    attributes:
      name:
        required: true
      description:
        recommended: true
      samples:
        multivalued: true
        range: Sample

  Sample:
    attributes:
      id:
        required: true
        identifier: true
      name:
        recommended: true
      tissue_type:
        recommended: true
      ontology_term:
        description: Ontology mapping for this sample
        recommended: true
      collection_date:
        recommended: true
EOF
echo "Schema created!"

Schema created!

%%bash
# Create test data with varying compliance
cat > /tmp/dashboard_data.yaml << 'EOF'
name: Tissue Expression Study
description: Analysis of gene expression across tissue types
samples:
  - id: SAMP001
    name: Liver Sample 1
    tissue_type: liver
    ontology_term: UBERON:0002107
    collection_date: "2024-01-15"
  - id: SAMP002
    name: Heart Sample 1
    tissue_type: heart
    ontology_term: UBERON:0000948
    # Missing collection_date
  - id: SAMP003
    name: Brain Sample 1
    tissue_type: brain
    # Missing ontology_term and collection_date
  - id: SAMP004
    # Missing name, tissue_type
    ontology_term: UBERON:0000955
    collection_date: "2024-02-20"
  - id: SAMP005
    name: Kidney Sample 1
    # Missing tissue_type, ontology_term, collection_date
  - id: SAMP006
    name: Lung Sample 1
    tissue_type: lung
    ontology_term: UBERON:0002048
    collection_date: "2024-03-01"
  - id: SAMP007
    # All recommended fields missing except required id
  - id: SAMP008
    name: Spleen Sample 1
    tissue_type: spleen
    ontology_term: UBERON:0002106
    collection_date: "2024-03-15"
EOF
echo "Data created!"

Data created!

Basic Dashboard Generation

Generate a dashboard with a single command using --dashboard:

%%bash
linkml-data-qc /tmp/dashboard_data.yaml \
    -s /tmp/dashboard_schema.yaml \
    -t Dataset \
    --dashboard /tmp/basic_dashboard.png

# printf interprets \n; plain echo in bash prints it literally
printf '\nDashboard saved:\n'
ls -lh /tmp/basic_dashboard.png

Compliance Report: /tmp/dashboard_data.yaml
Target Class: Dataset
Global Compliance: 100.0% (1/1)
Weighted Compliance: 100.0%

Summary by Slot:
  description: 100.0%

Detailed Path Scores:
  (root) (Dataset): 100.0%
    - description: OK

Dashboard saved to /tmp/basic_dashboard.png

Dashboard saved:

-rw-r--r--@ 1 cjm  wheel    79K Dec  8 10:50 /tmp/basic_dashboard.png

Dashboard with Configuration

Use a configuration file to set weights and thresholds that will be reflected in the dashboard:

%%bash
# Create QC configuration with weights and thresholds
cat > /tmp/dashboard_config.yaml << 'EOF'
default_weight: 1.0

slots:
  ontology_term:
    weight: 3.0           # Ontology mappings are critical
    min_compliance: 80.0  # At least 80% should have mappings
  tissue_type:
    weight: 2.0           # Tissue type is important
    min_compliance: 90.0
  name:
    weight: 1.5
    min_compliance: 95.0
  collection_date:
    weight: 1.0
    min_compliance: 70.0
EOF
echo "Configuration created!"

Configuration created!
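
To make the thresholds concrete, the sketch below counts by hand how often each recommended slot is populated in the eight samples of /tmp/dashboard_data.yaml and compares the result with the `min_compliance` values above. It assumes that slot compliance is the percentage of samples with a value and that weighted compliance is the weight-averaged mean of slot compliance. Both are illustrative assumptions rather than the confirmed formulas used by linkml-data-qc, and the numbers are not tool output.

```python
# Which recommended slots each sample populates, transcribed from the
# sample data file created earlier in this tutorial.
SAMPLES = {
    "SAMP001": {"name", "tissue_type", "ontology_term", "collection_date"},
    "SAMP002": {"name", "tissue_type", "ontology_term"},
    "SAMP003": {"name", "tissue_type"},
    "SAMP004": {"ontology_term", "collection_date"},
    "SAMP005": {"name"},
    "SAMP006": {"name", "tissue_type", "ontology_term", "collection_date"},
    "SAMP007": set(),
    "SAMP008": {"name", "tissue_type", "ontology_term", "collection_date"},
}

# (weight, min_compliance) pairs copied from the configuration above.
CONFIG = {
    "ontology_term": (3.0, 80.0),
    "tissue_type": (2.0, 90.0),
    "name": (1.5, 95.0),
    "collection_date": (1.0, 70.0),
}


def slot_compliance(samples, slot):
    """Percentage of samples that populate `slot` (assumed definition)."""
    populated = sum(1 for slots in samples.values() if slot in slots)
    return 100.0 * populated / len(samples)


def weighted_compliance(samples, config):
    """Weight-averaged slot compliance (assumed formula, for illustration)."""
    total_weight = sum(weight for weight, _ in config.values())
    weighted_sum = sum(
        weight * slot_compliance(samples, slot)
        for slot, (weight, _) in config.items()
    )
    return weighted_sum / total_weight


def violations(samples, config):
    """Slots whose compliance falls below their min_compliance threshold."""
    return sorted(
        slot
        for slot, (_, minimum) in config.items()
        if slot_compliance(samples, slot) < minimum
    )


for slot, (weight, minimum) in CONFIG.items():
    pct = slot_compliance(SAMPLES, slot)
    status = "OK" if pct >= minimum else "BELOW THRESHOLD"
    print(f"{slot}: {pct:.1f}% (min {minimum:.0f}%, weight {weight}) {status}")
print(f"weighted: {weighted_compliance(SAMPLES, CONFIG):.1f}%")
```

Under these assumptions, a slot with a higher weight pulls the weighted figure toward its own compliance, which is why a poorly populated `ontology_term` would cost more than a poorly populated `collection_date`.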

%%bash
# Generate dashboard with configuration
linkml-data-qc /tmp/dashboard_data.yaml \
    -s /tmp/dashboard_schema.yaml \
    -t Dataset \
    -c /tmp/dashboard_config.yaml \
    --dashboard /tmp/configured_dashboard.png

printf '\nDashboard with thresholds saved:\n'
ls -lh /tmp/configured_dashboard.png

Compliance Report: /tmp/dashboard_data.yaml
Target Class: Dataset
Global Compliance: 100.0% (1/1)
Weighted Compliance: 100.0%
Config: /tmp/dashboard_config.yaml

Summary by Slot:
  description: 100.0%

Detailed Path Scores:
  (root) (Dataset): 100.0%
    - description: OK

Dashboard saved to /tmp/configured_dashboard.png

Dashboard with thresholds saved:

-rw-r--r--  1 cjm  wheel    79K Dec  8 10:50 /tmp/configured_dashboard.png

Combined Output: Report + Dashboard

Generate both a JSON report and a dashboard in one command:

%%bash
linkml-data-qc /tmp/dashboard_data.yaml \
    -s /tmp/dashboard_schema.yaml \
    -t Dataset \
    -c /tmp/dashboard_config.yaml \
    -f json \
    -o /tmp/report.json \
    --dashboard /tmp/full_dashboard.png

echo "Files generated:"
ls -lh /tmp/report.json /tmp/full_dashboard.png

Report written to /tmp/report.json

Dashboard saved to /tmp/full_dashboard.png

Files generated:

-rw-r--r--  1 cjm  wheel    79K Dec  8 10:50 /tmp/full_dashboard.png
-rw-r--r--  1 cjm  wheel   887B Dec  8 10:50 /tmp/report.json

Comparing Multiple Files

When analyzing multiple files, the dashboard shows a comparison view:

%%bash
# Create a second dataset with better compliance
cat > /tmp/dashboard_data2.yaml << 'EOF'
name: Improved Tissue Study
description: Updated dataset with better annotation
samples:
  - id: SAMP101
    name: Liver Sample A
    tissue_type: liver
    ontology_term: UBERON:0002107
    collection_date: "2024-04-01"
  - id: SAMP102
    name: Heart Sample A
    tissue_type: heart
    ontology_term: UBERON:0000948
    collection_date: "2024-04-02"
  - id: SAMP103
    name: Brain Sample A
    tissue_type: brain
    ontology_term: UBERON:0000955
    collection_date: "2024-04-03"
  - id: SAMP104
    name: Kidney Sample A
    tissue_type: kidney
    ontology_term: UBERON:0002113
    # Missing collection_date
EOF
echo "Second dataset created!"

Second dataset created!

%%bash
# Compare both datasets
linkml-data-qc /tmp/dashboard_data.yaml /tmp/dashboard_data2.yaml \
    -s /tmp/dashboard_schema.yaml \
    -t Dataset \
    --dashboard /tmp/comparison_dashboard.png

printf '\nComparison dashboard saved:\n'
ls -lh /tmp/comparison_dashboard.png

Multi-File Compliance Report
Files Analyzed: 2
Global Compliance: 100.0%
Weighted Compliance: 100.0%

Summary by Slot (across all files):
  description: 100.0%

Summary by Path (across all files):
  (root).description: 100.0%

Per-File Compliance:
  /tmp/dashboard_data.yaml: 100.0%
  /tmp/dashboard_data2.yaml: 100.0%

Dashboard saved to /tmp/comparison_dashboard.png

Comparison dashboard saved:

-rw-r--r--  1 cjm  wheel    36K Dec  8 10:50 /tmp/comparison_dashboard.png

CI/CD Integration

Combine dashboard generation with CI/CD checks:

%%bash
# Generate dashboard AND fail if below threshold
# (using the good dataset so this won't fail).
# The && ensures the success message prints only when the check passes.
linkml-data-qc /tmp/dashboard_data2.yaml \
    -s /tmp/dashboard_schema.yaml \
    -t Dataset \
    --min-compliance 70 \
    --dashboard /tmp/ci_dashboard.png \
    && printf '\nCI check passed! Dashboard saved.\n'

Compliance Report: /tmp/dashboard_data2.yaml
Target Class: Dataset
Global Compliance: 100.0% (1/1)
Weighted Compliance: 100.0%

Summary by Slot:
  description: 100.0%

Detailed Path Scores:
  (root) (Dataset): 100.0%
    - description: OK

Dashboard saved to /tmp/ci_dashboard.png

CI check passed! Dashboard saved.

%%bash
# --fail-on-violations exits non-zero when a configured threshold is violated.
# The || fallback prints a message on failure instead of aborting the cell.
linkml-data-qc /tmp/dashboard_data.yaml \
    -s /tmp/dashboard_schema.yaml \
    -t Dataset \
    -c /tmp/dashboard_config.yaml \
    --fail-on-violations \
    --dashboard /tmp/failed_ci_dashboard.png \
    || printf '\nCI check failed (as expected due to violations)\n'

Compliance Report: /tmp/dashboard_data.yaml
Target Class: Dataset
Global Compliance: 100.0% (1/1)
Weighted Compliance: 100.0%
Config: /tmp/dashboard_config.yaml

Summary by Slot:
  description: 100.0%

Detailed Path Scores:
  (root) (Dataset): 100.0%
    - description: OK

Dashboard saved to /tmp/failed_ci_dashboard.png

Directory Analysis with Dashboard

Analyze all files in a directory and generate a summary dashboard:

%%bash
# Create a directory with multiple data files
mkdir -p /tmp/datasets
cp /tmp/dashboard_data.yaml /tmp/datasets/study1.yaml
cp /tmp/dashboard_data2.yaml /tmp/datasets/study2.yaml

# Analyze directory
linkml-data-qc /tmp/datasets/ \
    -s /tmp/dashboard_schema.yaml \
    -t Dataset \
    --pattern "*.yaml" \
    --dashboard /tmp/directory_dashboard.png

printf '\nDirectory analysis dashboard saved:\n'
ls -lh /tmp/directory_dashboard.png

Multi-File Compliance Report
Files Analyzed: 2
Global Compliance: 100.0%
Weighted Compliance: 100.0%

Summary by Slot (across all files):
  description: 100.0%

Summary by Path (across all files):
  (root).description: 100.0%

Per-File Compliance:
  /tmp/datasets/study1.yaml: 100.0%
  /tmp/datasets/study2.yaml: 100.0%

Dashboard saved to /tmp/directory_dashboard.png

Directory analysis dashboard saved:

-rw-r--r--  1 cjm  wheel    33K Dec  8 10:51 /tmp/directory_dashboard.png

HTML Dashboard Site (GitHub Pages)

For multi-file analysis, use --dashboard-dir to generate a complete HTML dashboard site:

Sortable file list - Files sorted by compliance (worst first) to prioritize curation
Priority badges - Top 10 worst files highlighted with priority numbers
Detailed charts - Individual slot breakdown for priority files
JSON export - Machine-readable reports.json for automation

%%bash
# Generate HTML dashboard for directory
linkml-data-qc /tmp/datasets/ \
    -s /tmp/dashboard_schema.yaml \
    -t Dataset \
    --pattern "*.yaml" \
    --dashboard-dir /tmp/html_dashboard

printf '\nHTML dashboard generated:\n'
ls -la /tmp/html_dashboard/

Multi-File Compliance Report
Files Analyzed: 2
Global Compliance: 100.0%
Weighted Compliance: 100.0%

Summary by Slot (across all files):
  description: 100.0%

Summary by Path (across all files):
  (root).description: 100.0%

Per-File Compliance:
  /tmp/datasets/study1.yaml: 100.0%
  /tmp/datasets/study2.yaml: 100.0%

HTML dashboard generated at /tmp/html_dashboard/index.html

HTML dashboard generated:

total 208
drwxr-xr-x    7 cjm   wheel    224 Dec  8 10:51 .
drwxrwxrwt  172 root  wheel   5504 Dec  8 10:51 ..
-rw-r--r--    1 cjm   wheel  33654 Dec  8 10:51 comparison.png
-rw-r--r--    1 cjm   wheel  27637 Dec  8 10:51 detail_0.png
-rw-r--r--    1 cjm   wheel  28008 Dec  8 10:51 detail_1.png
-rw-r--r--    1 cjm   wheel   6973 Dec  8 10:51 index.html
-rw-r--r--    1 cjm   wheel    344 Dec  8 10:51 reports.json

%%bash
# View the JSON report for automation
cat /tmp/html_dashboard/reports.json

[
  {
    "file": "study1.yaml",
    "global_compliance": 100.0,
    "weighted_compliance": 100.0,
    "total_checks": 1,
    "total_populated": 1,
    "violations": 0
  },
  {
    "file": "study2.yaml",
    "global_compliance": 100.0,
    "weighted_compliance": 100.0,
    "total_checks": 1,
    "total_populated": 1,
    "violations": 0
  }
]

The HTML dashboard can be directly deployed to GitHub Pages. The reports.json file contains all file metrics sorted by compliance, useful for:

CI/CD automation scripts
Custom downstream analysis
Integration with other reporting tools
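
As one example of such automation, the following sketch reads reports.json and lists the files that need attention. It relies only on fields visible in the output above (`file`, `global_compliance`, `violations`); the function names and the 90% cutoff are illustrative assumptions, not part of linkml-data-qc.

```python
import json
from pathlib import Path


def load_reports(path):
    """Load the list of per-file metrics written by --dashboard-dir."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def files_needing_attention(reports, min_compliance=90.0):
    """Return entries below `min_compliance` or with violations, worst first."""
    flagged = [
        entry
        for entry in reports
        if entry["global_compliance"] < min_compliance or entry["violations"] > 0
    ]
    return sorted(flagged, key=lambda entry: entry["global_compliance"])


if __name__ == "__main__":
    report_path = Path("/tmp/html_dashboard/reports.json")
    if report_path.exists():
        for entry in files_needing_attention(load_reports(report_path)):
            print(
                f'{entry["file"]}: {entry["global_compliance"]:.1f}% '
                f'({entry["violations"]} violations)'
            )
```

For the two files shown above, both at 100.0% with zero violations, the function returns an empty list.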

Summary of Generated Dashboards

%%bash
echo "All generated dashboards:"
ls -lh /tmp/*dashboard*.png 2>/dev/null || echo "No dashboards found"

All generated dashboards:

-rw-r--r--@ 1 cjm  wheel    79K Dec  8 10:50 /tmp/basic_dashboard.png
-rw-r--r--  1 cjm  wheel    79K Dec  8 10:50 /tmp/ci_dashboard.png
-rw-r--r--  1 cjm  wheel    86K Dec  7 18:45 /tmp/cli_dashboard.png
-rw-r--r--  1 cjm  wheel    36K Dec  8 10:50 /tmp/comparison_dashboard.png
-rw-r--r--  1 cjm  wheel    79K Dec  8 10:50 /tmp/configured_dashboard.png
-rw-r--r--  1 cjm  wheel    33K Dec  8 10:51 /tmp/directory_dashboard.png
-rw-r--r--  1 cjm  wheel    79K Dec  8 10:51 /tmp/failed_ci_dashboard.png
-rw-r--r--  1 cjm  wheel    79K Dec  8 10:50 /tmp/full_dashboard.png
-rw-r--r--  1 cjm  wheel    79K Dec  7 12:03 /tmp/qc_dashboard.png

Key CLI Options for Dashboards

Option	Description
`--dashboard PATH`	Generate single dashboard PNG image
`--dashboard-dir DIR`	Generate HTML dashboard site (for GitHub Pages)
`-c, --config PATH`	Config file with weights/thresholds (reflected in dashboard)
`-o, --output PATH`	Also save text/JSON/CSV report
`--min-compliance N`	Fail if compliance < N% (dashboard still generated)
`--fail-on-violations`	Fail if thresholds violated (dashboard still generated)

HTML Dashboard Structure

When using --dashboard-dir, the following files are generated:

dashboard/
├── index.html       # Main dashboard page
├── comparison.png   # Slot compliance across files
├── detail_0.png     # Detailed chart for worst file
├── detail_1.png     # Detailed chart for 2nd worst file
├── ...              # Up to 10 detail charts
└── reports.json     # All metrics for automation

Files are sorted by compliance (worst first) to help prioritize curation efforts.

CI/CD Example with HTML Dashboard

# Shell step for a GitHub Actions workflow
linkml-data-qc data/ \
    -s schema.yaml \
    -t Dataset \
    -c qc_config.yaml \
    --dashboard-dir artifacts/qc_dashboard \
    --fail-on-violations

# Deploy artifacts/qc_dashboard to GitHub Pages

This generates a complete dashboard site as a CI artifact, then fails if thresholds are violated.
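
If the CI step is driven from a script rather than a raw shell line, the same call can be wrapped so the exit code is available for later steps. This is a minimal sketch using only the flags shown above; the function names `build_qc_command` and `run_qc` are hypothetical, and the paths are the placeholder ones from the example.

```python
import shlex
import subprocess


def build_qc_command(data_dir, schema, target_class, config, dashboard_dir):
    """Assemble the CLI call shown above as an argument list."""
    return [
        "linkml-data-qc", data_dir,
        "-s", schema,
        "-t", target_class,
        "-c", config,
        "--dashboard-dir", dashboard_dir,
        "--fail-on-violations",
    ]


def run_qc(command):
    """Run the command and return its exit code instead of raising on failure."""
    return subprocess.run(command, check=False).returncode


# Show the command that would be executed (placeholder paths from the example).
command = build_qc_command(
    "data/", "schema.yaml", "Dataset", "qc_config.yaml", "artifacts/qc_dashboard"
)
print(shlex.join(command))
```

A CI script could call `run_qc(command)`, upload the dashboard directory, and then exit with the returned code, since the options table notes that the dashboard is still generated when the check fails.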