QC Dashboards and Visualizations
This tutorial shows how to generate visual QC dashboards from the command line.
Dashboard Options
- --dashboard PATH - Generate a single PNG dashboard image
- --dashboard-dir DIR - Generate a full HTML dashboard site (ideal for GitHub Pages)
Prerequisites
Dashboard generation requires optional visualization dependencies:
pip install "linkml-data-qc[viz]"
This installs matplotlib and seaborn for creating charts.
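Before generating a dashboard, it can help to confirm that the optional dependencies are importable. The following is a minimal sketch, not part of linkml-data-qc: the helper name `missing_viz_modules` is hypothetical, and it only checks the two packages named above.

```python
import importlib.util

# Modules the tutorial says the [viz] extra installs.
VIZ_MODULES = ("matplotlib", "seaborn")


def missing_viz_modules(modules=VIZ_MODULES):
    """Return the names of modules that cannot be found on the import path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_viz_modules()
    if missing:
        print("Missing visualization dependencies:", ", ".join(missing))
    else:
        print("All visualization dependencies are importable.")
```

If anything is reported missing, rerunning the pip command above should resolve it.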
Setup: Create Sample Data
%%bash
# Create a test schema
cat > /tmp/dashboard_schema.yaml << 'EOF'
id: https://example.org/dashboard-demo
name: dashboard_demo
prefixes:
  linkml: https://w3id.org/linkml/
imports:
  - linkml:types
default_range: string
classes:
  Dataset:
    attributes:
      name:
        required: true
      description:
        recommended: true
      samples:
        multivalued: true
        range: Sample
  Sample:
    attributes:
      id:
        required: true
        identifier: true
      name:
        recommended: true
      tissue_type:
        recommended: true
      ontology_term:
        description: Ontology mapping for this sample
        recommended: true
      collection_date:
        recommended: true
EOF
echo "Schema created!"
Schema created!
%%bash
# Create test data with varying compliance
cat > /tmp/dashboard_data.yaml << 'EOF'
name: Tissue Expression Study
description: Analysis of gene expression across tissue types
samples:
  - id: SAMP001
    name: Liver Sample 1
    tissue_type: liver
    ontology_term: UBERON:0002107
    collection_date: "2024-01-15"
  - id: SAMP002
    name: Heart Sample 1
    tissue_type: heart
    ontology_term: UBERON:0000948
    # Missing collection_date
  - id: SAMP003
    name: Brain Sample 1
    tissue_type: brain
    # Missing ontology_term and collection_date
  - id: SAMP004
    # Missing name, tissue_type
    ontology_term: UBERON:0000955
    collection_date: "2024-02-20"
  - id: SAMP005
    name: Kidney Sample 1
    # Missing tissue_type, ontology_term, collection_date
  - id: SAMP006
    name: Lung Sample 1
    tissue_type: lung
    ontology_term: UBERON:0002048
    collection_date: "2024-03-01"
  - id: SAMP007
    # All recommended fields missing except required id
  - id: SAMP008
    name: Spleen Sample 1
    tissue_type: spleen
    ontology_term: UBERON:0002106
    collection_date: "2024-03-15"
EOF
echo "Data created!"
Data created!
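To make "varying compliance" concrete, the sketch below counts by hand how many of the eight samples populate each recommended Sample slot. It is an independent illustration and makes no claim about how linkml-data-qc traverses or scores nested objects; the function name `slot_population` is hypothetical, and the records are transcribed from the data file above.

```python
# Sample records transcribed from /tmp/dashboard_data.yaml above.
SAMPLES = [
    {"id": "SAMP001", "name": "Liver Sample 1", "tissue_type": "liver",
     "ontology_term": "UBERON:0002107", "collection_date": "2024-01-15"},
    {"id": "SAMP002", "name": "Heart Sample 1", "tissue_type": "heart",
     "ontology_term": "UBERON:0000948"},
    {"id": "SAMP003", "name": "Brain Sample 1", "tissue_type": "brain"},
    {"id": "SAMP004", "ontology_term": "UBERON:0000955",
     "collection_date": "2024-02-20"},
    {"id": "SAMP005", "name": "Kidney Sample 1"},
    {"id": "SAMP006", "name": "Lung Sample 1", "tissue_type": "lung",
     "ontology_term": "UBERON:0002048", "collection_date": "2024-03-01"},
    {"id": "SAMP007"},
    {"id": "SAMP008", "name": "Spleen Sample 1", "tissue_type": "spleen",
     "ontology_term": "UBERON:0002106", "collection_date": "2024-03-15"},
]

# Slots marked `recommended: true` on the Sample class in the schema.
RECOMMENDED_SLOTS = ["name", "tissue_type", "ontology_term", "collection_date"]


def slot_population(samples, slots):
    """Return the percentage of samples that populate each slot."""
    total = len(samples)
    return {
        slot: 100.0 * sum(1 for s in samples if s.get(slot) not in (None, "")) / total
        for slot in slots
    }


if __name__ == "__main__":
    for slot, pct in slot_population(SAMPLES, RECOMMENDED_SLOTS).items():
        print(f"{slot}: {pct:.1f}%")
```

A hand count like this is mainly useful as a sanity check when reading a generated dashboard.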
Basic Dashboard Generation
Generate a dashboard with a single command using --dashboard:
%%bash
linkml-data-qc /tmp/dashboard_data.yaml \
  -s /tmp/dashboard_schema.yaml \
  -t Dataset \
  --dashboard /tmp/basic_dashboard.png
echo "Dashboard saved:"
ls -lh /tmp/basic_dashboard.png
Compliance Report: /tmp/dashboard_data.yaml
Target Class: Dataset
Global Compliance: 100.0% (1/1)
Weighted Compliance: 100.0%
Summary by Slot:
  description: 100.0%
Detailed Path Scores:
  (root) (Dataset): 100.0%
  - description: OK
Dashboard saved to /tmp/basic_dashboard.png
Dashboard saved:
-rw-r--r--@ 1 cjm wheel 79K Dec 8 10:50 /tmp/basic_dashboard.png
Dashboard with Configuration
Use a configuration file to set weights and thresholds that will be reflected in the dashboard:
%%bash
# Create QC configuration with weights and thresholds
cat > /tmp/dashboard_config.yaml << 'EOF'
default_weight: 1.0
slots:
  ontology_term:
    weight: 3.0           # Ontology mappings are critical
    min_compliance: 80.0  # At least 80% should have mappings
  tissue_type:
    weight: 2.0           # Tissue type is important
    min_compliance: 90.0
  name:
    weight: 1.5
    min_compliance: 95.0
  collection_date:
    weight: 1.0
    min_compliance: 70.0
EOF
echo "Configuration created!"
Configuration created!
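The sketch below illustrates how weights and thresholds like these could interact. It assumes that weighted compliance is a weight-averaged mean of per-slot percentages, which is only one plausible reading; the source does not state the formula linkml-data-qc uses. The per-slot percentages are illustrative inputs obtained by counting populated fields in the sample data by hand, and both function names are hypothetical.

```python
# Weights and thresholds copied from /tmp/dashboard_config.yaml above.
CONFIG = {
    "ontology_term": {"weight": 3.0, "min_compliance": 80.0},
    "tissue_type": {"weight": 2.0, "min_compliance": 90.0},
    "name": {"weight": 1.5, "min_compliance": 95.0},
    "collection_date": {"weight": 1.0, "min_compliance": 70.0},
}

# Illustrative per-slot percentages (hand count over the eight samples).
OBSERVED = {
    "name": 75.0,
    "tissue_type": 62.5,
    "ontology_term": 62.5,
    "collection_date": 50.0,
}


def weighted_compliance(observed, config, default_weight=1.0):
    """Weight-averaged mean of per-slot percentages (assumed formula)."""
    total_weight = 0.0
    weighted_sum = 0.0
    for slot, pct in observed.items():
        weight = config.get(slot, {}).get("weight", default_weight)
        total_weight += weight
        weighted_sum += weight * pct
    return weighted_sum / total_weight if total_weight else 0.0


def threshold_violations(observed, config):
    """Return slots whose percentage falls below their min_compliance."""
    return sorted(
        slot
        for slot, pct in observed.items()
        if "min_compliance" in config.get(slot, {})
        and pct < config[slot]["min_compliance"]
    )


if __name__ == "__main__":
    print(f"Weighted compliance: {weighted_compliance(OBSERVED, CONFIG):.1f}%")
    print("Below threshold:", ", ".join(threshold_violations(OBSERVED, CONFIG)))
```

Under this assumed formula, a heavily weighted slot such as ontology_term pulls the overall score toward its own percentage more than a slot with the default weight does.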
%%bash
# Generate dashboard with configuration
linkml-data-qc /tmp/dashboard_data.yaml \
  -s /tmp/dashboard_schema.yaml \
  -t Dataset \
  -c /tmp/dashboard_config.yaml \
  --dashboard /tmp/configured_dashboard.png
echo "Dashboard with thresholds saved:"
ls -lh /tmp/configured_dashboard.png
Compliance Report: /tmp/dashboard_data.yaml
Target Class: Dataset
Global Compliance: 100.0% (1/1)
Weighted Compliance: 100.0%
Config: /tmp/dashboard_config.yaml
Summary by Slot:
  description: 100.0%
Detailed Path Scores:
  (root) (Dataset): 100.0%
  - description: OK
Dashboard saved to /tmp/configured_dashboard.png
Dashboard with thresholds saved:
-rw-r--r-- 1 cjm wheel 79K Dec 8 10:50 /tmp/configured_dashboard.png
Combined Output: Report + Dashboard
Generate both a JSON report and a dashboard in one command:
%%bash
linkml-data-qc /tmp/dashboard_data.yaml \
  -s /tmp/dashboard_schema.yaml \
  -t Dataset \
  -c /tmp/dashboard_config.yaml \
  -f json \
  -o /tmp/report.json \
  --dashboard /tmp/full_dashboard.png
echo "Files generated:"
ls -lh /tmp/report.json /tmp/full_dashboard.png
Report written to /tmp/report.json
Dashboard saved to /tmp/full_dashboard.png
Files generated:
-rw-r--r-- 1 cjm wheel 79K Dec 8 10:50 /tmp/full_dashboard.png
-rw-r--r-- 1 cjm wheel 887B Dec 8 10:50 /tmp/report.json
Comparing Multiple Files
When analyzing multiple files, the dashboard shows a comparison view:
%%bash
# Create a second dataset with better compliance
cat > /tmp/dashboard_data2.yaml << 'EOF'
name: Improved Tissue Study
description: Updated dataset with better annotation
samples:
  - id: SAMP101
    name: Liver Sample A
    tissue_type: liver
    ontology_term: UBERON:0002107
    collection_date: "2024-04-01"
  - id: SAMP102
    name: Heart Sample A
    tissue_type: heart
    ontology_term: UBERON:0000948
    collection_date: "2024-04-02"
  - id: SAMP103
    name: Brain Sample A
    tissue_type: brain
    ontology_term: UBERON:0000955
    collection_date: "2024-04-03"
  - id: SAMP104
    name: Kidney Sample A
    tissue_type: kidney
    ontology_term: UBERON:0002113
    # Missing collection_date
EOF
echo "Second dataset created!"
Second dataset created!
%%bash
# Compare both datasets
linkml-data-qc /tmp/dashboard_data.yaml /tmp/dashboard_data2.yaml \
  -s /tmp/dashboard_schema.yaml \
  -t Dataset \
  --dashboard /tmp/comparison_dashboard.png
echo "Comparison dashboard saved:"
ls -lh /tmp/comparison_dashboard.png
Multi-File Compliance Report
Files Analyzed: 2
Global Compliance: 100.0%
Weighted Compliance: 100.0%
Summary by Slot (across all files):
  description: 100.0%
Summary by Path (across all files):
  (root).description: 100.0%
Per-File Compliance:
  /tmp/dashboard_data.yaml: 100.0%
  /tmp/dashboard_data2.yaml: 100.0%
Dashboard saved to /tmp/comparison_dashboard.png
Comparison dashboard saved:
-rw-r--r-- 1 cjm wheel 36K Dec 8 10:50 /tmp/comparison_dashboard.png
CI/CD Integration
Combine dashboard generation with CI/CD checks:
%%bash
# Generate dashboard AND fail if below threshold
# (using the good dataset so this won't fail)
linkml-data-qc /tmp/dashboard_data2.yaml \
  -s /tmp/dashboard_schema.yaml \
  -t Dataset \
  --min-compliance 70 \
  --dashboard /tmp/ci_dashboard.png
echo "CI check passed! Dashboard saved."
Compliance Report: /tmp/dashboard_data2.yaml
Target Class: Dataset
Global Compliance: 100.0% (1/1)
Weighted Compliance: 100.0%
Summary by Slot:
  description: 100.0%
Detailed Path Scores:
  (root) (Dataset): 100.0%
  - description: OK
Dashboard saved to /tmp/ci_dashboard.png
CI check passed! Dashboard saved.
%%bash
# With --fail-on-violations, the command exits non-zero when a configured
# threshold is violated; the `|| echo` fallback then reports the failure.
# In the run recorded below no violation was reported, so the fallback
# message does not appear.
linkml-data-qc /tmp/dashboard_data.yaml \
  -s /tmp/dashboard_schema.yaml \
  -t Dataset \
  -c /tmp/dashboard_config.yaml \
  --fail-on-violations \
  --dashboard /tmp/failed_ci_dashboard.png || echo "CI check failed due to violations"
Compliance Report: /tmp/dashboard_data.yaml
Target Class: Dataset
Global Compliance: 100.0% (1/1)
Weighted Compliance: 100.0%
Config: /tmp/dashboard_config.yaml
Summary by Slot:
  description: 100.0%
Detailed Path Scores:
  (root) (Dataset): 100.0%
  - description: OK
Dashboard saved to /tmp/failed_ci_dashboard.png
Directory Analysis with Dashboard
Analyze all files in a directory and generate a summary dashboard:
%%bash
# Create a directory with multiple data files
mkdir -p /tmp/datasets
cp /tmp/dashboard_data.yaml /tmp/datasets/study1.yaml
cp /tmp/dashboard_data2.yaml /tmp/datasets/study2.yaml
# Analyze directory
linkml-data-qc /tmp/datasets/ \
  -s /tmp/dashboard_schema.yaml \
  -t Dataset \
  --pattern "*.yaml" \
  --dashboard /tmp/directory_dashboard.png
echo "Directory analysis dashboard saved:"
ls -lh /tmp/directory_dashboard.png
Multi-File Compliance Report
Files Analyzed: 2
Global Compliance: 100.0%
Weighted Compliance: 100.0%
Summary by Slot (across all files):
  description: 100.0%
Summary by Path (across all files):
  (root).description: 100.0%
Per-File Compliance:
  /tmp/datasets/study1.yaml: 100.0%
  /tmp/datasets/study2.yaml: 100.0%
Dashboard saved to /tmp/directory_dashboard.png
Directory analysis dashboard saved:
-rw-r--r-- 1 cjm wheel 33K Dec 8 10:51 /tmp/directory_dashboard.png
HTML Dashboard Site (GitHub Pages)
For multi-file analysis, use --dashboard-dir to generate a complete HTML dashboard site with the following features:
- Sortable file list - Files sorted by compliance (worst first) to prioritize curation
- Priority badges - Top 10 worst files highlighted with priority numbers
- Detailed charts - Individual slot breakdown for priority files
- JSON export - Machine-readable reports.json for automation
%%bash
# Generate HTML dashboard for directory
linkml-data-qc /tmp/datasets/ \
  -s /tmp/dashboard_schema.yaml \
  -t Dataset \
  --pattern "*.yaml" \
  --dashboard-dir /tmp/html_dashboard
echo "HTML dashboard generated:"
ls -la /tmp/html_dashboard/
Multi-File Compliance Report
Files Analyzed: 2
Global Compliance: 100.0%
Weighted Compliance: 100.0%
Summary by Slot (across all files):
  description: 100.0%
Summary by Path (across all files):
  (root).description: 100.0%
Per-File Compliance:
  /tmp/datasets/study1.yaml: 100.0%
  /tmp/datasets/study2.yaml: 100.0%
HTML dashboard generated at /tmp/html_dashboard/index.html
HTML dashboard generated:
total 208
drwxr-xr-x 7 cjm wheel 224 Dec 8 10:51 .
drwxrwxrwt 172 root wheel 5504 Dec 8 10:51 ..
-rw-r--r-- 1 cjm wheel 33654 Dec 8 10:51 comparison.png
-rw-r--r-- 1 cjm wheel 27637 Dec 8 10:51 detail_0.png
-rw-r--r-- 1 cjm wheel 28008 Dec 8 10:51 detail_1.png
-rw-r--r-- 1 cjm wheel 6973 Dec 8 10:51 index.html
-rw-r--r-- 1 cjm wheel 344 Dec 8 10:51 reports.json
%%bash
# View the JSON report for automation
cat /tmp/html_dashboard/reports.json
[
  {
    "file": "study1.yaml",
    "global_compliance": 100.0,
    "weighted_compliance": 100.0,
    "total_checks": 1,
    "total_populated": 1,
    "violations": 0
  },
  {
    "file": "study2.yaml",
    "global_compliance": 100.0,
    "weighted_compliance": 100.0,
    "total_checks": 1,
    "total_populated": 1,
    "violations": 0
  }
]
The HTML dashboard can be directly deployed to GitHub Pages. The reports.json file contains all file metrics sorted by compliance, useful for:
- CI/CD automation scripts
- Custom downstream analysis
- Integration with other reporting tools
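As one example of such automation, the sketch below picks out the files that need attention from a reports.json payload. The field names match the output shown above; the numeric values in `SAMPLE_REPORTS` are hypothetical, chosen only so that the two files differ, and the function name `triage` is not part of linkml-data-qc.

```python
import json

# Hypothetical payload using the field names from reports.json above.
SAMPLE_REPORTS = """
[
  {"file": "study2.yaml", "global_compliance": 93.75, "weighted_compliance": 96.7,
   "total_checks": 16, "total_populated": 15, "violations": 0},
  {"file": "study1.yaml", "global_compliance": 62.5, "weighted_compliance": 63.3,
   "total_checks": 32, "total_populated": 20, "violations": 4}
]
"""


def triage(reports, min_compliance):
    """Return entries below min_compliance or with violations, worst first."""
    flagged = [
        r for r in reports
        if r["global_compliance"] < min_compliance or r["violations"] > 0
    ]
    return sorted(flagged, key=lambda r: r["global_compliance"])


if __name__ == "__main__":
    reports = json.loads(SAMPLE_REPORTS)
    for entry in triage(reports, min_compliance=80.0):
        print(f'{entry["file"]}: {entry["global_compliance"]:.1f}% '
              f'({entry["violations"]} violations)')
```

In a real pipeline the payload would come from the generated file, for example `json.load(open("/tmp/html_dashboard/reports.json"))`, rather than from an inline string.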
Summary of Generated Dashboards
%%bash
echo "All generated dashboards:"
ls -lh /tmp/*dashboard*.png 2>/dev/null || echo "No dashboards found"
All generated dashboards:
-rw-r--r--@ 1 cjm wheel 79K Dec 8 10:50 /tmp/basic_dashboard.png
-rw-r--r-- 1 cjm wheel 79K Dec 8 10:50 /tmp/ci_dashboard.png
-rw-r--r-- 1 cjm wheel 86K Dec 7 18:45 /tmp/cli_dashboard.png
-rw-r--r-- 1 cjm wheel 36K Dec 8 10:50 /tmp/comparison_dashboard.png
-rw-r--r-- 1 cjm wheel 79K Dec 8 10:50 /tmp/configured_dashboard.png
-rw-r--r-- 1 cjm wheel 33K Dec 8 10:51 /tmp/directory_dashboard.png
-rw-r--r-- 1 cjm wheel 79K Dec 8 10:51 /tmp/failed_ci_dashboard.png
-rw-r--r-- 1 cjm wheel 79K Dec 8 10:50 /tmp/full_dashboard.png
-rw-r--r-- 1 cjm wheel 79K Dec 7 12:03 /tmp/qc_dashboard.png
Key CLI Options for Dashboards
| Option | Description |
|---|---|
| --dashboard PATH | Generate single dashboard PNG image |
| --dashboard-dir DIR | Generate HTML dashboard site (for GitHub Pages) |
| -c, --config PATH | Config file with weights/thresholds (reflected in dashboard) |
| -o, --output PATH | Also save text/JSON/CSV report |
| --min-compliance N | Fail if compliance < N% (dashboard still generated) |
| --fail-on-violations | Fail if thresholds violated (dashboard still generated) |
HTML Dashboard Structure
When using --dashboard-dir, the following files are generated:
dashboard/
├── index.html # Main dashboard page
├── comparison.png # Slot compliance across files
├── detail_0.png # Detailed chart for worst file
├── detail_1.png # Detailed chart for 2nd worst file
├── ... # Up to 10 detail charts
└── reports.json # All metrics for automation
Files are sorted by compliance (worst first) to help prioritize curation efforts.
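Before publishing the site, a deployment script might confirm that the expected files are present. The sketch below assumes the layout listed above; `check_dashboard_dir` and `demo` are hypothetical helpers, and the demo builds an empty stand-in directory rather than running linkml-data-qc.

```python
import tempfile
from pathlib import Path

# Files that the layout above lists unconditionally.
REQUIRED_FILES = ("index.html", "comparison.png", "reports.json")


def check_dashboard_dir(directory):
    """Return (missing required files, sorted detail chart names)."""
    root = Path(directory)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    detail_charts = sorted(p.name for p in root.glob("detail_*.png"))
    return missing, detail_charts


def demo():
    """Build a stand-in dashboard directory with empty files and check it."""
    with tempfile.TemporaryDirectory() as tmp:
        names = REQUIRED_FILES + ("detail_0.png", "detail_1.png")
        for name in names:
            (Path(tmp) / name).touch()
        return check_dashboard_dir(tmp)


if __name__ == "__main__":
    missing, charts = demo()
    print("Missing:", missing or "none")
    print("Detail charts:", ", ".join(charts))
```

Pointing `check_dashboard_dir` at the real output directory, such as /tmp/html_dashboard, gives the same check for a generated site.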
CI/CD Example with HTML Dashboard
# GitHub Actions workflow
linkml-data-qc data/ \
  -s schema.yaml \
  -t Dataset \
  -c qc_config.yaml \
  --dashboard-dir artifacts/qc_dashboard \
  --fail-on-violations
# Deploy artifacts/qc_dashboard to GitHub Pages
This generates a complete dashboard site as a CI artifact, then fails if thresholds are violated.
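When the CI step is driven from Python rather than a shell script, the same invocation can be assembled as an argument list. This is a sketch under that assumption: `build_qc_command` is a hypothetical helper, and it uses only the flags shown in this tutorial.

```python
import shlex


def build_qc_command(data_path, schema, target_class, config=None,
                     dashboard_dir=None, fail_on_violations=False):
    """Assemble a linkml-data-qc argument list from the tutorial's flags."""
    cmd = ["linkml-data-qc", str(data_path), "-s", str(schema), "-t", target_class]
    if config:
        cmd += ["-c", str(config)]
    if dashboard_dir:
        cmd += ["--dashboard-dir", str(dashboard_dir)]
    if fail_on_violations:
        cmd.append("--fail-on-violations")
    return cmd


if __name__ == "__main__":
    cmd = build_qc_command(
        "data/", "schema.yaml", "Dataset",
        config="qc_config.yaml",
        dashboard_dir="artifacts/qc_dashboard",
        fail_on_violations=True,
    )
    # Print the shell-quoted command; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd)` and inspecting `returncode` lets the calling script upload the dashboard artifact first and then propagate the failure.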