linkml-term-validator
Validating LinkML schemas and datasets that depend on external ontology terms
A collection of LinkML ValidationPlugin implementations for validating ontology term references in schemas and data.
Key Features
- ✅ Three composable validation plugins for the LinkML validator framework
- ✅ Schema validation - Validates meaning fields in enum permissible values
- ✅ Dynamic enum validation - Validates data against enums defined by reachable_from, matches, or concepts
- ✅ Binding validation - Validates constraints on nested object fields
- ✅ Multi-level caching - In-memory + file-based for fast repeated validation
- ✅ Ontology Access Kit (OAK) integration - Supports multiple ontology sources
- ✅ AI hallucination prevention - Dual validation (ID + label) for AI-generated terms
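To illustrate the multi-level caching idea (a fast in-memory layer backed by a file that survives between runs), here is a minimal sketch. The class `TwoLevelLabelCache`, its JSON file layout, and the `fetch` callback are hypothetical and not the package's actual cache implementation. `fetch` stands in for a real ontology lookup, such as one made through an OAK adapter.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Optional


class TwoLevelLabelCache:
    """Hypothetical two-level cache: an in-memory dict backed by a JSON file."""

    def __init__(self, path: Path, fetch: Callable[[str], Optional[str]]):
        self.path = path
        self.fetch = fetch  # stand-in for a real ontology lookup (e.g. via OAK)
        self.memory: Dict[str, Optional[str]] = {}
        # Load results persisted by earlier runs so they skip the fetch.
        self.disk: Dict[str, Optional[str]] = (
            json.loads(self.path.read_text()) if self.path.exists() else {}
        )

    def get_label(self, curie: str) -> Optional[str]:
        if curie in self.memory:  # level 1: current process
            return self.memory[curie]
        if curie in self.disk:  # level 2: file on disk
            label = self.disk[curie]
        else:  # miss: ask the ontology source, then persist (None = not found)
            label = self.fetch(curie)
            self.disk[curie] = label
            self.path.write_text(json.dumps(self.disk, indent=2, sort_keys=True))
        self.memory[curie] = label
        return label
```

Negative results (`None`) are cached as well, so a term that does not exist is only looked up once.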
Quick Start
Installation
pip install linkml-term-validator
Validate a Schema
Check that meaning fields reference valid ontology terms:
linkml-term-validator validate-schema schema.yaml
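To make the idea of a schema check concrete, the sketch below shows the general kind of comparison involved. It takes each permissible value's meaning CURIE, looks it up, and compares the ontology label with the value's name. The toy `ONTOLOGY` dict (a handful of real GO terms), the `normalize` rule, and `check_meanings` are illustrative assumptions. They are not the validator's internals or its exact matching rules.

```python
from typing import Dict, List

# Toy stand-in for an ontology source (CURIE -> label); a few real GO terms.
ONTOLOGY: Dict[str, str] = {
    "GO:0005634": "nucleus",
    "GO:0005737": "cytoplasm",
    "GO:0005739": "mitochondrion",
}

# Enum permissible values as they might look after loading schema.yaml.
PERMISSIBLE_VALUES: Dict[str, dict] = {
    "NUCLEUS": {"meaning": "GO:0005634"},
    "MITOCHONDRION": {"meaning": "GO:0005737"},  # wrong ID: points at cytoplasm
    "RIBOSOME": {"meaning": "GO:9999999"},  # ID that does not exist here
}


def normalize(text: str) -> str:
    """Case-insensitive comparison that treats underscores as spaces."""
    return text.casefold().replace("_", " ").strip()


def check_meanings(pvs: Dict[str, dict], ontology: Dict[str, str]) -> List[str]:
    problems: List[str] = []
    for name, pv in pvs.items():
        curie = pv.get("meaning")
        if curie is None:
            continue  # no ontology mapping to check
        label = ontology.get(curie)
        if label is None:
            problems.append(f"{name}: {curie} not found")
        elif normalize(label) != normalize(name):
            problems.append(f"{name}: {curie} has label {label!r}")
    return problems
```

Calling `check_meanings(PERMISSIBLE_VALUES, ONTOLOGY)` flags the mismatched MITOCHONDRION mapping and the nonexistent RIBOSOME ID, while NUCLEUS passes.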
Validate Data
Validate data instances against dynamic enums and binding constraints:
linkml-term-validator validate-data data.yaml --schema schema.yaml
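A dynamic enum defined with reachable_from accepts any term below a source term in the ontology graph. The sketch below expands a value set that way and then checks data values against it. The toy `IS_A` graph uses hypothetical `EX:` terms, and `descendants` is an illustrative helper. In practice the expansion is delegated to an ontology backend, and the real semantics (relationship types, include_self defaults) are set by the LinkML enum definition.

```python
from collections import deque
from typing import Dict, List, Set

# Toy is_a graph, child -> parents (hypothetical EX: terms).
IS_A: Dict[str, List[str]] = {
    "EX:mammal": ["EX:animal"],
    "EX:bird": ["EX:animal"],
    "EX:dog": ["EX:mammal"],
    "EX:oak": ["EX:plant"],
}


def descendants(source: str, is_a: Dict[str, List[str]], include_self: bool = False) -> Set[str]:
    """Breadth-first walk down the is_a graph from `source`."""
    # Invert child -> parents into parent -> children for downward traversal.
    children: Dict[str, List[str]] = {}
    for child, parents in is_a.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    seen: Set[str] = {source} if include_self else set()
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


allowed = descendants("EX:animal", IS_A)
records = [{"species": "EX:dog"}, {"species": "EX:oak"}, {"species": "EX:animal"}]
invalid = [r["species"] for r in records if r["species"] not in allowed]
```

Here `EX:oak` is rejected because it sits under `EX:plant`, and `EX:animal` is rejected only because the source term itself is excluded (`include_self=False`).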
Documentation Quick Links
Getting Started
- Getting Started Tutorial - Interactive notebook for CLI basics
- CLI Reference - Complete command-line documentation
- Configuration - Configure ontology adapters and caching
Understanding Validation
- Validation Types - Schema, dynamic enum, and binding validation explained
- Anti-Hallucination Guardrails - Preventing AI from hallucinating ontology IDs
- Ontology Access - How OAK adapters work
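Binding validation and the anti-hallucination guardrails both operate on nested term objects. The sketch below illustrates the dual ID + label idea for a nested object carrying both an `id` and a `label`. An invented identifier, an ID outside the bound value set, and an ID whose label does not match all produce errors. The field names `id` and `label`, the `EX:` terms, and `check_term` are assumptions for illustration, not the plugin's API.

```python
from typing import Dict, List, Optional, Set

# Toy ontology labels and a bound value set (hypothetical EX: terms).
LABELS: Dict[str, str] = {"EX:dog": "dog", "EX:cat": "cat", "EX:oak": "oak tree"}
ALLOWED: Set[str] = {"EX:dog", "EX:cat"}


def check_term(term: Dict[str, str], allowed: Set[str], labels: Dict[str, str]) -> List[str]:
    errors: List[str] = []
    curie: Optional[str] = term.get("id")
    label: Optional[str] = term.get("label")
    if curie not in labels:
        errors.append(f"{curie}: unknown identifier")  # e.g. a made-up ID
    elif curie not in allowed:
        errors.append(f"{curie}: not in the bound value set")
    # Second check: the claimed label must agree with the ontology's label.
    if curie in labels and label is not None and label.casefold() != labels[curie].casefold():
        errors.append(f"{curie}: label {label!r} does not match {labels[curie]!r}")
    return errors
```

Checking the label as well as the ID catches the common failure where a generated identifier is real but refers to a different concept than the one named.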
Integration
- linkml-validate Integration - Use plugins with standard linkml-validate
- Python API - Programmatic usage
- Plugin Reference - Complete API documentation
Advanced Topics
- TSV/CSV Data Validation - Validating tabular data
- Advanced Usage - Custom configs, local files, troubleshooting
- Caching - Understanding the caching system
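For tabular data, validation boils down to checking each cell of a term column against the allowed value set and reporting where the failures are. The sketch below does this with the standard library's `csv` module. The column name `species`, the allowed set, and `invalid_cells` are hypothetical, and the sketch does not reflect how the package itself loads TSV/CSV files.

```python
import csv
import io
from typing import List, Set, Tuple


def invalid_cells(tsv_text: str, column: str, allowed: Set[str]) -> List[Tuple[int, str]]:
    """Return (line number, value) for every non-empty cell outside `allowed`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    bad: List[Tuple[int, str]] = []
    # start=2 because line 1 is the header (assumes no multi-line fields).
    for line_no, row in enumerate(reader, start=2):
        value = (row.get(column) or "").strip()
        if value and value not in allowed:  # empty or missing cells are skipped
            bad.append((line_no, value))
    return bad


sample = "id\tspecies\nr1\tEX:dog\nr2\tEX:oak\nr3\t\n"
```

With `allowed={"EX:dog", "EX:cat"}`, only line 3 (`EX:oak`) is reported. Reporting line numbers makes failures easy to locate in a spreadsheet.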
Use Cases
- Schema Quality Assurance - Catch typos and mismatches in ontology term references before publishing
- Data Validation - Ensure curated datasets only use valid, constrained ontology terms
- AI-Generated Content - Prevent language models from hallucinating fake ontology identifiers
- CI/CD Integration - Automated validation in continuous integration pipelines
- Flexible Constraints - Define valid terms via ontology queries rather than hardcoded lists
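For CI/CD, one option is a small script that runs the two Quick Start commands and fails the build if either fails. The sketch below assumes the CLI exits with a non-zero status when validation fails, which is worth confirming in the CLI Reference. The file names and the `run_checks` helper are placeholders. The `run` parameter is injectable so the logic can be exercised without the CLI installed.

```python
import subprocess
import sys
from typing import Callable, List, Sequence

# The two commands from the Quick Start; file names are placeholders.
CHECKS: List[List[str]] = [
    ["linkml-term-validator", "validate-schema", "schema.yaml"],
    ["linkml-term-validator", "validate-data", "data.yaml", "--schema", "schema.yaml"],
]


def run_checks(checks: Sequence[Sequence[str]], run: Callable = subprocess.run) -> int:
    """Run every check, even after a failure, and return how many failed."""
    failures = 0
    for cmd in checks:
        result = run(list(cmd))
        if result.returncode != 0:  # assumes non-zero exit means validation failed
            print(f"FAILED: {' '.join(cmd)}", file=sys.stderr)
            failures += 1
    return failures
```

A CI step would then end with `sys.exit(1 if run_checks(CHECKS) else 0)`. Running all checks before exiting surfaces every problem in a single pipeline run.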