How-To: Validate Bindings with Dynamic Enums
This guide shows you how to validate ontology term bindings in nested objects, including validation against dynamic enum closures and catching fabricated term IDs.
Prerequisites
- linkml-term-validator installed
- A LinkML schema with bindings
- Data to validate
- (Optional) An oak_config.yaml for custom ontology adapters
Goal
Validate that:
- Term IDs in nested objects belong to the correct ontology branch (closure validation)
- Term IDs actually exist in the ontology (strict mode)
- (Optionally) Labels match the ontology's canonical labels
Step 1: Define Your Schema with Bindings
Create a schema with a reusable term class and bindings that constrain the id field to a dynamic enum:
```yaml
# schema.yaml
id: https://example.org/gene-annotations
name: gene-annotations
prefixes:
  GO: http://purl.obolibrary.org/obo/GO_
  linkml: https://w3id.org/linkml/
  rdfs: http://www.w3.org/2000/01/rdf-schema#
default_prefix: gene-annotations
default_range: string
imports:
  - linkml:types

classes:
  GeneAnnotation:
    tree_root: true
    attributes:
      gene_id:
        identifier: true
      process:
        description: Biological process annotation
        range: Term
        inlined: true
        bindings:
          - binds_value_of: id
            range: BiologicalProcessEnum

  Term:
    description: Reusable ontology term class
    attributes:
      id:
        description: CURIE (e.g., GO:0007049)
      label:
        description: Human-readable label
        implements:
          - rdfs:label

enums:
  BiologicalProcessEnum:
    description: GO biological processes
    reachable_from:
      source_ontology: sqlite:obo:go
      source_nodes:
        - GO:0008150  # biological_process
      relationship_types:
        - rdfs:subClassOf
      include_self: true
```
Key points:
- `bindings` on the `process` slot constrains the `id` field of the nested `Term` to `BiologicalProcessEnum`
- `reachable_from` defines a dynamic enum by traversing the ontology from the source nodes
- `implements: [rdfs:label]` marks the `label` field for label validation
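The `reachable_from` semantics can be made concrete with a small example. The sketch below computes a closure like `BiologicalProcessEnum` with a breadth-first walk. It starts at the source nodes and follows `rdfs:subClassOf` edges downward to collect descendants. The tiny hierarchy is a hypothetical, heavily simplified stand-in for GO; the real tool queries the ontology through OAK. The function name `expand_reachable_from` is illustrative and not part of linkml-term-validator.

```python
from collections import deque

# Hypothetical, heavily simplified slice of GO: child -> parents via rdfs:subClassOf.
# The real GO hierarchy has many more intermediate terms and edges.
SUBCLASS_OF = {
    "GO:0009987": ["GO:0008150"],  # cellular process -> biological_process
    "GO:0007049": ["GO:0009987"],  # cell cycle -> cellular process
    "GO:0006281": ["GO:0009987"],  # DNA repair -> cellular process (simplified)
    "GO:0005634": ["GO:0005575"],  # nucleus -> cellular_component (simplified)
}


def expand_reachable_from(source_nodes, edges, include_self=True):
    """Return every term that reaches a source node by following child->parent edges."""
    # Invert the edges so we can walk downward from each source node to its descendants.
    children = {}
    for child, parents in edges.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)

    closure = set(source_nodes) if include_self else set()
    queue = deque(source_nodes)
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in closure:
                closure.add(child)
                queue.append(child)
    return closure


allowed = expand_reachable_from(["GO:0008150"], SUBCLASS_OF, include_self=True)
print(sorted(allowed))
```

With `include_self: true`, the root `GO:0008150` is itself an allowed value. Nucleus (`GO:0005634`) hangs off the cellular-component branch, so it falls outside the closure. This is the wrong-branch case exercised in Step 3.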
Step 2: Create Test Data
Valid data (valid_data.yaml):
```yaml
gene_id: BRCA1
process:
  id: GO:0007049  # cell cycle - IS a biological process
  label: cell cycle
```
Invalid data - wrong branch (wrong_branch.yaml):
```yaml
gene_id: BRCA1
process:
  id: GO:0005634  # nucleus - NOT a biological process (it's a cellular component)
  label: nucleus
```
Invalid data - fabricated ID (fabricated.yaml):
```yaml
gene_id: BRCA1
process:
  id: GO:9999999  # Doesn't exist in GO!
  label: made up term
```
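The binding on `process` points the validator at the `id` field inside the inlined `Term` object, rather than at the `process` slot as a whole. The sketch below mirrors that lookup on the three test records, written as Python dicts instead of YAML. The `check_binding` helper, its message wording, and the hard-coded allowed set are hypothetical stand-ins. The real tool expands the enum from the ontology and uses its own report format.

```python
# The three test records from Step 2, as plain Python dicts.
RECORDS = {
    "valid_data.yaml": {"gene_id": "BRCA1", "process": {"id": "GO:0007049", "label": "cell cycle"}},
    "wrong_branch.yaml": {"gene_id": "BRCA1", "process": {"id": "GO:0005634", "label": "nucleus"}},
    "fabricated.yaml": {"gene_id": "BRCA1", "process": {"id": "GO:9999999", "label": "made up term"}},
}

# Mirrors the schema: a binding on the `process` slot that constrains its `id` field.
BINDING = {"slot": "process", "binds_value_of": "id", "range": "BiologicalProcessEnum"}

# Hypothetical, tiny stand-in for the expanded BiologicalProcessEnum.
ALLOWED = {"GO:0008150", "GO:0009987", "GO:0007049", "GO:0006281"}


def check_binding(record, binding, allowed):
    """Return a list of error strings for one record (an empty list means it passed)."""
    nested = record.get(binding["slot"])
    if not isinstance(nested, dict):
        return []  # nothing inlined to check
    value = nested.get(binding["binds_value_of"])
    if value is not None and value not in allowed:
        return [
            f"Value '{value}' not in {binding['range']} "
            f"(slot: {binding['slot']}, field: {binding['binds_value_of']})"
        ]
    return []


for name, record in RECORDS.items():
    errors = check_binding(record, BINDING, ALLOWED)
    print(name, "OK" if not errors else errors)
```

The `slot` and `field` values in this sketch correspond to the `slot: process` and `field: id` lines in the validator output shown in Step 3.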
Step 3: Run Validation
Basic Binding Validation
```bash
# Validates against the dynamic enum closure
linkml-term-validator validate-data valid_data.yaml -s schema.yaml -t GeneAnnotation
```
Output:
```text
Validation passed
```
Catch Wrong Branch
```bash
linkml-term-validator validate-data wrong_branch.yaml -s schema.yaml -t GeneAnnotation
```
Output:
```text
Validation failed with 1 issue(s):
ERROR: Value 'GO:0005634' not in dynamic enum (expanded from ontology) 'BiologicalProcessEnum'
path: process
slot: process
field: id
allowed_values: 29688 terms
```
Catch Fabricated IDs (Strict Mode)
Strict mode is enabled by default. It checks that each term ID actually exists in the ontology, which catches fabricated IDs:
```bash
linkml-term-validator validate-data fabricated.yaml -s schema.yaml -t GeneAnnotation
```
Output:
```text
Validation failed with 2 issue(s):
ERROR: Value 'GO:9999999' not in dynamic enum (expanded from ontology) 'BiologicalProcessEnum'
...
ERROR: Term 'GO:9999999' not found in ontology
...
prefix: GO (configured in oak_config)
```
Disable Strict Mode (Lenient)
To skip the existence check, pass --lenient:
```bash
linkml-term-validator validate-data fabricated.yaml -s schema.yaml -t GeneAnnotation --lenient
```
In lenient mode, only the closure error is reported; the "term not found" error is not.
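The fabricated ID produces two errors in strict mode but only one with `--lenient` because the two checks are independent. Closure membership always runs, while the existence lookup runs only in strict mode. Below is a minimal sketch of that decision logic. It uses hypothetical `allowed` and `known_terms` sets in place of real ontology queries, and its messages only loosely paraphrase the tool's output.

```python
def validate_term(curie, allowed, known_terms, strict=True):
    """Collect errors for one bound term ID.

    allowed: the expanded dynamic enum (closure check, always on).
    known_terms: every ID in the ontology (existence check, strict mode only).
    """
    errors = []
    if curie not in allowed:
        errors.append(f"Value '{curie}' not in dynamic enum")
    if strict and curie not in known_terms:
        errors.append(f"Term '{curie}' not found in ontology")
    return errors


# Hypothetical toy data: a closure, plus a cellular-component term that exists but is out of branch.
ALLOWED = {"GO:0008150", "GO:0007049"}
KNOWN_TERMS = ALLOWED | {"GO:0005634"}

print(validate_term("GO:9999999", ALLOWED, KNOWN_TERMS, strict=True))   # two errors
print(validate_term("GO:9999999", ALLOWED, KNOWN_TERMS, strict=False))  # one error
```

The wrong-branch ID `GO:0005634` yields only the closure error even in strict mode, because it does exist in GO.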
Step 4: Add Label Validation (Anti-Hallucination)
Enable label validation to catch mismatched labels:
```bash
linkml-term-validator validate-data data.yaml -s schema.yaml -t GeneAnnotation --labels
```
If the data has:
```yaml
process:
  id: GO:0007049
  label: DNA repair  # WRONG! Should be "cell cycle"
```
Output:
```text
WARNING: Label mismatch for GO:0007049
Expected: "DNA repair"
Ontology: "cell cycle"
```
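Conceptually, label validation looks up the canonical label for the ID and compares it with the label supplied in the data. The sketch below uses a hypothetical `CANONICAL_LABELS` map in place of an ontology query. It also normalizes case and whitespace before comparing. That normalization is a design choice for this sketch only; whether linkml-term-validator does the same is an assumption, so check the reference docs before relying on it.

```python
from typing import Optional

# Hypothetical stand-in for ontology label lookups.
CANONICAL_LABELS = {
    "GO:0007049": "cell cycle",
    "GO:0006281": "DNA repair",
}


def normalize(label: str) -> str:
    # Sketch-only design choice: ignore case and collapse runs of whitespace.
    return " ".join(label.split()).casefold()


def check_label(curie: str, provided: str) -> Optional[str]:
    """Return a warning string if the provided label disagrees with the canonical one."""
    canonical = CANONICAL_LABELS.get(curie)
    if canonical is None:
        return None  # unknown term: existence is the strict-mode check's job
    if normalize(provided) != normalize(canonical):
        return f"Label mismatch for {curie}: data has {provided!r}, ontology has {canonical!r}"
    return None


print(check_label("GO:0007049", "DNA repair"))
print(check_label("GO:0007049", "Cell  Cycle"))  # None: differs only in case and spacing
```

Because the check is reported as a warning rather than an error, it flags likely hallucinated labels without rejecting otherwise valid IDs.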
Step 5: Use Custom Ontology Configuration
For local ontologies or specific adapters, create an oak_config.yaml:
```yaml
# oak_config.yaml
ontology_adapters:
  GO: sqlite:obo:go
  HP: sqlite:obo:hp
  MONDO: sqlite:obo:mondo
  # For local OBO files:
  # TEST: simpleobo:path/to/test.obo
```
Then reference it:
```bash
linkml-term-validator validate-data data.yaml -s schema.yaml -t GeneAnnotation \
  --config oak_config.yaml
```
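Each key under `ontology_adapters` is a CURIE prefix, and each value is an OAK adapter selector. The sketch below shows how a term ID could be routed to its adapter, using a plain dict that mirrors the config above; loading the real file would need a YAML parser such as PyYAML. `adapter_for` is a hypothetical helper. Its `None` return models the "unknown prefixes are skipped with a warning" behavior described under Common Issues and Solutions.

```python
import warnings
from typing import Optional

# Mirrors the ontology_adapters mapping in oak_config.yaml.
ONTOLOGY_ADAPTERS = {
    "GO": "sqlite:obo:go",
    "HP": "sqlite:obo:hp",
    "MONDO": "sqlite:obo:mondo",
}


def adapter_for(curie: str, adapters: dict) -> Optional[str]:
    """Return the adapter selector for a CURIE's prefix, or None if the prefix is unconfigured."""
    prefix, sep, _local_id = curie.partition(":")
    if not sep:
        raise ValueError(f"Not a CURIE: {curie!r}")
    selector = adapters.get(prefix)
    if selector is None:
        warnings.warn(f"Unknown prefix {prefix!r}; skipping {curie}")
    return selector


print(adapter_for("GO:0007049", ONTOLOGY_ADAPTERS))  # sqlite:obo:go
```

This is also why a missing `GO:` entry turns valid GO IDs into "term not found" or "unknown prefix" reports: without a selector there is no ontology to query.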
Python API
```python
from linkml.validator import Validator
from linkml.validator.loaders import YamlLoader
from linkml_term_validator.plugins import BindingValidationPlugin

# Create plugin with options
plugin = BindingValidationPlugin(
    validate_labels=True,  # Check labels match ontology
    strict=True,  # Fail on non-existent terms (default)
    cache_labels=True,  # Cache lookups to disk
    cache_dir="cache",
    oak_config_path="oak_config.yaml",
)

# Create validator
validator = Validator(
    schema="schema.yaml",
    validation_plugins=[plugin],
)

# Validate
loader = YamlLoader("data.yaml")
report = validator.validate_source(loader, target_class="GeneAnnotation")

# Check results
if not report.results:
    print("Validation passed!")
else:
    for result in report.results:
        print(f"{result.severity.name}: {result.message}")
```
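In CI you usually want a nonzero exit status when the report contains errors, but not when it only contains warnings such as label mismatches. The sketch below assumes only that each result exposes `severity.name` and `message`, as in the snippet above. The `Severity` and `Result` stand-ins let it run without linkml installed; their member names are assumed to mirror linkml's report classes. Failing only on FATAL/ERROR is a policy choice, not a tool default.

```python
from dataclasses import dataclass
from enum import Enum


# Stand-ins assumed to be shaped like linkml's ValidationResult (severity enum + message).
class Severity(Enum):
    FATAL = "FATAL"
    ERROR = "ERROR"
    WARN = "WARN"
    INFO = "INFO"


@dataclass
class Result:
    severity: Severity
    message: str


FAILING = {"FATAL", "ERROR"}  # policy choice: warnings alone do not fail the build


def exit_code(results) -> int:
    """Print every result and return 1 if any is FATAL or ERROR, else 0."""
    for result in results:
        print(f"{result.severity.name}: {result.message}")
    return 1 if any(r.severity.name in FAILING for r in results) else 0


demo = [
    Result(Severity.WARN, "Label mismatch for GO:0007049"),
    Result(Severity.ERROR, "Term 'GO:9999999' not found in ontology"),
]
print("exit status:", exit_code(demo))
```

With the real validator, ending the script with `sys.exit(exit_code(report.results))` fails the job on errors only.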
Common Issues and Solutions
Issue: "Term not found" for valid terms
Cause: The prefix isn't configured in oak_config.yaml
Solution: Add the prefix to your oak_config.yaml:
```yaml
ontology_adapters:
  GO: sqlite:obo:go  # Add this line
```
Issue: Dynamic enum expansion is slow
Cause: Large ontologies take time to traverse
Solutions:
1. Enable caching with --cache-dir cache
2. Use more specific source nodes (lower in the hierarchy)
3. Consider using a local sqlite database instead of downloading
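If you drive expansion from your own script, caching works as follows. Key the expanded term set by everything that affects it (source ontology, source nodes, relationship types, `include_self`), and store it on disk. The sketch below is a hypothetical JSON cache, not linkml-term-validator's own cache format. `expand` stands for any callable that performs the real traversal, and `fake_expand` is a hypothetical stand-in for it.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cache_key(enum_def: dict) -> str:
    """Stable hash of a reachable_from definition (dict key order does not matter)."""
    canonical = json.dumps(enum_def, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def cached_closure(enum_def: dict, expand, cache_dir: Path) -> set:
    """Load the expanded closure from cache_dir, or compute it with expand() and save it."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"closure-{cache_key(enum_def)}.json"
    if path.exists():
        return set(json.loads(path.read_text()))
    terms = set(expand(enum_def))
    path.write_text(json.dumps(sorted(terms)))
    return terms


REACHABLE_FROM = {
    "source_ontology": "sqlite:obo:go",
    "source_nodes": ["GO:0008150"],
    "relationship_types": ["rdfs:subClassOf"],
    "include_self": True,
}

calls = []


def fake_expand(enum_def):
    # Hypothetical stand-in for an expensive ontology traversal; records each call.
    calls.append(enum_def)
    return {"GO:0008150", "GO:0007049"}


with tempfile.TemporaryDirectory() as tmp:
    first = cached_closure(REACHABLE_FROM, fake_expand, Path(tmp))
    second = cached_closure(REACHABLE_FROM, fake_expand, Path(tmp))
    print(first == second, len(calls))  # True 1
```

Changing any field of the definition produces a new key, so a closure is never reused for a different enum. A closure is reused after the ontology itself is updated, however, so clear the cache when you upgrade the ontology.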
Issue: Unknown prefix warnings
Cause: Data contains prefixes not in your oak_config.yaml
Solution: Either add the prefix to oak_config.yaml, or ignore the warning if that prefix is not meant to be validated:
```yaml
ontology_adapters:
  KNOWN_PREFIX: sqlite:obo:known
  # Unknown prefixes will be skipped with a warning
```
Validation Summary
| Validation Type | What it Catches | Flag |
|---|---|---|
| Binding + Dynamic Enum | Terms outside ontology branch | (default) |
| Strict Mode | Fabricated/non-existent term IDs | --no-lenient (default) |
| Label Validation | Mismatched labels | --labels |
Next Steps
- Binding Validation Reference - Complete reference
- Bindings Explained - Conceptual overview
- Anti-Hallucination Guardrails - Using validation with AI
- Configuration - oak_config.yaml options