Validating Text Against DOIs
This guide shows how to validate supporting text against publications using Digital Object Identifiers (DOIs).
Overview
DOIs are persistent identifiers for digital objects, commonly used for journal articles. The validator fetches publication metadata from the Crossref API when you provide a DOI reference.
Basic Usage
Validate a Single Quote
linkml-reference-validator validate text \
"Nanometre-scale thermometry" \
DOI:10.1038/nature12373
Output:
Validating text against DOI:10.1038/nature12373...
Text: Nanometre-scale thermometry
Result:
Valid: True
Message: Supporting text validated successfully in DOI:10.1038/nature12373
DOI Format
DOIs should be prefixed with DOI::
DOI:10.1038/nature12373
DOI:10.1126/science.1234567
DOI:10.1016/j.cell.2023.01.001
The DOI itself follows the standard format: 10.prefix/suffix
Pre-caching DOIs
For offline validation or to speed up repeated validations:
linkml-reference-validator cache reference DOI:10.1038/nature12373
Output:
Fetching DOI:10.1038/nature12373...
Successfully cached DOI:10.1038/nature12373
Title: Nanometre-scale thermometry in a living cell
Authors: G. Kucsko, P. C. Maurer, N. Y. Yao
Content type: abstract_only
Content length: 1234 characters
Cached references are stored in references_cache/ as markdown files with YAML frontmatter.
Using DOIs in Data Files
DOIs work the same as PMIDs in LinkML data files:
schema.yaml:
id: https://example.org/my-schema
name: my-schema
prefixes:
linkml: https://w3id.org/linkml/
classes:
Statement:
attributes:
id:
identifier: true
supporting_text:
slot_uri: linkml:excerpt
reference:
slot_uri: linkml:authoritative_reference
data.yaml:
- id: stmt1
supporting_text: Nanometre-scale thermometry
reference: DOI:10.1038/nature12373
- id: stmt2
supporting_text: MUC1 oncoprotein blocks nuclear targeting
reference: PMID:16888623
Validate:
linkml-reference-validator validate data \
data.yaml \
--schema schema.yaml \
--target-class Statement
You can mix DOIs and PMIDs in the same data file.
Repairing DOI References
The repair command also works with DOIs:
linkml-reference-validator repair text \
"Nanometre scale thermometry" \
DOI:10.1038/nature12373
DOI vs PMID: When to Use Each
| Feature | PMID | DOI |
|---|---|---|
| Source | NCBI PubMed | Crossref |
| Coverage | Biomedical literature | All scholarly content |
| Full text | Via PMC when available | Metadata only |
| Abstract | Usually available | Depends on publisher |
Use PMID when: - Working with biomedical/life science literature - Full text access is important - The article is indexed in PubMed
Use DOI when: - The article is not in PubMed - Working with non-biomedical journals - The DOI is more readily available
Content Availability
Unlike PMIDs which often provide abstracts, DOI metadata from Crossref may have limited content:
- Title: Always available
- Authors: Usually available
- Abstract: Depends on publisher policy
- Full text: Not available via Crossref
If the abstract is not available, validation will be limited to matching against the title and other metadata.
Troubleshooting
"Content type: unavailable"
This means Crossref returned metadata but no abstract. The DOI was fetched successfully, but validation may fail if your text doesn't match the title.
Solution: Consider using the PMID if the article is in PubMed.
"Failed to fetch DOI"
The DOI may be invalid or the Crossref API may be temporarily unavailable.
Check:
1. Verify the DOI format (should be 10.prefix/suffix)
2. Test the DOI at https://doi.org/YOUR_DOI
3. Try again later if Crossref is rate-limiting
Rate Limiting
The validator automatically respects Crossref rate limits. For bulk operations, consider:
- Pre-caching references before validation
- Using a polite pool (add your email in config for higher limits)
See Also
- Quickstart - Getting started with validation
- CLI Reference - Complete command documentation
- Validating OBO Files - Working with ontology files