YARRRML
YARRRML is a YAML-friendly syntax for RML mappings.
Note
Minimal generator. JSON-first. Good starting point for hand-tuning.
Example Output
Given a simple schema:
classes:
  Person:
    attributes:
      id: {identifier: true}
      name: {}
The generator produces YARRRML like:
prefixes:
  ex: https://example.org/test#
  rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
mappings:
  Person:
    sources:
      - [data.json~jsonpath, '$[*]']
    s: ex:$(id)
    po:
      - p: a
        o: ex:Person
      - p: ex:name
        o: $(name)
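To make the effect of the mapping above concrete, here is a minimal sketch of how the `$(id)` and `$(name)` references could be filled in for each record selected by `$[*]`. The helpers `expand` and `person_triples` are hypothetical and written only for illustration; they are not part of LinkML or of any RML engine, and a real engine would also expand prefixes and handle datatypes.

```python
import re

# Hypothetical helper: substitutes YARRRML-style $(field) references
# with values from a single record. Illustration only.
def expand(template: str, record: dict) -> str:
    return re.sub(r"\$\(([^)]+)\)", lambda m: str(record[m.group(1)]), template)

def person_triples(records: list) -> list:
    """Apply the Person mapping to every record, as the '$[*]' iterator would."""
    triples = []
    for rec in records:
        subject = expand("ex:$(id)", rec)
        triples.append((subject, "a", "ex:Person"))
        triples.append((subject, "ex:name", expand("$(name)", rec)))
    return triples

# Made-up sample data standing in for data.json
records = [{"id": "p1", "name": "Alice"}, {"id": "p2", "name": "Bob"}]
for triple in person_triples(records):
    print(triple)
```

Each record yields one `a ex:Person` triple and one `ex:name` triple, both sharing the subject built from the identifier slot.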
CSV / TSV sources
Besides JSON, CSV/TSV is supported. The key differences:
- No iterator is required for CSV/TSV (each row is a candidate).
- sources must be expressed as a list of lists for compatibility with common engines (e.g. Morph-KGC):
    mappings:
      Person:
        sources:
          - ['people.csv~csv']  # note the inner list
        s: ex:$(id)
        po:
          - p: a
            o: ex:Person
          - p: ex:name
            o: $(name)
- Values come directly from columns via $(column_name).
- For object slots (non-inlined references), IRIs are emitted:
    - p: ex:employer
      o:
        value: $(employer)
        type: iri
- Multivalued object slots are emitted as lists of IRIs:
    - p: ex:friends
      o:
        - value: $(friends[*])
          type: iri
- TSV works via the same formulation (~csv). Most engines auto-detect the tab separator for .tsv files. If an engine requires explicit delimiter/CSVW options, that is currently out of scope and can be handled manually in post-editing.
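When hand-editing generated mappings, it can help to check that `sources` keeps the list-of-lists shape described above. The following is a rough sketch of such a check. The function `check_sources` is a hypothetical helper, not something the generator provides, and it only encodes the rules stated in this section (inner lists, a `path~formulation` string, no iterator for `~csv`).

```python
def check_sources(sources) -> list:
    """Return a list of problems found in a mapping's `sources` value."""
    if not isinstance(sources, list):
        return ["sources must be a list"]
    problems = []
    for i, entry in enumerate(sources):
        # Each entry must itself be a list: [access] or [access, iterator]
        if not isinstance(entry, list) or not entry:
            problems.append(f"entry {i} is not a non-empty inner list")
            continue
        access = entry[0]
        if not isinstance(access, str) or "~" not in access:
            problems.append(f"entry {i} lacks a path~formulation string")
        elif access.endswith("~csv") and len(entry) > 1:
            problems.append(f"entry {i} is CSV but carries an iterator")
    return problems

print(check_sources([["people.csv~csv"]]))
print(check_sources(["people.csv~csv"]))
```

The first call passes the shape used in this section; the second passes a flat list and reports the missing inner list.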
Source inference
If a file path is passed without a formulation suffix, the generator infers it automatically:
- *.json → ~jsonpath
- *.csv / *.tsv → ~csv
Examples:
# JSON (iterator required)
linkml generate yarrrml schema.yaml > mappings.yml
linkml generate yarrrml schema.yaml --source data.json~jsonpath
linkml generate yarrrml schema.yaml --source data.json~jsonpath --iterator-template "$.{Class}[*]"
# CSV / TSV (no iterator)
linkml generate yarrrml schema.yaml --source people.csv
linkml generate yarrrml schema.yaml --source people.tsv~csv
# CLI alias (short form)
gen-yarrrml schema.yaml --source data.csv~csv > mappings.yml
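The inference rule can be pictured as a small suffix lookup. This is a sketch under the assumption that only the three extensions listed above are recognised; the function name `with_formulation` and the error raised for unknown extensions are illustrative choices, not the generator's actual behaviour.

```python
from pathlib import Path

# Suffix-to-formulation table taken from the rule described above.
FORMULATIONS = {".json": "jsonpath", ".csv": "csv", ".tsv": "csv"}

def with_formulation(source: str) -> str:
    """Append ~formulation to a bare path; leave explicit suffixes alone."""
    if "~" in source:
        return source  # already has a formulation, e.g. people.tsv~csv
    suffix = Path(source).suffix
    if suffix not in FORMULATIONS:
        # Assumption: unknown extensions are rejected rather than guessed
        raise ValueError(f"cannot infer formulation for {source!r}")
    return f"{source}~{FORMULATIONS[suffix]}"

for src in ["data.json", "people.csv", "people.tsv", "people.tsv~csv"]:
    print(src, "->", with_formulation(src))
```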
Overview
- One mapping per LinkML class
- Prefixes come from the schema (prefixes always emitted)
- If the schema has no default prefix, the generator automatically adds:
    ex: https://example.org/default#
  ensuring that all CURIEs can expand correctly
- When class_uri or slot_uri are defined, they are used verbatim for rdf:type and predicate IRIs (including full IRIs if present)
- Subject from identifier slot (else key; else safe fallback)
- po for all class attributes (slot aliases respected)
- Emits a (rdf:type) as CURIEs or IRIs (depending on availability), and automatically aggregates mixin classes into this array
- Emits predicate-object mappings for identifier slots if they have an explicit slot_uri
- JSON iterators are determined by the schema structure:
  - If a tree_root class exists, the default iterator is the root object: sources: [[data.json~jsonpath, '$']]
  - If no tree_root is defined (flat arrays), the default iterator covers all items: sources: [[data.json~jsonpath, '$[*]']]
- Preserves explicit XSD datatypes for slots (e.g., datatype: xsd:integer)
- CSV/TSV: sources: [[path~csv]] (no iterator), values via $(column)
- A top-level mappings: section is always included, even for minimal schemas
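The iterator and source rules in the list above can be summarised in a few lines. This is only a sketch of the decision as described here: `default_sources` and its `has_tree_root` parameter are hypothetical names, and the real generator derives this information from the schema rather than from a boolean flag.

```python
def default_sources(source: str, has_tree_root: bool) -> list:
    """Build the list-of-lists `sources` value for one mapping."""
    if source.endswith("~csv"):
        # CSV/TSV: each row is a candidate, so no iterator is emitted
        return [[source]]
    # JSON: root object when a tree_root class exists, else all array items
    iterator = "$" if has_tree_root else "$[*]"
    return [[source, iterator]]

print(default_sources("data.json~jsonpath", has_tree_root=True))
print(default_sources("data.json~jsonpath", has_tree_root=False))
print(default_sources("people.csv~csv", has_tree_root=False))
```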
Command Line
linkml generate yarrrml path/to/schema.yaml > mappings.yml
# CSV instead of JSON:
linkml generate yarrrml path/to/schema.yaml --source data.csv~csv
# class-based JSON arrays:
linkml generate yarrrml path/to/schema.yaml --iterator-template "$.{Class}[*]"
# or short alias:
gen-yarrrml path/to/schema.yaml --source data.csv~csv > mappings.yml
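For the `--iterator-template` option, one plausible reading is that `{Class}` is replaced by each class name to give a per-class iterator. The sketch below illustrates that reading only; `expand_iterator` is a hypothetical helper and the class names are made up, so treat the exact substitution as an assumption rather than the generator's documented behaviour.

```python
def expand_iterator(template: str, class_name: str) -> str:
    """Substitute the {Class} placeholder; other text is left untouched."""
    return template.replace("{Class}", class_name)

# Hypothetical class names for a document shaped like
# {"Person": [...], "Organization": [...]}
for cls in ["Person", "Organization"]:
    print(cls, "->", expand_iterator("$.{Class}[*]", cls))
```

A template without the placeholder, such as `$[*]`, comes back unchanged, which matches using a single flat array for every class.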
Docs
CLI
gen-yarrrml
Generate YARRRML mappings from a LinkML schema.
gen-yarrrml [OPTIONS] YAMLFILE
Options
- --source <source>
  YARRRML source shorthand, e.g., data.json~jsonpath or data.csv~csv (TSV works too)
- --iterator-template <iterator_template>
  JSONPath iterator template; supports {Class}, default: "$[*]"
- -V, --version
  Show the version and exit.
- -f, --format <format>
  Output format
  - Default: 'yml'
  - Options: yml | yaml
- --metadata, --no-metadata
  Include metadata in output
  - Default: True
- --useuris, --metauris
  Use class and slot URIs over model uris
  - Default: True
- -im, --importmap <importmap>
  Import mapping file
- --log_level <log_level>
  Logging level
  - Default: 'WARNING'
  - Options: CRITICAL | ERROR | WARNING | INFO | DEBUG
- -v, --verbose
  Verbosity. Takes precedence over --log_level.
- --mergeimports, --no-mergeimports
  Merge imports into source file (default=mergeimports)
- --stacktrace, --no-stacktrace
  Print a stack trace when an error occurs
  - Default: False
Arguments
- YAMLFILE
  Required argument
Code
Limitations
- JSON-first by default
- One source per mapping
- Classes without an identifier are assigned a fallback subject: ex:<Class>/$(subject_id)
- Object slots:
  - inlined: false → emitted as IRIs
  - inlined: true → emitted using YARRRML mapping + condition (join) pattern
- Inlined objects without an identifier are assigned a synthetic IRI using the parent's ID (e.g. ex:Child_$(parent_id)). This ensures graph connectivity and avoids broken/orphaned triples during lifting, as some YARRRML implementations fail to execute joins (condition: equal) on blank nodes.
- However, multivalued inlined objects (lists) still strictly require an identifier.
- An inline class without an identifier can only be used in a single owning class.
- Iterators not derived from JSON Schema
- No per-slot JSONPath/CSV expressions or functions
- CSV/TSV supported via --source; delimiter/custom CSVW options are not yet exposed
- The deep scan iterator ($..slot_name) used for inlined objects will grab all properties with that exact name, even if they are nested deeper inside each other. This can cause mapping collisions.
- Object slots that are not explicitly inlined (inlined: false or by default) require their target class to have an identifier (identifier: true). If you attempt to link to a class without an ID via standard IRI reference, the generator will raise an error.
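The deep scan collision mentioned above is easy to reproduce with a plain recursive search. The sketch below mimics what a `$..name` descendant query collects; it is a hand-written stand-in, not a JSONPath library or the engine's implementation, and the sample document is made up.

```python
def deep_scan(node, name):
    """Collect every value stored under `name` at any depth, like `$..name`."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == name:
                found.append(value)
            # Keep descending, including into values that already matched
            found.extend(deep_scan(value, name))
    elif isinstance(node, list):
        for item in node:
            found.extend(deep_scan(item, name))
    return found

# Made-up record where the same slot name appears at two depths
doc = {
    "id": "p1",
    "address": {"city": "Oslo", "previous": {"address": {"city": "Bergen"}}},
}
matches = deep_scan(doc, "address")
print(len(matches))
```

Both the top-level `address` and the one nested under `previous` are returned, so a mapping that expects a single inlined object per parent would see two candidates.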