.. _generators/yarrrml:

YARRRML
=======

YARRRML is a YAML-based syntax for writing RML mappings.

.. note::

   This is a minimal, JSON-first generator. Its output is a good starting
   point for hand-tuning.

Example Output
--------------

Given a simple schema:

.. code-block:: yaml

   classes:
     Person:
       attributes:
         id: {identifier: true}
         name: {}

The generator produces YARRRML like:

.. code-block:: yaml

   prefixes:
     ex: https://example.org/test#
     rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#

   mappings:
     Person:
       sources:
         - - data.json~jsonpath
           - $.items[*]
       s: ex:$(id)
       po:
         - p: rdf:type
           o: ex:Person
         - p: ex:name
           o: $(name)
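
The correspondence between the schema and the generated mapping can be sketched
in a few lines of Python. This is only an illustration, not the generator's
implementation: ``build_mapping`` and its parameters are hypothetical names, and
the sketch assumes, as in the example output, that the identifier feeds the
subject template and is left out of ``po``.

.. code-block:: python

   def build_mapping(class_name, identifier, attributes,
                     source="data.json~jsonpath", iterator="$.items[*]",
                     prefix="ex"):
       """Build a YARRRML-style mapping dict for one class (illustrative only)."""
       # Every mapping starts with an rdf:type statement for the class.
       po = [{"p": "rdf:type", "o": f"{prefix}:{class_name}"}]
       for attr in attributes:
           if attr == identifier:
               continue  # assumption: the identifier only feeds the subject
           po.append({"p": f"{prefix}:{attr}", "o": f"$({attr})"})
       return {
           "sources": [[source, iterator]],  # list of lists: [source, iterator]
           "s": f"{prefix}:$({identifier})",
           "po": po,
       }


   person = build_mapping("Person", "id", ["id", "name"])

Here ``person["s"]`` is ``ex:$(id)`` and ``person["po"]`` holds the ``rdf:type``
and ``ex:name`` entries of the example output.
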

CSV / TSV sources
-----------------

Besides JSON, CSV and TSV sources are supported. The key differences are:

- No iterator is required for CSV/TSV (each row is a candidate record).
- ``sources`` must be expressed as a **list of lists** for compatibility with
  common engines (e.g. Morph-KGC):

  .. code-block:: yaml

     mappings:
       Person:
         sources:
           - ['people.csv~csv']   # note the inner list
         s: ex:$(id)
         po:
           - p: rdf:type
             o: ex:Person
           - p: ex:name
             o: $(name)

- Values come directly from columns via ``$(column_name)``.
- For object slots (non-inlined references), IRIs are emitted:

  .. code-block:: yaml

     - p: ex:employer
       o:
         value: $(employer)
         type: iri

- Multivalued object slots are emitted as lists of IRIs:

  .. code-block:: yaml

     - p: ex:friends
       o:
         - value: $(friends[*])
           type: iri

- TSV works via the same formulation (``~csv``). Most engines auto-detect the
  tab separator for ``.tsv`` files. Explicit delimiter or CSVW options are
  currently out of scope; if an engine requires them, add them by hand after
  generation.
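
The three shapes an object value can take (plain column value, single IRI, list
of IRIs) can be summarized in a small Python sketch. ``object_for_slot`` and its
flags are hypothetical names used for illustration only; the sketch covers just
the cases listed in this section and says nothing about multivalued literals.

.. code-block:: python

   def object_for_slot(column, is_object=False, multivalued=False):
       """Return the YARRRML ``o`` value for one slot (illustrative only)."""
       if not is_object:
           return f"$({column})"  # plain value taken from the column
       if multivalued:
           # multivalued reference: a list holding one IRI template
           return [{"value": f"$({column}[*])", "type": "iri"}]
       # single-valued, non-inlined reference: one IRI
       return {"value": f"$({column})", "type": "iri"}


   name_o = object_for_slot("name")
   employer_o = object_for_slot("employer", is_object=True)
   friends_o = object_for_slot("friends", is_object=True, multivalued=True)

The three results correspond to the ``ex:name``, ``ex:employer`` and
``ex:friends`` entries shown in this section.
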

Source inference
----------------

If a file path is passed without a formulation suffix, the generator infers it automatically:

- ``*.json`` → ``~jsonpath``
- ``*.csv`` / ``*.tsv`` → ``~csv``

Examples:

.. code:: bash

   # JSON (iterator required)
   linkml generate yarrrml schema.yaml > mappings.yml
   linkml generate yarrrml schema.yaml --source data.json~jsonpath
   linkml generate yarrrml schema.yaml --source data.json~jsonpath --iterator-template "$.{Class}[*]"

   # CSV / TSV (no iterator)
   linkml generate yarrrml schema.yaml --source people.csv
   linkml generate yarrrml schema.yaml --source people.tsv~csv

   # CLI alias (short form)
   gen-yarrrml schema.yaml --source data.csv~csv > mappings.yml
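
The inference rule amounts to a lookup on the file suffix. The following Python
sketch illustrates the idea; ``infer_source`` is a hypothetical helper, not the
generator's actual function, and raising an error for unknown suffixes is an
assumption made here for the sake of a complete example.

.. code-block:: python

   from pathlib import PurePosixPath

   # Suffix-to-formulation table taken from the list in this section.
   SUFFIX_TO_FORMULATION = {".json": "jsonpath", ".csv": "csv", ".tsv": "csv"}


   def infer_source(source):
       """Append a formulation suffix when the path does not carry one."""
       if "~" in source:
           return source  # an explicit formulation is kept untouched
       suffix = PurePosixPath(source).suffix.lower()
       formulation = SUFFIX_TO_FORMULATION.get(suffix)
       if formulation is None:
           # assumption: the real generator may treat unknown suffixes differently
           raise ValueError(f"cannot infer a formulation for {source!r}")
       return f"{source}~{formulation}"

With this sketch, ``infer_source("people.csv")`` yields ``people.csv~csv``, while
``people.tsv~csv`` is passed through unchanged.
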

Overview
--------

- one mapping per LinkML class
- prefixes come from the schema (``prefixes`` is always emitted)
- if the schema has **no default prefix**, the generator automatically adds the
  following entry, ensuring that all CURIEs can expand correctly:

  .. code-block:: yaml

     ex: https://example.org/default#

- when ``class_uri`` or ``slot_uri`` is defined, it is used **verbatim** for
  ``rdf:type`` and predicate IRIs (including full IRIs if present)
- the subject comes from the identifier slot (else key; else safe fallback)
- ``po`` for all class attributes (slot aliases respected)
- emits ``rdf:type`` as CURIEs or IRIs (depending on availability)
- JSON by default: ``sources: [[data.json~jsonpath, $.items[*]]]``
- CSV/TSV: ``sources: [[path~csv]]`` (no iterator), values via ``$(column)``
- a top-level ``mappings:`` section is **always** included, even for minimal schemas
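
The default-prefix fallback can be pictured as follows. This is a sketch under
stated assumptions, not the generator's code: ``resolve_prefixes`` is a
hypothetical helper, and keeping an ``ex`` prefix that the schema already
declares is an assumption, since this page does not describe that case.

.. code-block:: python

   FALLBACK_PREFIX = ("ex", "https://example.org/default#")


   def resolve_prefixes(schema_prefixes, default_prefix=None):
       """Return the prefix map to emit, adding the fallback when needed."""
       prefixes = dict(schema_prefixes)  # copy; leave the caller's dict alone
       if default_prefix is None:
           name, uri = FALLBACK_PREFIX
           # assumption: an "ex" prefix declared by the schema is not overwritten
           prefixes.setdefault(name, uri)
       return prefixes


   bare = resolve_prefixes({})
   declared = resolve_prefixes({"foaf": "http://xmlns.com/foaf/0.1/"},
                               default_prefix="foaf")

A schema without a default prefix ends up with the ``ex`` entry, while a schema
that names one is emitted as declared.
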

Command Line
------------

.. code:: bash

   linkml generate yarrrml path/to/schema.yaml > mappings.yml

   # CSV instead of JSON:
   linkml generate yarrrml path/to/schema.yaml --source data.csv~csv

   # class-based JSON arrays:
   linkml generate yarrrml path/to/schema.yaml --iterator-template "$.{Class}[*]"

   # or short alias:
   gen-yarrrml path/to/schema.yaml --source data.csv~csv > mappings.yml
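
The ``--iterator-template`` value contains a ``{Class}`` placeholder. Assuming
the placeholder is replaced by each class name, the effect can be sketched in
Python as below; ``expand_iterator`` is a hypothetical helper and
``Organization`` is a made-up second class used only for illustration.

.. code-block:: python

   def expand_iterator(template, class_name):
       """Fill the ``{Class}`` placeholder of an iterator template."""
       # str.replace is used instead of str.format so that any other braces
       # in a JSONPath expression are left alone.
       return template.replace("{Class}", class_name)


   iterators = {
       name: expand_iterator("$.{Class}[*]", name)
       for name in ("Person", "Organization")
   }

Each class then gets its own JSONPath iterator, for example ``$.Person[*]`` for
``Person``, instead of the shared default ``$.items[*]``.
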

Docs
----

CLI
^^^

.. click:: linkml.generators.yarrrmlgen:cli
   :prog: gen-yarrrml
   :nested: short

Code
^^^^

.. currentmodule:: linkml.generators.yarrrmlgen

.. autoclass:: YarrrmlGenerator
   :members: serialize

Limitations
-----------

- JSON-first by default
- One source per mapping
- Classes without an identifier are **assigned a fallback subject**: ``ex:/$(subject_id)``
- Object slots:

  - ``inlined: false`` → emitted as IRIs
  - ``inlined: true`` → emitted using the YARRRML ``mapping`` + ``condition`` (join) pattern

- Inline classes require an identifier (or key) to support join-based linking
- An inline class can only be used in a single owning class.
  Multiple inline usages of the same class are not supported,
  as each mapping can define only one source/iterator.
- Iterators are not derived from JSON Schema
- No per-slot JSONPath/CSV expressions or functions
- CSV/TSV is supported via ``--source``; delimiter/custom CSVW options are not yet exposed
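
The join pattern and the single-owner restriction can be illustrated with a
short Python sketch. The exact structure the generator emits is not shown on
this page, so the layout below follows the common YARRRML join form with the
``equal`` function and should be read as an assumption. ``inline_object`` and
``check_single_owner`` are hypothetical helpers, and ``Address`` and ``Pet`` are
made-up class names.

.. code-block:: python

   def inline_object(target_mapping, local_field, target_field):
       """Build a join-style ``o`` entry pointing at another mapping."""
       return {
           "mapping": target_mapping,
           "condition": {
               "function": "equal",
               "parameters": [
                   ["str1", f"$({local_field})"],   # field in the owning class
                   ["str2", f"$({target_field})"],  # identifier of the inline class
               ],
           },
       }


   def check_single_owner(inline_usages):
       """Reject an inline class that is used by more than one owning class."""
       owners = {}
       for owner, inline_class in inline_usages:
           previous = owners.setdefault(inline_class, owner)
           if previous != owner:
               raise ValueError(
                   f"{inline_class} is inlined by both {previous} and {owner}"
               )
       return owners


   address_o = inline_object("Address", "address", "id")
   owners = check_single_owner([("Person", "Address"), ("Person", "Pet")])

The join needs a field on both sides to compare, which is why inline classes
require an identifier or key.
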