YARRRML

YARRRML is a human-readable, YAML-based syntax for writing RML mappings.

Note

A minimal, JSON-first generator. Its output is a good starting point for hand-tuning.

Example Output

Given a simple schema:

classes:
  Person:
    attributes:
      id: {identifier: true}
      name: {}

The generator produces YARRRML like:

prefixes:
  ex: https://example.org/test#
  rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
mappings:
  Person:
    sources:
      - [data.json~jsonpath, '$[*]']
    s: ex:$(id)
    po:
      - p: a
        o: ex:Person
      - p: ex:name
        o: $(name)
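The Python sketch below builds the same mapping as plain data for one class read from a JSON array, which makes the shape of the output concrete. The function name `sketch_json_mapping` is hypothetical, and this is not the generator's actual implementation. It assumes the `ex` prefix, the flat-array iterator `$[*]`, and the fallback subject pattern described under Limitations.

```python
from typing import Dict


def sketch_json_mapping(class_name: str, attributes: Dict[str, dict],
                        prefix: str = "ex", source: str = "data.json") -> dict:
    """Build a YARRRML-like mapping for one class read from a JSON array.

    Hypothetical helper for illustration only; it mirrors the example output,
    not the real YarrrmlGenerator code.
    """
    # The slot flagged identifier: true provides the subject IRI.
    id_slot = next((name for name, spec in attributes.items()
                    if spec.get("identifier")), None)
    if id_slot is not None:
        subject = f"{prefix}:$({id_slot})"
    else:
        # Fallback subject pattern taken from the Limitations section.
        subject = f"{prefix}:{class_name}/$(subject_id)"

    # rdf:type first, then one predicate-object pair per non-identifier slot.
    po = [{"p": "a", "o": f"{prefix}:{class_name}"}]
    for name in attributes:
        if name != id_slot:
            po.append({"p": f"{prefix}:{name}", "o": f"$({name})"})

    return {class_name: {
        "sources": [[f"{source}~jsonpath", "$[*]"]],  # list of lists
        "s": subject,
        "po": po,
    }}


mapping = sketch_json_mapping("Person", {"id": {"identifier": True}, "name": {}})
print(mapping)
```

The identifier slot is skipped in `po` because, per the Overview, identifier slots only get a predicate-object mapping when they have an explicit slot_uri.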

CSV / TSV sources

Besides JSON, CSV/TSV is supported. The key differences:

  • No iterator is required for CSV/TSV: each row is processed as one record.

  • sources must be expressed as a list of lists for compatibility with common engines (e.g. Morph-KGC):

    mappings:
      Person:
        sources:
          - ['people.csv~csv']   # note the inner list
        s: ex:$(id)
        po:
          - p: a
            o: ex:Person
          - p: ex:name
            o: $(name)
    
  • Values come directly from columns via $(column_name).

  • For object slots (non-inlined references), IRIs are emitted:

    - p: ex:employer
      o:
        value: $(employer)
        type: iri
    
  • Multivalued object slots are emitted as lists of IRIs:

    - p: ex:friends
      o:
        - value: $(friends[*])
          type: iri
    
  • TSV uses the same formulation (~csv). Most engines auto-detect the tab separator for .tsv files. If an engine requires explicit delimiter or CSVW options, the generator does not emit them (currently out of scope); add them to the generated mappings by hand.
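The CSV rules above can be summarised in a small Python sketch that returns the `sources` value and individual `po` entries as plain data. The helper names (`csv_sources`, `csv_po`) and the `is_object` / `multivalued` flags are illustrative assumptions, not part of the generator's API.

```python
def csv_sources(path: str) -> list:
    """CSV/TSV sources: a list of lists with no iterator element."""
    return [[f"{path}~csv"]]


def csv_po(predicate: str, column: str, *, is_object: bool = False,
           multivalued: bool = False) -> dict:
    """Return one YARRRML predicate-object entry for a CSV column (sketch)."""
    if not is_object:
        # Literal value read straight from the column via $(column).
        return {"p": predicate, "o": f"$({column})"}
    if multivalued:
        # Multivalued object slot: a list of IRI-typed objects.
        return {"p": predicate,
                "o": [{"value": f"$({column}[*])", "type": "iri"}]}
    # Single-valued, non-inlined object slot: the column value is an IRI.
    return {"p": predicate, "o": {"value": f"$({column})", "type": "iri"}}


print(csv_sources("people.csv"))
print(csv_po("ex:employer", "employer", is_object=True))
```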

Source inference

If a file path is passed without a formulation suffix, the generator infers it automatically:

  • *.json → ~jsonpath

  • *.csv / *.tsv → ~csv
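A minimal Python sketch of these inference rules, using a hypothetical `infer_source` helper. Two behaviours here are assumptions not stated above: an existing `~jsonpath` or `~csv` suffix is kept unchanged, and any other file extension raises an error.

```python
from pathlib import PurePath

# Extension -> formulation, per the inference rules listed above.
_FORMULATIONS = {".json": "jsonpath", ".csv": "csv", ".tsv": "csv"}


def infer_source(path: str) -> str:
    """Append a formulation suffix to `path` if it does not already have one."""
    _, sep, formulation = path.rpartition("~")
    if sep and formulation in set(_FORMULATIONS.values()):
        return path  # explicit formulation such as data.json~jsonpath
    suffix = PurePath(path).suffix.lower()
    if suffix not in _FORMULATIONS:
        raise ValueError(f"cannot infer a formulation for {path!r}")
    return f"{path}~{_FORMULATIONS[suffix]}"


print(infer_source("people.tsv"))
```

Using `rpartition` rather than a plain `"~" in path` check keeps paths such as `~/data.json` from being mistaken for an explicit formulation.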

Examples:

# JSON (iterator required)
linkml generate yarrrml schema.yaml > mappings.yml
linkml generate yarrrml schema.yaml --source data.json~jsonpath
linkml generate yarrrml schema.yaml --source data.json~jsonpath --iterator-template "$.{Class}[*]"

# CSV / TSV (no iterator)
linkml generate yarrrml schema.yaml --source people.csv
linkml generate yarrrml schema.yaml --source people.tsv~csv

# CLI alias (short form)
gen-yarrrml schema.yaml --source data.csv~csv > mappings.yml
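The `--iterator-template` value is a JSONPath string whose `{Class}` placeholder stands for a class name, so `$.{Class}[*]` suits data shaped like `{"Person": [...], "Organization": [...]}`. The Python sketch below shows that substitution and the resulting `sources` entry. The helper names are hypothetical, and plain string replacement per class is an assumption about how the placeholder is filled.

```python
def expand_iterator(template: str, class_name: str) -> str:
    """Replace the {Class} placeholder in a JSONPath iterator template."""
    return template.replace("{Class}", class_name)


def json_sources(path: str, template: str, class_name: str) -> list:
    """JSON sources: a list of lists holding [path~jsonpath, iterator]."""
    return [[f"{path}~jsonpath", expand_iterator(template, class_name)]]


for cls in ("Person", "Organization"):
    print(cls, json_sources("data.json", "$.{Class}[*]", cls))
```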

Overview

  • one mapping per LinkML class

  • prefixes are taken from the schema, and a prefixes: section is always emitted

  • if the schema has no default prefix, the generator automatically adds:

    ex: https://example.org/default#
    

    ensuring that all CURIEs can expand correctly

  • when class_uri or slot_uri is defined, it is used verbatim for the rdf:type object or the predicate IRI (including full IRIs)

  • subject built from the identifier slot (else the key slot; else a fallback subject, see Limitations)

  • po for all class attributes (slot aliases respected)

  • emits the a (rdf:type) predicate with CURIE or full-IRI objects, depending on what is available, and automatically adds the class's mixins to the same list of types

  • emits predicate-object mappings for identifier slots if they have an explicit slot_uri

  • JSON iterators are determined by the schema structure:

      - if a tree_root class exists, the default iterator is the root object: sources: [[data.json~jsonpath, '$']]

      - if no tree_root is defined (flat arrays), the default iterator covers all items: sources: [[data.json~jsonpath, '$[*]']]

  • preserves explicit XSD datatypes for slots (e.g., datatype: xsd:integer)

  • CSV/TSV: sources: [[path~csv]] (no iterator), values via $(column)

  • a top-level mappings: section is always included, even for minimal schemas
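Three of the rules above (the default ex prefix, subject selection, and the aggregated a list with mixins) are sketched below in Python. All function names are hypothetical. Keeping an existing ex: entry, and emitting a scalar o when there are no mixins (as in the example output) but a list otherwise, are assumptions about details the text does not spell out.

```python
from typing import Dict, List, Optional

DEFAULT_EX = "https://example.org/default#"


def with_default_prefix(prefixes: Dict[str, str],
                        default_prefix: Optional[str]) -> Dict[str, str]:
    """Add ex: when the schema declares no default prefix."""
    out = dict(prefixes)
    if default_prefix is None:
        out.setdefault("ex", DEFAULT_EX)  # keep a user-defined ex: if present
    return out


def choose_subject(class_name: str, slots: Dict[str, dict],
                   prefix: str = "ex") -> str:
    """Identifier slot first, then key slot, then the fallback pattern."""
    for flag in ("identifier", "key"):
        for name, spec in slots.items():
            if spec.get(flag):
                return f"{prefix}:$({name})"
    return f"{prefix}:{class_name}/$(subject_id)"


def type_po(class_uri: str, mixin_uris: List[str]) -> dict:
    """rdf:type entry: the class URI, plus any mixins as a list."""
    objects = [class_uri, *mixin_uris]
    return {"p": "a", "o": objects if mixin_uris else class_uri}


print(choose_subject("Tag", {"label": {}, "code": {"key": True}}))
print(type_po("ex:Person", ["ex:HasAliases"]))
```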

Command Line

linkml generate yarrrml path/to/schema.yaml > mappings.yml
# CSV instead of JSON:
linkml generate yarrrml path/to/schema.yaml --source data.csv~csv
# class-based JSON arrays:
linkml generate yarrrml path/to/schema.yaml --iterator-template "$.{Class}[*]"
# or short alias:
gen-yarrrml path/to/schema.yaml --source data.csv~csv > mappings.yml
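When the generator is driven from a Python script, passing an argument list to subprocess.run avoids shell quoting problems with $ and {Class} in iterator templates. A minimal sketch, assuming gen-yarrrml is on PATH; `build_args` and `generate` are hypothetical helpers, and only the --source and --iterator-template options documented on this page are used.

```python
import shlex
import subprocess
from typing import List, Optional


def build_args(schema: str, source: Optional[str] = None,
               iterator_template: Optional[str] = None) -> List[str]:
    """Build a gen-yarrrml argument list: options first, then YAMLFILE."""
    args = ["gen-yarrrml"]
    if source is not None:
        args += ["--source", source]
    if iterator_template is not None:
        args += ["--iterator-template", iterator_template]
    args.append(schema)
    return args


def generate(schema: str, out_path: str, **options: Optional[str]) -> None:
    """Run gen-yarrrml and write its stdout (the YARRRML) to out_path."""
    result = subprocess.run(build_args(schema, **options),
                            check=True, capture_output=True, text=True)
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(result.stdout)


# Shell-equivalent form of the command, with the template safely quoted.
print(shlex.join(build_args("schema.yaml", iterator_template="$.{Class}[*]")))
```

For example, generate("schema.yaml", "mappings.yml", source="data.csv~csv") corresponds to the CSV command above.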

Docs

CLI

gen-yarrrml

Generate YARRRML mappings from a LinkML schema.

gen-yarrrml [OPTIONS] YAMLFILE

Options

--source <source>

YARRRML source shorthand, e.g. data.json~jsonpath or data.csv~csv (TSV works too)

--iterator-template <iterator_template>

JSONPath iterator template; supports the {Class} placeholder. Default: "$[*]"

-V, --version

Show the version and exit.

-f, --format <format>

Output format. Default: 'yml'. Options: yml | yaml

--metadata, --no-metadata

Include metadata in output. Default: True

--useuris, --metauris

Use class and slot URIs over model URIs. Default: True

-im, --importmap <importmap>

Import mapping file

--log_level <log_level>

Logging level. Default: 'WARNING'. Options: CRITICAL | ERROR | WARNING | INFO | DEBUG

-v, --verbose

Verbosity. Takes precedence over --log_level.

--mergeimports, --no-mergeimports

Merge imports into source file (default=mergeimports)

--stacktrace, --no-stacktrace

Print a stack trace when an error occurs. Default: False

Arguments

YAMLFILE

Required argument

Code

class linkml.generators.yarrrmlgen.YarrrmlGenerator(schema: str | TextIO | SchemaDefinition, format: str = 'yml', **kwargs)

serialize(**args) → str

Generate output in the required format

Parameters:

kwargs – Generator specific parameters

Returns:

Generated output

Limitations

  • JSON-first by default

  • One source per mapping

  • Classes without an identifier are assigned a fallback subject: ex:<Class>/$(subject_id)

  • Object slots:

      - inlined: false → emitted as IRIs

      - inlined: true → emitted as a reference to a separate YARRRML mapping, joined to the parent through a condition (join)

  • Inlined objects without an identifier are assigned a synthetic IRI derived from the parent’s ID (e.g. ex:Child_$(parent_id)). This keeps the graph connected and avoids broken or orphaned triples during lifting, because some YARRRML implementations fail to execute joins (condition: equal) on blank nodes.

  • However, multivalued inlined objects (lists) still strictly require an identifier.

  • An inline class without an identifier can only be used in a single owning class.

  • Iterators not derived from JSON Schema

  • No per-slot JSONPath/CSV expressions or functions

  • CSV/TSV supported via --source; delimiter/custom CSVW options are not yet exposed

  • The deep-scan iterator ($..slot_name) used for inlined objects matches every property with that exact name at any depth, including properties of the same name nested inside another match or inside unrelated objects. This can cause mapping collisions.

  • Object slots that are not explicitly inlined (inlined: false or by default) require their target class to have an identifier (identifier: true). If you attempt to link to a class without an ID via standard IRI reference, the generator will raise an error.
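The deep-scan collision can be reproduced without any RDF tooling. The Python sketch below imitates JSONPath recursive descent ($..address) with a hypothetical `deep_scan` helper. It is a simplified stand-in, not the JSONPath engine a YARRRML processor would actually use. When a person's employer also has an address, both addresses match, so a mapping meant only for the person's address picks up the employer's as well.

```python
from typing import Any, Iterator


def deep_scan(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` at any depth (like $..key)."""
    if isinstance(node, dict):
        for k, value in node.items():
            if k == key:
                yield value
            yield from deep_scan(value, key)  # keep descending into matches
    elif isinstance(node, list):
        for item in node:
            yield from deep_scan(item, key)


data = [{
    "id": "p1",
    "address": {"street": "Main St"},
    "employer": {"id": "o1", "address": {"street": "Factory Rd"}},
}]

for match in deep_scan(data, "address"):
    print(match)
```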