
Translations of LinkML to Datalog

LinkML is primarily a data modeling language in the vein of JSON-Schema, UML, or a shape language like SHACL. The core is deliberately simple and does not have complex semantics.

LinkML is also designed to be flexible, and there are extensions to the language that allow for more expressivity.

Inheritance

See: inheritance

is_a and mixin slots are used in inference of categories.

E.g. given:

classes:
  Person:
    is_a: NamedThing

the following datalog is exported:

NamedThing(i) :- Person(i)

This means that someone querying the data for instances of NamedThing would also get instances of Person.

Of course, a transitive hierarchy can be specified.
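
As a rough illustration of what the is_a rules entail over a multi-level hierarchy, the following Python sketch walks the is_a chain upward from an asserted class. The Employee class and the IS_A table are hypothetical additions for the example, and mixins are left out for brevity; this is not how the datalog engine itself evaluates the rules.

```python
# Hypothetical is_a edges (child class -> parent class).
# "Employee" is invented for this example; only Person/NamedThing come from the text.
IS_A = {"Person": "NamedThing", "Employee": "Person"}


def inferred_classes(asserted: str) -> set[str]:
    """Return the asserted class plus every class reachable through is_a."""
    result = {asserted}
    current = asserted
    # Follow parent links until the root is reached (or a cycle is detected).
    while current in IS_A and IS_A[current] not in result:
        current = IS_A[current]
        result.add(current)
    return result


print(sorted(inferred_classes("Employee")))
```

An instance asserted as Employee is therefore also returned by queries for Person and for NamedThing.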

Ranges

See: ranges

Ranges are translated into validation checks.

E.g. given:

classes:
  Person:
    attributes:
      sibling_of: Person

We get:

validation_result('sh:ClassConstraintComponent', ...) :-
  Person(i), sibling_of(i,j), ! Person(j).
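
The body of this rule can be read as a simple filter over facts. The Python sketch below mirrors it under the assumption that Person is a set of identifiers and sibling_of is a set of pairs; the function name and the sample identifiers are illustrative only.

```python
def range_violations(
    person: set[str], sibling_of: set[tuple[str, str]]
) -> list[tuple[str, str]]:
    """Pairs (i, j) where Person(i) and sibling_of(i, j) hold but Person(j) does not."""
    return sorted((i, j) for (i, j) in sibling_of if i in person and j not in person)


# Hypothetical facts: X:999 is not declared as a Person.
person = {"P:001", "P:002"}
sibling_of = {("P:001", "P:002"), ("P:002", "X:999")}
print(range_violations(person, sibling_of))
```

Each pair returned corresponds to one validation_result fact that the rule would derive.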

Inverses

Slots can be declared as inverses:

sibling_of:
    inverse: sibling_of

This will generate

sibling_of(i,j) :- sibling_of(j,i).

Transitive closures

LinkML 1.2 will introduce transitive_form_of, to declare that one slot (e.g. ancestor_of) is the transitive form of another slot (e.g. parent_of).

For now, you can get the same semantics from an annotation:

ancestor_of:
  annotations:
    transitive_closure_of: parent_of

This will generate

ancestor_of(i,j) :- parent_of(i,j).
ancestor_of(i,j) :- parent_of(i,z), ancestor_of(z,j).
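
To make the intended semantics concrete (ancestor_of as the transitive closure of parent_of), here is a minimal Python sketch of a naive fixpoint computation. It assumes parent_of(i, j) reads "i is a parent of j"; the function name and sample facts are illustrative, and a real datalog engine would evaluate this far more efficiently.

```python
def transitive_closure(parent_of: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Naive fixpoint: keep applying the recursive rule until nothing new is derived."""
    ancestor_of = set(parent_of)  # base case: every parent is an ancestor
    while True:
        # Recursive case: parent_of(i, z) and ancestor_of(z, j) give ancestor_of(i, j).
        derived = {
            (i, j)
            for (i, z) in parent_of
            for (z2, j) in ancestor_of
            if z == z2
        }
        new = derived - ancestor_of
        if not new:
            return ancestor_of
        ancestor_of |= new


# Hypothetical chain: a is a parent of b, b of c, c of d.
closure = transitive_closure({("a", "b"), ("b", "c"), ("c", "d")})
print(sorted(closure))
```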

Association classes

Compilation to datalog will also handle association classes (e.g. reified statements). This is very useful when we want to model associations such as familial relationships, or events such as marriage, as first-class entities, while still having the convenience of a direct link.

Given:

classes:
  Relationship:
    class_uri: rdf:Statement  ## REIFICATION
    slots:
      - started_at_time
      - ended_at_time
      - related_to
      - type
    slot_usage:
      related_to:
        slot_uri: rdf:object
      type:
        slot_uri: rdf:predicate

  FamilialRelationship:
    is_a: Relationship
    slot_usage:
      type:
        range: FamilialRelationshipType
        required: true
      related_to:
        range: Person
        required: true

slots:
  sibling_of:
    inverse: sibling_of
    slot_uri: famrel:01

enums:
  FamilialRelationshipType:
    permissible_values:
      SIBLING_OF:
        meaning: famrel:01
      PARENT_OF:
        meaning: famrel:02
      CHILD_OF:
        meaning: famrel:03

This will assert a de-reified triple:

triple(i, p, v) :-
        triple(i, _container_prop, r),
        related_to(r, v),
        type(r, p).

Such that if you have the instance data:

id: P:002
has_familial_relationships:
  - related_to: P:001
    type: SIBLING_OF

There will be an inferred fact:

sibling_of(P:002, P:001)
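
One way to picture this inference is as a lookup from the enum value's meaning to the slot with the same slot_uri. The Python sketch below follows that reading; the two lookup tables are transcribed from the schema above, while the dereify function and the tuple layout are assumptions made for illustration rather than the compiler's actual output.

```python
# Enum permissible values mapped to their "meaning" URIs (from the schema above).
ENUM_MEANING = {
    "SIBLING_OF": "famrel:01",
    "PARENT_OF": "famrel:02",
    "CHILD_OF": "famrel:03",
}
# Slots keyed by slot_uri; the schema above only declares sibling_of.
SLOT_BY_URI = {"famrel:01": "sibling_of"}


def dereify(person: dict) -> set[tuple[str, str, str]]:
    """Turn reified relationships into direct (slot, subject, object) facts."""
    facts = set()
    for rel in person.get("has_familial_relationships", []):
        slot = SLOT_BY_URI.get(ENUM_MEANING.get(rel["type"]))
        if slot is not None:  # skip types with no matching slot
            facts.add((slot, person["id"], rel["related_to"]))
    return facts


instance = {
    "id": "P:002",
    "has_familial_relationships": [{"related_to": "P:001", "type": "SIBLING_OF"}],
}
print(dereify(instance))
```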

Slot logical characteristics

Additional characteristics can be specified as annotations.

Supported:

  • transitive
  • reflexive
  • transitive_closure_of

Example:

  ancestor_of:
    annotations:
      transitive_closure_of: parent_of

These will be added as bona fide metamodel slots in LinkML 1.2.

A special annotation is classified_from, which can be used to auto-classify a slot using an enum, based on the value of another slot:

slots:
  age_category:
    range: AgeCategory
    annotations:
      classified_from: age_in_years

enums:
  AgeCategory:
    permissible_values:
      adult:
        meaning: HsapDv:0000087
        annotations:
          expr: v >= 19
      infant:
        meaning: HsapDv:0000083
        annotations:
          expr: v >= 0, v <= 2
      adolescent:
        meaning: HsapDv:0000086
        annotations:
          expr: v >= 13, v <= 18

This can be used to auto-assign enums based on numeric values for age.

In LinkML 1.2 this will be done via classification rules.

Other metamodel translations

"Informative" parts of the model that are intended for humans are not translated to datalog, as they have no logical entailments.

There are, however, other constructs coming in LinkML 1.2:

  • slot relational characteristics: transitivity, symmetry, ...
  • rich expression language
  • conditional rules
  • classification rules

These will all have direct translations to datalog. For now, it is necessary to encode datalog rules for these manually.