LinkML-Map tutorial
This tutorial walks through basic programmatic use of the LinkML-Map framework. It is intended for Python developers; many of the operations shown here can also be performed at the command line.
import yaml
Creating an example schema
We will use a LinkML SchemaBuilder object to progressively build up a schema, adding additional features as we go.
We'll start with a simple Person schema with a few single-valued scalar slots:
from linkml.utils.schema_builder import SchemaBuilder
from linkml_runtime.linkml_model import SlotDefinition
sb = SchemaBuilder()
sb.add_class("Person", slots=[
    SlotDefinition("family_name", range="string"),
    SlotDefinition("given_name", range="string"),
    SlotDefinition("age_in_years", range="integer"),
    SlotDefinition("height_in_cm", range="float"),
])
sb.add_defaults()
print(yaml.dump(sb.as_dict(), sort_keys=False))
Creating a Transformer Session object
We will use a Session object, which conveniently wraps a number of different capabilities. The first of these is to map (transform) data objects from one schema to another (implicit) schema using a transformer specification.
Our initial transformer specification will be a trivial isomorphic one that:
- maps the Person class to an Individual class
- passes through name fields as-is
- renames measurement fields (age_in_years and height_in_cm to age and height)
from linkml_map.session import Session
session = Session()
session.set_source_schema(sb.as_dict())
# Transformer specification (in YAML)
session.set_object_transformer("""
class_derivations:
  Individual:
    populated_from: Person
    slot_derivations:
      family_name:
        populated_from: family_name
      given_name:
        populated_from: given_name
      age:
        populated_from: age_in_years
      height:
        populated_from: height_in_cm
""")
Visualizing transformer specifications
We can visualize the transformer specification using graphviz:
session.graphviz()
Transforming objects
We'll next make a simple Person object. Note that for simplicity we are specifying this using a Python dictionary. The framework also works with objects instantiating either Pydantic or dataclass classes (use the transform_object method instead of transform).
obj = {
    "given_name": "Jane",
    "family_name": "Doe",
    "age_in_years": 42,
    "height_in_cm": 180.0,
}
session.transform(obj)
This does what we expect - it renames the two fields, but leaves all values intact.
Note that because we are using dictionaries here, the renaming of the class has no effect, as the class is implicit with JSON/dictionaries.
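For comparison, the renaming performed by the specification above can be written by hand in plain Python. This is only an illustrative sketch: RENAMES and rename_fields are hypothetical names, not part of LinkML-Map, and the block shows the kind of imperative code the declarative specification is meant to replace.

```python
# Hand-written equivalent of the renaming specification (illustrative only).
# RENAMES and rename_fields are hypothetical names, not LinkML-Map API.
RENAMES = {
    # target field -> source field
    "family_name": "family_name",
    "given_name": "given_name",
    "age": "age_in_years",
    "height": "height_in_cm",
}


def rename_fields(source: dict) -> dict:
    """Copy each source value to its target field name, leaving values intact."""
    return {target: source[src] for target, src in RENAMES.items() if src in source}


person = {
    "given_name": "Jane",
    "family_name": "Doe",
    "age_in_years": 42,
    "height_in_cm": 180.0,
}
individual = rename_fields(person)
print(individual)
```

Unlike the YAML specification, this function cannot be inspected by tools to work out which source field feeds which target field without reading the code.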
TODO: docs on type designator fields
For command line users, the same thing can be achieved with the map-data command.
Deriving target schemas
LinkML-Map is intended as a declarative framework, in contrast to writing Python transformation code. This allows tools to introspect mappings and perform other kinds of inference. An example of this is deriving the (implicit) target schema.
Here we use the target_schema attribute of the session object to derive the target schema:
from linkml_runtime.dumpers import yaml_dumper
print(yaml_dumper.dumps(session.target_schema))
As expected, this is isomorphic to the original (source) schema, with fields and classes renamed.
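The idea of introspecting a mapping can be sketched with plain dictionaries. The block below is a simplified illustration, not the LinkML-Map implementation: derive_target_slots is a hypothetical helper, and the schema and specification are reduced to the slot ranges and populated_from entries used in this tutorial.

```python
# Sketch of deriving target slots from a declarative specification.
# derive_target_slots is a hypothetical helper, not the LinkML-Map implementation.
source_slots = {
    "family_name": "string",
    "given_name": "string",
    "age_in_years": "integer",
    "height_in_cm": "float",
}

slot_derivations = {
    "family_name": {"populated_from": "family_name"},
    "given_name": {"populated_from": "given_name"},
    "age": {"populated_from": "age_in_years"},
    "height": {"populated_from": "height_in_cm"},
}


def derive_target_slots(source_slots: dict, slot_derivations: dict) -> dict:
    """Give each target slot the range of the source slot it is populated from."""
    return {
        target: source_slots[derivation["populated_from"]]
        for target, derivation in slot_derivations.items()
    }


target_slots = derive_target_slots(source_slots, slot_derivations)
print(target_slots)
```

Because the mapping is data rather than code, the target ranges follow directly from the source ranges, which is what makes the derived schema isomorphic to the source in this simple case.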
Using Expressions
In addition to renaming fields, we can derive field values via evaluation of function expressions.
You are encouraged to stay within the subset of Python defined by the LinkML expression language, which provides both safety and declarativity. If you need to, however, you can include arbitrary Python code, provided you configure the session to allow this.
We'll keep the original schema, and will provide a new Transformer specification, giving an example of both string manipulation functions and arithmetic functions; the latter perform unit conversions (later on we will see more flexible and declarative ways to perform unit conversions).
session.set_object_transformer("""
class_derivations:
  Individual:
    populated_from: Person
    slot_derivations:
      name:
        expr: "{given_name} + ' ' + {family_name}"
        description: Concatenating given and family names
          note this is a bad assumption for names in general,
          this is just for demonstration
      age_in_months:
        expr: age_in_years * 12
      height_in_meters:
        expr: height_in_cm / 100
""")
Note that when we visualize this specification, dotted lines are shown indicating a relationship between source and target that is different from direct copy:
session.graphviz()
Now we'll transform the same object as before, and see the results:
session.transform(obj)
As expected, we concatenated the name fields, and converted the age and height fields to different units.
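As a sanity check, the same three expressions can be evaluated by hand in plain Python. This sketch does not call LinkML-Map; it only works through the arithmetic that the specification describes, using the example values from this tutorial.

```python
# Evaluate the three derivations by hand, without LinkML-Map.
person = {
    "given_name": "Jane",
    "family_name": "Doe",
    "age_in_years": 42,
    "height_in_cm": 180.0,
}

# name: given name and family name joined by a space
name = person["given_name"] + " " + person["family_name"]

# age_in_months: years multiplied by 12
age_in_months = person["age_in_years"] * 12

# height_in_meters: centimeters divided by 100
height_in_meters = person["height_in_cm"] / 100

print(name, age_in_months, height_in_meters)
```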
Let's take a look at the derived schema for this new transformation:
print(yaml_dumper.dumps(session.target_schema))
Note that at this time, deriving ranges using expressions is not supported, so the two measurement fields are erroneously typed as having the default_range of string. However, in principle, if you use the LinkML subset of Python it should be possible to infer the range of the derived field, and this may be added in future versions. Currently the toolchain is at an early stage of development.
Unit conversions
Next we will look at a different way of doing unit conversions. The LinkML specification allows schemas to explicitly declare the units of slots, so let's modify our schema to do this, adding a UCUM code for our height_in_cm slot:
from linkml_runtime.linkml_model.units import UnitOfMeasure
sb.schema.slots['height_in_cm'].unit = UnitOfMeasure(ucum_code='cm')
session.set_source_schema(sb.as_dict())
print(yaml.dump(sb.as_dict(), sort_keys=False))
Adding target_unit to transformer specification
We will create a new transformer specification, focusing on the height_in_cm field. We will transform this into a height_in_meters field, and will use the target_unit field to specify the target unit.
session.set_object_transformer("""
class_derivations:
  Individual:
    populated_from: Person
    slot_derivations:
      name:
        expr: "{given_name} + ' ' + {family_name}"
      height_in_meters:
        populated_from: height_in_cm
        unit_conversion:
          target_unit: m
""")
session.transform(obj)
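Conceptually, a declared source unit plus a target unit is enough to pick a conversion factor without writing an expression. The sketch below illustrates that idea with a small hand-written factor table; TO_METERS and convert_length are hypothetical names, the table covers only a few length units, and nothing here reflects how LinkML-Map performs conversions internally.

```python
# Illustrative unit conversion driven by declared units rather than an expression.
# TO_METERS and convert_length are hypothetical, not part of LinkML-Map.
TO_METERS = {
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "km": 1000.0,
}


def convert_length(value: float, source_unit: str, target_unit: str) -> float:
    """Convert a length by going through meters as the common base unit."""
    return value * TO_METERS[source_unit] / TO_METERS[target_unit]


# Source unit comes from the schema (cm), target unit from the specification (m).
height_in_meters = convert_length(180.0, "cm", "m")
print(height_in_meters)
```

The advantage over an expression such as height_in_cm / 100 is that the factor is looked up from the units, so changing the target unit does not require rewriting any arithmetic.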
Units in derived schema
Next we'll look at the derived target schema, and as expected we see that it has inferred the target unit for the height_in_meters field:
print(yaml_dumper.dumps(session.target_schema))
Tabular serialization
slot = sb.add_slot("aliases", multivalued=True, range="string", replace_if_present=True)
sb.schema.classes['Person'].slots.append(slot.name)
session.set_source_schema(sb.as_dict())
session.set_object_transformer("""
class_derivations:
  Individual:
    populated_from: Person
    slot_derivations:
      family_name:
        populated_from: family_name
      given_name:
        populated_from: given_name
      age:
        populated_from: age_in_years
      height:
        populated_from: height_in_cm
      aliases:
        populated_from: aliases
        stringification:
          delimiter: '|'
""")
obj = {
    "given_name": "Jane",
    "family_name": "Doe",
    "age_in_years": 42,
    "height_in_cm": 180.0,
    "aliases": ["Jane", "Janie", "Janey"],
}
flattened = session.transform(obj)
flattened
Because the multivalued aliases field is now a single delimited string, the result can easily be serialized to CSV/TSV.
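A minimal sketch of the serialization step, using only the standard library csv module. The flattened_row dictionary below is an assumed shape for the transformed record (aliases joined with '|'); it is written out by hand here rather than taken from the output of session.transform.

```python
import csv
import io

# Assumed shape of the flattened record: every value is a scalar,
# with the aliases list already joined into one '|'-delimited string.
flattened_row = {
    "family_name": "Doe",
    "given_name": "Jane",
    "age": 42,
    "height": 180.0,
    "aliases": "Jane|Janie|Janey",
}

# Write a header line and one data line as tab-separated values.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(flattened_row), delimiter="\t")
writer.writeheader()
writer.writerow(flattened_row)

tsv_text = buffer.getvalue()
print(tsv_text)
```

Choosing '|' as the inner delimiter keeps it distinct from the tab (or comma) used between columns, so the aliases value needs no quoting.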
Reverse transform
If a transform does not contain one-way functions, it can be reversed.
In this case, reversing the transform allows us to map from the tabular form back to the richer original representation.
session.reverse_transform(flattened)
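To see why this particular transform is reversible, note that joining a list with a delimiter can be undone by splitting on the same delimiter, as long as no alias contains the delimiter itself. The sketch below shows only that one step in plain Python; unflatten_aliases is a hypothetical helper and is not how reverse_transform is implemented.

```python
# Undo the '|'-delimited stringification of the aliases field (illustrative only).
# unflatten_aliases is a hypothetical helper, not part of LinkML-Map.
def unflatten_aliases(row: dict, delimiter: str = "|") -> dict:
    """Return a copy of the row with the aliases string split back into a list."""
    restored = dict(row)
    joined = row.get("aliases") or ""
    restored["aliases"] = joined.split(delimiter) if joined else []
    return restored


flat = {"given_name": "Jane", "family_name": "Doe", "aliases": "Jane|Janie|Janey"}
restored = unflatten_aliases(flat)
print(restored["aliases"])
```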