Extension Functions
linkml-map ships a curated set of safe built-in functions for use in
expressions. When you need a function that isn't built in,
you can register your own — without forking linkml-map or wrapping it in a
custom Python harness — by tagging plain Python functions with
@safe_function and pointing the CLI at the file.
Quick example
A user-supplied my_helpers.py:
from linkml_map.utils.extensions import safe_function

@safe_function
def normalize_taxon_id(s: str) -> str | None:
    """Normalize a taxon ID to 'NCBI:' plus 8 zero-padded digits."""
    if not s:
        return None
    raw = s.removeprefix("NCBI:").strip()
    return f"NCBI:{int(raw):08d}"
Then in a trans-spec:
# required_extensions: my_helpers.py (convention; see below)
class_derivations:
  Organism:
    populated_from: SourceOrganism
    slot_derivations:
      tax_id:
        expr: "normalize_taxon_id(taxon)"
And at the command line:
linkml-map map-data -s schema.yaml -T transform.yaml \
--functions ./my_helpers.py \
data.tsv -o out.jsonl
The flag is repeatable: pass --functions (or the short form -F) once per
extension file.
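Because a tagged helper is written as plain Python, its logic can be checked in isolation before it is wired into a transform. The sketch below repeats the function body from the example but leaves out the @safe_function decorator so that it runs without linkml-map installed; that omission and the sample inputs are assumptions made for illustration.

```python
from __future__ import annotations


def normalize_taxon_id(s: str) -> str | None:
    """Same body as the example helper, minus the decorator."""
    if not s:
        return None
    raw = s.removeprefix("NCBI:").strip()
    return f"NCBI:{int(raw):08d}"


print(normalize_taxon_id("NCBI:9606"))  # NCBI:00009606
print(normalize_taxon_id(" 9606 "))     # NCBI:00009606
print(normalize_taxon_id(""))           # None

# Non-numeric input is not handled by the example: int() raises ValueError.
try:
    normalize_taxon_id("NCBI:unknown")
except ValueError as exc:
    print(f"rejected: {exc}")
```

Whether a malformed ID should raise or map to None is a per-project decision; the example as written raises.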
The @safe_function contract
Applying @safe_function is a declaration by the author that the function
is:
- Pure: no I/O, no network calls, no global state mutation
- Bounded-time: fast and guaranteed to terminate, since it runs once per row in a transform
- Deterministic: same inputs produce same outputs
linkml-map does not verify these properties. The name "safe" reflects what
you are declaring about the function, not what linkml-map enforces. This is
the same posture as typing.final or @SafeVarargs in other ecosystems.
The trust model is identical to pip install: anything in a module you import
will run. If you're importing a third-party extension, treat it like any other
dependency.
When NOT to use extensions
Extensions are not an escape hatch for putting transformation logic in Python.
They exist for named atomic operations that read cleaner as a name than as
an expression chain — slugify(name) instead of
replace(replace(lower(strip(name)), ' ', '_'), ',', '').
If the function you're tempted to write is more than a few lines of pure data manipulation, ask first whether it belongs in the trans-spec or in the source/target schema. The declarative spec is the documentation of what the transformation does; pulling logic out into Python hides it from review.
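As a rough illustration of the kind of helper that fits this rule, the expression chain above could be wrapped as a single named function. This is a sketch only: it assumes the built-in lower, strip, and replace behave like the corresponding Python str methods, and it omits the @safe_function decorator so the block runs without linkml-map installed.

```python
def slugify(name: str) -> str:
    """Mirror replace(replace(lower(strip(name)), ' ', '_'), ',', '')."""
    return name.strip().lower().replace(" ", "_").replace(",", "")


print(slugify("  Homo sapiens, Linnaeus "))  # homo_sapiens_linnaeus
```

The function is one line of string manipulation with no branching, which is roughly the size at which a name reads better than the inline chain.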
Reserved names
A handful of names are injected per-call by the transformer (currently slot,
used inside expressions to reference a previously derived target slot). An
extension cannot define a function with one of these names — it would be
silently shadowed at evaluation time. load_extensions raises
ExtensionError on the attempt so the conflict shows up at load time
rather than as silent wrong behavior.
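The load-time check can be pictured as a set intersection between the names an extension defines and the reserved names. The sketch below is not linkml-map's implementation: RESERVED_NAMES, ReservedNameError, and check_reserved are hypothetical stand-ins, and the only reserved name assumed is slot, as stated above.

```python
from typing import Iterable

# Hypothetical stand-ins; linkml-map's own names and internals may differ.
RESERVED_NAMES = frozenset({"slot"})


class ReservedNameError(Exception):
    """Stand-in for the ExtensionError described above."""


def check_reserved(names: Iterable[str]) -> None:
    """Raise if any extension function name collides with a reserved name."""
    clashes = sorted(set(names) & RESERVED_NAMES)
    if clashes:
        raise ReservedNameError(
            f"Reserved name(s) cannot be defined by an extension: {', '.join(clashes)}"
        )


check_reserved(["slugify", "normalize_taxon_id"])  # passes silently

try:
    check_reserved(["slugify", "slot"])
except ReservedNameError as exc:
    print(exc)
```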
Override semantics
A @safe_function may shadow a built-in if you explicitly say so:
@safe_function(override=True)
def lower(s: str) -> str:
    return s.casefold()  # Unicode case folding (not locale-aware); replaces the built-in lower
- Without override=True: a collision with a built-in raises ExtensionError at load time, which protects against accidental shadowing from a typo.
- With override=True but no matching built-in: logged as a warning (still loaded), which is useful as a typo catcher for the override case.
- Collision between two extensions: always an error. Pick one.
There is no CLI flag to enable overrides. The decision lives on the function declaration, where the author is responsible for it.
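The three collision rules can be summarized as a small merge routine. The following is a sketch under stated assumptions, not linkml-map's loader: merge_extensions, CollisionError, and the (name, function, override) triple format are hypothetical, chosen only to make the rules concrete.

```python
import logging
from typing import Callable, Dict, Iterable, Tuple

logger = logging.getLogger("extension_sketch")


class CollisionError(Exception):
    """Stand-in for the ExtensionError described above."""


def merge_extensions(
    builtins: Dict[str, Callable],
    tagged: Iterable[Tuple[str, Callable, bool]],
) -> Dict[str, Callable]:
    """Merge (name, function, override) triples, applying the three rules."""
    merged: Dict[str, Callable] = {}
    for name, func, override in tagged:
        if name in merged:
            # Two extensions define the same name: always an error.
            raise CollisionError(f"'{name}' is defined by more than one extension")
        if name in builtins and not override:
            # Shadowing a built-in requires an explicit opt-in.
            raise CollisionError(f"'{name}' shadows a built-in; set override=True to allow it")
        if override and name not in builtins:
            # Probably a typo in the function name: warn but still load.
            logger.warning("'%s' sets override=True but no built-in has that name", name)
        merged[name] = func
    return merged


builtins = {"lower": str.lower}
merged = merge_extensions(
    builtins,
    [("lower", str.casefold, True), ("slugify", str.strip, False)],
)
print(sorted(merged))  # ['lower', 'slugify']
```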
List-style functions
By default, scalar functions distribute over lists and propagate None
(slugify([a, b, None]) → [slugify(a), slugify(b), None]). For functions
that legitimately accept a list as their first argument (aggregators, etc.),
opt out:
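@safe_function(distributes=False)
def median(items: list[float]) -> float:
    sorted_items = sorted(items)
    mid = len(sorted_items) // 2
    if len(sorted_items) % 2:
        return sorted_items[mid]
    # Even length: average the two middle values.
    return (sorted_items[mid - 1] + sorted_items[mid]) / 2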
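To make the default behavior concrete, here is a minimal sketch of what a scalar-distributing wrapper might look like. It is an illustration rather than linkml-map's actual wrapper: the name distribute is hypothetical, and the sketch assumes that only the first positional argument is broadcast and that a None first argument short-circuits to None.

```python
import functools
from typing import Any, Callable


def distribute(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical wrapper: map func over a list first argument, pass None through."""

    @functools.wraps(func)
    def wrapper(first: Any, *args: Any, **kwargs: Any) -> Any:
        if first is None:
            return None
        if isinstance(first, list):
            # Recurse so that None items inside the list are also propagated.
            return [wrapper(item, *args, **kwargs) for item in first]
        return func(first, *args, **kwargs)

    return wrapper


@distribute
def shout(s: str, suffix: str = "!") -> str:
    return s.upper() + suffix


print(shout("a"))               # A!
print(shout(["a", None, "b"]))  # ['A!', None, 'B!']
print(shout(None))              # None
```

An aggregator wrapped this way would receive one element at a time instead of the whole list, which is why list-style functions need to opt out.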
Required-extension convention
A trans-spec that references an extension function won't run unless the
extension file is passed via --functions. The error is clear (Unknown function 'foo'. (If this
is a custom function, pass it via --functions <path>.)), but it only
surfaces when the transform runs. Until linkml-map gains a declarative required_extensions: key, the
convention is to note the dependency in a header comment on the spec:
# required_extensions:
# - my_helpers.py
#
id: https://example.org/my-transform
class_derivations:
...
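Since the convention is only a comment, nothing in linkml-map reads it. A wrapper script could, though, and the sketch below shows one way to do that. The required_extensions helper is hypothetical and not part of linkml-map; it assumes the header sits at the very top of the spec and accepts either the list form above or the single-line form used in the quick example.

```python
from typing import List


def required_extensions(spec_text: str) -> List[str]:
    """Collect file names listed under a '# required_extensions:' header comment."""
    found: List[str] = []
    in_block = False
    for line in spec_text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("#"):
            break  # the header ends at the first non-comment line
        body = stripped.lstrip("#").strip()
        if body.startswith("required_extensions:"):
            in_block = True
            inline = body.split(":", 1)[1].split()
            if inline:
                found.append(inline[0])  # single-line form: first token only
        elif in_block and body.startswith("- "):
            found.append(body[2:].strip())
        elif body:
            in_block = False  # any other comment text closes the list
    return found


spec = """\
# required_extensions:
#   - my_helpers.py
#
id: https://example.org/my-transform
class_derivations: {}
"""
print(required_extensions(spec))  # ['my_helpers.py']
```

The returned names could then be turned into repeated --functions arguments by the calling script.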
Programmatic use
Python callers can skip the CLI and set extensions directly on the transformer:
from linkml_map.transformer.object_transformer import ObjectTransformer
from linkml_map.utils.extensions import load_extensions
ext = load_extensions(["./my_helpers.py"])
tr = ObjectTransformer(extension_functions=ext)
extension_functions accepts any dict[str, Callable], so you can also bypass
the loader entirely and hand-build the dict if you prefer (skipping the
decorator-tagging step).
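A hand-built dict might look like the sketch below. The slugify helper is hypothetical, and the ObjectTransformer call is left as a comment so that the block runs without linkml-map installed. One inference worth flagging: the API reference says load_extensions is what applies the scalar-distributing wrapper, so functions passed this way presumably receive lists and None as-is and have to handle them on their own.

```python
from typing import Callable, Dict, Optional


def slugify(s: Optional[str]) -> Optional[str]:
    """Hypothetical helper; handles None itself since no wrapper is applied here."""
    if s is None:
        return None
    return s.strip().lower().replace(" ", "_")


# Hand-built mapping of expression name to callable.
ext: Dict[str, Callable] = {"slugify": slugify}

# With linkml-map installed, the dict would be passed as shown in the text above:
#   tr = ObjectTransformer(extension_functions=ext)
print(ext["slugify"]("  Homo sapiens "))  # homo_sapiens
```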
API reference
Extension surface for downstream-supplied safe functions.
Trans-spec authors register custom functions into the eval namespace by writing
a Python module and tagging functions with safe_function. linkml-map
loads these via the -F/--functions CLI flag (repeatable) or the
extension_functions kwarg on linkml_map.transformer.object_transformer.ObjectTransformer.
The decorator is a declaration by the author that the function is pure,
bounded-time, and free of I/O. linkml-map does not verify this; the safety
boundary is the named namespace, not sandboxed execution. Same posture as
typing.final.
Example
A user-supplied my_helpers.py:
from linkml_map.utils.extensions import safe_function

@safe_function
def slugify(s, separator="_"):
    ...

@safe_function(override=True)  # explicit shadowing of a built-in
def lower(s):
    ...

@safe_function(distributes=False)  # list-style; opts out of scalar distribution
def my_aggregator(items):
    ...
Then:
linkml-map map-data ... --functions ./my_helpers.py
Semantics
- Collision between two extensions → ExtensionError.
- Collision with a built-in without override=True → ExtensionError.
- override=True declared but no matching built-in → logging.warning.
- Missing extension file → ExtensionError.
For the contract authors are declaring, see docs/api/extensions.md.
ExtensionError
Bases: Exception
Raised when loading an extension function module fails.
Source code in src/linkml_map/utils/extensions.py
load_extensions(paths)
Load tagged functions from a list of file paths into one merged dict.
Applies the scalar-distributing wrapper to functions declared with
distributes=True (the default), so they broadcast over lists and
propagate None consistently with the built-ins.
:param paths: Iterable of file paths to .py modules with tagged functions.
:returns: Mapping of name → callable ready to merge into
ObjectTransformer.extension_functions.
:raises ExtensionError: On missing file, name collision between extensions,
or attempt to shadow a built-in without override=True.
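For readers curious how loading from a file path can work, the sketch below uses the standard library's importlib to import a module by path and collect the functions that carry a marker attribute. It is an assumption-laden illustration, not the body of load_extensions: load_tagged and the _is_tagged marker are hypothetical, and collision handling and the distributing wrapper are left out.

```python
import importlib.util
import inspect
import tempfile
from pathlib import Path
from typing import Callable, Dict


def load_tagged(path: str, marker: str = "_is_tagged") -> Dict[str, Callable]:
    """Import a .py file by path and return its functions that carry the marker."""
    file = Path(path)
    if not file.is_file():
        raise FileNotFoundError(f"Extension file not found: {file}")
    spec = importlib.util.spec_from_file_location(file.stem, file)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot import {file}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the module's top-level code
    return {
        name: obj
        for name, obj in inspect.getmembers(module, inspect.isfunction)
        if getattr(obj, marker, False)
    }


# Demo: write a throwaway module with one tagged and one untagged function.
source = (
    "def tagged(x):\n"
    "    return x\n"
    "tagged._is_tagged = True\n"
    "\n"
    "def untagged(x):\n"
    "    return x\n"
)
with tempfile.TemporaryDirectory() as tmp:
    helper_path = Path(tmp) / "demo_helpers.py"
    helper_path.write_text(source)
    loaded = load_tagged(str(helper_path))

print(sorted(loaded))  # ['tagged']
```

Importing by path executes the module's top-level code, which is the practical reason the trust model matches that of any other imported dependency.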
Source code in src/linkml_map/utils/extensions.py
safe_function(func=None, *, override=False, distributes=True)
Tag a function for inclusion in the safe-function namespace.
Applying this decorator is a declaration by the author that the function is pure, bounded-time, and free of I/O. linkml-map does not verify these properties.
Usable bare or with kwargs:
@safe_function
def slugify(s): ...
@safe_function(override=True)
def lower(s): ...
:param override: Allow shadowing a built-in of the same name. Without this,
a collision with a built-in raises ExtensionError at load time.
:param distributes: Apply the scalar-distributing wrapper (broadcasts over
lists, propagates None). Default True; set False for
functions that accept a list as their first argument.
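The signature safe_function(func=None, *, override=False, distributes=True) follows a common Python pattern for decorators that work both bare and with keyword arguments. The sketch below shows that pattern in isolation; tag_safe and the _safe_meta attribute are hypothetical names, and how linkml-map actually records the tag is not specified here.

```python
from typing import Any, Callable, Optional

_MARKER = "_safe_meta"  # hypothetical attribute name


def tag_safe(
    func: Optional[Callable[..., Any]] = None,
    *,
    override: bool = False,
    distributes: bool = True,
):
    """Hypothetical decorator usable as @tag_safe or @tag_safe(...)."""

    def mark(f: Callable[..., Any]) -> Callable[..., Any]:
        # Only attaches metadata; nothing about the function is verified.
        setattr(f, _MARKER, {"override": override, "distributes": distributes})
        return f

    # Bare use passes the function directly; keyword use returns the marker.
    return mark(func) if func is not None else mark


@tag_safe
def slugify(s):
    return s.strip().lower()


@tag_safe(override=True)
def lower(s):
    return s.casefold()


print(slugify._safe_meta)  # {'override': False, 'distributes': True}
print(lower._safe_meta)    # {'override': True, 'distributes': True}
```

Making the options keyword-only keeps the bare form unambiguous: the single positional argument is always the decorated function.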
Source code in src/linkml_map/utils/extensions.py