Extension Functions

linkml-map ships a curated set of safe built-in functions for use in expressions. When you need a function that isn't built in, you can register your own — without forking linkml-map or wrapping it in a custom Python harness — by tagging plain Python functions with @safe_function and pointing the CLI at the file.

Quick example

A user-supplied my_helpers.py:

from linkml_map.utils.extensions import safe_function


@safe_function
def normalize_taxon_id(s: str) -> str | None:
    """Normalize a taxon ID to 'NCBI:' followed by an 8-digit, zero-padded number."""
    if not s:
        return None
    raw = s.removeprefix("NCBI:").strip()
    return f"NCBI:{int(raw):08d}"

Then in a trans-spec:

# required_extensions: my_helpers.py  (convention; see below)
class_derivations:
  Organism:
    populated_from: SourceOrganism
    slot_derivations:
      tax_id:
        expr: "normalize_taxon_id(taxon)"

And at the command line:

linkml-map map-data -s schema.yaml -T transform.yaml \
  --functions ./my_helpers.py \
  data.tsv -o out.jsonl

The flag is repeatable: pass --functions (or the short form -F) once per extension file.

The @safe_function contract

Applying @safe_function is a declaration by the author that the function is:

  • pure
  • bounded-time
  • free of I/O

linkml-map does not verify these properties. The name "safe" reflects what you are declaring about the function, not what linkml-map enforces. This is the same posture as typing.final or @SafeVarargs in other ecosystems.

The trust model is identical to pip install: anything in a module you import will run. If you're importing a third-party extension, treat it like any other dependency.

When NOT to use extensions

Extensions are not an escape hatch for putting transformation logic in Python. They exist for named atomic operations that read cleaner as a name than as an expression chain — slugify(name) instead of replace(replace(lower(strip(name)), ' ', '_'), ',', '').

If the function you're tempted to write is more than a few lines of pure data manipulation, ask first whether it belongs in the trans-spec or in the source/target schema. The declarative spec is the documentation of what the transformation does; pulling logic out into Python hides it from review.
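
For a sense of scale, the slugify mentioned above could be as small as the sketch below. It mirrors the expression chain step by step, assuming the built-ins behave like the corresponding str methods. The decorator is left as a comment so the snippet runs without linkml-map installed, and the sample input is made up.

```python
# In an extension file this function would be tagged with @safe_function
# (imported from linkml_map.utils.extensions).
def slugify(name: str) -> str:
    """Equivalent of replace(replace(lower(strip(name)), ' ', '_'), ',', '')."""
    return name.strip().lower().replace(" ", "_").replace(",", "")


print(slugify("  Homo sapiens, male "))  # homo_sapiens_male
```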

Reserved names

A handful of names are injected per-call by the transformer (currently slot, used inside expressions to reference a previously derived target slot). An extension cannot define a function with one of these names — it would be silently shadowed at evaluation time. load_extensions raises ExtensionError on the attempt so the conflict shows up at load time rather than as silent wrong behavior.

Override semantics

A @safe_function may shadow a built-in if you explicitly say so:

@safe_function(override=True)
def lower(s: str) -> str:
    return s.casefold()  # Unicode case folding (more aggressive than str.lower); replaces the built-in lower

There is no CLI flag to enable overrides. The decision lives on the function declaration, where the author is responsible for it.
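
The practical difference between str.lower and str.casefold shows up on characters whose case-folded form differs from their lowercase form. A short illustration using only Python's str methods (not linkml-map's built-in lower, whose exact behaviour is not shown on this page):

```python
word = "Straße"

print(word.lower())     # straße
print(word.casefold())  # strasse

# casefold() makes caseless comparison work where lower() does not.
print(word.lower() == "STRASSE".lower())        # False
print(word.casefold() == "STRASSE".casefold())  # True
```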

List-style functions

By default, scalar functions distribute over lists and propagate None: slugify([a, b, None]) → [slugify(a), slugify(b), None]. For functions that legitimately accept a list as their first argument (aggregators, etc.), opt out:

@safe_function(distributes=False)
def median(items: list[float]) -> float:
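    sorted_items = sorted(items)
    mid = len(sorted_items) // 2
    if len(sorted_items) % 2:
        return sorted_items[mid]
    # Even length: average the two middle values.
    return (sorted_items[mid - 1] + sorted_items[mid]) / 2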
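
To make the default concrete, here is a minimal sketch of what a scalar-distributing wrapper could look like. It is not linkml-map's implementation (the private _distributing helper is not shown on this page); the name distribute_over_first_arg and the choice to handle only list inputs are assumptions for illustration.

```python
from functools import wraps
from typing import Any, Callable


def distribute_over_first_arg(fn: Callable) -> Callable:
    """Hypothetical stand-in: broadcast over a list first argument, propagate None."""

    @wraps(fn)
    def wrapper(first: Any, *args: Any, **kwargs: Any) -> Any:
        if first is None:
            return None  # None in, None out; fn is never called
        if isinstance(first, list):
            # Apply element-wise, reusing the None rule for each item.
            return [wrapper(item, *args, **kwargs) for item in first]
        return fn(first, *args, **kwargs)

    return wrapper


def shout(s: str) -> str:
    return s.upper()


shout_all = distribute_over_first_arg(shout)
print(shout_all("a"))               # A
print(shout_all(["a", "b", None]))  # ['A', 'B', None]
print(shout_all(None))              # None
```

An aggregator such as median needs to see the whole list at once, which is why it opts out with distributes=False.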

Required-extension convention

A trans-spec that references an extension function won't run without --functions. The runtime error is clear (Unknown function 'foo'. (If this is a custom function, pass it via --functions <path>.)), but it's still runtime. Until linkml-map gains a declarative required_extensions: key, the convention is to note the dependency in a header comment on the spec:

# required_extensions:
#   - my_helpers.py
#
id: https://example.org/my-transform
class_derivations:
  ...
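
Because the header is only a comment, nothing reads it automatically. A team that wants an early check could parse it in a small pre-flight script. The sketch below is hypothetical tooling, not part of linkml-map: the function name required_extensions and the assumption that the header uses the list-style comment form shown above are both illustrative.

```python
def required_extensions(spec_text: str) -> list[str]:
    """Hypothetical helper: read '# required_extensions:' entries from a spec header."""
    names: list[str] = []
    in_block = False
    for line in spec_text.splitlines():
        if not line.startswith("#"):
            break  # the header comment ends at the first non-comment line
        body = line[1:].strip()
        if body == "required_extensions:":
            in_block = True
        elif in_block and body.startswith("- "):
            names.append(body[2:].strip())
        elif in_block and body:
            in_block = False  # some other comment text closes the list
    return names


spec = """\
# required_extensions:
#   - my_helpers.py
#
id: https://example.org/my-transform
"""
print(required_extensions(spec))  # ['my_helpers.py']
```

A wrapper script could compare this list against the files it passes to --functions and fail before any data is read.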

Programmatic use

Python callers can skip the CLI and set extensions directly on the transformer:

from linkml_map.transformer.object_transformer import ObjectTransformer
from linkml_map.utils.extensions import load_extensions

ext = load_extensions(["./my_helpers.py"])
tr = ObjectTransformer(extension_functions=ext)

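extension_functions accepts any dict[str, Callable], so you can also bypass the loader entirely and hand-build the dict, skipping the decorator-tagging step. Doing so also skips what load_extensions does on your behalf: the reserved-name and collision checks, and the scalar-distributing wrapper.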

API reference

Extension surface for downstream-supplied safe functions.

Trans-spec authors register custom functions into the eval namespace by writing a Python module and tagging functions with safe_function. linkml-map loads these via the -F/--functions CLI flag (repeatable) or the extension_functions kwarg on linkml_map.transformer.object_transformer.ObjectTransformer.

The decorator is a declaration by the author that the function is pure, bounded-time, and free of I/O. linkml-map does not verify this: the safety boundary is the named namespace, not sandboxed execution. This is the same posture as typing.final.

Example

A user-supplied my_helpers.py:

from linkml_map.utils.extensions import safe_function

@safe_function
def slugify(s, separator="_"):
    ...

@safe_function(override=True)  # explicit shadowing of a built-in
def lower(s):
    ...

@safe_function(distributes=False)  # list-style; opts out of scalar distribution
def my_aggregator(items):
    ...

Then:

linkml-map map-data ... --functions ./my_helpers.py

Semantics

  • Collision between two extensions → ExtensionError.
  • Collision with a built-in without override=True → ExtensionError.
  • override=True declared but no matching built-in → logging.warning.
  • Missing extension file → ExtensionError.

For the contract authors are declaring, see docs/api/extensions.md.

ExtensionError

Bases: Exception

Raised when loading an extension function module fails.

Source code in src/linkml_map/utils/extensions.py
class ExtensionError(Exception):
    """Raised when loading an extension function module fails."""

load_extensions(paths)

Load tagged functions from a list of file paths into one merged dict.

Applies the scalar-distributing wrapper to functions declared with distributes=True (the default), so they broadcast over lists and propagate None consistently with the built-ins.

:param paths: Iterable of file paths to .py modules with tagged functions.
:returns: Mapping of name → callable ready to merge into ObjectTransformer.extension_functions.
:raises ExtensionError: On missing file, name collision between extensions, or attempt to shadow a built-in without override=True.

Source code in src/linkml_map/utils/extensions.py
def load_extensions(paths: Iterable[str | Path]) -> dict[str, Callable]:
    """Load tagged functions from a list of file paths into one merged dict.

    Applies the scalar-distributing wrapper to functions declared with
    ``distributes=True`` (the default), so they broadcast over lists and
    propagate ``None`` consistently with the built-ins.

    :param paths: Iterable of file paths to ``.py`` modules with tagged functions.
    :returns: Mapping of ``name → callable`` ready to merge into
        ``ObjectTransformer.extension_functions``.
    :raises ExtensionError: On missing file, name collision between extensions,
        or attempt to shadow a built-in without ``override=True``.
    """
    merged: dict[str, Callable] = {}
    sources: dict[str, Path] = {}

    for raw_path in paths:
        path = Path(raw_path).resolve()
        module = _load_module_from_path(path)
        tagged = _collect_tagged_functions(module)

        for name, meta in tagged.items():
            if name in _RESERVED_NAMES:
                msg = (
                    f"Extension name {name!r} from {path} is reserved — the "
                    f"transformer injects it per-call, so it would silently "
                    f"shadow your extension. Pick a different name."
                )
                raise ExtensionError(msg)
            if name in merged:
                msg = f"Extension name collision: {name!r} defined in both {sources[name]} and {path}"
                raise ExtensionError(msg)
            if name in FUNCTIONS and not meta["override"]:
                msg = (
                    f"Extension {name!r} from {path} shadows a built-in of the "
                    f"same name. Declare ``@safe_function(override=True)`` if intentional."
                )
                raise ExtensionError(msg)
            if meta["override"] and name not in FUNCTIONS:
                logger.warning(
                    "Extension %r from %s declared override=True but no built-in %r exists",
                    name,
                    path,
                    name,
                )

            fn = meta["func"]
            if meta["distributes"]:
                fn = _distributing(fn)
            merged[name] = fn
            sources[name] = path

    return merged
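
load_extensions relies on a private helper, _load_module_from_path, whose source is not reproduced here. A common way to import a module from an arbitrary file path uses importlib; the sketch below shows that standard-library technique under the assumption that the real helper does something similar. The function name load_module_from_path, the generated module name, and the use of FileNotFoundError (rather than ExtensionError) are illustrative.

```python
import importlib.util
import tempfile
from pathlib import Path
from types import ModuleType


def load_module_from_path(path: Path) -> ModuleType:
    """Hypothetical stand-in: import a .py file by path without touching sys.path."""
    if not path.is_file():
        raise FileNotFoundError(f"Extension file not found: {path}")
    spec = importlib.util.spec_from_file_location(f"_ext_{path.stem}", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot build an import spec for {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


# Demonstrate with a throwaway extension file.
with tempfile.TemporaryDirectory() as tmp:
    ext_path = Path(tmp) / "my_helpers.py"
    ext_path.write_text("def double(x):\n    return x * 2\n")
    helpers = load_module_from_path(ext_path)
    print(helpers.double(21))  # 42
```

Note that exec_module executes everything at the top level of the file, which is the concrete reason the trust model resembles pip install.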

safe_function(func=None, *, override=False, distributes=True)

Tag a function for inclusion in the safe-function namespace.

Applying this decorator is a declaration by the author that the function is pure, bounded-time, and free of I/O. linkml-map does not verify these properties.

Usable bare or with kwargs:

@safe_function
def slugify(s): ...

@safe_function(override=True)
def lower(s): ...

:param override: Allow shadowing a built-in of the same name. Without this, a collision with a built-in raises ExtensionError at load time.
:param distributes: Apply the scalar-distributing wrapper (broadcasts over lists, propagates None). Default True; set False for functions that accept a list as their first argument.

Source code in src/linkml_map/utils/extensions.py
def safe_function(
    func: Callable | None = None,
    *,
    override: bool = False,
    distributes: bool = True,
) -> Callable:
    """Tag a function for inclusion in the safe-function namespace.

    Applying this decorator is a **declaration by the author** that the
    function is pure, bounded-time, and free of I/O. linkml-map does not
    verify these properties.

    Usable bare or with kwargs::

        @safe_function
        def slugify(s): ...

        @safe_function(override=True)
        def lower(s): ...

    :param override: Allow shadowing a built-in of the same name. Without this,
        a collision with a built-in raises :class:`ExtensionError` at load time.
    :param distributes: Apply the scalar-distributing wrapper (broadcasts over
        lists, propagates ``None``). Default ``True``; set ``False`` for
        functions that accept a list as their first argument.
    """

    def _tag(fn: Callable) -> Callable:
        setattr(fn, _SAFE_FUNCTION_ATTR, {"override": override, "distributes": distributes})
        return fn

    if func is not None:
        # Bare ``@safe_function`` with no parentheses.
        return _tag(func)
    return _tag
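
The decorator only attaches metadata; collection happens later in the loader via _collect_tagged_functions, which is not shown on this page. The sketch below reproduces the bare-or-kwargs decorator shape with a stand-in attribute name and adds a guessed collector, to show how the two halves could fit together. DEMO_ATTR, tag, and collect_tagged are hypothetical names, and the returned dict shape merely mirrors the meta["func"], meta["override"], and meta["distributes"] keys that load_extensions reads.

```python
from types import SimpleNamespace
from typing import Any, Callable, Optional

DEMO_ATTR = "__demo_safe_function__"  # stand-in; the real attribute name is not shown


def tag(func: Optional[Callable] = None, *, override: bool = False, distributes: bool = True):
    """Same bare-or-kwargs shape as safe_function, writing to DEMO_ATTR."""

    def _tag(fn: Callable) -> Callable:
        setattr(fn, DEMO_ATTR, {"override": override, "distributes": distributes})
        return fn

    return _tag(func) if func is not None else _tag


def collect_tagged(namespace: Any) -> dict[str, dict[str, Any]]:
    """Hypothetical collector: find tagged callables and bundle them with their metadata."""
    found: dict[str, dict[str, Any]] = {}
    for name, obj in vars(namespace).items():
        meta = getattr(obj, DEMO_ATTR, None)
        if callable(obj) and isinstance(meta, dict):
            found[name] = {"func": obj, **meta}
    return found


@tag
def slugify(s):
    return s.strip().lower().replace(" ", "_")


@tag(distributes=False)
def total(items):
    return sum(items)


def untagged(s):
    return s


# A SimpleNamespace stands in for an imported extension module.
module = SimpleNamespace(slugify=slugify, total=total, untagged=untagged)
tagged = collect_tagged(module)
print(sorted(tagged))                  # ['slugify', 'total']
print(tagged["total"]["distributes"])  # False
```

Because tagging returns the original function unchanged, a tagged helper can still be imported and unit-tested like any other Python function.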