Validation using Schemas
Validation is a procedure that takes as input a LinkML instance and a schema, and will run a collection of checks against that instance.
The validation procedure will produce output that can be used to determine if the instance is structurally and semantically valid according to the schema.
The formal specification of the validation procedure takes as input a derived schema mD:
Formally, the validation is performed against the abstract instance model, but validator
implementations may choose to perform additional validation checks against the concrete
serialization. This includes checking collections using inlined_as_dict
.
Actual implementations may choose to perform this composition or work directly on the asserted schema.
The following holds for any validation procedure:
- The output MUST include a boolean slot indicating whether the input can be demonstrated to be false
- The output SHOULD include additional information indicating types of validation errors and where they occur
- The output SHOULD be conformant with the LinkML validation schema.
- The output MAY also return cases where recommendations are not adhered to
- The output MAY also be combined with parsing to yield the precise location (i.e. line numbers) in the source serialization document where problems are found.
- The procedure MAY restrict validation to defined subsets (profiles) of the Metamodel
- The procedure SHOULD return in its payload an indication of which profile and version is used.
Types of checks
Check | Type | Description | SHACL |
---|---|---|---|
Required |
ERROR | MinCountConstraintComponent |
|
Recommended |
WARNING | MinCountConstraintComponent |
|
Singlevalued |
ERROR | MaxCountConstraintComponent |
|
Multivalued |
ERROR | MinCountConstraintComponent |
|
Inlined |
ERROR | ||
Referenced |
ERROR | ||
ClassRange |
ERROR | class matches range | ClassConstraintComponent |
Datatype |
ERROR | datatype matches range | DatatypeConstraintComponent |
NodeKind |
ERROR | range metatype | NodeKindConstraintComponent |
MinimumValue |
ERROR | MinInclusiveConstraintComponent |
|
MaximumValue |
ERROR | MaxInclusiveConstraintComponent |
|
Pattern |
ERROR | PatternConstraintComponent |
|
EqualsExpression |
INFERENCE | EqualsConstraintComponent |
|
StringSerialization |
INFERENCE | EqualsConstraintComponent |
|
TypeDesignator |
INFERENCE |
For the INFERENCE
type, the validation procedure MAY fill in missing values in the instance.
There is only an error if the inferred value is not consistent with the asserted value.
Validation procedure for instances
Validate(i, m, t):
mD = DeriveSchema(m)
s = new SlotDefinition(range=t)
for check in Checks:
if (s,i) matches check:
yield check
if i == <Class>(a1, ..., an):
for s' = i' in a1, ..., an:
Validate(i', mD, DerivedSlot(s', <Class>))
Matches are performed against the tables below. The element i
is checked against
the Element
column. If the comparison type value T is =
, this must be identical. If i
is a Collection,
then the match is performed against all members of the collection.
The notation [...]
indicates a collection with at least one value.
Core checks
The following core checks apply to multiple instance definition types.
T | Element | Check Name | Fail Condition |
---|---|---|---|
= |
None |
Required |
<slot>.required=True |
= |
[] |
Required |
<slot>.required=True |
= |
None |
Recommended |
<slot>.required=True |
= |
[] |
Recommended |
<slot>.required=True |
= |
[...] |
Singlevalued |
<slot>.multivalued=False |
= |
[...] |
UniqueKey |
not Unique([...] ) |
= |
<V> <V> != [...] <V> != None |
Multivalued |
<slot>.multivalued=True |
in |
<Type>&<Value> |
NodeKind |
<slot>.range ∉ m.types |
in |
<Enum>[<PermissibleValue>] |
NodeKind |
<slot>.range ∉ m.enums |
in |
<Class>&<Reference> |
Inlined |
<slot>.range.inlined=True |
in |
<Class>(<Assignments>) |
Referenced |
<slot>.range.inlined=False |
in |
<Class>(<Assignments>) |
NodeKind |
<slot>.range ∉ m.classes |
The condition Unique checks a list of objects for uniqueness.
For each ClassDefinition
in the list, the primary key value is calculated, and this is assumed to be unique.
Deprecation checks
The following checks match multiple different instance definition types and check for usage of deprecated elements.
T | Element | Check | Fail Condition |
---|---|---|---|
in |
DeprecatedSlot |
<slot>.deprecated=True |
|
in |
<Type>&<Value> |
DeprecatedType |
<Type>.deprecated=True |
in |
<Enum>[<PermissibleValue>] |
DeprecatedEnum |
<Enum>.deprecated=True |
in |
<Class>&<Reference> |
DeprecatedClass |
<Class>.deprecated=True |
in |
<Class>(<Assignments>) |
DeprecatedClass |
<Class>.deprecated=True |
Atomic Checks
The following checks only match when i
is an AtomicInstance
T | Element | Check | Fail Condition |
---|---|---|---|
in |
<Type>^<Value> |
Datatype |
Conforms(<Value> , <Type>.uri ) |
in |
<Type>^<Value> |
MaximumValue |
<Value> > <slot>.maximum_value |
in |
<Type>^<Value> |
MinimumValue |
<Value> < <slot>.minimum_value |
in |
<Type>^<Value> |
Pattern |
<Value> !~ <slot>.pattern |
in |
<Class>&<Reference> |
Pattern |
<Reference> !~ <slot>.pattern |
in |
<Type>^<Value> |
EqualsExpression |
<Value> != Eval(<slot>.equals_expression(parent)) |
in |
<Type>^<Value> |
StringSerialization |
<Value> != Stringify(<slot>.string_serialization(parent)) |
Class Checks
The following checks only match when i
is an InstanceOfClass
T | Element | Check | Fail Condition |
---|---|---|---|
in |
<Class>(<Assignments>) |
Abstract |
Class.abstract |
in |
<Class>(<Assignments>) |
Mixin |
Class.mixin |
in |
<Class>(<Assignments>) |
ClassRange |
slot.range ∉ A*(<Class>) |
in |
<Class>(..., <subslot>=<V>, ...) |
ApplicableSlot |
subslot ∉ <Class>.attributes |
in |
<Class>(..., <ts>=<V>, ...) |
DesignatedType |
<V> ∉ A*(<Class>) ts = TypeDesignator(<Class>) |
Enum checks
T | Element | Check | Fail Condition |
---|---|---|---|
in |
<Enum>[<PV>] |
Permissible |
<PV> ∉ <Enum>.permissible_value |
Rules
Uniqueness checks
Boolean combinations of expressions
There are 4 boolean operator metaslots:
- any_of
- exactly_one_of
- none_of
- all_of
These can apply to classes, slots, types, or enums. The range is always a list of operand expressions of the same type.
In all cases, the semantics are as follows:
- any_of: true if there exists a member of the operands that evaluates to true
- for empty lists this is always false
- exactly_one_of: true if there exists a member of the operands that evaluates to true, and all other members evaluate to false
- for empty lists this is always false
- none_of: true if there does not exist a member of the operands that evaluates to true
- for empty lists this is always true
- for lists of length 1 this is the same as the logical NOT operator
- for lists of length 2 this is the same as the logical XOR operator
- all_of: true if there are no members that evaluate to false
- for empty lists this is always true
Rule evaluation
For each rule r
in C.rules:
- if
r.preconditions
isNone
orr.preconditions
is satisfied, then r.postconditions
are applied
Classification Rule evaluation
type designator checks
Validation of TypeDefinitions
For each slot s
in Assignments, if i.<s>
is not None
, and s.range
is in m*.types
,
where i.<s> = *T*( **AtomicValue** )
must match s.range
,
here T.uri
is used to determine the type:
- for xsd floats, doubles, and decimals, AtomicValue must be a decimal- for xsd floats, doubles, and decimals, AtomicValue must be a decimal
- for xsd ints, AtomicValue must be an Integer
- for xsd dates, datetimes, and times, AtomicValue must be a string conforming to the relevant ISO type
- for xsd booleans, AtomicValue must be
True
orFalse