# Validation using Schemas

**Validation** is a procedure that takes as input:

- A LinkML instance `i`, where `i` is to be validated
- A LinkML instance `root`. This MUST contain `i` and MAY be the same as `i`
- A schema *m*

The validation procedure will produce output that can be used to determine if the instance is *structurally and semantically valid* according to the schema.

The formal specification of the validation procedure takes as input a *derived* schema *m ^{D}*:

Actual implementations may choose to perform this composition or work directly on the asserted schema.

The following holds for any validation procedure:

- The output MUST include a boolean slot indicating whether the input can be demonstrated to be invalid
- The output SHOULD include additional information about the nature of all problems encountered.
- The output SHOULD be conformant with the LinkML validation schema.
- The output MAY also return cases where *recommendations* are not adhered to
- The output MAY also be combined with parsing to yield the precise location (i.e. line numbers) in the source serialization document where problems are found.
- The procedure MAY restrict validation to defined subsets (profiles) of the Metamodel
- The procedure SHOULD return in its payload an indication of which profile and version is used.

## Validation procedure for instances

The validation procedure is to first take the metaclass that is instantiated by the type of the instance `i`, and apply one of the 4 checks below, with each check performing its own sub-rules. The ClassDefinition check is *recursive*, checking each slot-value assignment. This means a check on any instance will always validate the full instance tree.

## Validation of ClassDefinitions

Given an instance `i` of a ClassDefinition:

**ClassDefinitionName**( **Assignments** )

Where **Assignments** is a collection of length `N`, with index `i..N` and members `slot_i=value_i`, and *ClassDefinitionName* is the name of a ClassDefinition in *m ^{D}*, such that `C` = *m ^{D}*`.classes[ClassDefinitionName]`

### Rule: Assignment values must be valid

For each `slot=value` assignment in **Assignments**, the validation procedure is performed on `value`, with `root` remaining the same.

### Rule: ClassDefinition instances must instantiate a class in the schema

*ClassDefinitionName* MUST be the name of a ClassDefinition in *m ^{D}*

`C` is assigned to be the value of *m ^{D}*`[ClassDefinitionName]`

**Assignments** is assigned to be the value of `C.attributes` (see the previous section)

`C` SHOULD have all the following properties:

- `C.deprecated` SHOULD be **None**
- `C.abstract` SHOULD NOT be **True**
- `C.mixin` SHOULD NOT be **True**

### Rule: identifiers must be unique

We define a procedure **IdVal**(`i`) which yields the value of `i.<identifier_slot>`, where `identifier_slot` is the slot in **Assignments** with metaslot assignment `identifier`=**True**.

If there is no such slot then **IdVal**(`i`) is None and this check is ignored.

`i` is invalid if there exists another instance `j` such that `j` is reachable from `root`, and **IdVal**(`i`)=**IdVal**(`j`), and `i` and `j` are distinct.
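
The uniqueness check can be sketched as a walk over everything reachable from `root`. This is only an illustration under simplifying assumptions that are not part of the specification: instances are modelled as plain Python dicts, a single hypothetical `identifier_slot` name is shared by every class, and identifier values are hashable. The helper names `walk`, `id_val`, and `duplicate_ids` are made up for the example.

```python
from collections import Counter
from typing import Any, Iterator, Optional, Set


def walk(node: Any) -> Iterator[dict]:
    """Yield every dict-shaped instance reachable from node (node included)."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item)


def id_val(instance: dict, identifier_slot: Optional[str]) -> Any:
    """IdVal: the identifier value, or None when there is no identifier slot."""
    if identifier_slot is None:
        return None
    return instance.get(identifier_slot)


def duplicate_ids(root: Any, identifier_slot: Optional[str]) -> Set[Any]:
    """Return identifier values shared by two or more distinct instances."""
    counts = Counter(
        id_val(inst, identifier_slot)
        for inst in walk(root)
        if id_val(inst, identifier_slot) is not None
    )
    return {value for value, n in counts.items() if n > 1}


# Hypothetical data: two distinct person instances share the identifier "p1".
root = {"id": "c1", "persons": [{"id": "p1"}, {"id": "p2"}, {"id": "p1"}]}
duplicates = duplicate_ids(root, "id")
```

Any instance whose identifier appears in `duplicates` would be reported as invalid. When no identifier slot exists, the result is empty and the check is effectively skipped.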

### Rule: All assignments must be to permitted slots

For each `s=value` assignment in <*Assignment1*>, <*Assignment2*>, ..., <*AssignmentN*>:

- `s` must be in **Assignments**
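
As a rough illustration, the permitted-slot rule reduces to a set difference. The sketch assumes (this is not stated in the specification) that an instance is a dict of slot name to value and that the permitted slots are available as a collection of names, for example the keys of `C.attributes`. The function name `undeclared_slots` is hypothetical.

```python
from typing import Iterable, List


def undeclared_slots(instance: dict, permitted: Iterable[str]) -> List[str]:
    """Return the slot names used by the instance that the class does not permit."""
    return sorted(set(instance) - set(permitted))


# Hypothetical class with three permitted slots.
permitted = ["id", "name", "age"]
problems = undeclared_slots({"id": "p1", "nickname": "Al"}, permitted)
```

An empty result means every assignment targets a permitted slot.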

### Rule: All required slots must be specified

For each slot `s` in **Assignments**, if `s.required=True`, then `i.<s>` must be neither `None` nor the empty collection `[]`

### Rule: All recommended slots should be specified

For each slot `s` in **Assignments**, if `s.recommended=True`, then `i.<s>` should be neither `None` nor the empty collection `[]`

If this condition is not met, this is considered a warning rather than invalidity.
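
The required and recommended rules differ only in severity, so they can be sketched together. The following is a minimal illustration, assuming instances are dicts and that slot definitions are dicts carrying optional `required` and `recommended` flags. The function name `check_presence` and the error/warning split into two lists are choices made for the example, not something the specification prescribes.

```python
from typing import Any, Dict, List, Tuple


def is_missing(value: Any) -> bool:
    """A value counts as missing when it is None or the empty list."""
    return value is None or value == []


def check_presence(instance: dict, slots: Dict[str, dict]) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings): missing required and missing recommended slots."""
    errors: List[str] = []
    warnings: List[str] = []
    for name, slot in slots.items():
        if not is_missing(instance.get(name)):
            continue
        if slot.get("required"):
            errors.append(name)      # invalidity
        elif slot.get("recommended"):
            warnings.append(name)    # warning only
    return errors, warnings


# Hypothetical slot definitions and instance.
slots = {
    "id": {"required": True},
    "name": {"recommended": True},
    "aliases": {"required": True},
    "age": {},
}
errors, warnings = check_presence({"id": "p1", "aliases": []}, slots)
```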

### Rule: Assigned values must conform to multivalued cardinality

For each slot `s` in **Assignments**,

- if `s.multivalued` is True, then `i.<s>` must be a collection or None
- if `s.multivalued` is False, then `i.<s>` must not be a collection
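
A small sketch of the cardinality rule follows. It assumes that "collection" means a Python list, tuple, or set; how dict-shaped collections should be treated is not covered here, and the function name `check_multivalued` is hypothetical.

```python
from typing import Any


def check_multivalued(value: Any, multivalued: bool) -> bool:
    """Return True when the value's shape agrees with the slot's multivalued flag."""
    is_collection = isinstance(value, (list, tuple, set))
    if multivalued:
        # Multivalued slots accept a collection or an absent value.
        return value is None or is_collection
    # Single-valued slots reject any collection.
    return not is_collection
```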

### Rule: values should be within stated bounds

For each slot `s` in **Assignments**,

- if `s.maximum_value` is not None, then `i.<s>` must be a number and must be less than or equal to the maximum value
- if `s.minimum_value` is not None, then `i.<s>` must be a number and must be greater than or equal to the minimum value
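
The bounds rule might be sketched as below. The example reads the text literally, so a non-numeric or absent value fails whenever a bound is stated. Treating only `int` and `float` (and not `bool`) as numbers is an assumption of the sketch, and `check_bounds` is a hypothetical name.

```python
from typing import Any, Optional


def check_bounds(
    value: Any,
    minimum_value: Optional[float] = None,
    maximum_value: Optional[float] = None,
) -> bool:
    """Return True when value satisfies every bound that is stated."""
    if minimum_value is None and maximum_value is None:
        return True  # no bounds declared, nothing to check
    # Assumption: bool is excluded even though it subclasses int in Python.
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if not is_number:
        return False
    if maximum_value is not None and value > maximum_value:
        return False
    if minimum_value is not None and value < minimum_value:
        return False
    return True
```

Both bounds are inclusive, so a value equal to `maximum_value` or `minimum_value` passes.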

### Rule: values should equal evaluable expression

For each slot `s` in **Assignments**, if `s.equals_expression` is not None, then `i.<s>` must equal the value of `Eval(s.equals_expression)`. See section on expression language for details of syntax.

Note: this rule can be executed in inference mode

### Rule: values should equal string_serialization

For each slot `s` in **Assignments**, if `s.string_serialization` is not None, then `i.<s>` must equal the value of `Stringify(s.string_serialization)`. See section on expression language for details of syntax.

### Rule: values should equal regular expression patterns

Note: this rule can be executed in inference mode

### Range class instantiation check

For each slot `s` in **Assignments**, if `i.<s>` is not None, and `s.range` is in `m*.classes`, then `s.range` must be in `ReflexiveAncestors(Type(i.<s>))`

Additional checks MAY be performed based on whether `s.inlined` is True

- if `s.inlined`, then `i.<s>` SHOULD NOT be a Reference
- if `s.inlined` is False, then EITHER:
    - `i.<s>` SHOULD be a Reference
    - OR `i.<s>` instantiates a class `R` such that `R` has no slot `rs` that is declared to be an identifier, i.e. `rs.identifier = True`
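
The range check relies on `ReflexiveAncestors`, which can be sketched as a traversal of the class hierarchy. The example below assumes that ancestry follows the `is_a` and `mixins` metaslots and that classes are held in a plain dict keyed by name. That data shape and the function names are illustrative assumptions, not the normative definition.

```python
from typing import Dict, Set


def reflexive_ancestors(class_name: str, classes: Dict[str, dict]) -> Set[str]:
    """Return the class itself plus everything reachable via is_a and mixins."""
    seen: Set[str] = set()
    stack = [class_name]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        definition = classes.get(current, {})
        parent = definition.get("is_a")
        if parent:
            stack.append(parent)
        stack.extend(definition.get("mixins", []))
    return seen


def range_matches(slot_range: str, value_type: str, classes: Dict[str, dict]) -> bool:
    """True when the slot's range is the value's class or one of its ancestors."""
    return slot_range in reflexive_ancestors(value_type, classes)


# Hypothetical class hierarchy.
classes = {
    "NamedThing": {},
    "HasAliases": {"mixin": True},
    "Person": {"is_a": "NamedThing", "mixins": ["HasAliases"]},
    "Employee": {"is_a": "Person"},
}
```

With this hierarchy, an `Employee` value satisfies a slot whose range is `Person`, but a `Person` value does not satisfy a slot whose range is `Employee`.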

### Boolean combinations of expressions

There are 4 boolean operator metaslots:

- any_of
- exactly_one_of
- none_of
- all_of

These can apply to classes, slots, types, or enums. The range is always a list of operand expressions of the same type.

In all cases, the semantics are as follows:

- any_of: true if there exists a member of the operands that evaluates to true
    - for empty lists this is always false
- exactly_one_of: true if there exists a member of the operands that evaluates to true, and all other members evaluate to false
    - for empty lists this is always false
    - for lists of length 2 this is the same as the logical XOR operator
- none_of: true if there does not exist a member of the operands that evaluates to true
    - for empty lists this is always true
    - for lists of length 1 this is the same as the logical NOT operator
- all_of: true if there are no members that evaluate to false
    - for empty lists this is always true
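
These semantics map directly onto a few lines of Python. The sketch assumes each operand expression has already been evaluated to a boolean; evaluating the operand expressions themselves is out of scope here.

```python
from typing import Sequence


def any_of(operands: Sequence[bool]) -> bool:
    """True if at least one operand is true; False for an empty list."""
    return any(operands)


def exactly_one_of(operands: Sequence[bool]) -> bool:
    """True if one operand is true and all others are false."""
    return sum(1 for operand in operands if operand) == 1


def none_of(operands: Sequence[bool]) -> bool:
    """True if no operand is true; True for an empty list."""
    return not any(operands)


def all_of(operands: Sequence[bool]) -> bool:
    """True if no operand is false; True for an empty list."""
    return all(operands)
```

The empty-list cases fall out of the built-ins: `any([])` is `False` and `all([])` is `True`.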

### range expression checks

For each slot `s` in **Assignments**, if `i.<s>` is not None, and `RE = s.range_expression` is not None, then a check **CE**(`x`) is performed on `i.<s>`

### Rule evaluation

For each rule `r` in *C*.rules:

- if `r.preconditions` is None or `r.preconditions` is satisfied, then `r.postconditions` are applied
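
The control flow of rule evaluation can be sketched as follows. As a strong simplification, preconditions and postconditions are modelled here as plain Python predicates over a dict-shaped instance; in a schema they are expressions that would need their own evaluation. The name `evaluate_rules` and the choice to return the indexes of failing rules are assumptions of the example.

```python
from typing import Callable, Dict, List, Optional

Predicate = Callable[[dict], bool]


def evaluate_rules(instance: dict, rules: List[Dict[str, Optional[Predicate]]]) -> List[int]:
    """Return the indexes of rules whose postconditions fail for the instance."""
    failures: List[int] = []
    for index, rule in enumerate(rules):
        pre = rule.get("preconditions")
        post = rule.get("postconditions")
        # A rule applies when it has no preconditions or they are satisfied.
        if pre is None or pre(instance):
            if post is not None and not post(instance):
                failures.append(index)
    return failures


# Hypothetical rules: a deceased person needs a death_date; everyone needs an id.
rules = [
    {
        "preconditions": lambda i: i.get("status") == "deceased",
        "postconditions": lambda i: i.get("death_date") is not None,
    },
    {"postconditions": lambda i: i.get("id") is not None},
]
```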

### Classification Rule evaluation

### type designator checks

## Validation of TypeDefinitions

For each slot `s` in **Assignments**, if `i.<s>` is not None, and `s.range` is in `m*.types`, then `i.<s>` = *T*( **AtomicValue** ) must match `s.range`, where `T.uri` is used to determine the type:

- for xsd floats, doubles, and decimals, AtomicValue must be a decimal
- for xsd ints, AtomicValue must be an Integer
- for xsd dates, datetimes, and times, AtomicValue must be a string conforming to the relevant ISO type
- for xsd booleans, AtomicValue must be True or False
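
A sketch of the per-type checks, keyed on the local name of `T.uri`, is shown below. Several choices are assumptions of the example rather than requirements of the text: accepting Python `int`, `float`, and `Decimal` as "decimal", excluding `bool` from numeric types, using the standard library `fromisoformat` parsers as the ISO conformance test, and passing unrecognized types unchecked. The function name `check_atomic` is hypothetical.

```python
from datetime import date, datetime, time
from decimal import Decimal
from typing import Any

# ISO parsers from the standard library, keyed by xsd local name.
_ISO_PARSERS = {
    "date": date.fromisoformat,
    "dateTime": datetime.fromisoformat,
    "time": time.fromisoformat,
}


def check_atomic(value: Any, type_uri: str) -> bool:
    """Return True when value is acceptable for the xsd type named by type_uri."""
    # Accept both CURIEs ("xsd:date") and full URIs ("...XMLSchema#date").
    local = type_uri.rsplit("#", 1)[-1].rsplit(":", 1)[-1]
    if local in {"float", "double", "decimal"}:
        return isinstance(value, (int, float, Decimal)) and not isinstance(value, bool)
    if local in {"integer", "int"}:
        return isinstance(value, int) and not isinstance(value, bool)
    if local in _ISO_PARSERS:
        if not isinstance(value, str):
            return False
        try:
            _ISO_PARSERS[local](value)
        except ValueError:
            return False
        return True
    if local == "boolean":
        return isinstance(value, bool)
    return True  # assumption: types not listed above are not checked here
```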

## Validation of EnumDefinitions

For each slot `s` in **Assignments**, if `i.<s>` is not None, and `s.range` is in `m*.enums`, then `i.<s>` must be equal to `pv.text` for some pv in `m*.enums[s.range]`
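
The enum rule amounts to a membership test over the permissible value texts. In this sketch each enum is modelled as a list of dicts with a `text` key; that shape and the function name `check_enum` are assumptions made for illustration.

```python
from typing import Any, Dict, List


def check_enum(value: Any, enum_name: str, enums: Dict[str, List[dict]]) -> bool:
    """Return True when value equals the text of some permissible value."""
    return any(value == pv["text"] for pv in enums[enum_name])


# Hypothetical enum with three permissible values.
enums = {
    "VitalStatus": [{"text": "ALIVE"}, {"text": "DEAD"}, {"text": "UNKNOWN"}],
}
```

The comparison is exact, so `"alive"` would not match `"ALIVE"` under this sketch.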