Part 3: Adding constraints and performing validation
Now we will add richer information to our schema, including:
adding ranges for fields such as age
using pattern to force a field to conform to a regular expression
declaring the id slot to be an identifier
declaring the full_name slot to be required
adding textual descriptions of schema elements
id: https://w3id.org/linkml/examples/personinfo
name: personinfo
prefixes:
  linkml: https://w3id.org/linkml/
imports:
  - linkml:types
default_range: string

classes:
  Person:
    attributes:
      id:
        identifier: true     ## unique key for a person
      full_name:
        required: true       ## must be supplied
        description: name of the person
      aliases:
        multivalued: true    ## range is a list
        description: other names for the person
      phone:
        pattern: "^[\\d\\(\\)\\-]+$"   ## regular expression
      age:
        range: integer       ## an int between 0 and 200
        minimum_value: 0
        maximum_value: 200
  Container:
    attributes:
      persons:
        multivalued: true
        inlined_as_list: true
        range: Person
We use YAML comment syntax (i.e. the part after #) for comments; these are ignored by the parser.
Note that we haven’t declared ranges for some fields, but the default_range directive at the schema level ensures things default to string.
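To make the effect of default_range concrete, here is a minimal Python sketch. It is not LinkML's own resolution code: it simply mirrors the Person attributes as a plain dictionary and fills in the range for any attribute that does not declare one. The helper name effective_range is hypothetical.

```python
# Hypothetical mirror of the Person attributes from the schema above;
# only the "range" key matters for this illustration.
DEFAULT_RANGE = "string"  # the schema-level default_range

person_attributes = {
    "id": {"identifier": True},
    "full_name": {"required": True},
    "aliases": {"multivalued": True},
    "phone": {"pattern": r"^[\d\(\)\-]+$"},
    "age": {"range": "integer", "minimum_value": 0, "maximum_value": 200},
}


def effective_range(attribute: dict) -> str:
    """Return the declared range, falling back to the schema default."""
    return attribute.get("range", DEFAULT_RANGE)


ranges = {name: effective_range(spec) for name, spec in person_attributes.items()}
print(ranges)
```

Only age declares a range explicitly, so every other attribute resolves to string.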
Let’s deliberately introduce some bad data to make sure our validator is working:
persons:
  - id: ORCID:1234
    full_name: Clark Kent
    age: 90
    phone: 1-800-kryptonite
  - id: ORCID:5678
    age: 33
Running the following command:
linkml-validate -s personinfo.yaml bad-data.yaml
Will result in:
[ERROR] [bad-data.yaml/0] '1-800-kryptonite' does not match '^[\\d\\(\\)\\-]+$' in /persons/0/phone
[ERROR] [bad-data.yaml/0] 'full_name' is a required property in /persons/1
This indicates there are two issues with our data. The first says that the phone number of the first entry in the persons list (/persons/0/phone) doesn’t match the regular expression we specified. The second says that the required full_name slot is missing from the second entry in the persons list (/persons/1).
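To see why these two errors arise, the checks can be reproduced by hand. The following is a minimal sketch using only the Python standard library, not what linkml-validate does internally; the helper check_person and the wording of its messages are hypothetical.

```python
import re

# The pattern from the schema, after YAML unescaping of the double-quoted string.
PHONE_PATTERN = re.compile(r"^[\d\(\)\-]+$")

# The same records as bad-data.yaml, written as Python literals.
bad_data = {
    "persons": [
        {"id": "ORCID:1234", "full_name": "Clark Kent", "age": 90,
         "phone": "1-800-kryptonite"},
        {"id": "ORCID:5678", "age": 33},
    ]
}


def check_person(person: dict, path: str) -> list:
    """Hand-rolled versions of three schema constraints (hypothetical helper)."""
    errors = []
    if "full_name" not in person:
        errors.append(f"'full_name' is a required property in {path}")
    phone = person.get("phone")
    if phone is not None and not PHONE_PATTERN.search(phone):
        errors.append(f"{phone!r} does not match the phone pattern in {path}/phone")
    age = person.get("age")
    if age is not None and not 0 <= age <= 200:
        errors.append(f"{age} is outside the range 0..200 in {path}/age")
    return errors


errors = []
for index, person in enumerate(bad_data["persons"]):
    errors.extend(check_person(person, f"/persons/{index}"))

for message in errors:
    print(message)
```

The letters in "kryptonite" fall outside the character class (digits, parentheses and hyphens), which is why the first record fails the pattern check.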
Let’s fix the second issue.
persons:
  - id: ORCID:1234
    full_name: Clark Kent
    age: 90
    phone: 1-800-kryptonite
  - id: ORCID:5678
    full_name: Lois Lane
    age: 33
linkml-validate -s personinfo.yaml better-data.yaml
Will result in:
[ERROR] [better-data.yaml/0] '1-800-kryptonite' does not match '^[\\d\\(\\)\\-]+$' in /persons/0/phone
We have successfully fixed one of the issues with the data!
See if you can iterate on the example data to get something that validates.
Using the JSON Schema directly
The linkml-validate command is a wrapper that can be used for an
open-ended number of validator implementations. The current default is
to use a JSON Schema validator. This involves converting LinkML to
JSON Schema. Note that some features of LinkML are not
supported by JSON Schema, so the current validator is not guaranteed
to be complete.
If you prefer, you can use your own JSON Schema validator. First compile the schema to JSON Schema:
gen-json-schema personinfo.yaml > personinfo.schema.json
You can then use the jsonschema command that comes with the Python library (any JSON Schema validator will do here):
jsonschema -i bad-data.json personinfo.schema.json
In general this should give you similar results, with some caveats:
bad-data.yaml can be converted to JSON, which is the form the command above expects (bad-data.json)
linkml-validate will first perform an internal conversion prior to using the jsonschema validator, and some errors may be caught at that stage
the conversion process may mask some errors - e.g. if a slot has range integer and is supplied as a string, implicit conversion is used
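The last caveat can be illustrated without any LinkML machinery. The sketch below is a hypothetical, simplified model of the two behaviours, not the actual conversion code: a strict check rejects the string "33" for an integer slot, whereas a check that coerces first accepts it, so the type mismatch goes unreported.

```python
def strict_integer_check(value) -> bool:
    """Accept only real integers (bool is excluded on purpose)."""
    return isinstance(value, int) and not isinstance(value, bool)


def coercing_integer_check(value) -> bool:
    """Try to convert to int first, as an implicit conversion step might."""
    try:
        int(value)
    except (TypeError, ValueError):
        return False
    return True


age_as_string = "33"  # supplied as a string although the range is integer

print(strict_integer_check(age_as_string))      # strict check flags the mismatch
print(coercing_integer_check(age_as_string))    # coercion hides it
print(coercing_integer_check("thirty-three"))   # not convertible, still rejected
```

If the distinction between "33" and 33 matters for your data, it is worth validating the raw JSON against the generated schema as well.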
See the JSON-Schema generator docs for more info on JSON-Schema validation.
Other validation strategies
Other strategies include:
converting data to a relational database and doing performant evaluation in SQL
converting data to RDF and using either Shape validators or SPARQL queries
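As a rough illustration of the SQL strategy, the sketch below loads the two records from bad-data.yaml into an in-memory SQLite database and expresses two of the constraints as queries. The table layout is a hypothetical flat table written by hand for this example; LinkML's own SQL tooling may produce a different structure, and the pattern check on phone is left out here.

```python
import sqlite3

# Hypothetical flat table for Person, written by hand for this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id TEXT PRIMARY KEY, full_name TEXT, age INTEGER, phone TEXT)"
)
conn.executemany(
    "INSERT INTO person (id, full_name, age, phone) VALUES (?, ?, ?, ?)",
    [
        ("ORCID:1234", "Clark Kent", 90, "1-800-kryptonite"),
        ("ORCID:5678", None, 33, None),
    ],
)

# Each query returns the ids of rows that violate one constraint.
missing_name = [row[0] for row in conn.execute(
    "SELECT id FROM person WHERE full_name IS NULL ORDER BY id"
)]
age_out_of_range = [row[0] for row in conn.execute(
    "SELECT id FROM person WHERE age NOT BETWEEN 0 AND 200 ORDER BY id"
)]

print("missing full_name:", missing_name)
print("age out of range:", age_out_of_range)
conn.close()
```

Because each constraint becomes a set-based query, the same checks apply unchanged to tables with many rows.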
The next section deals with working with RDF data.