Part 3: Adding constraints and performing validation

Now we will add richer information to our schema, including:

  • adding ranges for fields such as age

  • using pattern to force a field to conform to a regular expression

  • declaring the id slot to be an identifier

  • declaring the full_name slot to be required

  • adding textual descriptions of schema elements

Example schema

personinfo.yaml:

id: https://w3id.org/linkml/examples/personinfo
name: personinfo
prefixes:
  linkml: https://w3id.org/linkml/
imports:
  - linkml:types
default_range: string
  
classes:
  Person:
    attributes:
      id:
        identifier: true     ## unique key for a person
      full_name:
        required: true       ## must be supplied
        description:
          name of the person
      aliases:
        multivalued: true    ## range is a list
        description:
          other names for the person
      phone:
        pattern: "^[\\d\\(\\)\\-]+$"   ## regular expression
      age:
        range: integer       ## an int between 0 and 200
        minimum_value: 0
        maximum_value: 200
  Container:
    attributes:
      persons:
        multivalued: true
        inlined_as_list: true
        range: Person

We use YAML comment syntax (i.e. everything after a #) for comments; these are ignored by the parser.

Depicted as:

[diagram of the Person and Container classes omitted]

Note that we haven’t declared ranges for some fields; the default_range directive at the schema level ensures that any slot without an explicit range defaults to string.
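
For example, because of default_range, the following two declarations of the aliases attribute behave identically:

aliases:
  multivalued: true    ## range implicitly string via default_range

aliases:
  multivalued: true
  range: string        ## the explicit equivalent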

Example data

Let’s deliberately introduce some bad data to make sure our validator is working:

bad-data.yaml:

persons:
  - id: ORCID:1234
    full_name: Clark Kent
    age: 90
    phone: 1-800-kryptonite
  - id: ORCID:1234
    age: 33

Running the following command:

linkml-validate -s personinfo.yaml bad-data.yaml

Will result in:

ValueError: full_name must be supplied
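
You can also run a validation from Python. The sketch below assumes a recent LinkML release that provides the linkml.validator.validate function; check your installed version, as the API may differ:

from linkml.validator import validate

# Sketch only: assumes validate(instance, schema, target_class) returns a
# report whose results carry human-readable messages.
instance = {"id": "ORCID:1234", "age": 33}   # full_name deliberately missing
report = validate(instance, "personinfo.yaml", "Person")
for result in report.results:
    print(result.message)   # expect a message about the missing full_name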

better-data-v1.yaml:

persons:
  - id: ORCID:1234
    full_name: Clark Kent
    age: 90
    phone: 1-800-kryptonite
  - id: ORCID:1234
    full_name: Lois Lane
    age: 33
Running:

linkml-validate -s personinfo.yaml better-data-v1.yaml

Will result in:

ValueError: File "<file>", line 6, col 9: ORCID:1234: duplicate key

Let’s fix that:

better-data-v2.yaml:

persons:
  - id: ORCID:1234
    full_name: Clark Kent
    age: 90
    phone: 1-800-kryptonite
  - id: ORCID:4567
    full_name: Lois Lane
    age: 33
Running:

linkml-validate -s personinfo.yaml better-data-v2.yaml

Will result in:

{'pattern': '^[\\d\\(\\)\\-]+$', 'type': 'string'}

On instance['persons'][0]['phone']:
    '1-800-kryptonite'

This reflects the fact that the phone number doesn’t conform to the regular expression we declared.
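
You can confirm the behaviour of this pattern outside LinkML with Python’s re module (the passing number below is made up for illustration):

import re

pattern = re.compile(r"^[\d\(\)\-]+$")   # same pattern as in the schema
print(bool(pattern.match("1-800-kryptonite")))   # False: letters are not allowed
print(bool(pattern.match("1-800-5551234")))      # True: digits, dashes, parens only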

Exercises

  1. See if you can iterate on the example file to get something that works.

Using the JSON Schema directly

The linkml-validate command is a wrapper that can be used with an open-ended number of validator implementations. The current default is to use a JSON Schema validator. This involves converting LinkML to JSON-Schema; note that some features of LinkML are not supported by JSON-Schema, so the current validator is not guaranteed to be complete.

If you prefer, you can use your own JSON Schema validator. First, compile the schema to JSON Schema:

gen-json-schema personinfo.yaml > personinfo.schema.json
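
The LinkML constraints map onto standard JSON-Schema keywords. An illustrative excerpt of what the generated schema contains for Person (the exact layout depends on your LinkML version):

"age": {
  "type": "integer",
  "minimum": 0,
  "maximum": 200
},
"phone": {
  "type": "string",
  "pattern": "^[\\d\\(\\)\\-]+$"
}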

You can then use the jsonschema command that comes with the Python jsonschema library (any JSON Schema validator will do here):

jsonschema -i bad-data.json personinfo.schema.json
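
Equivalently, you can validate programmatically with the jsonschema Python library; a minimal sketch:

import json

import jsonschema

with open("personinfo.schema.json") as f:
    schema = json.load(f)
with open("bad-data.json") as f:
    data = json.load(f)

# Raises jsonschema.exceptions.ValidationError on the first violation found
jsonschema.validate(instance=data, schema=schema)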

In general this should give you similar results, with some caveats:

  • bad-data.yaml must first be converted to bad-data.json, e.g. using https://www.json2yaml.com/ or the one-liner shown after this list.

  • linkml-validate will first perform an internal conversion prior to using the jsonschema validator, and some errors may be caught at that stage

  • the conversion process may mask some errors - e.g. if a slot has range integer and its value is supplied as a string, implicit conversion is applied
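
For the YAML-to-JSON conversion mentioned above, a one-liner using the PyYAML library (installed as a dependency of LinkML) also works:

python -c "import sys, json, yaml; json.dump(yaml.safe_load(sys.stdin), sys.stdout, indent=2)" < bad-data.yaml > bad-data.json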

See the JSON-Schema generator docs for more information on JSON-Schema validation.

Other validation strategies

Other strategies include:

  • converting data to a relational database and performing efficient validation in SQL

  • converting data to RDF and using either shape validators (such as SHACL or ShEx) or SPARQL queries

The next section deals with working with RDF data.

Further reading