Part 2: Adding a container object#
In part 1 of this tutorial we created a schema for describing people, and showed how we could use this to validate YAML or JSON files with a single person instance.
In practice our data files will rarely be at the level of a single instance. Instead we might have a file that contains a list of people, or a more complex document that contains a variety of different heterogeneous objects.
Example data file#
Let’s start with a simple data file that contains more than one instance of person. We choose to structure this as a YAML/JSON dictionary, with an “index slot” called persons
:
data.yaml:
persons:
- id: ORCID:1234
full_name: Clark Kent
age: 32
phone: 555-555-5555
- id: ORCID:4567
full_name: Lois Lane
age: 33
(later on we will see how to express this same thing as a TSV)
Nesting lists of objects#
We can describe this data using the following schema.
personinfo.yaml:
id: https://w3id.org/linkml/examples/personinfo
name: personinfo
prefixes:
linkml: https://w3id.org/linkml/
imports:
- linkml:types
default_range: string
classes:
Person:
attributes:
id:
full_name:
aliases:
phone:
age:
range: integer
Container:
tree_root: true
attributes:
persons:
multivalued: true
inlined_as_list: true
range: Person
This time we are modeling age as an integer. We use the construct
range
to state that the values of age
must be integers.
We introduce a class called Container
. This doesn’t necessarily
reflect a “real world” entity in our domain, it’s just a convenient
holder for our data.
Right now it is holding instances of Person
but it could hold other kinds of data.
The container has a single attribute/slot called “persons”. This has 3 crucial characteristics:
it is multivalued - i.e. it holds a list
the range is Person - i.e. the expected values in the data should be people
it is inlined - i.e. the values are nested underneath the container
Later on we will explore these in more detail
Validating#
We can validate this to make sure we got it right:
linkml-validate -s personinfo.yaml data.yaml
This should report no errors.
Visualizing#
We can use yUML to visualize the schema. The gen-yuml
command can generate REST URLs that can be fed into
gen-yuml -f yuml personinfo.yaml
Outputs:
https://yuml.me/diagram/nofunky;dir:TB/class/[Container]++- persons 0..*>[Person|id:string %3F;full_name:string %3F;aliases:string %3F;phone:string %3F;age:string %3F],[Container]
Which renders as:
You can also generate a png directly
gen-yuml -f png personinfo.yaml > personinfo.png
Exercises#
Extend the container object to include dataset-level metadata:
description
of the datasetname
of the dataset
Modify the schema to allow multiple aliases
Modify the test dataset to include multiple aliases for Clark Kent: “Superman” and “Man of Steel”
Validate the data
Further reading#
Metamodel Specification
multivalued slot
range slot
Next#
Next we will explore how to add constraints to the schema