Part 2: Adding a container object
In part 1 of this tutorial we created a schema for describing people, and showed how we could use this to validate YAML or JSON files with a single person instance.
In practice, data files rarely contain just a single instance. Instead we might have a file that contains a list of people, or a more complex document that contains a variety of heterogeneous objects.
Example data file
Let’s start with a simple data file that contains more than one instance of person. We choose to structure this as a YAML/JSON dictionary, with an “index slot” called persons. Save the following as data.yaml:

    persons:
      - id: ORCID:1234
        full_name: Clark Kent
        age: 32
        phone: 555-555-5555
      - id: ORCID:4567
        full_name: Lois Lane
        age: 33

(Later on we will see how to express the same data as a TSV.)
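Because the file is just a dictionary with one key, any YAML or JSON parser exposes the same shape. As a minimal sketch, the snippet below loads a JSON rendering of the example data with Python's standard json module. The variable names are illustrative, and JSON is used only so that the example needs no third-party YAML library.

```python
import json

# JSON rendering of the example data file (same content as the YAML version).
DATA_JSON = """
{
  "persons": [
    {"id": "ORCID:1234", "full_name": "Clark Kent", "age": 32, "phone": "555-555-5555"},
    {"id": "ORCID:4567", "full_name": "Lois Lane", "age": 33}
  ]
}
"""

data = json.loads(DATA_JSON)

# The top level is a dictionary whose "persons" key holds a list of records.
persons = data["persons"]
names = [p["full_name"] for p in persons]
```

Here data is a dictionary, persons is a list with one entry per person, and each entry is itself a dictionary of attribute values.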
Nesting lists of objects
We can describe this data using the following schema, saved as personinfo.yaml:

    id: https://w3id.org/linkml/examples/personinfo
    name: personinfo
    prefixes:
      linkml: https://w3id.org/linkml/
    imports:
      - linkml:types
    default_range: string

    classes:
      Person:
        attributes:
          id:
          full_name:
          aliases:
          phone:
          age:
            range: integer
      Container:
        tree_root: true
        attributes:
          persons:
            multivalued: true
            inlined_as_list: true
            range: Person

This time we are modeling age as an integer. We use the construct range to state that the values of age must be integers.
We introduce a class called Container. This doesn’t necessarily reflect a “real world” entity in our domain; it’s just a convenient holder for our data. Right now it is holding instances of Person, but it could hold other kinds of data.
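One way to picture the relationship between the two classes is as plain Python dataclasses. This is a hand-written sketch for illustration only, not the output of any LinkML generator. The None defaults are an assumption of the sketch, chosen because no attribute in the schema is marked as required.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:
    # Assumption of this sketch: every field may be omitted, so it defaults to None.
    id: Optional[str] = None
    full_name: Optional[str] = None
    aliases: Optional[str] = None
    phone: Optional[str] = None
    age: Optional[int] = None  # mirrors "range: integer" in the schema


@dataclass
class Container:
    # Mirrors a multivalued slot whose values are nested Person objects.
    persons: List[Person] = field(default_factory=list)


container = Container(
    persons=[
        Person(id="ORCID:1234", full_name="Clark Kent", age=32, phone="555-555-5555"),
        Person(id="ORCID:4567", full_name="Lois Lane", age=33),
    ]
)
```

The Container carries no domain information of its own here. Adding another list-valued field next to persons would let it hold a second kind of object.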
The container has a single attribute/slot called “persons”. This has 3 crucial characteristics:
it is multivalued - i.e. it holds a list
the range is Person - i.e. the expected values in the data should be people
it is inlined - i.e. the values are nested underneath the container
Later on we will explore these in more detail.
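These characteristics can be restated as checks on the parsed data. The function below is a hand-rolled illustration of what they imply for the example file. It is not how linkml-validate is implemented, and the name check_container is hypothetical.

```python
def check_container(data):
    """Return a list of problems found in a parsed container document."""
    problems = []
    persons = data.get("persons", [])
    # multivalued and inlined as a list: the slot value is a list nested in the container.
    if not isinstance(persons, list):
        return ["persons must be a list"]
    for i, person in enumerate(persons):
        # range Person: each element is expected to be a nested object.
        if not isinstance(person, dict):
            problems.append(f"persons[{i}] must be an object")
            continue
        # age has range integer (bool is rejected because it subclasses int in Python).
        age = person.get("age")
        if age is not None and (isinstance(age, bool) or not isinstance(age, int)):
            problems.append(f"persons[{i}].age must be an integer")
    return problems


good = {"persons": [{"id": "ORCID:1234", "full_name": "Clark Kent", "age": 32}]}
bad = {"persons": [{"id": "ORCID:4567", "age": "thirty-three"}]}
```

Calling check_container(good) returns an empty list, while check_container(bad) returns a single message about the age value.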
We can validate this to make sure we got it right:
linkml-validate -s personinfo.yaml data.yaml
This should report no errors.
We can use yUML to visualize the schema. The gen-yuml command can generate REST URLs that can be fed into the yuml.me web service:
gen-yuml -f yuml personinfo.yaml
https://yuml.me/diagram/nofunky;dir:TB/class/[Container]++- persons 0..*>[Person|id:string %3F;full_name:string %3F;aliases:string %3F;phone:string %3F;age:string %3F],[Container]
Opening this URL in a browser renders the schema as a class diagram.
You can also generate a PNG directly:
gen-yuml -f png personinfo.yaml > personinfo.png
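The URL printed by gen-yuml -f yuml earlier packs the whole diagram into its path, and %3F in it is the percent-encoded form of a question mark. As a small sketch using only Python's standard library, the path can be decoded to make the diagram specification easier to read. The variable names are illustrative.

```python
from urllib.parse import unquote

url = (
    "https://yuml.me/diagram/nofunky;dir:TB/class/"
    "[Container]++- persons 0..*>[Person|id:string %3F;full_name:string %3F;"
    "aliases:string %3F;phone:string %3F;age:string %3F],[Container]"
)

# Everything after "/class/" is the diagram specification.
spec = unquote(url.split("/class/", 1)[1])

# Split the Person box into its individual attribute entries.
person_box = spec.split("[Person|", 1)[1].split("]", 1)[0]
attributes = person_box.split(";")
```

After decoding, attributes holds one entry per Person attribute, each in the form name:type followed by a question mark.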
Exercises
Extend the container object to include dataset-level metadata:
description of the dataset
name of the dataset
Modify the schema to allow multiple aliases
Modify the test dataset to include multiple aliases for Clark Kent: “Superman” and “Man of Steel”
Validate the data
Next we will explore how to add constraints to the schema.