# Semantic-DSL For rapid development of semantically-backed Domain-Specific Languages (DSLs), ## Installation ```bash pip install semdsl ``` ## Usage To illustrate the usage of semdsl, we will create a simple [LinkML](https://linkml.io) schema for part of the [Clue board game](https://en.wikipedia.org/wiki/Cluedo), in particular for representing a hypothesis about who committed the misdeed, where, and with what. We will *annotate* the schema with *grammar hints*, that can be used to generate the grammar for the DSL. ```yaml ```python >>> schema = """ ... id: https://example.org/clue ... name: clue ... imports: ... - https://w3id.org/linkml/types ... classes: ... ClueHypothesis: ... attributes: ... person: # e.g. Colonel Mustard ... annotations: ... grammar.main: "WORD WORD" ... location: # e.g. Kitchen ... annotations: ... grammar.main: "WORD" ... weapon: # e.g. Candlestick ... annotations: ... grammar.main: "WORD" ... annotations: ... grammar.main: >- ... "<" person "in the" location "with the" weapon ">" ... """ ``` The idea is to be able to represent hypotheses using strings like ``. We can then use the `DSLEngine` class to load the schema and generate a grammar: ```python >>> from semdsl import DSLEngine >>> engine = DSLEngine() >>> engine.load_schema(schema) >>> print(engine.lark_serialization) from lark import Lark ... class_clue_hypothesis : "<" person "in the" location "with the" weapon ">" person : WORD WORD location : WORD weapon : WORD ... ``` The default is Lark syntax. You can then use the generated grammar to parse serializations into pydantic objects that are schema conformant: ```python >>> obj = engine.parse_as_object('') >>> print(obj.location) Kitchen >>> print(obj.json()) {"person": "Colonel Mustard", "location": "Kitchen", "weapon": "Candlestick"} ``` ## Auto-assigning production rules In the previous example we saw how we could *annotate* an existing schema with grammar rules However, we can also *generate* grammar rules from the schema itself. This is done by using the `grammar.main` annotation on a class, and then using the `grammar.auto` annotation on the attributes of that class. For example: ```python >>> schema = """ ... id: https://example.org/clue ... name: clue ... imports: ... - https://w3id.org/linkml/types ... classes: ... ClueHypothesis: ... attributes: ... person: # e.g. Colonel Mustard ... location: # e.g. Kitchen ... weapon: # e.g. Candlestick ... """ ``` now we will create a new engine and load the schema, and generate a de-novo "functional-style" grammar: ```python >>> engine = DSLEngine() ## create new DSLEngine >>> engine.load_schema(schema) >>> print(engine.lark_serialization) from lark import Lark ... class_clue_hypothesis : "ClueHypothesis(" slot_clue_hypothesis__person? slot_clue_hypothesis__location? slot_clue_hypothesis__weapon? ")" slot_clue_hypothesis__person : "person=" TYPE_STRING slot_clue_hypothesis__location : "location=" TYPE_STRING slot_clue_hypothesis__weapon : "weapon=" TYPE_STRING ... ``` You can then use the generated grammar to parse strings into objects: ```python >>> obj = engine.parse_as_object('ClueHypothesis(person="Colonel Mustard" location="Kitchen" weapon="Candlestick")') >>> print(obj.location) Kitchen ``` ## Adding additional semantics You can use the following metamodel element: - [class_uri](https://w3id.prg/linkml/class_uri) - [slot_uri](https://w3id.prg/linkml/slot_uri) to assign URIs to classes and slots in your schema, which can be used in RDF serialization. Here we extend our Clue schema, adding classes for the ranges of the slots in the main class: ```python >>> schema = """ ... id: https://example.org/clue ... name: clue ... prefixes: ... linkml: https://w3id.org/linkml/ ... clue: https://example.org/clue/ ... schema: http://schema.org/ ... prov: http://www.w3.org/ns/prov# ... dbpedia: http://dbpedia.org/ontology/ ... imports: ... - linkml:types ... classes: ... NamedThing: ... class_uri: schema:Thing ... attributes: ... id: ... identifier: true ... range: uriorcurie ... Person: ... class_uri: schema:Person ... is_a: NamedThing ... Location: ... class_uri: schema:Location ... is_a: NamedThing ... Weapon: ... class_uri: dbpedia:Weapon ... is_a: NamedThing ... ClueHypothesis: ... class_uri: prov:Action ... tree_root: true ... attributes: ... person: # e.g. Colonel Mustard ... slot_uri: prov:wasAssociatedWith ... range: Person ... annotations: ... grammar.main: TYPE_URIORCURIE ... location: # e.g. Kitchen ... slot_uri: prov:atLocation ... range: Location ... annotations: ... grammar.main: TYPE_URIORCURIE ... weapon: # e.g. Candlestick ... slot_uri: prov:used ... range: Weapon ... annotations: ... grammar.main: TYPE_URIORCURIE ... annotations: ... grammar.main: >- ... "<" person "in the" location "with the" weapon ">" ... """ ``` Now parse and export to a file. This time the input string uses CURIEs to represent the different things in the Clue hypothesis. ```python >>> engine = DSLEngine() >>> engine.load_schema(schema) >>> obj = engine.parse_as_object("< clue:ColonelMustard in the clue:Kitchen with the clue:Candlestick >") >>> import yaml >>> with open("tests/output/clue-output.yaml", "w", encoding="utf-8") as f: ... yaml.dump(obj.dict(), f) ``` From here we can use LinkML to convert to an RDF serialization: ```bash cd clue-output.yaml linkml-convert clue-output -s clue_model.yaml -t ttl ```` Results: ```turtle @prefix clue: . @prefix prov: . [] a prov:Action ; prov:atLocation clue:Kitchen ; prov:used clue:Candlestick ; prov:wasAssociatedWith clue:ColonelMustard . ``` ## Command Line Interface ``` semdsl --help ``` ## Limitations ### Restricted to Lark grammars Currently, semdsl only supports Lark grammars. The framework is designed to allow extensibility, e.g. to ANTLR, but this is currently unsupported.