SQL Validation¶
Overview¶
The SQL Validation generator creates SQL queries that identify data entries violating LinkML schema constraints. When executed against a database, the query returns a table listing all constraint violations with standardized columns:
table_name: The class/table name where the violation occurred
column_name: The slot/column name that violated the constraint
constraint_type: Type of constraint violated (required, range, pattern, enum, identifier, key, unique_key)
record_id: ID of the violating record
invalid_value: The value that violates the constraint
For example, if a Person record has an age of 200 that exceeds the maximum allowed value, running the generated validation query would return:
table_name |
column_name |
constraint_type |
record_id |
invalid_value |
|---|---|---|---|---|
Person |
age |
range |
P001 |
200 |
Example Output¶
A generated sql query for the personinfo.yml schema with sqlite syntax.
Overview¶
Docs¶
Command Line¶
gen-sqlvalidation¶
Generate SQL validation queries from LinkML schema.
gen-sqlvalidation [OPTIONS] YAMLFILE
Options
- --dialect <dialect>¶
SQL dialect (sqlite, postgresql)
- Default:
'sqlite'
- --check-required, --no-check-required¶
Generate queries for required field violations
- Default:
True
- --check-ranges, --no-check-ranges¶
Generate queries for min/max value violations
- Default:
True
- --check-patterns, --no-check-patterns¶
Generate queries for pattern violations
- Default:
True
- --check-enums, --no-check-enums¶
Generate queries for enum violations
- Default:
True
- --check-unique-keys, --no-check-unique-keys¶
Generate queries for unique key violations
- Default:
True
- --include-comments, --no-include-comments¶
Include description of tests in generated query
- Default:
True
- -V, --version¶
Show the version and exit.
- -f, --format <format>¶
Output format
- Default:
'sql'- Options:
sql
- --metadata, --no-metadata¶
Include metadata in output
- Default:
True
- --useuris, --metauris¶
Use class and slot URIs over model uris
- Default:
True
- -im, --importmap <importmap>¶
Import mapping file
- --log_level <log_level>¶
Logging level
- Default:
'WARNING'- Options:
CRITICAL | ERROR | WARNING | INFO | DEBUG
- -v, --verbose¶
Verbosity. Takes precedence over –log_level.
- --mergeimports, --no-mergeimports¶
Merge imports into source file (default=mergeimports)
- --stacktrace, --no-stacktrace¶
Print a stack trace when an error occurs
- Default:
False
Arguments
- YAMLFILE¶
Required argument
Code¶
- class linkml.generators.sqlvalidationgen.SQLValidationGenerator(schema: str | ~typing.TextIO | ~linkml_runtime.linkml_model.meta.SchemaDefinition | ~linkml.utils.generator.Generator | ~pathlib.Path, schemaview: ~linkml_runtime.utils.schemaview.SchemaView | None = None, format: str | None = None, metadata: bool = True, useuris: bool | None = None, log_level: int | None = 30, mergeimports: bool | None = True, source_file_date: str | None = None, source_file_size: int | None = None, logger: ~logging.Logger | None = None, verbose: bool | None = None, output: str | None = None, namespaces: ~linkml_runtime.utils.namespaces.Namespaces | None = None, directory_output: bool = False, base_dir: str = None, metamodel_name_map: dict[str, str] = None, importmap: str | ~collections.abc.Mapping[str, str] | None = None, emit_prefixes: set[str] = <factory>, metamodel: ~linkml.utils.schemaloader.SchemaLoader = None, stacktrace: bool = False, include: str | ~pathlib.Path | ~linkml_runtime.linkml_model.meta.SchemaDefinition | None = None, dialect: str = 'sqlite', include_comments: bool = True, check_required: bool = True, check_ranges: bool = True, check_patterns: bool = True, check_enums: bool = True, check_unique_keys: bool = True)[source]¶
A
Generatorfor creating SQL validation queries.This generator creates a unified SELECT statement (combining multiple validation checks with UNION ALL) that identifies data entries violating LinkML schema constraints. The query can be executed against a database to find constraint violations.
Supported constraint types: - required: Find NULL values in required fields - minimum_value/maximum_value: Find numeric range violations - pattern: Find regex pattern violations - identifier/key: Find uniqueness violations - unique_keys: Find multi-column uniqueness violations - enum: Find values not in the permissible values list
Example
>>> gen = SQLValidationGenerator("schema.yaml", dialect="postgresql") >>> queries = gen.generate_validation_queries() >>> print(queries)
The generated query returns standardized columns: - table_name: The class/table name - column_name: The slot/column name (or constraint name for unique_keys) - constraint_type: Type of constraint violated - record_id: ID of the violating record - invalid_value: The value that violates the constraint