SQL Validation

Overview

The SQL Validation generator creates SQL queries that identify data entries violating LinkML schema constraints. When executed against a database, the query returns a table listing all constraint violations with standardized columns:

  • table_name: The class/table name where the violation occurred

  • column_name: The slot/column name that violated the constraint

  • constraint_type: Type of constraint violated (required, range, pattern, enum, identifier, key, unique_key)

  • record_id: ID of the violating record

  • invalid_value: The value that violates the constraint

For example, if a Person record has an age of 200 that exceeds the maximum allowed value, running the generated validation query would return:

table_name

column_name

constraint_type

record_id

invalid_value

Person

age

range

P001

200

Example Output

A generated sql query for the personinfo.yml schema with sqlite syntax.

personinfo.sql

Overview

Docs

Command Line

gen-sqlvalidation

Generate SQL validation queries from LinkML schema.

gen-sqlvalidation [OPTIONS] YAMLFILE

Options

--dialect <dialect>

SQL dialect (sqlite, postgresql)

Default:

'sqlite'

--check-required, --no-check-required

Generate queries for required field violations

Default:

True

--check-ranges, --no-check-ranges

Generate queries for min/max value violations

Default:

True

--check-patterns, --no-check-patterns

Generate queries for pattern violations

Default:

True

--check-enums, --no-check-enums

Generate queries for enum violations

Default:

True

--check-unique-keys, --no-check-unique-keys

Generate queries for unique key violations

Default:

True

--include-comments, --no-include-comments

Include description of tests in generated query

Default:

True

-V, --version

Show the version and exit.

-f, --format <format>

Output format

Default:

'sql'

Options:

sql

--metadata, --no-metadata

Include metadata in output

Default:

True

--useuris, --metauris

Use class and slot URIs over model uris

Default:

True

-im, --importmap <importmap>

Import mapping file

--log_level <log_level>

Logging level

Default:

'WARNING'

Options:

CRITICAL | ERROR | WARNING | INFO | DEBUG

-v, --verbose

Verbosity. Takes precedence over –log_level.

--mergeimports, --no-mergeimports

Merge imports into source file (default=mergeimports)

--stacktrace, --no-stacktrace

Print a stack trace when an error occurs

Default:

False

Arguments

YAMLFILE

Required argument

Code

class linkml.generators.sqlvalidationgen.SQLValidationGenerator(schema: str | ~typing.TextIO | ~linkml_runtime.linkml_model.meta.SchemaDefinition | ~linkml.utils.generator.Generator | ~pathlib.Path, schemaview: ~linkml_runtime.utils.schemaview.SchemaView | None = None, format: str | None = None, metadata: bool = True, useuris: bool | None = None, log_level: int | None = 30, mergeimports: bool | None = True, source_file_date: str | None = None, source_file_size: int | None = None, logger: ~logging.Logger | None = None, verbose: bool | None = None, output: str | None = None, namespaces: ~linkml_runtime.utils.namespaces.Namespaces | None = None, directory_output: bool = False, base_dir: str = None, metamodel_name_map: dict[str, str] = None, importmap: str | ~collections.abc.Mapping[str, str] | None = None, emit_prefixes: set[str] = <factory>, metamodel: ~linkml.utils.schemaloader.SchemaLoader = None, stacktrace: bool = False, include: str | ~pathlib.Path | ~linkml_runtime.linkml_model.meta.SchemaDefinition | None = None, dialect: str = 'sqlite', include_comments: bool = True, check_required: bool = True, check_ranges: bool = True, check_patterns: bool = True, check_enums: bool = True, check_unique_keys: bool = True)[source]

A Generator for creating SQL validation queries.

This generator creates a unified SELECT statement (combining multiple validation checks with UNION ALL) that identifies data entries violating LinkML schema constraints. The query can be executed against a database to find constraint violations.

Supported constraint types: - required: Find NULL values in required fields - minimum_value/maximum_value: Find numeric range violations - pattern: Find regex pattern violations - identifier/key: Find uniqueness violations - unique_keys: Find multi-column uniqueness violations - enum: Find values not in the permissible values list

Example

>>> gen = SQLValidationGenerator("schema.yaml", dialect="postgresql")
>>> queries = gen.generate_validation_queries()
>>> print(queries)

The generated query returns standardized columns: - table_name: The class/table name - column_name: The slot/column name (or constraint name for unique_keys) - constraint_type: Type of constraint violated - record_id: ID of the violating record - invalid_value: The value that violates the constraint

serialize(**kwargs: dict[str, Any]) str[source]

Main entry point for generating validation queries.

Parameters:

kwargs – Additional arguments passed to generate_validation_queries

Returns:

SQL validation queries as a string

Return type:

str