dm_bip.map_data package

Submodules

dm_bip.map_data.map_data module

Module for transforming data using LinkML-Map schemas and specifications.

class dm_bip.map_data.map_data.DataLoader(base_path)

Bases: object

Load TSV files based on populated_from identifiers.

dm_bip.map_data.map_data.get_schema(schema_path)

Load and return a LinkML schema from the given path.

Return type:

SchemaDefinition

dm_bip.map_data.map_data.get_spec_files(directory, search_string)

Find YAML files in the directory that contain the search_string.

Returns a sorted list of matching file paths.

Return type:

list[Path]
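The lookup described above can be sketched as follows. This is an illustration, not the module's implementation: the name `find_spec_files_sketch` is hypothetical, and it assumes that both `.yaml` and `.yml` files count, that the search is not recursive, and that the match is made against file contents rather than file names.

```python
from pathlib import Path


def find_spec_files_sketch(directory: Path, search_string: str) -> list[Path]:
    """Return a sorted list of YAML files whose text contains search_string."""
    directory = Path(directory)
    matches = []
    for path in directory.iterdir():
        # Assumption: only top-level files with a YAML extension are considered.
        if not path.is_file() or path.suffix not in {".yaml", ".yml"}:
            continue
        # Assumption: the search string is matched against the file contents.
        if search_string in path.read_text(encoding="utf-8"):
            matches.append(path)
    # Sorting makes the order in which specifications are applied deterministic.
    return sorted(matches)
```

Returning a sorted list means repeated runs over the same directory process specifications in the same order.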

dm_bip.map_data.map_data.main(source_schema, target_schema, data_dir, var_dir, output_dir, output_prefix, output_postfix, output_type='jsonl', chunk_size=1000)

Run LinkML-Map transformation from command line arguments.

dm_bip.map_data.map_data.multi_spec_transform(data_loader, spec_files, source_schemaview, target_schemaview)

Apply multiple LinkML-Map specifications to data and yield transformed objects.

Return type:

Generator[dict[str, Any], None, None]

dm_bip.map_data.map_data.process_entities(*, entities, data_loader, var_dir, source_schemaview, target_schemaview, output_dir, output_prefix, output_postfix, output_type, chunk_size=1000)

Process each entity and write to output files.

Return type:

None
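The `chunk_size` parameter suggests that transformed objects are grouped into fixed-size batches before being written. A minimal sketch of that batching, assuming a plain iterator as input, is shown below. The helper name `chunked_sketch` is hypothetical and is not part of this package.

```python
from collections.abc import Iterable, Iterator
from itertools import islice
from typing import TypeVar

T = TypeVar("T")


def chunked_sketch(items: Iterable[T], chunk_size: int = 1000) -> Iterator[list[T]]:
    """Yield successive lists of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(items)
    # islice pulls at most chunk_size items per pass, so the full input
    # never has to be held in memory at once.
    while chunk := list(islice(iterator, chunk_size)):
        yield chunk
```

The final chunk may be shorter than `chunk_size` when the input length is not an exact multiple.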

dm_bip.map_data.streams module

Data structures for creating streamed output from chunked input data.

class dm_bip.map_data.streams.JSONLStream

Bases: Stream

Convert chunks of objects to JSONL format.

process(chunks)

Serialize the output.

Return type:

Generator[str, None, None]

class dm_bip.map_data.streams.JSONStream(key_name)

Bases: Stream

Convert chunks of objects to JSON format.

process(chunks)

Serialize the output.

Return type:

Generator[str, None, None]

class dm_bip.map_data.streams.Stream

Bases: ABC

Serialize chunks into an output format.

abstractmethod process(chunks)

Process a series of chunks.

Return type:

Generator[str, None, None]
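The relationship between the abstract base class and a concrete serializer can be sketched as below. The class names are hypothetical stand-ins, and the sketch assumes that each chunk is a list of dictionaries and that one output string is produced per chunk. The actual chunk type is not stated in this reference.

```python
import json
from abc import ABC, abstractmethod
from collections.abc import Generator, Iterable
from typing import Any


class StreamSketch(ABC):
    """Stand-in for an abstract stream: turns chunks into serialized text."""

    @abstractmethod
    def process(self, chunks: Iterable[list[dict[str, Any]]]) -> Generator[str, None, None]:
        """Yield serialized text for a series of chunks."""


class JSONLStreamSketch(StreamSketch):
    """Stand-in for a JSONL serializer: one JSON document per line."""

    def process(self, chunks: Iterable[list[dict[str, Any]]]) -> Generator[str, None, None]:
        for chunk in chunks:
            # Assumption: one string is emitted per chunk, with each object
            # on its own newline-terminated line.
            yield "".join(json.dumps(obj) + "\n" for obj in chunk)
```

Because `process` is a generator, a caller can write each yielded string to a file as it arrives instead of building the whole output in memory.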

class dm_bip.map_data.streams.TSVStream(sep, reducer_str)

Bases: Stream

Convert chunks of objects to TSV format.

process(chunks)

Serialize the output.

Return type:

Generator[str, None, None]

static rewrite_header_and_pad(chunks, new_headers, sep='\t')

Rewrite the header of TSV chunks and pad rows with missing columns.

Return type:

Generator[str, None, None]
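One way to picture the header rewrite and padding is the sketch below. It is not the package's implementation. It assumes that each chunk is a string of complete, newline-terminated rows, that the first row of the first chunk is the old header, and that `new_headers` extends the old header by appending columns. The function name is hypothetical.

```python
from collections.abc import Generator, Iterable


def rewrite_header_and_pad_sketch(
    chunks: Iterable[str], new_headers: list[str], sep: str = "\t"
) -> Generator[str, None, None]:
    """Replace the header row and pad short rows with empty cells."""
    width = len(new_headers)
    header_pending = True
    for chunk in chunks:
        rows = []
        for line in chunk.splitlines():
            if header_pending:
                # Assumption: the very first line seen is the old header.
                rows.append(sep.join(new_headers))
                header_pending = False
                continue
            cells = line.split(sep)
            # Rows written before later columns appeared get empty cells appended.
            cells += [""] * (width - len(cells))
            rows.append(sep.join(cells))
        yield "".join(row + "\n" for row in rows)
```

Padding is needed when the full set of columns is only known after all chunks have been produced, so earlier rows may be narrower than the final header.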

class dm_bip.map_data.streams.YAMLStream(key_name)

Bases: Stream

Convert chunks of objects to YAML format.

process(chunks)

Serialize the output.

Return type:

Generator[str, None, None]

dm_bip.map_data.streams.make_stream(fmt, **kwargs)

Create a data stream.

Note: kwargs are validated at runtime rather than enforced at the call site. This is deliberate: it keeps construction at the call site ergonomic. In the future, this function may be removed in favor of enforcing type safety at the call site.

Return type:

Stream
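The trade-off in the note above can be illustrated with a small factory that checks keyword arguments only when it is called. Everything here is a sketch under assumptions: the classes are hypothetical stand-ins, the format strings `"jsonl"` and `"tsv"` are assumed, and the real function may validate its arguments differently.

```python
import inspect


class JSONLSketch:
    """Stand-in for a stream that takes no constructor arguments."""

    def __init__(self):
        pass


class TSVSketch:
    """Stand-in for a stream that takes a separator and a reducer string."""

    def __init__(self, sep, reducer_str):
        self.sep = sep
        self.reducer_str = reducer_str


# Assumption: formats are looked up by a short lowercase name.
_REGISTRY = {"jsonl": JSONLSketch, "tsv": TSVSketch}


def make_stream_sketch(fmt, **kwargs):
    """Build a stream for fmt, validating kwargs when called."""
    try:
        cls = _REGISTRY[fmt]
    except KeyError:
        raise ValueError(f"unknown format: {fmt!r}") from None
    try:
        # bind() raises TypeError for missing or unexpected arguments,
        # which a static type checker cannot see through **kwargs.
        inspect.signature(cls).bind(**kwargs)
    except TypeError as exc:
        raise TypeError(f"bad arguments for {fmt!r}: {exc}") from None
    return cls(**kwargs)
```

A call such as `make_stream_sketch("tsv", sep="\t")` fails only when it runs, which is the cost of the more ergonomic call site that the note describes.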

Module contents

Map data module for LinkML-Map transformations.