Enum: DatasetEncodingFormat

Encoding formats (MIME types) commonly used for ML dataset files in Croissant.

These specify how data is serialized and stored.

URI: valuesets:DatasetEncodingFormat

Permissible Values

| Value      | Meaning          | Description                                            |
|------------|------------------|--------------------------------------------------------|
| CSV        | EDAM:format_3752 | Comma-separated values format for tabular data         |
| JSON       | EDAM:format_3464 | JavaScript Object Notation format for structured data  |
| JSONL      | None             | JSON Lines format (newline-delimited JSON)             |
| PARQUET    | None             | Apache Parquet columnar storage format                 |
| PLAIN_TEXT | None             | Plain text files                                       |
| JPEG       | EDAM:format_3579 | JPEG image format                                      |
| PNG        | EDAM:format_3603 | Portable Network Graphics image format                 |
| WAV        | None             | Waveform Audio File Format                             |
| MP4        | EDAM:format_3997 | MPEG-4 multimedia container format                     |
| ZIP        | EDAM:format_3987 | ZIP archive format                                     |
| TAR        | EDAM:format_3981 | Tape Archive format                                    |
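
The Meaning column holds CURIEs, not full URIs, and several values have no mapping at all. The following is a minimal Python sketch of how a consumer might expand those CURIEs. The names `MEANINGS`, `EDAM_PREFIX`, and `expand_meaning` are hypothetical, and the prefix expansion `http://edamontology.org/` is an assumption that this page does not state.

```python
from typing import Optional

# Assumed expansion of the EDAM prefix; this page only shows the CURIE form.
EDAM_PREFIX = "http://edamontology.org/"

# Permissible value -> meaning, copied from the table above.
# None marks values for which the table lists no meaning.
MEANINGS = {
    "CSV": "EDAM:format_3752",
    "JSON": "EDAM:format_3464",
    "JSONL": None,
    "PARQUET": None,
    "PLAIN_TEXT": None,
    "JPEG": "EDAM:format_3579",
    "PNG": "EDAM:format_3603",
    "WAV": None,
    "MP4": "EDAM:format_3997",
    "ZIP": "EDAM:format_3987",
    "TAR": "EDAM:format_3981",
}


def expand_meaning(value: str) -> Optional[str]:
    """Return the full URI for a permissible value, or None if it is unmapped."""
    curie = MEANINGS[value]  # raises KeyError for values outside the enum
    if curie is None:
        return None
    prefix, local_id = curie.split(":", 1)
    if prefix != "EDAM":
        raise ValueError(f"unexpected prefix in {curie!r}")
    return EDAM_PREFIX + local_id
```

Returning `None` for unmapped values, rather than raising, keeps the four values without an EDAM term (JSONL, PARQUET, PLAIN_TEXT, WAV) usable by callers that treat the ontology mapping as optional.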

Identifier and Mapping Information

Schema Source

  • from schema: https://w3id.org/linkml/valuesets

LinkML Source

name: DatasetEncodingFormat
description: 'Encoding formats (MIME types) commonly used for ML dataset files in
  Croissant.

  These specify how data is serialized and stored.

  '
from_schema: https://w3id.org/linkml/valuesets
rank: 1000
permissible_values:
  CSV:
    text: CSV
    description: Comma-separated values format for tabular data
    meaning: EDAM:format_3752
    annotations:
      mime_type:
        tag: mime_type
        value: text/csv
  JSON:
    text: JSON
    description: JavaScript Object Notation format for structured data
    meaning: EDAM:format_3464
    annotations:
      mime_type:
        tag: mime_type
        value: application/json
  JSONL:
    text: JSONL
    description: JSON Lines format (newline-delimited JSON)
    annotations:
      mime_type:
        tag: mime_type
        value: application/jsonl
      alt_mime_type:
        tag: alt_mime_type
        value: application/x-ndjson
  PARQUET:
    text: PARQUET
    description: Apache Parquet columnar storage format
    annotations:
      mime_type:
        tag: mime_type
        value: application/parquet
  PLAIN_TEXT:
    text: PLAIN_TEXT
    description: Plain text files
    annotations:
      mime_type:
        tag: mime_type
        value: text/plain
  JPEG:
    text: JPEG
    description: JPEG image format
    meaning: EDAM:format_3579
    annotations:
      mime_type:
        tag: mime_type
        value: image/jpeg
    title: JPG
  PNG:
    text: PNG
    description: Portable Network Graphics image format
    meaning: EDAM:format_3603
    annotations:
      mime_type:
        tag: mime_type
        value: image/png
    title: PNG
  WAV:
    text: WAV
    description: Waveform Audio File Format
    annotations:
      mime_type:
        tag: mime_type
        value: audio/wav
  MP4:
    text: MP4
    description: MPEG-4 multimedia container format
    meaning: EDAM:format_3997
    annotations:
      mime_type:
        tag: mime_type
        value: video/mp4
    title: MPEG-4
  ZIP:
    text: ZIP
    description: ZIP archive format
    meaning: EDAM:format_3987
    annotations:
      mime_type:
        tag: mime_type
        value: application/zip
    title: ZIP format
  TAR:
    text: TAR
    description: Tape Archive format
    meaning: EDAM:format_3981
    annotations:
      mime_type:
        tag: mime_type
        value: application/x-tar
    title: TAR format