Enum: DatasetEncodingFormat
Encoding formats (MIME types) commonly used for ML dataset files in Croissant.
These specify how data is serialized and stored.
URI: valuesets:DatasetEncodingFormat
Permissible Values
| Value | Title | Meaning | Description | Alt Mime Type | Mime Type |
|---|---|---|---|---|---|
| CSV | | EDAM:format_3752 | Comma-separated values format for tabular data | | text/csv |
| JSON | | EDAM:format_3464 | JavaScript Object Notation format for structured data | | application/json |
| JSONL | | | JSON Lines format (newline-delimited JSON) | application/x-ndjson | application/jsonl |
| PARQUET | | | Apache Parquet columnar storage format | | application/parquet |
| PLAIN_TEXT | | | Plain text files | | text/plain |
| JPEG | JPG | EDAM:format_3579 | JPEG image format | | image/jpeg |
| PNG | PNG | EDAM:format_3603 | Portable Network Graphics image format | | image/png |
| WAV | | | Waveform Audio File Format | | audio/wav |
| MP4 | MPEG-4 | EDAM:format_3997 | MPEG-4 multimedia container format | | video/mp4 |
| ZIP | ZIP format | EDAM:format_3987 | ZIP archive format | | application/zip |
| TAR | TAR format | EDAM:format_3981 | Tape Archive format | | application/x-tar |
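As a rough illustration of how these values might be used, the sketch below maps a MIME type back to its permissible value, treating the alternative MIME type as an equivalent match. The names `MIME_TYPES`, `ALT_MIME_TYPES`, and `format_for_mime_type` are hypothetical helpers written for this example; they are not part of LinkML, Croissant, or the valuesets package.

```python
from typing import Optional

# Primary MIME type for each permissible value, copied from the table above.
MIME_TYPES = {
    "CSV": "text/csv",
    "JSON": "application/json",
    "JSONL": "application/jsonl",
    "PARQUET": "application/parquet",
    "PLAIN_TEXT": "text/plain",
    "JPEG": "image/jpeg",
    "PNG": "image/png",
    "WAV": "audio/wav",
    "MP4": "video/mp4",
    "ZIP": "application/zip",
    "TAR": "application/x-tar",
}

# Alternative MIME types; JSONL is the only value that declares one.
ALT_MIME_TYPES = {"JSONL": "application/x-ndjson"}


def format_for_mime_type(mime_type: str) -> Optional[str]:
    """Return the permissible value matching a primary or alternative MIME type.

    Hypothetical helper: matching is case-insensitive and ignores surrounding
    whitespace. Returns None when no permissible value matches.
    """
    wanted = mime_type.strip().lower()
    for name, primary in MIME_TYPES.items():
        if wanted == primary or wanted == ALT_MIME_TYPES.get(name):
            return name
    return None
```

For example, `format_for_mime_type("application/x-ndjson")` and `format_for_mime_type("application/jsonl")` both resolve to `JSONL`, while an unlisted type such as `image/gif` yields `None`.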
Identifier and Mapping Information
Schema Source
- from schema: https://w3id.org/linkml/valuesets
LinkML Source
name: DatasetEncodingFormat
instantiates:
- valuesets_meta:ValueSetEnumDefinition
description: 'Encoding formats (MIME types) commonly used for ML dataset files in
  Croissant.
  These specify how data is serialized and stored.
  '
title: Dataset Encoding Format
from_schema: https://w3id.org/linkml/valuesets
contributors:
- orcid:0000-0002-6601-2165
- https://github.com/anthropics/claude-code
status: DRAFT
rank: 1000
permissible_values:
  CSV:
    text: CSV
    description: Comma-separated values format for tabular data
    meaning: EDAM:format_3752
    annotations:
      mime_type:
        tag: mime_type
        value: text/csv
  JSON:
    text: JSON
    description: JavaScript Object Notation format for structured data
    meaning: EDAM:format_3464
    annotations:
      mime_type:
        tag: mime_type
        value: application/json
  JSONL:
    text: JSONL
    description: JSON Lines format (newline-delimited JSON)
    annotations:
      mime_type:
        tag: mime_type
        value: application/jsonl
      alt_mime_type:
        tag: alt_mime_type
        value: application/x-ndjson
  PARQUET:
    text: PARQUET
    description: Apache Parquet columnar storage format
    annotations:
      mime_type:
        tag: mime_type
        value: application/parquet
  PLAIN_TEXT:
    text: PLAIN_TEXT
    description: Plain text files
    annotations:
      mime_type:
        tag: mime_type
        value: text/plain
  JPEG:
    text: JPEG
    description: JPEG image format
    meaning: EDAM:format_3579
    annotations:
      mime_type:
        tag: mime_type
        value: image/jpeg
    title: JPG
  PNG:
    text: PNG
    description: Portable Network Graphics image format
    meaning: EDAM:format_3603
    annotations:
      mime_type:
        tag: mime_type
        value: image/png
    title: PNG
  WAV:
    text: WAV
    description: Waveform Audio File Format
    annotations:
      mime_type:
        tag: mime_type
        value: audio/wav
  MP4:
    text: MP4
    description: MPEG-4 multimedia container format
    meaning: EDAM:format_3997
    annotations:
      mime_type:
        tag: mime_type
        value: video/mp4
    title: MPEG-4
  ZIP:
    text: ZIP
    description: ZIP archive format
    meaning: EDAM:format_3987
    annotations:
      mime_type:
        tag: mime_type
        value: application/zip
    title: ZIP format
  TAR:
    text: TAR
    description: Tape Archive format
    meaning: EDAM:format_3981
    annotations:
      mime_type:
        tag: mime_type
        value: application/x-tar
    title: TAR format
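
The source above nests each MIME type under `annotations`, as a `tag`/`value` pair, while `title` and `meaning` sit directly on the permissible value. The sketch below shows one way to read that shape, assuming the YAML has already been loaded into plain Python dicts by some YAML parser (loading is not shown). Only two entries are reproduced, and `annotation_value` and `summarize` are hypothetical helper names, not LinkML APIs.

```python
from typing import Any, Dict, Optional

# Two permissible values as they would appear after loading the YAML source
# into plain dicts (assumption: keys and nesting mirror the source exactly).
permissible_values: Dict[str, Dict[str, Any]] = {
    "JSONL": {
        "text": "JSONL",
        "description": "JSON Lines format (newline-delimited JSON)",
        "annotations": {
            "mime_type": {"tag": "mime_type", "value": "application/jsonl"},
            "alt_mime_type": {"tag": "alt_mime_type", "value": "application/x-ndjson"},
        },
    },
    "JPEG": {
        "text": "JPEG",
        "description": "JPEG image format",
        "meaning": "EDAM:format_3579",
        "annotations": {
            "mime_type": {"tag": "mime_type", "value": "image/jpeg"},
        },
        "title": "JPG",
    },
}


def annotation_value(entry: Dict[str, Any], tag: str) -> Optional[str]:
    """Return the value of the annotation named `tag`, or None if it is absent."""
    annotation = entry.get("annotations", {}).get(tag)
    return annotation["value"] if annotation else None


def summarize(values: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Optional[str]]]:
    """Flatten each permissible value into the columns used by the table."""
    return {
        name: {
            "title": entry.get("title"),
            "meaning": entry.get("meaning"),
            "mime_type": annotation_value(entry, "mime_type"),
            "alt_mime_type": annotation_value(entry, "alt_mime_type"),
        }
        for name, entry in values.items()
    }
```

Optional keys (`title`, `meaning`, `alt_mime_type`) come back as `None` when a permissible value does not define them, which corresponds to the empty cells in the table.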