Enum: DatasetSplitType

Standard dataset split types used in machine learning for training, validation, and testing. These splits are fundamental to ML model development and evaluation workflows.

URI: valuesets:DatasetSplitType

Permissible Values

| Value | Meaning | Description | Aliases | Purpose | Typical Size |
| --- | --- | --- | --- | --- | --- |
| TRAIN | | Training split used for model learning | | model training | 60-80% of data |
| VALIDATION | | Validation split used for hyperparameter tuning and model selection | val, dev | model tuning | 10-20% of data |
| TEST | | Test split used for final model evaluation | | model evaluation | 10-20% of data |
| ALL | | Complete dataset without splits | | | |
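
As an illustration of how these permissible values and the VALIDATION aliases might be consumed in application code, here is a minimal Python sketch. The class mirrors the four values listed above; the `parse_split` helper and the `_ALIASES` table are hypothetical names introduced for this example and are not part of the valuesets schema or of any generated LinkML code.

```python
from enum import Enum


class DatasetSplitType(Enum):
    """Hand-written mirror of the permissible values (illustrative only)."""

    TRAIN = "TRAIN"
    VALIDATION = "VALIDATION"
    TEST = "TEST"
    ALL = "ALL"


# Aliases documented for VALIDATION ("val", "dev"); hypothetical lookup table.
_ALIASES = {
    "val": DatasetSplitType.VALIDATION,
    "dev": DatasetSplitType.VALIDATION,
}


def parse_split(label: str) -> DatasetSplitType:
    """Map a free-form label to a DatasetSplitType (hypothetical helper)."""
    key = label.strip()
    # Check the documented aliases first, case-insensitively.
    alias = _ALIASES.get(key.lower())
    if alias is not None:
        return alias
    # Otherwise fall back to the canonical member names.
    try:
        return DatasetSplitType[key.upper()]
    except KeyError:
        raise ValueError(f"unknown dataset split: {label!r}") from None
```

With this sketch, `parse_split("dev")` and `parse_split("validation")` both resolve to `DatasetSplitType.VALIDATION`, while an unlisted label such as `"holdout"` raises `ValueError`.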

Identifier and Mapping Information

Schema Source

  • from schema: https://w3id.org/linkml/valuesets

LinkML Source

name: DatasetSplitType
instantiates:
- valuesets_meta:ValueSetEnumDefinition
description: 'Standard dataset split types used in machine learning for training,

  validation, and testing. These splits are fundamental to ML model

  development and evaluation workflows.

  '
title: Dataset Split Type
from_schema: https://w3id.org/linkml/valuesets
contributors:
- orcid:0000-0002-6601-2165
- https://github.com/anthropics/claude-code
status: DRAFT
rank: 1000
permissible_values:
  TRAIN:
    text: TRAIN
    description: Training split used for model learning
    annotations:
      typical_size:
        tag: typical_size
        value: 60-80% of data
      purpose:
        tag: purpose
        value: model training
  VALIDATION:
    text: VALIDATION
    description: Validation split used for hyperparameter tuning and model selection
    annotations:
      typical_size:
        tag: typical_size
        value: 10-20% of data
      purpose:
        tag: purpose
        value: model tuning
      aliases:
        tag: aliases
        value: val, dev
  TEST:
    text: TEST
    description: Test split used for final model evaluation
    annotations:
      typical_size:
        tag: typical_size
        value: 10-20% of data
      purpose:
        tag: purpose
        value: model evaluation
  ALL:
    text: ALL
    description: Complete dataset without splits
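
The `typical_size` annotations above give ranges rather than fixed ratios. The following Python sketch shows one way a dataset could be partitioned so that each split falls inside those ranges. The 70/15/15 fractions, the `split_indices` function, and the fixed seed are assumptions chosen for illustration; the schema itself does not prescribe them.

```python
import random

# Assumed fractions, picked from inside the documented typical ranges
# (TRAIN 60-80%, VALIDATION 10-20%, TEST 10-20%).
SPLIT_FRACTIONS = {"TRAIN": 0.70, "VALIDATION": 0.15, "TEST": 0.15}


def split_indices(n_items, fractions=SPLIT_FRACTIONS, seed=0):
    """Return a dict mapping split names to lists of item indices.

    Hypothetical helper: shuffles indices deterministically, then slices
    them into TRAIN, VALIDATION, and TEST. ALL holds every index unsplit.
    """
    indices = list(range(n_items))
    shuffled = indices[:]
    random.Random(seed).shuffle(shuffled)

    n_train = round(n_items * fractions["TRAIN"])
    n_val = round(n_items * fractions["VALIDATION"])

    return {
        "TRAIN": shuffled[:n_train],
        "VALIDATION": shuffled[n_train:n_train + n_val],
        # TEST takes whatever remains, so no item is dropped by rounding.
        "TEST": shuffled[n_train + n_val:],
        "ALL": indices,
    }
```

Assigning the remainder to TEST keeps the three splits disjoint and exhaustive even when the fractions do not divide the dataset size evenly.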