Prefer Binary Enums Over Booleans
When modeling data that has two possible states, it may be tempting to use a boolean type (true/false). However, in many cases, a two-element enumeration (binary enum) is the better choice. This document explains why and when to prefer binary enums over booleans in your LinkML schemas.
The Case for Binary Enums
The Tidy Design Principles from the tidyverse project articulate several compelling reasons to prefer enums even when there are only two choices.
1. Extensibility
If you later discover a third (or fourth, or fifth) state, you'll need to change the interface. With an enum, adding a new permissible value is straightforward and existing data stays valid. With a boolean, you face a breaking change, because the slot's range itself has to change.
Example: Consider a data submission status. You might initially think "submitted" or "not submitted" covers it:
# Boolean approach - seems simple at first
slots:
  is_submitted:
    range: boolean
But what about "pending review", "rejected", or "withdrawn"? With a boolean, you're stuck. With an enum, you simply add new values:
# Enum approach - extensible
enums:
  SubmissionStatus:
    permissible_values:
      SUBMITTED:
      NOT_SUBMITTED:
      PENDING_REVIEW: # Easy to add later
      REJECTED: # Easy to add later
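To make the breaking-change argument concrete, here is a minimal Python sketch using the standard library's enum module. The SubmissionStatus class mirrors the enum above; the parse_status helper and the sample records are hypothetical. Because stored values are strings naming enum members, records written before PENDING_REVIEW and REJECTED existed still parse unchanged, whereas a boolean is_submitted field has no value that could hold "pending review" at all.

```python
from enum import Enum


class SubmissionStatus(Enum):
    """Mirrors the SubmissionStatus enum above (illustrative only)."""
    SUBMITTED = "SUBMITTED"
    NOT_SUBMITTED = "NOT_SUBMITTED"
    # Members added later; records using the original two values are unaffected.
    PENDING_REVIEW = "PENDING_REVIEW"
    REJECTED = "REJECTED"


def parse_status(raw: str) -> SubmissionStatus:
    """Hypothetical helper: look up a stored string by member value.

    Raises ValueError if the string is not a permissible value.
    """
    return SubmissionStatus(raw)


# Records written when only SUBMITTED / NOT_SUBMITTED existed still parse.
old_records = ["SUBMITTED", "NOT_SUBMITTED"]
# Records using the values added later.
new_records = ["PENDING_REVIEW", "REJECTED"]

for raw in old_records + new_records:
    print(raw, "->", parse_status(raw))
```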
2. Clarity of Intent
Boolean values are often clear in only one direction. something = TRUE tells you what will happen, but something = FALSE tells you only what won't happen, not what will happen instead.
Example cited in the tidyverse guide: base R's sort() function uses decreasing = TRUE/FALSE. Reading decreasing = FALSE is ambiguous:
- Does it mean "sort in increasing order"?
- Or does it mean "don't sort at all"?
Compare this with vctrs::vec_sort(), which uses direction = "asc" or direction = "desc". Both options are explicit and self-documenting.
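The same contrast can be sketched in Python. The SortDirection enum and sort_values function below are hypothetical illustrations, not part of R, vctrs, or LinkML: with a direction parameter, every call site names the order it wants, while a flag like decreasing=False only says what will not happen.

```python
from enum import Enum
from typing import Iterable, List


class SortDirection(Enum):
    ASCENDING = "asc"
    DESCENDING = "desc"


def sort_values(values: Iterable[float], direction: SortDirection) -> List[float]:
    """Return a new list sorted in the explicitly requested direction."""
    if not isinstance(direction, SortDirection):
        raise TypeError(f"direction must be a SortDirection, got {direction!r}")
    # Python's sorted() itself takes a boolean `reverse` flag; here that flag
    # is an internal detail instead of part of the public interface.
    return sorted(values, reverse=(direction is SortDirection.DESCENDING))


print(sort_values([3, 1, 2], SortDirection.ASCENDING))   # [1, 2, 3]
print(sort_values([3, 1, 2], SortDirection.DESCENDING))  # [3, 2, 1]
```

The enum is translated into Python's own boolean reverse argument only at the last moment, so callers never have to reason about what reverse=False means.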
3. Avoiding Cryptic Negations
Boolean parameters often require mental gymnastics to interpret, especially with negated names.
Example cited in the tidyverse guide: base R's cut() function has a right parameter:
- right = TRUE: right-closed, left-open intervals (a, b]
- right = FALSE: right-open, left-closed intervals [a, b)
A clearer design would be open_side = c("right", "left") or bounds = c("[)", "(]").
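A minimal Python sketch of the bounds idea, assuming a hypothetical IntervalBounds enum whose values spell out the bracket notation. The in_interval helper is illustrative only; the point is that each value states what happens at both ends, which right = TRUE/FALSE does not.

```python
from enum import Enum


class IntervalBounds(Enum):
    """Which end of an interval is closed, written in bracket notation."""
    LEFT_CLOSED = "[)"   # [a, b): includes a, excludes b (like cut(right = FALSE))
    RIGHT_CLOSED = "(]"  # (a, b]: excludes a, includes b (like cut(right = TRUE))


def in_interval(x: float, a: float, b: float, bounds: IntervalBounds) -> bool:
    """Return True if x lies in the interval from a to b with the given bounds."""
    if bounds is IntervalBounds.LEFT_CLOSED:
        return a <= x < b
    if bounds is IntervalBounds.RIGHT_CLOSED:
        return a < x <= b
    raise ValueError(f"unsupported bounds: {bounds!r}")


print(in_interval(1.0, 0.0, 1.0, IntervalBounds.RIGHT_CLOSED))  # True: b is included
print(in_interval(1.0, 0.0, 1.0, IntervalBounds.LEFT_CLOSED))   # False: b is excluded
```

If fully closed "[]" or fully open "()" intervals were ever needed, they would be two more enum members rather than a second boolean parameter.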
4. Self-Documenting Code
Enums make data and code more readable without needing to consult documentation.
# What does this mean? Need to check docs.
sample:
  is_control: false

# Self-explanatory
sample:
  sample_type: EXPERIMENTAL
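When existing data already uses a boolean, moving to an enum can be a one-off mapping. The sketch below assumes a hypothetical SampleType enum with CONTROL and EXPERIMENTAL values (only EXPERIMENTAL appears in the data above) and a hypothetical migrate_sample helper. It treats is_control: false as EXPERIMENTAL; in real data, false may hide several kinds of non-control sample, which is exactly the ambiguity the enum removes.

```python
from enum import Enum
from typing import Any, Dict


class SampleType(Enum):
    """Hypothetical enum; only EXPERIMENTAL appears in the example above."""
    CONTROL = "CONTROL"
    EXPERIMENTAL = "EXPERIMENTAL"


def migrate_sample(record: Dict[str, Any]) -> Dict[str, Any]:
    """Replace a legacy boolean is_control slot with a sample_type value.

    Assumes every non-control sample is experimental; adjust the mapping
    if your data contains other kinds of non-control samples.
    """
    migrated = {k: v for k, v in record.items() if k != "is_control"}
    if "is_control" in record:
        is_control = record["is_control"]
        if not isinstance(is_control, bool):
            raise TypeError(f"is_control must be a bool, got {is_control!r}")
        sample_type = SampleType.CONTROL if is_control else SampleType.EXPERIMENTAL
        migrated["sample_type"] = sample_type.value
    return migrated


print(migrate_sample({"id": "S1", "is_control": False}))
# {'id': 'S1', 'sample_type': 'EXPERIMENTAL'}
```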
5. The "Name the Scale" Pattern
When converting a boolean to an enum, consider naming the scale itself and using values that mark points along it. This signals that intermediate values can be added later.
Example: Instead of verbose = TRUE/FALSE, use:
enums:
  VerbosityLevel:
    permissible_values:
      NONE:
        description: No output
      MINIMAL:
        description: Errors only
      NORMAL:
        description: Standard output
      VERBOSE:
        description: Detailed output
      DEBUG:
        description: All available information
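Because the values name points on a scale, code can compare them by rank instead of testing a single flag. A minimal Python sketch follows; the numeric ranks given to the IntEnum members and the should_emit helper are assumptions of this sketch, not something the schema above defines.

```python
from enum import IntEnum


class VerbosityLevel(IntEnum):
    """Points on the verbosity scale, ranked from quietest to loudest (assumed order)."""
    NONE = 0
    MINIMAL = 1
    NORMAL = 2
    VERBOSE = 3
    DEBUG = 4


def should_emit(message_level: VerbosityLevel, configured: VerbosityLevel) -> bool:
    """Emit a message if the configured verbosity reaches the message's level.

    NONE means "no output", so it suppresses every message.
    """
    if configured is VerbosityLevel.NONE:
        return False
    return configured >= message_level


configured = VerbosityLevel.NORMAL
print(should_emit(VerbosityLevel.MINIMAL, configured))  # True: errors are shown
print(should_emit(VerbosityLevel.DEBUG, configured))    # False: debug detail is hidden
```

A former verbose = TRUE/FALSE flag corresponds to just two points on this scale, so existing callers can be mapped onto it without losing the option of finer control later.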
When Booleans Are Acceptable
Booleans remain appropriate in certain cases:
- Truly binary states: The states are fundamentally and permanently binary (e.g., physical properties like "alive/dead" in certain contexts)
- Well-named parameters: The parameter name makes both states crystal clear (e.g., include_header, where false clearly means "exclude header")
- Toggle operations: The operation is clearly about enabling or disabling something (e.g., enabled = true/false)
LinkML Examples
Binary Enum Pattern
enums:
  SortDirection:
    permissible_values:
      ASCENDING:
        description: Sort from lowest to highest
        meaning: SIO:001395 # ascending order
      DESCENDING:
        description: Sort from highest to lowest
        meaning: SIO:001396 # descending order
  StrandOrientation:
    permissible_values:
      FORWARD:
        description: Forward/plus strand
        meaning: SO:0000853 # forward_strand
      REVERSE:
        description: Reverse/minus strand
        meaning: SO:0000854 # reverse_strand
  PresenceStatus:
    permissible_values:
      PRESENT:
        description: The entity is present
      ABSENT:
        description: The entity is absent
      NOT_DETERMINED:
        description: Presence could not be determined
Applying to Slots
slots:
  sort_direction:
    range: SortDirection
    description: Direction for sorting results
  strand:
    range: StrandOrientation
    description: DNA strand orientation
  presence:
    range: PresenceStatus
    description: Whether the feature was detected
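Giving a slot an enum range means a validator can reject any value that is not a permissible value. The sketch below copies the enums and slots above into plain Python dictionaries and checks a record against them. The validate function is a hypothetical illustration of that idea; it is not the API of the LinkML validator or linkml-runtime.

```python
from typing import Dict, List, Set

# Permissible values copied from the enums above.
ENUMS: Dict[str, Set[str]] = {
    "SortDirection": {"ASCENDING", "DESCENDING"},
    "StrandOrientation": {"FORWARD", "REVERSE"},
    "PresenceStatus": {"PRESENT", "ABSENT", "NOT_DETERMINED"},
}

# Slot name -> enum range, copied from the slots above.
SLOT_RANGES: Dict[str, str] = {
    "sort_direction": "SortDirection",
    "strand": "StrandOrientation",
    "presence": "PresenceStatus",
}


def validate(record: Dict[str, str]) -> List[str]:
    """Return error messages for the record; an empty list means it is valid."""
    errors: List[str] = []
    for slot, value in record.items():
        enum_name = SLOT_RANGES.get(slot)
        if enum_name is None:
            errors.append(f"unknown slot: {slot}")
        elif value not in ENUMS[enum_name]:
            allowed = ", ".join(sorted(ENUMS[enum_name]))
            errors.append(f"{slot}: {value!r} is not one of {allowed}")
    return errors


print(validate({"strand": "FORWARD", "presence": "NOT_DETERMINED"}))  # []
print(validate({"strand": "true"}))
# ["strand: 'true' is not one of FORWARD, REVERSE"]
```

Note that NOT_DETERMINED is an ordinary permissible value here, so "we could not tell" is recorded explicitly instead of being squeezed into false or left missing.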
Summary
| Aspect | Boolean | Binary Enum |
|---|---|---|
| Extensibility | Poor - breaking change to add states | Good - add new values easily |
| Clarity | Often asymmetric | Both values explicit |
| Documentation | Requires external docs | Self-documenting |
| Ontology mapping | Not possible | Supports meaning annotations |
| Future-proofing | Risky | Safe |
When in doubt, prefer a two-element enum. The small additional effort pays dividends in clarity, maintainability, and extensibility.