@The-Obstacle-Is-The-Way commented Nov 29, 2025

Summary

Adds native BIDS (Brain Imaging Data Structure) dataset loading support using PyBIDS, enabling a load_dataset('bids', data_dir='/path/to/bids') workflow for neuroimaging researchers.

Contributes to #7804 (Support scientific data formats). BIDS is a widely used standard for organizing neuroimaging data built on NIfTI files.

Changes

Core Implementation

  • src/datasets/packaged_modules/bids/bids.py - GeneratorBasedBuilder implementation
  • src/datasets/packaged_modules/bids/__init__.py - Module exports
  • src/datasets/packaged_modules/__init__.py - Registration with module registry
  • src/datasets/config.py - PYBIDS_AVAILABLE config flag
  • setup.py - Optional BIDS extra with pybids>=0.21.0 and nibabel
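The PYBIDS_AVAILABLE flag and the nibabel check in _info() follow a common optional-dependency pattern. A minimal sketch of that pattern, assuming flags computed with importlib: note that PyBIDS is published on PyPI as "pybids" but imported as "bids". The helpers missing_bids_dependencies and require_bids_dependencies are hypothetical, not the PR's code.

```python
import importlib.util
from typing import List

# Availability flags in the style of a config module. PyBIDS is published on
# PyPI as "pybids" but imported as "bids".
PYBIDS_AVAILABLE = importlib.util.find_spec("bids") is not None
NIBABEL_AVAILABLE = importlib.util.find_spec("nibabel") is not None


def missing_bids_dependencies(pybids_available: bool, nibabel_available: bool) -> List[str]:
    """Return the PyPI names of absent optional BIDS dependencies (hypothetical helper)."""
    missing = []
    if not pybids_available:
        missing.append("pybids")
    if not nibabel_available:
        missing.append("nibabel")
    return missing


def require_bids_dependencies() -> None:
    """Raise ImportError with an install hint if a dependency is missing (hypothetical helper)."""
    missing = missing_bids_dependencies(PYBIDS_AVAILABLE, NIBABEL_AVAILABLE)
    if missing:
        raise ImportError(
            f"The 'bids' loader requires {', '.join(missing)}; "
            f"install with: pip install {' '.join(missing)}"
        )
```

Splitting the pure check from the raising wrapper keeps the logic testable without uninstalling packages.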

Features

  • Automatic BIDS structure validation
  • Subject/session/datatype filtering via config
  • JSON sidecar metadata extraction
  • NIfTI file decoding via existing Nifti feature
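The PR delegates file discovery and filtering to PyBIDS. To illustrate what subject/session filtering selects, here is a stdlib-only sketch that parses BIDS file names into entities (sub-01_ses-baseline_T1w.nii.gz yields sub=01, ses=baseline, suffix T1w) and filters on them. parse_bids_name and filter_nifti_files are hypothetical helpers, not the PR's implementation. Datatype filtering is left out because the datatype (anat, func, ...) comes from the directory name, not the file name.

```python
from typing import Dict, List, Optional

# Extensions checked longest-first so ".nii.gz" wins over ".nii".
_EXTENSIONS = (".nii.gz", ".nii", ".json")


def parse_bids_name(filename: str) -> Dict[str, str]:
    """Split a BIDS file name into key-label entities, suffix and extension."""
    ext = next((e for e in _EXTENSIONS if filename.endswith(e)), "")
    stem = filename[: len(filename) - len(ext)]
    *pairs, suffix = stem.split("_")
    # Each entity is a "key-label" pair, e.g. "sub-01" or "ses-baseline".
    entities = dict(pair.split("-", 1) for pair in pairs)
    entities["suffix"] = suffix
    entities["extension"] = ext
    return entities


def filter_nifti_files(
    filenames: List[str],
    subjects: Optional[List[str]] = None,
    sessions: Optional[List[str]] = None,
) -> List[str]:
    """Keep NIfTI files whose sub/ses labels are requested (None means no filter)."""
    kept = []
    for name in filenames:
        entities = parse_bids_name(name)
        if not entities["extension"].startswith(".nii"):
            continue  # skip JSON sidecars and other files
        if subjects is not None and entities.get("sub") not in subjects:
            continue
        if sessions is not None and entities.get("ses") not in sessions:
            continue
        kept.append(name)
    return kept


files = [
    "sub-01_ses-baseline_T1w.nii.gz",
    "sub-01_ses-baseline_T1w.json",
    "sub-02_ses-followup_T1w.nii.gz",
    "sub-03_ses-baseline_T1w.nii.gz",
]
print(filter_nifti_files(files, subjects=["01", "02"], sessions=["baseline"]))
# ['sub-01_ses-baseline_T1w.nii.gz']
```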

Documentation & Tests

  • docs/source/bids_dataset.mdx - User guide with examples
  • tests/packaged_modules/test_bids.py - Unit tests (4 tests)
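Tests of this kind typically need a small BIDS tree on disk. A minimal stdlib sketch of such a fixture follows; make_minimal_bids is a hypothetical helper, not taken from test_bids.py. The root dataset_description.json carries the Name and BIDSVersion fields the BIDS specification requires. The .nii.gz files are empty placeholders, so a real test would need to write valid NIfTI data (for example with nibabel) before exercising decoding.

```python
import json
import tempfile
from pathlib import Path


def make_minimal_bids(root: Path, subjects=("01", "02")) -> Path:
    """Create a tiny BIDS-shaped tree for tests (hypothetical helper)."""
    root.mkdir(parents=True, exist_ok=True)
    # Name and BIDSVersion are the required dataset_description.json fields.
    (root / "dataset_description.json").write_text(
        json.dumps({"Name": "toy", "BIDSVersion": "1.10.1"})
    )
    for sub in subjects:
        anat = root / f"sub-{sub}" / "anat"
        anat.mkdir(parents=True, exist_ok=True)
        stem = f"sub-{sub}_T1w"
        # Placeholder image: NOT a valid NIfTI file, only exercises discovery.
        (anat / f"{stem}.nii.gz").write_bytes(b"")
        # JSON sidecar next to the image, with an illustrative field.
        (anat / f"{stem}.json").write_text(json.dumps({"RepetitionTime": 2.3}))
    return root


with tempfile.TemporaryDirectory() as tmp:
    bids_root = make_minimal_bids(Path(tmp) / "bids")
    print(sorted(p.name for p in bids_root.rglob("*.nii.gz")))
    # ['sub-01_T1w.nii.gz', 'sub-02_T1w.nii.gz']
```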

Usage

```python
from datasets import load_dataset

# Load entire BIDS dataset
ds = load_dataset('bids', data_dir='/path/to/bids_dataset')

# Filter by subject/session
ds = load_dataset(
    'bids',
    data_dir='/path/to/bids_dataset',
    subjects=['01', '02'],
    sessions=['baseline'],
)

# Access samples
sample = ds['train'][0]
print(sample['subject'])      # '01'
print(sample['nifti'].shape)  # (176, 256, 256)
print(sample['metadata'])     # JSON sidecar data, stored as a string
```
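Since the metadata column is declared as datasets.Value("string"), reading individual sidecar fields means decoding it first. A sketch assuming the string is the sidecar serialized as JSON; the sample dict below is a hypothetical stand-in for ds['train'][0], and its field values are illustrative only.

```python
import json

# Hypothetical stand-in for a loaded sample; "metadata" holds the sidecar as a JSON string.
sample = {
    "subject": "01",
    "metadata": json.dumps({"RepetitionTime": 2.0, "TaskName": "rest"}),
}

meta = json.loads(sample["metadata"])  # back to a regular dict
tr = meta.get("RepetitionTime")        # None if the sidecar lacks this key
print(tr)  # 2.0
```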

Test plan

  • All 4 unit tests pass (pytest tests/packaged_modules/test_bids.py)
  • make quality passes (ruff check)
  • End-to-end tested with real OpenNeuro data (ds000102)

Context

This PR is part of the neuroimaging initiative discussed with @TobiasPitters. Follows the BIDS 1.10.1 specification and leverages the existing Nifti feature for NIfTI file handling.

Related PRs:

- Remove deprecated `trust_remote_code=True` from tests (not needed for packaged modules)
- Fix ruff linting errors (import sorting, trailing newlines)
- Apply ruff formatter for consistent code style
- Convert set() generators to set comprehensions (C401)
- Update setup.py to include nibabel in BIDS extra
- Update docs to clarify nibabel is included
- Add nibabel availability check in _info()
- Move os import to module level
- Update test skipif to check both pybids and nibabel
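Two of these follow-up commits describe small standard patterns. A sketch with illustrative variable names (not taken from the PR): ruff's C401 rule flags set() called on a generator expression, and the combined skip condition checks both import names. A test module would typically pass the flag to pytest.mark.skipif(..., reason=...).

```python
import importlib.util

subjects = ["01", "02", "01"]

# Flagged by ruff C401: set() wrapped around a generator expression.
unique_flagged = set(s for s in subjects)
# Equivalent set comprehension that C401 asks for.
unique_fixed = {s for s in subjects}

# Skip condition covering both optional dependencies ("bids" is PyBIDS' import name).
BIDS_DEPS_MISSING = (
    importlib.util.find_spec("bids") is None
    or importlib.util.find_spec("nibabel") is None
)
```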

@The-Obstacle-Is-The-Way deleted the feat/bids-loader branch November 29, 2025 14:56
@The-Obstacle-Is-The-Way restored the feat/bids-loader branch November 29, 2025 14:58

@CloseChoice (Contributor) left a comment


I reviewed the code and tested it, and it looks absolutely fantastic. I also uploaded a dataset for testing:

```python
from datasets import load_dataset

ds = load_dataset("TobiasPitters/ds004884-mini")
ex = ds['train'][0]
ex['nifti']
```

Or, for streaming:

```python
from datasets import load_dataset

ds = load_dataset("TobiasPitters/ds004884-mini", streaming=True)
ex = next(iter(ds['train']))  # first example from the stream
ex['nifti']
```

Here's how it's visualized:

[Screenshot: visualization of ex['nifti']]

@neurolabusc FYI

By the way, using this branch (and niivue), I created https://huggingface.co/spaces/TobiasPitters/bids-neuroimaging

```python
"run": datasets.Value("string"),
"path": datasets.Value("string"),
"nifti": datasets.Nifti(),
"metadata": datasets.Value("string"),
```

This might be something for another PR, but having a dict-like object here (rather than a string) would be more useful. I'm not quite sure how we could achieve that; maybe through pyarrow's map and union types, or a dedicated feature for BIDSMetadata (or for dictionaries in general?).
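One hedged reading of the map-type idea: pyarrow's map type (pyarrow.map_(pyarrow.string(), pyarrow.string())) stores each row as a list of key/value pairs. Because sidecar values mix floats, lists and strings, a single value type would require something like JSON-encoding each value. The stdlib sketch below shows only that round trip; sidecar_to_pairs and pairs_to_sidecar are hypothetical helpers, and nothing here is an existing datasets API.

```python
import json
from typing import Any, Dict, List, Tuple


def sidecar_to_pairs(sidecar: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Flatten a JSON sidecar into sorted (key, JSON-encoded value) pairs.

    Values are JSON-encoded because sidecar fields mix types, while a
    map<string, string> column needs a single value type.
    """
    return [(key, json.dumps(value)) for key, value in sorted(sidecar.items())]


def pairs_to_sidecar(pairs: List[Tuple[str, str]]) -> Dict[str, Any]:
    """Inverse of sidecar_to_pairs: decode each value back to its JSON type."""
    return {key: json.loads(value) for key, value in pairs}


sidecar = {"RepetitionTime": 2.0, "SliceTiming": [0.0, 0.5, 1.0], "TaskName": "rest"}
pairs = sidecar_to_pairs(sidecar)
print(pairs[0])  # ('RepetitionTime', '2.0')
```

The trade-off is that values stay strings at the Arrow level, so typed access (or union types, as suggested above) would still need a decoding step.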
