feat(bids): Add BIDS dataset loader for neuroimaging data #7886
base: main
- Remove deprecated `trust_remote_code=True` from tests (not needed for packaged modules)
- Fix ruff linting errors (import sorting, trailing newlines)
- Apply ruff formatter for consistent code style
- Convert set() generators to set comprehensions (C401)

- Update setup.py to include nibabel in BIDS extra
- Update docs to clarify nibabel is included
- Add nibabel availability check in _info()
- Move os import to module level
- Update test skipif to check both pybids and nibabel
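For readers unfamiliar with optional-dependency gating, the availability checks mentioned in these commits can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code from this PR: the helper names (`is_available`, `missing_dependencies`, `require`) are hypothetical, and the `datasets[bids]` extra name in the error message is an assumption based on the "BIDS extra" wording above.

```python
import importlib.util


def is_available(module_name: str) -> bool:
    """Return True if the top-level module `module_name` can be found."""
    return importlib.util.find_spec(module_name) is not None


def missing_dependencies(module_names):
    """Return the names in `module_names` that cannot be found, in order."""
    return [name for name in module_names if not is_available(name)]


def require(module_names, extra="bids"):
    """Raise ImportError listing every missing module (hypothetical helper)."""
    missing = missing_dependencies(module_names)
    if missing:
        # The extra name is an assumption, not confirmed by the PR text.
        raise ImportError(
            f"Missing optional dependencies: {', '.join(missing)}. "
            f"Try: pip install 'datasets[{extra}]'"
        )


# PyBIDS is imported as `bids`; nibabel is imported as `nibabel`.
BIDS_DEPENDENCIES = ("bids", "nibabel")
```

A test could use the same check in a skip condition, for example `pytest.mark.skipif(bool(missing_dependencies(BIDS_DEPENDENCIES)), reason="pybids or nibabel not installed")`.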
I looked over the code and tested this and it looks absolutely fantastic. Also uploaded a dataset to test:
from datasets import load_dataset
ds = load_dataset("TobiasPitters/ds004884-mini")
ex = ds['train'][0]
ex['nifti']
or for streaming:
from datasets import load_dataset
ds = load_dataset("TobiasPitters/ds004884-mini", streaming=True)
ex = next(iter(ds['train']))
ex['nifti']
Here's how it's visualized:
@neurolabusc FYI
By the way using this branch (and niivue) I created: https://huggingface.co/spaces/TobiasPitters/bids-neuroimaging
"run": datasets.Value("string"),
"path": datasets.Value("string"),
"nifti": datasets.Nifti(),
"metadata": datasets.Value("string"),
I think this might be something for another PR, but having a dict-like object here would be more beneficial. I'm not quite sure how we could achieve that: maybe through pyarrow's map and union types, or by having a dedicated feature for BIDSMetadata (or for dictionaries in general?).
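Until such a feature exists, a consumer could decode the string column on access. The sketch below assumes the `metadata` string holds JSON text (BIDS sidecar files are JSON); that encoding is an assumption, not something confirmed in this thread. `decode_metadata` and the sample row are hypothetical.

```python
import json


def decode_metadata(example: dict) -> dict:
    """Return a copy of `example` with its "metadata" string parsed into a dict."""
    raw = example.get("metadata")
    decoded = dict(example)
    # Assumes JSON text; empty or missing metadata becomes an empty dict.
    decoded["metadata"] = json.loads(raw) if raw else {}
    return decoded


# Hypothetical row shaped like the features above (the "nifti" field is omitted).
row = {
    "run": "01",
    "path": "sub-01/func/sub-01_task-rest_run-01_bold.nii.gz",
    "metadata": '{"RepetitionTime": 2.0, "TaskName": "rest"}',
}
print(decode_metadata(row)["metadata"]["RepetitionTime"])  # 2.0
```

This keeps the stored schema a plain string, so rows with different sidecar keys never need a shared Arrow struct type; the cost is that every reader has to do the decoding step.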
Summary
Adds native BIDS (Brain Imaging Data Structure) dataset loading support using PyBIDS, enabling `load_dataset('bids', data_dir='/path/to/bids')` workflow for neuroimaging researchers.

Contributes to #7804 (Support scientific data formats) - BIDS is a widely-used standard for organizing neuroimaging data built on NIfTI files.
Changes
Core Implementation
- `src/datasets/packaged_modules/bids/bids.py` - GeneratorBasedBuilder implementation
- `src/datasets/packaged_modules/bids/__init__.py` - Module exports
- `src/datasets/packaged_modules/__init__.py` - Registration with module registry
- `src/datasets/config.py` - `PYBIDS_AVAILABLE` config flag
- `setup.py` - Optional `pybids>=0.21.0` + nibabel dependency

Features
Documentation & Tests
- `docs/source/bids_dataset.mdx` - User guide with examples
- `tests/packaged_modules/test_bids.py` - Unit tests (4 tests)

Usage
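As a rough usage sketch based on the call named in the summary: `load_bids` below wraps `load_dataset('bids', data_dir=...)` and only works with this branch installed together with pybids and nibabel. `make_minimal_bids_tree` is a hypothetical helper that illustrates the kind of directory layout a BIDS loader expects. It writes an empty placeholder instead of a real NIfTI image, so its output shows the layout only and is not meant to be loaded.

```python
import json
import tempfile
from pathlib import Path


def make_minimal_bids_tree(root: Path) -> Path:
    """Create a tiny BIDS-style layout under `root` (placeholder files only)."""
    root.mkdir(parents=True, exist_ok=True)
    # BIDS requires a dataset_description.json with Name and BIDSVersion.
    (root / "dataset_description.json").write_text(
        json.dumps({"Name": "demo", "BIDSVersion": "1.10.1"})
    )
    anat = root / "sub-01" / "anat"
    anat.mkdir(parents=True, exist_ok=True)
    # Empty placeholder: a real dataset needs a valid NIfTI image here.
    (anat / "sub-01_T1w.nii.gz").touch()
    return root


def load_bids(data_dir):
    """Load a BIDS directory with the loader added in this PR (needs this branch)."""
    from datasets import load_dataset  # lazy import: requires pybids and nibabel

    return load_dataset("bids", data_dir=str(data_dir))


# Example (not executed here; the "train" split name is assumed from the
# review comment's example):
#   ds = load_bids("/path/to/bids")
#   ds["train"][0]["nifti"]
```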
Test plan
- `pytest tests/packaged_modules/test_bids.py`
- `make quality` passes (ruff check)

Context
This PR is part of the neuroimaging initiative discussed with @TobiasPitters. Follows the BIDS 1.10.1 specification and leverages the existing Nifti feature for NIfTI file handling.
Related PRs: