ENH: Add support for GIFTI ExternalFileBinary #999

Merged
merged 10 commits into from
Mar 9, 2021

Conversation

pauldmccarthy
Contributor

This PR adds support for loading GIFTI data arrays from external files, using the ExternalFileName attribute of the <DataArray> element.

I had a query from a user who is working with an SPM toolbox which generates GIFTI files where the data arrays are stored in external .dat files, and the <DataArray> element has encoding="ExternalFileBinary". This led me to discover that nibabel does not support GIFTI files which use this feature, presumably because the GIFTI spec is slightly under-specified regarding the expected format of the external file. From section 7.0 of the GIFTI spec:

For external data storage, the Encoding attribute of the DataArray element will have a value of “ExternalFileBinary” indicating that the data is located in an external file and in binary format.

However, the CAT Toolbox simply stores its data as a raw uncompressed binary file, and I think that this is a sensible and reasonable interpretation of the above definition. If this assumption is made, then all of the necessary information (file offset, data type, row-/column-major ordering, shape) will be available via the other attributes of the <DataArray> element, so the required changes to nibabel are quite trivial.
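To illustrate the interpretation above, here is a minimal sketch of reading such an array with plain NumPy. It is not the code in this PR: `read_external_array` and its parameters are hypothetical names, and the sketch assumes the external file is raw, uncompressed binary, with the offset, data type, shape and ordering taken from the `<DataArray>` attributes.

```python
import os
import tempfile

import numpy as np


def read_external_array(path, offset, dtype, shape, order="C"):
    """Hypothetical helper: read a raw, uncompressed array from a binary file.

    offset - byte offset of the array (cf. ExternalFileOffset)
    dtype  - NumPy dtype string including byte order, e.g. "<f4"
    shape  - array dimensions (cf. Dim0, Dim1, ...)
    order  - "C" for row-major, "F" for column-major
    """
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    with open(path, "rb") as f:
        f.seek(offset)
        buff = f.read(nbytes)
    if len(buff) != nbytes:
        raise ValueError("external file is shorter than expected")
    return np.frombuffer(buff, dtype=dtype).reshape(shape, order=order)


# Demo: write a small column-major float32 array after 8 bytes of padding,
# then read it back using only the offset, dtype, shape and ordering.
expected = np.arange(6, dtype="<f4").reshape((2, 3))
with tempfile.TemporaryDirectory() as tmpdir:
    dat = os.path.join(tmpdir, "example.dat")
    with open(dat, "wb") as f:
        f.write(b"\x00" * 8)
        f.write(expected.tobytes(order="F"))
    arr = read_external_array(dat, 8, "<f4", (2, 3), order="F")
```

The array returned by `np.frombuffer` is read-only; a caller who needs to modify the data would have to copy it first.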

Happy to add in some unit tests and error checking if you think this is a useful contribution.

@codecov

codecov bot commented Feb 25, 2021

Codecov Report

Merging #999 (8cd83c2) into master (95e1fbe) will increase coverage by 0.01%.
The diff coverage is 93.87%.

@@            Coverage Diff             @@
##           master     #999      +/-   ##
==========================================
+ Coverage   92.20%   92.22%   +0.01%     
==========================================
  Files         100      100              
  Lines       12139    12164      +25     
  Branches     2121     2128       +7     
==========================================
+ Hits        11193    11218      +25     
  Misses        618      618              
  Partials      328      328              
Impacted Files                       Coverage Δ
nibabel/gifti/parse_gifti_fast.py    86.06% <93.02%> (+1.45%) ⬆️
nibabel/gifti/gifti.py               95.42% <100.00%> (ø)
nibabel/xmlutils.py                  86.04% <100.00%> (+0.68%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@effigies
Member

Yes, sounds good to me. Especially if there are files in the wild, we should support them.

I notice that the file is being opened relative to the cwd rather than relative to the GIFTI file; the latter seems the more likely interpretation.

@pauldmccarthy
Contributor Author

Ahh, I hadn't thought of that - the CAT toolbox file which I have specifies the external file as relative to the GIFTI:

   <DataArray  ArrayIndexingOrder="ColumnMajorOrder"
               DataType="NIFTI_TYPE_FLOAT32"
               Dim0="64984"
               Dimensionality="1"
               Encoding="ExternalFileBinary"
               Endian="LittleEndian"
               ExternalFileName="s12.mesh.thickness.resampled_32k.TESTCAT.dat"
               ExternalFileOffset="0"
               Intent="NIFTI_INTENT_NONE">
      <MetaData>
      </MetaData>
      <Data></Data>
   </DataArray>

@effigies
Member

That makes sense. Just need to update the code to reflect that. Right now it's an open call with no path adjustment.
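A minimal sketch of the path adjustment being discussed, assuming a relative ExternalFileName should be resolved against the directory containing the GIFTI file. `resolve_external_file` is a hypothetical name and not necessarily how this PR implements it.

```python
import os.path as op


def resolve_external_file(gifti_path, ext_fname):
    """Hypothetical helper: locate an external data file named in a GIFTI.

    Absolute names are returned unchanged; relative names are resolved
    against the directory of the GIFTI file, not the current directory.
    """
    if op.isabs(ext_fname):
        return ext_fname
    gifti_dir = op.dirname(op.abspath(gifti_path))
    return op.join(gifti_dir, ext_fname)


# Demo with made-up file names
gifti = op.join("data", "subject", "thickness.gii")
resolved = resolve_external_file(gifti, "thickness.dat")
```

With this approach, loading the same GIFTI from a different working directory finds the same .dat file.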

@pauldmccarthy
Contributor Author

No problems - I'll follow up with that, and some tests as well - might find some more time this evening..

…, rather than all of its attributes as separate args. Also pass in name of file being parsed, in case data is to be loaded from an external file
@pauldmccarthy
Contributor Author

Will add a bit more testing soon ..

Comment on lines 75 to 78
with open(ext_fname, 'rb') as f:
    f.seek(darray.ext_offset)
    nbytes = np.prod(darray.dims) * dtype().itemsize
    buff = f.read(nbytes)
Member

This would certainly work, but if somebody's gone to the trouble of giving us a binary file for more efficient access, should we consider returning a np.memmap or even an ArrayProxy so that the data is only loaded on demand?

Contributor Author

Yes, good point - I had thought of that, but then just went with what is done for other <DataArray> types. I suppose this should be user-selectable as well - I'll add a mmap=True option to GiftiImage.from_filename and related ...
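For illustration, a minimal sketch of the memory-mapped alternative using `np.memmap`. `map_external_array` is a hypothetical name, and this only shows the general idea under the same raw-binary assumption, not the code added in this PR.

```python
import os
import tempfile

import numpy as np


def map_external_array(path, offset, dtype, shape, order="C"):
    """Hypothetical helper: memory-map an array stored in a raw binary file.

    The data is only read from disk when elements are accessed.
    """
    return np.memmap(path, dtype=np.dtype(dtype), mode="r",
                     offset=offset, shape=tuple(shape), order=order)


# Demo: write a column-major float32 array after 8 bytes of padding, map it,
# and copy it into memory before the temporary file is removed.
data = np.arange(6, dtype="<f4").reshape((2, 3))
with tempfile.TemporaryDirectory() as tmpdir:
    dat = os.path.join(tmpdir, "example.dat")
    with open(dat, "wb") as f:
        f.write(b"\x00" * 8)
        f.write(data.tobytes(order="F"))
    mm = map_external_array(dat, 8, "<f4", (2, 3), order="F")
    is_memmap = isinstance(mm, np.memmap)
    loaded = np.array(mm)  # explicit in-memory copy
    del mm                 # release the mapping before the file is deleted
```

Opening with `mode="r"` keeps the mapping read-only, so the external file cannot be modified by accident through the returned array.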

@pauldmccarthy
Contributor Author

Ok, I've added mem-map support, and added a few extra test cases - ready for a review!

Member

@effigies effigies left a comment

This LGTM, thanks Paul. I'll give it a couple days in case anybody else wants to weigh in.

@effigies effigies merged commit 62aea04 into nipy:master Mar 9, 2021