
FIX: Disable direct creation of non-conformant GiftiDataArrays #1199


Merged
effigies merged 8 commits into nipy:master from fix/gifti_dtypes on Mar 13, 2023

Conversation

effigies
Member

GIFTI only defines three valid datatypes. Nibabel has generally allowed users to use any defined NIFTI_TYPE. In limiting this scope, there's a balance to strike between:

  1. Making it easy/default to generate spec-conforming GIFTI files
  2. Permitting loading of non-conforming GIFTI files (including some previously generated with nibabel)
  3. Respecting the user's intent when made explicit, even when it violates the spec

(1) is the primary goal here. For (2), making ourselves intentionally unable to read a file seems like a very bad idea.

(3) seems like an open question. We do not uniformly allow users to violate standards; for example, we do not allow fixed-length strings as the voxel data in NIfTI, even though numpy would permit it. So it could be fine to simply refuse to write files with invalid DataArray types. On the other hand, you could imagine a pipeline of functions that take in a GiftiImage and return a GiftiImage so that metadata follows the data, and it's only at write time that you would really want to enforce a precision reduction. (In any event, it would be very difficult to entirely prevent the construction in memory of non-conformant images, due to Python's object model.)


My initial thought was that we could auto-convert to the nearest dtype, but that could silently introduce unexpected results for users, so @matthew-brett suggested raising errors. This is a fairly simple API change.
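
For concreteness, a rough sketch of what raising at construction time could look like (the exact exception type and message are assumptions for illustration, not the final API):

import numpy as np
import nibabel as nb

data = np.zeros((5,), dtype=np.float64)

# Constructing a data array with a datatype the GIFTI spec does not define
# would raise instead of silently producing a non-conformant array.
try:
    darr = nb.gifti.GiftiDataArray(data)
except ValueError as err:
    print(err)

# Casting up front keeps the array spec-conformant.
darr = nb.gifti.GiftiDataArray(data.astype(np.float32))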

Because the objects are constructed piecemeal by the parser, this does not prevent loading files. We might still want to find a way to warn on load.

A couple options for making it harder to write a non-conforming GIFTI:

  1. Raise error during GiftiImage.to_xml() if any array dtype is not conforming. This could be overridden with a keyword argument force.
  2. Raise warning during GiftiImage.__init__() if any data arrays have non-conforming dtypes, or if they are added through API mechanisms (GiftiImage.add_gifti_data_array()).

Closes #1198.

@codecov

codecov bot commented Feb 17, 2023

Codecov Report

Patch coverage: 93.93% and project coverage change: -0.02 ⚠️

Comparison is base (b8b57cc) 92.15% compared to head (cd1a39a) 92.14%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1199      +/-   ##
==========================================
- Coverage   92.15%   92.14%   -0.02%     
==========================================
  Files          97       97              
  Lines       12334    12360      +26     
  Branches     2534     2544      +10     
==========================================
+ Hits        11367    11389      +22     
- Misses        645      648       +3     
- Partials      322      323       +1     
Impacted Files Coverage Δ
nibabel/gifti/gifti.py 93.26% <93.93%> (-0.08%) ⬇️
nibabel/viewers.py 95.49% <0.00%> (-0.65%) ⬇️


@neurolabusc

The primary strength of GIfTI is that it is the standard interchange format for our community, and it was designed from the outset to support domain-specific needs such as describing anatomical structures, labeling, and spatial transforms. However, GIfTI is inherently slow to read and write and produces large files.

My sense is that the datatypes of GIfTI were carefully chosen to match the needs of our community (e.g. the SNR of our data is low) and the capabilities of our hardware (APIs and GPUs are limited to about 2 billion vertices, with vertex precision limited to float32). While a user might intentionally request extreme datatypes believing that more is better, I suspect they will experience a lot of unintended consequences.

I appreciate @satra's ability to see the big picture, while I tend to focus on implementation details. Perhaps he can share his thoughts.

@effigies
Member Author

Yes, GIFTI is an interchange format, but the nibabel objects are not very heavy. So for internal use, it's not unreasonable for someone to create a float64 GiftiDataArray or even GiftiImage and pass it from function to function, and only want to cast down to float32 at serialization time.

I can also see a use for wanting to inspect the object without conversion, possibly through a schema-agnostic XML viewer.

So I would suggest something like the following on GiftiImage:

class GiftiImage:
    def to_xml(self, enc='utf-8', *, mode='strict'):  # Called by to_filename()
        if mode == 'strict':
            if any(arr.datatype not in ('uint8', 'int32', 'float32') for arr in self.darrays):
                raise ValueError(
                    'GiftiImage contains data arrays with invalid data types; '
                    'use mode="compat" to automatically cast to conforming types'
                )
        elif mode == 'compat':
            gii = copy(self)
            # Convert unsupported float/int types to float32 or int32 if possible
            return gii.to_xml(enc=enc, mode='strict')
        elif mode != 'force':
            raise TypeError(f'Unknown mode {mode}')
        ...
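
To illustrate, the proposed modes would be used roughly like this (names taken from the sketch above; this is the proposal, not a guarantee of the final API):

img.to_xml()                # 'strict': error if any data array has a non-conforming datatype
img.to_xml(mode='compat')   # cast e.g. float64 -> float32, int64 -> int32 before serializing
img.to_xml(mode='force')    # serialize as-is, producing a non-conformant (but still readable) file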

Combined with explicit datatypes, building out a non-conformant image by hand would require two specific overrides:

import numpy as np
import nibabel as nb

darr = nb.gifti.GiftiDataArray(np.zeros((5,), dtype=np.float64), datatype='NIFTI_TYPE_FLOAT64')
gii = nb.GiftiImage(darrays=[darr])
gii.to_filename('64bit.gii', mode='force')

This would allow someone to create GiftiDataArrays internally and pass them around freely without premature casts. If they attempt to write them to disk without casting, they will get an error unless the datatypes are already conformant, so we avoid silent changes in behavior (@matthew-brett's concern). They can then choose whether to coerce the data to conform to the specification or to force writing to disk.

If the force write use case is still uncompelling and two overrides are insufficient friction, I would be okay with dropping it. For loading 64-bit GIFTIs in another nibabel instance, pickling will be much faster:

In [3]: img = nb.load('/data/openneuro/ds002790/.git/annex/objects/2j/3z/MD5E-s3485910--1942b389e2cc
   ...: f637934bf1d62c6f20c2.surf.gii/MD5E-s3485910--1942b389e2ccf637934bf1d62c6f20c2.surf.gii')

In [4]: from pickle import dumps, loads

In [5]: %timeit loads(dumps(img))
2.27 ms ± 28.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [6]: %timeit img.from_bytes(img.to_bytes())
308 ms ± 507 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

And of course, you can still inspect a data array by hand without putting it in a <GIFTI Version="1.0"></GIFTI> wrapper.

In [7]: _ = img.darrays[0].to_xml()

@effigies
Member Author

Went ahead and pushed a proposal since I wanted to test it. Not intended to preempt discussion; happy to consider alternatives.

@alexisthual

I went through the discussion and I agree with your views @effigies 🙂 (namely, making it possible for users to write non-standard gifti files at the price of setting mode="force", and allowing users to load non-standard gifti files, in which case we display a warning and point to a way to make the file standard).

@effigies
Member Author

Thanks for the comments, @alexisthual. Happy to see further discussion or a code review. In the absence of both, I'll try to look at this with fresh eyes some time this week and plan to merge next Monday.


@alexisthual left a comment


I'm not familiar with the codebase so I might be missing a lot of important points.
What is missing thus far? Some warnings for non-conformant legacy images?

@@ -834,20 +852,45 @@ def _to_xml_element(self):
             GIFTI.append(dar._to_xml_element())
         return GIFTI

-    def to_xml(self, enc='utf-8') -> bytes:
+    def to_xml(self, enc='utf-8', *, mode='strict') -> bytes:


Just curious, what is the * for here? Do you expect this function to be called with more kwargs?

Member Author


The * means you have to call mode= as a keyword argument. So to_xml('utf-8', 'force') will fail, to_xml('utf-8', mode='force') will pass. I would be inclined to make enc keyword-only as well, I just didn't want to make it part of this PR...
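
For example, a generic illustration of the keyword-only marker (plain Python, not nibabel-specific behavior):

def to_xml(enc='utf-8', *, mode='strict'):
    return f'{enc}/{mode}'

to_xml('utf-8', mode='force')  # OK: mode passed by keyword
to_xml('utf-8', 'force')       # TypeError: takes 1 positional argument but 2 were given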


That's interesting, thanks!

Comment on lines +866 to +875
if arr.datatype not in GIFTI_DTYPES:
    arr = copy(arr)
    # TODO: Better typing for recoders
    dtype = cast(np.dtype, data_type_codes.dtype[arr.datatype])
    if np.issubdtype(dtype, np.floating):
        arr.datatype = data_type_codes['float32']
    elif np.issubdtype(dtype, np.integer):
        arr.datatype = data_type_codes['int32']
    else:
        raise ValueError(f'Cannot convert {dtype} to float32/int32')


I get how this part corrects the darray's datatype attribute, but I don't understand where the actual data is cast to the new datatype.

Member Author


The data is only cast at write time. Serialization to XML will call DataArray._to_xml_element():

def _to_xml_element(self):
    # fix endianness to machine endianness
    self.endian = gifti_endian_codes.code[sys.byteorder]

    # All attribute values must be strings
    data_array = xml.Element(
        'DataArray',
        attrib={
            'Intent': intent_codes.niistring[self.intent],
            'DataType': data_type_codes.niistring[self.datatype],
            'ArrayIndexingOrder': array_index_order_codes.label[self.ind_ord],
            'Dimensionality': str(self.num_dim),
            'Encoding': gifti_encoding_codes.specs[self.encoding],
            'Endian': gifti_endian_codes.specs[self.endian],
            'ExternalFileName': self.ext_fname,
            'ExternalFileOffset': str(self.ext_offset),
        },
    )
    for di, dn in enumerate(self.dims):
        data_array.attrib['Dim%d' % di] = str(dn)

    if self.meta is not None:
        data_array.append(self.meta._to_xml_element())
    if self.coordsys is not None:
        data_array.append(self.coordsys._to_xml_element())
    # write data array depending on the encoding
    data_array.append(
        _data_tag_element(
            self.data,
            gifti_encoding_codes.specs[self.encoding],
            data_type_codes.dtype[self.datatype],
            self.ind_ord,
        )
    )
    return data_array

Which calls _data_tag_element():

def _data_tag_element(dataarray, encoding, dtype, ordering):
    """Creates data tag with given `encoding`, returns as XML element"""
    import zlib

    order = array_index_order_codes.npcode[ordering]
    enclabel = gifti_encoding_codes.label[encoding]
    if enclabel == 'ASCII':
        da = _arr2txt(dataarray, KIND2FMT[dtype.kind])
    elif enclabel in ('B64BIN', 'B64GZ'):
        out = np.asanyarray(dataarray, dtype).tobytes(order)
        if enclabel == 'B64GZ':
            out = zlib.compress(out)
        da = base64.b64encode(out).decode()
    elif enclabel == 'External':
        raise NotImplementedError('In what format are the external files?')
    else:
        da = ''

    data = xml.Element('Data')
    data.text = da
    return data

L380 is the one that finally does it: out = np.asanyarray(dataarray, dtype).tobytes(order)
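
A minimal standalone illustration of that cast, using plain numpy rather than nibabel internals:

import numpy as np

data = np.arange(5, dtype=np.float64)
out = np.asanyarray(data, np.float32).tobytes('C')  # the cast happens here, at serialization time
print(len(data.tobytes()), len(out))                # 40 20 -- half the bytes after float64 -> float32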

@effigies merged commit 40e31e8 into nipy:master Mar 13, 2023
@effigies deleted the fix/gifti_dtypes branch March 13, 2023 00:56
@effigies
Member Author

Thanks for the review, @alexisthual!

Successfully merging this pull request may close these issues.

Nibabel creates invalid GIfTI files