FIX: Accept dtype parameter to ArrayProxy.__array__ #844

effigies · 2019-11-19T16:34:46Z

At least back through numpy 1.12, the __array__ protocol has accepted a dtype parameter.

Overall, this is poorly documented in numpy (the above link is the only place I've seen a reference to the dtype parameter), but some manual testing seems to indicate that it works, and that its absence produces errors in the cases described in #843. Systematic tests are added to the proxy API. Happy to add more if we can think of specific cases that might trip us up.
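For readers unfamiliar with the protocol, here is a minimal sketch of an object whose `__array__` accepts a dtype. `ScaledProxy` and its attributes are hypothetical stand-ins, not nibabel's `ArrayProxy`; the optional `copy` argument is included only because newer NumPy versions may pass it.

```python
import numpy as np


class ScaledProxy:
    """Hypothetical stand-in for an array proxy (not nibabel's ArrayProxy)."""

    def __init__(self, raw, slope=1.0, inter=0.0):
        self._raw = np.asarray(raw)
        self._slope = slope
        self._inter = inter

    def __array__(self, dtype=None, copy=None):
        # Apply scaling first, then honor the requested dtype, if any.
        scaled = self._raw * self._slope + self._inter
        if dtype is not None:
            scaled = scaled.astype(dtype, copy=False)
        return scaled


proxy = ScaledProxy(np.array([0, 1, 2], dtype=np.int16), slope=2.0, inter=1.0)
default = np.array(proxy)                   # dtype left to the scaling arithmetic
as_f32 = np.array(proxy, dtype=np.float32)  # dtype forwarded to __array__
```

Without the `dtype` parameter in the signature, the second call is the kind of request that cannot be forwarded to the object.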

It occurs to me that this largely replaces the get_scaled() method added in #833, with the primary difference being that get_scaled() guarantees not to downcast if it can't assure there will be no overflow. get_scaled() will also make as few copies as possible, while __array__ may make another if get_scaled(dtype).dtype != np.dtype(dtype). If that distinction is not worth drawing, it may be worth removing get_scaled() before it hits a release, and making the __array__ interface THE way to control dtypes beyond get_fdata().

One interesting side effect of this is that mask or atlas data that is known to be <=255, but whose scale factors are not guaranteed to be slope = 1, inter = 0, can be retrieved with minimal up-casting via:

np.array(atlas.dataobj, np.uint8)
# or 
np.uint8(atlas.dataobj)

This is of course quite dangerous, and with the current implementation, non-integer data will be happily coerced into meaningless garbage. Validating coercion or using safer casting rules could be a good idea, but that would also break the following equality by causing the RHS to raise an exception:

np.uint8(np.array(atlas.dataobj)) == np.array(atlas.dataobj, np.uint8)
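As a rough illustration of both points, the sketch below uses a plain float array as a stand-in for np.array(atlas.dataobj) (an assumption; no image is loaded). It shows the two casts agreeing by truncating non-integer values, and a "safe" casting rule refusing the conversion altogether.

```python
import numpy as np

# Stand-in for np.array(atlas.dataobj): scaled values that are not all integral.
scaled = np.array([0.5, 1.9, 254.2])

lhs = np.uint8(scaled)            # cast after the fact
rhs = np.array(scaled, np.uint8)  # cast via the dtype argument
same = bool(np.array_equal(lhs, rhs))  # both truncate toward zero

# A "safe" casting rule refuses float64 -> uint8 outright.
try:
    scaled.astype(np.uint8, casting="safe")
    safe_cast_raised = False
except TypeError:
    safe_cast_raised = True
```

The values here are kept within the uint8 range on purpose; out-of-range floats are where the results become platform-dependent.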

All of your thoughts are very welcome.

Fixes #843.
Affects #842.
Follow-up to #833.

cc @jeromedockes @kchawla-pi @adelavega @rmarkello @matthew-brett

codecov · 2019-11-19T16:37:43Z

Codecov Report

Merging #844 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #844      +/-   ##
==========================================
- Coverage   90.02%   90.01%   -0.01%     
==========================================
  Files          98       98              
  Lines       12452    12441      -11     
  Branches     2190     2191       +1     
==========================================
- Hits        11210    11199      -11     
  Misses        890      890              
  Partials      352      352

Impacted Files	Coverage Δ
nibabel/ecat.py	`88.14% <100%> (-0.13%)`	⬇️
nibabel/minc1.py	`90.75% <100%> (-0.26%)`	⬇️
nibabel/dataobj_images.py	`95.52% <100%> (-0.26%)`	⬇️
nibabel/arrayproxy.py	`100% <100%> (ø)`	⬆️
nibabel/parrec.py	`91.87% <100%> (+0.01%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fcc5448...3351d16.

effigies · 2019-11-25T14:05:23Z

I'm inclined to drop get_scaled() in favor of a dtyped __array__. Any thoughts?

adelavega · 2019-11-25T18:23:32Z

The upside of get_scaled is that it's a bit clearer what it does. Using __array__ directly seems a bit more opaque, unless well documented. But it seems this is not something you necessarily want to encourage doing anyways (versus the standard way of accessing, which is get_fdata), so maybe that's fine.

If so, then I don't see much need for get_scaled either.

effigies · 2019-11-25T19:20:31Z

I'd be happy to add a more complete description of the dataobj to the docs, if that would help. Right now, it's a pretty small thing here: https://nipy.org/nibabel/images_and_memory.html#use-the-array-proxy-instead-of-get-fdata

Just to concisely summarize: np.array(img.dataobj) (and np.asarray and np.asanyarray) is a valid way of getting information out of the object that's guaranteed to have had scaling applied to it, but the dtype is ambiguous. np.array(img.dataobj, dtype=np.int16) is (with this) very precise, although it may involve unsafe casting.

img.dataobj.get_scaled() is somewhere in between, with bounded ambiguity. Which might still be useful. If I say get_scaled(dtype='uint16'), I know I have something to which I can safely do anything I could do to a uint16.
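One way to picture "bounded ambiguity" is a helper that treats the requested dtype as a floor rather than a target. The sketch below is an assumption for illustration only: `scale_at_least` is a hypothetical name, and this is not how get_scaled() is implemented in nibabel.

```python
import numpy as np


def scale_at_least(raw, slope, inter, dtype):
    """Hypothetical helper: treat `dtype` as a floor, never a forced downcast."""
    scaled = np.asarray(raw) * slope + inter
    # Promote to a type that can hold both the scaled data and the request.
    target = np.promote_types(scaled.dtype, dtype)
    return scaled.astype(target, copy=False)


raw = np.array([0, 1, 2], dtype=np.uint8)
widened = scale_at_least(raw, np.uint8(1), np.uint8(0), np.uint16)  # uint8 data, uint16 requested
kept = scale_at_least(raw, 0.5, 0.0, np.uint16)                     # float data is not downcast
```

The caller does not know the exact dtype in advance, only that it is at least as capable as the one requested.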

IDK if that affects your thinking here.

rmarkello · 2019-11-26T02:38:18Z

In my mind this is similar to the recent pandas (v0.25.0) transition away from DataFrame.get_values() (now deprecated) to np.asarray(DataFrame) and/or DataFrame.to_numpy(), both of which allow unsafe casting of the sort you describe.

I'd personally be fine to see get_scaled() replaced entirely by __array__, but if you're really worried about the casting you could also add a small check (e.g., confirm whether the provided dtype would result in potentially unsafe casting and throw a warning so that people are aware of what they're doing). That way it would still return whatever datatype was requested so the equality you proposed in the initial post would pass, but you're not being too prescriptive.
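A minimal sketch of such a check, assuming a standalone helper rather than any change to nibabel itself (`cast_with_warning` is a hypothetical name): it warns when the cast is not "safe" by NumPy's rules, then returns the requested dtype regardless.

```python
import warnings

import numpy as np


def cast_with_warning(arr, dtype):
    """Hypothetical helper: warn on potentially lossy casts, but still cast."""
    arr = np.asarray(arr)
    if not np.can_cast(arr.dtype, dtype, casting="safe"):
        warnings.warn(
            f"Casting {arr.dtype} to {np.dtype(dtype)} may lose information",
            RuntimeWarning,
        )
    return arr.astype(dtype)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out = cast_with_warning(np.array([0.0, 1.0, 2.0]), np.uint8)
```

Because the cast still happens, the result matches what a plain astype() would give; only the warning is new.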

effigies · 2019-11-26T15:22:40Z

Okay. I've dropped get_scaled() for now, and will remove it from the API changes in 3.0. It can always be added back, if it's useful enough.

effigies · 2019-12-10T16:49:42Z

Any further comments? I'm inclined to do a second RC after this, to give a bit more testing time.

effigies mentioned this pull request Nov 25, 2019

FIX: Update all SpatialImage.get_data -> get_fdata nipreps/niworkflows#426

Merged

effigies force-pushed the fix/arraylike_dtype branch 2 times, most recently from 6572302 to 99b368b Compare November 26, 2019 16:57

effigies mentioned this pull request Nov 28, 2019

TEST: Check non-integral slopes, intercepts in ArrayProxy API #847

Merged

effigies force-pushed the fix/arraylike_dtype branch from 71a2899 to 6856c31 Compare December 2, 2019 16:50

effigies added 6 commits December 5, 2019 19:03

FIX: Accept dtype parameter to ArrayProxy.__array__

ccd7453

TEST: Validate dataobj.__array__(dtype)

5b76c8e

RF: Drop (mostly) redundant ArrayProxy.get_scaled() method

b457534

DOC: Update changelog

8630b26

TEST: Filter complex warnings

5e5ec22

TEST: Improve test naming for tracking down failures

70987fb

effigies force-pushed the fix/arraylike_dtype branch from 43c23d6 to 70987fb Compare December 6, 2019 00:04

FIX: ECAT data must be coerced after reading

3351d16

effigies merged commit f1e1008 into nipy:master Dec 11, 2019

effigies deleted the fix/arraylike_dtype branch December 11, 2019 13:16

skoudoro mentioned this pull request Dec 23, 2019

Update nibabel minimum version (3.0.0) dipy/dipy#2015

Merged

effigies added this to the 3.0.0 milestone Jan 26, 2020
