MRG: implementing / testing get_fdata #551

Merged: 6 commits merged on Sep 5, 2017

183 changes: 173 additions & 10 deletions nibabel/dataobj_images.py
@@ -41,6 +41,7 @@ def __init__(self, dataobj, header=None, extra=None, file_map=None):
super(DataobjImage, self).__init__(header=header, extra=extra,
file_map=file_map)
self._dataobj = dataobj
self._fdata_cache = None
self._data_cache = None

@property
@@ -55,7 +56,19 @@ def _data(self):
return self._dataobj

def get_data(self, caching='fill'):
""" Return image data from image with any necessary scalng applied
""" Return image data from image with any necessary scaling applied

.. WARNING::

We recommend you use the ``get_fdata`` method instead of the
``get_data`` method, because it is easier to predict the return
data type. We will deprecate the ``get_data`` method around April
2018, and remove it around April 2020.

If you don't care about the predictability of the return data type,
and you want the minimum possible data size in memory, you can
replicate the array that would be returned by ``img.get_data()`` by
using ``np.asanyarray(img.dataobj)``.

The image ``dataobj`` property can be an array proxy or an array. An
array proxy is an object that knows how to load the image data from
@@ -125,7 +138,7 @@ def get_data(self, caching='fill'):
(no reference to the array). If the cache is full, "unchanged" leaves
the cache full and returns the cached array reference.

The cache can effect the behavior of the image, because if the cache is
The cache can affect the behavior of the image, because if the cache is
full, or you have an array image, then modifying the returned array
will modify the result of future calls to ``get_data()``. For example
you might do this:
@@ -191,11 +204,160 @@ def get_data(self, caching='fill'):
self._data_cache = data
return data

def get_fdata(self, caching='fill', dtype=np.float64):

Review comment (Member): any reason why float64 over float32 is the default?

Reply (Member Author): @satra - the default up till now has been to return float64 for images with scaling. Also, some images do have float64 and it seems a shame to downgrade them with this call. So float32 seems to me like an option rather than a default.

""" Return floating point image data with necessary scaling applied

The image ``dataobj`` property can be an array proxy or an array. An
array proxy is an object that knows how to load the image data from
disk. An image with an array proxy ``dataobj`` is a *proxy image*; an
image with an array in ``dataobj`` is an *array image*.

The default behavior for ``get_fdata()`` on a proxy image is to read
the data from the proxy, and store in an internal cache. Future calls
to ``get_fdata`` will return the cached array. This is the behavior
selected with `caching` == "fill".

Once the data has been cached and returned from an array proxy, if you
modify the returned array, you will also modify the cached array
(because they are the same array). Regardless of the `caching` flag,
this is always true of an array image.

Parameters
----------
caching : {'fill', 'unchanged'}, optional
See the Notes section for a detailed explanation. This argument
specifies whether the image object should fill in an internal
cached reference to the returned image data array. "fill" specifies
that the image should fill an internal cached reference if
currently empty. Future calls to ``get_fdata`` will return this
cached reference. You might prefer "fill" to save the image object
from having to reload the array data from disk on each call to
``get_fdata``. "unchanged" means that the image should not fill in
the internal cached reference if the cache is currently empty. You
might prefer "unchanged" to "fill" if you want to make sure that
the call to ``get_fdata`` does not create an extra (cached)
reference to the returned array. In this case it is easier for
Python to free the memory from the returned array.
dtype : numpy dtype specifier
A numpy dtype specifier specifying a floating point type. Data is
returned as this floating point type. Default is ``np.float64``.

Returns
-------
fdata : array
Array of image data of data type `dtype`.

See also
--------
uncache: empty the array data cache

Notes
-----
All images have a property ``dataobj`` that represents the image array
data. Images that have been loaded from files usually do not load the
array data from file immediately, in order to reduce image load time
and memory use. For these images, ``dataobj`` is an *array proxy*; an
object that knows how to load the image array data from file.

By default (`caching` == "fill"), when you call ``get_fdata`` on a
proxy image, we load the array data from disk, store (cache) an
internal reference to this array data, and return the array. The next
time you call ``get_fdata``, you will get the cached reference to the
array, so we don't have to load the array data from disk again.

Array images have a ``dataobj`` property that already refers to an
array in memory, so there is no benefit to caching, and the `caching`
keywords have no effect.

For proxy images, you may not want to fill the cache after reading the
data from disk because the cache will hold onto the array memory until
the image object is deleted, or you use the image ``uncache`` method.
If you don't want to fill the cache, then always use
``get_fdata(caching='unchanged')``; in this case ``get_fdata`` will not
fill the cache (store the reference to the array) if the cache is empty
(no reference to the array). If the cache is full, "unchanged" leaves
the cache full and returns the cached array reference.

The cache can effect the behavior of the image, because if the cache is

Review comment (Member): affect (suggesting "affect" rather than "effect" in the line above)

full, or you have an array image, then modifying the returned array
will modify the result of future calls to ``get_fdata()``. For example
you might do this:

>>> import os
>>> import nibabel as nib
>>> from nibabel.testing import data_path
>>> img_fname = os.path.join(data_path, 'example4d.nii.gz')

>>> img = nib.load(img_fname) # This is a proxy image
>>> nib.is_proxy(img.dataobj)
True

The array is not yet cached by a call to "get_fdata", so:

>>> img.in_memory
False

After we call ``get_fdata`` using the default `caching` == 'fill', the
cache contains a reference to the returned array ``data``:

>>> data = img.get_fdata()
>>> img.in_memory
True

We modify an element in the returned data array:

>>> data[0, 0, 0, 0]
0.0
>>> data[0, 0, 0, 0] = 99
>>> data[0, 0, 0, 0]
99.0

The next time we call 'get_fdata', the method returns the cached
reference to the (modified) array:

>>> data_again = img.get_fdata()
>>> data_again is data
True
>>> data_again[0, 0, 0, 0]
99.0

If you had *initially* used `caching` == 'unchanged' then the returned
``data`` array would have been loaded from file, but not cached, and:

>>> img = nib.load(img_fname) # a proxy image again
>>> data = img.get_fdata(caching='unchanged')
>>> img.in_memory
False
>>> data[0, 0, 0, 0] = 99
>>> data_again = img.get_fdata(caching='unchanged')
>>> data_again is data
False
>>> data_again[0, 0, 0, 0]
0.0
"""
if caching not in ('fill', 'unchanged'):

Review comment (Member): Want to test caching='unknown'?

raise ValueError('caching value should be "fill" or "unchanged"')
dtype = np.dtype(dtype)
if not issubclass(dtype.type, np.inexact):
raise ValueError('{} should be floating point type'.format(dtype))
# Return cache if cache present and of correct dtype.
if self._fdata_cache is not None:
if self._fdata_cache.dtype.type == dtype.type:
return self._fdata_cache
data = np.asanyarray(self._dataobj).astype(dtype)
if caching == 'fill':
self._fdata_cache = data
return data
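The cache check above keys on the requested `dtype`, and there is only one `_fdata_cache`. A request with a different `dtype` therefore reloads the data and, with `caching='fill'`, replaces the cached array, so alternating between `float64` and `float32` converts the data on every switch. The sketch below reproduces just those lines in a hypothetical `_CacheStandIn` class (not part of nibabel), with a plain NumPy array playing the role of `dataobj`:

```python
import numpy as np


class _CacheStandIn:
    """Hypothetical stand-in reproducing the get_fdata cache logic above."""

    def __init__(self, dataobj):
        self._dataobj = dataobj
        self._fdata_cache = None

    def get_fdata(self, caching='fill', dtype=np.float64):
        dtype = np.dtype(dtype)
        # Reuse the cache only when it already holds the requested dtype
        if self._fdata_cache is not None:
            if self._fdata_cache.dtype.type == dtype.type:
                return self._fdata_cache
        data = np.asanyarray(self._dataobj).astype(dtype)
        if caching == 'fill':
            self._fdata_cache = data
        return data


img = _CacheStandIn(np.arange(1000, dtype=np.int16))
d64 = img.get_fdata()                   # fills the cache with float64
assert img.get_fdata() is d64           # same dtype: cached array returned
d32 = img.get_fdata(dtype=np.float32)   # new dtype: reload and replace cache
assert d32.nbytes * 2 == d64.nbytes     # float32 halves the memory
assert img.get_fdata() is not d64       # float64 must be converted again
```

This also illustrates the trade-off from the review thread on the default: `dtype=np.float32` halves the memory of the returned array, at the cost of precision for data that is genuinely `float64`.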

@property
def in_memory(self):
""" True when array data is in memory
""" True when any array data is in memory cache

There are separate caches for `get_data` reads and `get_fdata` reads.
This property is True if either of those caches is set.
"""
return (isinstance(self._dataobj, np.ndarray) or
self._fdata_cache is not None or
self._data_cache is not None)

def uncache(self):
@@ -206,23 +368,24 @@ def uncache(self):
* *array images* where the data ``img.dataobj`` is an array
* *proxy images* where the data ``img.dataobj`` is a proxy object

If you call ``img.get_data()`` on a proxy image, the result of reading
If you call ``img.get_fdata()`` on a proxy image, the result of reading
from the proxy gets cached inside the image object, and this cache is
what gets returned from the next call to ``img.get_data()``. If you
what gets returned from the next call to ``img.get_fdata()``. If you
modify the returned data, as in::

data = img.get_data()
data = img.get_fdata()
data[:] = 42

then the next call to ``img.get_data()`` returns the modified array,
then the next call to ``img.get_fdata()`` returns the modified array,
whether the image is an array image or a proxy image::

assert np.all(img.get_data() == 42)
assert np.all(img.get_fdata() == 42)

When you uncache an array image, this has no effect on the return of
``img.get_data()``, but when you uncache a proxy image, the result of
``img.get_data()`` returns to its original value.
``img.get_fdata()``, but when you uncache a proxy image, the result of
``img.get_fdata()`` returns to its original value.
"""
self._fdata_cache = None
self._data_cache = None
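The uncache behavior described in this docstring can be sketched for the proxy case with two hypothetical stand-ins, `_FakeProxy` and `_ImageStandIn`, neither of which is part of nibabel. The proxy returns a fresh copy of its "on-disk" array on every read, and the image keeps only the `get_fdata` cache, `in_memory` and `uncache` logic from this diff (the separate `get_data` cache is left out):

```python
import numpy as np


class _FakeProxy:
    """Hypothetical array proxy: every read returns a fresh copy."""

    def __init__(self, arr):
        self._on_disk = np.array(arr)

    def __array__(self, dtype=None, copy=None):
        # Simulate reading from disk; accept the arguments NumPy may pass
        arr = self._on_disk.copy()
        return arr if dtype is None else arr.astype(dtype)


class _ImageStandIn:
    """Hypothetical image keeping only the get_fdata cache machinery."""

    def __init__(self, dataobj):
        self._dataobj = dataobj
        self._fdata_cache = None

    def get_fdata(self, caching='fill', dtype=np.float64):
        dtype = np.dtype(dtype)
        if (self._fdata_cache is not None and
                self._fdata_cache.dtype.type == dtype.type):
            return self._fdata_cache
        data = np.asanyarray(self._dataobj).astype(dtype)
        if caching == 'fill':
            self._fdata_cache = data
        return data

    @property
    def in_memory(self):
        return (isinstance(self._dataobj, np.ndarray) or
                self._fdata_cache is not None)

    def uncache(self):
        self._fdata_cache = None


img = _ImageStandIn(_FakeProxy(np.zeros((2, 2))))
assert not img.in_memory
data = img.get_fdata()
data[:] = 42                            # data is the cached array itself
assert img.in_memory
assert np.all(img.get_fdata() == 42)
img.uncache()
assert not img.in_memory
assert np.all(img.get_fdata() == 0)     # reread from the proxy
```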

@property
3 changes: 3 additions & 0 deletions nibabel/tests/test_filebasedimages.py
@@ -29,6 +29,9 @@ def shape(self):
def get_data(self):
return self.arr

def get_fdata(self):
return self.arr.astype(np.float64)

@classmethod
def from_file_map(klass, file_map):
with file_map['image'].get_prepare_fileobj('rb') as fobj: