Fix for issue 362: nibabel fails to stream gzipped files > 4GB (uncompressed) in Python 3.5 #383
``doc/source/devel/advanced_testing.rst`` (new file):

@@ -0,0 +1,32 @@
+.. -*- mode: rst -*-
+.. ex: set sts=4 ts=4 sw=4 et tw=79:
+### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ###
+#
+#   See COPYING file distributed along with the NiBabel package for the
+#   copyright and license terms.
+#
+### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ###
+
+.. _advanced_testing:
+
+****************
+Advanced Testing
+****************
+
+Setup
+-----
+
+Before running advanced tests, please update all nibabel submodules by
+running ``git submodule update --init``.
+
+
+Long-running tests
+------------------
+
+Long-running tests are not enabled by default, and can be resource-intensive.
+To run these tests:
+
+* Set the environment variable ``NIPY_EXTRA_TESTS=slow``.
+* Run ``nosetests``.
+
+Note that some tests may require a machine with more than 4GB of RAM.
+
+.. include:: ../links_names.txt
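The slow-test gating described above is driven by the ``runif_extra_has`` decorator that this PR adds to ``nibabel.testing``. A minimal stdlib-only sketch of how such an environment-gated skip decorator can work (the real implementation raises nose's ``SkipTest``; ``unittest.SkipTest`` stands in for it here):

```python
import functools
import os
import unittest


def runif_extra_has(test_str):
    """Skip the decorated test unless NIPY_EXTRA_TESTS contains test_str.

    Sketch only: the real decorator lives in nibabel.testing and raises
    nose's SkipTest; this version uses unittest.SkipTest instead.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # NIPY_EXTRA_TESTS is a comma-separated list, e.g. "slow"
            extras = os.environ.get('NIPY_EXTRA_TESTS', '').split(',')
            if test_str not in extras:
                raise unittest.SkipTest(
                    '%s not in NIPY_EXTRA_TESTS' % test_str)
            return func(*args, **kwargs)
        return wrapper
    return decorator


# With the variable set, the decorated test runs normally.
os.environ['NIPY_EXTRA_TESTS'] = 'slow'


@runif_extra_has('slow')
def slow_test():
    return 'ran'


result = slow_test()
```

Checking the environment at call time (inside ``wrapper``) rather than at decoration time means the gate reflects the environment when the test runner actually invokes the test.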
``doc/source/devel/index.rst``:

@@ -14,3 +14,4 @@ Developer documentation page
 add_image_format
 devdiscuss
 make_release
+advanced_testing
``nibabel/openers.py``:

@@ -9,19 +9,65 @@
 """ Context manager openers for various fileobject types
 """

-from os.path import splitext
-import gzip
 import bz2
+import gzip
+import sys
+from os.path import splitext


 # The largest memory chunk that gzip can use for reads
 GZIP_MAX_READ_CHUNK = 100 * 1024 * 1024  # 100Mb


+class BufferedGzipFile(gzip.GzipFile):
+    """GzipFile able to readinto buffer >= 2**32 bytes.
+
+    This class only differs from gzip.GzipFile in Python 3.5.0.
+
+    This works around a known issue in Python 3.5.
+    See https://bugs.python.org/issue25626
+    """
+
+    # This helps avoid defining readinto in Python 2.6,
+    # where it is undefined on gzip.GzipFile.
+    # It also helps limit the exposure to this code.
+    if sys.version_info[:3] == (3, 5, 0):
+        def __init__(self, fileish, mode='rb', compresslevel=9,
+                     buffer_size=2 ** 32 - 1):
+            super(BufferedGzipFile, self).__init__(fileish, mode=mode,
+                                                   compresslevel=compresslevel)
+            self.buffer_size = buffer_size
+
+        def readinto(self, buf):
+            """Uses self.buffer_size to do a buffered read."""
+            n_bytes = len(buf)
+            if n_bytes < 2 ** 32:
+                return super(BufferedGzipFile, self).readinto(buf)
+
+            # This works around a known issue in Python 3.5.
+            # See https://bugs.python.org/issue25626
+            mv = memoryview(buf)
+            n_read = 0
+            max_read = 2 ** 32 - 1  # Max for unsigned 32-bit integer
+            while n_read < n_bytes:
+                n_wanted = min(n_bytes - n_read, max_read)
+                n_got = super(BufferedGzipFile, self).readinto(
+                    mv[n_read:n_read + n_wanted])
+                n_read += n_got
+                if n_got != n_wanted:
+                    break
+            return n_read
+
+
 def _gzip_open(fileish, *args, **kwargs):
-    # open gzip files with faster reads on large files using larger chunks
-    # See https://github.com/nipy/nibabel/pull/210 for discussion
-    gzip_file = gzip.open(fileish, *args, **kwargs)
-    gzip_file.max_read_chunk = GZIP_MAX_READ_CHUNK
+    gzip_file = BufferedGzipFile(fileish, *args, **kwargs)
+
+    # Speedup for #209; attribute not present in Python 3.5
+    # open gzip files with faster reads on large files using larger chunks
+    # See https://github.com/nipy/nibabel/pull/210 for discussion
+    if hasattr(gzip_file, 'max_read_chunk'):
+        gzip_file.max_read_chunk = GZIP_MAX_READ_CHUNK
+
     return gzip_file

Review discussion (inline, on the ``readinto`` buffering loop; some quoted code spans were lost in extraction and are marked [...]):

> I prefer keeping [...]

> Does it make sense to invert? Or is the [...]

> I am fine on the [...]. I believe that the [...]

> Point taken about waiting on an OverflowError. What about making the check
> explicitly for Python 3.5.0, instead of testing for [...]? And my
> understanding is that [...]
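The chunked ``readinto`` loop above can be exercised at small scale. The sketch below uses a hypothetical free function ``chunked_readinto`` (not part of nibabel) that applies the same technique to any file-like object, with an artificially low chunk cap so the loop runs many times:

```python
import io


def chunked_readinto(fobj, buf, max_read=2 ** 32 - 1):
    """Fill buf from fobj in slices of at most max_read bytes.

    Mirrors the loop in BufferedGzipFile.readinto: slicing a memoryview
    lets each bounded low-level read land directly in the big buffer
    without extra copies.
    """
    mv = memoryview(buf)
    n_bytes = len(buf)
    n_read = 0
    while n_read < n_bytes:
        n_wanted = min(n_bytes - n_read, max_read)
        n_got = fobj.readinto(mv[n_read:n_read + n_wanted])
        n_read += n_got
        if n_got != n_wanted:  # short read: EOF before the buffer filled
            break
    return n_read


src = io.BytesIO(b'abcdef' * 100)           # 600 bytes of source data
buf = bytearray(1000)                       # deliberately larger than the source
n = chunked_readinto(src, buf, max_read=7)  # tiny cap to force many iterations
```

The short-read check is what terminates the loop cleanly at end of stream: when the source is exhausted mid-slice, ``readinto`` returns fewer bytes than requested and the loop stops rather than spinning on empty reads.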
``nibabel/tests/test_nifti1.py``:

@@ -13,17 +13,18 @@

 import numpy as np

-from ..externals.six import BytesIO
-from ..casting import type_info, have_binary128
-from ..tmpdirs import InTemporaryDirectory
-from ..spatialimages import HeaderDataError
-from ..eulerangles import euler2mat
-from ..affines import from_matvec
-from .. import nifti1 as nifti1
-from ..nifti1 import (load, Nifti1Header, Nifti1PairHeader, Nifti1Image,
-                      Nifti1Pair, Nifti1Extension, Nifti1Extensions,
-                      data_type_codes, extension_codes, slice_order_codes)
+from nibabel import nifti1 as nifti1
+from nibabel.affines import from_matvec
+from nibabel.casting import type_info, have_binary128
+from nibabel.eulerangles import euler2mat
+from nibabel.externals.six import BytesIO
+from nibabel.nifti1 import (load, Nifti1Header, Nifti1PairHeader, Nifti1Image,
+                            Nifti1Pair, Nifti1Extension, Nifti1Extensions,
+                            data_type_codes, extension_codes,
+                            slice_order_codes)
+from nibabel.openers import ImageOpener
+from nibabel.spatialimages import HeaderDataError
+from nibabel.tmpdirs import InTemporaryDirectory
 from ..freesurfer import load as mghload

 from .test_arraywriters import rt_err_estimate, IUINT_TYPES

Review discussion (on the switch to absolute imports):

> Reason to avoid relative imports?

> I thought the standard is to use absolute imports in tests. At least,
> @effigies and I have been migrating to that standard as we touch test files.

> Any reason for that that you know?

> I got it from nilearn: nilearn/nilearn#384

> OK - I don't feel strongly either way.

On ``from nibabel.openers import ImageOpener``:

> Unused import.

@@ -35,7 +36,7 @@
 from nose.tools import (assert_true, assert_false, assert_equal,
                         assert_raises)

-from ..testing import data_path, suppress_warnings
+from ..testing import data_path, suppress_warnings, runif_extra_has

 from . import test_analyze as tana
 from . import test_spm99analyze as tspm
@@ -1242,3 +1243,19 @@ def test_rt_bias(self):
     # Hokey use of max_miss as a std estimate
     bias_thresh = np.max([max_miss / np.sqrt(count), eps])
     assert_true(np.abs(bias) < bias_thresh)
+
+
+@runif_extra_has('slow')
+def test_large_nifti1():
+    image_shape = (91, 109, 91, 1200)
+    img = Nifti1Image(np.ones(image_shape, dtype=np.float32),
+                      affine=np.eye(4))
+    # Dump and load the large image.
+    with InTemporaryDirectory():
+        img.to_filename('test.nii.gz')
+        del img
+        data = load('test.nii.gz').get_data()
+    # Check that the data are all ones
+    assert_equal(image_shape, data.shape)
+    n_ones = np.sum((data == 1.))
+    assert_equal(np.prod(image_shape), n_ones)
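The large-file test above drives ``gzip.GzipFile.readinto`` through nibabel's image loading. The same read path can be shown in isolation with the stdlib alone; the buffer here is tiny, but on Python 3.5.0 this exact call failed once the buffer reached 2**32 bytes (bpo-25626):

```python
import gzip
import io

# Round-trip some bytes through an in-memory gzip stream, then read them
# back with readinto() into a preallocated buffer -- the call patched by
# BufferedGzipFile for buffers >= 2**32 bytes.
payload = b'\x01' * 10000

compressed = io.BytesIO()
with gzip.GzipFile(fileobj=compressed, mode='wb') as gz:
    gz.write(payload)

compressed.seek(0)
buf = bytearray(len(payload))
with gzip.GzipFile(fileobj=compressed, mode='rb') as gz:
    n = gz.readinto(buf)
```

``GzipFile`` inherits ``readinto`` from ``io.BufferedIOBase``, which is why the override in ``BufferedGzipFile`` can intercept it.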
Review discussion (on ``test_large_nifti1``):

> Have we checked to make sure large ``.bz2`` files aren't affected by the
> same upstream bug?

> Nope. What file types allow ``.bz``? ``.nii.bz`` didn't seem to work.

> Duh, ``.bz2``. Trying it now.

> Yes, fails :(
> [screenshot of traceback]

> That looks like 2.6, and so is probably a different issue from this PR's
> concern. Possibly worth looking into in a new PR, but low priority in my
> mind. (Nobody seems to be using large .bz2 files, and who's working on HCP
> data with seriously old Python?)
>
> What about 3.5?

> That was Python 2.7, but good point--I didn't use Python 3. Python 3 is
> hanging, using a ton of resources.

> Okay. ``bz2`` sounds like its own issue, not something we can quickly take
> care of while we're here.

> OK, it finally returned successfully in Python 3.5
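The reviewers' ``.bz2`` question can be probed the same way at small scale: ``bz2.BZ2File`` also inherits ``readinto`` from ``io.BufferedIOBase``. This sketch only demonstrates the small-buffer path, not the multi-gigabyte case discussed above:

```python
import bz2
import io

# Same probe as for gzip: compress some bytes, then read them back via
# readinto() into a preallocated buffer through BZ2File.
payload = b'\x02' * 10000
compressed = io.BytesIO(bz2.compress(payload))

buf = bytearray(len(payload))
with bz2.BZ2File(compressed, mode='rb') as f:
    n = f.readinto(buf)
```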