Skip to content

Fix for issue 362: nibabel fails to stream gzipped files > 4GB (uncompressed) in Python 3.5 #383

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Nov 20, 2015

Conversation

bcipolli
Copy link
Contributor

This not a nibabel bug, but works around a Python bug (https://bugs.python.org/issue25626).

The workaround is: we wrap gzip.GzipFile with buffering, so that files > 4GB require multiple calls to GzipFile.readinto.

I've also added test functionality: if NIPY_EXTRA_TESTS contains 'slow', slowly running tests can be run. I've used this to include a test for creation of a large file.

@@ -17,11 +17,36 @@
GZIP_MAX_READ_CHUNK = 100 * 1024 * 1024 # 100Mb


class BufferedGzipFile(gzip.GzipFile):
"""GzipFile capable to readinto buffer >= 2**32 bytes."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"able to" rather than "capable to"

@effigies
Copy link
Member

I haven't been following this PR, but I'm blocking 2.0.2 on this. Let me know if I shouldn't be.

@bcipolli
Copy link
Contributor Author

@effigies This "fix" didn't actually work. I will submit a temporary workaround we discussed previously, this afternoon. Job interview this morning! :)

@bcipolli
Copy link
Contributor Author

read suffers from the same OverflowError. However, this webpage covers how to solve the buffering issue I hit here.
http://eli.thegreenplace.net/2011/11/28/less-copies-in-python-with-the-buffer-protocol-and-memoryviews

I will work on an update...

@bcipolli
Copy link
Contributor Author

OK, fixed this by using memoryview if available (3.0+ for sure, often in 2.7), falling back to vanilla readinto if not (up to 3.4). Also created a runif_extra_has header and a large file test (off by default). I ran the test locally in Python 3.5, and works for me...

If someone could pull this branch and run the Python 3.5 test, would be great. Will have to set NIPY_EXTRA_TESTS=slow in the environment.

Last question: where to document the use of NIPY_EXTRA_TESTS?

class BufferedGzipFile(gzip.GzipFile):
"""GzipFile able to readinto buffer >= 2**32 bytes."""
def __init__(self, fileish, mode='rb', compresslevel=9, buffer_size=2**32-1):
super(BufferedGzipFile, self).__init__(fileish, mode=mode, compresslevel=compresslevel)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like gzip.GzipFile is not a new-style class in 2.6.

I think this should work:

gzip.GzipFile.__init__(self, fileish, mode=mode, compresslevel=compresslevel)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@matthew-brett
Copy link
Member

Sorry to ask - but would you consider adding a small "Running the tests" section to doc/source/installation.rst, maybe including NIPY_EXTRA_TESTS?

@bcipolli
Copy link
Contributor Author

Sorry to ask - but would you consider adding a small "Running the tests" section to doc/source/installation.rst, maybe including NIPY_EXTRA_TESTS?

Great; exactly what I was looking for. Will do.

@bcipolli
Copy link
Contributor Author

@matthew-brett a bit puzzled about pushing documentation to installation.rst. Isn't running tests something for either development or deployment? Or, could you help me understand what you had in mind for adding to installation.rst?

@bcipolli
Copy link
Contributor Author

readinto isn't defined in Python 2.6, so this code is invalid for that version. Taking @effigies hesitancy above as well:

  • BufferedGZipFile only differs from gzip.GZipFile for Python 3.5.0
  • The buffering code is only hit if the len(buf) >= 2**32

I tested this locally on my machine, it worked well. I will temporarily push a change to .travis.yml to test this on other architectures.

return super(BufferedGzipFile, self).readinto(buf)

# This works around a known issue in Python 3.5.
# See https://bugs.python.org/issue25626"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No """ needed.

@bcipolli
Copy link
Contributor Author

ok, updated code again per comments.

def decorator(func):
return skipif(test_str not in EXTRA_SET,
"Skip {0} tests.".format(test_str))(func)
return decorator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm going to be annoying and insist on removing the decorator function. What you have is:

def f(x):
    return g(x)
return f

When you could just have return g. Just use:

def runif_extra_has(test_str):
    return skipif(test_str not in EXTRA_SET, "Skip {0} tests.".format(test_str))

@effigies
Copy link
Member

Thanks for your patience. LGTM.

effigies pushed a commit to effigies/nibabel that referenced this pull request Nov 18, 2015
Squashed for 2.0.2 release.
See nipygh-383 for full history.

BF: Use buffered gzip read in Py3.5.0, specifically
TST: Add high-memory usage test for large nifti1 files

Conflicts:
	nibabel/testing/__init__.py
effigies pushed a commit to effigies/nibabel that referenced this pull request Nov 18, 2015
Squashed for 2.0.2 release.
See nipygh-383 for full history.

BF: Use buffered gzip read in Py3.5.0, specifically
TST: Add high-memory usage test for large nifti1 files

Conflicts:
	nibabel/testing/__init__.py
effigies pushed a commit to effigies/nibabel that referenced this pull request Nov 18, 2015
Squashed for 2.0.2 release.
See nipygh-383 for full history.

BF: Use buffered gzip read in Py3.5.0, specifically
TST: Add high-memory usage test for large nifti1 files

Conflicts:
	nibabel/testing/__init__.py
img.to_filename('test.nii.gz')
del img
data = load('test.nii.gz').get_data()
# Check that te data are all ones
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type 'te'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow' that's meta. I meant "typo 'te'"

@arthurmensch
Copy link

I can confirm that HCP is working again on Python 3.5. Cheers !

@bcipolli
Copy link
Contributor Author

👍 @matthew-brett still looking for some details on best place for docs; I think it's the last thing to do.
#383 (comment)

effigies pushed a commit to effigies/nibabel that referenced this pull request Nov 19, 2015
Squashed for 2.0.2 release.
See nipygh-383 for full history.

BF: Use buffered gzip read in Py3.5.0, specifically
TST: Add high-memory usage test for large nifti1 files

Conflicts:
	nibabel/testing/__init__.py
Nifti1Pair, Nifti1Extension, Nifti1Extensions,
data_type_codes, extension_codes,
slice_order_codes)
from nibabel.openers import ImageOpener
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused import.

@matthew-brett
Copy link
Member

Ben - sorry - are you asking me where to put the docs?

@bcipolli
Copy link
Contributor Author

Yes, I'm not sure where to add the documentation about the new test env variable. "Installation" felt odd to me, as it seems like a developer / release issue, so I wanted to double-check. In "installation", can you give me an idea how you envisioned it fitting in?

@matthew-brett
Copy link
Member

How about a little section on 'testing' in the installation docs, pointing to a document advanced_testing in doc/source/devel ?

@bcipolli
Copy link
Contributor Author

@matthew-brett Not sure if this is what you had in mind...

image

image

effigies pushed a commit to effigies/nibabel that referenced this pull request Nov 20, 2015
Squashed for 2.0.2 release.
See nipygh-383 for full history.

BF: Use buffered gzip read in Py3.5.0, specifically
TST: Add high-memory usage test for large nifti1 files

Conflicts:
	nibabel/testing/__init__.py
@matthew-brett
Copy link
Member

I was thinking of something like 'to run tests, run nosetests nibabel or python -c "import nibabel; nibabel.test()" See also advanced testing.

Then advanced testing would be using the git submodules, as in:

git submodule update --init

and the slow tests stuff.

@bcipolli
Copy link
Contributor Author

OK. Here's what it looks like now:
image
image

@matthew-brett
Copy link
Member

Looks good - maybe add that nosetests etc should be run from the terminal rather than python / IPython.

@bcipolli
Copy link
Contributor Author

Done. I think we're ready to go!

@matthew-brett
Copy link
Member

Great - thanks again for wading through.

matthew-brett added a commit that referenced this pull request Nov 20, 2015
MRG: Fix for issue 362: Python 3.5 fails reading large gzipped files 

This not a nibabel bug, but works around a Python bug (https://bugs.python.org/issue25626).

The workaround is: we wrap gzip.GzipFile with buffering, so that files > 4GB require multiple calls to GzipFile.readinto.

I've also added test functionality: if NIPY_EXTRA_TESTS contains 'slow', slowly running tests can be run. I've used this to include a test for creation of a large file.
@matthew-brett matthew-brett merged commit 853f5fb into nipy:master Nov 20, 2015
effigies pushed a commit to effigies/nibabel that referenced this pull request Nov 20, 2015
Squashed for 2.0.2 release.
See nipygh-383 for full history.

BF: Use buffered gzip read in Py3.5.0, specifically
TST: Add high-memory usage test for large nifti1 files

Conflicts:
	nibabel/testing/__init__.py
@matthew-brett
Copy link
Member

@bcipolli bcipolli deleted the issue-362 branch February 5, 2016 16:54
grlee77 pushed a commit to grlee77/nibabel that referenced this pull request Mar 15, 2016
MRG: Fix for issue 362: Python 3.5 fails reading large gzipped files 

This not a nibabel bug, but works around a Python bug (https://bugs.python.org/issue25626).

The workaround is: we wrap gzip.GzipFile with buffering, so that files > 4GB require multiple calls to GzipFile.readinto.

I've also added test functionality: if NIPY_EXTRA_TESTS contains 'slow', slowly running tests can be run. I've used this to include a test for creation of a large file.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants