Fix for issue 362: nibabel fails to stream gzipped files > 4GB (uncompressed) in Python 3.5 #383

bcipolli · 2015-11-15T00:02:49Z

This is not a nibabel bug; the change works around a Python bug (https://bugs.python.org/issue25626).

The workaround is: we wrap gzip.GzipFile with buffering, so that files > 4GB require multiple calls to GzipFile.readinto.

I've also added test functionality: if NIPY_EXTRA_TESTS contains 'slow', slow-running tests are enabled. I've used this to include a test that creates a large file.

matthew-brett · 2015-11-15T00:04:18Z

nibabel/openers.py

@@ -17,11 +17,36 @@
 GZIP_MAX_READ_CHUNK = 100 * 1024 * 1024  # 100Mb


+class BufferedGzipFile(gzip.GzipFile):
+    """GzipFile capable to readinto buffer >= 2**32 bytes."""


"able to" rather than "capable to"

effigies · 2015-11-16T00:49:46Z

I haven't been following this PR, but I'm blocking 2.0.2 on this. Let me know if I shouldn't be.

bcipolli · 2015-11-16T15:47:42Z

@effigies This "fix" didn't actually work. I will submit a temporary workaround we discussed previously, this afternoon. Job interview this morning! :)

bcipolli · 2015-11-17T22:09:15Z

read suffers from the same OverflowError. However, this webpage covers how to solve the buffering issue I hit here.
http://eli.thegreenplace.net/2011/11/28/less-copies-in-python-with-the-buffer-protocol-and-memoryviews

I will work on an update...

bcipolli · 2015-11-17T23:06:01Z

OK, fixed this by using memoryview if available (3.0+ for sure, often in 2.7), falling back to vanilla readinto if not (up to 3.4). Also created a runif_extra_has decorator and a large file test (off by default). I ran the test locally in Python 3.5, and it works for me...
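A minimal sketch of the chunked-read idea described above, not the code merged in this PR. The names readinto_chunked and max_chunk are hypothetical, and the sketch assumes a byte-oriented buffer such as a bytearray. Slicing a memoryview gives a writable window onto the target buffer without copying, so each readinto call stays below a chosen size limit.

```python
import gzip
import io


def readinto_chunked(fobj, buf, max_chunk=2**32 - 1):
    """Fill ``buf`` from ``fobj`` using several bounded ``readinto`` calls.

    ``max_chunk`` caps the size of each request (hypothetical parameter).
    """
    view = memoryview(buf)  # writable view; slices share memory with buf
    n_total = 0
    while n_total < len(view):
        n_read = fobj.readinto(view[n_total:n_total + max_chunk])
        if not n_read:  # 0 or None: end of stream
            break
        n_total += n_read
    return n_total


# Small demonstration: a tiny max_chunk forces several readinto calls.
payload = bytes(range(256)) * 4
raw = io.BytesIO()
with gzip.GzipFile(fileobj=raw, mode='wb') as gz:
    gz.write(payload)
raw.seek(0)

out = bytearray(len(payload))
with gzip.GzipFile(fileobj=raw, mode='rb') as gz:
    n_read = readinto_chunked(gz, out, max_chunk=100)
```

In real use the limit would stay at its default; the small value here only exercises the loop on a payload that fits in memory.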

If someone could pull this branch and run the Python 3.5 test, that would be great. You will have to set NIPY_EXTRA_TESTS=slow in the environment.

Last question: where to document the use of NIPY_EXTRA_TESTS?

effigies · 2015-11-18T01:01:55Z

nibabel/openers.py

+class BufferedGzipFile(gzip.GzipFile):
+    """GzipFile able to readinto buffer >= 2**32 bytes."""
+    def __init__(self, fileish, mode='rb', compresslevel=9, buffer_size=2**32-1):
+        super(BufferedGzipFile, self).__init__(fileish, mode=mode, compresslevel=compresslevel)


Looks like gzip.GzipFile is not a new-style class in 2.6.

I think this should work:

gzip.GzipFile.__init__(self, fileish, mode=mode, compresslevel=compresslevel)
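For illustration, here is a sketch of a subclass that calls the base initializer explicitly, as suggested above. The class name ExplicitInitGzipFile is hypothetical. In Python 2, super() raises a TypeError when given an old-style class, while the explicit call works for both old-style and new-style classes; in Python 3 the two forms behave the same.

```python
import gzip
import os
import tempfile


class ExplicitInitGzipFile(gzip.GzipFile):
    """Hypothetical subclass that avoids super() in __init__."""

    def __init__(self, fileish, mode='rb', compresslevel=9):
        # Calling the base initializer directly does not depend on
        # GzipFile being a new-style class.
        gzip.GzipFile.__init__(self, fileish, mode=mode,
                               compresslevel=compresslevel)


# Round trip through a temporary file to show the subclass behaves normally.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'example.gz')
    with ExplicitInitGzipFile(path, mode='wb') as gz:
        gz.write(b'hello')
    with ExplicitInitGzipFile(path, mode='rb') as gz:
        content = gz.read()
```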

matthew-brett · 2015-11-18T02:17:55Z

Sorry to ask - but would you consider adding a small "Running the tests" section to doc/source/installation.rst, maybe including NIPY_EXTRA_TESTS?

bcipolli · 2015-11-18T02:47:38Z

> Sorry to ask - but would you consider adding a small "Running the tests" section to doc/source/installation.rst, maybe including NIPY_EXTRA_TESTS?

Great; exactly what I was looking for. Will do.

bcipolli · 2015-11-18T03:36:26Z

@matthew-brett a bit puzzled about pushing documentation to installation.rst. Isn't running tests something for either development or deployment? Or, could you help me understand what you had in mind for adding to installation.rst?

bcipolli · 2015-11-18T13:01:24Z

readinto isn't defined in Python 2.6, so this code is invalid for that version. Taking @effigies' hesitancy above into account as well:

BufferedGzipFile only differs from gzip.GzipFile for Python 3.5.0
The buffering code is only hit if len(buf) >= 2**32
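The two conditions above can be sketched as follows. This is an illustration under assumptions, not the merged implementation: select_gzip_class and MAX_SINGLE_READ are hypothetical names, and the affected interpreter is identified by an exact version match. Taking the version as a parameter makes the gating easy to check without running Python 3.5.0.

```python
import gzip
import sys

MAX_SINGLE_READ = 2**32 - 1  # hypothetical name: largest single readinto request


def select_gzip_class(version_info=sys.version_info):
    """Return plain GzipFile except on Python 3.5.0 (illustrative helper)."""
    if tuple(version_info[:3]) != (3, 5, 0):
        return gzip.GzipFile

    class BufferedGzipFile(gzip.GzipFile):
        def readinto(self, buf):
            view = memoryview(buf)
            if len(view) <= MAX_SINGLE_READ:
                # Small requests take the unmodified code path.
                return super().readinto(buf)
            # Large requests are split into bounded chunks.
            n_total = 0
            while n_total < len(view):
                n_read = super().readinto(
                    view[n_total:n_total + MAX_SINGLE_READ])
                if not n_read:  # end of stream
                    break
                n_total += n_read
            return n_total

    return BufferedGzipFile


GzipClass = select_gzip_class()
```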

I tested this locally on my machine, and it worked well. I will temporarily push a change to .travis.yml to test this on other architectures.

effigies · 2015-11-18T14:03:26Z

nibabel/openers.py

+                return super(BufferedGzipFile, self).readinto(buf)
+
+            # This works around a known issue in Python 3.5.
+            # See https://bugs.python.org/issue25626"""


No """ needed.

bcipolli · 2015-11-18T19:11:58Z

ok, updated code again per comments.

effigies · 2015-11-18T19:18:23Z

nibabel/testing/__init__.py

+    def decorator(func):
+        return skipif(test_str not in EXTRA_SET,
+                      "Skip {0} tests.".format(test_str))(func)
+    return decorator


Sorry, I'm going to be annoying and insist on removing the decorator function. What you have is:

def f(x):
    return g(x)
return f

When you could just have return g. Just use:

def runif_extra_has(test_str):
    return skipif(test_str not in EXTRA_SET, "Skip {0} tests.".format(test_str))
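As a runnable sketch of the simplified form, the example below uses unittest.skipIf as a stand-in for the skipif helper in the diff. It assumes NIPY_EXTRA_TESTS holds a comma-separated list; the extra_set parameter is hypothetical and exists only so the demonstration does not depend on the environment.

```python
import os
import unittest

# Assumption: the variable holds a comma-separated list such as "slow,data".
EXTRA_SET = set(os.environ.get('NIPY_EXTRA_TESTS', '').split(','))


def runif_extra_has(test_str, extra_set=None):
    """Skip the decorated test unless ``test_str`` is an enabled extra."""
    enabled = EXTRA_SET if extra_set is None else extra_set
    # skipIf already returns a decorator, so no inner wrapper is required.
    return unittest.skipIf(test_str not in enabled,
                           "Skip {0} tests.".format(test_str))


class Demo(unittest.TestCase):
    @runif_extra_has('slow', extra_set={'slow'})
    def test_enabled(self):
        self.assertTrue(True)

    @runif_extra_has('slow', extra_set=set())
    def test_disabled(self):
        self.fail("should have been skipped")


result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(Demo).run(result)
```

Because the factory returns the decorator directly, it can be applied with the usual @ syntax exactly like the wrapped version.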

effigies · 2015-11-18T19:27:02Z

Thanks for your patience. LGTM.

Squashed for 2.0.2 release. See nipygh-383 for full history.

BF: Use buffered gzip read in Py3.5.0, specifically
TST: Add high-memory usage test for large nifti1 files

Conflicts: nibabel/testing/__init__.py

matthew-brett · 2015-11-19T01:59:45Z

nibabel/tests/test_nifti1.py

+        img.to_filename('test.nii.gz')
+        del img
+        data = load('test.nii.gz').get_data()
+    # Check that te data are all ones


Wow, that's meta. I meant "typo 'te'".

arthurmensch · 2015-11-19T08:44:57Z

I can confirm that HCP is working again on Python 3.5. Cheers !

bcipolli · 2015-11-19T16:16:40Z

👍 @matthew-brett still looking for some details on best place for docs; I think it's the last thing to do.
#383 (comment)

effigies · 2015-11-19T16:27:51Z

nibabel/tests/test_nifti1.py

+                            Nifti1Pair, Nifti1Extension, Nifti1Extensions,
+                            data_type_codes, extension_codes,
+                            slice_order_codes)
+from nibabel.openers import ImageOpener


Unused import.

matthew-brett · 2015-11-19T23:12:14Z

Ben - sorry - are you asking me where to put the docs?

bcipolli · 2015-11-19T23:21:44Z

Yes, I'm not sure where to add the documentation about the new test env variable. "Installation" felt odd to me, as it seems like a developer / release issue, so I wanted to double-check. In "installation", can you give me an idea how you envisioned it fitting in?

matthew-brett · 2015-11-20T00:14:04Z

How about a little section on 'testing' in the installation docs, pointing to a document advanced_testing in doc/source/devel?

bcipolli · 2015-11-20T16:37:41Z

@matthew-brett Not sure if this is what you had in mind...

matthew-brett · 2015-11-20T20:00:31Z

I was thinking of something like: 'To run tests, run nosetests nibabel or python -c "import nibabel; nibabel.test()". See also advanced testing.'

Then advanced testing would be using the git submodules, as in:

git submodule update --init

and the slow tests stuff.

bcipolli · 2015-11-20T20:12:06Z

OK. Here's what it looks like now:

matthew-brett · 2015-11-20T20:14:15Z

Looks good - maybe add that nosetests etc should be run from the terminal rather than python / IPython.

bcipolli · 2015-11-20T20:15:10Z

Done. I think we're ready to go!

matthew-brett · 2015-11-20T20:17:32Z

Great - thanks again for wading through.

MRG: Fix for issue 362: Python 3.5 fails reading large gzipped files

This is not a nibabel bug, but works around a Python bug (https://bugs.python.org/issue25626). The workaround is: we wrap gzip.GzipFile with buffering, so that files > 4GB require multiple calls to GzipFile.readinto. I've also added test functionality: if NIPY_EXTRA_TESTS contains 'slow', slowly running tests can be run. I've used this to include a test for creation of a large file.

matthew-brett · 2015-11-20T20:50:44Z

Running the slow tests on a big Mac buildbot: http://nipy.bic.berkeley.edu/builders/nibabel-py2.7-osx-10.10/builds/27/steps/shell_5/logs/stdio

matthew-brett reviewed Nov 15, 2015

Ben Cipollini and others added 2 commits November 17, 2015 15:03

BF: Fix #362 using buffered gzip read.

5ee2c7c

TST: Add large nifti file test

b061fe9

Ben Cipollini added 3 commits November 17, 2015 15:08

TST: add image tests in large image test.

2e65cf9

TST: Add a runif_extra_has decorator.

38ebe17

DOC: updates for code review.

e49197e

effigies reviewed Nov 18, 2015

Ben Cipollini added 2 commits November 18, 2015 06:08

BF: Only define code change in Py3.5.0; only use if needed.

1c13e69

TST: Fix logic to check test values, use ones instead of zeros (zeros returned data read/copy fails).

50f147c

effigies reviewed Nov 18, 2015

effigies mentioned this pull request Nov 18, 2015

MRG: Release candidate 2.0.2 #377

Merged

matthew-brett reviewed Nov 19, 2015

STY: code changes per code review.

795580f

effigies reviewed Nov 19, 2015

DOC: add documentation of advanced testing.

3123456

matthew-brett merged commit 853f5fb into nipy:master Nov 20, 2015

bcipolli deleted the issue-362 branch February 5, 2016 16:54
