-
Notifications
You must be signed in to change notification settings - Fork 261
Only use indexed_gzip
when explicitly requested
#562
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks for this. It looks good to me on a first pass, but I don't think I'm thinking optimally this morning, so I'll have a more detailed look later, unless Matthew beats me to it. |
The failing Travis tests are related to #556. Don't worry about those. |
Codecov Report
@@ Coverage Diff @@
## master #562 +/- ##
==========================================
- Coverage 94.33% 94.32% -0.02%
==========================================
Files 177 177
Lines 24680 24689 +9
Branches 2635 2638 +3
==========================================
+ Hits 23283 23288 +5
- Misses 921 925 +4
Partials 476 476
Continue to review full report at Codecov.
|
In light of a discussion over at #557, I've also added a check to the present version of |
nibabel/openers.py
Outdated
|
||
if StrictVersion(version) < StrictVersion("0.6.0"): | ||
raise ImportError('indexed_gzip is present, but too old ' | ||
'(>= 0.6.0 required): {})'.format(version)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This warning won't actually be presented to the user due to the except
. I'd do:
try:
from indexed_gzip import SafeIndexedGzip
HAVE_INDEXED_GZIP = True
except ImportError:
HAVE_INDEXED_GZIP = False
else:
from indexed_gzip import __version__ as igzip_version
if StrictVersion(igzip_version) < StrictVersion("0.6.0"):
warnings.warn(...)
HAVE_INDEXED_GZIP = False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. One question and a minor suggestion.
nibabel/openers.py
Outdated
@@ -117,6 +130,9 @@ class Opener(object): | |||
default_compresslevel = 1 | |||
#: whether to ignore case looking for compression extensions | |||
compress_ext_icase = True | |||
#: hint which tells us whether the file handle will be kept open for | |||
# multiple reads/writes, or just for one-time access. | |||
default_keep_open = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this class variable intended to be changed by a power user? Under what circumstances might I want to change that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not actually sure - I was just going with the convention set from the other kwargs. I can't really think of a use-case for changing it, so will change it default it to False
in __init__
.
nibabel/openers.py
Outdated
# Default keep_open hint | ||
if 'keep_open' in arg_names: | ||
if 'keep_open' not in kwargs: | ||
kwargs['keep_open'] = self.default_keep_open |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can replace the internal if
block with:
kwargs.setdefault('keep_open', self.default_keep_open)
(True, {'mode' : 'rb', 'keep_open' : False}, GzipFile), | ||
(True, {'mode' : 'wb', 'keep_open' : True}, GzipFile), | ||
(True, {'mode' : 'wb', 'keep_open' : False}, GzipFile), | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice.
Since 0.6.1 was released, now getting issues with this line: nibabel/nibabel/tests/test_openers.py Line 241 in dbe74e1
Should probably make it |
Aah true - I changed the inheritance hierarchy so that You were testing against the master branch, right? I didn't pick this up on the PR branch, because I've been quite busy this week, so still have a bit of work to do on the benchmarking code (and have had to make a few changes to other benchmark modules to get them to run). But I should be able to finalise the outstanding issues before the end of the week. |
Ok, sorry for the delay ... I've knocked up a benchmarking script which compares
Each of these are compared against the time taken to perform an equivalent In its current form the script takes about 30 minutes to run. Some representative results are as follows:
I found it interesting that slices of the form The last commit in this push contains some changes that I had to make to get all of the existing |
Yeah, I would see if upping your buffer size can improve performance a bit. On my system, the default is 8KiB, while we set (when possible) the GzipFile max read chunk at 100MiB. |
Although that fix is only applied to python < 3.4 - in 3.5 and above, the Here are results after changing gzip_file = SafeIndexedGzipFile(filename,
readbuf_size=GZIP_MAX_READ_CHUNK,
buffer_size=GZIP_MAX_READ_CHUNK)
It's made volume access slower! I guess because way more data is being read than necessary. |
Hmm. Okay. Still, failing back to standard GzipFile speeds isn't bad. As long as we're not significantly under-performing vanilla gzip for common operations, I'm okay with "This only improves some operations." |
Well, it does improve performance on all of the tested slices, just not as much as I would like :). It's not shown directly in the benchmark output, but the most interesting comparison is between
The biggest gain is for volume access (x18 speed-up, what I originally wrote I'll try and put aside a few days over Christmas to do some more digging, and see if I can get some bigger speed-ups. |
Well I am going to have to eat my words there - no change in results with more iterations - these results were with 200 iterations, on a shape of
But if I run the benchmark on a bigger image (50 iterations,
Performance for the first slice type |
I'm good with this if you want to merge or rebase to master to fix the tests. |
multiple accesses, so Opener knows whether it should use indexed_gzip or not.
…dGzipFile classes as appropriate
…ror to a warning, so it will be shown to users.
…is hard coded in __init__
bc7b6f7
to
cc724ef
Compare
Thanks for the quick work, @pauldmccarthy. |
This PR aims to address issue #558 - it should have been a part of PR #552
For small files, the current version of
indexed_gzip
is much slower to use than the built-inGzipFile
class. Therefore,indexed_gzip
should not be used unless it is requested (via thekeep_file_open
flag, which is interpreted by thenibabel.arrayproxy.ArrayProxy
class). In all other circumstances, the built-ingzip.GzipFile
class should be used instead.I have made this change by having the
ArrayProxy
pass a "hint" to theOpener
class, indicating whether the file handle is going to be kept open for multiple accesses, or whether it is just for a one-time access. TheOpener
class (actually, thenibabel.openers._gzip_open
function) then decides whether or not to useindexed_gzip
.