-
Notifications
You must be signed in to change notification settings - Fork 261
ENH: Create gzip header deterministically by default #1024
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@yarikoptic @ltetrel Would either of you care to review? |
Codecov Report
@@ Coverage Diff @@
## master #1024 +/- ##
==========================================
- Coverage 92.26% 92.10% -0.16%
==========================================
Files 100 100
Lines 12201 12205 +4
Branches 2134 2136 +2
==========================================
- Hits 11257 11242 -15
- Misses 616 630 +14
- Partials 328 333 +5
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thank you @effigies !!!
Thanks for the review. |
This is great! Thank you for being proactive about squashing the indeterminism/data leakage. |
As discussed in #1023, we currently allow
gzip.GzipFile
to set the gzip header as it will, including filename and mtime fields. This means that files with identical unzipped contents and compression levels have different gzipped representations, which is not ideal for verifying sources of non-determinism in a complex workflow.This PR attempts to balance sensible default behaviors with user control. It changes the default
mtime
fromtime.time()
to0
, while allowing users to set themtime
as an argument toOpener()
.It removes the behavior of setting the original filename in the gzip header. In a medical imaging context, the original filenames may include PHI (e.g., subject name, scan date, etc) that will follow the file through renames unless the file is rewritten entirely. A flag to preserve the old behavior does not seem worth the effort.
Fixes #1023.