Setting gzip mtime to zero on image save #1023
Comments
this is odd -- I thought nibabel already did timestamp annihilation (now I have difficulty quickly finding the project(s) where we had done that in the past, but I know we did! ;)). I think Didn't even think about filename - indeed should not be in the header regardless of the user's desires IMHO ;-)
I'm fine with 2, though in the short term I think it would be a parameter to
The only advantage I see of keeping the
In principle, we could set
and then someone doing "data carving" @ltetrel suggested overwrites all different files with a single "last winner" ;) IMHO not worth it.
So I was reading the gzip spec, and the last 8 bytes can actually be used to check data integrity (CRC-32 code followed by size of the object).
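To make the trailer layout concrete, here is a minimal sketch using only the standard library. It assumes a single-member gzip stream; the helper name `read_gzip_trailer` is made up for illustration and is not part of nibabel or the `gzip` module.

```python
import gzip
import struct
import zlib


def read_gzip_trailer(blob):
    """Return (crc32, isize) from the last 8 bytes of a single-member gzip stream.

    Both fields are little-endian unsigned 32-bit integers: the CRC-32 of the
    uncompressed data, then the uncompressed size modulo 2**32.
    """
    return struct.unpack("<II", blob[-8:])


payload = b"example image bytes" * 100
blob = gzip.compress(payload, mtime=0)  # mtime argument needs Python 3.8+

crc32, isize = read_gzip_trailer(blob)
crc_matches = crc32 == zlib.crc32(payload)
size_matches = isize == (len(payload) & 0xFFFFFFFF)
```

Note that this only compares against a payload you already have in hand; checking a file on disk still requires decompressing it to recompute the CRC.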
Does the Python library support adding that data integrity check? If it does -- I guess it should not hurt, but worth checking if it doesn't ;)
I checked and the
The spec hasn't changed in 25 years...
I may be misreading, but the

```python
def _read_eof(self):
    # We've read to the end of the file
    # We check that the computed CRC and size of the
    # uncompressed data matches the stored values.  Note that the size
    # stored is the true file size mod 2**32.
    crc32, isize = struct.unpack("<II", self._read_exact(8))
    if crc32 != self._crc:
        raise BadGzipFile("CRC check failed %s != %s" % (hex(crc32),
                                                         hex(self._crc)))
    elif isize != (self._stream_size & 0xffffffff):
        raise BadGzipFile("Incorrect length of data produced")
```
@effigies Oh yes, but I mean it is private, not public, so we cannot call this method directly. But again, it is simple to just copy it.
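Ah, right. You can get it easily with:

```python
from gzip import BadGzipFile  # available since Python 3.8

# gzipfile is an already-open gzip.GzipFile; reading to the end
# triggers the CRC and size check against the trailer.
try:
    _ = gzipfile.read()
    valid_crc = True
except BadGzipFile:
    valid_crc = False
```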
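For anyone wanting to try this end to end, a self-contained sketch follows. The function name `gzip_payload_is_intact` is hypothetical, and the corruption step (flipping bits in the first byte of the stored CRC) is just one way to provoke the failure.

```python
import gzip
import io


def gzip_payload_is_intact(blob):
    """Decompress fully; GzipFile raises BadGzipFile on a CRC or length mismatch."""
    try:
        with gzip.GzipFile(fileobj=io.BytesIO(blob)) as f:
            f.read()
    except gzip.BadGzipFile:
        return False
    return True


good = gzip.compress(b"voxel data" * 1000, mtime=0)

# Damage the stored CRC-32 (the first 4 of the last 8 bytes).
bad = bytearray(good)
bad[-8] ^= 0xFF
bad = bytes(bad)

good_ok = gzip_payload_is_intact(good)
bad_ok = gzip_payload_is_intact(bad)
```

This relies on the public `read()` path only, so there is no need to copy the private `_read_eof` method.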
Right now `ImageOpener()` does not allow us to pass an `mtime`, which means that saving identical images as `.nii.gz` 1s apart will produce two images with two different checksums.

Is there any value in saving a non-zero mtime? I see a few options, in my order of preference, most preferred first:

Note that the way we use `GzipFile`, it also sets the filename, which means saving the `a.nii.gz` and `b.nii.gz` with the same timestamp would still not have identical checksums. I would also be inclined to open GzipFiles in such a way as to not set filenames.
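As a rough illustration of the behaviour being asked for, the sketch below uses the standard-library `gzip.GzipFile` directly rather than nibabel's `ImageOpener`. The helper `write_deterministic_gz` is a made-up name, and the payload is a stand-in for real image bytes. Passing `filename=""` keeps the file name out of the gzip header, and `mtime=0` zeroes the timestamp field.

```python
import gzip
import hashlib
import os
import tempfile


def write_deterministic_gz(path, payload):
    """Write payload as gzip with no stored filename and a zero mtime."""
    with open(path, "wb") as raw:
        # filename="" stops GzipFile from copying raw.name into the header.
        with gzip.GzipFile(filename="", mode="wb", fileobj=raw, mtime=0) as gz:
            gz.write(payload)


payload = b"\x00\x01\x02\x03" * 4096  # stand-in for image data

digests = []
with tempfile.TemporaryDirectory() as tmpdir:
    for name in ("a.nii.gz", "b.nii.gz"):
        path = os.path.join(tmpdir, name)
        write_deterministic_gz(path, payload)
        with open(path, "rb") as f:
            digests.append(hashlib.sha256(f.read()).hexdigest())

same_checksum = digests[0] == digests[1]
```

With both fields neutralised, the two differently named files should be byte-identical, so their checksums match regardless of when they were written.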