-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
gh-103477: Write gzip trailer with zlib #112199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
fb44649
to
d17848d
Compare
f86cb37
to
f110d56
Compare
f110d56
to
8732a79
Compare
Rebased. @gpshead @rhpvorderman Could you please have a look? |
I am not a core developer so I cannot approve anything. It looks good to me though. A quite non-invasive change with massive performance benefits. Is it really 40%? I thought calculating the crc is only something like 10% of the total cost, but I may be wrong here. |
Thanks for having a look. For the pure software implementation I would also put it at 10%. But on s390x we have hardware-accelerated deflate (with a compression speedup factor of up to 100x, you can see the performance figures here: https://github.com/iii-i/zlib-ng/wiki/Performance-with-dfltcc-patch-applied-and-dfltcc-support-built-on-dfltcc-enabled-machine), so computing CRC in software stands out more. |
RHEL, SLES and Ubuntu for IBM zSystems (aka s390x) ship with a zlib optimization [1] that significantly improves deflate performance by using a specialized CPU instruction. This instruction not only compresses the data, but also computes a checksum. At the moment Pyhton's gzip support performs compression and checksum calculation separately, which creates unnecessary overhead. The reason is that Python needs to write specific values into gzip header, so it uses a raw stream instead of a gzip stream, and zlib does not compute a checksum for raw streams. The challenge with using gzip streams instead of zlib streams is dealing with zlib-generated gzip header, which we need to rather generate manually. Implement the method proposed by @rhpvorderman: use Z_BLOCK on the first deflate() call in order to stop before the first deflate block is emitted. The data that is emitted up until this point is zlib-generated gzip header, which should be discarded. Expose this new functionality by adding a boolean gzip_trailer argument to zlib.compress() and zlib.compressobj(). Make use of it in gzip.compress(), GzipFile and TarFile. The performance improvement varies depending on data being compressed, but it's in the ballpark of 40%. An alternative approach is to use the deflateSetHeader() function, introduced in zlib v1.2.2.1 (2011). This also works, but the change was deemed too intrusive [2]. 📜🤖 Added by blurb_it. [1] madler/zlib#410 [2] python#103478
8732a79
to
3be4530
Compare
Rebased. |
RHEL, SLES and Ubuntu for IBM zSystems (aka s390x) ship with a zlib
optimization [1] that significantly improves deflate performance by
using a specialized CPU instruction.
This instruction not only compresses the data, but also computes a
checksum. At the moment Pyhton's gzip support performs compression and
checksum calculation separately, which creates unnecessary overhead.
The reason is that Python needs to write specific values into gzip
header, so it uses a raw stream instead of a gzip stream, and zlib
does not compute a checksum for raw streams.
The challenge with using gzip streams instead of zlib streams is
dealing with zlib-generated gzip header, which we need to rather
generate manually. Implement the method proposed by @rhpvorderman: use
Z_BLOCK on the first deflate() call in order to stop before the first
deflate block is emitted. The data that is emitted up until this point
is zlib-generated gzip header, which should be discarded.
Expose this new functionality by adding a boolean gzip_trailer argument
to zlib.compress() and zlib.compressobj(). Make use of it in
gzip.compress(), GzipFile and TarFile. The performance improvement
varies depending on data being compressed, but it's in the ballpark of
40%.
An alternative approach is to use the deflateSetHeader() function,
introduced in zlib v1.2.2.1 (2011). This also works, but the change
was deemed too intrusive [2].
📜🤖 Added by blurb_it.
[1] madler/zlib#410
[2] #103478