Skip to content

gh-103477: Write gzip trailer with zlib #112199

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

iii-i
Copy link
Contributor

@iii-i iii-i commented Nov 17, 2023

RHEL, SLES and Ubuntu for IBM zSystems (aka s390x) ship with a zlib
optimization [1] that significantly improves deflate performance by
using a specialized CPU instruction.

This instruction not only compresses the data, but also computes a
checksum. At the moment Pyhton's gzip support performs compression and
checksum calculation separately, which creates unnecessary overhead.
The reason is that Python needs to write specific values into gzip
header, so it uses a raw stream instead of a gzip stream, and zlib
does not compute a checksum for raw streams.

The challenge with using gzip streams instead of zlib streams is
dealing with zlib-generated gzip header, which we need to rather
generate manually. Implement the method proposed by @rhpvorderman: use
Z_BLOCK on the first deflate() call in order to stop before the first
deflate block is emitted. The data that is emitted up until this point
is zlib-generated gzip header, which should be discarded.

Expose this new functionality by adding a boolean gzip_trailer argument
to zlib.compress() and zlib.compressobj(). Make use of it in
gzip.compress(), GzipFile and TarFile. The performance improvement
varies depending on data being compressed, but it's in the ballpark of
40%.

An alternative approach is to use the deflateSetHeader() function,
introduced in zlib v1.2.2.1 (2011). This also works, but the change
was deemed too intrusive [2].

📜🤖 Added by blurb_it.

[1] madler/zlib#410
[2] #103478

@iii-i
Copy link
Contributor Author

iii-i commented Jan 29, 2024

Rebased.

@gpshead @rhpvorderman Could you please have a look?

@rhpvorderman
Copy link
Contributor

I am not a core developer so I cannot approve anything. It looks good to me though. A quite non-invasive change with massive performance benefits. Is it really 40%? I thought calculating the crc is only something like 10% of the total cost, but I may be wrong here.

@iii-i
Copy link
Contributor Author

iii-i commented Jan 30, 2024

Thanks for having a look. For the pure software implementation I would also put it at 10%. But on s390x we have hardware-accelerated deflate (with a compression speedup factor of up to 100x, you can see the performance figures here: https://github.com/iii-i/zlib-ng/wiki/Performance-with-dfltcc-patch-applied-and-dfltcc-support-built-on-dfltcc-enabled-machine), so computing CRC in software stands out more.

RHEL, SLES and Ubuntu for IBM zSystems (aka s390x) ship with a zlib
optimization [1] that significantly improves deflate performance by
using a specialized CPU instruction.

This instruction not only compresses the data, but also computes a
checksum. At the moment Pyhton's gzip support performs compression and
checksum calculation separately, which creates unnecessary overhead.
The reason is that Python needs to write specific values into gzip
header, so it uses a raw stream instead of a gzip stream, and zlib
does not compute a checksum for raw streams.

The challenge with using gzip streams instead of zlib streams is
dealing with zlib-generated gzip header, which we need to rather
generate manually. Implement the method proposed by @rhpvorderman: use
Z_BLOCK on the first deflate() call in order to stop before the first
deflate block is emitted. The data that is emitted up until this point
is zlib-generated gzip header, which should be discarded.

Expose this new functionality by adding a boolean gzip_trailer argument
to zlib.compress() and zlib.compressobj(). Make use of it in
gzip.compress(), GzipFile and TarFile. The performance improvement
varies depending on data being compressed, but it's in the ballpark of
40%.

An alternative approach is to use the deflateSetHeader() function,
introduced in zlib v1.2.2.1 (2011). This also works, but the change
was deemed too intrusive [2].

📜🤖 Added by blurb_it.

[1] madler/zlib#410
[2] python#103478
@iii-i
Copy link
Contributor Author

iii-i commented Feb 28, 2024

Rebased.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants