JukkaL commented on Nov 17, 2025

Also generally enable SSE4.2 instructions when targeting x86-64. These have been supported by hardware since ~2010, so it seems fine to require them now.
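As a rough illustration (GCC/Clang-specific, not code from this PR): passing `-msse4.2` when targeting x86-64 makes the compiler predefine `__SSE4_2__`, along with the macros for the lower levels it implies, so SIMD code guarded by these macros is compiled in unconditionally. The hypothetical helper below reports the highest level enabled for the translation unit it is compiled in.

```c
/* Hypothetical helper: reports the highest x86 SIMD level the compiler was
 * allowed to use for this translation unit, based on the macros GCC and
 * Clang predefine for -msse4.2, -mavx2, -march=..., and so on. */
static const char *compile_time_simd_level(void) {
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";  /* what -msse4.2 alone enables */
#elif defined(__SSE4_1__)
    return "SSE4.1";
#elif defined(__SSSE3__)
    return "SSSE3";
#elif defined(__SSE2__)
    return "SSE2";    /* guaranteed by the plain x86-64 baseline */
#else
    return "none";    /* non-x86 target, or a compiler that doesn't define these */
#endif
}
```

Built with `gcc -msse4.2` on x86-64 this should report `SSE4.2`. Without extra flags the result depends on the toolchain's default `-march` (plain `-march=x86-64` only guarantees SSE2).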

This speeds up b64encode by up to 100% on Linux running on a recent AMD CPU.

Some fairly recent hardware doesn't support AVX2, so it's not enabled. Supporting AVX2 would probably require runtime hardware capability checking, plus compiling different files with different architecture flags, and I didn't want to go there (at least not yet).
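The runtime capability checking mentioned above can be sketched roughly as follows. This is a minimal, hypothetical example, not mypyc's or the upstream library's code: `cpu_has_avx2()` uses the GCC/Clang builtin `__builtin_cpu_supports`, and `b64encode_avx2` is a placeholder that in a real build would live in a separate file compiled with `-mavx2`. Here it just forwards to the scalar encoder so the sketch compiles and runs anywhere.

```c
#include <stddef.h>
#include <string.h>

typedef size_t (*b64_encode_fn)(const unsigned char *src, size_t len, char *out);

static const char B64_ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Portable scalar encoder. `out` needs room for 4 * ((len + 2) / 3) + 1 bytes;
 * returns the number of characters written (excluding the trailing NUL). */
static size_t b64encode_scalar(const unsigned char *src, size_t len, char *out) {
    size_t i = 0, o = 0;
    for (; i + 3 <= len; i += 3) {
        unsigned v = ((unsigned)src[i] << 16) | ((unsigned)src[i + 1] << 8) | src[i + 2];
        out[o++] = B64_ALPHABET[(v >> 18) & 63];
        out[o++] = B64_ALPHABET[(v >> 12) & 63];
        out[o++] = B64_ALPHABET[(v >> 6) & 63];
        out[o++] = B64_ALPHABET[v & 63];
    }
    if (i < len) { /* one or two leftover bytes: pad with '=' */
        unsigned v = (unsigned)src[i] << 16;
        if (i + 1 < len) v |= (unsigned)src[i + 1] << 8;
        out[o++] = B64_ALPHABET[(v >> 18) & 63];
        out[o++] = B64_ALPHABET[(v >> 12) & 63];
        out[o++] = (i + 1 < len) ? B64_ALPHABET[(v >> 6) & 63] : '=';
        out[o++] = '=';
    }
    out[o] = '\0';
    return o;
}

/* Hypothetical AVX2 encoder: really it would be a separate .c file built with
 * -mavx2. Here it forwards to the scalar code so the sketch runs anywhere. */
static size_t b64encode_avx2(const unsigned char *src, size_t len, char *out) {
    return b64encode_scalar(src, len, out);
}

/* Runtime check: does the CPU we're actually running on support AVX2? */
static int cpu_has_avx2(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return 0; /* other compilers/architectures: always use the portable path */
#endif
}

/* Choose the encoder once, on first use. (Not thread-safe; a real
 * implementation would do this during module initialization.) */
static b64_encode_fn get_encoder(void) {
    static b64_encode_fn chosen = NULL;
    if (chosen == NULL)
        chosen = cpu_has_avx2() ? b64encode_avx2 : b64encode_scalar;
    return chosen;
}
```

Once the choice is made, each call costs one indirect call. The catch is the one noted above: every SIMD variant must be compiled with its own architecture flags, while the dispatching code must not be.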

JukkaL merged commit 1b6ebb1 into master on Nov 17, 2025
14 checks passed
JukkaL deleted the mypyc-base64-3 branch on November 17, 2025 15:39
JukkaL pushed a commit that referenced this pull request Nov 28, 2025
…uild flags (#20253)

Fixes the current SSE4.2 requirement added in 1b6ebb1 / #20244.

This PR fully enables the existing x86-64 CPU detection and dispatch
code for SSSE3, SSE4.1, SSE4.2, AVX, and AVX2 in the base64 module.
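Dispatch of this kind checks CPU features in order of preference and uses the first codec the running CPU supports. A rough sketch of that ordering, assuming GCC or Clang on x86 (the function and the returned codec names are hypothetical, not the upstream API). Note that `__builtin_cpu_supports` takes a string literal, hence one explicit call per level.

```c
/* Hypothetical: returns the name of the best codec the running CPU supports,
 * checking from most to least capable; the first match wins. */
static const char *select_codec(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))   return "avx2";
    if (__builtin_cpu_supports("avx"))    return "avx";
    if (__builtin_cpu_supports("sse4.2")) return "sse42";
    if (__builtin_cpu_supports("sse4.1")) return "sse41";
    if (__builtin_cpu_supports("ssse3"))  return "ssse3";
    /* AVX512 is intentionally not probed. */
#endif
    return "plain"; /* portable fallback */
}
```

Leaving AVX512 out of the probe list matches the PR's decision not to enable that path yet.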

To use the existing CPU dispatch from the [upstream base64
code](https://github.com/aklomp/base64), one needs to compile the
sources in each of the CPU-specific codec directories with a specific
compiler flag. Alas, this is difficult to do with setuptools, but I found
a solution inspired by https://stackoverflow.com/a/68508804.
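Why the per-directory flags matter: intrinsics such as SSSE3's `_mm_shuffle_epi8` (the `pshufb` byte shuffle, the kind of instruction SIMD base64 codecs use to rearrange bytes) are only usable in a translation unit compiled with the matching flag. A hypothetical single-file sketch of that pattern follows; `reverse16` is purely illustrative. Compiled with `-mssse3` (or a higher level), `__SSSE3__` is defined and the SIMD branch is used; otherwise the scalar branch is.

```c
#if defined(__SSSE3__)
#include <tmmintrin.h> /* SSSE3 intrinsics, e.g. _mm_shuffle_epi8 (pshufb) */
#endif

/* Reverse a 16-byte block. With -mssse3 this is a single pshufb; without
 * the flag, __SSSE3__ is undefined and the portable loop is compiled. */
static void reverse16(const unsigned char *src, unsigned char *dst) {
#if defined(__SSSE3__)
    /* Output byte i takes input byte idx[i]. */
    const __m128i idx = _mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8,
                                      7, 6, 5, 4, 3, 2, 1, 0);
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    _mm_storeu_si128((__m128i *)dst, _mm_shuffle_epi8(v, idx));
#else
    for (int i = 0; i < 16; i++)
        dst[i] = src[15 - i];
#endif
}
```

The dispatching code, by contrast, should be built without these flags; otherwise the compiler may emit SSSE3 (or newer) instructions in code that runs before the CPU check.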

Note that I did not enable the AVX512 path in this PR, as many Intel
CPUs that support AVX512 can take a performance hit if AVX512 is used
only sporadically; the performance of the AVX512 (encoding) path needs to
be evaluated in the context of how mypyc uses base64 in various
realistic scenarios. (There is no AVX512-accelerated decoding path in
the upstream base64 codebase; it falls back to the AVX2 decoder.)

If there are additional performance concerns, I suggest benchmarking
with the OpenMP feature of base64 turned on, for multi-core
processing.
p-sawicki pushed a commit that referenced this pull request Nov 28, 2025