Skip to content

Conversation

methane
Copy link
Member

@methane methane commented May 10, 2025

@methane methane requested a review from markshannon as a code owner May 10, 2025 08:22
@methane methane added type-bug An unexpected behavior, bug, or error needs backport to 3.13 bugs and security fixes needs backport to 3.14 bugs and security fixes labels May 10, 2025
@methane methane force-pushed the fix-dict-keysize branch from a573d9e to df1e204 Compare May 10, 2025 14:00
@angela-tarantula
Copy link
Contributor

angela-tarantula commented May 10, 2025

Good PR. Since this code is a hotspot where every instruction matters, I found a way to shave off a few more instructions. This doesn't have to block the PR, but it's something we could consider in a follow-up optimization.

_Py_bit_length() doesn't need to check if (x != 0) in this context.

Look at the GCC/Clang path:

_Py_bit_length(unsigned long x)
{
#if (defined(__clang__) || defined(__GNUC__))
if (x != 0) {
// __builtin_clzl() is available since GCC 3.4.
// Undefined behavior for x == 0.
return (int)sizeof(unsigned long) * 8 - __builtin_clzl(x);
}
else {
return 0;
}
#elif defined(_MSC_VER)

We don't need to check if (x != 0) because x == Py_MAX(minsize, PyDict_MINSIZE) - 1 is guaranteed nonzero, so we end up paying a compare+branch there for no reason. Likewise, we don't need to check if (_BitScanReverse(&msb, x)) in the MSVC branch:

#elif defined(_MSC_VER)
// _BitScanReverse() is documented to search 32 bits.
Py_BUILD_ASSERT(sizeof(unsigned long) <= 4);
unsigned long msb;
if (_BitScanReverse(&msb, x)) {
return (int)msb + 1;
}
else {
return 0;
}
#else

If the tradeoff is worth it, we can create a _Py_bit_length_nonzero() that is optimized for the guarantee that x > 0. With your go-ahead, I could file an issue for that. I only bring this up because you suggested earlier that Py_MAX()'s conditional logic almost made it inferior to bitwise operations simply due to the overhead of extra instructions on arm64 and amd64.

@methane
Copy link
Member Author

methane commented May 11, 2025

_Py_bit_length is inline function. compiler optimize away unnecessary 0 comparison.

@methane
Copy link
Member Author

methane commented May 11, 2025

I had suggested avoiding Py_MAX because it can not be eliminated statically.

@methane methane merged commit 92337f6 into python:main May 11, 2025
39 checks passed
@methane methane deleted the fix-dict-keysize branch May 11, 2025 05:44
@miss-islington-app
Copy link

Thanks @methane for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13, 3.14.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request May 11, 2025
@miss-islington-app
Copy link

Sorry, @methane, I could not cleanly backport this to 3.13 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 92337f666e8a076a68305a8d6dc8bc9c095000e9 3.13

@bedevere-app
Copy link

bedevere-app bot commented May 11, 2025

GH-133861 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.14 bugs and security fixes label May 11, 2025
@bedevere-app
Copy link

bedevere-app bot commented May 11, 2025

GH-133862 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label May 11, 2025
methane added a commit that referenced this pull request May 11, 2025
methane added a commit that referenced this pull request May 11, 2025
(cherry picked from commit 92337f6)
Co-authored-by: Inada Naoki <[email protected]>
@bedevere-bot
Copy link

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot PPC64LE RHEL8 3.14 (tier-2) has failed when building commit 5c9f0ae.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/#/builders/1688/builds/8) and take a look at the build logs.
  4. Check if the failure is related to this commit (5c9f0ae) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/#/builders/1688/builds/8

Failed tests:

  • test.test_multiprocessing_spawn.test_processes

Failed subtests:

  • test_interrupt - test.test_multiprocessing_spawn.test_processes.WithProcessesTestProcess.test_interrupt

Summary of the results of the build (if available):

==

Click to see traceback logs
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.14.cstratak-RHEL8-ppc64le/build/Lib/test/_test_multiprocessing.py", line 578, in test_interrupt
    self.assertEqual(exitcode, 1)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
AssertionError: -2 != 1


Traceback (most recent call last):
  File "<string>", line 1, in <module>
    from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=5, pipe_handle=9)
                                                  ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.14.cstratak-RHEL8-ppc64le/build/Lib/multiprocessing/spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/home/buildbot/buildarea/3.14.cstratak-RHEL8-ppc64le/build/Lib/multiprocessing/spawn.py", line 132, in _main
    self = reduction.pickle.load(from_parent)
  File "/home/buildbot/buildarea/3.14.cstratak-RHEL8-ppc64le/build/Lib/test/test_multiprocessing_spawn/test_processes.py", line 2, in <module>
    from test._test_multiprocessing import install_tests_in_module_dict
  File "/home/buildbot/buildarea/3.14.cstratak-RHEL8-ppc64le/build/Lib/test/_test_multiprocessing.py", line 2766, in <module>
    class SayWhenError(ValueError): pass
KeyboardInterrupt
FAIL


Traceback (most recent call last):
  File "<string>", line 1, in <module>
    from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=6, pipe_handle=8)
                                                  ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.14.cstratak-RHEL8-ppc64le/build/Lib/multiprocessing/spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/home/buildbot/buildarea/3.14.cstratak-RHEL8-ppc64le/build/Lib/multiprocessing/spawn.py", line 132, in _main
    self = reduction.pickle.load(from_parent)
  File "/home/buildbot/buildarea/3.14.cstratak-RHEL8-ppc64le/build/Lib/test/test_multiprocessing_spawn/test_processes.py", line 2, in <module>
    from test._test_multiprocessing import install_tests_in_module_dict
  File "<frozen importlib._bootstrap>", line 1371, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1342, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 938, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 758, in exec_module
  File "<frozen importlib._bootstrap_external>", line 891, in get_code
  File "<frozen importlib._bootstrap_external>", line 514, in _compile_bytecode
KeyboardInterrupt
FAIL

@angela-tarantula
Copy link
Contributor

I had suggested avoiding Py_MAX because it can not be eliminated statically.

@methane Sorry for misunderstanding, I just couldn't find the comment anymore where you said that. I think it's gone.

_Py_bit_length is inline function. compiler optimize away unnecessary 0 comparison.

You're right about most modern compilers, but not all of them. I tested the no-zero-check concept on MSVC x86, a Tier 1 platform, and it does actually shave a few assembly instructions.

Notice the conditional je is present after the bsr:

_calculate_log2_keysize PROC
        mov     eax, 8
        cmp     DWORD PTR _minsize$[esp-4], eax
        cmovg   eax, DWORD PTR _minsize$[esp-4]
        dec     eax
        bsr     eax, eax
        je      SHORT $LN4@calculate_
        inc     eax
        ret     0
$LN4@calculate_:
        xor     al, al
        ret     0
_calculate_log2_keysize ENDP

But not here:

_calculate_log2_keysize_nonzero PROC
        mov     ecx, 8
        cmp     DWORD PTR _minsize$[esp-4], ecx
        cmovg   ecx, DWORD PTR _minsize$[esp-4]
        dec     ecx
        bsr     eax, ecx
        inc     al
        ret     0
_calculate_log2_keysize_nonzero ENDP

You can see it for yourself here. Yeah, it's "only" a je + unused branch, but it's unexpectedly running on every dictresize on 32-bit Windows for no reason, so I thought I'd tell you.

@methane
Copy link
Member Author

methane commented May 12, 2025

You're right about most modern compilers, but not all of them. I tested the no-zero-check concept on MSVC x86, a Tier 1 platform, and it does actually shave a few assembly instructions.

We don't use _Py_bit_length on MSVC. We have nonzero specialization for it already.

@methane
Copy link
Member Author

methane commented May 12, 2025

You're right about most modern compilers, but not all of them. I tested the no-zero-check concept on MSVC x86, a Tier 1 platform, and it does actually shave a few assembly instructions.

We don't use _Py_bit_length on MSVC. We have nonzero specialization for it already.

Sorry, I was wrong. We use _Py_bit_length on x86-32.
But I don't feel enough attraction of it. Tier 1 doesn't mean 100% effort for optimization and dictresize is not hot as dict lookup.

@angela-tarantula
Copy link
Contributor

That's understandable.

Pranjal095 pushed a commit to Pranjal095/cpython that referenced this pull request Jul 12, 2025
taegyunkim pushed a commit to taegyunkim/cpython that referenced this pull request Aug 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants