Skip to content

Commit 95c9c2b

Browse files
gh-93065: Fix HAMT to iterate correctly over 7-level deep trees (GH-93066) (#93147)
Also while there, clarify a few things about why we reduce the hash to 32 bits. Co-authored-by: Eli Libman <[email protected]> Co-authored-by: Yury Selivanov <[email protected]> Co-authored-by: Łukasz Langa <[email protected]> (cherry picked from commit c1f5c90)
1 parent a43f4e7 commit 95c9c2b

File tree

5 files changed

+65
-4
lines changed

5 files changed

+65
-4
lines changed

Include/internal/pycore_hamt.h

Lines changed: 13 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,19 @@
55
# error "this header requires Py_BUILD_CORE define"
66
#endif
77

8-
#define _Py_HAMT_MAX_TREE_DEPTH 7
8+
9+
/*
10+
HAMT tree is shaped by hashes of keys. Every group of 5 bits of a hash denotes
11+
the exact position of the key in one level of the tree. Since we're using
12+
32 bit hashes, we can have at most 7 such levels. Although if there are
13+
two distinct keys with equal hashes, they will have to occupy the same
14+
cell in the 7th level of the tree -- so we'd put them in a "collision" node.
15+
Which brings the total possible tree depth to 8. Read more about the actual
16+
layout of the HAMT tree in `hamt.c`.
17+
18+
This constant is used to define a datastucture for storing iteration state.
19+
*/
20+
#define _Py_HAMT_MAX_TREE_DEPTH 8
921

1022

1123
#define PyHamt_Check(o) Py_IS_TYPE(o, &_PyHamt_Type)

Lib/test/test_context.py

Lines changed: 35 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -533,6 +533,41 @@ def test_hamt_collision_1(self):
533533
self.assertEqual(len(h4), 2)
534534
self.assertEqual(len(h5), 3)
535535

536+
def test_hamt_collision_3(self):
537+
# Test that iteration works with the deepest tree possible.
538+
# https://github.com/python/cpython/issues/93065
539+
540+
C = HashKey(0b10000000_00000000_00000000_00000000, 'C')
541+
D = HashKey(0b10000000_00000000_00000000_00000000, 'D')
542+
543+
E = HashKey(0b00000000_00000000_00000000_00000000, 'E')
544+
545+
h = hamt()
546+
h = h.set(C, 'C')
547+
h = h.set(D, 'D')
548+
h = h.set(E, 'E')
549+
550+
# BitmapNode(size=2 count=1 bitmap=0b1):
551+
# NULL:
552+
# BitmapNode(size=2 count=1 bitmap=0b1):
553+
# NULL:
554+
# BitmapNode(size=2 count=1 bitmap=0b1):
555+
# NULL:
556+
# BitmapNode(size=2 count=1 bitmap=0b1):
557+
# NULL:
558+
# BitmapNode(size=2 count=1 bitmap=0b1):
559+
# NULL:
560+
# BitmapNode(size=2 count=1 bitmap=0b1):
561+
# NULL:
562+
# BitmapNode(size=4 count=2 bitmap=0b101):
563+
# <Key name:E hash:0>: 'E'
564+
# NULL:
565+
# CollisionNode(size=4 id=0x107a24520):
566+
# <Key name:C hash:2147483648>: 'C'
567+
# <Key name:D hash:2147483648>: 'D'
568+
569+
self.assertEqual({k.name for k in h.keys()}, {'C', 'D', 'E'})
570+
536571
def test_hamt_stress(self):
537572
COLLECTION_SIZE = 7000
538573
TEST_ITERS_EVERY = 647

Misc/ACKS

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1033,6 +1033,7 @@ Robert Li
10331033
Xuanji Li
10341034
Zekun Li
10351035
Zheao Li
1036+
Eli Libman
10361037
Dan Lidral-Porter
10371038
Robert van Liere
10381039
Ross Light
Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
Fix contextvars HAMT implementation to handle iteration over deep trees.
2+
3+
The bug was discovered and fixed by Eli Libman. See
4+
`MagicStack/immutables#84 <https://github.com/MagicStack/immutables/issues/84>`_
5+
for more details.

Python/hamt.c

Lines changed: 11 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -407,14 +407,22 @@ hamt_hash(PyObject *o)
407407
return -1;
408408
}
409409

410-
/* While it's suboptimal to reduce Python's 64 bit hash to
410+
/* While it's somewhat suboptimal to reduce Python's 64 bit hash to
411411
32 bits via XOR, it seems that the resulting hash function
412412
is good enough (this is also how Long type is hashed in Java.)
413413
Storing 10, 100, 1000 Python strings results in a relatively
414414
shallow and uniform tree structure.
415415
416-
Please don't change this hashing algorithm, as there are many
417-
tests that test some exact tree shape to cover all code paths.
416+
Also it's worth noting that it would be possible to adapt the tree
417+
structure to 64 bit hashes, but that would increase memory pressure
418+
and provide little to no performance benefits for collections with
419+
fewer than billions of key/value pairs.
420+
421+
Important: do not change this hash reducing function. There are many
422+
tests that need an exact tree shape to cover all code paths and
423+
we do that by specifying concrete values for test data's `__hash__`.
424+
If this function is changed most of the regression tests would
425+
become useless.
418426
*/
419427
int32_t xored = (int32_t)(hash & 0xffffffffl) ^ (int32_t)(hash >> 32);
420428
return xored == -1 ? -2 : xored;

0 commit comments

Comments
 (0)