Skip to content

[3.8] gh-93065: Fix HAMT to iterate correctly over 7-level deep trees (GH-93066) #93148

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 24, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 13 additions & 1 deletion Include/internal/pycore_hamt.h
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,19 @@
# error "this header requires Py_BUILD_CORE define"
#endif

#define _Py_HAMT_MAX_TREE_DEPTH 7

/*
HAMT tree is shaped by hashes of keys. Every group of 5 bits of a hash denotes
the exact position of the key in one level of the tree. Since we're using
32 bit hashes, we can have at most 7 such levels. Although if there are
two distinct keys with equal hashes, they will have to occupy the same
cell in the 7th level of the tree -- so we'd put them in a "collision" node.
Which brings the total possible tree depth to 8. Read more about the actual
layout of the HAMT tree in `hamt.c`.

This constant is used to define a datastucture for storing iteration state.
*/
#define _Py_HAMT_MAX_TREE_DEPTH 8


#define PyHamt_Check(o) (Py_TYPE(o) == &_PyHamt_Type)
Expand Down
35 changes: 35 additions & 0 deletions Lib/test/test_context.py
Original file line number Diff line number Diff line change
Expand Up @@ -537,6 +537,41 @@ def test_hamt_collision_1(self):
self.assertEqual(len(h4), 2)
self.assertEqual(len(h5), 3)

def test_hamt_collision_3(self):
# Test that iteration works with the deepest tree possible.
# https://github.com/python/cpython/issues/93065

C = HashKey(0b10000000_00000000_00000000_00000000, 'C')
D = HashKey(0b10000000_00000000_00000000_00000000, 'D')

E = HashKey(0b00000000_00000000_00000000_00000000, 'E')

h = hamt()
h = h.set(C, 'C')
h = h.set(D, 'D')
h = h.set(E, 'E')

# BitmapNode(size=2 count=1 bitmap=0b1):
# NULL:
# BitmapNode(size=2 count=1 bitmap=0b1):
# NULL:
# BitmapNode(size=2 count=1 bitmap=0b1):
# NULL:
# BitmapNode(size=2 count=1 bitmap=0b1):
# NULL:
# BitmapNode(size=2 count=1 bitmap=0b1):
# NULL:
# BitmapNode(size=2 count=1 bitmap=0b1):
# NULL:
# BitmapNode(size=4 count=2 bitmap=0b101):
# <Key name:E hash:0>: 'E'
# NULL:
# CollisionNode(size=4 id=0x107a24520):
# <Key name:C hash:2147483648>: 'C'
# <Key name:D hash:2147483648>: 'D'

self.assertEqual({k.name for k in h.keys()}, {'C', 'D', 'E'})

def test_hamt_stress(self):
COLLECTION_SIZE = 7000
TEST_ITERS_EVERY = 647
Expand Down
2 changes: 2 additions & 0 deletions Misc/ACKS
Original file line number Diff line number Diff line change
Expand Up @@ -1002,6 +1002,8 @@ Robert Li
Xuanji Li
Zekun Li
Zheao Li
Eli Libman
Dan Lidral-Porter
Robert van Liere
Ross Light
Shawn Ligocki
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
Fix contextvars HAMT implementation to handle iteration over deep trees.

The bug was discovered and fixed by Eli Libman. See
`MagicStack/immutables#84 <https://github.com/MagicStack/immutables/issues/84>`_
for more details.
14 changes: 11 additions & 3 deletions Python/hamt.c
Original file line number Diff line number Diff line change
Expand Up @@ -408,14 +408,22 @@ hamt_hash(PyObject *o)
return -1;
}

/* While it's suboptimal to reduce Python's 64 bit hash to
/* While it's somewhat suboptimal to reduce Python's 64 bit hash to
32 bits via XOR, it seems that the resulting hash function
is good enough (this is also how Long type is hashed in Java.)
Storing 10, 100, 1000 Python strings results in a relatively
shallow and uniform tree structure.

Please don't change this hashing algorithm, as there are many
tests that test some exact tree shape to cover all code paths.
Also it's worth noting that it would be possible to adapt the tree
structure to 64 bit hashes, but that would increase memory pressure
and provide little to no performance benefits for collections with
fewer than billions of key/value pairs.

Important: do not change this hash reducing function. There are many
tests that need an exact tree shape to cover all code paths and
we do that by specifying concrete values for test data's `__hash__`.
If this function is changed most of the regression tests would
become useless.
*/
int32_t xored = (int32_t)(hash & 0xffffffffl) ^ (int32_t)(hash >> 32);
return xored == -1 ? -2 : xored;
Expand Down