-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
CoW: Use exponential backoff when clearing dead references #55518
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
79c663f
a2f81a4
4423d19
d4c159b
da639c8
4d8c8fb
dd202a6
4664c52
4227598
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -890,17 +890,29 @@ cdef class BlockValuesRefs: | |
""" | ||
cdef: | ||
public list referenced_blocks | ||
public int clear_counter | ||
|
||
def __cinit__(self, blk: Block | None = None) -> None: | ||
if blk is not None: | ||
self.referenced_blocks = [weakref.ref(blk)] | ||
else: | ||
self.referenced_blocks = [] | ||
|
||
def _clear_dead_references(self) -> None: | ||
self.referenced_blocks = [ | ||
ref for ref in self.referenced_blocks if ref() is not None | ||
] | ||
self.clear_counter = 500 # set reasonably high | ||
|
||
def _clear_dead_references(self, force=False) -> None: | ||
# Use exponential backoff to decide when we want to clear references | ||
# if force=False. Clearing for every insertion causes slowdowns if | ||
# all these objects stay alive, e.g. df.items() for wide DataFrames | ||
# see GH#55245 and GH#55008 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we have a reference issue to eventually change this to a WeakSet-like implementation? IIRC I saw that discussed somewhere There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'll open an issue about that as a follow up when this is in |
||
if force or len(self.referenced_blocks) > self.clear_counter: | ||
self.referenced_blocks = [ | ||
ref for ref in self.referenced_blocks if ref() is not None | ||
] | ||
nr_of_refs = len(self.referenced_blocks) | ||
if nr_of_refs < self.clear_counter // 2: | ||
self.clear_counter = max(self.clear_counter // 2, 500) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would suggest a shrink factor of 4 or more. If it's the same as the growth factor it can create a few corner cases that will still have O(n^2). e.g. length going back and forth between (500*2^n)-1 and (500*2^n)+1 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Couldn't this happen as well for a shrink factor of 4? And this would only happen If we have this interleaved with inlace modifications, e.g if force=True, correct? Merging for now, but happy to follow up There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
+1 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For a factor of 4 you would need to change the length to the extremes of the range [500*2^(n-1); 500*2^n], which is at least 500 (and more for a larger n), this is much better than triggering the slow operation on just adding and removing 3 references. |
||
elif nr_of_refs > self.clear_counter: | ||
self.clear_counter = max(self.clear_counter * 2, nr_of_refs) | ||
|
||
def add_reference(self, blk: Block) -> None: | ||
"""Adds a new reference to our reference collection. | ||
|
@@ -934,6 +946,6 @@ cdef class BlockValuesRefs: | |
------- | ||
bool | ||
""" | ||
self._clear_dead_references() | ||
self._clear_dead_references(force=True) | ||
# Checking for more references than block pointing to itself | ||
return len(self.referenced_blocks) > 1 |
Uh oh!
There was an error while loading. Please reload this page.