fix(search): Fix numeric tree splits #6143
Open
+174
−43
Intro:
Only Stephan worked on this code with me as the reviewer, so a quick recap:
The range tree is like a binary search tree that stores
double lower_bound -> sorted_vector<pair<double, DocId>> nodes, so we use lower_bound to determine which sorted vector a new value is inserted into. When those vectors (blocks) become too large, we split them. See the header comment for more details.
Cause:
Currently, if a block grows beyond a defined max size, we split it at the median value into a left and a right part. Documents with the same value have to be in the same block, so the split can be uneven. If we add the same value to the tree over and over, we get a single block that is split on every insertion, yet the split doesn't make it any smaller: everything ends up in the same block. This creates a doom loop in which each insertion of the same value triggers another fruitless "split", making insertion a hugely expensive operation.
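To illustrate the degenerate case, here is a minimal sketch of a median split that keeps equal values together. It is not the code from this PR: SplitByMedian, Block and Entry are hypothetical names, and the real tree may pick the split point differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using DocId = uint32_t;
using Entry = std::pair<double, DocId>;
using Block = std::vector<Entry>;  // assumed to be sorted by (value, doc id)

// Hypothetical helper: split a sorted block at its median value.
// All entries sharing a value must stay in one block, so the cut is placed
// at the first entry whose value is >= the median value.
std::pair<Block, Block> SplitByMedian(const Block& block) {
  assert(!block.empty());
  const double median = block[block.size() / 2].first;
  auto mid = std::lower_bound(
      block.begin(), block.end(), median,
      [](const Entry& e, double v) { return e.first < v; });
  return {Block(block.begin(), mid), Block(mid, block.end())};
}
```

With distinct values the block is cut roughly in half. With a block that holds one repeated value, the left part comes back empty and the right part is the whole input, so the block is still over the size limit and the next insertion of that value triggers the same work again.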
Solution:
Relevant to #6120