Improve kmerge's perf #101
A hand-rolled heap implementation allows us to improve kmerge. This follows Frank McSherry's implementation (differential-dataflow project).

Use a sift-down that places elements eagerly (the alternate strategy would be to sift down all the way to the bottom, then sift back up). Keep the iterator in place and sift it down from the top. This assumes the current minimal iterator is unlikely to move very far to adjust for its new element, so it should perform well when there is a run of least elements from the same iterator.

Benchmark

Large wins on the small (2 iterators) bench because we can avoid lots of swaps. The "tenway" bench uses random elements, so iterators should alternate a lot; it's somewhat of a worst case.

```
before
test kmerge_default ... bench: 123,363 ns/iter (+/- 15,518)
test kmerge_tenway  ... bench: 908,695 ns/iter (+/- 270,372)

after
test kmerge_default ... bench:  13,738 ns/iter (+/- 3,318)
test kmerge_tenway  ... bench: 380,020 ns/iter (+/- 9,921)
```
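For concreteness, here is a minimal sketch in Rust of the eager sift-down described above. `HeadTail` mirrors the struct visible in the diff excerpt quoted later in this thread; the function body is an illustration of the strategy under those assumptions, not the PR's actual code.

```rust
// Sketch only: an iterator split into its current least element (`head`)
// and the remaining elements (`tail`), as in the PR's diff.
struct HeadTail<I: Iterator> {
    head: I::Item,
    tail: I,
}

// Eager sift-down over a min-heap of HeadTail values: swap the element at
// `index` down level by level, placing it as soon as no child is smaller,
// instead of sifting a hole all the way to the bottom and back up.
fn sift_down<I>(heap: &mut [HeadTail<I>], mut index: usize)
where
    I: Iterator,
    I::Item: Ord,
{
    loop {
        let mut child = 2 * index + 1;
        if child >= heap.len() {
            return; // no children: the element has found its place
        }
        // pick the smaller of the (up to) two children
        if child + 1 < heap.len() && heap[child + 1].head < heap[child].head {
            child += 1;
        }
        // stop once the parent is strictly smaller than both children
        // (a follow-up below tightens this to also stop on equality)
        if heap[index].head < heap[child].head {
            return;
        }
        heap.swap(index, child);
        index = child;
    }
}
```

An iterator that produces a run of least elements tends to stay at the root, so each such element costs only the comparison at the top rather than a full round of swaps.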
cc @bsteinb Maybe I'm stealing your fun here by implementing some improvements; apologies! Thanks to @frankmcsherry for gracefully putting his code online so that I could read & copy it directly 😄 The custom heap also makes us ready for a "kmerge_by" adaptor (user-supplied ordering closure).
Related to #98 (but there are still avenues to explore there).
Steal away! 👍 My fun is not going to stand in the way of progress. Very impressive increase in performance for the two-way merge. That seems to go well beyond what I achieved with the modified …
Stopping on equality skips even more redundant swaps. It's visible in benchmarks, actually.

```
after
test kmerge_default ... bench:   9,123 ns/iter (+/- 235)
test kmerge_tenway  ... bench: 368,562 ns/iter (+/- 5,517)
```
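To illustrate what "stopping on equality" means here: the sift-down swap test requires a strictly smaller child, so equal heads no longer trigger swaps. The helper below is hypothetical, just to name the comparison.

```rust
/// Hypothetical helper naming the sift-down swap test. Swapping when the
/// child's head is merely equal (`<=`) is redundant work; requiring a
/// strictly smaller child (`<`) stops on equality and skips those swaps.
fn should_swap<T: Ord>(child_head: &T, parent_head: &T) -> bool {
    child_head < parent_head
}
```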
It's hard to say for sure; inlining by itself shouldn't give anything. The main thing is this change to not move the iterator (unless we reorder the heap), and the inlining is part of that.
I had an idea for a struct similar to … With specializable associated types, we could avoid storing …
```rust
        return None;
    }
    let result = if let Some(next_0) = self.heap[0].tail.next() {
        replace(&mut self.heap[0].head, next_0)
```
This should go to `HeadTail::next`, ideally.
That's nice, done.
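A sketch of what the suggested refactor might look like, assuming a `HeadTail` shaped like the one in the diff; the exact signature here is illustrative, not the merged code.

```rust
use std::mem::replace;

// Assumed shape, matching the diff excerpt above.
struct HeadTail<I: Iterator> {
    head: I::Item, // current least element of this iterator
    tail: I,       // the remaining elements
}

impl<I: Iterator> HeadTail<I> {
    /// Advance the underlying iterator: install its next element as the
    /// new head and hand back the old head, or `None` when exhausted.
    fn next(&mut self) -> Option<I::Item> {
        if let Some(next) = self.tail.next() {
            Some(replace(&mut self.head, next))
        } else {
            None
        }
    }
}
```

Folding the `replace` into the struct keeps the head/tail invariant in one place instead of spreading it through the merge loop.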
@pczarn Hm, I can like this name and you can like yours 😄. It's an internal detail, so it doesn't matter much.
Thanks for the input on this!
Use a simple heap / sift down implementation to optimize kmerge