Improve kmerge's perf #101


Merged: 5 commits merged into master on Feb 23, 2016

Conversation

@bluss (Member) commented Feb 23, 2016

Use a simple heap / sift down implementation to optimize kmerge

A hand-rolled heap implementation allows us to improve kmerge. This
follows Frank McSherry's implementation (differential-dataflow project).

Use a sift-down that places elements eagerly (the alternate strategy would
be to sift down all the way to the bottom, then sift back up).

Keep the iterator in place, and sift it down from the top. This assumes
the current minimal iterator is unlikely to move very far to adjust for
its new element. This should perform well when there is a run of least
elements from the same iterator.
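The description above can be sketched in code. This is a minimal, self-contained version of the idea, illustrative only: the names HeadTail, sift_down, and kmerge mirror the PR's approach, not its exact code.

```rust
/// An iterator paired with its buffered next element.
struct HeadTail<I: Iterator> {
    head: I::Item,
    tail: I,
}

impl<I: Iterator> HeadTail<I> {
    /// Wrap an iterator; returns None if it is already exhausted.
    fn new(mut it: I) -> Option<Self> {
        it.next().map(|head| HeadTail { head, tail: it })
    }
}

/// Restore the min-heap property by sifting heap[pos] down,
/// placing the element eagerly at each level.
fn sift_down<I>(heap: &mut [HeadTail<I>], mut pos: usize)
where
    I: Iterator,
    I::Item: Ord,
{
    loop {
        let left = 2 * pos + 1;
        if left >= heap.len() {
            return;
        }
        let right = left + 1;
        // pick the smaller child
        let child = if right < heap.len() && heap[right].head < heap[left].head {
            right
        } else {
            left
        };
        // strict `<`: stop as soon as the child is not strictly smaller
        if heap[child].head < heap[pos].head {
            heap.swap(pos, child);
            pos = child;
        } else {
            return;
        }
    }
}

/// Merge already-sorted iterators into one sorted Vec.
fn kmerge<I>(iters: Vec<I>) -> Vec<I::Item>
where
    I: Iterator,
    I::Item: Ord,
{
    let mut heap: Vec<_> = iters.into_iter().filter_map(HeadTail::new).collect();
    // heapify
    for i in (0..heap.len() / 2).rev() {
        sift_down(&mut heap, i);
    }
    let mut out = Vec::new();
    while !heap.is_empty() {
        match heap[0].tail.next() {
            // keep the winning iterator in place; just swap in its new head
            Some(next) => out.push(std::mem::replace(&mut heap[0].head, next)),
            // iterator exhausted: move the last slot to the root and drop it
            None => {
                let last = heap.len() - 1;
                heap.swap(0, last);
                out.push(heap.pop().unwrap().head);
            }
        }
        sift_down(&mut heap, 0);
    }
    out
}
```

When one iterator yields a run of least elements, the sift-down terminates immediately at the root, which is where the win on the two-way bench comes from.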

Benchmark

Large wins on the small (2 iterators) bench because we can avoid lots of
swaps. The "tenway" bench uses random elements, so the iterators should
alternate a lot, making it somewhat of a worst case.

```
before

test kmerge_default        ... bench:     123,363 ns/iter (+/- 15,518)
test kmerge_tenway         ... bench:     908,695 ns/iter (+/- 270,372)

after

test kmerge_default        ... bench:       9,123 ns/iter (+/- 235)
test kmerge_tenway         ... bench:     368,562 ns/iter (+/- 5,517)
```

@bluss (Member, Author) commented Feb 23, 2016

cc @bsteinb

Maybe I'm stealing your fun here, by implementing some improvements — apologies!

Thanks to @frankmcsherry for graciously putting his code online so that I could read & copy it directly 😄

The custom heap also makes us ready for a "kmerge_by" adaptor (user-supplied ordering closure).
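Such an adaptor would thread a user-supplied "less than" closure through the comparisons instead of requiring Ord. A sketch of the idea (hypothetical helper, not the itertools API; a linear scan for the minimum stands in for the heap to keep it short):

```rust
/// Illustrative only: k-merge sorted Vecs with a caller-supplied
/// "less than" closure; the real adaptor would drive the heap with it.
fn kmerge_by<T, F>(lists: Vec<Vec<T>>, mut less: F) -> Vec<T>
where
    F: FnMut(&T, &T) -> bool,
{
    // buffer each non-empty list's head alongside its iterator
    let mut iters: Vec<_> = lists
        .into_iter()
        .filter_map(|l| {
            let mut it = l.into_iter();
            it.next().map(|head| (head, it))
        })
        .collect();
    let mut out = Vec::new();
    while !iters.is_empty() {
        // find the slot whose head is least under `less`
        let mut min = 0;
        for i in 1..iters.len() {
            if less(&iters[i].0, &iters[min].0) {
                min = i;
            }
        }
        match iters[min].1.next() {
            Some(next) => out.push(std::mem::replace(&mut iters[min].0, next)),
            None => out.push(iters.swap_remove(min).0),
        }
    }
    out
}
```

Because the ordering is a closure, the same machinery merges descending runs with |a, b| a > b, which Ord-based kmerge cannot express.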

@bluss (Member, Author) commented Feb 23, 2016

Related to #98 (but there are still avenues to explore there).

@bsteinb (Contributor) commented Feb 23, 2016

Steal away! 👍 My fun is not going to stand in the way of progress.

Very impressive increase in performance for the two-way merge. That seems to go well beyond what I achieved with the modified BinaryHeap from std. Could you say how much you gained by inlining HeadTail's next() method by hand?

Stopping on equality skips even more redundant swaps. It's actually
visible in benchmarks.

```
after:

test kmerge_default           ... bench:       9,123 ns/iter (+/- 235)
test kmerge_tenway            ... bench:     368,562 ns/iter (+/- 5,517)
```
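The "stop on equality" is just the strictness of the comparison in the sift-down loop. A sketch over plain integers (illustrative, not the merged code):

```rust
// Sketch: the descent uses strict `<`, so equal heads stop the sift
// immediately and no swap is performed for ties.
fn sift_down_strict(v: &mut [i32], mut pos: usize) {
    loop {
        let left = 2 * pos + 1;
        if left >= v.len() {
            return;
        }
        let right = left + 1;
        // pick the smaller child
        let child = if right < v.len() && v[right] < v[left] { right } else { left };
        if v[child] < v[pos] {
            // strict `<`: equal elements terminate the descent here
            v.swap(pos, child);
            pos = child;
        } else {
            return;
        }
    }
}
```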
@bluss (Member, Author) commented Feb 23, 2016

It's hard to say for sure. Inlining by itself shouldn't give anything; the main thing is the change to not move the iterator unless we reorder the heap, and the hand-inlining is part of that.

@pczarn commented Feb 23, 2016

I had an idea for a struct similar to HeadTail which I called PrePeeked. Perhaps that's a better name.

With specializable associated types, we could avoid storing head for slice::Iter and access iter.as_slice()[0] instead.
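For slice iterators the head is indeed already reachable without buffering it: specializable associated types remain hypothetical, but slice::Iter::as_slice is real. A sketch of the observation (peek_head is an illustrative name):

```rust
// slice::Iter exposes its unconsumed elements via as_slice(), so the
// current minimum could be read in place rather than stored in a
// HeadTail-style struct.
fn peek_head<'a, T>(it: &std::slice::Iter<'a, T>) -> Option<&'a T> {
    it.as_slice().first()
}
```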

Review comment on:

```
return None;
}
let result = if let Some(next_0) = self.heap[0].tail.next() {
    replace(&mut self.heap[0].head, next_0)
```

This should go to HeadTail::next, ideally

@bluss (Member, Author) replied: That's nice, done.

@bluss (Member, Author) commented Feb 23, 2016

@pczarn Hm, I can like this name and you can like yours 😄. It's an internal detail, so it doesn't matter much.

bluss added a commit that referenced this pull request Feb 23, 2016
@bluss merged commit 51cc406 into master on Feb 23, 2016
@bluss (Member, Author) commented Feb 23, 2016

Thanks for the input on this!
