Bitv improvements #7703
Conversation
Can you add tests for the iterator(s)?
This should also probably remove the
I added some iterator benchmarks. It looks like these iterators are significantly slower than the old ones: https://gist.github.com/sfackler/058274387cf0bc90883d. I don't know if it matters, but if it does I can probably speed them up by opening some boxes. The crate as a whole needs a lot of work to switch over from internal to external iterators.
If these were a bit slower, I'd be fine with that, but being 3-5x worse seems like a bad thing to me. Normally bit vectors are used specifically to be fast, so if iteration is super slow it seems to defeat the purpose... Others may have different opinions though. If there's an obvious speedup in the future once external iterators are "more solid" then that's fine, but I'm not sure how this would get much better. I'm kinda of the opinion that we shouldn't entirely switch from internal to external iterators. In some cases it's really nice and convenient, but it's not 100% necessary all the time.
@sfackler: which optimization level are you using? @alexcrichton: They're not slower because they're external iterators. C++ uses external iterators exclusively, and LLVM is fully capable of making them zero-cost in all cases. I'm sure there's something else going on here, because there are no such performance losses when comparing the external iterators/adaptors on vectors to a counter with unchecked indexing.
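For context, a minimal sketch of the two iteration styles being debated, written against a simplified stand-in for Bitv in modern Rust syntax (the type and method names here are illustrative, not the 2013-era API):

```rust
// Simplified stand-in for Bitv; the real type stores packed words.
struct Bits {
    data: Vec<bool>,
}

impl Bits {
    // Internal iteration: the container drives the loop and calls a
    // closure for each bit; returning false asks for an early exit.
    fn each(&self, mut f: impl FnMut(bool) -> bool) -> bool {
        for &b in &self.data {
            if !f(b) {
                return false;
            }
        }
        true
    }

    // External iteration: hand the caller an iterator to drive.
    fn iter(&self) -> impl Iterator<Item = bool> + '_ {
        self.data.iter().copied()
    }
}

fn main() {
    let bits = Bits { data: vec![true, false, true] };

    // Same computation through both styles.
    let mut ones_internal: usize = 0;
    bits.each(|b| {
        if b {
            ones_internal += 1;
        }
        true
    });
    let ones_external = bits.iter().filter(|&b| b).count();
    assert_eq!(ones_internal, ones_external);
}
```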
The benchmarks were compiled with -O. I'm traveling now, but I can take a deeper look at what's going on with the iterators tomorrow.
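For reference, benchmarks of this kind look roughly like the following under today's nightly test harness; the PR itself predates this API, and the vector of bools here is only a stand-in for the Bitv iterators being measured:

```rust
#![feature(test)]
extern crate test;

use test::Bencher;

#[bench]
fn bench_iter_bits(b: &mut Bencher) {
    // Stand-in data; the original benchmarks iterated a Bitv.
    let bits = vec![true; 1 << 16];
    b.iter(|| {
        // Count set bits by driving the external iterator.
        bits.iter().filter(|&&x| x).count()
    });
}
```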
So I've done some digging and it looks like the performance problems may actually be an issue with the benchmark framework rather than any of the bitv code. I wrote up a quick test: https://gist.github.com/sfackler/0efb2f23738898b9b938. I commented out the iter benchmarks and compiled and ran it under
I then cherry-picked 128bdad onto
Keep in mind that all 128bdad did was add an
Here are the runtimes with the other changes (deriving Clone and removing indirection):
The disassembly is more or less the same with one less call to free: https://gist.github.com/sfackler/6cf1c8d5fd7fc0cf53cb
I have a change that removes
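A rough sketch of the indirection change described above, in modern syntax (Box in place of the old ~ pointer; the field layouts are assumptions, and the printed sizes will not match the 2013 numbers exactly):

```rust
use std::mem::size_of;

// Stand-ins for the two Bitv representations; the real fields differ.
struct SmallBitv {
    bits: usize,
}
struct BigBitv {
    storage: Box<[usize]>,
}

// Before: each variant held a unique pointer to its struct, costing an
// extra heap allocation and dereference even though the structs were
// themselves only about pointer-sized.
enum BitvVariantBoxed {
    Small(Box<SmallBitv>),
    Big(Box<BigBitv>),
}

// After: the structs live inline in the enum, removing the indirection.
enum BitvVariantInline {
    Small(SmallBitv),
    Big(BigBitv),
}

fn main() {
    println!("boxed:  {} bytes", size_of::<BitvVariantBoxed>());
    println!("inline: {} bytes", size_of::<BitvVariantInline>());
}
```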
Oh wow, that's nice to hear! Although I'm a bit dubious because all of those +/- values (of the very last set of results) are at least half of the actual value. Those look like they're really noisy benchmarks, but if it's the same before/after then I'm not complaining.
Although am I reading this correctly in that there is either a regression somewhere in master or that when you added
I think it has to be an issue with the benchmark backend. The version of
Ah, sounds good to me then.
Stack crossing? Which platform? Can you pin to a cpu with taskset? The bench backend is definitely still a work in progress. Any further details appreciated. There are queued fixes for it incoming soon.
Stack crossing? I'm running 64-bit Linux through VirtualBox off of a Windows 7 host. IIRC, I was getting the same benchmark results on OSX as well. I get the same results pinning the tests to a core with taskset.
@sfackler In the past, Graydon has used the term "stack crossing" to refer to the overhead you pay if you happen to get unlucky and inject a change that causes us to need to allocate more stack chunks than usual. I think it's synonymous with "stack thrashing". Note that in the IRC log linked above, the suggestion was to play with the setting for
Doesn't look like that was it:
fn size_hint(&self) -> (uint, Option<uint>) {
    let rem = self.bitv.nbits - self.next_idx;
    (rem, Some(rem))
}
Shouldn't the lower bound be 0 because all of the bits could be off?
@alexcrichton it looks like this iterates over all the bits, yielding `false` for 0 and `true` for 1, so it always traverses `rem` items.
Ah, yes, I thought this was the `ones` iterator.
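To make the distinction in the exchange above concrete, here is a hedged sketch over a plain bool slice (the struct and field names are assumptions): the all-bits iterator always yields exactly `rem` more items, so both bounds are exact, while a ones-style iterator can only promise a lower bound of 0.

```rust
// Iterates every bit, yielding true/false; it always produces exactly
// `rem` more items, so the lower and upper bounds are both exact.
struct BitIter<'a> {
    bits: &'a [bool],
    next_idx: usize,
}

impl<'a> Iterator for BitIter<'a> {
    type Item = bool;

    fn next(&mut self) -> Option<bool> {
        let b = self.bits.get(self.next_idx).copied();
        if b.is_some() {
            self.next_idx += 1;
        }
        b
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        let rem = self.bits.len() - self.next_idx;
        (rem, Some(rem))
    }
}

// Iterates the indices of set bits; every remaining bit could be zero,
// so the honest lower bound is 0.
struct OnesIter<'a> {
    bits: &'a [bool],
    next_idx: usize,
}

impl<'a> Iterator for OnesIter<'a> {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        while self.next_idx < self.bits.len() {
            let idx = self.next_idx;
            self.next_idx += 1;
            if self.bits[idx] {
                return Some(idx);
            }
        }
        None
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        (0, Some(self.bits.len() - self.next_idx))
    }
}

fn main() {
    let bits = [true, false, true, true];
    assert_eq!(BitIter { bits: &bits, next_idx: 0 }.size_hint(), (4, Some(4)));
    assert_eq!(OnesIter { bits: &bits, next_idx: 0 }.size_hint(), (0, Some(4)));
}
```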
I reverted the
@sfackler needs a rebase.
BitvVariant is the same size as it was before (16 bytes).
@huonw Done
I think in the future we may want to transition
Switched Bitv and BitvSet to external iterators. They still use some internal iterators internally (ha).
Derived Clone for all Bitv types.
Removed indirection in BitvVariant. It previously held a unique pointer to the appropriate Bitv struct, even though those structs are the size of a pointer themselves. BitvVariant is the same size (16 bytes) as it was previously.