Bitv improvements #7703
Conversation
Can you add tests for the iterator(s)?
This should also probably remove the
I added some iterator benchmarks. It looks like these iterators are significantly slower than the old ones: https://gist.github.com/sfackler/058274387cf0bc90883d. I don't know if it matters, but if it does I can probably speed them up by opening some boxes. The crate as a whole needs a lot of work to switch over from internal to external iterators.
If these were a bit slower, I'd be fine with that, but being 3-5x worse seems like a bad thing to me. Normally bit vectors are used specifically to be fast, so if iteration is super slow it seems to defeat the purpose... Others may have different opinions though. If there's an obvious speedup in the future once external iterators are "more solid" then that's fine, but I'm not sure how this would get much better. I'm kinda of the opinion that we shouldn't entirely switch from internal to external iterators. In some cases it's really nice and convenient, but it's not 100% necessary all the time.
@sfackler: which optimization level are you using? @alexcrichton: They're not slower because they're external iterators. C++ uses external iterators exclusively, and LLVM is fully capable of making them zero-cost in all cases. I'm sure there's something else going on here, because there are no such performance losses when comparing the external iterators/adaptors on vectors to a counter with unchecked indexing.
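For context, a minimal sketch of the two iteration styles being debated, written against a simplified stand-in for Bitv in modern Rust syntax (the type and method names here are illustrative, not the 2013-era API):

```rust
// Simplified stand-in for Bitv; the real type stores packed words.
struct Bits {
    data: Vec<bool>,
}

impl Bits {
    // Internal iteration: the container drives the loop and calls a
    // closure for each bit; returning false asks for an early exit.
    fn each(&self, mut f: impl FnMut(bool) -> bool) -> bool {
        for &b in &self.data {
            if !f(b) {
                return false;
            }
        }
        true
    }

    // External iteration: hand the caller an iterator to drive.
    fn iter(&self) -> impl Iterator<Item = bool> + '_ {
        self.data.iter().copied()
    }
}

fn main() {
    let bits = Bits { data: vec![true, false, true] };

    // Same computation through both styles.
    let mut ones_internal: usize = 0;
    bits.each(|b| {
        if b {
            ones_internal += 1;
        }
        true
    });
    let ones_external = bits.iter().filter(|&b| b).count();
    assert_eq!(ones_internal, ones_external);
}
```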
The benchmarks were compiled with -O. I'm traveling now, but I can take a deeper look at what's going on with the iterators tomorrow.
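For reference, benchmarks of this kind look roughly like the following under today's nightly test harness; the PR itself predates this API, and the vector of bools here is only a stand-in for the Bitv iterators being measured:

```rust
#![feature(test)]
extern crate test;

use test::Bencher;

#[bench]
fn bench_iter_bits(b: &mut Bencher) {
    // Stand-in data; the original benchmarks iterated a Bitv.
    let bits = vec![true; 1 << 16];
    b.iter(|| {
        // Count set bits by driving the external iterator.
        bits.iter().filter(|&&x| x).count()
    });
}
```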
So I've done some digging and it looks like the performance problems may actually be an issue with the benchmark framework rather than any of the bitv code. I wrote up a quick test: https://gist.github.com/sfackler/0efb2f23738898b9b938. I commented out the iter benchmarks and compiled and ran it under
I then cherry-picked 128bdad onto
Keep in mind that all 128bdad did was add an
Here are the runtimes with the other changes (deriving Clone and removing indirection):
The disassembly is more or less the same with one less call to free: https://gist.github.com/sfackler/6cf1c8d5fd7fc0cf53cb
I have a change that removes
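A rough sketch of the indirection change described above, in modern syntax (Box in place of the old ~ pointer; the field layouts are assumptions, and the printed sizes will not match the 2013 numbers exactly):

```rust
use std::mem::size_of;

// Stand-ins for the two Bitv representations; the real fields differ.
struct SmallBitv {
    bits: usize,
}
struct BigBitv {
    storage: Box<[usize]>,
}

// Before: each variant held a unique pointer to its struct, costing an
// extra heap allocation and dereference even though the structs were
// themselves only about pointer-sized.
enum BitvVariantBoxed {
    Small(Box<SmallBitv>),
    Big(Box<BigBitv>),
}

// After: the structs live inline in the enum, removing the indirection.
enum BitvVariantInline {
    Small(SmallBitv),
    Big(BigBitv),
}

fn main() {
    println!("boxed:  {} bytes", size_of::<BitvVariantBoxed>());
    println!("inline: {} bytes", size_of::<BitvVariantInline>());
}
```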
Oh wow, that's nice to hear! Although I'm a bit dubious because all of those +/- values (of the very last set of results) are at least half of the actual value. Those look like they're really noisy benchmarks, but if it's the same before/after then I'm not complaining.
Although am I reading this correctly in that there is either a regression somewhere in master or that when you added
I think it has to be an issue with the benchmark backend. The version of
Ah, sounds good to me then.
Stack crossing? Which platform? Can you pin to a cpu with taskset? The bench backend is definitely still a work in progress. Any further details appreciated. There are queued fixes for it incoming soon.
Stack crossing? I'm running 64-bit Linux through VirtualBox off of a Windows 7 host. IIRC, I was getting the same benchmark results on OSX as well. I get the same results pinning the tests to a core with taskset.
@sfackler In the past, Graydon has used the term "stack crossing" to refer to the overhead you pay if you happen to get unlucky and inject a change that causes us to need to allocate more stack chunks than usual. I think it's synonymous with "stack thrashing". Note that in the IRC log linked above, the suggestion was to play with the setting for
Doesn't look like that was it:
fn size_hint(&self) -> (uint, Option<uint>) {
    let rem = self.bitv.nbits - self.next_idx;
    (rem, Some(rem))
}
Shouldn't the lower bound be 0 because all of the bits could be off?
@alexcrichton it looks like this iterates over all the bits, yielding `false` for 0 and `true` for 1, so it always traverses `rem` items.
Ah, yes, I thought this was the `ones` iterator.
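To make the distinction in the exchange above concrete, here is a hedged sketch over a plain bool slice (the struct and field names are assumptions): the all-bits iterator always yields exactly `rem` more items, so both bounds are exact, while a ones-style iterator can only promise a lower bound of 0.

```rust
// Iterates every bit, yielding true/false; it always produces exactly
// `rem` more items, so the lower and upper bounds are both exact.
struct BitIter<'a> {
    bits: &'a [bool],
    next_idx: usize,
}

impl<'a> Iterator for BitIter<'a> {
    type Item = bool;

    fn next(&mut self) -> Option<bool> {
        let b = self.bits.get(self.next_idx).copied();
        if b.is_some() {
            self.next_idx += 1;
        }
        b
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        let rem = self.bits.len() - self.next_idx;
        (rem, Some(rem))
    }
}

// Iterates the indices of set bits; every remaining bit could be zero,
// so the honest lower bound is 0.
struct OnesIter<'a> {
    bits: &'a [bool],
    next_idx: usize,
}

impl<'a> Iterator for OnesIter<'a> {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        while self.next_idx < self.bits.len() {
            let idx = self.next_idx;
            self.next_idx += 1;
            if self.bits[idx] {
                return Some(idx);
            }
        }
        None
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        (0, Some(self.bits.len() - self.next_idx))
    }
}

fn main() {
    let bits = [true, false, true, true];
    assert_eq!(BitIter { bits: &bits, next_idx: 0 }.size_hint(), (4, Some(4)));
    assert_eq!(OnesIter { bits: &bits, next_idx: 0 }.size_hint(), (0, Some(4)));
}
```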
I reverted the
@sfackler needs a rebase.
BitvVariant is the same size as it was before (16 bytes).
@huonw Done
I think in the future we may want to transition
Switched Bitv and BitvSet to external iterators. They still use some internal iterators internally (ha).
Derived Clone for all Bitv types.
Removed indirection in BitvVariant. It previously held a unique pointer to the appropriate Bitv struct, even though those structs are the size of a pointer themselves. BitvVariant is the same size (16 bytes) as it was previously.