-
Notifications
You must be signed in to change notification settings - Fork 13.3k
std: Optimize Vec::from_iter #22200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
std: Optimize Vec::from_iter #22200
Conversation
r? @huonw (rust_highfive has picked a reviewer for you, use r? to override) |
|
@bors r+ 2543 rollup |
@huonw these are the raw results I get: #[bench]
fn libc1(b: &mut test::Bencher) {
b.iter(|| unsafe {
extern { fn memset(p: *mut u8, n: u8, amt: usize); }
let ptr = libc::malloc(1000);
assert!(!ptr.is_null());
memset(ptr as *mut u8, 0, 1000);
libc::free(ptr);
});
}
#[bench]
fn heap(b: &mut test::Bencher) {
b.iter(|| unsafe {
extern { fn memset(p: *mut u8, n: u8, amt: usize); }
let ptr = std::rt::heap::allocate(1000, 8);
assert!(!ptr.is_null());
memset(ptr as *mut u8, 0, 1000);
std::rt::heap::deallocate(ptr, 1000, 8);
});
}
|
⌛ Testing commit 25435ee with merge c838a13... |
💔 Test failed - auto-win-32-opt |
vector.set_len(len + 1); | ||
} | ||
} | ||
|
||
for element in iterator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our description of the iterator protocol says that after returning None, an iterator can do whatever it wants (unfortunately). So on this line that caveat applies. I guess it's not important really what happens, but an iterator may for example be in spec and produce itself double when collected into a Vec this way.
On balance I think that if we believe in our description for the iterator protocol, we can't push it aside in this central location.
Suggested fix: Rewrite the first for loop to loop - match and return on first None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose inserting if vector.len() < vector.capacity() { return vector }
before this loop should protect against this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the None stuff can be waived if the iterator produces less elements than its lower bound. I guess that means the current code is fine. These soft requirements are confusing :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option would be a use of fuse()
:
diff --git a/src/libcollections/vec.rs b/src/libcollections/vec.rs
index 7fcf0dd..9fd52a1 100644
--- a/src/libcollections/vec.rs
+++ b/src/libcollections/vec.rs
@@ -1380,7 +1380,9 @@ impl<T> FromIterator<T> for Vec<T> {
fn from_iter<I:Iterator<Item=T>>(mut iterator: I) -> Vec<T> {
let (lower, _) = iterator.size_hint();
let mut vector = Vec::with_capacity(lower);
- for element in iterator.by_ref().take(vector.capacity()) {
+
+ let mut i = iterator.by_ref().fuse();
+ for element in i.by_ref().take(vector.capacity()) {
let len = vector.len();
unsafe {
ptr::write(vector.get_unchecked_mut(len), element);
@@ -1388,7 +1390,7 @@ impl<T> FromIterator<T> for Vec<T> {
}
}
- for element in iterator {
+ for element in i {
vector.push(element)
}
vector
In this case the numbers are also not affected:
test heap ... bench: 31 ns/iter (+/- 2)
test libc1 ... bench: 49 ns/iter (+/- 3)
test macro_repeat1 ... bench: 55 ns/iter (+/- 3)
test macro_repeat2 ... bench: 54 ns/iter (+/- 3)
test map_clone1 ... bench: 33 ns/iter (+/- 2)
test map_clone2 ... bench: 33 ns/iter (+/- 2)
test repeat1 ... bench: 286 ns/iter (+/- 17)
test repeat2 ... bench: 283 ns/iter (+/- 12)
(the repeat numbers are a bit flaky unfortunately, so that's not too surprising)
@bors: retry |
@bors: r- |
25435ee
to
985fc7d
Compare
This PR is an optimization of the `FromIterator` implementation of `Vec` Benchmark: https://gist.github.com/alexcrichton/03d666159a28a80e7c70 Before: test macro_repeat1 ... bench: 57 ns/iter (+/- 1) test macro_repeat2 ... bench: 56 ns/iter (+/- 1) test map_clone1 ... bench: 828 ns/iter (+/- 13) test map_clone2 ... bench: 828 ns/iter (+/- 8) test repeat1 ... bench: 1104 ns/iter (+/- 10) test repeat2 ... bench: 1106 ns/iter (+/- 11) After: test macro_repeat1 ... bench: 75 ns/iter (+/- 21) test macro_repeat2 ... bench: 59 ns/iter (+/- 31) test map_clone1 ... bench: 34 ns/iter (+/- 22) test map_clone2 ... bench: 52 ns/iter (+/- 21) test repeat1 ... bench: 34 ns/iter (+/- 11) test repeat2 ... bench: 33 ns/iter (+/- 12) The idea behind this optimization is to avoid all bounds checks for space already allocated into the vector. This may involve running the iterator twice, but the first run of the iterator should be optimizable to a memcpy or memset if possible. The same treatment can in theory be applied to `Vec::extend` but the benchmarks for that currently get *worse* if the change is applied. This appears to be some LLVM optimizations going awry but it's seems prudent to land at least the `collect` portion beforehand.
This PR is an optimization of the `FromIterator` implementation of `Vec` Benchmark: https://gist.github.com/alexcrichton/03d666159a28a80e7c70 Before: test macro_repeat1 ... bench: 57 ns/iter (+/- 1) test macro_repeat2 ... bench: 56 ns/iter (+/- 1) test map_clone1 ... bench: 828 ns/iter (+/- 13) test map_clone2 ... bench: 828 ns/iter (+/- 8) test repeat1 ... bench: 1104 ns/iter (+/- 10) test repeat2 ... bench: 1106 ns/iter (+/- 11) After: test macro_repeat1 ... bench: 75 ns/iter (+/- 21) test macro_repeat2 ... bench: 59 ns/iter (+/- 31) test map_clone1 ... bench: 34 ns/iter (+/- 22) test map_clone2 ... bench: 52 ns/iter (+/- 21) test repeat1 ... bench: 34 ns/iter (+/- 11) test repeat2 ... bench: 33 ns/iter (+/- 12) The idea behind this optimization is to avoid all bounds checks for space already allocated into the vector. This may involve running the iterator twice, but the first run of the iterator should be optimizable to a memcpy or memset if possible. The same treatment can in theory be applied to `Vec::extend` but the benchmarks for that currently get *worse* if the change is applied. This appears to be some LLVM optimizations going awry but it's seems prudent to land at least the `collect` portion beforehand.
This PR is an optimization of the `FromIterator` implementation of `Vec` Benchmark: https://gist.github.com/alexcrichton/03d666159a28a80e7c70 Before: test macro_repeat1 ... bench: 57 ns/iter (+/- 1) test macro_repeat2 ... bench: 56 ns/iter (+/- 1) test map_clone1 ... bench: 828 ns/iter (+/- 13) test map_clone2 ... bench: 828 ns/iter (+/- 8) test repeat1 ... bench: 1104 ns/iter (+/- 10) test repeat2 ... bench: 1106 ns/iter (+/- 11) After: test macro_repeat1 ... bench: 75 ns/iter (+/- 21) test macro_repeat2 ... bench: 59 ns/iter (+/- 31) test map_clone1 ... bench: 34 ns/iter (+/- 22) test map_clone2 ... bench: 52 ns/iter (+/- 21) test repeat1 ... bench: 34 ns/iter (+/- 11) test repeat2 ... bench: 33 ns/iter (+/- 12) The idea behind this optimization is to avoid all bounds checks for space already allocated into the vector. This may involve running the iterator twice, but the first run of the iterator should be optimizable to a memcpy or memset if possible. The same treatment can in theory be applied to `Vec::extend` but the benchmarks for that currently get *worse* if the change is applied. This appears to be some LLVM optimizations going awry but it's seems prudent to land at least the `collect` portion beforehand.
@@ -1380,7 +1380,17 @@ impl<T> FromIterator<T> for Vec<T> { | |||
fn from_iter<I:Iterator<Item=T>>(iterator: I) -> Vec<T> { | |||
let (lower, _) = iterator.size_hint(); | |||
let mut vector = Vec::with_capacity(lower); | |||
for element in iterator { | |||
|
|||
let mut i = iterator.fuse(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps obvious to you superbrains but it took me a while to puzzle out why the fuse was necessary here -- maybe worth a comment?
If I understand correctly, the problem is just that the first loop may exhaust the iterator, in which case the second loop would potentially re-run some iterators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I'll open a new PR to add a comment, your understanding is spot on.
Requested by Niko in rust-lang#22200 (and is good to have anyway)
Requested by Niko in rust-lang#22200 (and is good to have anyway)
… r=brson Requested by Niko in rust-lang#22200 (and is good to have anyway)
… r=brson Requested by Niko in rust-lang#22200 (and is good to have anyway)
Requested by Niko in rust-lang#22200 (and is good to have anyway)
Requested by Niko in rust-lang#22200 (and is good to have anyway)
This PR is an optimization of the
FromIterator
implementation ofVec
Benchmark: https://gist.github.com/alexcrichton/03d666159a28a80e7c70
Before:
After:
The idea behind this optimization is to avoid all bounds checks for space
already allocated into the vector. This may involve running the iterator twice,
but the first run of the iterator should be optimizable to a memcpy or memset if
possible.
The same treatment can in theory be applied to
Vec::extend
but the benchmarksfor that currently get worse if the change is applied. This appears to be some
LLVM optimizations going awry but it's seems prudent to land at least the
collect
portion beforehand.