std: Optimize Vec::from_iter #22200

alexcrichton · 2015-02-12T00:50:25Z

This PR is an optimization of the FromIterator implementation of Vec.

Benchmark: https://gist.github.com/alexcrichton/03d666159a28a80e7c70

Before:

test macro_repeat1     ... bench:        57 ns/iter (+/- 1)
test macro_repeat2     ... bench:        56 ns/iter (+/- 1)
test map_clone1        ... bench:       828 ns/iter (+/- 13)
test map_clone2        ... bench:       828 ns/iter (+/- 8)
test repeat1           ... bench:      1104 ns/iter (+/- 10)
test repeat2           ... bench:      1106 ns/iter (+/- 11)

After:

test macro_repeat1     ... bench:        75 ns/iter (+/- 21)
test macro_repeat2     ... bench:        59 ns/iter (+/- 31)
test map_clone1        ... bench:        34 ns/iter (+/- 22)
test map_clone2        ... bench:        52 ns/iter (+/- 21)
test repeat1           ... bench:        34 ns/iter (+/- 11)
test repeat2           ... bench:        33 ns/iter (+/- 12)

The idea behind this optimization is to avoid all bounds checks for space
already allocated into the vector. This may involve running the iterator twice,
but the first run of the iterator should be optimizable to a memcpy or memset if
possible.

The same treatment can in theory be applied to Vec::extend, but the benchmarks
for that currently get worse if the change is applied. This appears to be some
LLVM optimizations going awry, but it seems prudent to land at least the
collect portion beforehand.
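The two-pass idea described above can be sketched in present-day Rust roughly as follows. This is a minimal illustration, not the code from the PR: the function name `collect_two_phase` is hypothetical, and `as_mut_ptr().add(len)` stands in for the `get_unchecked_mut` call used in the 2015 diff. The `fuse()` call mirrors the version that was eventually merged.

```rust
use std::ptr;

/// Hypothetical stand-alone version of the two-pass collect strategy.
fn collect_two_phase<T, I: Iterator<Item = T>>(iterator: I) -> Vec<T> {
    let (lower, _) = iterator.size_hint();
    let mut vector: Vec<T> = Vec::with_capacity(lower);

    // Fuse so that the second loop never polls an iterator that has
    // already returned `None` in the first loop.
    let mut iter = iterator.fuse();

    // Pass 1: fill the space that is already allocated. `take(cap)` bounds
    // the number of writes, so no per-element capacity check is needed.
    let cap = vector.capacity();
    for element in iter.by_ref().take(cap) {
        let len = vector.len();
        unsafe {
            // In bounds because `len < cap` holds for every iteration.
            ptr::write(vector.as_mut_ptr().add(len), element);
            vector.set_len(len + 1);
        }
    }

    // Pass 2: anything beyond the size hint goes through the checked path.
    for element in iter {
        vector.push(element);
    }
    vector
}
```

Calling `collect_two_phase(0..5)` exercises only the first pass, while an iterator such as `(0..10).filter(..)`, whose lower size hint is 0, goes entirely through the second.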

rust-highfive · 2015-02-12T00:50:37Z

r? @huonw

(rust_highfive has picked a reviewer for you, use r? to override)

alexcrichton · 2015-02-12T00:50:39Z

cc @gankro, rust-lang/rfcs#832

huonw · 2015-02-12T01:02:37Z

33 ns/iter seems rather short, even for a memcpy/memset?

huonw · 2015-02-12T01:04:18Z

@bors r+ 2543 rollup

alexcrichton · 2015-02-12T01:27:04Z

@huonw these are the raw results I get:

#[bench]
fn libc1(b: &mut test::Bencher) {
    b.iter(|| unsafe {
        extern { fn memset(p: *mut u8, n: u8, amt: usize); }
        let ptr = libc::malloc(1000);
        assert!(!ptr.is_null());
        memset(ptr as *mut u8, 0, 1000);
        libc::free(ptr);
    });
}
#[bench]
fn heap(b: &mut test::Bencher) {
    b.iter(|| unsafe {
        extern { fn memset(p: *mut u8, n: u8, amt: usize); }
        let ptr = std::rt::heap::allocate(1000, 8);
        assert!(!ptr.is_null());
        memset(ptr as *mut u8, 0, 1000);
        std::rt::heap::deallocate(ptr, 1000, 8);
    });
}

test heap          ... bench:        30 ns/iter (+/- 1)
test libc1         ... bench:        49 ns/iter (+/- 3)
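The benchmark above relies on `test::Bencher` and `std::rt::heap`, which are not available on stable Rust today. A rough equivalent, assuming `std::alloc` and a plain `Instant`-based timer as substitutes, might look like the sketch below. The names `alloc_memset_free` and `average_time` are hypothetical, and a hand-rolled timer is much noisier than a real benchmark harness, so the numbers are only indicative.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::hint::black_box;
use std::ptr;
use std::time::{Duration, Instant};

/// Allocate `size` bytes (8-byte aligned), zero them, read one byte back
/// so the work is observable, then free the block.
fn alloc_memset_free(size: usize) -> u8 {
    assert!(size > 0, "zero-sized allocations are not allowed by `alloc`");
    let layout = Layout::from_size_align(size, 8).expect("valid layout");
    unsafe {
        let p = alloc(layout);
        assert!(!p.is_null());
        ptr::write_bytes(p, 0, size); // counterpart of memset(p, 0, size)
        let first = ptr::read_volatile(p);
        dealloc(p, layout);
        first
    }
}

/// Average wall-clock time of one 1000-byte round over `rounds` repetitions.
/// `rounds` must be non-zero.
fn average_time(rounds: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..rounds {
        black_box(alloc_memset_free(black_box(1000)));
    }
    start.elapsed() / rounds
}
```

Something like `println!("{:?}", average_time(1_000_000))` gives a ballpark figure to compare against the 30 to 50 ns reported above, though results will vary with allocator and hardware.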

bors · 2015-02-12T07:10:30Z

⌛ Testing commit 25435ee with merge c838a13...

bors · 2015-02-12T10:00:06Z

💔 Test failed - auto-win-32-opt

bluss · 2015-02-12T12:46:43Z

src/libcollections/vec.rs

+                vector.set_len(len + 1);
+            }
+        }
+
        for element in iterator {


Our description of the iterator protocol says that after returning None, an iterator can do whatever it wants (unfortunately). So on this line that caveat applies. I guess it's not really important what happens, but an iterator may, for example, be within spec and still yield its elements twice when collected into a Vec this way.

On balance I think that if we believe in our description for the iterator protocol, we can't push it aside in this central location.

Suggested fix: Rewrite the first for loop as a loop/match that returns on the first None?

I suppose inserting if vector.len() < vector.capacity() { return vector } before this loop should protect against this case?

Maybe the None stuff can be waived if the iterator produces fewer elements than its lower bound. I guess that means the current code is fine. These soft requirements are confusing :-)
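The loop/match suggestion from this thread could look roughly like the following in present-day Rust. It is only a sketch of the idea, not code that was proposed or merged: the function name `collect_with_early_return` is hypothetical, and `as_mut_ptr().add(len).write(..)` is assumed as the unchecked write.

```rust
/// Hypothetical variant that returns as soon as the iterator reports `None`
/// while the preallocated space is being filled.
fn collect_with_early_return<I: Iterator>(mut iterator: I) -> Vec<I::Item> {
    let (lower, _) = iterator.size_hint();
    let mut vector: Vec<I::Item> = Vec::with_capacity(lower);

    let cap = vector.capacity();
    for _ in 0..cap {
        match iterator.next() {
            Some(element) => {
                let len = vector.len();
                unsafe {
                    // In bounds: at most `cap` writes happen in this loop.
                    vector.as_mut_ptr().add(len).write(element);
                    vector.set_len(len + 1);
                }
            }
            // The iterator ended early, so it is never polled again.
            None => return vector,
        }
    }

    // Preallocated space is full; the iterator has not yet returned `None`.
    for element in iterator {
        vector.push(element);
    }
    vector
}
```

The early `return` plays the same role as the `vector.len() < vector.capacity()` guard mentioned above: in both cases the second loop is skipped once the iterator has signalled its end.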

Another option would be a use of fuse():

diff --git a/src/libcollections/vec.rs b/src/libcollections/vec.rs
index 7fcf0dd..9fd52a1 100644
--- a/src/libcollections/vec.rs
+++ b/src/libcollections/vec.rs
@@ -1380,7 +1380,9 @@ impl<T> FromIterator<T> for Vec<T> {
     fn from_iter<I:Iterator<Item=T>>(mut iterator: I) -> Vec<T> {
         let (lower, _) = iterator.size_hint();
         let mut vector = Vec::with_capacity(lower);
-        for element in iterator.by_ref().take(vector.capacity()) {
+
+        let mut i = iterator.by_ref().fuse();
+        for element in i.by_ref().take(vector.capacity()) {
             let len = vector.len();
             unsafe {
                 ptr::write(vector.get_unchecked_mut(len), element);
@@ -1388,7 +1390,7 @@ impl<T> FromIterator<T> for Vec<T> {
             }
         }
 
-        for element in iterator {
+        for element in i {
             vector.push(element)
         }
         vector

In this case the numbers are also not affected:

test heap              ... bench:        31 ns/iter (+/- 2)
test libc1             ... bench:        49 ns/iter (+/- 3)
test macro_repeat1     ... bench:        55 ns/iter (+/- 3)
test macro_repeat2     ... bench:        54 ns/iter (+/- 3)
test map_clone1        ... bench:        33 ns/iter (+/- 2)
test map_clone2        ... bench:        33 ns/iter (+/- 2)
test repeat1           ... bench:       286 ns/iter (+/- 17)
test repeat2           ... bench:       283 ns/iter (+/- 12)

(the repeat numbers are a bit flaky unfortunately, so that's not too surprising)
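The concern raised in this thread can be reproduced with a small iterator that resumes after returning None. The sketch below is illustrative only: `Resuming`, `two_loops_unfused` and `two_loops_fused` are hypothetical names, the explicit `cap` argument stands in for a capacity derived from an overstated size hint, and safe `push` calls replace the unchecked writes because only the iteration protocol matters here.

```rust
/// Hypothetical iterator that yields 10, then `None`, then resumes with 20
/// before ending for good. The iterator protocol permits this behaviour.
struct Resuming {
    calls: u32,
}

impl Iterator for Resuming {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        self.calls += 1;
        match self.calls {
            1 => Some(10),
            2 => None,     // first "end"
            3 => Some(20), // resumes afterwards
            _ => None,
        }
    }
}

/// Two loops over the same iterator without `fuse()`.
fn two_loops_unfused<I: Iterator>(mut iterator: I, cap: usize) -> Vec<I::Item> {
    let mut out = Vec::with_capacity(cap);
    for element in iterator.by_ref().take(cap) {
        out.push(element);
    }
    // If the first loop stopped on `None`, this polls the iterator again.
    for element in iterator {
        out.push(element);
    }
    out
}

/// Same structure, but the iterator is fused first.
fn two_loops_fused<I: Iterator>(iterator: I, cap: usize) -> Vec<I::Item> {
    let mut out = Vec::with_capacity(cap);
    let mut i = iterator.fuse();
    for element in i.by_ref().take(cap) {
        out.push(element);
    }
    // `Fuse` keeps returning `None` once the inner iterator has ended.
    for element in i {
        out.push(element);
    }
    out
}
```

With `cap = 4`, the unfused version collects both 10 and 20, whereas a single `for` loop over `Resuming` would stop after 10. The fused version matches the single-loop result.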

alexcrichton · 2015-02-12T18:15:59Z

@bors: retry

alexcrichton · 2015-02-12T18:23:08Z

@bors: r-

alexcrichton · 2015-02-12T18:25:26Z

@bors: r+ 985fc7d

alexcrichton · 2015-02-12T18:25:37Z

@bors: r=huonw 985fc7d

bors · 2015-02-13T21:15:40Z

⌛ Testing commit 985fc7d with merge b9ba643...

nikomatsakis · 2015-02-13T22:12:17Z

src/libcollections/vec.rs

@@ -1380,7 +1380,17 @@ impl<T> FromIterator<T> for Vec<T> {
    fn from_iter<I:Iterator<Item=T>>(iterator: I) -> Vec<T> {
        let (lower, _) = iterator.size_hint();
        let mut vector = Vec::with_capacity(lower);
-        for element in iterator {
+
+        let mut i = iterator.fuse();


Perhaps obvious to you superbrains but it took me a while to puzzle out why the fuse was necessary here -- maybe worth a comment?

If I understand correctly, the problem is just that the first loop may exhaust the iterator, in which case the second loop would potentially re-run some iterators.

Ah yes, I'll open a new PR to add a comment, your understanding is spot on.

bors · 2015-02-13T23:56:38Z

☀️ Test successful - auto-linux-32-nopt-t, auto-linux-32-opt, auto-linux-64-nopt-t, auto-linux-64-opt, auto-linux-64-x-android-t, auto-mac-32-opt, auto-mac-64-nopt-t, auto-mac-64-opt, auto-win-32-nopt-t, auto-win-32-opt, auto-win-64-nopt-t, auto-win-64-opt

rust-highfive assigned huonw Feb 12, 2015

bluss reviewed Feb 12, 2015

alexcrichton force-pushed the opt-vec-collect branch from 25435ee to 985fc7d Compare February 12, 2015 18:25

steveklabnik mentioned this pull request Feb 13, 2015

Rollup of 15 pull requests #22281

Closed

nikomatsakis reviewed Feb 13, 2015

bors merged commit 985fc7d into rust-lang:master Feb 13, 2015

semarie mentioned this pull request Feb 14, 2015

Build fails with --disable-jemalloc #21526

Closed

alexcrichton deleted the opt-vec-collect branch February 16, 2015 05:00

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Feb 16, 2015

std: Add Vec::from_iter comment

42053b9

Requested by Niko in rust-lang#22200 (and is good to have anyway)

alexcrichton mentioned this pull request Feb 16, 2015

std: Add Vec::from_iter comment #22394

Merged

Manishearth pushed a commit to Manishearth/rust that referenced this pull request Feb 17, 2015

std: Add Vec::from_iter comment

a35824b

Requested by Niko in rust-lang#22200 (and is good to have anyway)

Manishearth added a commit to Manishearth/rust that referenced this pull request Feb 17, 2015

Rollup merge of rust-lang#22394 - alexcrichton:vec-from-iter-comment,…

b491b16

… r=brson Requested by Niko in rust-lang#22200 (and is good to have anyway)

Manishearth added a commit to Manishearth/rust that referenced this pull request Feb 17, 2015

Rollup merge of rust-lang#22394 - alexcrichton:vec-from-iter-comment,…

071f8cc

… r=brson Requested by Niko in rust-lang#22200 (and is good to have anyway)

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Feb 17, 2015

std: Add Vec::from_iter comment

95a28c9

Requested by Niko in rust-lang#22200 (and is good to have anyway)

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Feb 18, 2015

rollup merge of rust-lang#22394: alexcrichton/vec-from-iter-comment

b283881

Requested by Niko in rust-lang#22200 (and is good to have anyway)
