Skip to content

std: Optimize Vec::from_iter #22200

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 13, 2015
Merged

Conversation

alexcrichton
Copy link
Member

This PR is an optimization of the FromIterator implementation of Vec

Benchmark: https://gist.github.com/alexcrichton/03d666159a28a80e7c70

Before:

test macro_repeat1     ... bench:        57 ns/iter (+/- 1)
test macro_repeat2     ... bench:        56 ns/iter (+/- 1)
test map_clone1        ... bench:       828 ns/iter (+/- 13)
test map_clone2        ... bench:       828 ns/iter (+/- 8)
test repeat1           ... bench:      1104 ns/iter (+/- 10)
test repeat2           ... bench:      1106 ns/iter (+/- 11)

After:

test macro_repeat1     ... bench:        75 ns/iter (+/- 21)
test macro_repeat2     ... bench:        59 ns/iter (+/- 31)
test map_clone1        ... bench:        34 ns/iter (+/- 22)
test map_clone2        ... bench:        52 ns/iter (+/- 21)
test repeat1           ... bench:        34 ns/iter (+/- 11)
test repeat2           ... bench:        33 ns/iter (+/- 12)

The idea behind this optimization is to avoid all bounds checks for space
already allocated into the vector. This may involve running the iterator twice,
but the first run of the iterator should be optimizable to a memcpy or memset if
possible.

The same treatment can in theory be applied to Vec::extend but the benchmarks
for that currently get worse if the change is applied. This appears to be some
LLVM optimizations going awry but it's seems prudent to land at least the
collect portion beforehand.

@rust-highfive
Copy link
Contributor

r? @huonw

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton
Copy link
Member Author

cc @gankro, rust-lang/rfcs#832

@huonw
Copy link
Member

huonw commented Feb 12, 2015

33 ns/iter seems rather short, even for a memcpy/memset?

@huonw
Copy link
Member

huonw commented Feb 12, 2015

@bors r+ 2543 rollup

@alexcrichton
Copy link
Member Author

@huonw these are the raw results I get:

#[bench]                                                            
fn libc1(b: &mut test::Bencher) {                                   
    b.iter(|| unsafe {                                              
        extern { fn memset(p: *mut u8, n: u8, amt: usize); }        
        let ptr = libc::malloc(1000);                               
        assert!(!ptr.is_null());                                    
        memset(ptr as *mut u8, 0, 1000);                            
        libc::free(ptr);                                            
    });                                                             
}                                                                   
#[bench]                                                            
fn heap(b: &mut test::Bencher) {                                    
    b.iter(|| unsafe {                                              
        extern { fn memset(p: *mut u8, n: u8, amt: usize); }        
        let ptr = std::rt::heap::allocate(1000, 8);                 
        assert!(!ptr.is_null());                                    
        memset(ptr as *mut u8, 0, 1000);                            
        std::rt::heap::deallocate(ptr, 1000, 8);                    
    });                                                             
}                                                                   
test heap          ... bench:        30 ns/iter (+/- 1)
test libc1         ... bench:        49 ns/iter (+/- 3)

@bors
Copy link
Collaborator

bors commented Feb 12, 2015

⌛ Testing commit 25435ee with merge c838a13...

@bors
Copy link
Collaborator

bors commented Feb 12, 2015

💔 Test failed - auto-win-32-opt

vector.set_len(len + 1);
}
}

for element in iterator {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our description of the iterator protocol says that after returning None, an iterator can do whatever it wants (unfortunately). So on this line that caveat applies. I guess it's not important really what happens, but an iterator may for example be in spec and produce itself double when collected into a Vec this way.

On balance I think that if we believe in our description for the iterator protocol, we can't push it aside in this central location.

Suggested fix: Rewrite the first for loop to loop - match and return on first None?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose inserting if vector.len() < vector.capacity() { return vector } before this loop should protect against this case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the None stuff can be waived if the iterator produces less elements than its lower bound. I guess that means the current code is fine. These soft requirements are confusing :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option would be a use of fuse():

diff --git a/src/libcollections/vec.rs b/src/libcollections/vec.rs
index 7fcf0dd..9fd52a1 100644
--- a/src/libcollections/vec.rs
+++ b/src/libcollections/vec.rs
@@ -1380,7 +1380,9 @@ impl<T> FromIterator<T> for Vec<T> {
     fn from_iter<I:Iterator<Item=T>>(mut iterator: I) -> Vec<T> {
         let (lower, _) = iterator.size_hint();
         let mut vector = Vec::with_capacity(lower);
-        for element in iterator.by_ref().take(vector.capacity()) {
+
+        let mut i = iterator.by_ref().fuse();
+        for element in i.by_ref().take(vector.capacity()) {
             let len = vector.len();
             unsafe {
                 ptr::write(vector.get_unchecked_mut(len), element);
@@ -1388,7 +1390,7 @@ impl<T> FromIterator<T> for Vec<T> {
             }
         }

-        for element in iterator {
+        for element in i {
             vector.push(element)
         }
         vector

In this case the numbers are also not affected:

test heap          ... bench:        31 ns/iter (+/- 2)
test libc1         ... bench:        49 ns/iter (+/- 3)
test macro_repeat1 ... bench:        55 ns/iter (+/- 3)
test macro_repeat2 ... bench:        54 ns/iter (+/- 3)
test map_clone1    ... bench:        33 ns/iter (+/- 2)
test map_clone2    ... bench:        33 ns/iter (+/- 2)
test repeat1       ... bench:       286 ns/iter (+/- 17)
test repeat2       ... bench:       283 ns/iter (+/- 12)

(the repeat numbers are a bit flaky unfortunately, so that's not too surprising)

@alexcrichton
Copy link
Member Author

@bors: retry

@alexcrichton
Copy link
Member Author

@bors: r-

@alexcrichton
Copy link
Member Author

@bors: r+ 985fc7d

This PR is an optimization of the `FromIterator` implementation of `Vec`

Benchmark: https://gist.github.com/alexcrichton/03d666159a28a80e7c70

Before:
    test macro_repeat1     ... bench:        57 ns/iter (+/- 1)
    test macro_repeat2     ... bench:        56 ns/iter (+/- 1)
    test map_clone1        ... bench:       828 ns/iter (+/- 13)
    test map_clone2        ... bench:       828 ns/iter (+/- 8)
    test repeat1           ... bench:      1104 ns/iter (+/- 10)
    test repeat2           ... bench:      1106 ns/iter (+/- 11)

After:
    test macro_repeat1     ... bench:        75 ns/iter (+/- 21)
    test macro_repeat2     ... bench:        59 ns/iter (+/- 31)
    test map_clone1        ... bench:        34 ns/iter (+/- 22)
    test map_clone2        ... bench:        52 ns/iter (+/- 21)
    test repeat1           ... bench:        34 ns/iter (+/- 11)
    test repeat2           ... bench:        33 ns/iter (+/- 12)

The idea behind this optimization is to avoid all bounds checks for space
already allocated into the vector. This may involve running the iterator twice,
but the first run of the iterator should be optimizable to a memcpy or memset if
possible.

The same treatment can in theory be applied to `Vec::extend` but the benchmarks
for that currently get *worse* if the change is applied. This appears to be some
LLVM optimizations going awry but it's seems prudent to land at least the
`collect` portion beforehand.
@alexcrichton
Copy link
Member Author

@bors: r=huonw 985fc7d

steveklabnik added a commit to steveklabnik/rust that referenced this pull request Feb 13, 2015
This PR is an optimization of the `FromIterator` implementation of `Vec`

Benchmark: https://gist.github.com/alexcrichton/03d666159a28a80e7c70

Before:

    test macro_repeat1     ... bench:        57 ns/iter (+/- 1)
    test macro_repeat2     ... bench:        56 ns/iter (+/- 1)
    test map_clone1        ... bench:       828 ns/iter (+/- 13)
    test map_clone2        ... bench:       828 ns/iter (+/- 8)
    test repeat1           ... bench:      1104 ns/iter (+/- 10)
    test repeat2           ... bench:      1106 ns/iter (+/- 11)

After:

    test macro_repeat1     ... bench:        75 ns/iter (+/- 21)
    test macro_repeat2     ... bench:        59 ns/iter (+/- 31)
    test map_clone1        ... bench:        34 ns/iter (+/- 22)
    test map_clone2        ... bench:        52 ns/iter (+/- 21)
    test repeat1           ... bench:        34 ns/iter (+/- 11)
    test repeat2           ... bench:        33 ns/iter (+/- 12)

The idea behind this optimization is to avoid all bounds checks for space
already allocated into the vector. This may involve running the iterator twice,
but the first run of the iterator should be optimizable to a memcpy or memset if
possible.

The same treatment can in theory be applied to `Vec::extend` but the benchmarks
for that currently get *worse* if the change is applied. This appears to be some
LLVM optimizations going awry but it's seems prudent to land at least the
`collect` portion beforehand.
bors added a commit that referenced this pull request Feb 13, 2015
This PR is an optimization of the `FromIterator` implementation of `Vec`

Benchmark: https://gist.github.com/alexcrichton/03d666159a28a80e7c70

Before:

    test macro_repeat1     ... bench:        57 ns/iter (+/- 1)
    test macro_repeat2     ... bench:        56 ns/iter (+/- 1)
    test map_clone1        ... bench:       828 ns/iter (+/- 13)
    test map_clone2        ... bench:       828 ns/iter (+/- 8)
    test repeat1           ... bench:      1104 ns/iter (+/- 10)
    test repeat2           ... bench:      1106 ns/iter (+/- 11)

After:

    test macro_repeat1     ... bench:        75 ns/iter (+/- 21)
    test macro_repeat2     ... bench:        59 ns/iter (+/- 31)
    test map_clone1        ... bench:        34 ns/iter (+/- 22)
    test map_clone2        ... bench:        52 ns/iter (+/- 21)
    test repeat1           ... bench:        34 ns/iter (+/- 11)
    test repeat2           ... bench:        33 ns/iter (+/- 12)

The idea behind this optimization is to avoid all bounds checks for space
already allocated into the vector. This may involve running the iterator twice,
but the first run of the iterator should be optimizable to a memcpy or memset if
possible.

The same treatment can in theory be applied to `Vec::extend` but the benchmarks
for that currently get *worse* if the change is applied. This appears to be some
LLVM optimizations going awry but it's seems prudent to land at least the
`collect` portion beforehand.
@bors
Copy link
Collaborator

bors commented Feb 13, 2015

⌛ Testing commit 985fc7d with merge b9ba643...

@@ -1380,7 +1380,17 @@ impl<T> FromIterator<T> for Vec<T> {
fn from_iter<I:Iterator<Item=T>>(iterator: I) -> Vec<T> {
let (lower, _) = iterator.size_hint();
let mut vector = Vec::with_capacity(lower);
for element in iterator {

let mut i = iterator.fuse();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps obvious to you superbrains but it took me a while to puzzle out why the fuse was necessary here -- maybe worth a comment?

If I understand correctly, the problem is just that the first loop may exhaust the iterator, in which case the second loop would potentially re-run some iterators.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I'll open a new PR to add a comment, your understanding is spot on.

@bors bors merged commit 985fc7d into rust-lang:master Feb 13, 2015
@alexcrichton alexcrichton deleted the opt-vec-collect branch February 16, 2015 05:00
alexcrichton added a commit to alexcrichton/rust that referenced this pull request Feb 16, 2015
Requested by Niko in rust-lang#22200 (and is good to have anyway)
Manishearth pushed a commit to Manishearth/rust that referenced this pull request Feb 17, 2015
Requested by Niko in rust-lang#22200 (and is good to have anyway)
Manishearth added a commit to Manishearth/rust that referenced this pull request Feb 17, 2015
… r=brson

 Requested by Niko in rust-lang#22200 (and is good to have anyway)
Manishearth added a commit to Manishearth/rust that referenced this pull request Feb 17, 2015
… r=brson

Requested by Niko in rust-lang#22200 (and is good to have anyway)
alexcrichton added a commit to alexcrichton/rust that referenced this pull request Feb 17, 2015
Requested by Niko in rust-lang#22200 (and is good to have anyway)
alexcrichton added a commit to alexcrichton/rust that referenced this pull request Feb 18, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants