
Iterator::max with reference-type items cannot leverage SIMD instructions, resulting in low performance #106539

Open
@jfaixo

Description


When working with arrays of numbers, it is common to need their min/max/sum/etc. While discussing internals with fellow developers, someone pointed out that the C# Max method uses SIMD. Out of curiosity, I checked both C++ and Rust.

My findings are as follows:

  • LLVM is able to auto-vectorize this kind of reduction
  • the C++ STL max_element function benefits from that
  • my custom Rust implementation also benefits from that
  • the Rust Iterator functions (max, min) do not

The last point is because the implementation does not require the item type to implement the Copy trait: on a slice, it operates on references (&T) rather than on the element values themselves.
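To make the reference-versus-value distinction concrete, here is a simplified model of the two reductions. It is a sketch under the assumption that iter().max() on a slice behaves like a reduce over &T (the real std code goes through max_by); max_by_ref and max_by_value are hypothetical names. In the first, the running maximum is a pointer into the slice; in the second, it is a plain value, which is the shape custom_max hands to LLVM.

```rust
// Simplified model (an assumption for illustration, not the real std source)
// of how `a.iter().max()` reduces a slice: the running maximum is a `&T`.
fn max_by_ref<T: Ord>(a: &[T]) -> Option<&T> {
    // Like `Iterator::max`, keep the later element when two compare equal.
    a.iter().reduce(|best, x| if x >= best { x } else { best })
}

// The same reduction carried out on copied values: the running maximum is a `T`.
fn max_by_value<T: Ord + Copy>(a: &[T]) -> Option<T> {
    a.iter().copied().reduce(|best, x| best.max(x))
}

fn main() {
    let data = [3, 9, -2, 9, 4];
    // Both return the same maximum; only the type of the accumulator differs.
    assert_eq!(max_by_ref(&data).copied(), max_by_value(&data));
    assert_eq!(max_by_value(&data), Some(9));
    assert_eq!(max_by_value::<i32>(&[]), None);
    println!("max = {:?}", max_by_value(&data));
}
```

Whether LLVM actually vectorizes either form depends on the compiler version and target; the generated assembly is the place to confirm it.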

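```rust
// Assumed value, matching the [i32; 100_000] benchmark below.
const ITEM_COUNT: i32 = 100_000;

// This is slow: iter().max() reduces over &T references
#[inline(never)]
pub fn stdlib_max<T: Ord + Copy>(a: &[T]) -> Option<T> {
    a.iter().max().copied()
}

// This is fast: the fold carries a T value, which LLVM auto-vectorizes
#[inline(never)]
pub fn custom_max<T: Ord + Copy>(a: &[T]) -> Option<T> {
    let first = *a.first()?;
    Some(a.iter().fold(first, |x, y| std::cmp::max(x, *y)))
}

fn main() {
    let my_array = (0..ITEM_COUNT).collect::<Vec<_>>();
    // Both functions return the same result; only their speed differs.
    assert_eq!(stdlib_max(&my_array), custom_max(&my_array));
}
```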
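The findings above list min as affected too, and the same by-value fold should apply to it. custom_min below is a hypothetical counterpart to custom_max, not code from the linked repository; whether it vectorizes the same way on a given toolchain is an assumption worth checking in the generated assembly.

```rust
// Hypothetical `custom_min`, mirroring `custom_max` from the issue:
// fold over copied values instead of references.
#[inline(never)]
pub fn custom_min<T: Ord + Copy>(a: &[T]) -> Option<T> {
    let first = *a.first()?;
    Some(a.iter().fold(first, |x, y| std::cmp::min(x, *y)))
}

fn main() {
    let my_array: Vec<i32> = (0..100_000).collect();
    // Should agree with the iterator-based version, which works on references.
    assert_eq!(custom_min(&my_array), my_array.iter().min().copied());
    assert_eq!(custom_min(&my_array), Some(0));
    assert_eq!(custom_min::<i32>(&[]), None);
}
```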

=> Still, as an end user, I would have expected the idiomatic Rust way of doing this (with iterators) to be optimal, and it is not.

Here is a small repository with a sample and a benchmark demonstrating the issue:

[https://github.com/jfaixo/rust-max-bench]

Benchmark results for finding the max of a [i32; 100_000] array:

```
❯ rustc -vV
rustc 1.68.0-nightly (388538fc9 2023-01-05)
binary: rustc
commit-hash: 388538fc963e07a94e3fc3ac8948627fd2d28d29
commit-date: 2023-01-05
host: x86_64-unknown-linux-gnu
release: 1.68.0-nightly
LLVM version: 15.0.6

❯ cargo bench
    Finished bench [optimized] target(s) in 0.00s
     Running unittests src/lib.rs (target/release/deps/rust_max_bench-a5d988f9520f9dde)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/bench.rs (target/release/deps/bench-cf556ddbd1b864fb)

running 3 tests
test custom    ... bench:       8,052 ns/iter (+/- 385)
test itertools ... bench:      94,027 ns/iter (+/- 816)
test stdlib    ... bench:      94,477 ns/iter (+/- 1,545)

test result: ok. 0 passed; 0 failed; 0 ignored; 3 measured; 0 filtered out; finished in 2.40s
```
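The numbers above come from nightly's #[bench] harness in the linked repository. For a rough cross-check on stable Rust, a minimal sketch using only std::time::Instant and std::hint::black_box (stable since Rust 1.66) could look like the following. It is not the repository's benchmark; time_ns is a hypothetical helper, the program should be built with --release, and its timings will vary by machine.

```rust
use std::hint::black_box;
use std::time::Instant;

#[inline(never)]
pub fn stdlib_max<T: Ord + Copy>(a: &[T]) -> Option<T> {
    a.iter().max().copied()
}

#[inline(never)]
pub fn custom_max<T: Ord + Copy>(a: &[T]) -> Option<T> {
    let first = *a.first()?;
    Some(a.iter().fold(first, |x, y| std::cmp::max(x, *y)))
}

// Runs `f` `iters` times on `data` and returns the average time per call in ns.
fn time_ns(data: &[i32], iters: u32, f: fn(&[i32]) -> Option<i32>) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from hoisting or discarding the call.
        black_box(f(black_box(data)));
    }
    start.elapsed().as_nanos() as f64 / iters as f64
}

fn main() {
    let data: Vec<i32> = (0..100_000).collect();
    assert_eq!(stdlib_max(&data), custom_max(&data));
    let iters = 1_000;
    println!("stdlib: {:.0} ns/iter", time_ns(&data, iters, stdlib_max::<i32>));
    println!("custom: {:.0} ns/iter", time_ns(&data, iters, custom_max::<i32>));
}
```

A proper harness such as Criterion gives more reliable statistics; this sketch only shows whether the gap is of the same order.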

Metadata


Labels

  • A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.)
  • C-enhancement (Category: An issue proposing an enhancement or a PR with one.)
  • I-slow (Issue: Problems and improvements with respect to performance of generated code.)
  • T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)
