Iterator::max with reference-type items cannot leverage SIMD instructions, resulting in low performance #106539
Comments
As suspected, the underlying difference is that stdlib does max on … Placing the … It's possible a Clippy lint could help mitigate this pitfall, but ultimately the best solution would be for LLVM to recognize this and lift the copy to before the max.
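For illustration, here is the kind of rewrite such a lint would presumably suggest. This assumes the pitfall is a by-reference `.iter().max()` on a slice of `Copy` values. `max_by_ref` and `max_by_value` are hypothetical names. Both functions return the same value. The only difference is whether the comparisons inside the loop run on `&i32` or on `i32`.

```rust
/// By-reference form: `max` reduces over `&i32` and yields `Option<&i32>`;
/// the copy only happens on the final result.
fn max_by_ref(a: &[i32]) -> Option<i32> {
    a.iter().max().copied()
}

/// By-value form: the copy is lifted before `max`, so the reduction
/// compares plain `i32` values (the form shown to vectorize in this thread).
fn max_by_value(a: &[i32]) -> Option<i32> {
    a.iter().copied().max()
}

fn main() {
    let data = [3, -7, 42, 0, 42, 19];
    // Same result either way; only the shape of the loop differs.
    assert_eq!(max_by_ref(&data), max_by_value(&data));
    assert_eq!(max_by_value(&data), Some(42));
    assert_eq!(max_by_value(&[]), None);
}
```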
This would be extremely difficult for LLVM, because the … I think this would be much better as a …
Updated the title because …

```rust
// This is fast
pub fn stdlib_max<T: Ord + Copy>(a: &[T]) -> Option<T> {
    a.iter().copied().max()
}

pub fn demo(z: &[i32]) -> Option<i32> {
    stdlib_max(z)
}
```

The SIMD is most obvious in LLVM IR:

```llvm
%12 = tail call <4 x i32> @llvm.smax.v4i32(<4 x i32> %vec.phi, <4 x i32> %wide.load)
%13 = tail call <4 x i32> @llvm.smax.v4i32(<4 x i32> %vec.phi1, <4 x i32> %wide.load3)
```
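The two `smax.v4i32` calls in that IR each update their own accumulator (`%vec.phi` and `%vec.phi1`). In other words, LLVM keeps several independent running maxima in vector registers and combines them only after the loop. Below is a hand-written scalar analogue of that strategy, for illustration only. It assumes 8 lanes, which is one 256-bit AVX2 register of `i32`. `lane_max` and `LANES` are hypothetical names. There is no guarantee that it compiles to better code than `a.iter().copied().max()`, which per the IR above already vectorizes.

```rust
/// Assumption: 8 lanes of i32 = one 256-bit AVX2 register.
const LANES: usize = 8;

/// Hypothetical lane-wise max: keeps `LANES` independent running maxima
/// (similar in spirit to `%vec.phi` / `%vec.phi1`) and combines them at the end.
fn lane_max(a: &[i32]) -> Option<i32> {
    if a.is_empty() {
        return None;
    }
    let chunks = a.chunks_exact(LANES);
    // Elements that do not fill a whole chunk are handled after the loop.
    let tail = chunks.remainder();
    // i32::MIN is the identity for max, so lanes that never see data cannot win.
    let mut acc = [i32::MIN; LANES];
    for chunk in chunks {
        // Lane-wise max: no lane depends on another inside the loop body.
        for (m, &x) in acc.iter_mut().zip(chunk) {
            *m = (*m).max(x);
        }
    }
    // Horizontal step: fold the lanes together, then the leftover tail.
    acc.iter().copied().chain(tail.iter().copied()).max()
}

fn main() {
    let data: Vec<i32> = (0..100).map(|i| (i * 37) % 101 - 50).collect();
    assert_eq!(lane_max(&data), data.iter().copied().max());
    assert_eq!(lane_max(&[5, 1, 9]), Some(9)); // shorter than one chunk
    assert_eq!(lane_max(&[]), None);
}
```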
I agree LLVM cannot be expected to handle this for us. Is it possible that we could use MIR-level knowledge to lift the copy ourselves?
It might be possible to specialize the …
Well, uh, the opposite sure would be preferable here. But even then it should be possible.
I did not know that copied() can be called on Iter; that's on me!
That would be great for scalar types that implement Ord :)
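Specializing the standard library's behaviour is not something user code can do on stable Rust. One opt-in, user-side approximation is an extension trait that always copies before reducing. This is a swapped-in workaround, not the specialization discussed above. `MaxCopied` and `max_copied` are hypothetical names and not part of std.

```rust
/// Hypothetical extension trait (not part of std): reduces any iterator
/// over `&T` by value, for `T: Ord + Copy`.
trait MaxCopied<'a, T: 'a + Ord + Copy>: Iterator<Item = &'a T> + Sized {
    fn max_copied(self) -> Option<T> {
        // Copy each item first so `max` compares `T` values, not `&T`.
        self.copied().max()
    }
}

// Blanket impl: every iterator over `&T` with `T: Ord + Copy` gets the method.
impl<'a, T: 'a + Ord + Copy, I: Iterator<Item = &'a T>> MaxCopied<'a, T> for I {}

fn main() {
    let bytes = vec![4u8, 200, 17];
    assert_eq!(bytes.iter().max_copied(), Some(200));

    let empty: Vec<i64> = Vec::new();
    assert_eq!(empty.iter().max_copied(), None);
}
```

Callers still have to remember to use it, so it addresses the ergonomics rather than the underlying pitfall.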
It's also more obvious with AVX2. I'm not fluent at all in assembly, but the iterator copy seems to be fully optimised out; is that right? So this syntax addresses my concern :)
When manipulating arrays of numbers, it is pretty common to need the min/max/sum/... of them. While discussing internals with fellow developers, someone pointed out that the C# Max method leverages SIMD. Out of curiosity, I checked both C++ and Rust.
My findings are as follows:
This last bullet is due to the fact that the implementation does not require the item type to implement the Copy trait, so it operates over references rather than over the actual element type of the array.
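Here is a concrete look at the reference-based behaviour. For a slice, `iter()` yields `&T`, so `max()` returns a reference into the slice. According to the `Iterator::max` documentation, when several elements are equally maximum, the last one is returned. A by-reference reduction therefore has to preserve *which* element won, not just its value. That may be part of why the optimizer treats the two forms differently, though this is my reading and was not established in the thread. `largest_ref` is a hypothetical name.

```rust
/// Returns a reference to the largest element, exactly like `a.iter().max()`.
fn largest_ref(a: &[i32]) -> Option<&i32> {
    // `a.iter()` yields `&i32`, so the reduction itself works on references.
    a.iter().max()
}

fn main() {
    let data = [1, 9, 3, 9];
    let winner = largest_ref(&data).unwrap();
    // Same value as the by-value reduction ...
    assert_eq!(Some(*winner), data.iter().copied().max());
    // ... but it is also a specific element: on ties, `max` returns the last one.
    assert!(std::ptr::eq(winner, &data[3]));
    assert!(!std::ptr::eq(winner, &data[1]));
}
```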
=> Still, as an end user, I would have expected the "Rust way" of doing this (with iterators) to be optimal, and it is not.
Here is a small repository with a sample and a benchmark demonstrating the issue:
[https://github.com/jfaixo/rust-max-bench]
For finding the max of a [i32; 100_000] array:
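The benchmark numbers themselves are not reproduced here. As a stand-in, below is a rough, std-only timing sketch for a 100_000-element i32 input. It is not the linked repository's benchmark. The fill pattern, iteration count and function names are assumptions, and it uses a Vec instead of a stack array. The comparison is only meaningful when built with --release. A dedicated harness such as criterion gives more reliable numbers.

```rust
use std::hint::black_box;
use std::time::Instant;

const N: usize = 100_000;

/// Reduction over `&i32`, copying only the final result.
fn max_ref(a: &[i32]) -> Option<i32> {
    a.iter().max().copied()
}

/// Reduction over `i32` values (copy lifted before `max`).
fn max_val(a: &[i32]) -> Option<i32> {
    a.iter().copied().max()
}

/// Calls `f(data)` `iters` times and returns the mean time per call in nanoseconds.
fn time_ns(f: fn(&[i32]) -> Option<i32>, data: &[i32], iters: u32) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        // `black_box` stops the optimizer from hoisting the call out of the
        // loop or discarding its result.
        black_box(f(black_box(data)));
    }
    start.elapsed().as_nanos() as f64 / f64::from(iters)
}

fn main() {
    // Deterministic pseudo-random fill using a simple LCG (assumption: any data works).
    let mut state: u32 = 12_345;
    let data: Vec<i32> = (0..N)
        .map(|_| {
            state = state.wrapping_mul(1_103_515_245).wrapping_add(12_345);
            (state >> 1) as i32
        })
        .collect();

    assert_eq!(max_ref(&data), max_val(&data));
    println!("by reference: {:.0} ns/call", time_ns(max_ref, &data, 1_000));
    println!("by value:     {:.0} ns/call", time_ns(max_val, &data, 1_000));
}
```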