Open
Labels: I-slow (Impact: Slowww)
Description
I tried to measure SIMD performance on my machine with the following benchmark code:
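```rust
// Crate attributes and imports assumed (not shown in the original report).
// Requires a nightly toolchain for `portable_simd` and the `test` crate.
#![feature(portable_simd, test)]

extern crate test;

use std::simd::{f64x8, num::SimdFloat};
use test::Bencher;

const STEP_BY: usize = 8;

// One f64x8 accumulator: eight lane-wise partial sums, reduced once at the end.
fn fast_sum(x: &[f64]) -> f64 {
    assert!(x.len() % STEP_BY == 0);
    let mut sum = f64x8::splat(0.); // [0., 0., 0., 0., 0., 0., 0., 0.]
    for i in (0..x.len()).step_by(STEP_BY) {
        sum += f64x8::from_slice(&x[i..]);
    }
    sum.reduce_sum()
}

// Horizontal reduction of every 8-element chunk into one scalar accumulator.
fn slow_sum(x: &[f64]) -> f64 {
    assert!(x.len() % STEP_BY == 0);
    let mut sum: f64 = 0.;
    for i in (0..x.len()).step_by(STEP_BY) {
        sum += f64x8::from_slice(&x[i..]).reduce_sum();
    }
    sum
}

// Plain scalar loop, no explicit SIMD.
fn normal_sum(x: &[f64]) -> f64 {
    let mut sum = 0.0;
    x.iter().for_each(|&val| sum += val);
    sum
}

#[bench]
fn bench_fast(b: &mut Bencher) {
    let data = (0..1024000).map(|v| v as f64).collect::<Vec<_>>();
    b.iter(|| fast_sum(&data))
}

#[bench]
fn bench_slow(b: &mut Bencher) {
    let data = (0..1024000).map(|v| v as f64).collect::<Vec<_>>();
    b.iter(|| slow_sum(&data))
}

#[bench]
fn bench_normal(b: &mut Bencher) {
    let data = (0..1024000).map(|v| v as f64).collect::<Vec<_>>();
    b.iter(|| normal_sum(&data))
}
```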
I ran the benchmark with `RUSTFLAGS='-C target-cpu=native' cargo bench`.
My CPU is an Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz, which supports AVX-512. For `f64` the results are impressive: both SIMD versions beat the normal sum by a wide margin (roughly 4× for `bench_fast`):
```
test bench_fast   ... bench:     333,909 ns/iter (+/- 12,105)
test bench_normal ... bench:   1,333,889 ns/iter (+/- 6,561)
test bench_slow   ... bench:     386,461 ns/iter (+/- 5,928)
```
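The `f64` gap is consistent with how the additions are ordered: `normal_sum` adds every element into a single accumulator, so each addition waits for the previous one, while `fast_sum` keeps eight independent partial sums in one `f64x8` and combines them only at the end. As a hedged sketch in stable Rust (no `std::simd`), the hypothetical `multi_acc_sum` below does the same reassociation by hand with eight scalar accumulators, and the hypothetical `sequential_sum` stands in for `normal_sum`. It illustrates the reordering only and says nothing about the machine code the compiler actually emits.

```rust
/// Sums `x` with eight independent partial sums, mirroring the shape of
/// `fast_sum` (hypothetical helper, stable Rust, no `std::simd`).
fn multi_acc_sum(x: &[f64]) -> f64 {
    const LANES: usize = 8;
    let mut acc = [0.0f64; LANES];
    let chunks = x.chunks_exact(LANES);
    // Elements left over when the length is not a multiple of LANES.
    let tail = chunks.remainder();
    for chunk in chunks {
        // Each lane depends only on its own previous value, so the eight
        // additions in one iteration are independent of each other.
        for (a, &v) in acc.iter_mut().zip(chunk) {
            *a += v;
        }
    }
    // Combine the partial sums, then add the leftover elements.
    acc.iter().sum::<f64>() + tail.iter().sum::<f64>()
}

/// Strictly left-to-right sum, like `normal_sum`: every addition
/// depends on the result of the previous one.
fn sequential_sum(x: &[f64]) -> f64 {
    x.iter().fold(0.0, |s, &v| s + v)
}

fn main() {
    let data: Vec<f64> = (0..1024).map(|v| v as f64).collect();
    // All partial sums here are integers below 2^53, so both orders are exact.
    assert_eq!(multi_acc_sum(&data), sequential_sum(&data));
    println!("{}", multi_acc_sum(&data)); // prints 523776
}
```

For inputs whose partial sums are not exactly representable, the two functions can return slightly different results, which is exactly why the compiler will not perform this regrouping on its own for `f64`.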
However, when I replace `f64` with `u64` or `i64`, the SIMD versions are no faster than the normal sum; both are even slightly slower:
```
test bench_fast   ... bench:     339,867 ns/iter (+/- 16,114)
test bench_normal ... bench:     327,762 ns/iter (+/- 8,146)
test bench_slow   ... bench:     347,187 ns/iter (+/- 8,486)
```
Why does this happen?
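A likely explanation, offered as an assumption rather than a confirmed answer from this thread: without fast-math flags, the compiler must keep the `f64` additions in `normal_sum` in source order, because regrouping floating-point additions can change the result, so the scalar loop stays one dependent add at a time. Wrapping integer addition is associative, so LLVM is free to auto-vectorize the `u64`/`i64` version of `normal_sum` into the same kind of lane-wise partial sums that `fast_sum` writes by hand, which would explain why all three integer benchmarks land close together. The sketch below only demonstrates the arithmetic property; it does not inspect generated assembly, and `sum_sequential` and `sum_eight_lanes` are hypothetical names.

```rust
/// Left-to-right sum with wrapping integer addition (hypothetical helper).
fn sum_sequential(xs: &[u64]) -> u64 {
    xs.iter().fold(0u64, |s, &v| s.wrapping_add(v))
}

/// The same sum regrouped into eight lane-wise partial sums, the shape a
/// vectorizer (or `fast_sum`) produces (hypothetical helper).
fn sum_eight_lanes(xs: &[u64]) -> u64 {
    let mut lanes = [0u64; 8];
    for (i, &v) in xs.iter().enumerate() {
        lanes[i % 8] = lanes[i % 8].wrapping_add(v);
    }
    lanes.iter().fold(0u64, |s, &v| s.wrapping_add(v))
}

fn main() {
    // Floating-point addition is not associative: regrouping changes the result.
    let (a, b, c) = (0.1_f64, 0.2_f64, 0.3_f64);
    assert_ne!((a + b) + c, a + (b + c));

    // 1.0 is far below the spacing between adjacent f64 values near 1e17,
    // so adding it to 1e17 first loses it entirely.
    let big = 1e17_f64;
    assert_eq!((big + 1.0) - big, 0.0);
    assert_eq!((big - big) + 1.0, 1.0);

    // Wrapping integer addition is associative: any grouping gives the
    // same answer, even when the sum overflows.
    let xs: Vec<u64> = (0..1024).collect();
    assert_eq!(sum_sequential(&xs), sum_eight_lanes(&xs));
    let huge = vec![u64::MAX; 9];
    assert_eq!(sum_sequential(&huge), sum_eight_lanes(&huge));
    println!("{}", sum_sequential(&xs)); // prints 523776
}
```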