Optimize jumps in PartialOrd le #83819
(rust-highfive has picked a reviewer for you, use r? to override)
ASM changes
Codegen difference for code:
#[repr(u32)]
#[derive(Copy, Clone, Eq, PartialEq, PartialOrd)]
pub enum Foo {
Zero,
One,
Two,
}
pub fn compare(a: Foo, b: Foo)->bool{
a <= b
}
Before:
_ZN8test_cmp7compare17h5c31fd9ed0a26eefE:
xorl %eax, %eax
xorl %r8d, %r8d
cmpl %edx, %ecx
setne %r8b
movq $-1, %rcx
cmovaeq %r8, %rcx
movl $0, %edx
cmovneq %rcx, %rdx
addq $1, %rdx
cmpq $1, %rdx
ja .LBB0_2
movb $1, %al
.LBB0_2:
retq
Now:
_ZN8test_cmp7compare17h5c31fd9ed0a26eefE:
cmpl %edx, %ecx
setbe %al
retq
Benchmark results
Code:
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};
#[repr(u32)]
#[derive(Copy, Clone, Eq, PartialEq, PartialOrd)]
enum Foo {
Zero = 0,
One = 1,
Two = 2,
}
struct Data {
sorted: Vec<Foo>,
unordered: Vec<Foo>,
}
fn generate_data(n: usize) -> Data {
use rand::prelude::{Rng, SeedableRng};
use rand_chacha::ChaCha8Rng;
let mut rng = ChaCha8Rng::seed_from_u64(6464u64);
let distribution = rand::distributions::Uniform::new(0, 3);
let mut data = Vec::with_capacity(n);
for _ in 0..n {
let num: usize = rng.sample(distribution);
data.push(num);
}
fn convert(num: usize) -> Foo {
[Foo::Zero, Foo::One, Foo::Two][num]
}
let mut sorted = data.clone();
sorted.sort_unstable();
Data {
sorted: sorted.into_iter().map(convert).collect(),
unordered: data.into_iter().map(convert).collect(),
}
}
pub fn criterion_benchmark(c: &mut Criterion) {
let Data { sorted, unordered } = generate_data(1000);
let mut group = c.benchmark_group("cmp");
let pairs = [("Sorted", sorted), ("Unordered", unordered)];
for (name, data) in pairs.iter().cloned() {
group.bench_with_input(BenchmarkId::new("comparisons", name), &data, |b, data| {
b.iter_batched(
|| -> (Vec<bool>, Vec<Foo>) {
let buffer = Vec::with_capacity(data.len());
let data = data.clone();
(buffer, data)
},
|(mut out_buff, data)| {
let comparisons = data.windows(2).map(|x| {
assert_eq!(x.len(), 2);
x[0] <= x[1]
});
out_buff.extend(comparisons);
(out_buff, data)
},
criterion::BatchSize::LargeInput,
);
});
}
group.finish();
}
criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
Results:
We probably need to run the benchmark suite to be sure this doesn't break other optimizations.
@bors try @rust-timer queue
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf
⌛ Trying commit c7e8066c39472b752863d5f469f6e617b194bf60 with merge 8a60369cea1e3d87175820371a0a7420df3581e3...
@bors try @rust-timer queue
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf
⌛ Trying commit 6b9a89a9c60f0aa7a83501505d72e9bda3d321e4 with merge d9c05af42499911666ab5eeb19f11d4b589532e1...
☀️ Try build successful - checks-actions
Queued d9c05af42499911666ab5eeb19f11d4b589532e1 with parent 88e7862, future comparison URL.
Finished benchmarking try commit (d9c05af42499911666ab5eeb19f11d4b589532e1): comparison url.
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying
Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.
@bors rollup=never
#[no_mangle]
pub fn compare(a: Foo, b: Foo)->bool{
// CHECK-NOT: br {{.*}}
a <= b
Can you add similar tests for the other comparison operators?
Closes #73338
This change stops the default implementation of the `le()` method from generating jumps.
#[no_mangle]
pub fn compare_greater(a: Foo, b: Foo)->bool{
// CHECK-NOT: br {{.*}}
a > b
It's interesting that this did not need a modification to ge -- maybe hints at some LLVM bug?
Well, Option<Ordering> has the following u8 values:
Some(Less) == 255
Some(Equal) == 0
Some(Greater) == 1
None == 2
So >= compiles to cmp_result as u8 < 2, which is easy to optimize.
The old implementation of <= couldn't be optimized as easily because it was (255 == cmp_result || 0 == cmp_result), which LLVM failed to optimize. After my change it becomes !((cmp_result+1) > 1), which optimizes much better.
Possibly the current version of LLVM handles a comparison against two consecutive numbers better than a comparison against two non-adjacent numbers.
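The encoding argument above can be checked by hand. The following is only a sketch: `encode`, `ge`, `le_old`, and `le_new` are hypothetical helpers that model the byte values quoted in the comment, not a layout guarantee of Option<Ordering> and not the code in the standard library. The `+1` is written as `wrapping_add` on the assumption that the arithmetic is meant to wrap in u8, so that 255 becomes 0.

```rust
use std::cmp::Ordering;

// Hypothetical helper: the u8 values quoted in the comment above,
// modelled by hand. This is an assumption, not a layout guarantee.
fn encode(c: Option<Ordering>) -> u8 {
    match c {
        Some(Ordering::Less) => 255,
        Some(Ordering::Equal) => 0,
        Some(Ordering::Greater) => 1,
        None => 2,
    }
}

// `>=` as described: a single unsigned range check.
fn ge(x: u8) -> bool {
    x < 2
}

// Old `<=` as described: two equality tests against 255 and 0.
fn le_old(x: u8) -> bool {
    x == 255 || x == 0
}

// New `<=` as described: adding 1 wraps 255 to 0, so Less and Equal
// become 0 and 1, while Greater and None become 2 and 3.
fn le_new(x: u8) -> bool {
    !(x.wrapping_add(1) > 1)
}

fn main() {
    let cases = [
        Some(Ordering::Less),
        Some(Ordering::Equal),
        Some(Ordering::Greater),
        None,
    ];
    for c in cases {
        let x = encode(c);
        // Both forms of `<=` agree on every encoded value.
        assert_eq!(le_old(x), le_new(x));
        // And both match the intended meaning of `<=` and `>=`.
        assert_eq!(le_new(x), matches!(c, Some(Ordering::Less | Ordering::Equal)));
        assert_eq!(ge(x), matches!(c, Some(Ordering::Equal | Ordering::Greater)));
    }
    println!("all four encodings agree");
}
```

Under this model the two forms of `<=` are logically identical; the difference discussed in the thread is only in how well LLVM lowers each one.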
Makes sense, OK. Thanks!
Any idea what the fallout would be if Ordering was changed from (Less, Equal, Greater) == (-1, 0, 1) to (Less, Equal, Greater) == (0, 1, 2)? I didn't find a guarantee for the underlying values being stable, and impl Ord for Ordering only relies on Less < Equal < Greater.
I don't think I know enough to answer this.
The current representation could be profitable when converting from memcmp results (the compiler can compute sign(memcmp(a, b)) to get an Ordering).
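To illustrate the memcmp point, here is a minimal sketch. `ordering_from_memcmp` is a hypothetical helper name; it assumes a memcmp-style result that is negative, zero, or positive, and it is not how the compiler or the standard library performs the conversion.

```rust
use std::cmp::Ordering;

// Hypothetical helper: map a memcmp-style result (negative, zero,
// positive) to an Ordering by taking its sign.
fn ordering_from_memcmp(r: i32) -> Ordering {
    match r.signum() {
        -1 => Ordering::Less,
        0 => Ordering::Equal,
        _ => Ordering::Greater,
    }
}

fn main() {
    // The discriminants discussed in the thread: (-1, 0, 1).
    assert_eq!(Ordering::Less as i8, -1);
    assert_eq!(Ordering::Equal as i8, 0);
    assert_eq!(Ordering::Greater as i8, 1);

    // With these discriminants, the sign of the memcmp result is
    // numerically equal to the resulting Ordering.
    for r in [-7, 0, 42] {
        assert_eq!(ordering_from_memcmp(r) as i32, r.signum());
    }
    println!("sign of a memcmp-style result matches the Ordering discriminant");
}
```

With a (0, 1, 2) encoding the same conversion would need an extra offset, which is the trade-off the comment hints at.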
@bors r+
📌 Commit ed0d8fa has been approved by
☀️ Test successful - checks-actions
Closes #73338
This change stops the default implementation of the le() method of PartialOrd from generating jumps.
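This page does not quote the diff itself, so the following is only an assumed shape of the change, written as free functions rather than as the trait's default method. `le_old` and `le_new` are hypothetical names: one accepts `Some(Less | Equal)`, the other rejects `None | Some(Greater)`. The sketch checks that the two agree, including for incomparable values such as NaN.

```rust
use std::cmp::Ordering;

// Assumed shape of the earlier default: accept Less or Equal.
fn le_old<T: PartialOrd>(a: &T, b: &T) -> bool {
    matches!(a.partial_cmp(b), Some(Ordering::Less | Ordering::Equal))
}

// Assumed shape after the change: reject None or Greater.
fn le_new<T: PartialOrd>(a: &T, b: &T) -> bool {
    !matches!(a.partial_cmp(b), None | Some(Ordering::Greater))
}

fn main() {
    // NaN is included so that partial_cmp returns None for some pairs.
    let values = [f64::NAN, -1.0, 0.0, 2.5];
    for a in &values {
        for b in &values {
            assert_eq!(le_old(a, b), le_new(a, b));
            assert_eq!(le_new(a, b), a <= b);
        }
    }
    println!("both formulations agree on all pairs");
}
```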