Optimize jumps in PartialOrd le #83819

AngelicosPhosphoros · 2021-04-03T18:02:32Z

Closes #73338
This change stops the default implementation of the le() method of PartialOrd from generating jumps.
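
As a rough illustration of the kind of rewrite involved, the sketch below compares two ways of deriving `<=` from `partial_cmp`: accepting `Less` and `Equal`, or rejecting `None` and `Greater`. The free functions `le_old` and `le_new` are hypothetical names for this example and are not the library source; the point is only that both formulations give the same answer for every input, including incomparable values such as NaN.

```rust
use std::cmp::Ordering;

// Hypothetical stand-in: `<=` written as "the result is Less or Equal".
fn le_old<T: PartialOrd>(a: &T, b: &T) -> bool {
    matches!(a.partial_cmp(b), Some(Ordering::Less) | Some(Ordering::Equal))
}

// Hypothetical stand-in: `<=` written as "the result is neither None nor Greater".
fn le_new<T: PartialOrd>(a: &T, b: &T) -> bool {
    !matches!(a.partial_cmp(b), None | Some(Ordering::Greater))
}

fn main() {
    // NaN is included so that the `None` (incomparable) case is exercised.
    let values = [-1.0_f64, 0.0, 1.0, f64::NAN];
    for a in &values {
        for b in &values {
            assert_eq!(le_old(a, b), le_new(a, b));
            assert_eq!(le_new(a, b), a <= b);
        }
    }
    println!("both formulations agree on all pairs");
}
```

Because the two functions are semantically identical, any difference in the generated code comes from how the optimizer handles each pattern, not from a change in behavior.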

rust-highfive · 2021-04-03T18:02:35Z

(rust-highfive has picked a reviewer for you, use r? to override)

AngelicosPhosphoros · 2021-04-03T18:05:03Z

ASM changes

Codegen difference for code:

#[repr(u32)]
#[derive(Copy, Clone, Eq, PartialEq, PartialOrd)]
pub enum Foo {
    Zero,
    One,
    Two,
}

pub fn compare(a: Foo, b: Foo) -> bool {
    a <= b
}

Before:

_ZN8test_cmp7compare17h5c31fd9ed0a26eefE:
	xorl	%eax, %eax
	xorl	%r8d, %r8d
	cmpl	%edx, %ecx
	setne	%r8b
	movq	$-1, %rcx
	cmovaeq	%r8, %rcx
	movl	$0, %edx
	cmovneq	%rcx, %rdx
	addq	$1, %rdx
	cmpq	$1, %rdx
	ja	.LBB0_2
	movb	$1, %al
.LBB0_2:
	retq

Now:

_ZN8test_cmp7compare17h5c31fd9ed0a26eefE:
	cmpl	%edx, %ecx
	setbe	%al
	retq
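
The shorter listing is a single unsigned comparison (`setbe` sets the result on "below or equal"). A minimal sketch of the same idea at the source level follows, assuming the derived ordering of a fieldless enum follows its discriminants; `compare_discriminants` is a hypothetical helper added for this illustration only.

```rust
#[repr(u32)]
#[derive(Copy, Clone, Eq, PartialEq, PartialOrd)]
pub enum Foo {
    Zero,
    One,
    Two,
}

// The function from the example above: goes through PartialOrd::le.
pub fn compare(a: Foo, b: Foo) -> bool {
    a <= b
}

// Hypothetical reference: compare the u32 discriminants directly.
fn compare_discriminants(a: Foo, b: Foo) -> bool {
    (a as u32) <= (b as u32)
}

fn main() {
    let all = [Foo::Zero, Foo::One, Foo::Two];
    // Exhaustively check every pair of variants.
    for &a in &all {
        for &b in &all {
            assert_eq!(compare(a, b), compare_discriminants(a, b));
        }
    }
    println!("derived <= matches discriminant comparison");
}
```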

Benchmark results

Code:

use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};

#[repr(u32)]
#[derive(Copy, Clone, Eq, PartialEq, PartialOrd)]
enum Foo {
    Zero = 0,
    One = 1,
    Two = 2,
}

struct Data {
    sorted: Vec<Foo>,
    unordered: Vec<Foo>,
}

fn generate_data(n: usize) -> Data {
    use rand::prelude::{Rng, SeedableRng};
    use rand_chacha::ChaCha8Rng;
    let mut rng = ChaCha8Rng::seed_from_u64(6464u64);
    let distribution = rand::distributions::Uniform::new(0, 3);
    let mut data = Vec::with_capacity(n);
    for _ in 0..n {
        let num: usize = rng.sample(distribution);
        data.push(num);
    }
    fn convert(num: usize) -> Foo {
        [Foo::Zero, Foo::One, Foo::Two][num]
    }
    let mut sorted = data.clone();
    sorted.sort_unstable();

    Data {
        sorted: sorted.into_iter().map(convert).collect(),
        unordered: data.into_iter().map(convert).collect(),
    }
}

pub fn criterion_benchmark(c: &mut Criterion) {
    let Data { sorted, unordered } = generate_data(1000);

    let mut group = c.benchmark_group("cmp");

    let pairs = [("Sorted", sorted), ("Unordered", unordered)];
    for (name, data) in pairs.iter().cloned() {
        group.bench_with_input(BenchmarkId::new("comparisons", name), &data, |b, data| {
            b.iter_batched(
                || -> (Vec<bool>, Vec<Foo>) {
                    let buffer = Vec::with_capacity(data.len());
                    let data = data.clone();
                    (buffer, data)
                },
                |(mut out_buff, data)| {
                    let comparisons = data.windows(2).map(|x| {
                        assert_eq!(x.len(), 2);
                        x[0] <= x[1]
                    });
                    out_buff.extend(comparisons);
                    (out_buff, data)
                },
                criterion::BatchSize::LargeInput,
            );
        });
    }

    group.finish();
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);

Results:

cmp/comparisons/Sorted  time:   [520.30 ns 520.94 ns 521.63 ns]
                        change: [-47.523% -47.229% -46.966%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe
cmp/comparisons/Unordered
                        time:   [517.17 ns 518.03 ns 518.94 ns]
                        change: [-52.738% -52.573% -52.410%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
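
To put the percentages in absolute terms, the previous timings can be estimated from the figures above. This is a back-of-the-envelope sketch that assumes the reported change is computed as (new - old) / old; `implied_baseline_ns` is a hypothetical helper, and the inputs are the middle estimates from the output.

```rust
// Assumption: change_percent = (new - old) / old * 100, so old = new / (1 + change).
fn implied_baseline_ns(new_ns: f64, change_percent: f64) -> f64 {
    new_ns / (1.0 + change_percent / 100.0)
}

fn main() {
    // Middle estimates taken from the results above.
    let sorted = implied_baseline_ns(520.94, -47.229);
    let unordered = implied_baseline_ns(518.03, -52.573);
    println!("implied previous time, sorted:    {:.0} ns", sorted);
    println!("implied previous time, unordered: {:.0} ns", unordered);
}
```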

AngelicosPhosphoros · 2021-04-03T18:06:29Z

We probably need to run the benchmark suite for this, to be sure that it doesn't break other optimizations.

jonas-schievink · 2021-04-03T18:08:23Z

@bors try @rust-timer queue

rust-timer · 2021-04-03T18:08:24Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2021-04-03T18:08:30Z

⌛ Trying commit c7e8066c39472b752863d5f469f6e617b194bf60 with merge 8a60369cea1e3d87175820371a0a7420df3581e3...

cjgillot · 2021-04-04T10:20:54Z

@bors try @rust-timer queue

rust-timer · 2021-04-04T10:20:56Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2021-04-04T10:21:02Z

⌛ Trying commit 6b9a89a9c60f0aa7a83501505d72e9bda3d321e4 with merge d9c05af42499911666ab5eeb19f11d4b589532e1...

bors · 2021-04-04T11:11:18Z

☀️ Try build successful - checks-actions
Build commit: d9c05af42499911666ab5eeb19f11d4b589532e1 (d9c05af42499911666ab5eeb19f11d4b589532e1)

rust-timer · 2021-04-04T11:11:20Z

Queued d9c05af42499911666ab5eeb19f11d4b589532e1 with parent 88e7862, future comparison URL.

rust-timer · 2021-04-04T16:10:41Z

Finished benchmarking try commit (d9c05af42499911666ab5eeb19f11d4b589532e1): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf

cjgillot · 2021-04-04T16:38:08Z

src/test/codegen/issue-73338-effecient-le.rs

+#[no_mangle]
+pub fn compare(a: Foo, b: Foo)->bool{
+    // CHECK-NOT: br {{.*}}
+    a <= b


Can you add similar tests for the other comparison operators?
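
Alongside codegen tests, a behavioral check across all four operators can guard against a rewrite changing results. The sketch below is not a FileCheck test like the one in this PR; it is a plain runtime check under the assumption of a hypothetical `Wrapper` type that implements only `partial_cmp`, so that `<`, `<=`, `>`, and `>=` all use the default method bodies.

```rust
use std::cmp::Ordering;

// Hypothetical type: only partial_cmp is provided, the operators use the defaults.
#[derive(Clone, Copy, PartialEq)]
struct Wrapper(f64);

impl PartialOrd for Wrapper {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        self.0.partial_cmp(&other.0)
    }
}

// Returns true when every operator agrees with the partial_cmp result.
fn operators_consistent(a: Wrapper, b: Wrapper) -> bool {
    let r = a.partial_cmp(&b);
    (a < b) == (r == Some(Ordering::Less))
        && (a <= b) == matches!(r, Some(Ordering::Less) | Some(Ordering::Equal))
        && (a > b) == (r == Some(Ordering::Greater))
        && (a >= b) == matches!(r, Some(Ordering::Greater) | Some(Ordering::Equal))
}

fn main() {
    // NaN makes some pairs incomparable, which exercises the None case.
    let values = [Wrapper(-1.0), Wrapper(0.0), Wrapper(1.0), Wrapper(f64::NAN)];
    for &a in &values {
        for &b in &values {
            assert!(operators_consistent(a, b));
        }
    }
    println!("all operators agree with partial_cmp");
}
```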

Mark-Simulacrum · 2021-04-04T17:55:13Z

src/test/codegen/issue-73338-effecient-cmp.rs

+#[no_mangle]
+pub fn compare_greater(a: Foo, b: Foo)->bool{
+    // CHECK-NOT: br {{.*}}
+    a > b


It's interesting that this did not need a modification to ge -- maybe hints at some LLVM bug?

AngelicosPhosphoros · 2021-04-04

Well, Option<Ordering> has these u8 values:

Some(Less) == 255
Some(Equal) == 0
Some(Greater) == 1
None == 2

So >= compiles to cmp_result as u8 < 2, which is easy to optimize.

The old implementation of <= couldn't be optimized as easily because it was (255 == cmp_result || 0 == cmp_result), which LLVM failed to optimize. After my change it becomes !((cmp_result+1) > 1), which is optimized much better.
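
The arithmetic in this comment can be checked on all four encoded values. The sketch below models the encoding explicitly rather than inspecting memory: the `None == 2` value is taken from the comment above and treated as an assumption about the current layout, and `encode`, `ge`, `le_old`, and `le_new` are hypothetical names for this illustration.

```rust
use std::cmp::Ordering;

// Encoding as quoted in the discussion. Ordering's own discriminants are
// -1, 0, 1; the value 2 for None is an assumption modeled by hand here.
fn encode(r: Option<Ordering>) -> u8 {
    match r {
        Some(o) => o as i8 as u8, // Less: -1 becomes 255, Equal: 0, Greater: 1
        None => 2,
    }
}

// `>=` as described: a single range check.
fn ge(c: u8) -> bool {
    c < 2
}

// Old `<=` as described: two separate equality tests.
fn le_old(c: u8) -> bool {
    c == 255 || c == 0
}

// New `<=` as described: shift by one (wrapping), then a single comparison.
fn le_new(c: u8) -> bool {
    !(c.wrapping_add(1) > 1)
}

fn main() {
    let cases = [
        Some(Ordering::Less),
        Some(Ordering::Equal),
        Some(Ordering::Greater),
        None,
    ];
    for r in cases {
        let c = encode(r);
        let expect_le = matches!(r, Some(Ordering::Less) | Some(Ordering::Equal));
        let expect_ge = matches!(r, Some(Ordering::Greater) | Some(Ordering::Equal));
        assert_eq!(le_old(c), expect_le);
        assert_eq!(le_new(c), expect_le);
        assert_eq!(ge(c), expect_ge);
    }
    println!("all encoded predicates match");
}
```

The wrapping addition is what turns the pair {255, 0} into the contiguous range {0, 1}, so the test collapses to one comparison.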

Possibly, the current version of LLVM handles a comparison against two consecutive numbers better than a comparison against two arbitrary different numbers.

Mark-Simulacrum · 2021-04-04

Makes sense, OK. Thanks!

LingMan · 2021-04-04

Any idea what the fallout would be if Ordering was changed from (Less, Equal, Greater) == (-1, 0, 1) to (Less, Equal, Greater) == (0, 1, 2)? I didn't find a guarantee for the underlying values being stable and impl Ord for Ordering only relies on Less < Equal < Greater.

AngelicosPhosphoros · 2021-04-04

I don't think I know enough to answer this.

The current encoding could be profitable when converting from memcmp results (the compiler can compute sign(memcmp(a, b)) to get an Ordering).
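
A small sketch of that last point: with `Less`, `Equal`, and `Greater` encoded as -1, 0, and 1, the sign of a memcmp-style result is numerically the same as the `Ordering` discriminant. The helper `ordering_from_memcmp` is hypothetical, and `raw` stands in for whatever a three-way byte comparison returned.

```rust
use std::cmp::Ordering;

// Hypothetical helper: map a memcmp-style result (negative, zero, positive)
// to an Ordering by comparing it against zero.
fn ordering_from_memcmp(raw: i32) -> Ordering {
    raw.cmp(&0)
}

fn main() {
    for raw in [-37, -1, 0, 1, 42] {
        let ord = ordering_from_memcmp(raw);
        // The discriminant of the Ordering equals the sign of the raw result.
        assert_eq!(ord as i8 as i32, raw.signum());
    }
    println!("sign of the raw result matches the Ordering discriminant");
}
```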

Mark-Simulacrum · 2021-04-04T18:19:12Z

@bors r+

bors · 2021-04-04T18:19:13Z

📌 Commit ed0d8fa has been approved by Mark-Simulacrum

bors · 2021-04-05T03:55:13Z

⌛ Testing commit ed0d8fa with merge b1b0a15...

bors · 2021-04-05T06:21:09Z

☀️ Test successful - checks-actions
Approved by: Mark-Simulacrum
Pushing b1b0a15 to master...

rust-highfive assigned Mark-Simulacrum Apr 3, 2021

rust-highfive added the S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties) label Apr 3, 2021

rustbot added the S-waiting-on-perf (Status: Waiting on a perf run to be completed) label Apr 3, 2021

rustbot removed the S-waiting-on-perf (Status: Waiting on a perf run to be completed) label Apr 4, 2021

cjgillot reviewed Apr 4, 2021

Optimize PartialOrd le

ed0d8fa

Closes #73338
This change stops default implementation of `le()` method from generating jumps.

AngelicosPhosphoros requested a review from cjgillot April 4, 2021 17:38

Mark-Simulacrum reviewed Apr 4, 2021

bors added S-waiting-on-bors (Status: Waiting on bors to run and complete tests. Bors will change the label on completion) and removed S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties) labels Apr 4, 2021

bors added the merged-by-bors (This PR was explicitly merged by bors) label Apr 5, 2021

bors merged commit b1b0a15 into rust-lang:master Apr 5, 2021

rustbot added this to the 1.53.0 milestone Apr 5, 2021

AngelicosPhosphoros deleted the issue-73338-fix-partial-eq-impl branch April 5, 2021 16:00
