Skip to content

Optimize jumps in PartialOrd le #83819

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 5, 2021
Merged

Optimize jumps in PartialOrd le #83819

merged 1 commit into from
Apr 5, 2021

Conversation

AngelicosPhosphoros
Copy link
Contributor

Closes #73338
This change stops default implementation of le() method of PartialOrd from generating jumps.

@rust-highfive
Copy link
Contributor

r? @Mark-Simulacrum

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 3, 2021
@AngelicosPhosphoros
Copy link
Contributor Author

ASM changes

Codegen difference for code:

#[repr(u32)]
#[derive(Copy, Clone, Eq, PartialEq, PartialOrd)]
pub enum Foo {
    Zero,
    One,
    Two,
}

pub fn compare(a: Foo, b: Foo)->bool{
    a <= b
}

Before:

_ZN8test_cmp7compare17h5c31fd9ed0a26eefE:
	xorl	%eax, %eax
	xorl	%r8d, %r8d
	cmpl	%edx, %ecx
	setne	%r8b
	movq	$-1, %rcx
	cmovaeq	%r8, %rcx
	movl	$0, %edx
	cmovneq	%rcx, %rdx
	addq	$1, %rdx
	cmpq	$1, %rdx
	ja	.LBB0_2
	movb	$1, %al
.LBB0_2:
	retq

Now:

_ZN8test_cmp7compare17h5c31fd9ed0a26eefE:
	cmpl	%edx, %ecx
	setbe	%al
	retq

Benchmark results

Code:

use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};

#[repr(u32)]
#[derive(Copy, Clone, Eq, PartialEq, PartialOrd)]
enum Foo {
    Zero = 0,
    One = 1,
    Two = 2,
}

struct Data {
    sorted: Vec<Foo>,
    unordered: Vec<Foo>,
}

fn generate_data(n: usize) -> Data {
    use rand::prelude::{Rng, SeedableRng};
    use rand_chacha::ChaCha8Rng;
    let mut rng = ChaCha8Rng::seed_from_u64(6464u64);
    let distribution = rand::distributions::Uniform::new(0, 3);
    let mut data = Vec::with_capacity(n);
    for _ in 0..n {
        let num: usize = rng.sample(distribution);
        data.push(num);
    }
    fn convert(num: usize) -> Foo {
        [Foo::Zero, Foo::One, Foo::Two][num]
    }
    let mut sorted = data.clone();
    sorted.sort_unstable();

    Data {
        sorted: sorted.into_iter().map(convert).collect(),
        unordered: data.into_iter().map(convert).collect(),
    }
}

pub fn criterion_benchmark(c: &mut Criterion) {
    let Data { sorted, unordered } = generate_data(1000);

    let mut group = c.benchmark_group("cmp");

    let pairs = [("Sorted", sorted), ("Unordered", unordered)];
    for (name, data) in pairs.iter().cloned() {
        group.bench_with_input(BenchmarkId::new("comparisons", name), &data, |b, data| {
            b.iter_batched(
                || -> (Vec<bool>, Vec<Foo>) {
                    let buffer = Vec::with_capacity(data.len());
                    let data = data.clone();
                    (buffer, data)
                },
                |(mut out_buff, data)| {
                    let comparisons = data.windows(2).map(|x| {
                        assert_eq!(x.len(), 2);
                        x[0] <= x[1]
                    });
                    out_buff.extend(comparisons);
                    (out_buff, data)
                },
                criterion::BatchSize::LargeInput,
            );
        });
    }

    group.finish();
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);

Results:

cmp/comparisons/Sorted  time:   [520.30 ns 520.94 ns 521.63 ns]
                        change: [-47.523% -47.229% -46.966%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe
cmp/comparisons/Unordered
                        time:   [517.17 ns 518.03 ns 518.94 ns]
                        change: [-52.738% -52.573% -52.410%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

@AngelicosPhosphoros
Copy link
Contributor Author

Probably need to run benchmarks suite for this to be sure that it doesn't break other optimizations.

@jonas-schievink
Copy link
Contributor

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 3, 2021
@bors
Copy link
Collaborator

bors commented Apr 3, 2021

⌛ Trying commit c7e8066c39472b752863d5f469f6e617b194bf60 with merge 8a60369cea1e3d87175820371a0a7420df3581e3...

@rust-log-analyzer

This comment has been minimized.

@cjgillot
Copy link
Contributor

cjgillot commented Apr 4, 2021

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@bors
Copy link
Collaborator

bors commented Apr 4, 2021

⌛ Trying commit 6b9a89a9c60f0aa7a83501505d72e9bda3d321e4 with merge d9c05af42499911666ab5eeb19f11d4b589532e1...

@bors
Copy link
Collaborator

bors commented Apr 4, 2021

☀️ Try build successful - checks-actions
Build commit: d9c05af42499911666ab5eeb19f11d4b589532e1 (d9c05af42499911666ab5eeb19f11d4b589532e1)

@rust-timer
Copy link
Collaborator

Queued d9c05af42499911666ab5eeb19f11d4b589532e1 with parent 88e7862, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (d9c05af42499911666ab5eeb19f11d4b589532e1): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 4, 2021
#[no_mangle]
pub fn compare(a: Foo, b: Foo)->bool{
// CHECK-NOT: br {{.*}}
a <= b
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add similar tests for the other comparison operators?

Closes #73338
This change stops default implementation of `le()` method from generating jumps.
#[no_mangle]
pub fn compare_greater(a: Foo, b: Foo)->bool{
// CHECK-NOT: br {{.*}}
a > b
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's interesting that this did not need a modification to ge -- maybe hints at some LLVM bug?

Copy link
Contributor Author

@AngelicosPhosphoros AngelicosPhosphoros Apr 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the Option<Ordering> has such u8 values

Some(Less) == 255
Some(Equal) == 0
Some(Greater) == 1
None == 2

so, >= compiles to cmp_result as u8 < 2 which easy to optimize.

Old implementation of <= couldn't be optimized such easily because it was (255 == cmp_result || 0 == cmp_result) which LLVM failed to optimize. After my change it becomes !((cmp_result+1) > 1) which optimized much better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly, current version of LLVM handle comparison with 2 consequtive numbers better than with 2 different numbers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, OK. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea what the fallout would be if Ordering was changed from (Less, Equal, Greater) == (-1, 0, 1) to (Less, Equal, Greater) == (0, 1, 2)? I didn't find a guarantee for the underlying values being stable and impl Ord for Ordering only relies on Less < Equal < Greater.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know enough for this, I think.

Current solution could be profitable in conversion from memcmp results (compiler can do sign(memcmp(a, b)) to get Ordering).

@Mark-Simulacrum
Copy link
Member

@bors r+

@bors
Copy link
Collaborator

bors commented Apr 4, 2021

📌 Commit ed0d8fa has been approved by Mark-Simulacrum

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 4, 2021
@bors
Copy link
Collaborator

bors commented Apr 5, 2021

⌛ Testing commit ed0d8fa with merge b1b0a15...

@bors
Copy link
Collaborator

bors commented Apr 5, 2021

☀️ Test successful - checks-actions
Approved by: Mark-Simulacrum
Pushing b1b0a15 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Apr 5, 2021
@bors bors merged commit b1b0a15 into rust-lang:master Apr 5, 2021
@rustbot rustbot added this to the 1.53.0 milestone Apr 5, 2021
@AngelicosPhosphoros AngelicosPhosphoros deleted the issue-73338-fix-partial-eq-impl branch April 5, 2021 16:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PartialOrd derived for C-like enum isn't properly optimized for the <= operator
10 participants