
Performance regression in "Rewrite pass management with LLVM" #8890


Closed
DaGenix opened this issue Aug 30, 2013 · 4 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@DaGenix

DaGenix commented Aug 30, 2013

Please see #8782 for more details since I'm running the same test cases as there.

| Revision | Speed of test.rc (MB/s) |
|---|---|
| 6a649e6 (parent revision) | 80.45 |
| 7354055 | 63.77 |
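For scale, the size of the regression can be computed directly from the two measurements. This is just a small arithmetic sketch; `percent_drop` is a hypothetical helper name, not part of any Rust API.

```rust
/// Relative throughput loss, in percent, going from `before` to `after`.
fn percent_drop(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

fn main() {
    // Throughput of test.rc in MB/s, as reported for each revision.
    let parent = 80.45; // 6a649e6
    let regressed = 63.77; // 7354055
    println!("throughput drop: {:.1}%", percent_drop(parent, regressed));
}
```

That works out to roughly a 20.7% loss in throughput.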

I'm not quite sure how to check which optimizations were enabled at --opt-level 3 before this change and which are enabled now, so I don't know whether that changed. Maybe this isn't an issue, and the answer is simply that I now need to enable some pass manually because LLVM/Clang don't enable it by default at any optimization level. This seems like a pretty significant drop in performance, though, so I figured I'd open an issue for it.

@alexcrichton
Member

This is disturbing, thanks for narrowing it down to this! I will investigate this tonight.

@thestinger
Contributor

@DaGenix: can you try with the -Z flag to disable SLP vectorization?

@alexcrichton
Member

Or maybe I will investigate it now!

The threshold for inlining was raised from 225 to 275 during that change, and turning it back down to 225 restores the old performance.

I will both revert the numbers for now, and provide a way of overriding this.

bors added a commit that referenced this issue Aug 31, 2013
The only changes to the default passes are that O1 no longer runs the inline
pass, just always-inline with lifetime intrinsics, and that O2 now has a threshold
of 225 instead of 275. Otherwise the default passes being run are the same.

I've also added a few more options for configuring the pass pipeline. Namely you
can now specify arguments to LLVM directly via the `--llvm-args` command line
option which operates similarly to `--passes`. I also added the ability to turn
off pre-population of the pass manager in case you want to run *only* your own
passes.
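The default inlining behavior described in this commit can be summarized as a mapping from optimization level to inliner configuration. The sketch below is hypothetical: the `Inlining` enum and `inlining_for` function are illustrative names, not rustc's actual code. The O1 and O2 entries come from the commit message; the O0 entry (no inliner) and the O3 threshold of 275 are assumptions, the latter consistent with the O3 benchmark below only recovering once `-inline-threshold=225` is passed explicitly. The override models `--llvm-args '-inline-threshold=N'` in a simplified way, assuming it only changes the threshold when the full inliner actually runs.

```rust
/// Inliner configuration for one optimization level (hypothetical sketch).
#[derive(Debug, PartialEq)]
enum Inlining {
    /// No inlining pass at all (assumed for O0).
    None,
    /// Only always-inline (plus lifetime intrinsics), as O1 does after the change.
    AlwaysInlineOnly,
    /// Full inliner with the given cost threshold.
    Threshold(u32),
}

fn inlining_for(opt_level: u8, override_threshold: Option<u32>) -> Inlining {
    let base = match opt_level {
        0 => Inlining::None,
        1 => Inlining::AlwaysInlineOnly,
        2 => Inlining::Threshold(225),
        // Assumed: O3 keeps the higher threshold.
        _ => Inlining::Threshold(275),
    };
    // Assumed: an explicit threshold only matters when the full inliner runs.
    match (base, override_threshold) {
        (Inlining::Threshold(_), Some(t)) => Inlining::Threshold(t),
        (b, _) => b,
    }
}

fn main() {
    for level in 0..=3u8 {
        println!("O{}: {:?}", level, inlining_for(level, None));
    }
    println!("O3 with override: {:?}", inlining_for(3, Some(225)));
}
```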

I would consider this as closing #8890. I don't think we should change the default inlining threshold, because LLVM/clang have probably chosen those numbers more carefully than we would. Regardless, here are the performance numbers from this commit:

```
$ ./x86_64-apple-darwin/stage0/bin/rustc ./gistfile1.rs --test --opt-level=3 -o before
warning: no debug symbols in executable (-arch x86_64)
$ ./before --bench

running 1 test
test bench::aes_bench_x8 ... bench: 1602 ns/iter (+/- 66) = 7990 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured

$ ./x86_64-apple-darwin/stage1/bin/rustc ./gistfile1.rs --test --opt-level=3 -o after
warning: no debug symbols in executable (-arch x86_64)
$ ./after --bench

running 1 test
test bench::aes_bench_x8 ... bench: 2103 ns/iter (+/- 175) = 6086 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured

$ ./x86_64-apple-darwin/stage1/bin/rustc ./gistfile1.rs --test --opt-level=3 -o after --llvm-args '-inline-threshold=225'
warning: no debug symbols in executable (-arch x86_64)
$ ./after --bench

running 1 test
test bench::aes_bench_x8 ... bench: 1600 ns/iter (+/- 71) = 8000 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured

```
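The MB/s figures in the benchmark output are derived from ns/iter and the number of bytes processed per iteration. The sketch below assumes the harness computes integer MB/s (1 MB = 10^6 bytes) as `bytes * 1000 / ns_per_iter`; the 12,800 bytes per iteration is back-computed from the output, not stated in the thread. `mb_per_s` is a hypothetical helper.

```rust
/// Throughput in MB/s (1 MB = 10^6 bytes), truncated to an integer.
/// bytes / ns = GB/s, so multiplying by 1000 gives MB/s.
fn mb_per_s(bytes_per_iter: u64, ns_per_iter: u64) -> u64 {
    bytes_per_iter * 1000 / ns_per_iter
}

fn main() {
    // Assumed: 12_800 bytes per iteration, back-computed from the output.
    let bytes = 12_800;
    for (label, ns) in [("before", 1602), ("after", 2103), ("after, threshold 225", 1600)] {
        println!("{}: {} MB/s", label, mb_per_s(bytes, ns));
    }
}
```

Under that assumption, all three reported figures (7990, 6086 and 8000 MB/s) are reproduced exactly. The "after" build takes about 31% more time per iteration (2103 vs. 1602 ns) than the "before" build.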
@thestinger
Contributor

The performance at --opt-level=2 is now the same as it was before and the inlining threshold can be set manually.
