Performance regression in "Rewrite pass management with LLVM" #8890
This is disturbing, thanks for narrowing it down to this! I will investigate this tonight.
@DaGenix: can you try with the
Or maybe I will investigate it now! The threshold for inlining was raised from 225 to 275 during that change, and turning it back down to 225 restores the old performance. I will both revert the numbers for now, and provide a way of overriding this.
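For a hot helper where the inliner's decision matters, one option that does not depend on the global threshold is an explicit inline hint at the definition. The following is only a minimal sketch: `rotate_mix` and `mix_block` are hypothetical names and are not taken from the benchmark in this issue, and `#[inline(always)]` is a strong hint to the compiler rather than a guarantee.

```rust
/// Small helper that a cost-based inliner may or may not inline,
/// depending on the configured threshold. The attribute asks for
/// inlining regardless of that heuristic. (Hypothetical example.)
#[inline(always)]
fn rotate_mix(x: u32, k: u32) -> u32 {
    x.rotate_left(k) ^ x.wrapping_mul(0x9E37_79B9)
}

/// Hot loop that calls the helper once per word. (Hypothetical example.)
fn mix_block(block: &mut [u32; 4]) {
    for (i, w) in block.iter_mut().enumerate() {
        *w = rotate_mix(*w, i as u32 + 1);
    }
}

fn main() {
    let mut block = [1u32, 2, 3, 4];
    mix_block(&mut block);
    println!("{:?}", block);
}
```

Whether this helps in a given case still has to be measured; forcing inlining everywhere can grow code size and hurt performance.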
The only change to the default passes is that O1 now doesn't run the inline pass, just always-inline with lifetime intrinsics. O2 also now has a threshold of 225 instead of 275. Otherwise the default passes being run are the same.

I've also added a few more options for configuring the pass pipeline. Namely, you can now specify arguments to LLVM directly via the `--llvm-args` command line option, which operates similarly to `--passes`. I also added the ability to turn off pre-population of the pass manager in case you want to run *only* your own passes.

I would consider this as closing #8890. I don't think that we should change the default inlining threshold, because LLVM/clang will probably have chosen those numbers more carefully than we would. Regardless, here are the performance numbers from this commit:

```
$ ./x86_64-apple-darwin/stage0/bin/rustc ./gistfile1.rs --test --opt-level=3 -o before
warning: no debug symbols in executable (-arch x86_64)

$ ./before --bench

running 1 test
test bench::aes_bench_x8 ... bench: 1602 ns/iter (+/- 66) = 7990 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured

$ ./x86_64-apple-darwin/stage1/bin/rustc ./gistfile1.rs --test --opt-level=3 -o after
warning: no debug symbols in executable (-arch x86_64)

$ ./after --bench

running 1 test
test bench::aes_bench_x8 ... bench: 2103 ns/iter (+/- 175) = 6086 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured

$ ./x86_64-apple-darwin/stage1/bin/rustc ./gistfile1.rs --test --opt-level=3 -o after --llvm-args '-inline-threshold=225'
warning: no debug symbols in executable (-arch x86_64)

$ ./after --bench

running 1 test
test bench::aes_bench_x8 ... bench: 1600 ns/iter (+/- 71) = 8000 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured
```
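To put the three runs above on one scale, the ns/iter figures can be turned into a relative change against the stage0 build. This is just a small sketch of that arithmetic; the `slowdown_percent` helper is a name made up for illustration, and the inputs are copied from the benchmark output above. It works out to a slowdown of about 31% for the stage1 build, well outside the reported noise, and essentially no difference once `-inline-threshold=225` is passed.

```rust
/// Relative change of `after_ns` versus `before_ns`, as a percentage.
/// Positive means slower, negative means faster. (Illustrative helper.)
fn slowdown_percent(before_ns: f64, after_ns: f64) -> f64 {
    (after_ns / before_ns - 1.0) * 100.0
}

fn main() {
    // ns/iter figures copied from the benchmark output in this thread.
    let stage0 = 1602.0;
    let stage1 = 2103.0;
    let stage1_threshold_225 = 1600.0;

    println!(
        "stage1 vs stage0: {:+.1}%",
        slowdown_percent(stage0, stage1)
    );
    println!(
        "stage1 with -inline-threshold=225 vs stage0: {:+.1}%",
        slowdown_percent(stage0, stage1_threshold_225)
    );
}
```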
The performance at --opt-level=2 is now the same as it was before and the inlining threshold can be set manually.
Please see #8782 for more details since I'm running the same test cases as there.
I'm not quite sure how to check which optimizations were enabled before and which are enabled now at --opt-level 3, so I don't know whether that changed. Maybe this isn't an issue and the answer is just that I need to manually enable some pass now, since LLVM/Clang don't enable that pass at any optimization level by default. This seems like a pretty significant drop in performance, though, so I figured I'd open up an issue for it.