-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[CodeGen][MachinePipeliner] Limit register pressure when scheduling #74807
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c9f0fbe
to
3a22d00
Compare
@ceseo Could you please check? |
ping |
@kasuga-fj could you please check the failing check? It seems to be crashing the MLIR tests on Windows. |
@luporl Could you please help with this review? |
3a22d00
to
7933686
Compare
Thank you for your reply! The failure is now resolved (just rebased) |
Yes, I can take a look at the changes, but please note that I'm not familiar with this part of llvm. |
ping |
Hi. I plan to take a look this later this week / early next week. |
Do you have any benchmark numbers showing this performance degradation and how this patch improves it? |
A few suggestions to improve the description text:
how many variables can actually be alive at the same time.
As a result, a lot of register spills/fills can be generated after register allocation, ...
So, does it mean that, when What about |
IIUIC, by following the discourse thread and checking the results at #65609 (comment), this patch reduces the number of cycles needed to execute the loop of https://github.com/AMReX-Codes/amrex/blob/9e35dc19489dc5d312e92781cb0471d282cf8370/Src/LinearSolvers/MLMG/AMReX_MLNodeLap_2D_K.H#L584, with some modifications. In this test code, with this patch applied, II would be changed from 11 to 20, to avoid spills/fills, which results in the number of cycles per iteration going down from 29.3 to 16.5, without MVE (#65609), and from 19.6 to 15.7 with MVE. Is that correct? It would be nice to add a short version of the results to the description of this patch, to give an idea of the performance improvement, without the need to go through discourse and the MVE patch. Also, it would be nice if the modifications made to the test code could be made public, so that others can try to reproduce the results. Are there any improvements with the unmodified version too? Finally, it would help if you could try this patch with other benchmarks, like SPEC CPU 2017, if it's not too much work, to check how it impacts the performance of other workloads. This will be important when considering enabling |
Thank you for your reply!. Here are the answers to your questions (the description is also updated).
Yes. We'd like to.
As you said,
You are right. Sorry for going out of your way to find it.
My colleague @ytmukai are working to publish it. I believe that it will be published soon.
The improvement without modification has not been confirmed because the analysis required for the pipeliner doesn't work well and MachinePipeliner cannot be applied. We recognize that this is an issue and would like to resolve.
For the same reason as above, we've not been able to check the performance with other benchmarks. Please let this be a future work. |
✅ With the latest revision this PR passed the C/C++ code formatter. |
Thanks, I missed that.
I'd add a benchmark to the LLVM test-suite. It would make it easier to catch any future performance regressions. |
Thanks for all the answers and improvements!
So is the issue caused by some loops failing to match MachinePipeliner's expectations and then being skipped by it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but it would be better if someone who knows this part better could take a look.
# CHECK: Rejecte the schedule because of too high register pressure | ||
# CHECK: Try to schedule with 24 | ||
# CHECK: Rejecte the schedule because of too high register pressure | ||
# CHECK: Try to schedule with 25 | ||
# CHECK: Rejecte the schedule because of too high register pressure | ||
# CHECK: Try to schedule with 26 | ||
# CHECK: Rejecte the schedule because of too high register pressure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These messages need to be updated, to fix the CI error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry, I forgot. Thanks!
Thank you for your review!
Yes. However, we are working on resolving this issue and have recently been able to increase the number of programs to which MachinePipeliner can be applied. We've not yet confirmed the details of the results, but we may be able to present improvements in other benchmarks in the near future. Once we have the results, I will ask someone familiar with this part to review the patch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi. I tried to give this a try on some Arm tests with it enabled, and seemed to it look OK. A few things got a little better and nothing seemed to break, which is a good sign.
I read through the code and it LGTM.
Thanks for checking. |
31b4b50
to
e1a964a
Compare
In software pipelining, when searching for the Initiation Interval (II), `MachinePipeliner` tries to reduce register pressure, but doesn't check how many variables can actually alive at the same time. This can result a lot of register spills/fills can be generated after register allocation, which might cause performance degradation. To prevent such cases, this patch adds a check phase that calculates the maximum register pressure of the scheduled loop and reject it if the pressure is too high. This can be enabled this by specifying `pipeliner-register-pressure`. Additionally, an II search range is currently fixed at 10, which is too small to find a schedule when the above algorithm is applied. Threfore this patch also adds a new option `pipeliner-ii-search-range` to specify the length of the range to search. There is one more new option `pipeliner-register-pressure-margin`, which can be used to estimate a register pressure limit less than actual for conservative analysis. Discourse thread: https://discourse.llvm.org/t/considering-register-pressure-when-deciding-initiation-interval-in-machinepipeliner/74725
e1a964a
to
012e158
Compare
LLVM_DEBUG({ | ||
for (auto Reg : FixedRegs) { | ||
dbgs() << printReg(Reg, TRI, 0, &MRI) << ": ["; | ||
const int *Sets = TRI->getRegUnitPressureSets(Reg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to dig up this old review. This is passing a physical register to an interface that expects a register unit which is a different encoding space. MCRegUnitIterator
needs to be used to translate from physical to register unit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see, I didn't know that. Thanks for finding it.
In software pipelining, when searching for the Initiation Interval (II),
MachinePipeliner
tries to reduce register pressure, but doesn't check how many variables can actually be alive at the same time. As a result, a lot of register spills/fills can be generated after register allocation, which might cause performance degradation. To prevent such cases, this patch adds a check phase that calculates the maximum register pressure of the scheduled loop and reject it if the pressure is too high. This can be enabled this by specifyingpipeliner-register-pressure
. Additionally, an II search range is currently fixed at 10, which is too small to find a schedule when the above algorithm is applied. Therefore this patch also adds a new optionpipeliner-ii-search-range
to specify the length of the range to search. There is one more new optionpipeliner-register-pressure-margin
, which can be used to estimate a register pressure limit less than actual for conservative analysis.Discourse thread: https://discourse.llvm.org/t/considering-register-pressure-when-deciding-initiation-interval-in-machinepipeliner/74725
Note that this patch provides only minimal functionality and is disabled by default. We'd like to do the following in the future:
MachinePipeliner
for AArch64. When applying that patch, we'd like to enablepipeliner-register-pressure
in AArch64.MachinePipeliner
by default on AArch64. That is, whenMachinePipeliner
is enabled in AArch64pipeliner-register-pressuer
will be enabled by default.pipeliner-register-pressure-margin
will remain available for fine tuning.MachinePipeliner
, for example, PowerPC. We are not going to determine whether or not to enable the functionality of this patch on those architectures.MachinePipeliner
, for example, improved II search method.pipeliner-ii-search-range
) without compile time degradation.EDIT: Here is a brief summary of the performance improvement we observed when applying this patch. It's measured by using the loop based on https://github.com/AMReX-Codes/amrex/blob/9e35dc19489dc5d312e92781cb0471d282cf8370/Src/LinearSolvers/MLMG/AMReX_MLNodeLap_2D_K.H#L584 with some modification. Please see #65609 (comment) for more details.