Potential Miscompiles with RegUnits-based MachineLICM liveness calculation #96146
@llvm/issue-subscribers-backend-aarch64 Author: Pierre van Houtryve (Pierre-vh)
Preserved masks should also really be in terms of regunits, not registers.
This wasn't just AArch64; I saw this affect X86 as well. On Windows, XMM6-XMM15 are callee-saved, but #95746 resulted in YMM9 being moved out of a loop that contained function calls. The XMM9 part of YMM9 is callee-saved, but the rest of YMM9 is not. #95926 appears to work around / fix this for X86 as well, thanks for the quick action. Adding a comment to make sure this is known so it can be taken into account for follow-up work.
FWIW, I tried adding an extra regunit in the AArch64 case, but it causes big changes to codegen. I think it changes regalloc somehow; perhaps the allocation order changes? I will likely need the help of some people really familiar with the LLVM RegAlloc infra to make it work.
The allocation order is explicit and per register class. The order shouldn't have changed from adding a new unit.
Reverts the behavior introduced by 770393b while keeping the refactored code. Fixes a miscompile on AArch64, at the cost of a small regression on AMDGPU. llvm#96146 opened to investigate the issue
After #94608 and #95746, some code can be miscompiled on AArch64 because Qn and Dn registers both have only Bn registers as their regunits, and nothing else.
This means that when a regmask marks Dn as being preserved across a call, Qn is also considered preserved if we analyze liveness using register units. It's actually not preserved, and that is the source of the miscompile.
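To make the mismatch concrete, here is a minimal sketch of the regunit view versus what the call actually preserves. It is a toy model under stated assumptions: the `REG_UNITS` and `REG_WIDTH_BITS` tables and both helper functions are hypothetical, hand-written for one register family (B8/D8/Q8), and are not LLVM APIs or generated from the AArch64 TableGen files.

```python
# Toy model only: these tables are hand-written for illustration and are
# not generated from LLVM's AArch64 register definitions.
REG_UNITS = {
    "B8": {"B8"},  # B8 is its own (only) regunit
    "D8": {"B8"},  # D8 has no unit for bits 8..63
    "Q8": {"B8"},  # Q8 has no unit for bits 64..127 either
}
REG_WIDTH_BITS = {"B8": 8, "D8": 64, "Q8": 128}


def preserved_by_units(reg, preserved_regs):
    """Regunit view: reg survives if every one of its units is covered."""
    covered = set()
    for preserved in preserved_regs:
        covered |= REG_UNITS[preserved]
    return REG_UNITS[reg] <= covered


def preserved_by_width(reg, preserved_regs):
    """Ground truth for this single register family: compare bit widths."""
    widest = max((REG_WIDTH_BITS[r] for r in preserved_regs), default=0)
    return REG_WIDTH_BITS[reg] <= widest
```

With a mask that preserves only `D8`, `preserved_by_units("Q8", ["D8"])` is true while `preserved_by_width("Q8", ["D8"])` is false, which is the disagreement described above.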
The easy solution would be to just revert the patches, but I would like to avoid that outcome as RU-based liveness analysis is much faster, and MachineLICM was extremely expensive on AMDGPU prior to these patches due to how it used RegAliasIterator intensively.
I would like to first discuss other possibilities to sort this out. Ideally, Q registers would have something to represent the upper 64 bits that can be lost.
One option would be to add a fake high-64 register in TableGen that can't be selected by regalloc. Another option, which I tried in this branch, is to add another regunit to Q registers, but it seems to cause a lot of changes in codegen that I can't quite understand yet: https://github.com/Pierre-vh/llvm-project/tree/rfc-self-ru
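Both options amount to giving Q registers a unit that D registers do not share. The sketch below shows the intended effect in a toy model; the unit name `Q8_HI` and the helpers `build_units` and `unit_based_preserved` are hypothetical and do not correspond to anything in the linked branch or in TableGen.

```python
def build_units(extra_q_unit):
    """Return a toy register -> regunits table for the B8/D8/Q8 family."""
    units = {"B8": {"B8"}, "D8": {"B8"}, "Q8": {"B8"}}
    if extra_q_unit:
        # Hypothetical extra unit standing in for the upper 64 bits of Q8.
        units["Q8"] = {"B8", "Q8_HI"}
    return units


def unit_based_preserved(reg, preserved_regs, units):
    """A register is preserved only if all of its units are covered."""
    covered = set()
    for preserved in preserved_regs:
        covered |= units[preserved]
    return units[reg] <= covered
```

With `build_units(False)`, preserving `D8` also appears to preserve `Q8`; with `build_units(True)`, `Q8` is only preserved when the mask names `Q8` itself, while the answer for `D8` is unchanged.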
The miscompile has been fixed on trunk by making MachineLICM's handling of CSR regmasks overly conservative (#95926), so a fix is not urgent, but it's a sign of something wrong with reg units and I think it needs attention.
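As a rough illustration of what "overly conservative" can mean here, the sketch below contrasts two ways of turning a regmask into clobbered regunits. It is an assumption about the general shape of the workaround, not a transcription of #95926; all names (`clobbers_from_preserved`, `clobbers_from_unpreserved`, `may_hoist`) are hypothetical, and `may_hoist` is a deliberately simplified stand-in for the real hoisting checks.

```python
# Toy register -> regunits table for one family; not LLVM data.
REG_UNITS = {"B8": {"B8"}, "D8": {"B8"}, "Q8": {"B8"}}
ALL_UNITS = set().union(*REG_UNITS.values())


def clobbers_from_preserved(preserved_regs):
    """Start from 'everything clobbered' and clear units of preserved regs."""
    clobbered = set(ALL_UNITS)
    for reg in preserved_regs:
        clobbered -= REG_UNITS[reg]
    return clobbered


def clobbers_from_unpreserved(preserved_regs):
    """Conservative direction: mark units of every register NOT preserved."""
    clobbered = set()
    for reg, units in REG_UNITS.items():
        if reg not in preserved_regs:
            clobbered |= units
    return clobbered


def may_hoist(reg, clobbered_units):
    """Simplified check: a def is hoistable only if none of its units is clobbered."""
    return not (REG_UNITS[reg] & clobbered_units)
```

For a mask preserving `{"B8", "D8"}`, the first function reports no clobbered units, so `Q8` looks hoistable across the call. The second reports `B8` as clobbered, which blocks `Q8` but also blocks `D8`; that loss of precision is the kind of cost a conservative workaround pays.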