
Potential Miscompiles with RegUnits-based MachineLICM liveness calculation #96146


Open
Pierre-vh opened this issue Jun 20, 2024 · 8 comments

@Pierre-vh
Contributor

Pierre-vh commented Jun 20, 2024

After #94608 and #95746, some code can be miscompiled on AArch64 because the Qn and Dn registers have exactly the same register units: only Bn's regunits, and nothing else.

This means that when a regmask marks Dn as preserved across a call, Qn is also treated as preserved if liveness is analyzed using register units. Qn is actually not preserved (its upper 64 bits can be lost), and this is the source of the miscompile.
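To make the failure mode concrete, here is a small self-contained model. It is not LLVM code: `PhysReg`, `clobberedUnitsOptimistic`, `survivesCall` and the unit numbering are hypothetical. It assumes that D8 and Q8 share a single unit, and follows LLVM's regmask convention that a set bit means "preserved". The "optimistic" computation starts with every unit clobbered and then clears the units of each preserved register; we assume this roughly approximates the behavior introduced by #95746.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified model, not LLVM's API: each physical register lists the
// register units (bit indices) it covers.
constexpr std::size_t kNumUnits = 2;
using UnitSet = std::bitset<kNumUnits>;

struct PhysReg {
  std::string Name;
  std::vector<std::size_t> Units;
};

// Assumed layout: D8 and Q8 share unit 0 and nothing else; D16 and Q16 share unit 1.
const std::vector<PhysReg> kRegs = {
    {"D8", {0}}, {"Q8", {0}}, {"D16", {1}}, {"Q16", {1}}};

// Regmask convention (as in LLVM): true means "preserved across the call".
// Only the low 64 bits of v8 are callee-saved, so D8 is set and Q8 is not.
const std::vector<bool> kPreserved = {true, false, false, false};

// Optimistic computation: start with every unit clobbered, then clear the
// units of each preserved register. A unit shared by a preserved register
// (D8) and a clobbered one (Q8) is cleared, i.e. treated as preserved.
UnitSet clobberedUnitsOptimistic(const std::vector<PhysReg> &Regs,
                                 const std::vector<bool> &Preserved) {
  assert(Regs.size() == Preserved.size());
  UnitSet Clobbered;
  Clobbered.set();
  for (std::size_t I = 0; I < Regs.size(); ++I)
    if (Preserved[I])
      for (std::size_t U : Regs[I].Units)
        Clobbered.reset(U);
  return Clobbered;
}

// A register is considered to survive the call if none of its units is clobbered.
bool survivesCall(const PhysReg &Reg, const UnitSet &Clobbered) {
  for (std::size_t U : Reg.Units)
    if (Clobbered.test(U))
      return false;
  return true;
}
```

In this model, `survivesCall(kRegs[1], clobberedUnitsOptimistic(kRegs, kPreserved))` is true for Q8 even though its regmask bit is false. That is the unsound conclusion described above.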

The easy solution would be to just revert the patches, but I would like to avoid that: RU-based liveness analysis is much faster, and before these patches MachineLICM was extremely expensive on AMDGPU because of its intensive use of RegAliasIterator.

Before doing that, I would like to discuss other ways to sort this out. Ideally, Q registers would have something to represent the upper 64 bits that can be lost.

One option would be to add a fake register for the high 64 bits in TableGen that regalloc cannot select. Another option, which I tried in https://github.com/Pierre-vh/llvm-project/tree/rfc-self-ru, is to add another regunit to the Q registers, but it seems to cause many codegen changes that I don't quite understand yet.
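The idea behind the second option can be checked in a simplified model (hypothetical names, not LLVM code or TableGen output). Q8 gets an extra unit standing for its upper 64 bits. No preserved register covers that unit, so the same optimistic "start all-clobbered, clear preserved units" computation now reports Q8 as clobbered while D8 still survives. The sketch says nothing about why the real change affects register allocation.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified model, not LLVM's API. Unit 0 is shared by D8 and Q8; unit 1 is
// a hypothetical extra unit that only Q8 covers (its upper 64 bits).
constexpr std::size_t kNumUnits = 2;
using UnitSet = std::bitset<kNumUnits>;

struct PhysReg {
  std::string Name;
  std::vector<std::size_t> Units;
};

const std::vector<PhysReg> kRegs = {{"D8", {0}}, {"Q8", {0, 1}}};

// Regmask convention: true = preserved across the call (D8 yes, Q8 no).
const std::vector<bool> kPreserved = {true, false};

// Optimistic computation, unchanged: start with every unit clobbered and
// clear the units of each preserved register.
UnitSet clobberedUnitsOptimistic(const std::vector<PhysReg> &Regs,
                                 const std::vector<bool> &Preserved) {
  assert(Regs.size() == Preserved.size());
  UnitSet Clobbered;
  Clobbered.set();
  for (std::size_t I = 0; I < Regs.size(); ++I)
    if (Preserved[I])
      for (std::size_t U : Regs[I].Units)
        Clobbered.reset(U);
  return Clobbered;
}

// A register survives the call only if none of its units is clobbered.
bool survivesCall(const PhysReg &Reg, const UnitSet &Clobbered) {
  for (std::size_t U : Reg.Units)
    if (Clobbered.test(U))
      return false;
  return true;
}
```

Here D8 still survives the call, while Q8 is reported as clobbered because nothing clears unit 1. The fast unit-based analysis stays sound without giving up the optimistic treatment of shared units.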

The miscompile has been fixed on trunk by making MachineLICM's handling of CSR regmasks overly conservative (#95926), so this is not an urgent fix. Still, it's a sign that something is wrong with register units, and I think it needs attention.
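One way to get a conservative behavior like the one described for #95926 is sketched below, in the same kind of hypothetical model (not the actual MachineLICM code). Start with no units clobbered and mark the units of every register that is *not* preserved, so a shared unit is clobbered as soon as any register containing it is. The assumed layout gives W19 and X19 a shared unit, with both preserved.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified model, not LLVM's API. Assumed layout: D8 and Q8 share unit 0;
// W19 and X19 share unit 1.
constexpr std::size_t kNumUnits = 2;
using UnitSet = std::bitset<kNumUnits>;

struct PhysReg {
  std::string Name;
  std::vector<std::size_t> Units;
};

const std::vector<PhysReg> kRegs = {
    {"D8", {0}}, {"Q8", {0}}, {"W19", {1}}, {"X19", {1}}};

// Regmask convention: true = preserved across the call.
const std::vector<bool> kPreserved = {true, false, true, true};

// Conservative computation: start with nothing clobbered and mark the units
// of every register that is NOT preserved. A unit counts as clobbered as soon
// as any register containing it is clobbered.
UnitSet clobberedUnitsConservative(const std::vector<PhysReg> &Regs,
                                   const std::vector<bool> &Preserved) {
  assert(Regs.size() == Preserved.size());
  UnitSet Clobbered;  // std::bitset default-constructs to all zeros
  for (std::size_t I = 0; I < Regs.size(); ++I)
    if (!Preserved[I])
      for (std::size_t U : Regs[I].Units)
        Clobbered.set(U);
  return Clobbered;
}

// A register survives the call only if none of its units is clobbered.
bool survivesCall(const PhysReg &Reg, const UnitSet &Clobbered) {
  for (std::size_t U : Reg.Units)
    if (Clobbered.test(U))
      return false;
  return true;
}
```

Q8 is now correctly treated as clobbered, but so is D8, even though it is callee-saved. That is the "overly conservative" cost. X19 and W19, whose unit is not shared with any clobbered register, still survive.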

@llvmbot
Member

llvmbot commented Jun 20, 2024

@llvm/issue-subscribers-backend-aarch64

Author: Pierre van Houtryve (Pierre-vh)

@Pierre-vh
Contributor Author

Pierre-vh added a commit to Pierre-vh/llvm-project that referenced this issue Jun 20, 2024
Fixes a miscompile on AArch64, at the cost of a small regression on AMDGPU.

llvm#96146 opened to investigate the issue.
Pierre-vh added a commit that referenced this issue Jun 20, 2024
Reverts the behavior introduced by 770393b while keeping the refactored
code.

Fixes a miscompile on AArch64, at the cost of a small regression on
AMDGPU.
#96146 opened to investigate the issue
@jayfoad
Contributor

jayfoad commented Jun 20, 2024

@arsenm
Contributor

arsenm commented Jun 20, 2024

Preserved-register masks should really be expressed in terms of regunits too, not registers.

@hvdijk
Copy link
Contributor

hvdijk commented Jun 20, 2024

This isn't limited to AArch64; I saw it affect X86 as well. On Windows, XMM6-XMM15 are callee-saved, but #95746 resulted in YMM9 being hoisted out of a loop that contained function calls. The XMM9 part of YMM9 is callee-saved, but the rest of YMM9 is not. #95926 appears to work around (or fix) this for X86 as well; thanks for the quick action. I'm adding this comment to make sure this is known and can be taken into account in follow-up work.

@Pierre-vh Pierre-vh changed the title from "AArch64 Miscompile with RegUnits-based MachineLICM liveness calculation" to "Potential Miscompiles with RegUnits-based MachineLICM liveness calculation" Jun 20, 2024
@llvmbot
Member

llvmbot commented Jun 20, 2024

@llvm/issue-subscribers-backend-x86

@Pierre-vh
Contributor Author

FWIW, I tried adding an extra regunit in the AArch64 case, but it causes big changes to codegen. I think it changes regalloc somehow; perhaps the allocation order changes? I will likely need help from people who are really familiar with LLVM's register allocation infrastructure to make it work.

@arsenm
Contributor

arsenm commented Jul 3, 2024

FWIW, I tried adding an extra regunit in the AArch64 case, but it causes big changes to codegen. I think it changes regalloc somehow; perhaps the allocation order changes?

The allocation order is explicit and defined per register class, so adding a new unit shouldn't have changed it.

AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this issue Jul 9, 2024
Reverts the behavior introduced by 770393b while keeping the refactored
code.

Fixes a miscompile on AArch64, at the cost of a small regression on
AMDGPU.
llvm#96146 opened to investigate the issue

5 participants