[AMDGPU] Fix nondeterminism in SIFixSGPRCopies #70644
@llvm/pr-subscribers-backend-amdgpu
Author: Jay Foad (jayfoad)
Changes
There are a couple of loops that iterate over V2SCopies. The iteration …
Full diff: https://github.com/llvm/llvm-project/pull/70644.diff
1 File Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp b/llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
index b32ed9fef5dd34e..3e6ed2d793ae563 100644
--- a/llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
@@ -125,7 +125,7 @@ class SIFixSGPRCopies : public MachineFunctionPass {
SmallVector<MachineInstr*, 4> PHINodes;
SmallVector<MachineInstr*, 4> S2VCopies;
unsigned NextVGPRToSGPRCopyID;
- DenseMap<unsigned, V2SCopyInfo> V2SCopies;
+ MapVector<unsigned, V2SCopyInfo> V2SCopies;
DenseMap<MachineInstr *, SetVector<unsigned>> SiblingPenalty;
public:
@@ -988,7 +988,7 @@ bool SIFixSGPRCopies::needToBeConvertedToVALU(V2SCopyInfo *Info) {
for (auto J : Info->Siblings) {
auto InfoIt = V2SCopies.find(J);
if (InfoIt != V2SCopies.end()) {
- MachineInstr *SiblingCopy = InfoIt->getSecond().Copy;
+ MachineInstr *SiblingCopy = InfoIt->second.Copy;
if (SiblingCopy->isImplicitDef())
// the COPY has already been MoveToVALUed
continue;
@@ -1023,12 +1023,12 @@ void SIFixSGPRCopies::lowerVGPR2SGPRCopies(MachineFunction &MF) {
unsigned CurID = LoweringWorklist.pop_back_val();
auto CurInfoIt = V2SCopies.find(CurID);
if (CurInfoIt != V2SCopies.end()) {
- V2SCopyInfo C = CurInfoIt->getSecond();
+ V2SCopyInfo C = CurInfoIt->second;
LLVM_DEBUG(dbgs() << "Processing ...\n"; C.dump());
for (auto S : C.Siblings) {
auto SibInfoIt = V2SCopies.find(S);
if (SibInfoIt != V2SCopies.end()) {
- V2SCopyInfo &SI = SibInfoIt->getSecond();
+ V2SCopyInfo &SI = SibInfoIt->second;
LLVM_DEBUG(dbgs() << "Sibling:\n"; SI.dump());
if (!SI.NeedToBeConvertedToVALU) {
SI.SChain.set_subtract(C.SChain);
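The accessor changes in the diff follow from the container change. DenseMap's iterator yields a DenseMapPair, which provides getSecond(). MapVector stores its entries as plain std::pair values, which only have .second. The sketch below is a simplified, hypothetical stand-in written in standard C++. OrderedMap, CopyInfo and iterationOrder are illustrative names, not LLVM code. It assumes a MapVector-like layout: a key-to-index map plus a vector of entries, which is what gives insertion-order iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for llvm::MapVector (not LLVM's code):
// a key -> index map plus a vector that stores the entries in insertion order.
template <typename KeyT, typename ValueT> class OrderedMap {
  using EntryT = std::pair<KeyT, ValueT>;
  std::unordered_map<KeyT, std::size_t> Index; // key -> position in Entries
  std::vector<EntryT> Entries;                 // entries in insertion order

public:
  using iterator = typename std::vector<EntryT>::iterator;

  iterator begin() { return Entries.begin(); }
  iterator end() { return Entries.end(); }

  // Default-constructs the value on first access, as DenseMap/MapVector do.
  ValueT &operator[](const KeyT &Key) {
    auto [It, Inserted] = Index.try_emplace(Key, Entries.size());
    if (Inserted)
      Entries.emplace_back(Key, ValueT());
    return Entries[It->second].second;
  }

  // Returns an iterator to a std::pair, so callers write It->second.
  iterator find(const KeyT &Key) {
    auto It = Index.find(Key);
    if (It == Index.end())
      return end();
    return begin() + static_cast<std::ptrdiff_t>(It->second);
  }
};

// Hypothetical payload loosely modelled on V2SCopyInfo.
struct CopyInfo {
  unsigned ID = 0;
  bool NeedToBeConvertedToVALU = false;
};

// Inserts copies with unordered IDs and returns the keys in iteration order.
std::vector<unsigned> iterationOrder() {
  OrderedMap<unsigned, CopyInfo> Copies;
  for (unsigned ID : {42u, 7u, 19u})
    Copies[ID].ID = ID;

  auto It = Copies.find(7);
  assert(It != Copies.end() && It->second.ID == 7); // ->second, not getSecond()

  std::vector<unsigned> Order;
  for (auto &Entry : Copies)
    Order.push_back(Entry.first);
  return Order; // insertion order, independent of the key values
}
```

Here iterationOrder() returns 42, 7, 19, the insertion order. The cost of this layout is extra memory and expensive removal from the middle of the vector.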
Can you add a test? I assume it would be a flaky test before this patch, but should pass every time after this patch.
https://llvm.org/docs/ProgrammersManual.html#llvm-adt-mapvector-h
The usage manual recommends removing elements in bulk, since removal from this container is slow. Can you extract the removal at L1043, V2SCopies.erase(C.ID), into a removeList and do it after the loop?
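The reviewer's suggestion is the usual deferred-erase pattern. Below is a minimal sketch on a plain std::vector of (ID, flag) pairs rather than on llvm::MapVector itself; Entry, processThenBulkErase and the removal condition are hypothetical. Erasing from the middle of a vector-backed map shifts every later element, so erasing inside a loop can be quadratic overall, while a single erase-remove pass after the loop is linear. llvm::MapVector offers a remove_if member for this kind of bulk removal. The sketch assumes nothing inside the loop depends on an entry already having been erased, which is the property that would need checking before applying the pattern to lowerVGPR2SGPRCopies.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical entries stored in insertion order, as in a MapVector's vector.
using Entry = std::pair<unsigned, bool>; // (copy ID, "already lowered" flag)

// Visits every entry, deferring erasures to one pass after the loop instead
// of erasing one element per iteration (each single erase is O(n)).
std::vector<unsigned> processThenBulkErase(std::vector<Entry> &Entries) {
  std::unordered_set<unsigned> RemoveList; // IDs to drop after the loop
  std::vector<unsigned> Visited;

  for (const Entry &E : Entries) {
    Visited.push_back(E.first);
    if (E.second) // hypothetical condition: this copy has been lowered
      RemoveList.insert(E.first);
  }

  // One linear pass (erase-remove idiom); survivors keep their relative order.
  Entries.erase(std::remove_if(Entries.begin(), Entries.end(),
                               [&](const Entry &E) {
                                 return RemoveList.count(E.first) != 0;
                               }),
                Entries.end());
  return Visited;
}
```

Because std::remove_if is stable, the surviving entries keep their insertion order, so deferring the erasure does not reintroduce the ordering problem this patch fixes.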
There are a couple of loops that iterate over V2SCopies. The iteration order needs to be deterministic; otherwise we can call moveToVALU in different orders, which causes temporary vregs to be allocated in different orders and can affect register allocation heuristics.
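To make the rationale concrete: iterating a hash table follows bucket order, which is derived from the key values rather than from insertion order. The toy table below is an illustration only. hashIterationOrder is a hypothetical name; the hash Key * 37 mirrors DenseMapInfo<unsigned>, but the fixed 8 buckets and linear probing are simplifications, not DenseMap's actual behaviour. It inserts the same three logical copies, keyed by consecutive IDs, once starting at ID 0 and once at ID 1. If the IDs handed to the same copies can differ between compilations (an assumption here, for example a counter that carries state over from earlier work), the visiting order, and therefore the moveToVALU order, changes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy open-addressing hash table with 8 buckets, for illustration only.
// The hash (Key * 37) mirrors DenseMapInfo<unsigned>; the fixed size and
// linear probing are simplifications, not DenseMap's real implementation.
constexpr std::size_t NumBuckets = 8;

// Inserts N logical copies keyed FirstID, FirstID + 1, ... and returns their
// logical positions (0 = first inserted) in hash-table iteration order.
std::vector<std::size_t> hashIterationOrder(unsigned FirstID, std::size_t N) {
  assert(N <= NumBuckets && "toy table never grows");
  std::array<std::optional<std::size_t>, NumBuckets> Buckets{};
  for (std::size_t Pos = 0; Pos < N; ++Pos) {
    unsigned Key = FirstID + static_cast<unsigned>(Pos);
    std::size_t B = (Key * 37u) & (NumBuckets - 1);
    while (Buckets[B]) // linear probing on collision
      B = (B + 1) & (NumBuckets - 1);
    Buckets[B] = Pos;
  }
  // Iteration walks the buckets in index order, not in insertion order.
  std::vector<std::size_t> Order;
  for (const auto &Slot : Buckets)
    if (Slot)
      Order.push_back(*Slot);
  return Order;
}
```

With FirstID = 0 the copies are visited in logical order 0, 2, 1; with FirstID = 1, in 1, 0, 2. An insertion-ordered container such as MapVector would visit 0, 1, 2 in both cases.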
OK, I've force-pushed (sorry) to demonstrate adding a test with the codegen I happened to get on my machine; the second commit then shows how the codegen changes to something that should hopefully be the same for everyone.
I'm not totally comfortable doing that myself because I don't fully understand the first couple of loops in …
I'm the original reporter of the bug in AMD's internal issue tracker. I can confirm that this resolves the issue, which caused shader compilation to be randomly influenced by shaders compiled before them.
Local branch amd-gfx adee082: Merged main:75b3c3d267bf into amd-gfx:d648e114f351
Remote branch main a6dabed: [AMDGPU] Fix nondeterminism in SIFixSGPRCopies (llvm#70644)