Skip to content

[AMDGPU] Fix nondeterminism in SIFixSGPRCopies #70644

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Oct 31, 2023
Merged

[AMDGPU] Fix nondeterminism in SIFixSGPRCopies #70644

merged 5 commits into from
Oct 31, 2023

Conversation

jayfoad
Copy link
Contributor

@jayfoad jayfoad commented Oct 30, 2023

There are a couple of loops that iterate over V2SCopies. The iteration
order needs to be deterministic, otherwise we can call moveToVALU in
different orders, which causes temporary vregs to be allocated in
different orders, which can affect register allocation heuristics.

@llvmbot
Copy link
Member

llvmbot commented Oct 30, 2023

@llvm/pr-subscribers-backend-amdgpu

Author: Jay Foad (jayfoad)

Changes

There are a couple of loops that iterate over V2SCopies. The iteration
order needs to be deterministic, otherwise we can call moveToVALU in
different orders, which causes temporary vregs to be allocated in
different orders, which can affect register allocation heuristics.


Full diff: https://github.com/llvm/llvm-project/pull/70644.diff

1 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp (+4-4)
diff --git a/llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp b/llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
index b32ed9fef5dd34e..3e6ed2d793ae563 100644
--- a/llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
@@ -125,7 +125,7 @@ class SIFixSGPRCopies : public MachineFunctionPass {
   SmallVector<MachineInstr*, 4> PHINodes;
   SmallVector<MachineInstr*, 4> S2VCopies;
   unsigned NextVGPRToSGPRCopyID;
-  DenseMap<unsigned, V2SCopyInfo> V2SCopies;
+  MapVector<unsigned, V2SCopyInfo> V2SCopies;
   DenseMap<MachineInstr *, SetVector<unsigned>> SiblingPenalty;
 
 public:
@@ -988,7 +988,7 @@ bool SIFixSGPRCopies::needToBeConvertedToVALU(V2SCopyInfo *Info) {
   for (auto J : Info->Siblings) {
     auto InfoIt = V2SCopies.find(J);
     if (InfoIt != V2SCopies.end()) {
-      MachineInstr *SiblingCopy = InfoIt->getSecond().Copy;
+      MachineInstr *SiblingCopy = InfoIt->second.Copy;
       if (SiblingCopy->isImplicitDef())
         // the COPY has already been MoveToVALUed
         continue;
@@ -1023,12 +1023,12 @@ void SIFixSGPRCopies::lowerVGPR2SGPRCopies(MachineFunction &MF) {
     unsigned CurID = LoweringWorklist.pop_back_val();
     auto CurInfoIt = V2SCopies.find(CurID);
     if (CurInfoIt != V2SCopies.end()) {
-      V2SCopyInfo C = CurInfoIt->getSecond();
+      V2SCopyInfo C = CurInfoIt->second;
       LLVM_DEBUG(dbgs() << "Processing ...\n"; C.dump());
       for (auto S : C.Siblings) {
         auto SibInfoIt = V2SCopies.find(S);
         if (SibInfoIt != V2SCopies.end()) {
-          V2SCopyInfo &SI = SibInfoIt->getSecond();
+          V2SCopyInfo &SI = SibInfoIt->second;
           LLVM_DEBUG(dbgs() << "Sibling:\n"; SI.dump());
           if (!SI.NeedToBeConvertedToVALU) {
             SI.SChain.set_subtract(C.SChain);

@jayfoad
Copy link
Contributor Author

jayfoad commented Oct 30, 2023

@marekolsak

Copy link
Contributor

@Sisyph Sisyph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test? I assume it would be a flaky test before this patch, but should pass every time after this patch.

https://llvm.org/docs/ProgrammersManual.html#llvm-adt-mapvector-h
The usage manual recommends removing elements in bulk, since with this container removal is slower. Can you extract the removal at L1043 V2SCopies.erase(C.ID)
to a removeList and do it a after the loop?

There are a couple of loops that iterate over V2SCopies. The iteration
order needs to be deterministic, otherwise we can call moveToVALU in
different orders, which causes temporary vregs to be allocated in
different orders, which can affect register allocation heuristics.
@jayfoad
Copy link
Contributor Author

jayfoad commented Oct 30, 2023

Can you add a test? I assume it would be a flaky test before this patch, but should pass every time after this patch.

OK I've force pushed (sorry) to demonstrate adding a test with the codegen I happened to get on my machine, and then the second commit shows how the codegen changes to something that hopefully should be the same for everyone.

@jayfoad
Copy link
Contributor Author

jayfoad commented Oct 30, 2023

https://llvm.org/docs/ProgrammersManual.html#llvm-adt-mapvector-h The usage manual recommends removing elements in bulk, since with this container removal is slower. Can you extract the removal at L1043 V2SCopies.erase(C.ID) to a removeList and do it a after the loop?

I'm not totally comfortable doing that myself because I don't fully understand the first couple of loops in SIFixSGPRCopies::lowerVGPR2SGPRCopies, where some items from V2SCopies are pushed onto LoweringWorklist and then items from LoweringWorklist are looked up in V2SCopies and potentially removed.

@alex-t could you make the change that @Sisyph requested?

@marekolsak
Copy link

marekolsak commented Oct 31, 2023

I'm the original reporter of the bug in AMD's internal issue tracker.

I verify that this resolves the issue, which caused shader compilation to be randomly influenced by shaders compiled before them.

@jayfoad jayfoad merged commit a6dabed into llvm:main Oct 31, 2023
@jayfoad jayfoad deleted the iterate-sifixsgprcopies branch October 31, 2023 11:47
Guzhu-AMD pushed a commit to GPUOpen-Drivers/llvm-project that referenced this pull request Nov 2, 2023
Local branch amd-gfx adee082 Merged main:75b3c3d267bf into amd-gfx:d648e114f351
Remote branch main a6dabed [AMDGPU] Fix nondeterminism in SIFixSGPRCopies (llvm#70644)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants