Skip to content

Conversation

@jayfoad
Copy link
Contributor

@jayfoad jayfoad commented Oct 30, 2023

There are a couple of loops that iterate over V2SCopies. The iteration
order needs to be deterministic, otherwise we can call moveToVALU in
different orders, which causes temporary vregs to be allocated in
different orders, which can affect register allocation heuristics.

@llvmbot
Copy link
Member

llvmbot commented Oct 30, 2023

@llvm/pr-subscribers-backend-amdgpu

Author: Jay Foad (jayfoad)

Changes

There are a couple of loops that iterate over V2SCopies. The iteration
order needs to be deterministic, otherwise we can call moveToVALU in
different orders, which causes temporary vregs to be allocated in
different orders, which can affect register allocation heuristics.


Full diff: https://github.com/llvm/llvm-project/pull/70644.diff

1 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp (+4-4)
diff --git a/llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp b/llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
index b32ed9fef5dd34e..3e6ed2d793ae563 100644
--- a/llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
@@ -125,7 +125,7 @@ class SIFixSGPRCopies : public MachineFunctionPass {
   SmallVector<MachineInstr*, 4> PHINodes;
   SmallVector<MachineInstr*, 4> S2VCopies;
   unsigned NextVGPRToSGPRCopyID;
-  DenseMap<unsigned, V2SCopyInfo> V2SCopies;
+  MapVector<unsigned, V2SCopyInfo> V2SCopies;
   DenseMap<MachineInstr *, SetVector<unsigned>> SiblingPenalty;
 
 public:
@@ -988,7 +988,7 @@ bool SIFixSGPRCopies::needToBeConvertedToVALU(V2SCopyInfo *Info) {
   for (auto J : Info->Siblings) {
     auto InfoIt = V2SCopies.find(J);
     if (InfoIt != V2SCopies.end()) {
-      MachineInstr *SiblingCopy = InfoIt->getSecond().Copy;
+      MachineInstr *SiblingCopy = InfoIt->second.Copy;
       if (SiblingCopy->isImplicitDef())
         // the COPY has already been MoveToVALUed
         continue;
@@ -1023,12 +1023,12 @@ void SIFixSGPRCopies::lowerVGPR2SGPRCopies(MachineFunction &MF) {
     unsigned CurID = LoweringWorklist.pop_back_val();
     auto CurInfoIt = V2SCopies.find(CurID);
     if (CurInfoIt != V2SCopies.end()) {
-      V2SCopyInfo C = CurInfoIt->getSecond();
+      V2SCopyInfo C = CurInfoIt->second;
       LLVM_DEBUG(dbgs() << "Processing ...\n"; C.dump());
       for (auto S : C.Siblings) {
         auto SibInfoIt = V2SCopies.find(S);
         if (SibInfoIt != V2SCopies.end()) {
-          V2SCopyInfo &SI = SibInfoIt->getSecond();
+          V2SCopyInfo &SI = SibInfoIt->second;
           LLVM_DEBUG(dbgs() << "Sibling:\n"; SI.dump());
           if (!SI.NeedToBeConvertedToVALU) {
             SI.SChain.set_subtract(C.SChain);

@jayfoad
Copy link
Contributor Author

jayfoad commented Oct 30, 2023

@marekolsak

Copy link
Contributor

@Sisyph Sisyph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test? I assume it would be a flaky test before this patch, but should pass every time after this patch.

https://llvm.org/docs/ProgrammersManual.html#llvm-adt-mapvector-h
The usage manual recommends removing elements in bulk, since with this container removal is slower. Can you extract the removal at L1043 V2SCopies.erase(C.ID)
to a removeList and do it a after the loop?

There are a couple of loops that iterate over V2SCopies. The iteration
order needs to be deterministic, otherwise we can call moveToVALU in
different orders, which causes temporary vregs to be allocated in
different orders, which can affect register allocation heuristics.
@jayfoad
Copy link
Contributor Author

jayfoad commented Oct 30, 2023

Can you add a test? I assume it would be a flaky test before this patch, but should pass every time after this patch.

OK I've force pushed (sorry) to demonstrate adding a test with the codegen I happened to get on my machine, and then the second commit shows how the codegen changes to something that hopefully should be the same for everyone.

@jayfoad
Copy link
Contributor Author

jayfoad commented Oct 30, 2023

https://llvm.org/docs/ProgrammersManual.html#llvm-adt-mapvector-h The usage manual recommends removing elements in bulk, since with this container removal is slower. Can you extract the removal at L1043 V2SCopies.erase(C.ID) to a removeList and do it a after the loop?

I'm not totally comfortable doing that myself because I don't fully understand the first couple of loops in SIFixSGPRCopies::lowerVGPR2SGPRCopies, where some items from V2SCopies are pushed onto LoweringWorklist and then items from LoweringWorklist are looked up in V2SCopies and potentially removed.

@alex-t could you make the change that @Sisyph requested?

@marekolsak
Copy link

marekolsak commented Oct 31, 2023

I'm the original reporter of the bug in AMD's internal issue tracker.

I verify that this resolves the issue, which caused shader compilation to be randomly influenced by shaders compiled before them.

@jayfoad jayfoad merged commit a6dabed into llvm:main Oct 31, 2023
@jayfoad jayfoad deleted the iterate-sifixsgprcopies branch October 31, 2023 11:47
Guzhu-AMD pushed a commit to GPUOpen-Drivers/llvm-project that referenced this pull request Nov 2, 2023
Local branch amd-gfx adee082 Merged main:75b3c3d267bf into amd-gfx:d648e114f351
Remote branch main a6dabed [AMDGPU] Fix nondeterminism in SIFixSGPRCopies (llvm#70644)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants