Skip to content

[AMDGPU][CodeGen] Using MBB's liveIn check in tandem with MCRegAliasIterator in SILowerSGPRSpills #129848

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Mar 18, 2025

Conversation

vg0204
Copy link
Contributor

@vg0204 vg0204 commented Mar 5, 2025

This patch replaces use of MachineRegisterInfo's liveIn check with the machine basicBlock's liveIn. As the MRI's liveIn is inconsistent with the entry MBB liveIns, when it comes to the machine verifier checks.

This patch stacks up on #129847.

PS: Its an alternative solution with respect to #126926.

@vg0204 vg0204 requested review from jayfoad, redstar, cdevadas and arsenm and removed request for jayfoad and redstar March 5, 2025 08:12
@vg0204 vg0204 self-assigned this Mar 5, 2025
@vg0204 vg0204 marked this pull request as ready for review March 5, 2025 08:14
@llvmbot
Copy link
Member

llvmbot commented Mar 5, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Vikash Gupta (vg0204)

Changes

This patch replaces use of MachineRegisterInfo's liveIn check with the machine basicBlock's liveIn. As the MRI's liveIn is inconsistent with the entry MBB liveIns, when it comes to the machine verifier checks.

This patch stacks up on #129847


Full diff: https://github.com/llvm/llvm-project/pull/129848.diff

6 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/MachineBasicBlock.h (+15)
  • (modified) llvm/lib/CodeGen/MachineBasicBlock.cpp (+29-5)
  • (modified) llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/spill-partial-csr-sgpr-live-ins.mir (+17-20)
  • (modified) llvm/test/CodeGen/AMDGPU/spill-sgpr-to-virtual-vgpr.mir (+32-32)
  • (modified) llvm/test/CodeGen/ARM/aes-erratum-fix.ll (+4-4)
diff --git a/llvm/include/llvm/CodeGen/MachineBasicBlock.h b/llvm/include/llvm/CodeGen/MachineBasicBlock.h
index 2de96fa85b936..ab88177b63fed 100644
--- a/llvm/include/llvm/CodeGen/MachineBasicBlock.h
+++ b/llvm/include/llvm/CodeGen/MachineBasicBlock.h
@@ -13,6 +13,7 @@
 #ifndef LLVM_CODEGEN_MACHINEBASICBLOCK_H
 #define LLVM_CODEGEN_MACHINEBASICBLOCK_H
 
+#include "llvm/ADT/BitVector.h"
 #include "llvm/ADT/DenseMapInfo.h"
 #include "llvm/ADT/GraphTraits.h"
 #include "llvm/ADT/SparseBitVector.h"
@@ -158,6 +159,7 @@ class MachineBasicBlock
 
   MachineFunction *xParent;
   Instructions Insts;
+  const TargetRegisterInfo *TRI;
 
   /// Keep track of the predecessor / successor basic blocks.
   SmallVector<MachineBasicBlock *, 4> Predecessors;
@@ -177,6 +179,10 @@ class MachineBasicBlock
   using LiveInVector = std::vector<RegisterMaskPair>;
   LiveInVector LiveIns;
 
+  /// Keeps track of live register units for those physical registers which
+  /// are livein of the basicblock.
+  BitVector LiveInRegUnits;
+
   /// Alignment of the basic block. One if the basic block does not need to be
   /// aligned.
   Align Alignment;
@@ -458,11 +464,17 @@ class MachineBasicBlock
   void addLiveIn(MCRegister PhysReg,
                  LaneBitmask LaneMask = LaneBitmask::getAll()) {
     LiveIns.push_back(RegisterMaskPair(PhysReg, LaneMask));
+    addLiveInRegUnit(PhysReg, LaneMask);
   }
   void addLiveIn(const RegisterMaskPair &RegMaskPair) {
     LiveIns.push_back(RegMaskPair);
+    addLiveInRegUnit(RegMaskPair.PhysReg, RegMaskPair.LaneMask);
   }
 
+  // Sets the register units for Reg based on the LaneMask in the
+  // LiveInRegUnits.
+  void addLiveInRegUnit(MCRegister Reg, LaneBitmask LaneMask);
+
   /// Sorts and uniques the LiveIns vector. It can be significantly faster to do
   /// this than repeatedly calling isLiveIn before calling addLiveIn for every
   /// LiveIn insertion.
@@ -484,6 +496,9 @@ class MachineBasicBlock
   void removeLiveIn(MCRegister Reg,
                     LaneBitmask LaneMask = LaneBitmask::getAll());
 
+  /// Resets the register units from LiveInRegUnits for the specified regsiters.
+  void removeLiveInRegUnit(MCRegister Reg);
+
   /// Return true if the specified register is in the live in set.
   bool isLiveIn(MCRegister Reg,
                 LaneBitmask LaneMask = LaneBitmask::getAll()) const;
diff --git a/llvm/lib/CodeGen/MachineBasicBlock.cpp b/llvm/lib/CodeGen/MachineBasicBlock.cpp
index b3a71d1144726..0e5055d7bec2c 100644
--- a/llvm/lib/CodeGen/MachineBasicBlock.cpp
+++ b/llvm/lib/CodeGen/MachineBasicBlock.cpp
@@ -35,6 +35,7 @@
 #include "llvm/IR/ModuleSlotTracker.h"
 #include "llvm/MC/MCAsmInfo.h"
 #include "llvm/MC/MCContext.h"
+#include "llvm/MC/MCRegisterInfo.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/Target/TargetMachine.h"
@@ -51,10 +52,12 @@ static cl::opt<bool> PrintSlotIndexes(
     cl::init(true), cl::Hidden);
 
 MachineBasicBlock::MachineBasicBlock(MachineFunction &MF, const BasicBlock *B)
-    : BB(B), Number(-1), xParent(&MF) {
+    : BB(B), Number(-1), xParent(&MF),
+      TRI(MF.getSubtarget().getRegisterInfo()) {
   Insts.Parent = this;
   if (B)
     IrrLoopHeaderWeight = B->getIrrLoopHeaderWeight();
+  LiveInRegUnits.resize(TRI->getNumRegUnits());
 }
 
 MachineBasicBlock::~MachineBasicBlock() = default;
@@ -597,6 +600,14 @@ void MachineBasicBlock::printAsOperand(raw_ostream &OS,
   printName(OS, 0);
 }
 
+void MachineBasicBlock::addLiveInRegUnit(MCRegister Reg, LaneBitmask LaneMask) {
+  for (MCRegUnitMaskIterator Unit(Reg, TRI); Unit.isValid(); ++Unit) {
+    LaneBitmask UnitMask = (*Unit).second;
+    if ((UnitMask & LaneMask).any())
+      LiveInRegUnits.set((*Unit).first);
+  }
+}
+
 void MachineBasicBlock::removeLiveIn(MCRegister Reg, LaneBitmask LaneMask) {
   LiveInVector::iterator I = find_if(
       LiveIns, [Reg](const RegisterMaskPair &LI) { return LI.PhysReg == Reg; });
@@ -604,21 +615,32 @@ void MachineBasicBlock::removeLiveIn(MCRegister Reg, LaneBitmask LaneMask) {
     return;
 
   I->LaneMask &= ~LaneMask;
-  if (I->LaneMask.none())
+  if (I->LaneMask.none()) {
     LiveIns.erase(I);
+    removeLiveInRegUnit(I->PhysReg);
+  }
 }
 
 MachineBasicBlock::livein_iterator
 MachineBasicBlock::removeLiveIn(MachineBasicBlock::livein_iterator I) {
   // Get non-const version of iterator.
   LiveInVector::iterator LI = LiveIns.begin() + (I - LiveIns.begin());
+  removeLiveInRegUnit(LI->PhysReg);
   return LiveIns.erase(LI);
 }
 
+void MachineBasicBlock::removeLiveInRegUnit(MCRegister Reg) {
+  for (MCRegUnit Unit : TRI->regunits(Reg))
+    LiveInRegUnits.reset(Unit);
+}
+
 bool MachineBasicBlock::isLiveIn(MCRegister Reg, LaneBitmask LaneMask) const {
-  livein_iterator I = find_if(
-      LiveIns, [Reg](const RegisterMaskPair &LI) { return LI.PhysReg == Reg; });
-  return I != livein_end() && (I->LaneMask & LaneMask).any();
+  for (MCRegUnitMaskIterator Unit(Reg, TRI); Unit.isValid(); ++Unit) {
+    LaneBitmask UnitMask = (*Unit).second;
+    if ((UnitMask & LaneMask).any() && LiveInRegUnits.test((*Unit).first))
+      return true;
+  }
+  return false;
 }
 
 void MachineBasicBlock::sortUniqueLiveIns() {
@@ -1751,12 +1773,14 @@ MachineBasicBlock::getEndClobberMask(const TargetRegisterInfo *TRI) const {
 
 void MachineBasicBlock::clearLiveIns() {
   LiveIns.clear();
+  LiveInRegUnits.reset();
 }
 
 void MachineBasicBlock::clearLiveIns(
     std::vector<RegisterMaskPair> &OldLiveIns) {
   assert(OldLiveIns.empty() && "Vector must be empty");
   std::swap(LiveIns, OldLiveIns);
+  LiveInRegUnits.reset();
 }
 
 MachineBasicBlock::livein_iterator MachineBasicBlock::livein_begin() const {
diff --git a/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp b/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
index d27c523425feb..ad3dea7af5b71 100644
--- a/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
+++ b/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
@@ -128,7 +128,7 @@ static void insertCSRSaves(MachineBasicBlock &SaveBlock,
       // incoming register value, so don't kill at the spill point. This happens
       // since we pass some special inputs (workgroup IDs) in the callee saved
       // range.
-      const bool IsLiveIn = MRI.isLiveIn(Reg);
+      const bool IsLiveIn = SaveBlock.isLiveIn(Reg);
       TII.storeRegToStackSlot(SaveBlock, I, Reg, !IsLiveIn, CS.getFrameIdx(),
                               RC, TRI, Register());
 
diff --git a/llvm/test/CodeGen/AMDGPU/spill-partial-csr-sgpr-live-ins.mir b/llvm/test/CodeGen/AMDGPU/spill-partial-csr-sgpr-live-ins.mir
index ab960a7084528..3b616ea8140f9 100644
--- a/llvm/test/CodeGen/AMDGPU/spill-partial-csr-sgpr-live-ins.mir
+++ b/llvm/test/CodeGen/AMDGPU/spill-partial-csr-sgpr-live-ins.mir
@@ -1,25 +1,6 @@
 # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
-# RUN: not --crash llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -run-pass=si-lower-sgpr-spills %s -o /dev/null 2>&1 | FileCheck -check-prefix=VERIFIER %s
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -run-pass=si-lower-sgpr-spills -o - %s | FileCheck %s
 
-# FIXME : Currently, MRI's liveIn check for registers does not take the corresponding live-in's sub-registers into account. As a result
-# in SILowerSGPRSpills, the SubReg spill gets marked KILLED even though its SuperReg is in the function Live-ins. This causes machine
-# verifier to now fail at direct usage of that SubReg, which intially should not be any problem before adding spill.
-
-# VERIFIER: After SI lower SGPR spill instructions
-
-# VERIFIER: *** Bad machine code: Using an undefined physical register ***
-# VERIFIER: - instruction: S_NOP 0, implicit $sgpr50
-# VERIFIER-NEXT: - operand 1:   implicit $sgpr50
-
-# VERIFIER: *** Bad machine code: Using an undefined physical register ***
-# VERIFIER: - instruction: S_NOP 0, implicit $sgpr52
-# VERIFIER-NEXT: - operand 1:   implicit $sgpr52
-
-# VERIFIER: *** Bad machine code: Using an undefined physical register ***
-# VERIFIER: - instruction: S_NOP 0, implicit $sgpr55
-# VERIFIER-NEXT: - operand 1:   implicit $sgpr55
-
-# VERIFIER: LLVM ERROR: Found 3 machine code errors. 
 ---
 name: spill_partial_live_csr_sgpr_test
 tracksRegLiveness: true
@@ -31,6 +12,22 @@ body:             |
   bb.0:
     liveins: $sgpr50_sgpr51, $sgpr52_sgpr53, $sgpr54_sgpr55
 
+    ; CHECK-LABEL: name: spill_partial_live_csr_sgpr_test
+    ; CHECK: liveins: $sgpr50, $sgpr52, $sgpr53, $sgpr54, $sgpr55, $sgpr56, $vgpr63, $sgpr50_sgpr51, $sgpr52_sgpr53, $sgpr54_sgpr55
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr50, 0, $vgpr63
+    ; CHECK-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr52, 1, $vgpr63
+    ; CHECK-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr53, 2, $vgpr63
+    ; CHECK-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr54, 3, $vgpr63
+    ; CHECK-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr55, 4, $vgpr63
+    ; CHECK-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr56, 5, $vgpr63
+    ; CHECK-NEXT: S_NOP 0, implicit $sgpr50
+    ; CHECK-NEXT: $sgpr50 = S_MOV_B32 0
+    ; CHECK-NEXT: S_NOP 0, implicit $sgpr52
+    ; CHECK-NEXT: $sgpr52_sgpr53 = S_MOV_B64 0
+    ; CHECK-NEXT: S_NOP 0, implicit $sgpr55
+    ; CHECK-NEXT: $sgpr54_sgpr55 = S_MOV_B64 0
+    ; CHECK-NEXT: $sgpr56 = S_MOV_B32 0
     S_NOP 0, implicit $sgpr50
     $sgpr50 = S_MOV_B32 0
     S_NOP 0, implicit $sgpr52
diff --git a/llvm/test/CodeGen/AMDGPU/spill-sgpr-to-virtual-vgpr.mir b/llvm/test/CodeGen/AMDGPU/spill-sgpr-to-virtual-vgpr.mir
index dff2bd7f7aef9..63160fd2e948b 100644
--- a/llvm/test/CodeGen/AMDGPU/spill-sgpr-to-virtual-vgpr.mir
+++ b/llvm/test/CodeGen/AMDGPU/spill-sgpr-to-virtual-vgpr.mir
@@ -55,38 +55,38 @@ body:             |
     ; GCN-LABEL: name: sgpr_spill_lane_crossover
     ; GCN: liveins: $sgpr10, $sgpr64, $sgpr65, $sgpr66, $sgpr67, $sgpr68, $sgpr69, $sgpr70, $sgpr71, $sgpr72, $sgpr73, $sgpr74, $sgpr75, $sgpr76, $sgpr77, $sgpr78, $sgpr79, $sgpr80, $sgpr81, $sgpr82, $sgpr83, $sgpr84, $sgpr85, $sgpr86, $sgpr87, $sgpr88, $sgpr89, $sgpr90, $sgpr91, $sgpr92, $sgpr93, $sgpr94, $sgpr95, $vgpr63, $sgpr30_sgpr31, $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71, $sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79, $sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87, $sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
     ; GCN-NEXT: {{  $}}
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr64, 0, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr65, 1, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr66, 2, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr67, 3, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr68, 4, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr69, 5, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr70, 6, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr71, 7, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr72, 8, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr73, 9, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr74, 10, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr75, 11, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr76, 12, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr77, 13, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr78, 14, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr79, 15, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr80, 16, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr81, 17, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr82, 18, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr83, 19, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr84, 20, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr85, 21, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr86, 22, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr87, 23, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr88, 24, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr89, 25, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr90, 26, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr91, 27, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr92, 28, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr93, 29, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr94, 30, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr95, 31, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr64, 0, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr65, 1, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr66, 2, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr67, 3, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr68, 4, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr69, 5, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr70, 6, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr71, 7, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr72, 8, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr73, 9, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr74, 10, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr75, 11, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr76, 12, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr77, 13, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr78, 14, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr79, 15, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr80, 16, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr81, 17, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr82, 18, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr83, 19, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr84, 20, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr85, 21, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr86, 22, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr87, 23, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr88, 24, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr89, 25, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr90, 26, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr91, 27, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr92, 28, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr93, 29, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr94, 30, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr95, 31, $vgpr63
     ; GCN-NEXT: S_NOP 0
     ; GCN-NEXT: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
     ; GCN-NEXT: [[DEF:%[0-9]+]]:vgpr_32 = SI_SPILL_S32_TO_VGPR killed $sgpr10, 0, [[DEF]]
diff --git a/llvm/test/CodeGen/ARM/aes-erratum-fix.ll b/llvm/test/CodeGen/ARM/aes-erratum-fix.ll
index 82f5bfd02a56e..e1361d6efa780 100644
--- a/llvm/test/CodeGen/ARM/aes-erratum-fix.ll
+++ b/llvm/test/CodeGen/ARM/aes-erratum-fix.ll
@@ -68,8 +68,8 @@ define arm_aapcs_vfpcc void @aese_via_call2(half %0, ptr %1) nounwind {
 ; CHECK-FIX-NEXT:    push {r4, lr}
 ; CHECK-FIX-NEXT:    mov r4, r0
 ; CHECK-FIX-NEXT:    bl get_inputf16
-; CHECK-FIX-NEXT:    vorr q0, q0, q0
 ; CHECK-FIX-NEXT:    vld1.64 {d16, d17}, [r4]
+; CHECK-FIX-NEXT:    vorr q0, q0, q0
 ; CHECK-FIX-NEXT:    aese.8 q8, q0
 ; CHECK-FIX-NEXT:    aesmc.8 q8, q8
 ; CHECK-FIX-NEXT:    vst1.64 {d16, d17}, [r4]
@@ -89,8 +89,8 @@ define arm_aapcs_vfpcc void @aese_via_call3(float %0, ptr %1) nounwind {
 ; CHECK-FIX-NEXT:    push {r4, lr}
 ; CHECK-FIX-NEXT:    mov r4, r0
 ; CHECK-FIX-NEXT:    bl get_inputf32
-; CHECK-FIX-NEXT:    vorr q0, q0, q0
 ; CHECK-FIX-NEXT:    vld1.64 {d16, d17}, [r4]
+; CHECK-FIX-NEXT:    vorr q0, q0, q0
 ; CHECK-FIX-NEXT:    aese.8 q8, q0
 ; CHECK-FIX-NEXT:    aesmc.8 q8, q8
 ; CHECK-FIX-NEXT:    vst1.64 {d16, d17}, [r4]
@@ -2222,8 +2222,8 @@ define arm_aapcs_vfpcc void @aesd_via_call2(half %0, ptr %1) nounwind {
 ; CHECK-FIX-NEXT:    push {r4, lr}
 ; CHECK-FIX-NEXT:    mov r4, r0
 ; CHECK-FIX-NEXT:    bl get_inputf16
-; CHECK-FIX-NEXT:    vorr q0, q0, q0
 ; CHECK-FIX-NEXT:    vld1.64 {d16, d17}, [r4]
+; CHECK-FIX-NEXT:    vorr q0, q0, q0
 ; CHECK-FIX-NEXT:    aesd.8 q8, q0
 ; CHECK-FIX-NEXT:    aesimc.8 q8, q8
 ; CHECK-FIX-NEXT:    vst1.64 {d16, d17}, [r4]
@@ -2243,8 +2243,8 @@ define arm_aapcs_vfpcc void @aesd_via_call3(float %0, ptr %1) nounwind {
 ; CHECK-FIX-NEXT:    push {r4, lr}
 ; CHECK-FIX-NEXT:    mov r4, r0
 ; CHECK-FIX-NEXT:    bl get_inputf32
-; CHECK-FIX-NEXT:    vorr q0, q0, q0
 ; CHECK-FIX-NEXT:    vld1.64 {d16, d17}, [r4]
+; CHECK-FIX-NEXT:    vorr q0, q0, q0
 ; CHECK-FIX-NEXT:    aesd.8 q8, q0
 ; CHECK-FIX-NEXT:    aesimc.8 q8, q8
 ; CHECK-FIX-NEXT:    vst1.64 {d16, d17}, [r4]

@llvmbot
Copy link
Member

llvmbot commented Mar 5, 2025

@llvm/pr-subscribers-backend-arm

Author: Vikash Gupta (vg0204)

Changes

This patch replaces use of MachineRegisterInfo's liveIn check with the machine basicBlock's liveIn. As the MRI's liveIn is inconsistent with the entry MBB liveIns, when it comes to the machine verifier checks.

This patch stacks up on #129847


Full diff: https://github.com/llvm/llvm-project/pull/129848.diff

6 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/MachineBasicBlock.h (+15)
  • (modified) llvm/lib/CodeGen/MachineBasicBlock.cpp (+29-5)
  • (modified) llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/spill-partial-csr-sgpr-live-ins.mir (+17-20)
  • (modified) llvm/test/CodeGen/AMDGPU/spill-sgpr-to-virtual-vgpr.mir (+32-32)
  • (modified) llvm/test/CodeGen/ARM/aes-erratum-fix.ll (+4-4)
diff --git a/llvm/include/llvm/CodeGen/MachineBasicBlock.h b/llvm/include/llvm/CodeGen/MachineBasicBlock.h
index 2de96fa85b936..ab88177b63fed 100644
--- a/llvm/include/llvm/CodeGen/MachineBasicBlock.h
+++ b/llvm/include/llvm/CodeGen/MachineBasicBlock.h
@@ -13,6 +13,7 @@
 #ifndef LLVM_CODEGEN_MACHINEBASICBLOCK_H
 #define LLVM_CODEGEN_MACHINEBASICBLOCK_H
 
+#include "llvm/ADT/BitVector.h"
 #include "llvm/ADT/DenseMapInfo.h"
 #include "llvm/ADT/GraphTraits.h"
 #include "llvm/ADT/SparseBitVector.h"
@@ -158,6 +159,7 @@ class MachineBasicBlock
 
   MachineFunction *xParent;
   Instructions Insts;
+  const TargetRegisterInfo *TRI;
 
   /// Keep track of the predecessor / successor basic blocks.
   SmallVector<MachineBasicBlock *, 4> Predecessors;
@@ -177,6 +179,10 @@ class MachineBasicBlock
   using LiveInVector = std::vector<RegisterMaskPair>;
   LiveInVector LiveIns;
 
+  /// Keeps track of live register units for those physical registers which
+  /// are livein of the basicblock.
+  BitVector LiveInRegUnits;
+
   /// Alignment of the basic block. One if the basic block does not need to be
   /// aligned.
   Align Alignment;
@@ -458,11 +464,17 @@ class MachineBasicBlock
   void addLiveIn(MCRegister PhysReg,
                  LaneBitmask LaneMask = LaneBitmask::getAll()) {
     LiveIns.push_back(RegisterMaskPair(PhysReg, LaneMask));
+    addLiveInRegUnit(PhysReg, LaneMask);
   }
   void addLiveIn(const RegisterMaskPair &RegMaskPair) {
     LiveIns.push_back(RegMaskPair);
+    addLiveInRegUnit(RegMaskPair.PhysReg, RegMaskPair.LaneMask);
   }
 
+  // Sets the register units for Reg based on the LaneMask in the
+  // LiveInRegUnits.
+  void addLiveInRegUnit(MCRegister Reg, LaneBitmask LaneMask);
+
   /// Sorts and uniques the LiveIns vector. It can be significantly faster to do
   /// this than repeatedly calling isLiveIn before calling addLiveIn for every
   /// LiveIn insertion.
@@ -484,6 +496,9 @@ class MachineBasicBlock
   void removeLiveIn(MCRegister Reg,
                     LaneBitmask LaneMask = LaneBitmask::getAll());
 
+  /// Resets the register units from LiveInRegUnits for the specified regsiters.
+  void removeLiveInRegUnit(MCRegister Reg);
+
   /// Return true if the specified register is in the live in set.
   bool isLiveIn(MCRegister Reg,
                 LaneBitmask LaneMask = LaneBitmask::getAll()) const;
diff --git a/llvm/lib/CodeGen/MachineBasicBlock.cpp b/llvm/lib/CodeGen/MachineBasicBlock.cpp
index b3a71d1144726..0e5055d7bec2c 100644
--- a/llvm/lib/CodeGen/MachineBasicBlock.cpp
+++ b/llvm/lib/CodeGen/MachineBasicBlock.cpp
@@ -35,6 +35,7 @@
 #include "llvm/IR/ModuleSlotTracker.h"
 #include "llvm/MC/MCAsmInfo.h"
 #include "llvm/MC/MCContext.h"
+#include "llvm/MC/MCRegisterInfo.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/Target/TargetMachine.h"
@@ -51,10 +52,12 @@ static cl::opt<bool> PrintSlotIndexes(
     cl::init(true), cl::Hidden);
 
 MachineBasicBlock::MachineBasicBlock(MachineFunction &MF, const BasicBlock *B)
-    : BB(B), Number(-1), xParent(&MF) {
+    : BB(B), Number(-1), xParent(&MF),
+      TRI(MF.getSubtarget().getRegisterInfo()) {
   Insts.Parent = this;
   if (B)
     IrrLoopHeaderWeight = B->getIrrLoopHeaderWeight();
+  LiveInRegUnits.resize(TRI->getNumRegUnits());
 }
 
 MachineBasicBlock::~MachineBasicBlock() = default;
@@ -597,6 +600,14 @@ void MachineBasicBlock::printAsOperand(raw_ostream &OS,
   printName(OS, 0);
 }
 
+void MachineBasicBlock::addLiveInRegUnit(MCRegister Reg, LaneBitmask LaneMask) {
+  for (MCRegUnitMaskIterator Unit(Reg, TRI); Unit.isValid(); ++Unit) {
+    LaneBitmask UnitMask = (*Unit).second;
+    if ((UnitMask & LaneMask).any())
+      LiveInRegUnits.set((*Unit).first);
+  }
+}
+
 void MachineBasicBlock::removeLiveIn(MCRegister Reg, LaneBitmask LaneMask) {
   LiveInVector::iterator I = find_if(
       LiveIns, [Reg](const RegisterMaskPair &LI) { return LI.PhysReg == Reg; });
@@ -604,21 +615,32 @@ void MachineBasicBlock::removeLiveIn(MCRegister Reg, LaneBitmask LaneMask) {
     return;
 
   I->LaneMask &= ~LaneMask;
-  if (I->LaneMask.none())
+  if (I->LaneMask.none()) {
     LiveIns.erase(I);
+    removeLiveInRegUnit(I->PhysReg);
+  }
 }
 
 MachineBasicBlock::livein_iterator
 MachineBasicBlock::removeLiveIn(MachineBasicBlock::livein_iterator I) {
   // Get non-const version of iterator.
   LiveInVector::iterator LI = LiveIns.begin() + (I - LiveIns.begin());
+  removeLiveInRegUnit(LI->PhysReg);
   return LiveIns.erase(LI);
 }
 
+void MachineBasicBlock::removeLiveInRegUnit(MCRegister Reg) {
+  for (MCRegUnit Unit : TRI->regunits(Reg))
+    LiveInRegUnits.reset(Unit);
+}
+
 bool MachineBasicBlock::isLiveIn(MCRegister Reg, LaneBitmask LaneMask) const {
-  livein_iterator I = find_if(
-      LiveIns, [Reg](const RegisterMaskPair &LI) { return LI.PhysReg == Reg; });
-  return I != livein_end() && (I->LaneMask & LaneMask).any();
+  for (MCRegUnitMaskIterator Unit(Reg, TRI); Unit.isValid(); ++Unit) {
+    LaneBitmask UnitMask = (*Unit).second;
+    if ((UnitMask & LaneMask).any() && LiveInRegUnits.test((*Unit).first))
+      return true;
+  }
+  return false;
 }
 
 void MachineBasicBlock::sortUniqueLiveIns() {
@@ -1751,12 +1773,14 @@ MachineBasicBlock::getEndClobberMask(const TargetRegisterInfo *TRI) const {
 
 void MachineBasicBlock::clearLiveIns() {
   LiveIns.clear();
+  LiveInRegUnits.reset();
 }
 
 void MachineBasicBlock::clearLiveIns(
     std::vector<RegisterMaskPair> &OldLiveIns) {
   assert(OldLiveIns.empty() && "Vector must be empty");
   std::swap(LiveIns, OldLiveIns);
+  LiveInRegUnits.reset();
 }
 
 MachineBasicBlock::livein_iterator MachineBasicBlock::livein_begin() const {
diff --git a/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp b/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
index d27c523425feb..ad3dea7af5b71 100644
--- a/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
+++ b/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
@@ -128,7 +128,7 @@ static void insertCSRSaves(MachineBasicBlock &SaveBlock,
       // incoming register value, so don't kill at the spill point. This happens
       // since we pass some special inputs (workgroup IDs) in the callee saved
       // range.
-      const bool IsLiveIn = MRI.isLiveIn(Reg);
+      const bool IsLiveIn = SaveBlock.isLiveIn(Reg);
       TII.storeRegToStackSlot(SaveBlock, I, Reg, !IsLiveIn, CS.getFrameIdx(),
                               RC, TRI, Register());
 
diff --git a/llvm/test/CodeGen/AMDGPU/spill-partial-csr-sgpr-live-ins.mir b/llvm/test/CodeGen/AMDGPU/spill-partial-csr-sgpr-live-ins.mir
index ab960a7084528..3b616ea8140f9 100644
--- a/llvm/test/CodeGen/AMDGPU/spill-partial-csr-sgpr-live-ins.mir
+++ b/llvm/test/CodeGen/AMDGPU/spill-partial-csr-sgpr-live-ins.mir
@@ -1,25 +1,6 @@
 # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
-# RUN: not --crash llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -run-pass=si-lower-sgpr-spills %s -o /dev/null 2>&1 | FileCheck -check-prefix=VERIFIER %s
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -run-pass=si-lower-sgpr-spills -o - %s | FileCheck %s
 
-# FIXME : Currently, MRI's liveIn check for registers does not take the corresponding live-in's sub-registers into account. As a result
-# in SILowerSGPRSpills, the SubReg spill gets marked KILLED even though its SuperReg is in the function Live-ins. This causes machine
-# verifier to now fail at direct usage of that SubReg, which intially should not be any problem before adding spill.
-
-# VERIFIER: After SI lower SGPR spill instructions
-
-# VERIFIER: *** Bad machine code: Using an undefined physical register ***
-# VERIFIER: - instruction: S_NOP 0, implicit $sgpr50
-# VERIFIER-NEXT: - operand 1:   implicit $sgpr50
-
-# VERIFIER: *** Bad machine code: Using an undefined physical register ***
-# VERIFIER: - instruction: S_NOP 0, implicit $sgpr52
-# VERIFIER-NEXT: - operand 1:   implicit $sgpr52
-
-# VERIFIER: *** Bad machine code: Using an undefined physical register ***
-# VERIFIER: - instruction: S_NOP 0, implicit $sgpr55
-# VERIFIER-NEXT: - operand 1:   implicit $sgpr55
-
-# VERIFIER: LLVM ERROR: Found 3 machine code errors. 
 ---
 name: spill_partial_live_csr_sgpr_test
 tracksRegLiveness: true
@@ -31,6 +12,22 @@ body:             |
   bb.0:
     liveins: $sgpr50_sgpr51, $sgpr52_sgpr53, $sgpr54_sgpr55
 
+    ; CHECK-LABEL: name: spill_partial_live_csr_sgpr_test
+    ; CHECK: liveins: $sgpr50, $sgpr52, $sgpr53, $sgpr54, $sgpr55, $sgpr56, $vgpr63, $sgpr50_sgpr51, $sgpr52_sgpr53, $sgpr54_sgpr55
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr50, 0, $vgpr63
+    ; CHECK-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr52, 1, $vgpr63
+    ; CHECK-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr53, 2, $vgpr63
+    ; CHECK-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr54, 3, $vgpr63
+    ; CHECK-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr55, 4, $vgpr63
+    ; CHECK-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr56, 5, $vgpr63
+    ; CHECK-NEXT: S_NOP 0, implicit $sgpr50
+    ; CHECK-NEXT: $sgpr50 = S_MOV_B32 0
+    ; CHECK-NEXT: S_NOP 0, implicit $sgpr52
+    ; CHECK-NEXT: $sgpr52_sgpr53 = S_MOV_B64 0
+    ; CHECK-NEXT: S_NOP 0, implicit $sgpr55
+    ; CHECK-NEXT: $sgpr54_sgpr55 = S_MOV_B64 0
+    ; CHECK-NEXT: $sgpr56 = S_MOV_B32 0
     S_NOP 0, implicit $sgpr50
     $sgpr50 = S_MOV_B32 0
     S_NOP 0, implicit $sgpr52
diff --git a/llvm/test/CodeGen/AMDGPU/spill-sgpr-to-virtual-vgpr.mir b/llvm/test/CodeGen/AMDGPU/spill-sgpr-to-virtual-vgpr.mir
index dff2bd7f7aef9..63160fd2e948b 100644
--- a/llvm/test/CodeGen/AMDGPU/spill-sgpr-to-virtual-vgpr.mir
+++ b/llvm/test/CodeGen/AMDGPU/spill-sgpr-to-virtual-vgpr.mir
@@ -55,38 +55,38 @@ body:             |
     ; GCN-LABEL: name: sgpr_spill_lane_crossover
     ; GCN: liveins: $sgpr10, $sgpr64, $sgpr65, $sgpr66, $sgpr67, $sgpr68, $sgpr69, $sgpr70, $sgpr71, $sgpr72, $sgpr73, $sgpr74, $sgpr75, $sgpr76, $sgpr77, $sgpr78, $sgpr79, $sgpr80, $sgpr81, $sgpr82, $sgpr83, $sgpr84, $sgpr85, $sgpr86, $sgpr87, $sgpr88, $sgpr89, $sgpr90, $sgpr91, $sgpr92, $sgpr93, $sgpr94, $sgpr95, $vgpr63, $sgpr30_sgpr31, $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71, $sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79, $sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87, $sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
     ; GCN-NEXT: {{  $}}
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr64, 0, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr65, 1, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr66, 2, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr67, 3, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr68, 4, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr69, 5, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr70, 6, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr71, 7, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr72, 8, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr73, 9, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr74, 10, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr75, 11, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr76, 12, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr77, 13, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr78, 14, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr79, 15, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr80, 16, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr81, 17, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr82, 18, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr83, 19, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr84, 20, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr85, 21, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr86, 22, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr87, 23, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr88, 24, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr89, 25, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr90, 26, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr91, 27, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr92, 28, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr93, 29, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr94, 30, $vgpr63
-    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr95, 31, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr64, 0, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr65, 1, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr66, 2, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr67, 3, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr68, 4, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr69, 5, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr70, 6, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr71, 7, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr72, 8, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr73, 9, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr74, 10, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr75, 11, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr76, 12, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr77, 13, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr78, 14, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr79, 15, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr80, 16, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr81, 17, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr82, 18, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr83, 19, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr84, 20, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr85, 21, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr86, 22, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr87, 23, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr88, 24, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr89, 25, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr90, 26, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr91, 27, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr92, 28, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr93, 29, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr94, 30, $vgpr63
+    ; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr95, 31, $vgpr63
     ; GCN-NEXT: S_NOP 0
     ; GCN-NEXT: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
     ; GCN-NEXT: [[DEF:%[0-9]+]]:vgpr_32 = SI_SPILL_S32_TO_VGPR killed $sgpr10, 0, [[DEF]]
diff --git a/llvm/test/CodeGen/ARM/aes-erratum-fix.ll b/llvm/test/CodeGen/ARM/aes-erratum-fix.ll
index 82f5bfd02a56e..e1361d6efa780 100644
--- a/llvm/test/CodeGen/ARM/aes-erratum-fix.ll
+++ b/llvm/test/CodeGen/ARM/aes-erratum-fix.ll
@@ -68,8 +68,8 @@ define arm_aapcs_vfpcc void @aese_via_call2(half %0, ptr %1) nounwind {
 ; CHECK-FIX-NEXT:    push {r4, lr}
 ; CHECK-FIX-NEXT:    mov r4, r0
 ; CHECK-FIX-NEXT:    bl get_inputf16
-; CHECK-FIX-NEXT:    vorr q0, q0, q0
 ; CHECK-FIX-NEXT:    vld1.64 {d16, d17}, [r4]
+; CHECK-FIX-NEXT:    vorr q0, q0, q0
 ; CHECK-FIX-NEXT:    aese.8 q8, q0
 ; CHECK-FIX-NEXT:    aesmc.8 q8, q8
 ; CHECK-FIX-NEXT:    vst1.64 {d16, d17}, [r4]
@@ -89,8 +89,8 @@ define arm_aapcs_vfpcc void @aese_via_call3(float %0, ptr %1) nounwind {
 ; CHECK-FIX-NEXT:    push {r4, lr}
 ; CHECK-FIX-NEXT:    mov r4, r0
 ; CHECK-FIX-NEXT:    bl get_inputf32
-; CHECK-FIX-NEXT:    vorr q0, q0, q0
 ; CHECK-FIX-NEXT:    vld1.64 {d16, d17}, [r4]
+; CHECK-FIX-NEXT:    vorr q0, q0, q0
 ; CHECK-FIX-NEXT:    aese.8 q8, q0
 ; CHECK-FIX-NEXT:    aesmc.8 q8, q8
 ; CHECK-FIX-NEXT:    vst1.64 {d16, d17}, [r4]
@@ -2222,8 +2222,8 @@ define arm_aapcs_vfpcc void @aesd_via_call2(half %0, ptr %1) nounwind {
 ; CHECK-FIX-NEXT:    push {r4, lr}
 ; CHECK-FIX-NEXT:    mov r4, r0
 ; CHECK-FIX-NEXT:    bl get_inputf16
-; CHECK-FIX-NEXT:    vorr q0, q0, q0
 ; CHECK-FIX-NEXT:    vld1.64 {d16, d17}, [r4]
+; CHECK-FIX-NEXT:    vorr q0, q0, q0
 ; CHECK-FIX-NEXT:    aesd.8 q8, q0
 ; CHECK-FIX-NEXT:    aesimc.8 q8, q8
 ; CHECK-FIX-NEXT:    vst1.64 {d16, d17}, [r4]
@@ -2243,8 +2243,8 @@ define arm_aapcs_vfpcc void @aesd_via_call3(float %0, ptr %1) nounwind {
 ; CHECK-FIX-NEXT:    push {r4, lr}
 ; CHECK-FIX-NEXT:    mov r4, r0
 ; CHECK-FIX-NEXT:    bl get_inputf32
-; CHECK-FIX-NEXT:    vorr q0, q0, q0
 ; CHECK-FIX-NEXT:    vld1.64 {d16, d17}, [r4]
+; CHECK-FIX-NEXT:    vorr q0, q0, q0
 ; CHECK-FIX-NEXT:    aesd.8 q8, q0
 ; CHECK-FIX-NEXT:    aesimc.8 q8, q8
 ; CHECK-FIX-NEXT:    vst1.64 {d16, d17}, [r4]

Comment on lines 182 to 184
/// Keeps track of live register units for those physical registers which
/// are livein of the basicblock.
BitVector LiveInRegUnits;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocks should not track both LiveInRegUnits and LiveIns. We should change the representation here, but you should not do that as a part of a patch to fix SILowerSGPRSpills. You do not need to change the internal representation to fix the downstream use issue

Copy link
Contributor Author

@vg0204 vg0204 Mar 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. If we want MBB's isLiveIn to work as it's supposed to than LiveInRegUnits is needed.
  2. As for about fixing SILoweSGPRSpills patch, its not related to downstream use fix. It was hidden uptil now as due to use of implicit and implicit-def of suprregs alongwith killed subregs (SEE spill-sgpr-to-virtual-vgpr.mir) . It got found while I was developing the solution to get rid of implicit keywords usage.
  3. So, this fix is important stepping stone towards my solution related to http://ontrack-internal.amd.com/browse/SWDEV-498533.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should we then proceed for #126926 @arsenm at the moment?

Copy link
Contributor Author

@vg0204 vg0204 Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocks should not track both LiveInRegUnits and LiveIns.

What I feel both is important! LiveIns maintains the list of exact physReg with laneBitmask (what is live to use) as register form representation but need LiveInRegUnits to utilize that LiveIns (via regunits check) to precisely perform checks for isLiveIn like functions!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should decouple these issues. In this PR, just use the MCRegAliasIterator at the use point to get it functional. Fixing the APIs is for later

Copy link
Contributor Author

@vg0204 vg0204 Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should decouple these issues. In this PR, just use the MCRegAliasIterator at the use point to get it functional. Fixing the APIs is for later

Can you bit more specific about the use site, are you talking about SILoweSGPRSpills or uses site as in isLiveIn of MBB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed it!

@vg0204 vg0204 requested a review from MatzeB March 5, 2025 13:22
@vg0204
Copy link
Contributor Author

vg0204 commented Mar 5, 2025

@MatzeB , as you were the one to come up with the liveIns of MBB as physReg+LaneBitmask representation. What's your view about this?

Copy link

github-actions bot commented Mar 7, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@vg0204 vg0204 force-pushed the vg0204/use-MBB-liveInCheck-siLowerSgprSpills branch from e532575 to c2c5c3e Compare March 7, 2025 08:20
@vg0204 vg0204 changed the title [AMDGPU][CodeGen] Utilized MBB's liveIn check in SILowerSGPRSpills [AMDGPU][CodeGen] Using MBB's liveIn check in tandem with MCRegAliasIterator in SILowerSGPRSpills Mar 7, 2025
@vg0204 vg0204 requested a review from arsenm March 7, 2025 13:39
@vg0204
Copy link
Contributor Author

vg0204 commented Mar 10, 2025

Kind Ping!

vg0204 added 3 commits March 10, 2025 10:24
Currently, the machine basicblock does not fully utilizes the
laneBitmask associated with physReg liveIns to check for the
precise liveness. Conservatively, it acts fully correct now, only
if all liveIns check for MBB is in form it defines it for itself.

So, now with the use of register units tracking for MBB's liveIns
, its possible to track & check liveness for all sorts of physRegs.
This patch replaces use of MachineRegisterInfo's liveIn check with the
machine basicBlock's liveIn. As the MRI's liveIn is inconsistent with
the entry MBB liveIns, when it comes to the machine verifier checks
@vg0204 vg0204 force-pushed the vg0204/use-MBB-liveInCheck-siLowerSgprSpills branch from c2c5c3e to 0268630 Compare March 10, 2025 11:12
@vg0204
Copy link
Contributor Author

vg0204 commented Mar 17, 2025

@arsenm please review this one!

Comment on lines 104 to 105
static bool isLiveInIntoMBB(MCRegister Reg, MachineBasicBlock &MBB,
const TargetRegisterInfo *TRI) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static bool isLiveInIntoMBB(MCRegister Reg, MachineBasicBlock &MBB,
const TargetRegisterInfo *TRI) {
static bool isLiveIntoMBB(MCRegister Reg, MachineBasicBlock &MBB,
const TargetRegisterInfo *TRI) {

@@ -128,7 +136,7 @@ static void insertCSRSaves(MachineBasicBlock &SaveBlock,
// incoming register value, so don't kill at the spill point. This happens
// since we pass some special inputs (workgroup IDs) in the callee saved
// range.
const bool IsLiveIn = MRI.isLiveIn(Reg);
bool IsLiveIn = isLiveInIntoMBB(Reg, SaveBlock, TRI);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bool IsLiveIn = isLiveInIntoMBB(Reg, SaveBlock, TRI);
bool IsLiveIn = isLiveIntoMBB(Reg, SaveBlock, TRI);

; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr85, 13, $vgpr63
; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr86, 14, $vgpr63
; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr87, 15, $vgpr63
; GCN-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr64, 0, $vgpr63
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Losing these kill flags isn't ideal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are just another case mimicing the added test case accounting for this change!

@vg0204 vg0204 force-pushed the vg0204/use-MBB-liveInCheck-siLowerSgprSpills branch from 97b3075 to b2f303e Compare March 18, 2025 05:19
@vg0204 vg0204 merged commit bdb6320 into llvm:main Mar 18, 2025
6 of 10 checks passed
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 19, 2025
…egAliasIterator in SILowerSGPRSpills (llvm#129848)"

This reverts commit bdb6320.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants