Skip to content

[WIP][CodeGen] Modifying MBB's liveins representation as into regUnits #129847

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

vg0204
Copy link
Contributor

@vg0204 vg0204 commented Mar 5, 2025

Currently, the machine basicblock does not fully utilizes the laneBitmask associated with physReg liveIns to check for the precise liveness. Conservatively, it acts fully correct now, only if all liveIns check for MBB is in form it defines it for itself.

So, now with the use of register units tracking for MBB's liveIns , its possible to track & check liveness for all sorts of physRegs.

@llvmbot
Copy link
Member

llvmbot commented Mar 5, 2025

@llvm/pr-subscribers-llvm-regalloc
@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-tools-llvm-exegesis
@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-backend-arm

Author: Vikash Gupta (vg0204)

Changes

Currently, the machine basicblock does not fully utilizes the laneBitmask associated with physReg liveIns to check for the precise liveness. Conservatively, it acts fully correct now, only if all liveIns check for MBB is in form it defines it for itself.

So, now with the use of register units tracking for MBB's liveIns , its possible to track & check liveness for all sorts of physRegs.

It is needed to handle #129848


Full diff: https://github.com/llvm/llvm-project/pull/129847.diff

3 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/MachineBasicBlock.h (+15)
  • (modified) llvm/lib/CodeGen/MachineBasicBlock.cpp (+29-5)
  • (modified) llvm/test/CodeGen/ARM/aes-erratum-fix.ll (+4-4)
diff --git a/llvm/include/llvm/CodeGen/MachineBasicBlock.h b/llvm/include/llvm/CodeGen/MachineBasicBlock.h
index 2de96fa85b936..ab88177b63fed 100644
--- a/llvm/include/llvm/CodeGen/MachineBasicBlock.h
+++ b/llvm/include/llvm/CodeGen/MachineBasicBlock.h
@@ -13,6 +13,7 @@
 #ifndef LLVM_CODEGEN_MACHINEBASICBLOCK_H
 #define LLVM_CODEGEN_MACHINEBASICBLOCK_H
 
+#include "llvm/ADT/BitVector.h"
 #include "llvm/ADT/DenseMapInfo.h"
 #include "llvm/ADT/GraphTraits.h"
 #include "llvm/ADT/SparseBitVector.h"
@@ -158,6 +159,7 @@ class MachineBasicBlock
 
   MachineFunction *xParent;
   Instructions Insts;
+  const TargetRegisterInfo *TRI;
 
   /// Keep track of the predecessor / successor basic blocks.
   SmallVector<MachineBasicBlock *, 4> Predecessors;
@@ -177,6 +179,10 @@ class MachineBasicBlock
   using LiveInVector = std::vector<RegisterMaskPair>;
   LiveInVector LiveIns;
 
+  /// Keeps track of live register units for those physical registers which
+  /// are livein of the basicblock.
+  BitVector LiveInRegUnits;
+
   /// Alignment of the basic block. One if the basic block does not need to be
   /// aligned.
   Align Alignment;
@@ -458,11 +464,17 @@ class MachineBasicBlock
   void addLiveIn(MCRegister PhysReg,
                  LaneBitmask LaneMask = LaneBitmask::getAll()) {
     LiveIns.push_back(RegisterMaskPair(PhysReg, LaneMask));
+    addLiveInRegUnit(PhysReg, LaneMask);
   }
   void addLiveIn(const RegisterMaskPair &RegMaskPair) {
     LiveIns.push_back(RegMaskPair);
+    addLiveInRegUnit(RegMaskPair.PhysReg, RegMaskPair.LaneMask);
   }
 
+  // Sets the register units for Reg based on the LaneMask in the
+  // LiveInRegUnits.
+  void addLiveInRegUnit(MCRegister Reg, LaneBitmask LaneMask);
+
   /// Sorts and uniques the LiveIns vector. It can be significantly faster to do
   /// this than repeatedly calling isLiveIn before calling addLiveIn for every
   /// LiveIn insertion.
@@ -484,6 +496,9 @@ class MachineBasicBlock
   void removeLiveIn(MCRegister Reg,
                     LaneBitmask LaneMask = LaneBitmask::getAll());
 
+  /// Resets the register units from LiveInRegUnits for the specified regsiters.
+  void removeLiveInRegUnit(MCRegister Reg);
+
   /// Return true if the specified register is in the live in set.
   bool isLiveIn(MCRegister Reg,
                 LaneBitmask LaneMask = LaneBitmask::getAll()) const;
diff --git a/llvm/lib/CodeGen/MachineBasicBlock.cpp b/llvm/lib/CodeGen/MachineBasicBlock.cpp
index b3a71d1144726..0e5055d7bec2c 100644
--- a/llvm/lib/CodeGen/MachineBasicBlock.cpp
+++ b/llvm/lib/CodeGen/MachineBasicBlock.cpp
@@ -35,6 +35,7 @@
 #include "llvm/IR/ModuleSlotTracker.h"
 #include "llvm/MC/MCAsmInfo.h"
 #include "llvm/MC/MCContext.h"
+#include "llvm/MC/MCRegisterInfo.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/Target/TargetMachine.h"
@@ -51,10 +52,12 @@ static cl::opt<bool> PrintSlotIndexes(
     cl::init(true), cl::Hidden);
 
 MachineBasicBlock::MachineBasicBlock(MachineFunction &MF, const BasicBlock *B)
-    : BB(B), Number(-1), xParent(&MF) {
+    : BB(B), Number(-1), xParent(&MF),
+      TRI(MF.getSubtarget().getRegisterInfo()) {
   Insts.Parent = this;
   if (B)
     IrrLoopHeaderWeight = B->getIrrLoopHeaderWeight();
+  LiveInRegUnits.resize(TRI->getNumRegUnits());
 }
 
 MachineBasicBlock::~MachineBasicBlock() = default;
@@ -597,6 +600,14 @@ void MachineBasicBlock::printAsOperand(raw_ostream &OS,
   printName(OS, 0);
 }
 
+void MachineBasicBlock::addLiveInRegUnit(MCRegister Reg, LaneBitmask LaneMask) {
+  for (MCRegUnitMaskIterator Unit(Reg, TRI); Unit.isValid(); ++Unit) {
+    LaneBitmask UnitMask = (*Unit).second;
+    if ((UnitMask & LaneMask).any())
+      LiveInRegUnits.set((*Unit).first);
+  }
+}
+
 void MachineBasicBlock::removeLiveIn(MCRegister Reg, LaneBitmask LaneMask) {
   LiveInVector::iterator I = find_if(
       LiveIns, [Reg](const RegisterMaskPair &LI) { return LI.PhysReg == Reg; });
@@ -604,21 +615,32 @@ void MachineBasicBlock::removeLiveIn(MCRegister Reg, LaneBitmask LaneMask) {
     return;
 
   I->LaneMask &= ~LaneMask;
-  if (I->LaneMask.none())
+  if (I->LaneMask.none()) {
     LiveIns.erase(I);
+    removeLiveInRegUnit(I->PhysReg);
+  }
 }
 
 MachineBasicBlock::livein_iterator
 MachineBasicBlock::removeLiveIn(MachineBasicBlock::livein_iterator I) {
   // Get non-const version of iterator.
   LiveInVector::iterator LI = LiveIns.begin() + (I - LiveIns.begin());
+  removeLiveInRegUnit(LI->PhysReg);
   return LiveIns.erase(LI);
 }
 
+void MachineBasicBlock::removeLiveInRegUnit(MCRegister Reg) {
+  for (MCRegUnit Unit : TRI->regunits(Reg))
+    LiveInRegUnits.reset(Unit);
+}
+
 bool MachineBasicBlock::isLiveIn(MCRegister Reg, LaneBitmask LaneMask) const {
-  livein_iterator I = find_if(
-      LiveIns, [Reg](const RegisterMaskPair &LI) { return LI.PhysReg == Reg; });
-  return I != livein_end() && (I->LaneMask & LaneMask).any();
+  for (MCRegUnitMaskIterator Unit(Reg, TRI); Unit.isValid(); ++Unit) {
+    LaneBitmask UnitMask = (*Unit).second;
+    if ((UnitMask & LaneMask).any() && LiveInRegUnits.test((*Unit).first))
+      return true;
+  }
+  return false;
 }
 
 void MachineBasicBlock::sortUniqueLiveIns() {
@@ -1751,12 +1773,14 @@ MachineBasicBlock::getEndClobberMask(const TargetRegisterInfo *TRI) const {
 
 void MachineBasicBlock::clearLiveIns() {
   LiveIns.clear();
+  LiveInRegUnits.reset();
 }
 
 void MachineBasicBlock::clearLiveIns(
     std::vector<RegisterMaskPair> &OldLiveIns) {
   assert(OldLiveIns.empty() && "Vector must be empty");
   std::swap(LiveIns, OldLiveIns);
+  LiveInRegUnits.reset();
 }
 
 MachineBasicBlock::livein_iterator MachineBasicBlock::livein_begin() const {
diff --git a/llvm/test/CodeGen/ARM/aes-erratum-fix.ll b/llvm/test/CodeGen/ARM/aes-erratum-fix.ll
index 82f5bfd02a56e..e1361d6efa780 100644
--- a/llvm/test/CodeGen/ARM/aes-erratum-fix.ll
+++ b/llvm/test/CodeGen/ARM/aes-erratum-fix.ll
@@ -68,8 +68,8 @@ define arm_aapcs_vfpcc void @aese_via_call2(half %0, ptr %1) nounwind {
 ; CHECK-FIX-NEXT:    push {r4, lr}
 ; CHECK-FIX-NEXT:    mov r4, r0
 ; CHECK-FIX-NEXT:    bl get_inputf16
-; CHECK-FIX-NEXT:    vorr q0, q0, q0
 ; CHECK-FIX-NEXT:    vld1.64 {d16, d17}, [r4]
+; CHECK-FIX-NEXT:    vorr q0, q0, q0
 ; CHECK-FIX-NEXT:    aese.8 q8, q0
 ; CHECK-FIX-NEXT:    aesmc.8 q8, q8
 ; CHECK-FIX-NEXT:    vst1.64 {d16, d17}, [r4]
@@ -89,8 +89,8 @@ define arm_aapcs_vfpcc void @aese_via_call3(float %0, ptr %1) nounwind {
 ; CHECK-FIX-NEXT:    push {r4, lr}
 ; CHECK-FIX-NEXT:    mov r4, r0
 ; CHECK-FIX-NEXT:    bl get_inputf32
-; CHECK-FIX-NEXT:    vorr q0, q0, q0
 ; CHECK-FIX-NEXT:    vld1.64 {d16, d17}, [r4]
+; CHECK-FIX-NEXT:    vorr q0, q0, q0
 ; CHECK-FIX-NEXT:    aese.8 q8, q0
 ; CHECK-FIX-NEXT:    aesmc.8 q8, q8
 ; CHECK-FIX-NEXT:    vst1.64 {d16, d17}, [r4]
@@ -2222,8 +2222,8 @@ define arm_aapcs_vfpcc void @aesd_via_call2(half %0, ptr %1) nounwind {
 ; CHECK-FIX-NEXT:    push {r4, lr}
 ; CHECK-FIX-NEXT:    mov r4, r0
 ; CHECK-FIX-NEXT:    bl get_inputf16
-; CHECK-FIX-NEXT:    vorr q0, q0, q0
 ; CHECK-FIX-NEXT:    vld1.64 {d16, d17}, [r4]
+; CHECK-FIX-NEXT:    vorr q0, q0, q0
 ; CHECK-FIX-NEXT:    aesd.8 q8, q0
 ; CHECK-FIX-NEXT:    aesimc.8 q8, q8
 ; CHECK-FIX-NEXT:    vst1.64 {d16, d17}, [r4]
@@ -2243,8 +2243,8 @@ define arm_aapcs_vfpcc void @aesd_via_call3(float %0, ptr %1) nounwind {
 ; CHECK-FIX-NEXT:    push {r4, lr}
 ; CHECK-FIX-NEXT:    mov r4, r0
 ; CHECK-FIX-NEXT:    bl get_inputf32
-; CHECK-FIX-NEXT:    vorr q0, q0, q0
 ; CHECK-FIX-NEXT:    vld1.64 {d16, d17}, [r4]
+; CHECK-FIX-NEXT:    vorr q0, q0, q0
 ; CHECK-FIX-NEXT:    aesd.8 q8, q0
 ; CHECK-FIX-NEXT:    aesimc.8 q8, q8
 ; CHECK-FIX-NEXT:    vst1.64 {d16, d17}, [r4]

Copy link
Member

@lenary lenary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm test changes look correct. I haven't checked the implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes to this file look correct - the vorr qN, qN, qN is necessary and has to be done to the input of the aes instruction pair, which it is still.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is missing most of the changes necessary to do this. We need a full migration away from the existing storage as reg + lane mask, to the register units. This requires MIR print and parsing changes, and I'm sure there will be downstream fallout in use sites

@vg0204
Copy link
Contributor Author

vg0204 commented Mar 7, 2025

This is missing most of the changes necessary to do this. We need a full migration away from the existing storage as reg + lane mask, to the register units. This requires MIR print and parsing changes, and I'm sure there will be downstream fallout in use sites

So, should I start this as separate task entirely? Also, don't we need MCphysReg mandatorily as part of liveIn representation too, as that is used entirely all over the llvm codegen, as regunits alone cannot hold that sense? Once we go into realm of regUnits, it would not be possible to know their exact source MCPhysReg, if not tracked parallely.

Doesn't something like reg + liveRegUnit is better, as its still technically as same as reg + lane. Also, now rather than maintininng list of liveIn regs, we can maintain a set of all liveIn regUnits (basically union of liveRegUnit for all regs in MBB)

@vg0204
Copy link
Contributor Author

vg0204 commented Mar 10, 2025

Also, the liveIn registers represent physReg which need not be entirely live into MBB (so the laneMask makes sense), but migrating entirely into regUnits (representing only live parts of liveIn physRegs) could cause severe fallout as you mentioned at use sites. So, we strongly cannot eliminate the register concept as liveIn of MBB I fee!

Could you through some light on it @arsenm!

@vg0204
Copy link
Contributor Author

vg0204 commented Mar 17, 2025

KInd Ping @arsenm !

@vg0204
Copy link
Contributor Author

vg0204 commented Mar 25, 2025

@arsenm can you comments on my thought (the latest comments) regarding the usage of both LiveRegUnits & LiveInList hand-in-hand, so I can proceed with it accordingly!

cc : @jayfoad , @MatzeB

@arsenm
Copy link
Contributor

arsenm commented Mar 25, 2025

This patch is about 5% of the changes required. This requires fully removing the existing block tracking in terms of physical registers. As a first step, I think you should address introduction of LiveRegUnitMasks. We need a new MachineOperand type, a MIR section to print a compacted regunit mask with an ID number for referencing in the instruction stream (similar to how frame indexes work).

@vg0204
Copy link
Contributor Author

vg0204 commented Apr 1, 2025

This patch is about 5% of the changes required. This requires fully removing the existing block tracking in terms of physical registers. As a first step, I think you should address introduction of LiveRegUnitMasks. We need a new MachineOperand type, a MIR section to print a compacted regunit mask with an ID number for referencing in the instruction stream (similar to how frame indexes work).

Just a brief of things I need to do as per my study (please let me know if anything is misunderstood) :

  1. Need to have a list of LaneBitmasks in MF (or in MFI, where it makes more sense ??) used & its corresponding YAML serialization object inr MIRYamlMapping. Needed both MIRParser & MIRPrinter logics for it respectively.
  2. Defining lexing and parsing for LaneBitmask identifier token object respectively in MILexer & MIParser.
  3. Finally need a new index MO to track & reference the LaneBitmask objects in the MIs.

Did I miss anything @arsenm ?

@arsenm
Copy link
Contributor

arsenm commented Apr 1, 2025

I guess those points are related work we need to move all infrastructure to work on register units. If you're just talking about changing the block live in representation, we can probably get away without it.

We do need MIR print/parse changes for the liveins in terms of register units. It may or may not make sense to share code with regunit masks for that.

@vg0204
Copy link
Contributor Author

vg0204 commented Apr 1, 2025

We need a new MachineOperand type, a MIR section to print a compacted regunit mask with an ID number for referencing in the instruction stream (similar to how frame indexes work).

If we want to achieve something like this, I am unable to see how can we get away from all steps mentioned, as I deduced those after back-tracking the usage of ID number referencing for FI in the instructions.

Secondly, liveins as the regunit mask in someway (alongwith physReg) & representing them completely as register units apparently seems same but are 2 things as in the latter, the concept of liveins as physReg will vanish irreversibly causing significant downfall at use-site of liveins()

@vg0204
Copy link
Contributor Author

vg0204 commented Apr 1, 2025

We do need MIR print/parse changes for the liveins in terms of register units. It may or may not make sense to share code with regunit masks for that.

This seems not difficult to do once the clarity on livein representation has been reached!

@vg0204
Copy link
Contributor Author

vg0204 commented Apr 3, 2025

Gentle ping @arsenm !

@vg0204
Copy link
Contributor Author

vg0204 commented Apr 8, 2025

@arsenm , any final thoughts & suggestions on my doubts ?

@vg0204
Copy link
Contributor Author

vg0204 commented Apr 14, 2025

We do need MIR print/parse changes for the liveins in terms of register units. It may or may not make sense to share code with regunit masks for that.

@arsenm don't you think it's as a big change (from its impact perspective, even not if implementation) that it can break everything, as unlike just introduction of optional laneBitmask to physeg, as supported now, is an extension which supported backward compatability, but this will need compulsory changes in every test files for all archs, as we are supposedly completely stripping physReg concept, replacing with regunits as block liveins , so what about this?

Secondly, liveins as the regunit mask in someway (alongwith physReg) & representing them completely as register units apparently seems same but are 2 things as in the latter, the concept of liveins as physReg will vanish irreversibly causing significant downfall at use-site of liveins()

@vg0204
Copy link
Contributor Author

vg0204 commented Apr 28, 2025

@arsenm don't you think it's as a big change (from its impact perspective, even not if implementation) that it can break everything, as unlike just introduction of optional laneBitmask to physeg, as supported now, is an extension which supported backward compatability, but this will need compulsory changes in every test files for all archs, as we are supposedly completely stripping physReg concept, replacing with regunits as block liveins , so what about this?

Secondly, liveins as the regunit mask in someway (alongwith physReg) & representing them completely as register units apparently seems same but are 2 things as in the latter, the concept of liveins as physReg will vanish irreversibly causing significant downfall at use-site of liveins()

Hey @arsenm, if you could spare sometime and comment on this, it would be great!

@vg0204
Copy link
Contributor Author

vg0204 commented Apr 29, 2025

Kind Ping @jayfoad @MatzeB !

@vg0204
Copy link
Contributor Author

vg0204 commented Apr 29, 2025

Could one solution be : We can store only the live regunits (for either of its root) as MCPhysReg in LiveInList using the its attached laneBitMask value provided (or entire Reg in case laneBitmask is set as All), rather than storing the laneBitMask as part of the LiveIn. For Example :

liveins: $vgpr0_vgpr1:0x00003, $vgpr4, $vgpr5:0x000001

will be transformed into :

liveins: $vgpr0_lo16, $vgpr0_hi16, $vgpr4, $vgpr5_hi16

Does it make sense, along the line of my doubts just mentioned above?

@arsenm
Copy link
Contributor

arsenm commented Apr 29, 2025

@arsenm don't you think it's as a big change (from its impact perspective, even not if implementation) that it can break everything, as unlike just introduction of optional laneBitmask to physeg, as supported now, is an extension which supported backward compatability, but this will need compulsory changes in every test files for all archs, as we are supposedly completely stripping physReg concept, replacing with regunits as block liveins , so what about this?

There is no compatibility concern, and every target will need to be updated. Just introducing a parallel structure with the same purpose is actively harmful. This is a full migration effort, I don't see how it makes sense for both livein representations to coexist. It will only provide a new source of inconsistency bugs and waste compile time maintaining both cases.

Secondly, liveins as the regunit mask in someway (alongwith physReg) & representing them completely as register units apparently seems same but are 2 things as in the latter, the concept of liveins as physReg will vanish irreversibly causing significant downfall at use-site of liveins()

What do you mean by "significant downfall"? Conceptually I don't see why you should need to enumerate the live in registers frequently. Most uses care about whether the register is live or not

will be transformed into :
liveins: $vgpr0_lo16, $vgpr0_hi16, $vgpr4, $vgpr5_hi16
Does it make sense, along the line of my doubts just mentioned above?

This is the whole point, re-encode the live-ins in terms of register units

@qcolombet
Copy link
Collaborator

should the issue #138552 need to be resolved completely before this works can be taken forward

Yes, I think we need to give leaf registers their own unique regunit, even in the presence of ad hoc aliases.

In the VE case, SX0 has two subregs SW0 and SF0 which alias each other. Currently they all have the same single regunit (number 26) so a regnit-based representation can't even distinguish a superreg from its subregs.

cc @qcolombet

I haven't read carefully the whole thread, but I agree with this.

@qcolombet
Copy link
Collaborator

Also, side question, what is the point of changing the liveins representation?
What does it solve/allow?

@vg0204
Copy link
Contributor Author

vg0204 commented May 7, 2025

Also, side question, what is the point of changing the liveins representation?
What does it solve/allow?

I changed it as tracking of live-ins as regUnits (which are constant in number) rather than registers are quite less expensive. As the number of registers in arch like ourselves can go up as 9k due to aliasing & RegisterTuple stuff, which can be avoided due to exact nature of it

@vg0204
Copy link
Contributor Author

vg0204 commented May 7, 2025

Yes, I think we need to give leaf registers their own unique regunit, even in the presence of ad hoc aliases.

As, I looked I think we need to make changes here to handle it :

// Initialize RegUnitList. Because getSubRegs is called recursively, this

So, anyone taking care of it as of now, because its needed to be done for my work to move forward?

@jayfoad
Copy link
Contributor

jayfoad commented May 7, 2025

So, anyone taking care of it as of now

No-one is working on it, as far as I know.

@vg0204
Copy link
Contributor Author

vg0204 commented May 7, 2025

So, anyone taking care of it as of now

No-one is working on it, as far as I know.

Its quiet easy to achieve from implemetation point of view (cannot say the same if its have conceptually implication at big). So, I am just in the process of trying it locally.

@arsenm
Copy link
Contributor

arsenm commented May 7, 2025

Also, side question, what is the point of changing the liveins representation? What does it solve/allow?

It's confusing and unclear how to interpret mixed aliasing register entries with the lanemasks. There's also a compile time cost every time we need to translate between the physical register representation and the actual liveness tracking in regunits. Since these are purely used for liveness, they should always be in terms of regunits

@vg0204
Copy link
Contributor Author

vg0204 commented May 7, 2025

So, anyone taking care of it as of now

No-one is working on it, as far as I know.

Its quiet easy to achieve from implemetation point of view (cannot say the same if its have conceptually implication at big). So, I am just in the process of trying it locally.

It worked as expected, should I proceed with it as a PR for review @jayfoad !

@jayfoad
Copy link
Contributor

jayfoad commented May 8, 2025

So, anyone taking care of it as of now

No-one is working on it, as far as I know.

Its quiet easy to achieve from implemetation point of view (cannot say the same if its have conceptually implication at big). So, I am just in the process of trying it locally.

It worked as expected, should I proceed with it as a PR for review @jayfoad !

Sure.

Copy link

github-actions bot commented May 15, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@vg0204
Copy link
Contributor Author

vg0204 commented May 16, 2025

@jayfoad , @arsenm , can you review this, as now only about 2k LIT tests (all across different targets) needs to be updated (as all having ExitCode : 1). How should do we think they should be handled?

Copy link
Contributor

@jayfoad jayfoad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I quite like the idea of tracking regunit roots instead of regunits themselves. At least it makes this a smaller incremental change.

@@ -13,7 +13,9 @@
#ifndef LLVM_CODEGEN_MACHINEBASICBLOCK_H
#define LLVM_CODEGEN_MACHINEBASICBLOCK_H

#include "llvm/ADT/BitVector.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need this

@@ -174,8 +176,7 @@ class MachineBasicBlock
std::optional<uint64_t> IrrLoopHeaderWeight;

/// Keep track of the physical registers that are livein of the basicblock.
using LiveInVector = std::vector<RegisterMaskPair>;
LiveInVector LiveIns;
DenseSet<MCRegister> LiveIns;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it's a good idea to change from std::vector to DenseSet. The patch would be less invasive if you just changed it to std:vector<MCRegister>

You should at least benchmark it using http://llvm-compile-time-tracker.com/. You can do this by asking @nikic to add your fork of LLVM, and then any branches you create called "perf/*" will automatically get benchmarked.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it's a good idea to change from std::vector to DenseSet. The patch would be less invasive if you just changed it to std:vector

It makes sense so aliasing registers (or tuples) will have common regunits (or their roots) to track them uniquely (in set) and check for isLiveIn would be cheaper.

You should at least benchmark it using http://llvm-compile-time-tracker.com/. You can do this by asking @nikic to add your fork of LLVM, and then any branches you create called "perf/*" will automatically get benchmarked.

Sure!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikic can you add my forked repo https://github.com/vg0204/llvm-project.git for benchmarking it!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense so aliasing registers (or tuples) will have common regunits (or their roots) to track them uniquely

Right, but the existing code already did that by calling sortUniqueLiveIns after adding all the individual livein registers. Changing to a set might be better, but it should be done as an orthogonal change (either before or after this patch) and it should be benchmarked.

@@ -88,7 +88,7 @@ void LiveRegUnits::accumulate(const MachineInstr &MI) {
static void addBlockLiveIns(LiveRegUnits &LiveUnits,
const MachineBasicBlock &MBB) {
for (const auto &LI : MBB.liveins())
LiveUnits.addRegMasked(LI.PhysReg, LI.LaneMask);
LiveUnits.addRegMasked(LI, LaneBitmask::getAll());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further simplification is possible here, since this is the only call to addRegMasked so it no longer needs a Mask argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further simplification is possible here, since this is the only call to addRegMasked so it no longer needs a Mask argument.

Yes, agreed, for that, addRegmasked requires a default lanemask arg which is not the current case!

OS << LS << printReg(LI.PhysReg, &TRI);
if (!LI.LaneMask.all())
OS << ":0x" << PrintLaneMask(LI.LaneMask);
bool First = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should continue to use LS instead of First. You only need to remove the part that prints the lanemask.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is new to me, as it might have happened when rebased the tip on latest llvm main. But, noted!

Comment on lines 71 to 72
Classes[Reg] = reinterpret_cast<TargetRegisterClass *>(-1);
KillIndices[Reg] = BBSize;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't change the Reg.id() part. It just makes the patch bigger and harder to review.

OS << LS << printReg(LI.PhysReg, TRI);
if (!LI.LaneMask.all())
OS << ":0x" << PrintLaneMask(LI.LaneMask);
for (const MCRegister Reg : liveins()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the const. (Pointers to const and const references are good, but plain const is not.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another reason not to use DenseMap: the iteration order here will be undefined, so we will get spurious differences in MIR output printing.

LaneMask |= J->LaneMask;
Out->PhysReg = PhysReg;
Out->LaneMask = LaneMask;
if(!Reg.isPhysical())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be an assert

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be an assert

It caused failure for 2-3 LIT testcases, as its invoked at few places for MCRegs(which are virtRegs as assigned by MRI), not MCPhysReg. As you can see asserted at other livein-related API's!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I raised #140527 for this. If you find other problems like this, please take the time to raise a separate PR to fix them, so that this PR stays focussed on doing one thing.

Comment on lines +627 to +635
if (I == LiveIns.end())
return I;

DenseSet<MCRegister>::iterator start = LiveIns.begin();
while (start != I)
start++;
MachineBasicBlock::livein_iterator next = start;
LiveIns.erase(start);
return next++;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty horrible and I'm not sure it works.

Either leave liveins as a std::vector, in which case this code does not need to change.

Or remove this method altogether. It is only used in one place, in X86FloatingPoint, and that code can ber rewritten to do this in some other way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed!

Or remove this method altogether. It is only used in one place, in X86FloatingPoint, and that code can ber rewritten to do this in some other way.

Does this alternative make sense using make_early_inc_range

static unsigned calcLiveInMask(MachineBasicBlock *MBB, bool RemoveFPs) {
  unsigned Mask = 0;
  for (auto &I : llvm::make_early_inc_range(MBB->liveins())) {
    MCPhysReg Reg = *I;
    static_assert(X86::FP6 - X86::FP0 == 6, "sequential regnums");
    if (Reg >= X86::FP0 && Reg <= X86::FP6) {
      Mask |= 1 << (Reg - X86::FP0);
      if (RemoveFPs) {
        MBB->removeLiveIn(I);
        continue;
      }
    }
  }
  return Mask;
}

}

Register
MachineBasicBlock::addLiveIn(MCRegister PhysReg, const TargetRegisterClass *RC) {
assert(getParent() && "MBB must be inserted in function");
assert(PhysReg.isPhysical() && "Expected physreg");
assert(PhysReg && "Expected physreg");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you change this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might have overlooked to revert it, while trying different solutions locally. I'll take care of it!

@jayfoad
Copy link
Contributor

jayfoad commented May 16, 2025

can you review this, as now only about 2k LIT tests (all across different targets) needs to be updated

2000 is still a lot! We need to understand which ones only fail because the way liveins are printed in MIR has changed (these ones can simply be updated) and which ones actually have different codegen, or fail in some other way.

@jayfoad
Copy link
Contributor

jayfoad commented May 16, 2025

There are genuine codegen differences in:

test/CodeGen/AArch64/zero-call-used-regs.ll
test/CodeGen/X86/zero-call-used-regs.ll

Looks like PEIImpl::insertZeroCallUsedRegs will need updating to understand that liveins() may only contain subregs of the registers that it is querying.

@jayfoad
Copy link
Contributor

jayfoad commented May 16, 2025

Rebasing on #140248 should fix one of 2000 failures.

@jayfoad
Copy link
Contributor

jayfoad commented May 16, 2025

There are codegen diffs in test/CodeGen/Thumb2/LowOverheadLoops/. It looks like this part of LowOverheadLoop::ValidateLiveOuts does not cope well with subregs in liveins():

for (const MachineBasicBlock::RegisterMaskPair &RegMask : ExitBB->liveins()) {

@vg0204
Copy link
Contributor Author

vg0204 commented May 19, 2025

@jayfoad , what I feel is that, lets deal with MIR-print(related to livein) changes (2k LIT tests basically) first which comprises about surely 80-85% test cases, then incrementally look into other tests causing codegen diff as few examples given by you and track them separately.

Sounds Good?

One last thing which I asked previously also, its fine to have these many file changes in single PR, from review as well tracking perspective?

@vg0204
Copy link
Contributor Author

vg0204 commented May 19, 2025

2000 is still a lot! We need to understand which ones only fail because the way liveins are printed in MIR has changed (these ones can simply be updated) and which ones actually have different codegen, or fail in some other way.

There are observed only 2 types of failure:

  1. Codegen diff (Exit Code : 1) -> either due tu way liveins got printed or some real codegen changes as suggested by you.
  2. MachineVerifier failure (Exit Code : 2) -> due to use of register whose regUnits are completely killed in some other form (its subreg or alias). There are only such 2 tests, that can be easily modified to solve it keeping it semantically same.

vg0204 added 2 commits May 19, 2025 09:39
Currently, the machine basicblock does not fully utilizes the
laneBitmask associated with physReg liveIns to check for the
precise liveness. Conservatively, it acts fully correct now, only
if all liveIns check for MBB is in form it defines it for itself.

So, now with the use of register units tracking for MBB's liveIns
, its possible to track & check liveness for all sorts of physRegs.
@vg0204
Copy link
Contributor Author

vg0204 commented May 19, 2025

Rebasing on #140248 should fix one of 2000 failures.

But, doesn't it make incorporates use of lanemask in livein, which is eventually non-exisistent in my patch!

@vg0204 vg0204 force-pushed the vg0204/add-liveRegUnit-compute-in-MBB branch from a1b695d to a71dc1b Compare May 19, 2025 09:44
@jayfoad
Copy link
Contributor

jayfoad commented May 19, 2025

Rebasing on #140248 should fix one of 2000 failures.

But, doesn't it make incorporates use of lanemask in livein, which is eventually non-exisistent in my patch!

The reason for doing #140248 is that it makes test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll behave the same before and after #129847. So #129847 will no longer make this test fail, so it's one less test failure to worry about, and #129847 does not have to include any diffs for this test, which keeps it simpler.

@jayfoad
Copy link
Contributor

jayfoad commented May 19, 2025

  1. MachineVerifier failure (Exit Code : 2) -> due to use of register whose regUnits are completely killed in some other form (its subreg or alias). There are only such 2 tests, that can be easily modified to solve it keeping it semantically same.

I took a look at test/CodeGen/RISCV/make-compressible-zfinx.mir. I disagree with "can be easily modified to solve it". This is another case where regunits cannot distinguish between a subreg and a superreg.

The RISCV example is: x10_h is a subreg of x10_w which is a subreg of x10. But all of them have only a single regunit (number 56 in the dumps I looked at).

This has been discussed before: #96146, #95926 (comment)
FYI @topperc, @Pierre-vh, @qcolombet

We need a proper solution for this issue before continuing with the current patch.

@vg0204
Copy link
Contributor Author

vg0204 commented May 20, 2025

  1. MachineVerifier failure (Exit Code : 2) -> due to use of register whose regUnits are completely killed in some other form (its subreg or alias). There are only such 2 tests, that can be easily modified to solve it keeping it semantically same.

I took a look at test/CodeGen/RISCV/make-compressible-zfinx.mir. I disagree with "can be easily modified to solve it". This is another case where regunits cannot distinguish between a subreg and a superreg.

The RISCV example is: x10_h is a subreg of x10_w which is a subreg of x10. But all of them have only a single regunit (number 56 in the dumps I looked at).

This has been discussed before: #96146, #95926 (comment) FYI @topperc, @Pierre-vh, @qcolombet

We need a proper solution for this issue before continuing with the current patch.

Yes! There is one more scenario, that's not considered in this discussion when regunits are same despite different registers, when supperegs is composed of only addressable subregs. For example: in X86, both EAX and RAX have same set of regunits (HAX, AH, AL), as RAX (64bits) has got lower 32 bit subreg i.e. EAX & got nothing to represent upper 32bits. As in #96146, fake high regunits can now be seen. Such instance is quite commonplace. Its same as Qn & Dn case, but it's been handled as for now using artificial HI parts.

I think as per current implementation the regunits are something that really not register composed of uniquely, but are the bare minimum chunks that can represent aliasing and overlaps (for addressable parts only OR defined fake parts) in registers. What do you think, as its make migration to regunit itself a very tricky thing in general.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants