[MCP] Move dependencies if they block copy propagation #105562
Conversation
@llvm/pr-subscribers-backend-aarch64 @llvm/pr-subscribers-llvm-regalloc Author: Gábor Spaits (spaits) Changes: As discussed in a previous PR (#98087), here is an implementation that uses a ScheduleDAG in MCP. This PR is not fully finished yet. I have not done any precise benchmarking; the only thing I have measured is how long generating some regression tests takes before and after my patch, and I have not seen any increase on my machine, but that is not a precise way of measuring. I have not updated all the tests yet. Could you please take a quick look at this PR and give some feedback? Patch is 1.38 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/105562.diff 241 Files Affected:
diff --git a/llvm/lib/CodeGen/MachineCopyPropagation.cpp b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
index b34e0939d1c7c6..493d7cd7d8c920 100644
--- a/llvm/lib/CodeGen/MachineCopyPropagation.cpp
+++ b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
@@ -48,19 +48,27 @@
//
//===----------------------------------------------------------------------===//
+#include "llvm/ADT/BitVector.h"
#include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/DepthFirstIterator.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/iterator_range.h"
+#include "llvm/Analysis/AliasAnalysis.h"
+#include "llvm/CodeGen/LiveIntervals.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstr.h"
+#include "llvm/CodeGen/MachineLoopInfo.h"
#include "llvm/CodeGen/MachineOperand.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
+#include "llvm/CodeGen/ScheduleDAG.h"
+#include "llvm/CodeGen/ScheduleDAGInstrs.h"
+#include "llvm/CodeGen/SelectionDAGNodes.h"
#include "llvm/CodeGen/TargetInstrInfo.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"
#include "llvm/CodeGen/TargetSubtargetInfo.h"
@@ -70,9 +78,15 @@
#include "llvm/Pass.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/DebugCounter.h"
+#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/raw_ostream.h"
+#include <algorithm>
#include <cassert>
#include <iterator>
+#include <optional>
+#include <queue>
+#include <utility>
+#include <variant>
using namespace llvm;
@@ -92,6 +106,113 @@ static cl::opt<cl::boolOrDefault>
EnableSpillageCopyElimination("enable-spill-copy-elim", cl::Hidden);
namespace {
+// A ScheduleDAG subclass that is used as a dependency graph.
+class ScheduleDAGMCP : public ScheduleDAGInstrs {
+public:
+ void schedule() override {
+ llvm_unreachable("This schedule dag is only used as a dependency graph for "
+ "Machine Copy Propagation\n");
+ }
+
+ ScheduleDAGMCP(MachineFunction &MF, const MachineLoopInfo *MLI,
+ bool RemoveKillFlags = false)
+ : ScheduleDAGInstrs(MF, MLI, RemoveKillFlags) {
+ CanHandleTerminators = true;
+ }
+};
+
+static bool moveInstructionsOutOfTheWayIfWeCan(SUnit *Dst,
+ SUnit *Src,
+ ScheduleDAGMCP &DG) {
+  MachineInstr *DstInstr = Dst->getInstr();
+  MachineInstr *SrcInstr = Src->getInstr();
+
+  if (DstInstr == nullptr || SrcInstr == nullptr)
+    return false;
+
+  MachineBasicBlock *MBB = SrcInstr->getParent();
+  assert(DstInstr->getParent() == MBB &&
+         "This function only operates on a basic block level.");
+
+ int SectionSize =
+ std::distance(SrcInstr->getIterator(), DstInstr->getIterator());
+
+ // The bit vector representing the instructions in the section.
+ // This vector stores which instruction needs to be moved and which does not.
+ BitVector SectionInstr(SectionSize, false);
+
+ // The queue for the breadth first search.
+ std::queue<const SUnit *> Edges;
+
+  // Process the children of a node: every node is checked before it is put
+  // into the queue.
+ // A node is enqueued if it has no dependencies on the source of the copy
+ // (only if we are not talking about the destination node which is a special
+ // case indicated by a flag) and is located between the source of the copy and
+ // the destination of the copy.
+ auto ProcessSNodeChildren = [SrcInstr, &SectionSize, &SectionInstr](
+ std::queue<const SUnit *> &Queue,
+ const SUnit *Node, bool IsRoot) -> bool {
+ for (llvm::SDep I : Node->Preds) {
+ SUnit *SU = I.getSUnit();
+ MachineInstr &MI = *(SU->getInstr());
+ if (!IsRoot && &MI == SrcInstr)
+ return false;
+
+ int DestinationFromSource =
+ std::distance(SrcInstr->getIterator(), MI.getIterator());
+
+ if (&MI != SrcInstr && DestinationFromSource > 0 &&
+ DestinationFromSource < SectionSize) {
+        // If an instruction is already in the instructions-to-move map, then
+        // it has already been processed together with all of its
+        // dependencies, so we do not need to do anything with it again.
+ if (!SectionInstr[DestinationFromSource]) {
+ SectionInstr[DestinationFromSource] = true;
+ Queue.push(SU);
+ }
+ }
+ }
+ return true;
+ };
+
+  // The BFS happens here.
+  //
+  // We could not use the ADT implementation of BFS here.
+  // The ADT graph traversals give us no chance to select exactly which
+  // children are put into the "nodes to traverse" queue or stack.
+  //
+  // Nor can we work around this by filtering nodes in the processing stage,
+  // because in some contexts it matters what the parent of an instruction
+  // was: namely when the traversal starts at the destination of the copy
+  // propagation (the root). The root necessarily has the copy source as a
+  // dependency. For any other instruction, a dependency on the copy source
+  // ends the traversal, but for the root it must be ignored. If we could not
+  // control which nodes are processed and we came across the copy source, we
+  // would have to determine which node has it as a dependency by checking
+  // the already-traversed list for any instruction that depends on the
+  // source, which would introduce extra cost.
+ ProcessSNodeChildren(Edges, Dst, true);
+ while (!Edges.empty()) {
+ const auto *Current = Edges.front();
+ Edges.pop();
+ if (!ProcessSNodeChildren(Edges, Current, false))
+ return false;
+ }
+
+ // If all of the dependencies were deemed valid during the BFS then we
+ // are moving them before the copy source here keeping their relative
+ // order to each other.
+ auto CurrentInst = SrcInstr->getIterator();
+ for (int I = 0; I < SectionSize; I++) {
+ if (SectionInstr[I])
+ MBB->splice(SrcInstr->getIterator(), MBB, CurrentInst->getIterator());
+ ++CurrentInst;
+ }
+ return true;
+}
static std::optional<DestSourcePair> isCopyInstr(const MachineInstr &MI,
const TargetInstrInfo &TII,
@@ -114,6 +235,7 @@ class CopyTracker {
};
DenseMap<MCRegUnit, CopyInfo> Copies;
+ DenseMap<MCRegUnit, CopyInfo> InvalidCopies;
public:
/// Mark all of the given registers and their subregisters as unavailable for
@@ -130,9 +252,14 @@ class CopyTracker {
}
}
+ int getInvalidCopiesSize() {
+ return InvalidCopies.size();
+ }
+
/// Remove register from copy maps.
void invalidateRegister(MCRegister Reg, const TargetRegisterInfo &TRI,
- const TargetInstrInfo &TII, bool UseCopyInstr) {
+ const TargetInstrInfo &TII, bool UseCopyInstr,
+ bool MayStillBePropagated = false) {
// Since Reg might be a subreg of some registers, only invalidate Reg is not
// enough. We have to find the COPY defines Reg or registers defined by Reg
// and invalidate all of them. Similarly, we must invalidate all of the
@@ -158,8 +285,11 @@ class CopyTracker {
InvalidateCopy(MI);
}
}
- for (MCRegUnit Unit : RegUnitsToInvalidate)
+ for (MCRegUnit Unit : RegUnitsToInvalidate) {
+ if (Copies.contains(Unit) && MayStillBePropagated)
+ InvalidCopies[Unit] = Copies[Unit];
Copies.erase(Unit);
+ }
}
/// Clobber a single register, removing it from the tracker's copy maps.
@@ -252,6 +382,10 @@ class CopyTracker {
return !Copies.empty();
}
+ bool hasAnyInvalidCopies() {
+ return !InvalidCopies.empty();
+ }
+
MachineInstr *findCopyForUnit(MCRegUnit RegUnit,
const TargetRegisterInfo &TRI,
bool MustBeAvailable = false) {
@@ -263,6 +397,17 @@ class CopyTracker {
return CI->second.MI;
}
+ MachineInstr *findInvalidCopyForUnit(MCRegUnit RegUnit,
+ const TargetRegisterInfo &TRI,
+ bool MustBeAvailable = false) {
+ auto CI = InvalidCopies.find(RegUnit);
+ if (CI == InvalidCopies.end())
+ return nullptr;
+ if (MustBeAvailable && !CI->second.Avail)
+ return nullptr;
+ return CI->second.MI;
+ }
+
MachineInstr *findCopyDefViaUnit(MCRegUnit RegUnit,
const TargetRegisterInfo &TRI) {
auto CI = Copies.find(RegUnit);
@@ -274,12 +419,28 @@ class CopyTracker {
return findCopyForUnit(RU, TRI, true);
}
+ MachineInstr *findInvalidCopyDefViaUnit(MCRegUnit RegUnit,
+ const TargetRegisterInfo &TRI) {
+ auto CI = InvalidCopies.find(RegUnit);
+ if (CI == InvalidCopies.end())
+ return nullptr;
+ if (CI->second.DefRegs.size() != 1)
+ return nullptr;
+ MCRegUnit RU = *TRI.regunits(CI->second.DefRegs[0]).begin();
+ return findInvalidCopyForUnit(RU, TRI, false);
+ }
+
+  // TODO: This is ugly; there should be a more elegant solution for invalid
+  // copy searching. Create a variant that returns either a valid copy, an
+  // invalid copy, or no copy at all (std::monostate).
MachineInstr *findAvailBackwardCopy(MachineInstr &I, MCRegister Reg,
const TargetRegisterInfo &TRI,
const TargetInstrInfo &TII,
- bool UseCopyInstr) {
+ bool UseCopyInstr,
+ bool SearchInvalid = false) {
MCRegUnit RU = *TRI.regunits(Reg).begin();
- MachineInstr *AvailCopy = findCopyDefViaUnit(RU, TRI);
+ MachineInstr *AvailCopy = SearchInvalid ? findInvalidCopyDefViaUnit(RU, TRI)
+ : findCopyDefViaUnit(RU, TRI);
if (!AvailCopy)
return nullptr;
@@ -377,13 +538,20 @@ class CopyTracker {
void clear() {
Copies.clear();
+ InvalidCopies.clear();
}
};
+using Copy = MachineInstr*;
+using InvalidCopy = std::pair<Copy, MachineInstr *>;
+using CopyLookupResult = std::variant<std::monostate, Copy, InvalidCopy>;
+
class MachineCopyPropagation : public MachineFunctionPass {
+ LiveIntervals *LIS = nullptr;
const TargetRegisterInfo *TRI = nullptr;
const TargetInstrInfo *TII = nullptr;
const MachineRegisterInfo *MRI = nullptr;
+ AAResults *AA = nullptr;
// Return true if this is a copy instruction and false otherwise.
bool UseCopyInstr;
@@ -398,6 +566,7 @@ class MachineCopyPropagation : public MachineFunctionPass {
void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.setPreservesCFG();
+ AU.addUsedIfAvailable<LiveIntervalsWrapperPass>();
MachineFunctionPass::getAnalysisUsage(AU);
}
@@ -414,11 +583,11 @@ class MachineCopyPropagation : public MachineFunctionPass {
void ReadRegister(MCRegister Reg, MachineInstr &Reader, DebugType DT);
void readSuccessorLiveIns(const MachineBasicBlock &MBB);
void ForwardCopyPropagateBlock(MachineBasicBlock &MBB);
- void BackwardCopyPropagateBlock(MachineBasicBlock &MBB);
+ void BackwardCopyPropagateBlock(MachineBasicBlock &MBB, bool ResolveAntiDeps = false);
void EliminateSpillageCopies(MachineBasicBlock &MBB);
bool eraseIfRedundant(MachineInstr &Copy, MCRegister Src, MCRegister Def);
void forwardUses(MachineInstr &MI);
- void propagateDefs(MachineInstr &MI);
+ void propagateDefs(MachineInstr &MI, ScheduleDAGMCP &DG, bool ResolveAntiDeps = false);
bool isForwardableRegClassCopy(const MachineInstr &Copy,
const MachineInstr &UseI, unsigned UseIdx);
bool isBackwardPropagatableRegClassCopy(const MachineInstr &Copy,
@@ -427,7 +596,7 @@ class MachineCopyPropagation : public MachineFunctionPass {
bool hasImplicitOverlap(const MachineInstr &MI, const MachineOperand &Use);
bool hasOverlappingMultipleDef(const MachineInstr &MI,
const MachineOperand &MODef, Register Def);
-
+
/// Candidates for deletion.
SmallSetVector<MachineInstr *, 8> MaybeDeadCopies;
@@ -986,8 +1155,10 @@ static bool isBackwardPropagatableCopy(const DestSourcePair &CopyOperands,
return CopyOperands.Source->isRenamable() && CopyOperands.Source->isKill();
}
-void MachineCopyPropagation::propagateDefs(MachineInstr &MI) {
- if (!Tracker.hasAnyCopies())
+void MachineCopyPropagation::propagateDefs(MachineInstr &MI,
+ ScheduleDAGMCP &DG,
+ bool MoveDependenciesForBetterCopyPropagation) {
+ if (!Tracker.hasAnyCopies() && !Tracker.hasAnyInvalidCopies())
return;
for (unsigned OpIdx = 0, OpEnd = MI.getNumOperands(); OpIdx != OpEnd;
@@ -1010,8 +1181,30 @@ void MachineCopyPropagation::propagateDefs(MachineInstr &MI) {
MachineInstr *Copy = Tracker.findAvailBackwardCopy(
MI, MODef.getReg().asMCReg(), *TRI, *TII, UseCopyInstr);
- if (!Copy)
- continue;
+ if (!Copy) {
+ if (!MoveDependenciesForBetterCopyPropagation)
+ continue;
+
+ LLVM_DEBUG(
+ dbgs()
+ << "MCP: Couldn't find any backward copy that has no dependency.\n");
+ Copy = Tracker.findAvailBackwardCopy(MI, MODef.getReg().asMCReg(), *TRI,
+ *TII, UseCopyInstr, true);
+ if (!Copy) {
+ LLVM_DEBUG(
+ dbgs()
+ << "MCP: Couldn't find any backward copy that has dependency.\n");
+ continue;
+ }
+ LLVM_DEBUG(
+ dbgs()
+ << "MCP: Found potential backward copy that has dependency.\n");
+ SUnit *DstSUnit = DG.getSUnit(Copy);
+ SUnit *SrcSUnit = DG.getSUnit(&MI);
+
+ if (!moveInstructionsOutOfTheWayIfWeCan(DstSUnit, SrcSUnit, DG))
+ continue;
+ }
std::optional<DestSourcePair> CopyOperands =
isCopyInstr(*Copy, *TII, UseCopyInstr);
@@ -1033,23 +1226,35 @@ void MachineCopyPropagation::propagateDefs(MachineInstr &MI) {
LLVM_DEBUG(dbgs() << "MCP: Replacing " << printReg(MODef.getReg(), TRI)
<< "\n with " << printReg(Def, TRI) << "\n in "
<< MI << " from " << *Copy);
+ if (!MoveDependenciesForBetterCopyPropagation) {
+ MODef.setReg(Def);
+ MODef.setIsRenamable(CopyOperands->Destination->isRenamable());
- MODef.setReg(Def);
- MODef.setIsRenamable(CopyOperands->Destination->isRenamable());
-
- LLVM_DEBUG(dbgs() << "MCP: After replacement: " << MI << "\n");
- MaybeDeadCopies.insert(Copy);
- Changed = true;
- ++NumCopyBackwardPropagated;
+ LLVM_DEBUG(dbgs() << "MCP: After replacement: " << MI << "\n");
+ MaybeDeadCopies.insert(Copy);
+ Changed = true;
+ ++NumCopyBackwardPropagated;
+ }
}
}
void MachineCopyPropagation::BackwardCopyPropagateBlock(
- MachineBasicBlock &MBB) {
+ MachineBasicBlock &MBB, bool MoveDependenciesForBetterCopyPropagation) {
+ ScheduleDAGMCP DG{*(MBB.getParent()), nullptr, false};
+ if (MoveDependenciesForBetterCopyPropagation) {
+ DG.startBlock(&MBB);
+ DG.enterRegion(&MBB, MBB.begin(), MBB.end(), MBB.size());
+ DG.buildSchedGraph(nullptr);
+ // DG.viewGraph();
+ }
+
+
LLVM_DEBUG(dbgs() << "MCP: BackwardCopyPropagateBlock " << MBB.getName()
<< "\n");
for (MachineInstr &MI : llvm::make_early_inc_range(llvm::reverse(MBB))) {
+ //llvm::errs() << "Next MI: ";
+ //MI.dump();
// Ignore non-trivial COPYs.
std::optional<DestSourcePair> CopyOperands =
isCopyInstr(MI, *TII, UseCopyInstr);
@@ -1062,7 +1267,7 @@ void MachineCopyPropagation::BackwardCopyPropagateBlock(
// just let forward cp do COPY-to-COPY propagation.
if (isBackwardPropagatableCopy(*CopyOperands, *MRI)) {
Tracker.invalidateRegister(SrcReg.asMCReg(), *TRI, *TII,
- UseCopyInstr);
+ UseCopyInstr, MoveDependenciesForBetterCopyPropagation);
Tracker.invalidateRegister(DefReg.asMCReg(), *TRI, *TII,
UseCopyInstr);
Tracker.trackCopy(&MI, *TRI, *TII, UseCopyInstr);
@@ -1077,10 +1282,10 @@ void MachineCopyPropagation::BackwardCopyPropagateBlock(
MCRegister Reg = MO.getReg().asMCReg();
if (!Reg)
continue;
- Tracker.invalidateRegister(Reg, *TRI, *TII, UseCopyInstr);
+ Tracker.invalidateRegister(Reg, *TRI, *TII, UseCopyInstr, false);
}
- propagateDefs(MI);
+ propagateDefs(MI, DG, MoveDependenciesForBetterCopyPropagation);
for (const MachineOperand &MO : MI.operands()) {
if (!MO.isReg())
continue;
@@ -1104,7 +1309,7 @@ void MachineCopyPropagation::BackwardCopyPropagateBlock(
}
} else {
Tracker.invalidateRegister(MO.getReg().asMCReg(), *TRI, *TII,
- UseCopyInstr);
+ UseCopyInstr, MoveDependenciesForBetterCopyPropagation);
}
}
}
@@ -1122,6 +1327,15 @@ void MachineCopyPropagation::BackwardCopyPropagateBlock(
Copy->eraseFromParent();
++NumDeletes;
}
+ if (MoveDependenciesForBetterCopyPropagation) {
+ DG.exitRegion();
+ DG.finishBlock();
+    // QUESTION: Does it make sense to keep the kill flags here?
+    // In the other parts of this pass we just throw out
+    // the kill flags.
+ DG.fixupKills(MBB);
+ }
+
MaybeDeadCopies.clear();
CopyDbgUsers.clear();
@@ -1472,11 +1686,29 @@ bool MachineCopyPropagation::runOnMachineFunction(MachineFunction &MF) {
TRI = MF.getSubtarget().getRegisterInfo();
TII = MF.getSubtarget().getInstrInfo();
MRI = &MF.getRegInfo();
+ auto *LISWrapper = getAnalysisIfAvailable<LiveIntervalsWrapperPass>();
+ LIS = LISWrapper ? &LISWrapper->getLIS() : nullptr;
for (MachineBasicBlock &MBB : MF) {
if (isSpillageCopyElimEnabled)
EliminateSpillageCopies(MBB);
+
+    // BackwardCopyPropagateBlock happens in two stages.
+    // First we move out of the way those unnecessary dependencies
+    // that may block copy propagation.
+    //
+    // The reason for this two-stage approach is that the ScheduleDAG cannot
+    // handle register renaming.
+    // QUESTION: I think these two stages could be merged if I changed
+    // the renaming mechanism.
+    //
+    // The renaming would not happen instantly. There would be a data
+    // structure that records which register should be renamed to what. Then,
+    // after the backward propagation has concluded, the renaming would
+    // happen.
+ BackwardCopyPropagateBlock(MBB, true);
+ // Then we do the actual copy propagation.
BackwardCopyPropagateBlock(MBB);
+
ForwardCopyPropagateBlock(MBB);
}
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll
index de3f323891a36a..92575d701f4281 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll
@@ -6026,8 +6026,8 @@ define { i8, i1 } @cmpxchg_i8(ptr %ptr, i8 %desired, i8 %new) {
; CHECK-OUTLINE-O1-NEXT: .cfi_offset w29, -16
; CHECK-OUTLINE-O1-NEXT: .cfi_offset w19, -24
; CHECK-OUTLINE-O1-NEXT: .cfi_offset w20, -32
-; CHECK-OUTLINE-O1-NEXT: mov x3, x0
; CHECK-OUTLINE-O1-NEXT: mov w19, w1
+; CHECK-OUTLINE-O1-NEXT: mov x3, x0
; CHECK-OUTLINE-O1-NEXT: mov w1, w2
; CHECK-OUTLINE-O1-NEXT: mov w0, w19
; CHECK-OUTLINE-O1-NEXT: mov x2, x3
@@ -6133,8 +6133,8 @@ define { i16, i1 } @cmpxchg_i16(ptr %ptr, i16 %desired, i16 %new) {
; CHECK-OUTLINE-O1-NEXT: .cfi_offset w29, -16
; CHECK-OUTLINE-O1-NEXT: .cfi_offset w19, -24
; CHECK-OUTLINE-O1-NEXT: .cfi_offset w20, -32
-; CHECK-OUTLINE-O1-NEXT: mov x3, x0
; CHECK-OUTLI...
[truncated]
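The TODO above the `findAvailBackwardCopy` changes suggests replacing the `SearchInvalid` flag with a single variant-returning lookup, and the patch already introduces the aliases `Copy`, `InvalidCopy`, and `CopyLookupResult`. Below is a minimal standalone sketch of what that shape could look like; the toy `Instr` type and the function names are hypothetical stand-ins, not the patch's actual API.

```cpp
#include <cassert>
#include <utility>
#include <variant>

// Toy stand-in for MachineInstr, only to keep the sketch self-contained.
struct Instr { int Id; };

using Copy = Instr *;                         // a copy that is still available
using InvalidCopy = std::pair<Copy, Instr *>; // copy + its blocking dependency
using CopyLookupResult = std::variant<std::monostate, Copy, InvalidCopy>;

// Hypothetical unified lookup: prefer a valid copy, fall back to an
// invalidated one that might still be rescued by moving its blocker out of
// the way, and otherwise report that no copy was found (std::monostate).
CopyLookupResult findBackwardCopy(Copy Valid, InvalidCopy Invalid) {
  if (Valid)
    return Valid;
  if (Invalid.first)
    return Invalid;
  return std::monostate{};
}

// Callers dispatch on the held alternative instead of re-querying with a
// boolean flag.
const char *describe(const CopyLookupResult &R) {
  if (std::holds_alternative<Copy>(R))
    return "valid";
  if (std::holds_alternative<InvalidCopy>(R))
    return "invalid-but-maybe-rescuable";
  return "none";
}
```

With this shape, the `propagateDefs` caller could branch once on the variant rather than calling the tracker twice with different flags.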
@llvm/pr-subscribers-backend-x86
<< "\n");
for (MachineInstr &MI : llvm::make_early_inc_range(llvm::reverse(MBB))) {
+ //llvm::errs() << "Next MI: ";
+ //MI.dump();
// Ignore non-trivial COPYs.
std::optional<DestSourcePair> CopyOperands =
isCopyInstr(MI, *TII, UseCopyInstr);
@@ -1062,7 +1267,7 @@ void MachineCopyPropagation::BackwardCopyPropagateBlock(
// just let forward cp do COPY-to-COPY propagation.
if (isBackwardPropagatableCopy(*CopyOperands, *MRI)) {
Tracker.invalidateRegister(SrcReg.asMCReg(), *TRI, *TII,
- UseCopyInstr);
+ UseCopyInstr, MoveDependenciesForBetterCopyPropagation);
Tracker.invalidateRegister(DefReg.asMCReg(), *TRI, *TII,
UseCopyInstr);
Tracker.trackCopy(&MI, *TRI, *TII, UseCopyInstr);
@@ -1077,10 +1282,10 @@ void MachineCopyPropagation::BackwardCopyPropagateBlock(
MCRegister Reg = MO.getReg().asMCReg();
if (!Reg)
continue;
- Tracker.invalidateRegister(Reg, *TRI, *TII, UseCopyInstr);
+ Tracker.invalidateRegister(Reg, *TRI, *TII, UseCopyInstr, false);
}
- propagateDefs(MI);
+ propagateDefs(MI, DG, MoveDependenciesForBetterCopyPropagation);
for (const MachineOperand &MO : MI.operands()) {
if (!MO.isReg())
continue;
@@ -1104,7 +1309,7 @@ void MachineCopyPropagation::BackwardCopyPropagateBlock(
}
} else {
Tracker.invalidateRegister(MO.getReg().asMCReg(), *TRI, *TII,
- UseCopyInstr);
+ UseCopyInstr, MoveDependenciesForBetterCopyPropagation);
}
}
}
@@ -1122,6 +1327,15 @@ void MachineCopyPropagation::BackwardCopyPropagateBlock(
Copy->eraseFromParent();
++NumDeletes;
}
+ if (MoveDependenciesForBetterCopyPropagation) {
+ DG.exitRegion();
+ DG.finishBlock();
+ // QUESTION: Does it make sense to keep the kill flags here?
+ // In the other parts of this pass we just throw out
+ // the kill flags.
+ DG.fixupKills(MBB);
+ }
+
MaybeDeadCopies.clear();
CopyDbgUsers.clear();
@@ -1472,11 +1686,29 @@ bool MachineCopyPropagation::runOnMachineFunction(MachineFunction &MF) {
TRI = MF.getSubtarget().getRegisterInfo();
TII = MF.getSubtarget().getInstrInfo();
MRI = &MF.getRegInfo();
+ auto *LISWrapper = getAnalysisIfAvailable<LiveIntervalsWrapperPass>();
+ LIS = LISWrapper ? &LISWrapper->getLIS() : nullptr;
for (MachineBasicBlock &MBB : MF) {
if (isSpillageCopyElimEnabled)
EliminateSpillageCopies(MBB);
+
+ // BackwardCopyPropagateBlock happens in two stages.
+ // First we move those unnecessary dependencies out of the way
+ // that may block copy propagations.
+ //
+ // The reason for this two stage approach is that the ScheduleDAG can not
+ // handle register renaming.
+ // QUESTION: I think these two stages could be merged together, if I were to change
+ // the renaming mechanism.
+ //
+ // The renaming wouldn't happen instantly. There would be a data structure
+ // that contained what register should be renamed to what. Then after the
+ // backward propagation has concluded the renaming would happen.
+ BackwardCopyPropagateBlock(MBB, true);
+ // Then we do the actual copy propagation.
BackwardCopyPropagateBlock(MBB);
+
ForwardCopyPropagateBlock(MBB);
}
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll
index de3f323891a36a..92575d701f4281 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll
@@ -6026,8 +6026,8 @@ define { i8, i1 } @cmpxchg_i8(ptr %ptr, i8 %desired, i8 %new) {
; CHECK-OUTLINE-O1-NEXT: .cfi_offset w29, -16
; CHECK-OUTLINE-O1-NEXT: .cfi_offset w19, -24
; CHECK-OUTLINE-O1-NEXT: .cfi_offset w20, -32
-; CHECK-OUTLINE-O1-NEXT: mov x3, x0
; CHECK-OUTLINE-O1-NEXT: mov w19, w1
+; CHECK-OUTLINE-O1-NEXT: mov x3, x0
; CHECK-OUTLINE-O1-NEXT: mov w1, w2
; CHECK-OUTLINE-O1-NEXT: mov w0, w19
; CHECK-OUTLINE-O1-NEXT: mov x2, x3
@@ -6133,8 +6133,8 @@ define { i16, i1 } @cmpxchg_i16(ptr %ptr, i16 %desired, i16 %new) {
; CHECK-OUTLINE-O1-NEXT: .cfi_offset w29, -16
; CHECK-OUTLINE-O1-NEXT: .cfi_offset w19, -24
; CHECK-OUTLINE-O1-NEXT: .cfi_offset w20, -32
-; CHECK-OUTLINE-O1-NEXT: mov x3, x0
; CHECK-OUTLI...
[truncated]
|
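The two-stage comment in `runOnMachineFunction` above floats a deferred-renaming scheme: record the intended register renames during the backward walk, then apply them in one pass afterwards so the dependency graph built up front stays valid. As a minimal sketch of that idea only (a toy register model with hypothetical names, none of LLVM's actual types):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model: an instruction is just a list of register names (index 0 = def).
using Instr = std::vector<std::string>;

// Stage 1 would only *record* renames into the map; this is stage 2, which
// applies them in a single pass over the block, so any analysis structure
// built before stage 1 never observes a half-renamed block.
void applyPendingRenames(std::vector<Instr> &Block,
                         const std::map<std::string, std::string> &Renames) {
  for (Instr &I : Block)
    for (std::string &Reg : I) {
      auto It = Renames.find(Reg);
      if (It != Renames.end())
        Reg = It->second;
    }
}
```

This is only an illustration of the batching idea, not a claim about how the pass would implement it.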
@llvm/pr-subscribers-backend-arm
diff --git a/llvm/lib/CodeGen/MachineCopyPropagation.cpp b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
index b34e0939d1c7c6..493d7cd7d8c920 100644
--- a/llvm/lib/CodeGen/MachineCopyPropagation.cpp
+++ b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
@@ -48,19 +48,27 @@
//
//===----------------------------------------------------------------------===//
+#include "llvm/ADT/BitVector.h"
#include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/DepthFirstIterator.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/iterator_range.h"
+#include "llvm/Analysis/AliasAnalysis.h"
+#include "llvm/CodeGen/LiveIntervals.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstr.h"
+#include "llvm/CodeGen/MachineLoopInfo.h"
#include "llvm/CodeGen/MachineOperand.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
+#include "llvm/CodeGen/ScheduleDAG.h"
+#include "llvm/CodeGen/ScheduleDAGInstrs.h"
+#include "llvm/CodeGen/SelectionDAGNodes.h"
#include "llvm/CodeGen/TargetInstrInfo.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"
#include "llvm/CodeGen/TargetSubtargetInfo.h"
@@ -70,9 +78,15 @@
#include "llvm/Pass.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/DebugCounter.h"
+#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/raw_ostream.h"
+#include <algorithm>
#include <cassert>
#include <iterator>
+#include <optional>
+#include <queue>
+#include <utility>
+#include <variant>
using namespace llvm;
@@ -92,6 +106,113 @@ static cl::opt<cl::boolOrDefault>
EnableSpillageCopyElimination("enable-spill-copy-elim", cl::Hidden);
namespace {
+// A ScheduleDAG subclass that is used as a dependency graph.
+class ScheduleDAGMCP : public ScheduleDAGInstrs {
+public:
+ void schedule() override {
+ llvm_unreachable("This schedule dag is only used as a dependency graph for "
+ "Machine Copy Propagation\n");
+ }
+
+ ScheduleDAGMCP(MachineFunction &MF, const MachineLoopInfo *MLI,
+ bool RemoveKillFlags = false)
+ : ScheduleDAGInstrs(MF, MLI, RemoveKillFlags) {
+ CanHandleTerminators = true;
+ }
+};
+
+static bool moveInstructionsOutOfTheWayIfWeCan(SUnit *Dst,
+ SUnit *Src,
+ ScheduleDAGMCP &DG) {
+ MachineInstr *DstInstr = Dst->getInstr();
+ MachineInstr *SrcInstr = Src->getInstr();
+ MachineBasicBlock *MBB = SrcInstr->getParent();
+
+ if (DstInstr == nullptr || SrcInstr == nullptr)
+ return false;
+ assert("This function only operates on a basic block level." &&
+ MBB == SrcInstr->getParent());
+
+ int SectionSize =
+ std::distance(SrcInstr->getIterator(), DstInstr->getIterator());
+
+ // The bit vector representing the instructions in the section.
+ // This vector stores which instruction needs to be moved and which does not.
+ BitVector SectionInstr(SectionSize, false);
+
+ // The queue for the breadth first search.
+ std::queue<const SUnit *> Edges;
+
+ // Process the children of a node.
+ // Basically every node are checked before it is being put into the queue.
+ // A node is enqueued if it has no dependencies on the source of the copy
+ // (only if we are not talking about the destination node which is a special
+ // case indicated by a flag) and is located between the source of the copy and
+ // the destination of the copy.
+ auto ProcessSNodeChildren = [SrcInstr, &SectionSize, &SectionInstr](
+ std::queue<const SUnit *> &Queue,
+ const SUnit *Node, bool IsRoot) -> bool {
+ for (llvm::SDep I : Node->Preds) {
+ SUnit *SU = I.getSUnit();
+ MachineInstr &MI = *(SU->getInstr());
+ if (!IsRoot && &MI == SrcInstr)
+ return false;
+
+ int DestinationFromSource =
+ std::distance(SrcInstr->getIterator(), MI.getIterator());
+
+ if (&MI != SrcInstr && DestinationFromSource > 0 &&
+ DestinationFromSource < SectionSize) {
+ // If an instruction is already in the instructions-to-move map, then
+ // that means it has already been processed with all of its
+ // dependencies. We do not need to do anything with it again.
+ if (!SectionInstr[DestinationFromSource]) {
+ SectionInstr[DestinationFromSource] = true;
+ Queue.push(SU);
+ }
+ }
+ }
+ return true;
+ };
+
+ // The BFS happens here.
+ //
+ // Could not use the ADT implementation of BFS here.
+ // In ADT graph traversals we don't have the chance to select exactly which
+ // children are being put into the "nodes to traverse" queue or stack.
+ //
+ // We couldn't work around this by checking the need for the node in the
+ // processing stage. In some context it does matter what the parent of the
+ // instruction was: Namely when we are starting the traversal with the source
+ // of the copy propagation. This instruction must have the destination as a
+ // dependency. For any other instruction that has the destination as a
+ // dependency, this dependency would mean the end of the traversal, but in
+ // this scenario it must be ignored. Suppose we could not control which nodes
+ // to process and we came across the copy source. How would we know which
+ // node has that copy source as its dependency? We could check which nodes
+ // the copy source is a dependency of; that list always contains the source.
+ // To decide whether it is a dependency of a particular instruction, we would
+ // have to check whether any instruction that depends on the source is in the
+ // already traversed list. This would introduce extra costs.
+ ProcessSNodeChildren(Edges, Dst, true);
+ while (!Edges.empty()) {
+ const auto *Current = Edges.front();
+ Edges.pop();
+ if (!ProcessSNodeChildren(Edges, Current, false))
+ return false;
+ }
+
+ // If all of the dependencies were deemed valid during the BFS then we
+ // are moving them before the copy source here keeping their relative
+ // order to each other.
+ auto CurrentInst = SrcInstr->getIterator();
+ for (int I = 0; I < SectionSize; I++) {
+ if (SectionInstr[I])
+ MBB->splice(SrcInstr->getIterator(), MBB, CurrentInst->getIterator());
+ ++CurrentInst;
+ }
+ return true;
+}
static std::optional<DestSourcePair> isCopyInstr(const MachineInstr &MI,
const TargetInstrInfo &TII,
@@ -114,6 +235,7 @@ class CopyTracker {
};
DenseMap<MCRegUnit, CopyInfo> Copies;
+ DenseMap<MCRegUnit, CopyInfo> InvalidCopies;
public:
/// Mark all of the given registers and their subregisters as unavailable for
@@ -130,9 +252,14 @@ class CopyTracker {
}
}
+ int getInvalidCopiesSize() {
+ return InvalidCopies.size();
+ }
+
/// Remove register from copy maps.
void invalidateRegister(MCRegister Reg, const TargetRegisterInfo &TRI,
- const TargetInstrInfo &TII, bool UseCopyInstr) {
+ const TargetInstrInfo &TII, bool UseCopyInstr,
+ bool MayStillBePropagated = false) {
// Since Reg might be a subreg of some registers, only invalidate Reg is not
// enough. We have to find the COPY defines Reg or registers defined by Reg
// and invalidate all of them. Similarly, we must invalidate all of the
@@ -158,8 +285,11 @@ class CopyTracker {
InvalidateCopy(MI);
}
}
- for (MCRegUnit Unit : RegUnitsToInvalidate)
+ for (MCRegUnit Unit : RegUnitsToInvalidate) {
+ if (Copies.contains(Unit) && MayStillBePropagated)
+ InvalidCopies[Unit] = Copies[Unit];
Copies.erase(Unit);
+ }
}
/// Clobber a single register, removing it from the tracker's copy maps.
@@ -252,6 +382,10 @@ class CopyTracker {
return !Copies.empty();
}
+ bool hasAnyInvalidCopies() {
+ return !InvalidCopies.empty();
+ }
+
MachineInstr *findCopyForUnit(MCRegUnit RegUnit,
const TargetRegisterInfo &TRI,
bool MustBeAvailable = false) {
@@ -263,6 +397,17 @@ class CopyTracker {
return CI->second.MI;
}
+ MachineInstr *findInvalidCopyForUnit(MCRegUnit RegUnit,
+ const TargetRegisterInfo &TRI,
+ bool MustBeAvailable = false) {
+ auto CI = InvalidCopies.find(RegUnit);
+ if (CI == InvalidCopies.end())
+ return nullptr;
+ if (MustBeAvailable && !CI->second.Avail)
+ return nullptr;
+ return CI->second.MI;
+ }
+
MachineInstr *findCopyDefViaUnit(MCRegUnit RegUnit,
const TargetRegisterInfo &TRI) {
auto CI = Copies.find(RegUnit);
@@ -274,12 +419,28 @@ class CopyTracker {
return findCopyForUnit(RU, TRI, true);
}
+ MachineInstr *findInvalidCopyDefViaUnit(MCRegUnit RegUnit,
+ const TargetRegisterInfo &TRI) {
+ auto CI = InvalidCopies.find(RegUnit);
+ if (CI == InvalidCopies.end())
+ return nullptr;
+ if (CI->second.DefRegs.size() != 1)
+ return nullptr;
+ MCRegUnit RU = *TRI.regunits(CI->second.DefRegs[0]).begin();
+ return findInvalidCopyForUnit(RU, TRI, false);
+ }
+
+ // TODO: This is ugly; there should be a more elegant solution to invalid-
+ // copy searching. Create a variant that returns either a valid copy, an
+ // invalid copy, or no copy at all (std::monostate).
MachineInstr *findAvailBackwardCopy(MachineInstr &I, MCRegister Reg,
const TargetRegisterInfo &TRI,
const TargetInstrInfo &TII,
- bool UseCopyInstr) {
+ bool UseCopyInstr,
+ bool SearchInvalid = false) {
MCRegUnit RU = *TRI.regunits(Reg).begin();
- MachineInstr *AvailCopy = findCopyDefViaUnit(RU, TRI);
+ MachineInstr *AvailCopy = SearchInvalid ? findInvalidCopyDefViaUnit(RU, TRI)
+ : findCopyDefViaUnit(RU, TRI);
if (!AvailCopy)
return nullptr;
@@ -377,13 +538,20 @@ class CopyTracker {
void clear() {
Copies.clear();
+ InvalidCopies.clear();
}
};
+using Copy = MachineInstr*;
+using InvalidCopy = std::pair<Copy, MachineInstr *>;
+using CopyLookupResult = std::variant<std::monostate, Copy, InvalidCopy>;
+
class MachineCopyPropagation : public MachineFunctionPass {
+ LiveIntervals *LIS = nullptr;
const TargetRegisterInfo *TRI = nullptr;
const TargetInstrInfo *TII = nullptr;
const MachineRegisterInfo *MRI = nullptr;
+ AAResults *AA = nullptr;
// Return true if this is a copy instruction and false otherwise.
bool UseCopyInstr;
@@ -398,6 +566,7 @@ class MachineCopyPropagation : public MachineFunctionPass {
void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.setPreservesCFG();
+ AU.addUsedIfAvailable<LiveIntervalsWrapperPass>();
MachineFunctionPass::getAnalysisUsage(AU);
}
You can test this locally with the following command:

git-clang-format --diff 22d3fb182c9199ac3d51e5577c6647508a7a37f0 027d7761dbb0293451a2c00e32cd6dc5ce83252c --extensions cpp -- llvm/lib/CodeGen/MachineCopyPropagation.cpp

View the diff from clang-format here.

diff --git a/llvm/lib/CodeGen/MachineCopyPropagation.cpp b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
index c1fe3f1964..2d07f4383c 100644
--- a/llvm/lib/CodeGen/MachineCopyPropagation.cpp
+++ b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
@@ -123,15 +123,17 @@ public:
};
static std::optional<llvm::SmallVector<MachineInstr *>>
-moveInstructionsOutOfTheWayIfWeCan(MachineInstr *DstInstr, MachineInstr *SrcInstr, ScheduleDAGMCP &DG) {
+moveInstructionsOutOfTheWayIfWeCan(MachineInstr *DstInstr,
+ MachineInstr *SrcInstr, ScheduleDAGMCP &DG) {
SUnit *Dst;
- //SUnit *Src;
+ // SUnit *Src;
MachineBasicBlock *MBB = SrcInstr->getParent();
int SectionSize =
std::distance(SrcInstr->getIterator(), DstInstr->getIterator());
- DG.enterRegion(MBB, (SrcInstr->getIterator()), ++(DstInstr->getIterator()), SectionSize+1);
+ DG.enterRegion(MBB, (SrcInstr->getIterator()), ++(DstInstr->getIterator()),
+ SectionSize + 1);
DG.buildSchedGraph(nullptr);
Dst = DG.getSUnit(DstInstr);
unsigned MaxNumberOfNodesToBeProcessed = 10;
@@ -141,7 +143,6 @@ moveInstructionsOutOfTheWayIfWeCan(MachineInstr *DstInstr, MachineInstr *SrcInst
assert("This function only operates on a basic block level." &&
MBB == DstInstr->getParent());
-
assert(SectionSize > 0 &&
"The copy source must precede the copy destination.");
@@ -160,8 +161,9 @@ moveInstructionsOutOfTheWayIfWeCan(MachineInstr *DstInstr, MachineInstr *SrcInst
// (only if we are not talking about the destination node which is a special
// case indicated by a flag) and is located between the source of the copy and
// the destination of the copy.
- auto ProcessSNodeChildren = [&Edges, SrcInstr, &SectionSize, &SectionInstr, &NumProcessedNode, &MaxNumberOfNodesToBeProcessed](
- const SUnit *Node, bool IsRoot) -> bool {
+ auto ProcessSNodeChildren =
+ [&Edges, SrcInstr, &SectionSize, &SectionInstr, &NumProcessedNode,
+ &MaxNumberOfNodesToBeProcessed](const SUnit *Node, bool IsRoot) -> bool {
for (llvm::SDep I : Node->Preds) {
SUnit *SU = I.getSUnit();
MachineInstr &MI = *(SU->getInstr());
@@ -183,7 +185,7 @@ moveInstructionsOutOfTheWayIfWeCan(MachineInstr *DstInstr, MachineInstr *SrcInst
}
}
}
- return NumProcessedNode < MaxNumberOfNodesToBeProcessed;
+ return NumProcessedNode < MaxNumberOfNodesToBeProcessed;
};
// The BFS happens here.
@@ -268,9 +270,7 @@ public:
}
}
- int getInvalidCopiesSize() {
- return InvalidCopies.size();
- }
+ int getInvalidCopiesSize() { return InvalidCopies.size(); }
/// Remove register from copy maps.
void invalidateRegister(MCRegister Reg, const TargetRegisterInfo &TRI,
@@ -398,9 +398,7 @@ public:
return !Copies.empty();
}
- bool hasAnyInvalidCopies() {
- return !InvalidCopies.empty();
- }
+ bool hasAnyInvalidCopies() { return !InvalidCopies.empty(); }
MachineInstr *findCopyForUnit(MCRegUnit RegUnit,
const TargetRegisterInfo &TRI,
@@ -414,8 +412,8 @@ public:
}
MachineInstr *findInvalidCopyForUnit(MCRegUnit RegUnit,
- const TargetRegisterInfo &TRI,
- bool MustBeAvailable = false) {
+ const TargetRegisterInfo &TRI,
+ bool MustBeAvailable = false) {
auto CI = InvalidCopies.find(RegUnit);
if (CI == InvalidCopies.end())
return nullptr;
@@ -436,7 +434,7 @@ public:
}
MachineInstr *findInvalidCopyDefViaUnit(MCRegUnit RegUnit,
- const TargetRegisterInfo &TRI) {
+ const TargetRegisterInfo &TRI) {
auto CI = InvalidCopies.find(RegUnit);
if (CI == InvalidCopies.end())
return nullptr;
@@ -447,8 +445,8 @@ public:
}
// TODO: This is ugly there shall be a more elegant solution to invalid
- // copy searching. Create a variant that either returns a valid an invalid
- // copy or no copy at all (std::monotype).
+ // copy searching. Create a variant that either returns a valid an
+ // invalid copy or no copy at all (std::monotype).
MachineInstr *findAvailBackwardCopy(MachineInstr &I, MCRegister Reg,
const TargetRegisterInfo &TRI,
const TargetInstrInfo &TII,
@@ -558,7 +556,7 @@ public:
}
};
-using Copy = MachineInstr*;
+using Copy = MachineInstr *;
using InvalidCopy = std::pair<Copy, MachineInstr *>;
using CopyLookupResult = std::variant<std::monostate, Copy, InvalidCopy>;
@@ -599,7 +597,8 @@ private:
void ReadRegister(MCRegister Reg, MachineInstr &Reader, DebugType DT);
void readSuccessorLiveIns(const MachineBasicBlock &MBB);
void ForwardCopyPropagateBlock(MachineBasicBlock &MBB);
- void BackwardCopyPropagateBlock(MachineBasicBlock &MBB, ScheduleDAGMCP *DG = nullptr);
+ void BackwardCopyPropagateBlock(MachineBasicBlock &MBB,
+ ScheduleDAGMCP *DG = nullptr);
void EliminateSpillageCopies(MachineBasicBlock &MBB);
bool eraseIfRedundant(MachineInstr &Copy, MCRegister Src, MCRegister Def);
void forwardUses(MachineInstr &MI);
@@ -612,7 +611,7 @@ private:
bool hasImplicitOverlap(const MachineInstr &MI, const MachineOperand &Use);
bool hasOverlappingMultipleDef(const MachineInstr &MI,
const MachineOperand &MODef, Register Def);
-
+
/// Candidates for deletion.
SmallSetVector<MachineInstr *, 8> MaybeDeadCopies;
@@ -1216,8 +1215,7 @@ void MachineCopyPropagation::propagateDefs(MachineInstr &MI,
dbgs()
<< "MCP: Found potential backward copy that has dependency.\n");
- InstructionsToMove =
- moveInstructionsOutOfTheWayIfWeCan(Copy, &MI, *DG);
+ InstructionsToMove = moveInstructionsOutOfTheWayIfWeCan(Copy, &MI, *DG);
if (!InstructionsToMove)
continue;
}
@@ -1252,27 +1250,27 @@ void MachineCopyPropagation::propagateDefs(MachineInstr &MI,
++NumCopyBackwardPropagated;
} else if (InstructionsToMove) {
for (auto *I : *InstructionsToMove) {
- MI.getParent()->splice(MI.getIterator(), MI.getParent(), I->getIterator());
+ MI.getParent()->splice(MI.getIterator(), MI.getParent(),
+ I->getIterator());
}
}
}
}
-void MachineCopyPropagation::BackwardCopyPropagateBlock(
- MachineBasicBlock &MBB, ScheduleDAGMCP *DG) {
+void MachineCopyPropagation::BackwardCopyPropagateBlock(MachineBasicBlock &MBB,
+ ScheduleDAGMCP *DG) {
if (DG) {
DG->startBlock(&MBB);
// DG.viewGraph();
}
-
LLVM_DEBUG(dbgs() << "MCP: BackwardCopyPropagateBlock " << MBB.getName()
<< "\n");
for (MachineInstr &MI : llvm::make_early_inc_range(llvm::reverse(MBB))) {
- //llvm::errs() << "Next MI: ";
- //MI.dump();
- // Ignore non-trivial COPYs.
+ // llvm::errs() << "Next MI: ";
+ // MI.dump();
+ // Ignore non-trivial COPYs.
std::optional<DestSourcePair> CopyOperands =
isCopyInstr(MI, *TII, UseCopyInstr);
if (CopyOperands && MI.getNumOperands() == 2) {
@@ -1326,8 +1324,7 @@ void MachineCopyPropagation::BackwardCopyPropagateBlock(
}
} else {
Tracker.invalidateRegister(MO.getReg().asMCReg(), *TRI, *TII,
- UseCopyInstr,
- DG);
+ UseCopyInstr, DG);
}
}
}
@@ -1353,7 +1350,6 @@ void MachineCopyPropagation::BackwardCopyPropagateBlock(
DG->fixupKills(MBB);
}
-
MaybeDeadCopies.clear();
CopyDbgUsers.clear();
Tracker.clear();
@@ -1716,8 +1712,8 @@ bool MachineCopyPropagation::runOnMachineFunction(MachineFunction &MF) {
//
// The reason for this two stage approach is that the ScheduleDAG can not
// handle register renaming.
- // QUESTION: I think these two stages could be merged together, if I were to change
- // the renaming mechanism.
+ // QUESTION: I think these two stages could be merged together, if I were to
+ // change the renaming mechanism.
//
// The renaming wouldn't happen instantly. There would be a data structure
// that contained what register should be renamed to what. Then after the
Only had a cursory look.
The direction seems fine (we compute a DDG and do some stuff with it) but I didn't look closely at the logic.
I remain skeptical that the compile time impact is negligible.
Could you use CTMark from the LLVM test suite to validate your initial measurements?
```cpp
         MBB == SrcInstr->getParent());

  int SectionSize =
      std::distance(SrcInstr->getIterator(), DstInstr->getIterator());
```
Can the size be negative?
I.e., are we sure DstInstr appears after SrcInstr?
No, we should not have that. It should be an unsigned instead of an int.
I will keep using int for the section size. It is compared with a possibly negative int (DestinationFromSource) and is also good for the assertion. Is that OK with you?
Yes, an assertion sounds good.
```cpp
      SUnit *SU = I.getSUnit();
      MachineInstr &MI = *(SU->getInstr());
      if (!IsRoot && &MI == SrcInstr)
        return false;
```
What does the returned boolean mean?
It signals to the main loop that we found an instruction that is not the root, that the copy propagation destination depends on, and that itself has a dependency on the copy source.
This means that moving the instructions is not possible. If this lambda returns false, the main loop that performs the BFS stops.
```cpp
      SUnit *SU = I.getSUnit();
      MachineInstr &MI = *(SU->getInstr());
      if (!IsRoot && &MI == SrcInstr)
        return false;
```
Should we dequeue any of the units pushed?
(I don't understand what the queue is used for yet, but it feels weird that depending on how Node->Preds is ordered we enqueue potentially different things.)
For instance, let's say we have two SUnits in Node->Preds and one of them is SrcInstr.
If SrcInstr is visited first, the queue will be left untouched, but if it is seen last then the queue may have some additional entries.
We are doing the dequeuing in the main loop. I should capture the queue instead of passing it as an argument to the lambda.
If we have an instruction that is not the destination, that is part of the dependency tree starting from the destination (if the traversal gets there, it must be part of that tree), and that has a dependency on the source instruction, then moving instructions between the destination and the source is not possible. We signal this to the main loop, and the main loop returns without performing a move.
We are doing the dequeuing in the main loop.
Thanks for the confirmation!
@qcolombet Thank you for taking a look at this PR. Tomorrow I will check out CTMark. Today I was also thinking about benchmarking. I have compiled CoreMark for RISC-V. Here are the results. (Only code sizes and compilation times were measured):
I think I could inline the lambda into the BFS. Shall I do it?
I'm not sure what this would look like, but generally speaking I like subroutines (I even prefer static functions to lambdas) because they make the contract between the functionality and the surrounding code more obvious and force you to document the API (what the input arguments and the returned value mean!).
I have done some benchmarking. I used this cmake command for the benchmarks:
I used this command to compile:
This is the llvm-lit invocation:
I ran compare with this command:
And here are the results for compile time:
And for execution time:
Text section sizes:
I should do some fine-tuning:
What do you think? Would this PR be fine for O3, O2, or Os?
I'm still concerned about introducing rescheduling of instructions in MCP. Is there a possibility that we could enhance the pre-RA machine scheduler to achieve the same effect?
I don't really think the logic of this would fit into the current scheduling mechanism.
In conclusion, just the current logic would be moved and we would have less code reuse. So I think this logic is a bit incompatible with the scheduler. The only thing that is common between them is the use of a dependency graph. Or maybe I could take a whole different approach:
This would be less general, I think, because if a dependency blocks some copies in a way not caused by the scheduler, it would not be recognized and moved. Also, it would only work in scheduling regions, which may make it less effective.
This doesn't work in all cases. It only works if the scheduler is the one who "spoils" the data dependencies. For example, here is one function on which this patch improves the generated code:

```llvm
define zeroext i1 @smuloi8(i8 %v1, i8 %v2, ptr %res) {
  %t = call {i8, i1} @llvm.smul.with.overflow.i8(i8 %v1, i8 %v2)
  %val = extractvalue {i8, i1} %t, 0
  %obit = extractvalue {i8, i1} %t, 1
  store i8 %val, ptr %res
  ret i1 %obit
}
```
If I compile it without my patch with this command:

```
bin/llc -disable-peephole -enable-misched=0 -mtriple=x86_64-linux-unknown X86ex.txt -o oldNoPreRASched.s
```

I get:

```asm
smuloi8:                                # @smuloi8
# %bb.0:
	movl	%edi, %eax
	imulb	%sil
	seto	%cl
	movb	%al, (%rdx)
	movl	%ecx, %eax
	retq
```
Same result happens if we leave the

With my patch, with the command:

```
bin/llc -disable-peephole -mtriple=x86_64-linux-unknown X86ex.txt -o newPreRASched.s
```

we get:

```asm
smuloi8:                                # @smuloi8
# %bb.0:
	movl	%edi, %eax
	imulb	%sil
	movb	%al, (%rdx)
	seto	%al
	retq
```

So there are more cases that are not closely related to the scheduler.
Couple of comments:
Could you re-run with CTMark?
Okay, I will change that. I thought the benchmarking suite includes CTMark, based on this: https://llvm.org/docs/TestSuiteGuide.html#common-configuration-options . I used the
Will change that for the next measurement.
Debug + dynamically linked. I should do release plus statically linked, right?
Here are the updated results. I used this cmake command now:

```
cmake -DCMAKE_C_COMPILER=/home/spaits/repo/spare-llvm/llvm-project/build/bin/clang -DCMAKE_CXX_FLAGS='-lstdc++ -lrt -lm -lpthread' -DTEST_SUITE_SUBDIRS=CTMark .. -GNinja
```

My clang was compiled in release mode. This time I have run the benchmarks four times for the baseline and four times for my changes and merged them with compare.py.
Execution time:
Text section size:
Maybe it would be a good idea to introduce a limit in the dependency checking. So, for example, at most ten dependencies can be checked; if there are more, then we don't do anything.
When decreasing the node limit to 10, we no longer have a major effect on the compile time:
And another measurement:
Here are code sizes:
and the exec time:
The impact on compile time is still significant without a huge impact on the generated code. This is surprising to me.
I don't know. I was surprised too. So basically the benchmarking is:
There is only one cycle that has been done with the new node-restricted patch. Those results were compared with a without-patch version that was produced before (so 4 files vs 4 files). This is why I assumed that it is not much: once it is slightly positive, once slightly negative.
Maybe another interesting approach is to build the scheduler graph only for the code section between the copy source and the copy destination. This way we only spend resources where there is a high chance of benefit.
Also, maybe I should run the benchmarks 10 to 20 times, not just 4 times, and merge and compare those results.
I have done yet another round of benchmarking. Now, for each version, the benchmark was compiled five times, and for each compilation I ran llvm-lit 4 times. The following three versions were considered:
There is a compile-time regression relative to the state when we built the DDG for basic blocks. Left (Baseline), Right (DDG for regions).
Compile time:
Exec_time:
Size of text section:
I have done these benchmarks on my laptop; I don't have better equipment right now. Also, I have only checked the x86-64 target, which is not that prone to patterns like the one addressed by this patch. I think the best would be to try this out on ARM or RISC-V and do the compilation in a more consistent environment.
Also, one more possible improvement: since I only build the DDG when needed, we no longer have to deal with register renames after the DDG build, so this whole thing can be done in one stage again.
This reverts commit f6ca9f6.
As we have discussed in a previous PR (#98087) here is an implementation using ScheduleDAG in the MCP.
This PR is not fully finished yet. I have not really done any precise benchmarking.
The only thing I have done is that I have tested how long the generation of some regression tests takes before and after my patch. I have not seen any increase there on my machine. But this is not a precise way of measuring.
I have not updated all the tests yet. I have only somewhat checked the RISCV, AArch64, ARM, X86 and Thumb2 tests.
Could you please take a quick look at this PR and give some feedback?
Is this direction good? Should we continue with this? (Then I will try to do some compile time benchmarking and also update the tests).