Skip to content

Commit d97f997

Browse files
author
Jessica Paquette
committed
[MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges"
We found a case in the Swift benchmarks where the MachineOutliner introduces about a 20% compile time overhead in comparison to building without the MachineOutliner. The origin of this slowdown is that the benchmark has long blocks which incur lots of LRU checks for lots of candidates. Imagine a case like this: ``` bb: i1 i2 i3 ... i123456 ``` Now imagine that all of the outlining candidates appear early in the block, and that something like, say, NZCV is defined at the end of the block. The outliner has to check liveness for certain registers across all candidates, because outlining from areas where those registers are used is unsafe at call boundaries. This is fairly wasteful because in the previously-described case, the outlining candidates will never appear in an area where those registers are live. To avoid this, precalculate areas where we will consider outlining from. Anything outside of these areas is mapped to illegal and not included in the outlining search space. This allows us to reduce the size of the outliner's suffix tree as well, giving us a potential memory win. By precalculating areas, we can also optimize other checks too, like whether or not LR is live across an outlining candidate. Doing all of this is about a 16% compile time improvement on the case. This is likely useful for other targets (e.g. ARM + RISCV) as well, but for now, this only implements the AArch64 path. The original "is the MBB safe" method still works as before.
1 parent dc09815 commit d97f997

File tree

4 files changed

+145
-125
lines changed

4 files changed

+145
-125
lines changed

llvm/include/llvm/CodeGen/TargetInstrInfo.h

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1940,6 +1940,25 @@ class TargetInstrInfo : public MCInstrInfo {
19401940
virtual bool isMBBSafeToOutlineFrom(MachineBasicBlock &MBB,
19411941
unsigned &Flags) const;
19421942

1943+
/// Optional target hook which partitions \p MBB into outlinable ranges for
1944+
/// instruction mapping purposes. Each range is defined by two iterators:
1945+
/// [start, end).
1946+
///
1947+
/// Ranges are expected to be ordered top-down. That is, ranges closer to the
1948+
/// top of the block should come before ranges closer to the end of the block.
1949+
///
1950+
/// Ranges cannot overlap.
1951+
///
1952+
/// If an entire block is mappable, then its range is [MBB.begin(), MBB.end())
1953+
///
1954+
/// All instructions not present in an outlinable range are considered
1955+
/// illegal.
1956+
virtual SmallVector<
1957+
std::pair<MachineBasicBlock::iterator, MachineBasicBlock::iterator>>
1958+
getOutlinableRanges(MachineBasicBlock &MBB, unsigned &Flags) const {
1959+
return {std::make_pair(MBB.begin(), MBB.end())};
1960+
}
1961+
19431962
/// Insert a custom frame for outlined functions.
19441963
virtual void buildOutlinedFrame(MachineBasicBlock &MBB, MachineFunction &MF,
19451964
const outliner::OutlinedFunction &OF) const {

llvm/lib/CodeGen/MachineOutliner.cpp

Lines changed: 42 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -258,6 +258,10 @@ struct InstructionMapper {
258258
if (!TII.isMBBSafeToOutlineFrom(MBB, Flags))
259259
return;
260260

261+
auto Ranges = TII.getOutlinableRanges(MBB, Flags);
262+
if (Ranges.empty())
263+
return;
264+
261265
// Store info for the MBB for later outlining.
262266
MBBFlagsMap[&MBB] = Flags;
263267

@@ -280,34 +284,47 @@ struct InstructionMapper {
280284
std::vector<unsigned> UnsignedVecForMBB;
281285
std::vector<MachineBasicBlock::iterator> InstrListForMBB;
282286

283-
for (MachineBasicBlock::iterator Et = MBB.end(); It != Et; ++It) {
284-
// Keep track of where this instruction is in the module.
285-
switch (TII.getOutliningType(It, Flags)) {
286-
case InstrType::Illegal:
287+
for (auto &Range : Ranges) {
288+
auto RangeStart = Range.first;
289+
auto RangeEnd = Range.second;
290+
// Everything outside of an outlinable range is illegal.
291+
for (; It != RangeStart; ++It)
287292
mapToIllegalUnsigned(It, CanOutlineWithPrevInstr, UnsignedVecForMBB,
288293
InstrListForMBB);
289-
break;
290-
291-
case InstrType::Legal:
292-
mapToLegalUnsigned(It, CanOutlineWithPrevInstr, HaveLegalRange,
293-
NumLegalInBlock, UnsignedVecForMBB, InstrListForMBB);
294-
break;
295-
296-
case InstrType::LegalTerminator:
297-
mapToLegalUnsigned(It, CanOutlineWithPrevInstr, HaveLegalRange,
298-
NumLegalInBlock, UnsignedVecForMBB, InstrListForMBB);
299-
// The instruction also acts as a terminator, so we have to record that
300-
// in the string.
301-
mapToIllegalUnsigned(It, CanOutlineWithPrevInstr, UnsignedVecForMBB,
294+
assert(It != MBB.end() && "Should still have instructions?");
295+
// `It` is now positioned at the beginning of a range of instructions
296+
// which may be outlinable. Check if each instruction is known to be safe.
297+
for (; It != RangeEnd; ++It) {
298+
// Keep track of where this instruction is in the module.
299+
switch (TII.getOutliningType(It, Flags)) {
300+
case InstrType::Illegal:
301+
mapToIllegalUnsigned(It, CanOutlineWithPrevInstr, UnsignedVecForMBB,
302+
InstrListForMBB);
303+
break;
304+
305+
case InstrType::Legal:
306+
mapToLegalUnsigned(It, CanOutlineWithPrevInstr, HaveLegalRange,
307+
NumLegalInBlock, UnsignedVecForMBB,
302308
InstrListForMBB);
303-
break;
304-
305-
case InstrType::Invisible:
306-
// Normally this is set by mapTo(Blah)Unsigned, but we just want to
307-
// skip this instruction. So, unset the flag here.
308-
++NumInvisible;
309-
AddedIllegalLastTime = false;
310-
break;
309+
break;
310+
311+
case InstrType::LegalTerminator:
312+
mapToLegalUnsigned(It, CanOutlineWithPrevInstr, HaveLegalRange,
313+
NumLegalInBlock, UnsignedVecForMBB,
314+
InstrListForMBB);
315+
// The instruction also acts as a terminator, so we have to record
316+
// that in the string.
317+
mapToIllegalUnsigned(It, CanOutlineWithPrevInstr, UnsignedVecForMBB,
318+
InstrListForMBB);
319+
break;
320+
321+
case InstrType::Invisible:
322+
// Normally this is set by mapTo(Blah)Unsigned, but we just want to
323+
// skip this instruction. So, unset the flag here.
324+
++NumInvisible;
325+
AddedIllegalLastTime = false;
326+
break;
327+
}
311328
}
312329
}
313330

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp

Lines changed: 79 additions & 96 deletions
Original file line numberDiff line numberDiff line change
@@ -6783,48 +6783,11 @@ outliner::OutlinedFunction AArch64InstrInfo::getOutliningCandidateInfo(
67836783

67846784
// Properties about candidate MBBs that hold for all of them.
67856785
unsigned FlagsSetInAll = 0xF;
6786-
6787-
// Compute liveness information for each candidate, and set FlagsSetInAll.
67886786
std::for_each(RepeatedSequenceLocs.begin(), RepeatedSequenceLocs.end(),
67896787
[&FlagsSetInAll](outliner::Candidate &C) {
67906788
FlagsSetInAll &= C.Flags;
67916789
});
67926790

6793-
// According to the AArch64 Procedure Call Standard, the following are
6794-
// undefined on entry/exit from a function call:
6795-
//
6796-
// * Registers x16, x17, (and thus w16, w17)
6797-
// * Condition codes (and thus the NZCV register)
6798-
//
6799-
// Because if this, we can't outline any sequence of instructions where
6800-
// one
6801-
// of these registers is live into/across it. Thus, we need to delete
6802-
// those
6803-
// candidates.
6804-
auto CantGuaranteeValueAcrossCall = [&TRI](outliner::Candidate &C) {
6805-
// If the unsafe registers in this block are all dead, then we don't need
6806-
// to compute liveness here.
6807-
if (C.Flags & UnsafeRegsDead)
6808-
return false;
6809-
return C.isAnyUnavailableAcrossOrOutOfSeq(
6810-
{AArch64::W16, AArch64::W17, AArch64::NZCV}, TRI);
6811-
};
6812-
6813-
// Are there any candidates where those registers are live?
6814-
if (!(FlagsSetInAll & UnsafeRegsDead)) {
6815-
// Erase every candidate that violates the restrictions above. (It could be
6816-
// true that we have viable candidates, so it's not worth bailing out in
6817-
// the case that, say, 1 out of 20 candidates violate the restructions.)
6818-
llvm::erase_if(RepeatedSequenceLocs, CantGuaranteeValueAcrossCall);
6819-
6820-
// If the sequence doesn't have enough candidates left, then we're done.
6821-
if (RepeatedSequenceLocs.size() < 2)
6822-
return outliner::OutlinedFunction();
6823-
}
6824-
6825-
// At this point, we have only "safe" candidates to outline. Figure out
6826-
// frame + call instruction information.
6827-
68286791
unsigned LastInstrOpcode = RepeatedSequenceLocs[0].back()->getOpcode();
68296792

68306793
// Helper lambda which sets call information for every candidate.
@@ -6952,6 +6915,10 @@ outliner::OutlinedFunction AArch64InstrInfo::getOutliningCandidateInfo(
69526915

69536916
// Check if we have to save LR.
69546917
for (outliner::Candidate &C : RepeatedSequenceLocs) {
6918+
bool LRAvailable =
6919+
(C.Flags & MachineOutlinerMBBFlags::LRUnavailableSomewhere)
6920+
? C.isAvailableAcrossAndOutOfSeq(AArch64::LR, TRI)
6921+
: true;
69556922
// If we have a noreturn caller, then we're going to be conservative and
69566923
// say that we have to save LR. If we don't have a ret at the end of the
69576924
// block, then we can't reason about liveness accurately.
@@ -6962,7 +6929,7 @@ outliner::OutlinedFunction AArch64InstrInfo::getOutliningCandidateInfo(
69626929
C.getMF()->getFunction().hasFnAttribute(Attribute::NoReturn);
69636930

69646931
// Is LR available? If so, we don't need a save.
6965-
if (C.isAvailableAcrossAndOutOfSeq(AArch64::LR, TRI) && !IsNoReturn) {
6932+
if (LRAvailable && !IsNoReturn) {
69666933
NumBytesNoStackCalls += 4;
69676934
C.setCallInfo(MachineOutlinerNoLRSave, 4);
69686935
CandidatesWithoutStackFixups.push_back(C);
@@ -7134,72 +7101,88 @@ bool AArch64InstrInfo::isFunctionSafeToOutlineFrom(
71347101
return true;
71357102
}
71367103

7137-
bool AArch64InstrInfo::isMBBSafeToOutlineFrom(MachineBasicBlock &MBB,
7138-
unsigned &Flags) const {
7139-
if (!TargetInstrInfo::isMBBSafeToOutlineFrom(MBB, Flags))
7140-
return false;
7141-
// Check if LR is available through all of the MBB. If it's not, then set
7142-
// a flag.
7104+
SmallVector<std::pair<MachineBasicBlock::iterator, MachineBasicBlock::iterator>>
7105+
AArch64InstrInfo::getOutlinableRanges(MachineBasicBlock &MBB,
7106+
unsigned &Flags) const {
71437107
assert(MBB.getParent()->getRegInfo().tracksLiveness() &&
7144-
"Suitable Machine Function for outlining must track liveness");
7145-
LiveRegUnits LRU(getRegisterInfo());
7108+
"Must track liveness!");
7109+
SmallVector<
7110+
std::pair<MachineBasicBlock::iterator, MachineBasicBlock::iterator>>
7111+
Ranges;
71467112

7147-
std::for_each(MBB.rbegin(), MBB.rend(),
7148-
[&LRU](MachineInstr &MI) { LRU.accumulate(MI); });
7113+
// The range [RangeBegin, RangeEnd).
7114+
MachineBasicBlock::instr_iterator RangeEnd = MBB.instr_end();
7115+
MachineBasicBlock::instr_iterator RangeBegin = RangeEnd;
7116+
unsigned RangeLen = 0;
71497117

7150-
// Check if each of the unsafe registers are available...
7151-
bool W16AvailableInBlock = LRU.available(AArch64::W16);
7152-
bool W17AvailableInBlock = LRU.available(AArch64::W17);
7153-
bool NZCVAvailableInBlock = LRU.available(AArch64::NZCV);
7118+
// According to the AArch64 Procedure Call Standard, the following are
7119+
// undefined on entry/exit from a function call:
7120+
//
7121+
// * Registers x16, x17, (and thus w16, w17)
7122+
// * Condition codes (and thus the NZCV register)
7123+
//
7124+
// If any of these registers are used inside or live across an outlined
7125+
// function, then they may be modified later, either by the compiler or
7126+
// some other tool (like the linker).
7127+
//
7128+
// To avoid outlining in these situations, partition each block into ranges
7129+
// where these registers are dead. We will only outline from those ranges.
7130+
LiveRegUnits LRU(getRegisterInfo());
7131+
auto AreAllUnsafeRegsDead = [&LRU]() {
7132+
return LRU.available(AArch64::W16) && LRU.available(AArch64::W17) &&
7133+
LRU.available(AArch64::NZCV);
7134+
};
71547135

7155-
// If all of these are dead (and not live out), we know we don't have to check
7156-
// them later.
7157-
if (W16AvailableInBlock && W17AvailableInBlock && NZCVAvailableInBlock)
7158-
Flags |= MachineOutlinerMBBFlags::UnsafeRegsDead;
7136+
// We need to know if LR is live across an outlining boundary later on in
7137+
// order to decide how we'll create the outlined call, frame, etc.
7138+
//
7139+
// It's pretty expensive to check this for *every candidate* within a block.
7140+
// That's some potentially n^2 behaviour, since in the worst case, we'd need
7141+
// to compute liveness from the end of the block for O(n) candidates within
7142+
// the block.
7143+
//
7144+
// So, to improve the average case, let's keep track of liveness from the end
7145+
// of the block to the beginning of *every outlinable range*. If we know that
7146+
// LR is available in every range we could outline from, then we know that
7147+
// we don't need to check liveness for any candidate within that range.
7148+
bool LRAvailableEverywhere = true;
71597149

7160-
// Now, add the live outs to the set.
7150+
// Compute liveness bottom-up.
71617151
LRU.addLiveOuts(MBB);
7162-
7163-
// If any of these registers is available in the MBB, but also a live out of
7164-
// the block, then we know outlining is unsafe.
7165-
if (W16AvailableInBlock && !LRU.available(AArch64::W16))
7166-
return false;
7167-
if (W17AvailableInBlock && !LRU.available(AArch64::W17))
7168-
return false;
7169-
if (NZCVAvailableInBlock && !LRU.available(AArch64::NZCV))
7170-
return false;
7171-
7172-
// Check if there's a call inside this MachineBasicBlock. If there is, then
7173-
// set a flag.
7174-
if (any_of(MBB, [](MachineInstr &MI) { return MI.isCall(); }))
7175-
Flags |= MachineOutlinerMBBFlags::HasCalls;
7176-
7177-
MachineFunction *MF = MBB.getParent();
7178-
7179-
// In the event that we outline, we may have to save LR. If there is an
7180-
// available register in the MBB, then we'll always save LR there. Check if
7181-
// this is true.
7182-
bool CanSaveLR = false;
7183-
const AArch64RegisterInfo *ARI = static_cast<const AArch64RegisterInfo *>(
7184-
MF->getSubtarget().getRegisterInfo());
7185-
7186-
// Check if there is an available register across the sequence that we can
7187-
// use.
7188-
for (unsigned Reg : AArch64::GPR64RegClass) {
7189-
if (!ARI->isReservedReg(*MF, Reg) && Reg != AArch64::LR &&
7190-
Reg != AArch64::X16 && Reg != AArch64::X17 && LRU.available(Reg)) {
7191-
CanSaveLR = true;
7192-
break;
7152+
for (auto &MI : make_range(MBB.instr_rbegin(), MBB.instr_rend())) {
7153+
LRU.stepBackward(MI);
7154+
// If we are in a range where all of the unsafe registers are dead, then
7155+
// update the beginning of the range. Also try to precalculate some stuff
7156+
// for getOutliningCandidateInfo.
7157+
if (AreAllUnsafeRegsDead()) {
7158+
if (MI.isCall())
7159+
Flags |= MachineOutlinerMBBFlags::HasCalls;
7160+
LRAvailableEverywhere &= LRU.available(AArch64::LR);
7161+
RangeBegin = MI.getIterator();
7162+
++RangeLen;
7163+
continue;
71937164
}
7194-
}
7195-
7196-
// Check if we have a register we can save LR to, and if LR was used
7197-
// somewhere. If both of those things are true, then we need to evaluate the
7198-
// safety of outlining stack instructions later.
7199-
if (!CanSaveLR && !LRU.available(AArch64::LR))
7165+
// At least one unsafe register is not dead. We do not want to outline at
7166+
// this point. If it is long enough to outline from, save the range
7167+
// [RangeBegin, RangeEnd).
7168+
if (RangeLen > 1)
7169+
Ranges.push_back(std::make_pair(RangeBegin, RangeEnd));
7170+
// Start a new range where RangeEnd is the first known unsafe point.
7171+
RangeLen = 0;
7172+
RangeBegin = MI.getIterator();
7173+
RangeEnd = MI.getIterator();
7174+
}
7175+
// Above loop misses the last (or only) range.
7176+
if (AreAllUnsafeRegsDead() && RangeLen > 1)
7177+
Ranges.push_back(std::make_pair(RangeBegin, RangeEnd));
7178+
if (Ranges.empty())
7179+
return Ranges;
7180+
// We found the ranges bottom-up. Mapping expects the top-down. Reverse
7181+
// the order.
7182+
std::reverse(Ranges.begin(), Ranges.end());
7183+
if (!LRAvailableEverywhere)
72007184
Flags |= MachineOutlinerMBBFlags::LRUnavailableSomewhere;
7201-
7202-
return true;
7185+
return Ranges;
72037186
}
72047187

72057188
outliner::InstrType

llvm/lib/Target/AArch64/AArch64InstrInfo.h

Lines changed: 5 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -280,10 +280,11 @@ class AArch64InstrInfo final : public AArch64GenInstrInfo {
280280
bool OutlineFromLinkOnceODRs) const override;
281281
outliner::OutlinedFunction getOutliningCandidateInfo(
282282
std::vector<outliner::Candidate> &RepeatedSequenceLocs) const override;
283-
outliner::InstrType
284-
getOutliningType(MachineBasicBlock::iterator &MIT, unsigned Flags) const override;
285-
bool isMBBSafeToOutlineFrom(MachineBasicBlock &MBB,
286-
unsigned &Flags) const override;
283+
outliner::InstrType getOutliningType(MachineBasicBlock::iterator &MIT,
284+
unsigned Flags) const override;
285+
SmallVector<
286+
std::pair<MachineBasicBlock::iterator, MachineBasicBlock::iterator>>
287+
getOutlinableRanges(MachineBasicBlock &MBB, unsigned &Flags) const override;
287288
void buildOutlinedFrame(MachineBasicBlock &MBB, MachineFunction &MF,
288289
const outliner::OutlinedFunction &OF) const override;
289290
MachineBasicBlock::iterator

0 commit comments

Comments
 (0)