[MemProf] Track and report profiled sizes through cloning #98382
Conversation
If requested, via the -memprof-report-hinted-sizes option, track the total profiled size of each MIB through the thin link, then report on the corresponding allocation coldness after all cloning is complete. To save size, a different bitcode record type is used for the allocation info when the option is specified, and the sizes are kept separate from the MIBs in the index.
@llvm/pr-subscribers-lto @llvm/pr-subscribers-llvm-ir

Author: Teresa Johnson (teresajohnson)

Changes

If requested, via the -memprof-report-hinted-sizes option, track the total profiled size of each MIB through the thin link, then report on the corresponding allocation coldness after all cloning is complete.

To save size, a different bitcode record type is used for the allocation info when the option is specified, and the sizes are kept separate from the MIBs in the index.

Patch is 28.42 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/98382.diff

10 Files Affected:
diff --git a/llvm/include/llvm/Bitcode/LLVMBitCodes.h b/llvm/include/llvm/Bitcode/LLVMBitCodes.h
index 5b5e08b5cbc3f..9f989aaf22779 100644
--- a/llvm/include/llvm/Bitcode/LLVMBitCodes.h
+++ b/llvm/include/llvm/Bitcode/LLVMBitCodes.h
@@ -319,6 +319,16 @@ enum GlobalValueSummarySymtabCodes {
// numver x version]
FS_COMBINED_ALLOC_INFO = 29,
FS_STACK_IDS = 30,
+ // Summary of per-module allocation memprof metadata when we are tracking
+ // total sizes.
+ // [n x (alloc type, total size, nummib, nummib x stackidindex)]
+ FS_PERMODULE_ALLOC_INFO_TOTAL_SIZES = 31,
+ // Summary of combined index allocation memprof metadata when we are tracking
+ // total sizes.
+ // [nummib, numver,
+ // nummib x (alloc type, total size, numstackids,
+ // numstackids x stackidindex), numver x version]
+ FS_COMBINED_ALLOC_INFO_TOTAL_SIZES = 32,
};
enum MetadataCodes {
diff --git a/llvm/include/llvm/IR/ModuleSummaryIndex.h b/llvm/include/llvm/IR/ModuleSummaryIndex.h
index 31271ed388e54..950c0ed824dbb 100644
--- a/llvm/include/llvm/IR/ModuleSummaryIndex.h
+++ b/llvm/include/llvm/IR/ModuleSummaryIndex.h
@@ -403,6 +403,10 @@ struct AllocInfo {
// Vector of MIBs in this memprof metadata.
std::vector<MIBInfo> MIBs;
+ // If requested, keep track of total profiled sizes for each MIB. This will be
+ // a vector of the same length and order as the MIBs vector.
+ std::unique_ptr<std::vector<uint64_t>> TotalSizes;
+
AllocInfo(std::vector<MIBInfo> MIBs) : MIBs(std::move(MIBs)) {
Versions.push_back(0);
}
@@ -423,6 +427,16 @@ inline raw_ostream &operator<<(raw_ostream &OS, const AllocInfo &AE) {
for (auto &M : AE.MIBs) {
OS << "\t\t" << M << "\n";
}
+ if (AE.TotalSizes) {
+ OS << " TotalSizes per MIB:\n\t\t";
+ First = true;
+ for (auto &TS : *AE.TotalSizes) {
+ if (!First)
+ OS << ", ";
+ First = false;
+ OS << TS << "\n";
+ }
+ }
return OS;
}
diff --git a/llvm/include/llvm/IR/ModuleSummaryIndexYAML.h b/llvm/include/llvm/IR/ModuleSummaryIndexYAML.h
index b2747d24c5396..26bf08f9f7439 100644
--- a/llvm/include/llvm/IR/ModuleSummaryIndexYAML.h
+++ b/llvm/include/llvm/IR/ModuleSummaryIndexYAML.h
@@ -224,6 +224,7 @@ template <> struct CustomMappingTraits<GlobalValueSummaryMapTy> {
V.emplace(RefGUID, /*IsAnalysis=*/false);
Refs.push_back(ValueInfo(/*IsAnalysis=*/false, &*V.find(RefGUID)));
}
+ std::vector<AllocInfo> EmptyAllocInfo;
Elem.SummaryList.push_back(std::make_unique<FunctionSummary>(
GlobalValueSummary::GVFlags(
static_cast<GlobalValue::LinkageTypes>(FSum.Linkage),
@@ -238,7 +239,7 @@ template <> struct CustomMappingTraits<GlobalValueSummaryMapTy> {
std::move(FSum.TypeTestAssumeConstVCalls),
std::move(FSum.TypeCheckedLoadConstVCalls),
ArrayRef<FunctionSummary::ParamAccess>{}, ArrayRef<CallsiteInfo>{},
- ArrayRef<AllocInfo>{}));
+ std::move(EmptyAllocInfo)));
}
}
static void output(IO &io, GlobalValueSummaryMapTy &V) {
diff --git a/llvm/lib/Analysis/ModuleSummaryAnalysis.cpp b/llvm/lib/Analysis/ModuleSummaryAnalysis.cpp
index 94ac0484f5ec7..c3b6fd8bc20af 100644
--- a/llvm/lib/Analysis/ModuleSummaryAnalysis.cpp
+++ b/llvm/lib/Analysis/ModuleSummaryAnalysis.cpp
@@ -85,6 +85,8 @@ extern cl::opt<bool> ScalePartialSampleProfileWorkingSetSize;
extern cl::opt<unsigned> MaxNumVTableAnnotations;
+extern cl::opt<bool> MemProfReportHintedSizes;
+
// Walk through the operands of a given User via worklist iteration and populate
// the set of GlobalValue references encountered. Invoked either on an
// Instruction or a GlobalVariable (which walks its initializer).
@@ -517,6 +519,7 @@ static void computeFunctionSummary(
auto *MemProfMD = I.getMetadata(LLVMContext::MD_memprof);
if (MemProfMD) {
std::vector<MIBInfo> MIBs;
+ std::vector<uint64_t> TotalSizes;
for (auto &MDOp : MemProfMD->operands()) {
auto *MIBMD = cast<const MDNode>(MDOp);
MDNode *StackNode = getMIBStackNode(MIBMD);
@@ -536,8 +539,18 @@ static void computeFunctionSummary(
}
MIBs.push_back(
MIBInfo(getMIBAllocType(MIBMD), std::move(StackIdIndices)));
+ if (MemProfReportHintedSizes) {
+ auto TotalSize = getMIBTotalSize(MIBMD);
+ assert(TotalSize);
+ TotalSizes.push_back(TotalSize);
+ }
}
Allocs.push_back(AllocInfo(std::move(MIBs)));
+ if (MemProfReportHintedSizes) {
+ assert(Allocs.back().MIBs.size() == TotalSizes.size());
+ Allocs.back().TotalSizes =
+ std::make_unique<std::vector<uint64_t>>(std::move(TotalSizes));
+ }
} else if (!InstCallsite.empty()) {
SmallVector<unsigned> StackIdIndices;
for (auto StackId : InstCallsite)
@@ -935,6 +948,7 @@ ModuleSummaryIndex llvm::buildModuleSummaryIndex(
CantBePromoted.insert(GV->getGUID());
// Create the appropriate summary type.
if (Function *F = dyn_cast<Function>(GV)) {
+ std::vector<AllocInfo> EmptyAllocInfo;
std::unique_ptr<FunctionSummary> Summary =
std::make_unique<FunctionSummary>(
GVFlags, /*InstCount=*/0,
@@ -957,7 +971,7 @@ ModuleSummaryIndex llvm::buildModuleSummaryIndex(
ArrayRef<FunctionSummary::ConstVCall>{},
ArrayRef<FunctionSummary::ConstVCall>{},
ArrayRef<FunctionSummary::ParamAccess>{},
- ArrayRef<CallsiteInfo>{}, ArrayRef<AllocInfo>{});
+ ArrayRef<CallsiteInfo>{}, std::move(EmptyAllocInfo));
Index.addGlobalValueSummary(*GV, std::move(Summary));
} else {
std::unique_ptr<GlobalVarSummary> Summary =
diff --git a/llvm/lib/Bitcode/Reader/BitcodeAnalyzer.cpp b/llvm/lib/Bitcode/Reader/BitcodeAnalyzer.cpp
index b7ed9cdf63145..82aa7b7e077a1 100644
--- a/llvm/lib/Bitcode/Reader/BitcodeAnalyzer.cpp
+++ b/llvm/lib/Bitcode/Reader/BitcodeAnalyzer.cpp
@@ -328,6 +328,8 @@ GetCodeName(unsigned CodeID, unsigned BlockID,
STRINGIFY_CODE(FS, COMBINED_CALLSITE_INFO)
STRINGIFY_CODE(FS, COMBINED_ALLOC_INFO)
STRINGIFY_CODE(FS, STACK_IDS)
+ STRINGIFY_CODE(FS, PERMODULE_ALLOC_INFO_TOTAL_SIZES)
+ STRINGIFY_CODE(FS, COMBINED_ALLOC_INFO_TOTAL_SIZES)
}
case bitc::METADATA_ATTACHMENT_ID:
switch (CodeID) {
diff --git a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
index f56b2b32ff98f..1cba859a3fade 100644
--- a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
+++ b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
@@ -7991,12 +7991,19 @@ Error ModuleSummaryIndexBitcodeReader::parseEntireSummary(unsigned ID) {
break;
}
- case bitc::FS_PERMODULE_ALLOC_INFO: {
+ case bitc::FS_PERMODULE_ALLOC_INFO:
+ case bitc::FS_PERMODULE_ALLOC_INFO_TOTAL_SIZES: {
+ bool HasTotalSizes = BitCode == bitc::FS_PERMODULE_ALLOC_INFO_TOTAL_SIZES;
unsigned I = 0;
std::vector<MIBInfo> MIBs;
+ std::vector<uint64_t> TotalSizes;
while (I < Record.size()) {
- assert(Record.size() - I >= 2);
+ assert(Record.size() - I >= (HasTotalSizes ? 3 : 2));
AllocationType AllocType = (AllocationType)Record[I++];
+ if (HasTotalSizes) {
+ TotalSizes.push_back(Record[I++]);
+ assert(TotalSizes.back());
+ }
unsigned NumStackEntries = Record[I++];
assert(Record.size() - I >= NumStackEntries);
SmallVector<unsigned> StackIdList;
@@ -8008,18 +8015,31 @@ Error ModuleSummaryIndexBitcodeReader::parseEntireSummary(unsigned ID) {
MIBs.push_back(MIBInfo(AllocType, std::move(StackIdList)));
}
PendingAllocs.push_back(AllocInfo(std::move(MIBs)));
+ assert(HasTotalSizes != TotalSizes.empty());
+ if (HasTotalSizes) {
+ assert(PendingAllocs.back().MIBs.size() == TotalSizes.size());
+ PendingAllocs.back().TotalSizes =
+ std::make_unique<std::vector<uint64_t>>(std::move(TotalSizes));
+ }
break;
}
- case bitc::FS_COMBINED_ALLOC_INFO: {
+ case bitc::FS_COMBINED_ALLOC_INFO:
+ case bitc::FS_COMBINED_ALLOC_INFO_TOTAL_SIZES: {
+ bool HasTotalSizes = BitCode == bitc::FS_COMBINED_ALLOC_INFO_TOTAL_SIZES;
unsigned I = 0;
std::vector<MIBInfo> MIBs;
+ std::vector<uint64_t> TotalSizes;
unsigned NumMIBs = Record[I++];
unsigned NumVersions = Record[I++];
unsigned MIBsRead = 0;
while (MIBsRead++ < NumMIBs) {
- assert(Record.size() - I >= 2);
+ assert(Record.size() - I >= (HasTotalSizes ? 3 : 2));
AllocationType AllocType = (AllocationType)Record[I++];
+ if (HasTotalSizes) {
+ TotalSizes.push_back(Record[I++]);
+ assert(TotalSizes.back());
+ }
unsigned NumStackEntries = Record[I++];
assert(Record.size() - I >= NumStackEntries);
SmallVector<unsigned> StackIdList;
@@ -8036,6 +8056,12 @@ Error ModuleSummaryIndexBitcodeReader::parseEntireSummary(unsigned ID) {
Versions.push_back(Record[I++]);
PendingAllocs.push_back(
AllocInfo(std::move(Versions), std::move(MIBs)));
+ assert(HasTotalSizes != TotalSizes.empty());
+ if (HasTotalSizes) {
+ assert(PendingAllocs.back().MIBs.size() == TotalSizes.size());
+ PendingAllocs.back().TotalSizes =
+ std::make_unique<std::vector<uint64_t>>(std::move(TotalSizes));
+ }
break;
}
}
diff --git a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
index 3378931065f9b..7b8f6371b5698 100644
--- a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
+++ b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
@@ -230,7 +230,8 @@ class ModuleBitcodeWriterBase : public BitcodeWriterBase {
void writePerModuleFunctionSummaryRecord(
SmallVector<uint64_t, 64> &NameVals, GlobalValueSummary *Summary,
unsigned ValueID, unsigned FSCallsAbbrev, unsigned FSCallsProfileAbbrev,
- unsigned CallsiteAbbrev, unsigned AllocAbbrev, const Function &F);
+ unsigned CallsiteAbbrev, unsigned AllocAbbrev,
+ unsigned AllocTotalSizeAbbrev, const Function &F);
void writeModuleLevelReferences(const GlobalVariable &V,
SmallVector<uint64_t, 64> &NameVals,
unsigned FSModRefsAbbrev,
@@ -4158,7 +4159,7 @@ static void writeTypeIdCompatibleVtableSummaryRecord(
static void writeFunctionHeapProfileRecords(
BitstreamWriter &Stream, FunctionSummary *FS, unsigned CallsiteAbbrev,
- unsigned AllocAbbrev, bool PerModule,
+ unsigned AllocAbbrev, unsigned AllocTotalSizeAbbrev, bool PerModule,
std::function<unsigned(const ValueInfo &VI)> GetValueID,
std::function<unsigned(unsigned)> GetStackIndex) {
SmallVector<uint64_t> Record;
@@ -4193,19 +4194,31 @@ static void writeFunctionHeapProfileRecords(
Record.push_back(AI.MIBs.size());
Record.push_back(AI.Versions.size());
}
+ unsigned I = 0;
+ assert(!AI.TotalSizes || AI.TotalSizes->size() == AI.MIBs.size());
for (auto &MIB : AI.MIBs) {
Record.push_back((uint8_t)MIB.AllocType);
+ if (AI.TotalSizes) {
+ Record.push_back((*AI.TotalSizes)[I]);
+ assert((*AI.TotalSizes)[I]);
+ }
Record.push_back(MIB.StackIdIndices.size());
for (auto Id : MIB.StackIdIndices)
Record.push_back(GetStackIndex(Id));
+ I++;
}
if (!PerModule) {
for (auto V : AI.Versions)
Record.push_back(V);
}
- Stream.EmitRecord(PerModule ? bitc::FS_PERMODULE_ALLOC_INFO
- : bitc::FS_COMBINED_ALLOC_INFO,
- Record, AllocAbbrev);
+ unsigned Code =
+ PerModule ? (AI.TotalSizes ? bitc::FS_PERMODULE_ALLOC_INFO_TOTAL_SIZES
+ : bitc::FS_PERMODULE_ALLOC_INFO)
+ : (AI.TotalSizes ? bitc::FS_COMBINED_ALLOC_INFO_TOTAL_SIZES
+ : bitc::FS_COMBINED_ALLOC_INFO);
+ unsigned AllocAbbrevToUse =
+ AI.TotalSizes ? AllocTotalSizeAbbrev : AllocAbbrev;
+ Stream.EmitRecord(Code, Record, AllocAbbrevToUse);
}
}
@@ -4214,7 +4227,7 @@ void ModuleBitcodeWriterBase::writePerModuleFunctionSummaryRecord(
SmallVector<uint64_t, 64> &NameVals, GlobalValueSummary *Summary,
unsigned ValueID, unsigned FSCallsRelBFAbbrev,
unsigned FSCallsProfileAbbrev, unsigned CallsiteAbbrev,
- unsigned AllocAbbrev, const Function &F) {
+ unsigned AllocAbbrev, unsigned AllocTotalSizeAbbrev, const Function &F) {
NameVals.push_back(ValueID);
FunctionSummary *FS = cast<FunctionSummary>(Summary);
@@ -4225,7 +4238,7 @@ void ModuleBitcodeWriterBase::writePerModuleFunctionSummaryRecord(
});
writeFunctionHeapProfileRecords(
- Stream, FS, CallsiteAbbrev, AllocAbbrev,
+ Stream, FS, CallsiteAbbrev, AllocAbbrev, AllocTotalSizeAbbrev,
/*PerModule*/ true,
/*GetValueId*/ [&](const ValueInfo &VI) { return getValueId(VI); },
/*GetStackIndex*/ [&](unsigned I) { return I; });
@@ -4437,6 +4450,13 @@ void ModuleBitcodeWriterBase::writePerModuleGlobalValueSummary() {
Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8));
unsigned AllocAbbrev = Stream.EmitAbbrev(std::move(Abbv));
+ Abbv = std::make_shared<BitCodeAbbrev>();
+ Abbv->Add(BitCodeAbbrevOp(bitc::FS_PERMODULE_ALLOC_INFO_TOTAL_SIZES));
+ // n x (alloc type, total size, numstackids, numstackids x stackidindex)
+ Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array));
+ Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8));
+ unsigned AllocTotalSizeAbbrev = Stream.EmitAbbrev(std::move(Abbv));
+
SmallVector<uint64_t, 64> NameVals;
// Iterate over the list of functions instead of the Index to
// ensure the ordering is stable.
@@ -4454,9 +4474,10 @@ void ModuleBitcodeWriterBase::writePerModuleGlobalValueSummary() {
continue;
}
auto *Summary = VI.getSummaryList()[0].get();
- writePerModuleFunctionSummaryRecord(
- NameVals, Summary, VE.getValueID(&F), FSCallsRelBFAbbrev,
- FSCallsProfileAbbrev, CallsiteAbbrev, AllocAbbrev, F);
+ writePerModuleFunctionSummaryRecord(NameVals, Summary, VE.getValueID(&F),
+ FSCallsRelBFAbbrev,
+ FSCallsProfileAbbrev, CallsiteAbbrev,
+ AllocAbbrev, AllocTotalSizeAbbrev, F);
}
// Capture references from GlobalVariable initializers, which are outside
@@ -4586,6 +4607,16 @@ void IndexBitcodeWriter::writeCombinedGlobalValueSummary() {
return DecSummaries->count(GVS);
};
+ Abbv = std::make_shared<BitCodeAbbrev>();
+ Abbv->Add(BitCodeAbbrevOp(bitc::FS_COMBINED_ALLOC_INFO_TOTAL_SIZES));
+ Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 4)); // nummib
+ Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 4)); // numver
+ // nummib x (alloc type, total size, numstackids, numstackids x stackidindex),
+ // numver x version
+ Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array));
+ Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8));
+ unsigned AllocTotalSizeAbbrev = Stream.EmitAbbrev(std::move(Abbv));
+
// The aliases are emitted as a post-pass, and will point to the value
// id of the aliasee. Save them in a vector for post-processing.
SmallVector<AliasSummary *, 64> Aliases;
@@ -4673,9 +4704,10 @@ void IndexBitcodeWriter::writeCombinedGlobalValueSummary() {
getReferencedTypeIds(FS, ReferencedTypeIds);
writeFunctionHeapProfileRecords(
- Stream, FS, CallsiteAbbrev, AllocAbbrev,
+ Stream, FS, CallsiteAbbrev, AllocAbbrev, AllocTotalSizeAbbrev,
/*PerModule*/ false,
- /*GetValueId*/ [&](const ValueInfo &VI) -> unsigned {
+ /*GetValueId*/
+ [&](const ValueInfo &VI) -> unsigned {
std::optional<unsigned> ValueID = GetValueId(VI);
// This can happen in shared index files for distributed ThinLTO if
// the callee function summary is not included. Record 0 which we
@@ -4685,7 +4717,8 @@ void IndexBitcodeWriter::writeCombinedGlobalValueSummary() {
return 0;
return *ValueID;
},
- /*GetStackIndex*/ [&](unsigned I) {
+ /*GetStackIndex*/
+ [&](unsigned I) {
// Get the corresponding index into the list of StackIds actually
// being written for this combined index (which may be a subset in
// the case of distributed indexes).
diff --git a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
index c1e5ab1a2b561..f5713d31e263f 100644
--- a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
+++ b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
@@ -134,6 +134,8 @@ cl::opt<bool> SupportsHotColdNew(
cl::desc("Linking with hot/cold operator new interfaces"));
} // namespace llvm
+extern cl::opt<bool> MemProfReportHintedSizes;
+
namespace {
/// CRTP base for graphs built from either IR or ThinLTO summary index.
///
@@ -172,6 +174,7 @@ class CallsiteContextGraph {
void dump() const;
void print(raw_ostream &OS) const;
+ void printTotalSizes(raw_ostream &OS) const;
friend raw_ostream &operator<<(raw_ostream &OS,
const CallsiteContextGraph &CCG) {
@@ -439,7 +442,7 @@ class CallsiteContextGraph {
void addStackNodesForMIB(ContextNode *AllocNode,
CallStack<NodeT, IteratorT> &StackContext,
CallStack<NodeT, IteratorT> &CallsiteContext,
- AllocationType AllocType);
+ AllocationType AllocType, uint64_t TotalSize);
/// Matches all callsite metadata (or summary) to the nodes created for
/// allocation memprof MIB metadata, synthesizing new nodes to reflect any
@@ -611,6 +614,8 @@ class CallsiteContextGraph {
/// Map from each context ID to the AllocationType assigned to that context.
DenseMap<uint32_t, AllocationType> ContextIdToAllocationType;
+ std::map<uint32_t, uint64_t> ContextIdToTotalSize;
+
/// Identifies the context node created for a stack id when adding the MIB
/// contexts to the graph. This is used to locate the context nodes when
/// trying to assign the corresponding callsites with those stack ids to these
@@ -1004,11 +1009,24 @@ CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::addAllocNode(
return AllocNode;
}
+static std::string getAllocTypeString(uint8_t AllocTypes) {
+ if (!AllocTypes)
+ return "None";
+ std::string Str;
+ if (AllocTypes & (uint8_t)AllocationType::NotCold)
+ Str += "NotCold";
+ if (AllocTypes & (uint8_t)AllocationType::Cold)
+ Str += "Cold";
+ return Str;
+}
+
template <typename DerivedCCG, typename FuncTy, typename CallTy>
template <class NodeT, class IteratorT>
void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::addStackNodesForMIB(
ContextNode *AllocNode, CallStack<NodeT, IteratorT> &StackContext,
- CallStack<NodeT, IteratorT> &CallsiteContext, AllocationType AllocType) {
+ CallStack<NodeT, IteratorT> &CallsiteContext, AllocationType AllocType,
+ uint64_t TotalSize) {
+ assert(!MemProfReportHintedSizes || TotalSize > 0);
// Treating the hot alloc type as NotCold before the disambiguation for "hot"
// is done.
if (AllocType == AllocationType::Hot)
@@ -1016,6 +1034,11 @@ void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::addStackNodesForMIB(
ContextIdToAllocationType[++LastContextId] = AllocType;
+ if (MemProfReportHintedSizes) {
+ assert(TotalSize);
+ ContextIdToTotalSize[LastContextId] = TotalSize;
+ }
+
// Update alloc type and context ids for this MIB.
AllocNode->AllocTypes |= (uint8_t)AllocType;
@@ -1060,6 +1083,10 @@ CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::duplicateContextIds(
assert(ContextIdToAllocationType.count(OldId));
// The new context has the same allocation type as original.
ContextIdToAllocationType[LastContextId] = ContextIdToAllocationType[OldId];
+ // For now set this to 0 so we don't duplicate sizes. Not clear how to divvy
+ // up the size. Assume that if we are able to duplicate context ids that we
+ // will be...
[truncated]
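The patch is cut off above before the new reporting logic. Purely as an illustration of the report's shape — the format matches the SIZES test expectations quoted further down in this review, e.g. "Cold context 2 with total size 400 is Cold after cloning" — here is a self-contained sketch. Every name in it is hypothetical; the real code is the printTotalSizes hook in MemProfContextDisambiguation.cpp.

#include <cstdint>
#include <iostream>
#include <map>
#include <string>

// Hypothetical stand-in for the patch's per-context bookkeeping: the original
// allocation type from the profile, the profiled size tracked through the
// thin link, and the allocation type assigned once cloning is complete.
struct ContextSizeInfo {
  std::string OrigAllocType;
  uint64_t TotalSize;
  std::string AllocTypeAfterCloning;
};

// Emits one line per context in the same shape as the SIZES test checks.
void reportHintedSizes(const std::map<uint32_t, ContextSizeInfo> &Contexts) {
  for (const auto &[Id, C] : Contexts)
    std::cout << C.OrigAllocType << " context " << Id << " with total size "
              << C.TotalSize << " is " << C.AllocTypeAfterCloning
              << " after cloning\n";
}

int main() {
  reportHintedSizes({{1, {"NotCold", 100, "NotCold"}},
                     {2, {"Cold", 400, "Cold"}}});
  return 0;
}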
If I understand correctly, the approach is to optionally track total size if requested. What is the overhead for this extra tracking? Can we consider making it unconditionally available all the time?
      continue;
    if (!Node->IsAllocation)
      continue;
    auto ContextIds = Node->getContextIds();
Should we use a reference to avoid a copy? Also spelling out the type would be helpful in reasoning about the usage.
Added the type. Can't make it a reference since, as of another recent change, the node context ids are computed on the fly.
@@ -611,6 +614,8 @@ class CallsiteContextGraph {
  /// Map from each context ID to the AllocationType assigned to that context.
  DenseMap<uint32_t, AllocationType> ContextIdToAllocationType;

  std::map<uint32_t, uint64_t> ContextIdToTotalSize;
I think this can be an unordered map since we don't iterate on the contents?
Hmm, according to https://llvm.org/docs/ProgrammersManual.html#other-map-like-container-options:
"We never use containers like unordered_map because they are generally very expensive (each insertion requires a malloc)."
But I do see that I use unordered_map elsewhere in this file, probably due to a previous review comment. Is the advice in the documentation outdated?
In any case this should probably at least be a DenseMap, will make that change for now.
Also documented the new map.
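For reference, a sketch of what the documented member looks like after the switch to DenseMap; the doc-comment wording here is assumed, not quoted from the updated patch.

#include "llvm/ADT/DenseMap.h"
#include <cstdint>

/// Map from each context ID to the total profiled size of the corresponding
/// allocation context. Only populated when -memprof-report-hinted-sizes is
/// enabled.
static llvm::DenseMap<uint32_t, uint64_t> ContextIdToTotalSize;

// Usage shape (illustrative): record a size when a context is created, look
// it up when reporting after cloning.
static void recordSize(uint32_t ContextId, uint64_t TotalSize) {
  ContextIdToTotalSize[ContextId] = TotalSize;
}

static uint64_t lookupSize(uint32_t ContextId) {
  return ContextIdToTotalSize.lookup(ContextId); // returns 0 if not tracked
}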
@@ -935,6 +948,7 @@ ModuleSummaryIndex llvm::buildModuleSummaryIndex(
    CantBePromoted.insert(GV->getGUID());
    // Create the appropriate summary type.
    if (Function *F = dyn_cast<Function>(GV)) {
      std::vector<AllocInfo> EmptyAllocInfo;
I don't understand why we need an empty vector to move here. Can you explain?
It silenced compiler errors I got =). Seems that with the unique_ptr in the AllocInfo it no longer liked the aggregate initialization I had previously. Here's one of the errors:
/usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/stl_uninitialized.h:90:7: error: static assertion failed due to requirement 'is_constructible<llvm::AllocInfo, const llvm::AllocInfo &>::value': result type must be constructible from input type
static_assert(is_constructible<_ValueType, _Tp>::value,
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/stl_uninitialized.h:182:4: note: in instantiation of function template specialization 'std::__check_constructible<llvm::AllocInfo, const llvm::AllocInfo &>' requested here
= _GLIBCXX_USE_ASSIGN_FOR_INIT(_ValueType2, _From);
^
/usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/stl_uninitialized.h:101:13: note: expanded from macro '_GLIBCXX_USE_ASSIGN_FOR_INIT'
&& std::__check_constructible<T, U>()
^
/usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/stl_uninitialized.h:373:19: note: in instantiation of function template specialization 'std::uninitialized_copy<const llvm::AllocInfo *, llvm::AllocInfo *>' requested here
return std::uninitialized_copy(__first, __last, __result);
^
/usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/stl_vector.h:1692:11: note: in instantiation of function template specialization 'std::__uninitialized_copy_a<const llvm::AllocInfo *, llvm::AllocInfo *, llvm::AllocInfo>' requested here
std::__uninitialized_copy_a(__first, __last,
^
/usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/stl_vector.h:708:4: note: in instantiation of function template specialization 'std::vector<llvm::AllocInfo>::_M_range_initialize<const llvm::AllocInfo *>' requested here
_M_range_initialize(__first, __last,
^
llvm/include/llvm/ADT/ArrayRef.h:288:14: note: in instantiation of function template specialization 'std::vector<llvm::AllocInfo>::vector<const llvm::AllocInfo *, void>' requested here
return std::vector<T>(Data, Data+Length);
^
/usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/unique_ptr.h:1070:38: note: in instantiation of member function 'llvm::ArrayRef<llvm::AllocInfo>::operator vector' requested here
{ return unique_ptr<_Tp>(new _Tp(std::forward<_Args>(__args)...)); }
^
llvm/lib/Analysis/ModuleSummaryAnalysis.cpp:953:22: note: in instantiation of function template specialization 'std::make_unique<llvm::FunctionSummary, llvm::GlobalValueSummary::GVFlags &, int, llvm::FunctionSummary::FFlags, int, llvm::ArrayRef<llvm::ValueInfo>, llvm::ArrayRef<std::pair<llvm::ValueInfo, llvm::CalleeInfo>>, llvm::ArrayRef<unsigned long>, llvm::ArrayRef<llvm::FunctionSummary::VFuncId>, llvm::ArrayRef<llvm::FunctionSummary::VFuncId>, llvm::ArrayRef<llvm::FunctionSummary::ConstVCall>, llvm::ArrayRef<llvm::FunctionSummary::ConstVCall>, llvm::ArrayRef<llvm::FunctionSummary::ParamAccess>, llvm::ArrayRef<llvm::CallsiteInfo>, llvm::ArrayRef<llvm::AllocInfo>>' requested here
std::make_unique<FunctionSummary>(
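In essence, what the error above boils down to (a standalone illustration, not LLVM code; the names are stand-ins): a type that gains a std::unique_ptr member loses its implicit copy constructor, so a copying range-initialization like the ArrayRef-to-std::vector conversion no longer compiles.

#include <cstdint>
#include <memory>
#include <vector>

// Minimal stand-in for AllocInfo after adding the unique_ptr member.
struct Info {
  std::unique_ptr<std::vector<uint64_t>> TotalSizes; // non-copyable member,
};                                                   // so Info is non-copyable

int main() {
  std::vector<Info> Src(1);
  // The next line mimics ArrayRef<Info>::operator std::vector<Info>(), which
  // range-initializes a vector by *copying* each element -- and fails to
  // compile now that Info's copy constructor is implicitly deleted:
  // std::vector<Info> Dst(Src.begin(), Src.end());   // error
  std::vector<Info> Dst(std::make_move_iterator(Src.begin()),
                        std::make_move_iterator(Src.end())); // moving is fine
  return Dst.empty() ? 1 : 0;
}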
Thanks, it makes sense to me now. The copy-constructor for AllocInfo is deleted due to the new unique_ptr member. I think an empty std::vector can also be used to indicate non-availability of the sizes? It only adds an additional 16b per AllocInfo (sizeof std::vector is 24b I think) and would avoid some of the complexity and overhead of indirection. Given some of the other discussion about making this unconditional, what do you think about dropping the unique_ptr and making it a plain std::vector?
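For illustration, the simplified shape under that suggestion (a sketch mirroring the patch's names, not the actual declarations): an empty vector means sizes were not collected; otherwise it holds one entry per MIB, in the same order as the MIBs vector.

#include <cstdint>
#include <vector>

// Sketch of the simplified AllocInfo members: a plain vector replaces the
// unique_ptr<vector>. MIBInfo details are elided here.
struct MIBInfoSketch { /* alloc type + stack id indices elided */ };

struct AllocInfoSketch {
  std::vector<MIBInfoSketch> MIBs;
  // Empty when -memprof-report-hinted-sizes is off; otherwise parallel to
  // MIBs (same length and order), one total profiled size per MIB.
  std::vector<uint64_t> TotalSizes;
};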
Yeah, it looks like that doesn't affect the overhead noticeably, so I have gone ahead with that simplification.
I haven't measured it, but am concerned about adding any additional memory or bitcode size overhead for something that is only used under an option. Would prefer to leave it optional and revisit this later if we find more uses for the information.
✅ With the latest revision this PR passed the C/C++ code formatter.
@snehasish clang-format is irritated about my undoing the formatting changes it made. Are you ok with my putting those back?
As discussed offline, to keep things simple we can include the total size field in the records but just not carry the total size through when not requested, which avoids the need for separate record types. I measured the bitcode size overhead of including the size unconditionally with each MIB: even when 0-valued, i.e. with the option disabled, for a large project this caused a 2% (82M raw) aggregate increase in the summary bitcode file sizes. I instead settled on a different format: emit all the sizes together at the end of the alloc info record, so there are either 0 of them or one per MIB. This adds no extra overhead when there are no total sizes (option disabled), except that I now need one extra field in each per-module alloc info record giving the number of MIBs (the combined index alloc info record already included this). With that change I see almost 0% (100K raw) overhead when the option is disabled.
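Based on that description, the record layouts would look roughly like this (a sketch in the LLVMBitCodes.h comment style; the separate *_TOTAL_SIZES record types from the diff above go away under this scheme):

// Sketch of the layouts implied by the comment above (illustrative, not the
// final enum documentation):
//
// FS_PERMODULE_ALLOC_INFO:
//   [nummib,
//    nummib x (alloc type, numstackids, numstackids x stackidindex),
//    [nummib x total size]?]        // sizes appended only under the option
//
// FS_COMBINED_ALLOC_INFO:
//   [nummib, numver,
//    nummib x (alloc type, numstackids, numstackids x stackidindex),
//    numver x version,
//    [nummib x total size]?]        // sizes appended only under the option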
lgtm, with a few minor comments.
  if (!AE.TotalSizes.empty()) {
    OS << " TotalSizes per MIB:\n\t\t";
    First = true;
    for (auto &TS : AE.TotalSizes) {
nit: There's no benefit to using a reference here since we don't mutate the value and the uint64 is just as wide as the pointer (on 64-bit platforms). Using auto hides this. Maybe spell out the type and drop the reference?
done
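Concretely, the adjusted loop looks like this (a sketch mirroring the quoted hunk with the element type spelled out and the reference dropped; std::ostream stands in for raw_ostream so the snippet is self-contained):

#include <cstdint>
#include <ostream>
#include <vector>

// Copying a uint64_t is as cheap as copying a pointer, so take it by value.
void printTotalSizesPerMIB(std::ostream &OS,
                           const std::vector<uint64_t> &TotalSizes) {
  bool First = true;
  for (uint64_t TS : TotalSizes) {
    if (!First)
      OS << ", ";
    First = false;
    OS << TS << "\n";
  }
}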
; SIZES: NotCold context 1 with total size 100 is NotCold after cloning
; SIZES: Cold context 2 with total size 400 is Cold after cloning


Extra blank line.
fixed here and in other test
!4 = !{i64 9086428284934609951, i64 -5964873800580613432, i64 2732490490862098848, i64 8632435727821051414}
!5 = !{!6, !"cold"}
!5 = !{!6, !"cold", i64 400}
Is it possible to add another test case where the number of sizes listed is > 1?
That isn't possible. Each MIB only has one aggregate size from the profile.
This change XFAILs instrprof-gc-sections.c compiler-rt test case because it uses the `lld-available` lit.config tag which erroneously uses the `CMAKE_LINKER` if it is lld instead of checking for the latest and greatest lld available. A patch is in the works to fix this. This change is to bring the `clang-ppc64le-linux-test-suite` and `clang-ppc64le-linux-multistage` back to green as they were broken as of PR #98382
…O summaries (llvm#90692) This reverts commit d71771d. Retrying now that it seems increasing BitcodeSummaryVersion worked in llvm#98382