-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[memprof] Use std::vector<Frame> instead of llvm::SmallVector<Frame> (NFC) #94432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…(NFC) This patch replaces llvm::SmallVector<Frame> with std::vector<Frame>. llvm::SmallVector<Frame> sets aside one inline element. Meanwhile, when I sort all call stacks by their lengths, the length at the first percentile is already 2. That is, 99 percent of call stacks do not take advantage of the inline element. Using std::vector<Frame> reduces the cycle and instruction counts by 4.5% and 10.3%, respectively, with "llvm-profdata show" modified to deserialize all MemProfRecords.
@llvm/pr-subscribers-pgo @llvm/pr-subscribers-llvm-transforms Author: Kazu Hirata (kazutakahirata) ChangesThis patch replaces llvm::SmallVector<Frame> with std::vector<Frame>. llvm::SmallVector<Frame> sets aside one inline element. Meanwhile, Using std::vector<Frame> reduces the cycle and instruction counts by Full diff: https://github.com/llvm/llvm-project/pull/94432.diff 3 Files Affected:
diff --git a/llvm/include/llvm/ProfileData/MemProf.h b/llvm/include/llvm/ProfileData/MemProf.h
index 406144d9db1e8..667b9fa033d75 100644
--- a/llvm/include/llvm/ProfileData/MemProf.h
+++ b/llvm/include/llvm/ProfileData/MemProf.h
@@ -364,7 +364,7 @@ struct IndexedAllocationInfo {
// be used for temporary in-memory instances.
struct AllocationInfo {
// Same as IndexedAllocationInfo::CallStack with the frame contents inline.
- llvm::SmallVector<Frame> CallStack;
+ std::vector<Frame> CallStack;
// Same as IndexedAllocationInfo::Info;
PortableMemInfoBlock Info;
@@ -446,8 +446,7 @@ struct IndexedMemProfRecord {
// Convert IndexedMemProfRecord to MemProfRecord. Callback is used to
// translate CallStackId to call stacks with frames inline.
MemProfRecord toMemProfRecord(
- llvm::function_ref<llvm::SmallVector<Frame>(const CallStackId)> Callback)
- const;
+ llvm::function_ref<std::vector<Frame>(const CallStackId)> Callback) const;
// Returns the GUID for the function name after canonicalization. For
// memprof, we remove any .llvm suffix added by LTO. MemProfRecords are
@@ -462,7 +461,7 @@ struct MemProfRecord {
// Same as IndexedMemProfRecord::AllocSites with frame contents inline.
llvm::SmallVector<AllocationInfo> AllocSites;
// Same as IndexedMemProfRecord::CallSites with frame contents inline.
- llvm::SmallVector<llvm::SmallVector<Frame>> CallSites;
+ llvm::SmallVector<std::vector<Frame>> CallSites;
MemProfRecord() = default;
MemProfRecord(
@@ -472,7 +471,7 @@ struct MemProfRecord {
AllocSites.emplace_back(IndexedAI, IdToFrameCallback);
}
for (const ArrayRef<FrameId> Site : Record.CallSites) {
- llvm::SmallVector<Frame> Frames;
+ std::vector<Frame> Frames;
for (const FrameId Id : Site) {
Frames.push_back(IdToFrameCallback(Id));
}
@@ -490,7 +489,7 @@ struct MemProfRecord {
if (!CallSites.empty()) {
OS << " CallSites:\n";
- for (const llvm::SmallVector<Frame> &Frames : CallSites) {
+ for (const std::vector<Frame> &Frames : CallSites) {
for (const Frame &F : Frames) {
OS << " -\n";
F.printYAML(OS);
@@ -844,8 +843,8 @@ template <typename MapTy> struct CallStackIdConverter {
CallStackIdConverter(const CallStackIdConverter &) = delete;
CallStackIdConverter &operator=(const CallStackIdConverter &) = delete;
- llvm::SmallVector<Frame> operator()(CallStackId CSId) {
- llvm::SmallVector<Frame> Frames;
+ std::vector<Frame> operator()(CallStackId CSId) {
+ std::vector<Frame> Frames;
auto CSIter = Map.find(CSId);
if (CSIter == Map.end()) {
LastUnmappedId = CSId;
@@ -886,8 +885,8 @@ struct LinearCallStackIdConverter {
std::function<Frame(LinearFrameId)> FrameIdToFrame)
: CallStackBase(CallStackBase), FrameIdToFrame(FrameIdToFrame) {}
- llvm::SmallVector<Frame> operator()(LinearCallStackId LinearCSId) {
- llvm::SmallVector<Frame> Frames;
+ std::vector<Frame> operator()(LinearCallStackId LinearCSId) {
+ std::vector<Frame> Frames;
const unsigned char *Ptr =
CallStackBase +
diff --git a/llvm/lib/ProfileData/MemProf.cpp b/llvm/lib/ProfileData/MemProf.cpp
index 1d9860e0ea7e8..2b227a65c1d8f 100644
--- a/llvm/lib/ProfileData/MemProf.cpp
+++ b/llvm/lib/ProfileData/MemProf.cpp
@@ -338,8 +338,7 @@ IndexedMemProfRecord::deserialize(const MemProfSchema &Schema,
}
MemProfRecord IndexedMemProfRecord::toMemProfRecord(
- llvm::function_ref<llvm::SmallVector<Frame>(const CallStackId)> Callback)
- const {
+ llvm::function_ref<std::vector<Frame>(const CallStackId)> Callback) const {
MemProfRecord Record;
Record.AllocSites.reserve(AllocSites.size());
diff --git a/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp b/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp
index c0a3bf8464d2d..d70c6a7a0a152 100644
--- a/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp
+++ b/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp
@@ -759,7 +759,7 @@ static void readMemprof(Module &M, Function &F,
std::map<uint64_t, std::set<const AllocationInfo *>> LocHashToAllocInfo;
// For the callsites we need to record the index of the associated frame in
// the frame array (see comments below where the map entries are added).
- std::map<uint64_t, std::set<std::pair<const SmallVector<Frame> *, unsigned>>>
+ std::map<uint64_t, std::set<std::pair<const std::vector<Frame> *, unsigned>>>
LocHashToCallSites;
for (auto &AI : MemProfRec->AllocSites) {
// Associate the allocation info with the leaf frame. The later matching
@@ -815,7 +815,7 @@ static void readMemprof(Module &M, Function &F,
// and another callsite).
std::map<uint64_t, std::set<const AllocationInfo *>>::iterator
AllocInfoIter;
- std::map<uint64_t, std::set<std::pair<const SmallVector<Frame> *,
+ std::map<uint64_t, std::set<std::pair<const std::vector<Frame> *,
unsigned>>>::iterator CallSitesIter;
for (const DILocation *DIL = I.getDebugLoc(); DIL != nullptr;
DIL = DIL->getInlinedAt()) {
|
Consider |
That's super interesting. Thanks! I just tried |
I thought it allocated a few elements (I guess more than 2 from what you found below)?
I'm surprised that almost all are <=2: the allocation call contexts should almost always be longer than 2 frames in practice. Is this dominated by the CallSites frames? I can imagine those are short. Maybe that should use std::vector (or SmallVector<...,0>), and the AllocationInfo CallStack remain SmallVector? |
No, by default, the number of inlined elements is the maximum of:
So,
I think you've got the percentiles backwards. I'm sorting the call stacks in the ascending order of their lengths. Here are the stats for call stack lengths:
So, we rarely use the inlined element of With the smaller baseline cycle/instruction counts thanks to #94655, which just landed, using |
Oh got it, yep I mis-read.
|
My recollection was that even |
I've tried |
This patch replaces llvm::SmallVector with std::vector.
llvm::SmallVector sets aside one inline element. Meanwhile,
when I sort all call stacks by their lengths, the length at the first
percentile is already 2. That is, 99 percent of call stacks do not
take advantage of the inline element.
Using std::vector reduces the cycle and instruction counts by
11% and 22%, respectively, with "llvm-profdata show" modified to
deserialize all MemProfRecords.