[memprof] Add Version2 of IndexedMemProfRecord serialization #87455
Conversation
I'm currently developing a new version of the indexed memprof format where we deduplicate call stacks in `IndexedAllocationInfo::CallStack` and `IndexedMemProfRecord::CallSites`. We refer to call stacks with integer IDs, namely `CallStackId`, just as we refer to `Frame` with `FrameId`. The deduplication cuts the profile file size by 80% in a large memprof file of mine.

As a step toward that goal, this patch teaches `IndexedMemProfRecord::{serialize,deserialize}` to speak Version2. A subsequent patch will add Version2 support to llvm-profdata.

The essence of the patch is to replace the serialization of a call stack, a vector of `FrameId`s, with that of a `CallStackId`. That is:

```cpp
const IndexedAllocationInfo &N = ...;
...
LE.write<uint64_t>(N.CallStack.size());
for (const FrameId &Id : N.CallStack)
  LE.write<FrameId>(Id);
```

becomes:

```cpp
LE.write<CallStackId>(N.CSId);
```
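To make the size win concrete, here is a minimal standalone sketch of the two on-disk layouts. Everything in it (`writeLE64`, `serializeCallStackV1`, `serializeCallStackV2`) is a hypothetical stand-in for illustration, not the LLVM API; the real code uses `endian::Writer`, and the types are 64-bit hashes as in `MemProf.h`:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Stand-ins for LLVM's FrameId / CallStackId (both 64-bit hashes).
using FrameId = uint64_t;
using CallStackId = uint64_t;

// Append a little-endian 64-bit value to a byte buffer (what
// endian::Writer does in the real code).
static void writeLE64(std::string &Out, uint64_t V) {
  for (int I = 0; I < 8; ++I)
    Out.push_back(static_cast<char>((V >> (8 * I)) & 0xff));
}

// Version 0/1 layout: a length word followed by every FrameId inline.
std::string serializeCallStackV1(const std::vector<FrameId> &CallStack) {
  std::string Buffer;
  writeLE64(Buffer, CallStack.size());
  for (FrameId Id : CallStack)
    writeLE64(Buffer, Id);
  return Buffer;
}

// Version 2 layout: a single CallStackId referring to an entry in a
// separate, deduplicated call stack table.
std::string serializeCallStackV2(CallStackId CSId) {
  std::string Buffer;
  writeLE64(Buffer, CSId);
  return Buffer;
}
```

A three-frame call stack costs 32 bytes inline under Version 1 but a flat 8 bytes under Version 2, and repeated call stacks are stored once in the shared table, which is where the large reduction comes from.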
> A subsequent patch will add Version2 support to llvm-profdata.

Do you generally mean the reader / writer?

Yes.
Removed CallSites-based comparison. Made Version a required parameter in serialize/deserialize/serializedSize.
@llvm/pr-subscribers-pgo

Author: Kazu Hirata (kazutakahirata)

Full diff: https://github.com/llvm/llvm-project/pull/87455.diff (3 Files Affected)
diff --git a/llvm/include/llvm/ProfileData/MemProf.h b/llvm/include/llvm/ProfileData/MemProf.h
index ff00900a1466a8..3e93f086cd0f20 100644
--- a/llvm/include/llvm/ProfileData/MemProf.h
+++ b/llvm/include/llvm/ProfileData/MemProf.h
@@ -22,6 +22,8 @@ enum IndexedVersion : uint64_t {
Version0 = 0,
// Version 1: Added a version field to the header.
Version1 = 1,
+ // Version 2: Added a call stack table. Under development.
+ Version2 = 2,
};
constexpr uint64_t MinimumSupportedVersion = Version0;
@@ -289,23 +291,28 @@ struct IndexedAllocationInfo {
: CallStack(CS.begin(), CS.end()), CSId(CSId), Info(MB) {}
// Returns the size in bytes when this allocation info struct is serialized.
- size_t serializedSize() const {
- return sizeof(uint64_t) + // The number of frames to serialize.
- sizeof(FrameId) * CallStack.size() + // The callstack frame ids.
- PortableMemInfoBlock::serializedSize(); // The size of the payload.
+ size_t serializedSize(IndexedVersion Version) const {
+ size_t Size = 0;
+ if (Version <= Version1) {
+ // The number of frames to serialize.
+ Size += sizeof(uint64_t);
+ // The callstack frame ids.
+ Size += sizeof(FrameId) * CallStack.size();
+ } else {
+ // The CallStackId
+ Size += sizeof(CallStackId);
+ }
+ // The size of the payload.
+ Size += PortableMemInfoBlock::serializedSize();
+ return Size;
}
bool operator==(const IndexedAllocationInfo &Other) const {
if (Other.Info != Info)
return false;
- if (Other.CallStack.size() != CallStack.size())
+ if (Other.CSId != CSId)
return false;
-
- for (size_t J = 0; J < Other.CallStack.size(); J++) {
- if (Other.CallStack[J] != CallStack[J])
- return false;
- }
return true;
}
@@ -357,6 +364,9 @@ struct IndexedMemProfRecord {
// inline location list may include additional entries, users should pick
// the last entry in the list with the same function GUID.
llvm::SmallVector<llvm::SmallVector<FrameId>> CallSites;
+ // Conceptually the same as above. We are going to keep both CallSites and
+ // CallSiteIds while we are transitioning from CallSites to CallSiteIds.
+ llvm::SmallVector<CallStackId> CallSiteIds;
void clear() {
AllocSites.clear();
@@ -370,17 +380,22 @@ struct IndexedMemProfRecord {
CallSites.append(Other.CallSites);
}
- size_t serializedSize() const {
+ size_t serializedSize(IndexedVersion Version) const {
size_t Result = sizeof(GlobalValue::GUID);
for (const IndexedAllocationInfo &N : AllocSites)
- Result += N.serializedSize();
+ Result += N.serializedSize(Version);
// The number of callsites we have information for.
Result += sizeof(uint64_t);
- for (const auto &Frames : CallSites) {
- // The number of frame ids to serialize.
- Result += sizeof(uint64_t);
- Result += Frames.size() * sizeof(FrameId);
+ if (Version <= Version1) {
+ for (const auto &Frames : CallSites) {
+ // The number of frame ids to serialize.
+ Result += sizeof(uint64_t);
+ Result += Frames.size() * sizeof(FrameId);
+ }
+ } else {
+ // The CallStackId
+ Result += CallSiteIds.size() * sizeof(CallStackId);
}
return Result;
}
@@ -389,28 +404,25 @@ struct IndexedMemProfRecord {
if (Other.AllocSites.size() != AllocSites.size())
return false;
- if (Other.CallSites.size() != CallSites.size())
- return false;
-
for (size_t I = 0; I < AllocSites.size(); I++) {
if (AllocSites[I] != Other.AllocSites[I])
return false;
}
- for (size_t I = 0; I < CallSites.size(); I++) {
- if (CallSites[I] != Other.CallSites[I])
- return false;
- }
+ if (Other.CallSiteIds != CallSiteIds)
+ return false;
return true;
}
// Serializes the memprof records in \p Records to the ostream \p OS based
// on the schema provided in \p Schema.
- void serialize(const MemProfSchema &Schema, raw_ostream &OS);
+ void serialize(const MemProfSchema &Schema, raw_ostream &OS,
+ IndexedVersion Version);
// Deserializes memprof records from the Buffer.
static IndexedMemProfRecord deserialize(const MemProfSchema &Schema,
- const unsigned char *Buffer);
+ const unsigned char *Buffer,
+ IndexedVersion Version);
// Returns the GUID for the function name after canonicalization. For
// memprof, we remove any .llvm suffix added by LTO. MemProfRecords are
@@ -507,7 +519,7 @@ class RecordLookupTrait {
data_type ReadData(uint64_t K, const unsigned char *D,
offset_type /*Unused*/) {
- Record = IndexedMemProfRecord::deserialize(Schema, D);
+ Record = IndexedMemProfRecord::deserialize(Schema, D, Version1);
return Record;
}
@@ -546,7 +558,7 @@ class RecordWriterTrait {
endian::Writer LE(Out, llvm::endianness::little);
offset_type N = sizeof(K);
LE.write<offset_type>(N);
- offset_type M = V.serializedSize();
+ offset_type M = V.serializedSize(Version1);
LE.write<offset_type>(M);
return std::make_pair(N, M);
}
@@ -560,7 +572,7 @@ class RecordWriterTrait {
void EmitData(raw_ostream &Out, key_type_ref /*Unused*/, data_type_ref V,
offset_type /*Unused*/) {
assert(Schema != nullptr && "MemProf schema is not initialized!");
- V.serialize(*Schema, Out);
+ V.serialize(*Schema, Out, Version1);
// Clear the IndexedMemProfRecord which results in clearing/freeing its
// vectors of allocs and callsites. This is owned by the associated on-disk
// hash table, but unused after this point. See also the comment added to
diff --git a/llvm/lib/ProfileData/MemProf.cpp b/llvm/lib/ProfileData/MemProf.cpp
index 6c419811d59e2f..e5c5fb58c25ab9 100644
--- a/llvm/lib/ProfileData/MemProf.cpp
+++ b/llvm/lib/ProfileData/MemProf.cpp
@@ -12,31 +12,42 @@ namespace llvm {
namespace memprof {
void IndexedMemProfRecord::serialize(const MemProfSchema &Schema,
- raw_ostream &OS) {
+ raw_ostream &OS, IndexedVersion Version) {
using namespace support;
endian::Writer LE(OS, llvm::endianness::little);
LE.write<uint64_t>(AllocSites.size());
for (const IndexedAllocationInfo &N : AllocSites) {
- LE.write<uint64_t>(N.CallStack.size());
- for (const FrameId &Id : N.CallStack)
- LE.write<FrameId>(Id);
+ if (Version <= Version1) {
+ LE.write<uint64_t>(N.CallStack.size());
+ for (const FrameId &Id : N.CallStack)
+ LE.write<FrameId>(Id);
+ } else {
+ LE.write<CallStackId>(N.CSId);
+ }
N.Info.serialize(Schema, OS);
}
// Related contexts.
- LE.write<uint64_t>(CallSites.size());
- for (const auto &Frames : CallSites) {
- LE.write<uint64_t>(Frames.size());
- for (const FrameId &Id : Frames)
- LE.write<FrameId>(Id);
+ if (Version <= Version1) {
+ LE.write<uint64_t>(CallSites.size());
+ for (const auto &Frames : CallSites) {
+ LE.write<uint64_t>(Frames.size());
+ for (const FrameId &Id : Frames)
+ LE.write<FrameId>(Id);
+ }
+ } else {
+ LE.write<uint64_t>(CallSiteIds.size());
+ for (const auto &CSId : CallSiteIds)
+ LE.write<CallStackId>(CSId);
}
}
IndexedMemProfRecord
IndexedMemProfRecord::deserialize(const MemProfSchema &Schema,
- const unsigned char *Ptr) {
+ const unsigned char *Ptr,
+ IndexedVersion Version) {
using namespace support;
IndexedMemProfRecord Record;
@@ -46,14 +57,20 @@ IndexedMemProfRecord::deserialize(const MemProfSchema &Schema,
endian::readNext<uint64_t, llvm::endianness::little, unaligned>(Ptr);
for (uint64_t I = 0; I < NumNodes; I++) {
IndexedAllocationInfo Node;
- const uint64_t NumFrames =
- endian::readNext<uint64_t, llvm::endianness::little, unaligned>(Ptr);
- for (uint64_t J = 0; J < NumFrames; J++) {
- const FrameId Id =
- endian::readNext<FrameId, llvm::endianness::little, unaligned>(Ptr);
- Node.CallStack.push_back(Id);
+ if (Version <= Version1) {
+ const uint64_t NumFrames =
+ endian::readNext<uint64_t, llvm::endianness::little, unaligned>(Ptr);
+ for (uint64_t J = 0; J < NumFrames; J++) {
+ const FrameId Id =
+ endian::readNext<FrameId, llvm::endianness::little, unaligned>(Ptr);
+ Node.CallStack.push_back(Id);
+ }
+ Node.CSId = hashCallStack(Node.CallStack);
+ } else {
+ Node.CSId =
+ endian::readNext<CallStackId, llvm::endianness::little, unaligned>(
+ Ptr);
}
- Node.CSId = hashCallStack(Node.CallStack);
Node.Info.deserialize(Schema, Ptr);
Ptr += PortableMemInfoBlock::serializedSize();
Record.AllocSites.push_back(Node);
@@ -63,16 +80,24 @@ IndexedMemProfRecord::deserialize(const MemProfSchema &Schema,
const uint64_t NumCtxs =
endian::readNext<uint64_t, llvm::endianness::little, unaligned>(Ptr);
for (uint64_t J = 0; J < NumCtxs; J++) {
- const uint64_t NumFrames =
- endian::readNext<uint64_t, llvm::endianness::little, unaligned>(Ptr);
- llvm::SmallVector<FrameId> Frames;
- Frames.reserve(NumFrames);
- for (uint64_t K = 0; K < NumFrames; K++) {
- const FrameId Id =
- endian::readNext<FrameId, llvm::endianness::little, unaligned>(Ptr);
- Frames.push_back(Id);
+ if (Version <= Version1) {
+ const uint64_t NumFrames =
+ endian::readNext<uint64_t, llvm::endianness::little, unaligned>(Ptr);
+ llvm::SmallVector<FrameId> Frames;
+ Frames.reserve(NumFrames);
+ for (uint64_t K = 0; K < NumFrames; K++) {
+ const FrameId Id =
+ endian::readNext<FrameId, llvm::endianness::little, unaligned>(Ptr);
+ Frames.push_back(Id);
+ }
+ Record.CallSites.push_back(Frames);
+ Record.CallSiteIds.push_back(hashCallStack(Frames));
+ } else {
+ CallStackId CSId =
+ endian::readNext<CallStackId, llvm::endianness::little, unaligned>(
+ Ptr);
+ Record.CallSiteIds.push_back(CSId);
}
- Record.CallSites.push_back(Frames);
}
return Record;
diff --git a/llvm/unittests/ProfileData/MemProfTest.cpp b/llvm/unittests/ProfileData/MemProfTest.cpp
index 1cca44e9b03707..f1aa6f37aa399f 100644
--- a/llvm/unittests/ProfileData/MemProfTest.cpp
+++ b/llvm/unittests/ProfileData/MemProfTest.cpp
@@ -265,7 +265,9 @@ TEST(MemProf, PortableWrapper) {
EXPECT_EQ(3UL, ReadBlock.getAllocCpuId());
}
-TEST(MemProf, RecordSerializationRoundTrip) {
+// Version0 and Version1 serialize IndexedMemProfRecord in the same format, so
+// we share one test.
+TEST(MemProf, RecordSerializationRoundTripVersion0And1) {
const MemProfSchema Schema = getFullSchema();
MemInfoBlock Info(/*size=*/16, /*access_count=*/7, /*alloc_timestamp=*/1000,
@@ -284,14 +286,47 @@ TEST(MemProf, RecordSerializationRoundTrip) {
Info);
}
Record.CallSites.assign(CallSites);
+ for (const auto &CS : CallSites)
+ Record.CallSiteIds.push_back(llvm::memprof::hashCallStack(CS));
std::string Buffer;
llvm::raw_string_ostream OS(Buffer);
- Record.serialize(Schema, OS);
+ Record.serialize(Schema, OS, llvm::memprof::Version0);
OS.flush();
const IndexedMemProfRecord GotRecord = IndexedMemProfRecord::deserialize(
- Schema, reinterpret_cast<const unsigned char *>(Buffer.data()));
+ Schema, reinterpret_cast<const unsigned char *>(Buffer.data()),
+ llvm::memprof::Version0);
+
+ EXPECT_EQ(Record, GotRecord);
+}
+
+TEST(MemProf, RecordSerializationRoundTripVersion2) {
+ const MemProfSchema Schema = getFullSchema();
+
+ MemInfoBlock Info(/*size=*/16, /*access_count=*/7, /*alloc_timestamp=*/1000,
+ /*dealloc_timestamp=*/2000, /*alloc_cpu=*/3,
+ /*dealloc_cpu=*/4);
+
+ llvm::SmallVector<llvm::memprof::CallStackId> CallStackIds = {0x123, 0x456};
+
+ llvm::SmallVector<llvm::memprof::CallStackId> CallSiteIds = {0x333, 0x444};
+
+ IndexedMemProfRecord Record;
+ for (const auto &CSId : CallStackIds) {
+ // Use the same info block for both allocation sites.
+ Record.AllocSites.emplace_back(llvm::SmallVector<FrameId>(), CSId, Info);
+ }
+ Record.CallSiteIds.assign(CallSiteIds);
+
+ std::string Buffer;
+ llvm::raw_string_ostream OS(Buffer);
+ Record.serialize(Schema, OS, llvm::memprof::Version2);
+ OS.flush();
+
+ const IndexedMemProfRecord GotRecord = IndexedMemProfRecord::deserialize(
+ Schema, reinterpret_cast<const unsigned char *>(Buffer.data()),
+ llvm::memprof::Version2);
EXPECT_EQ(Record, GotRecord);
}
I believe I've addressed all the comments. Please take a look at the latest iteration. Thanks!
I wonder if it makes sense to have a higher-level distinction of versioning instead of pushing the condition down into each method. I'm trying to think about how difficult it will be in the future to evolve to the next version. For example, say we have `serializedSizeV1()` as the old implementation and `serializedSizeV2()` as the new implementation, and `serializedSize()` calls one or the other based on the version. Pushing it down one level makes it easy to just delete the old function when we introduce v3. In this case the difference is not large since the logic varies only a little, but it may make sense for more complex code. What do you both think?
Do you mean something like this?

If so, I like the style. As you point out, I like the ability to easily delete an old version. If there is a lot of code duplication between two versions, we can always factor that out into a helper function, so the code duplication shouldn't really be a big concern.
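The per-version split described in the comment might look roughly like the following sketch. It is a hypothetical, simplified stand-in (the `AllocInfoSketch` struct, `NumFrames`, and `PayloadSize` fields are invented for illustration and are not the PR's code), but it shows the dispatch shape being discussed:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum IndexedVersion : uint64_t { Version0 = 0, Version1 = 1, Version2 = 2 };

using FrameId = uint64_t;
using CallStackId = uint64_t;

// Each private helper owns exactly one format; the public entry point
// only dispatches on the version.
struct AllocInfoSketch {
  size_t NumFrames = 0;   // stand-in for CallStack.size()
  size_t PayloadSize = 0; // stand-in for the PortableMemInfoBlock size

  size_t serializedSizeV1() const {
    // Length word + inline frame ids + payload.
    return sizeof(uint64_t) + sizeof(FrameId) * NumFrames + PayloadSize;
  }

  size_t serializedSizeV2() const {
    // Just the call stack id + payload.
    return sizeof(CallStackId) + PayloadSize;
  }

  size_t serializedSize(IndexedVersion Version) const {
    // Retiring a version later means deleting one helper and one
    // branch here, instead of untangling interleaved conditionals.
    return Version <= Version1 ? serializedSizeV1() : serializedSizeV2();
  }
};
```

For a record with three frames and a 16-byte payload, this yields 48 bytes under Version 0/1 versus 24 bytes under Version 2, and the versioned branching stays in one place.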
I've split

I've moved version-specific implementation to

I've included changes to
lgtm but I do have one question below.