[memprof] Add Version2 of IndexedMemProfRecord serialization #87455


Merged
merged 7 commits into from
Apr 4, 2024

Conversation

kazutakahirata
Contributor

I'm currently developing a new version of the indexed memprof format
that deduplicates call stacks in IndexedAllocationInfo::CallStack
and IndexedMemProfRecord::CallSites. We refer to call stacks by
integer IDs, namely CallStackId, just as we refer to Frames by
FrameId. The deduplication cuts the profile file size by about 80%
on a large memprof file of mine.

As a step toward the goal, this patch teaches
IndexedMemProfRecord::{serialize,deserialize} to speak Version2. A
subsequent patch will add Version2 support to llvm-profdata.

The essence of the patch is to replace the serialization of a call
stack, a vector of FrameIds, with that of a CallStackId. That is:

  const IndexedAllocationInfo &N = ...;
  ...
  LE.write<uint64_t>(N.CallStack.size());
  for (const FrameId &Id : N.CallStack)
    LE.write<FrameId>(Id);

becomes:

  LE.write<CallStackId>(N.CSId);

Contributor

@teresajohnson teresajohnson left a comment


> A subsequent patch will add Version2 support to llvm-profdata.

Do you generally mean the reader / writer?

@kazutakahirata
Contributor Author

> > A subsequent patch will add Version2 support to llvm-profdata.
>
> Do you generally mean the reader / writer?

Yes.

Removed CallSites-based comparison.

Made Version a required parameter in
serialize/deserialize/serializedSize.
@llvmbot llvmbot added the PGO Profile Guided Optimizations label Apr 3, 2024
@llvmbot
Member

llvmbot commented Apr 3, 2024

@llvm/pr-subscribers-pgo

Author: Kazu Hirata (kazutakahirata)

Full diff: https://github.com/llvm/llvm-project/pull/87455.diff

3 Files Affected:

  • (modified) llvm/include/llvm/ProfileData/MemProf.h (+40-28)
  • (modified) llvm/lib/ProfileData/MemProf.cpp (+51-26)
  • (modified) llvm/unittests/ProfileData/MemProfTest.cpp (+38-3)
diff --git a/llvm/include/llvm/ProfileData/MemProf.h b/llvm/include/llvm/ProfileData/MemProf.h
index ff00900a1466a8..3e93f086cd0f20 100644
--- a/llvm/include/llvm/ProfileData/MemProf.h
+++ b/llvm/include/llvm/ProfileData/MemProf.h
@@ -22,6 +22,8 @@ enum IndexedVersion : uint64_t {
   Version0 = 0,
   // Version 1: Added a version field to the header.
   Version1 = 1,
+  // Version 2: Added a call stack table.  Under development.
+  Version2 = 2,
 };
 
 constexpr uint64_t MinimumSupportedVersion = Version0;
@@ -289,23 +291,28 @@ struct IndexedAllocationInfo {
       : CallStack(CS.begin(), CS.end()), CSId(CSId), Info(MB) {}
 
   // Returns the size in bytes when this allocation info struct is serialized.
-  size_t serializedSize() const {
-    return sizeof(uint64_t) + // The number of frames to serialize.
-           sizeof(FrameId) * CallStack.size() +    // The callstack frame ids.
-           PortableMemInfoBlock::serializedSize(); // The size of the payload.
+  size_t serializedSize(IndexedVersion Version) const {
+    size_t Size = 0;
+    if (Version <= Version1) {
+      // The number of frames to serialize.
+      Size += sizeof(uint64_t);
+      // The callstack frame ids.
+      Size += sizeof(FrameId) * CallStack.size();
+    } else {
+      // The CallStackId
+      Size += sizeof(CallStackId);
+    }
+    // The size of the payload.
+    Size += PortableMemInfoBlock::serializedSize();
+    return Size;
   }
 
   bool operator==(const IndexedAllocationInfo &Other) const {
     if (Other.Info != Info)
       return false;
 
-    if (Other.CallStack.size() != CallStack.size())
+    if (Other.CSId != CSId)
       return false;
-
-    for (size_t J = 0; J < Other.CallStack.size(); J++) {
-      if (Other.CallStack[J] != CallStack[J])
-        return false;
-    }
     return true;
   }
 
@@ -357,6 +364,9 @@ struct IndexedMemProfRecord {
   // inline location list may include additional entries, users should pick
   // the last entry in the list with the same function GUID.
   llvm::SmallVector<llvm::SmallVector<FrameId>> CallSites;
+  // Conceptually the same as above.  We are going to keep both CallSites and
+  // CallSiteIds while we are transitioning from CallSites to CallSitesIds.
+  llvm::SmallVector<CallStackId> CallSiteIds;
 
   void clear() {
     AllocSites.clear();
@@ -370,17 +380,22 @@ struct IndexedMemProfRecord {
     CallSites.append(Other.CallSites);
   }
 
-  size_t serializedSize() const {
+  size_t serializedSize(IndexedVersion Version) const {
     size_t Result = sizeof(GlobalValue::GUID);
     for (const IndexedAllocationInfo &N : AllocSites)
-      Result += N.serializedSize();
+      Result += N.serializedSize(Version);
 
     // The number of callsites we have information for.
     Result += sizeof(uint64_t);
-    for (const auto &Frames : CallSites) {
-      // The number of frame ids to serialize.
-      Result += sizeof(uint64_t);
-      Result += Frames.size() * sizeof(FrameId);
+    if (Version <= Version1) {
+      for (const auto &Frames : CallSites) {
+        // The number of frame ids to serialize.
+        Result += sizeof(uint64_t);
+        Result += Frames.size() * sizeof(FrameId);
+      }
+    } else {
+      // The CallStackId
+      Result += CallSiteIds.size() * sizeof(CallStackId);
     }
     return Result;
   }
@@ -389,28 +404,25 @@ struct IndexedMemProfRecord {
     if (Other.AllocSites.size() != AllocSites.size())
       return false;
 
-    if (Other.CallSites.size() != CallSites.size())
-      return false;
-
     for (size_t I = 0; I < AllocSites.size(); I++) {
       if (AllocSites[I] != Other.AllocSites[I])
         return false;
     }
 
-    for (size_t I = 0; I < CallSites.size(); I++) {
-      if (CallSites[I] != Other.CallSites[I])
-        return false;
-    }
+    if (Other.CallSiteIds != CallSiteIds)
+      return false;
     return true;
   }
 
   // Serializes the memprof records in \p Records to the ostream \p OS based
   // on the schema provided in \p Schema.
-  void serialize(const MemProfSchema &Schema, raw_ostream &OS);
+  void serialize(const MemProfSchema &Schema, raw_ostream &OS,
+                 IndexedVersion Version);
 
   // Deserializes memprof records from the Buffer.
   static IndexedMemProfRecord deserialize(const MemProfSchema &Schema,
-                                          const unsigned char *Buffer);
+                                          const unsigned char *Buffer,
+                                          IndexedVersion Version);
 
   // Returns the GUID for the function name after canonicalization. For
   // memprof, we remove any .llvm suffix added by LTO. MemProfRecords are
@@ -507,7 +519,7 @@ class RecordLookupTrait {
 
   data_type ReadData(uint64_t K, const unsigned char *D,
                      offset_type /*Unused*/) {
-    Record = IndexedMemProfRecord::deserialize(Schema, D);
+    Record = IndexedMemProfRecord::deserialize(Schema, D, Version1);
     return Record;
   }
 
@@ -546,7 +558,7 @@ class RecordWriterTrait {
     endian::Writer LE(Out, llvm::endianness::little);
     offset_type N = sizeof(K);
     LE.write<offset_type>(N);
-    offset_type M = V.serializedSize();
+    offset_type M = V.serializedSize(Version1);
     LE.write<offset_type>(M);
     return std::make_pair(N, M);
   }
@@ -560,7 +572,7 @@ class RecordWriterTrait {
   void EmitData(raw_ostream &Out, key_type_ref /*Unused*/, data_type_ref V,
                 offset_type /*Unused*/) {
     assert(Schema != nullptr && "MemProf schema is not initialized!");
-    V.serialize(*Schema, Out);
+    V.serialize(*Schema, Out, Version1);
     // Clear the IndexedMemProfRecord which results in clearing/freeing its
     // vectors of allocs and callsites. This is owned by the associated on-disk
     // hash table, but unused after this point. See also the comment added to
diff --git a/llvm/lib/ProfileData/MemProf.cpp b/llvm/lib/ProfileData/MemProf.cpp
index 6c419811d59e2f..e5c5fb58c25ab9 100644
--- a/llvm/lib/ProfileData/MemProf.cpp
+++ b/llvm/lib/ProfileData/MemProf.cpp
@@ -12,31 +12,42 @@ namespace llvm {
 namespace memprof {
 
 void IndexedMemProfRecord::serialize(const MemProfSchema &Schema,
-                                     raw_ostream &OS) {
+                                     raw_ostream &OS, IndexedVersion Version) {
   using namespace support;
 
   endian::Writer LE(OS, llvm::endianness::little);
 
   LE.write<uint64_t>(AllocSites.size());
   for (const IndexedAllocationInfo &N : AllocSites) {
-    LE.write<uint64_t>(N.CallStack.size());
-    for (const FrameId &Id : N.CallStack)
-      LE.write<FrameId>(Id);
+    if (Version <= Version1) {
+      LE.write<uint64_t>(N.CallStack.size());
+      for (const FrameId &Id : N.CallStack)
+        LE.write<FrameId>(Id);
+    } else {
+      LE.write<CallStackId>(N.CSId);
+    }
     N.Info.serialize(Schema, OS);
   }
 
   // Related contexts.
-  LE.write<uint64_t>(CallSites.size());
-  for (const auto &Frames : CallSites) {
-    LE.write<uint64_t>(Frames.size());
-    for (const FrameId &Id : Frames)
-      LE.write<FrameId>(Id);
+  if (Version <= Version1) {
+    LE.write<uint64_t>(CallSites.size());
+    for (const auto &Frames : CallSites) {
+      LE.write<uint64_t>(Frames.size());
+      for (const FrameId &Id : Frames)
+        LE.write<FrameId>(Id);
+    }
+  } else {
+    LE.write<uint64_t>(CallSiteIds.size());
+    for (const auto &CSId : CallSiteIds)
+      LE.write<CallStackId>(CSId);
   }
 }
 
 IndexedMemProfRecord
 IndexedMemProfRecord::deserialize(const MemProfSchema &Schema,
-                                  const unsigned char *Ptr) {
+                                  const unsigned char *Ptr,
+                                  IndexedVersion Version) {
   using namespace support;
 
   IndexedMemProfRecord Record;
@@ -46,14 +57,20 @@ IndexedMemProfRecord::deserialize(const MemProfSchema &Schema,
       endian::readNext<uint64_t, llvm::endianness::little, unaligned>(Ptr);
   for (uint64_t I = 0; I < NumNodes; I++) {
     IndexedAllocationInfo Node;
-    const uint64_t NumFrames =
-        endian::readNext<uint64_t, llvm::endianness::little, unaligned>(Ptr);
-    for (uint64_t J = 0; J < NumFrames; J++) {
-      const FrameId Id =
-          endian::readNext<FrameId, llvm::endianness::little, unaligned>(Ptr);
-      Node.CallStack.push_back(Id);
+    if (Version <= Version1) {
+      const uint64_t NumFrames =
+          endian::readNext<uint64_t, llvm::endianness::little, unaligned>(Ptr);
+      for (uint64_t J = 0; J < NumFrames; J++) {
+        const FrameId Id =
+            endian::readNext<FrameId, llvm::endianness::little, unaligned>(Ptr);
+        Node.CallStack.push_back(Id);
+      }
+      Node.CSId = hashCallStack(Node.CallStack);
+    } else {
+      Node.CSId =
+          endian::readNext<CallStackId, llvm::endianness::little, unaligned>(
+              Ptr);
     }
-    Node.CSId = hashCallStack(Node.CallStack);
     Node.Info.deserialize(Schema, Ptr);
     Ptr += PortableMemInfoBlock::serializedSize();
     Record.AllocSites.push_back(Node);
@@ -63,16 +80,24 @@ IndexedMemProfRecord::deserialize(const MemProfSchema &Schema,
   const uint64_t NumCtxs =
       endian::readNext<uint64_t, llvm::endianness::little, unaligned>(Ptr);
   for (uint64_t J = 0; J < NumCtxs; J++) {
-    const uint64_t NumFrames =
-        endian::readNext<uint64_t, llvm::endianness::little, unaligned>(Ptr);
-    llvm::SmallVector<FrameId> Frames;
-    Frames.reserve(NumFrames);
-    for (uint64_t K = 0; K < NumFrames; K++) {
-      const FrameId Id =
-          endian::readNext<FrameId, llvm::endianness::little, unaligned>(Ptr);
-      Frames.push_back(Id);
+    if (Version <= Version1) {
+      const uint64_t NumFrames =
+          endian::readNext<uint64_t, llvm::endianness::little, unaligned>(Ptr);
+      llvm::SmallVector<FrameId> Frames;
+      Frames.reserve(NumFrames);
+      for (uint64_t K = 0; K < NumFrames; K++) {
+        const FrameId Id =
+            endian::readNext<FrameId, llvm::endianness::little, unaligned>(Ptr);
+        Frames.push_back(Id);
+      }
+      Record.CallSites.push_back(Frames);
+      Record.CallSiteIds.push_back(hashCallStack(Frames));
+    } else {
+      CallStackId CSId =
+          endian::readNext<CallStackId, llvm::endianness::little, unaligned>(
+              Ptr);
+      Record.CallSiteIds.push_back(CSId);
     }
-    Record.CallSites.push_back(Frames);
   }
 
   return Record;
diff --git a/llvm/unittests/ProfileData/MemProfTest.cpp b/llvm/unittests/ProfileData/MemProfTest.cpp
index 1cca44e9b03707..f1aa6f37aa399f 100644
--- a/llvm/unittests/ProfileData/MemProfTest.cpp
+++ b/llvm/unittests/ProfileData/MemProfTest.cpp
@@ -265,7 +265,9 @@ TEST(MemProf, PortableWrapper) {
   EXPECT_EQ(3UL, ReadBlock.getAllocCpuId());
 }
 
-TEST(MemProf, RecordSerializationRoundTrip) {
+// Version0 and Version1 serialize IndexedMemProfRecord in the same format, so
+// we share one test.
+TEST(MemProf, RecordSerializationRoundTripVersion0And1) {
   const MemProfSchema Schema = getFullSchema();
 
   MemInfoBlock Info(/*size=*/16, /*access_count=*/7, /*alloc_timestamp=*/1000,
@@ -284,14 +286,47 @@ TEST(MemProf, RecordSerializationRoundTrip) {
                                    Info);
   }
   Record.CallSites.assign(CallSites);
+  for (const auto &CS : CallSites)
+    Record.CallSiteIds.push_back(llvm::memprof::hashCallStack(CS));
 
   std::string Buffer;
   llvm::raw_string_ostream OS(Buffer);
-  Record.serialize(Schema, OS);
+  Record.serialize(Schema, OS, llvm::memprof::Version0);
   OS.flush();
 
   const IndexedMemProfRecord GotRecord = IndexedMemProfRecord::deserialize(
-      Schema, reinterpret_cast<const unsigned char *>(Buffer.data()));
+      Schema, reinterpret_cast<const unsigned char *>(Buffer.data()),
+      llvm::memprof::Version0);
+
+  EXPECT_EQ(Record, GotRecord);
+}
+
+TEST(MemProf, RecordSerializationRoundTripVerion2) {
+  const MemProfSchema Schema = getFullSchema();
+
+  MemInfoBlock Info(/*size=*/16, /*access_count=*/7, /*alloc_timestamp=*/1000,
+                    /*dealloc_timestamp=*/2000, /*alloc_cpu=*/3,
+                    /*dealloc_cpu=*/4);
+
+  llvm::SmallVector<llvm::memprof::CallStackId> CallStackIds = {0x123, 0x456};
+
+  llvm::SmallVector<llvm::memprof::CallStackId> CallSiteIds = {0x333, 0x444};
+
+  IndexedMemProfRecord Record;
+  for (const auto &CSId : CallStackIds) {
+    // Use the same info block for both allocation sites.
+    Record.AllocSites.emplace_back(llvm::SmallVector<FrameId>(), CSId, Info);
+  }
+  Record.CallSiteIds.assign(CallSiteIds);
+
+  std::string Buffer;
+  llvm::raw_string_ostream OS(Buffer);
+  Record.serialize(Schema, OS, llvm::memprof::Version2);
+  OS.flush();
+
+  const IndexedMemProfRecord GotRecord = IndexedMemProfRecord::deserialize(
+      Schema, reinterpret_cast<const unsigned char *>(Buffer.data()),
+      llvm::memprof::Version2);
 
   EXPECT_EQ(Record, GotRecord);
 }

@kazutakahirata
Contributor Author

I believe I've addressed all the comments. Please take a look at the latest iteration. Thanks!

Contributor

@snehasish snehasish left a comment


I wonder if it makes sense to have a higher level distinction of versioning instead of pushing the condition down into each method. I'm trying to think about how difficult it will be in the future to evolve to the next version. For example, say we have serializedSizeV1() as the old implementation and serializedSizeV2() as the new implementation and serializedSize() calls one or the other based on the version. Pushing it down one level makes it easy to just delete the old function when we introduce v3. In this case the difference is not large since the logic varies only a little but it may make sense for more complex code. What do you both think?

@kazutakahirata
Contributor Author

> I wonder if it makes sense to have a higher level distinction of versioning instead of pushing the condition down into each method. I'm trying to think about how difficult it will be in the future to evolve to the next version. For example, say we have serializedSizeV1() as the old implementation and serializedSizeV2() as the new implementation and serializedSize() calls one or the other based on the version. Pushing it down one level makes it easy to just delete the old function when we introduce v3. In this case the difference is not large since the logic varies only a little but it may make sense for more complex code. What do you both think?

Do you mean something like this?

  size_t serializedSize(IndexedVersion Version) const {
    switch (Version) {
    case Version0:
    case Version1:
      return serializedSizeV0();  // Implemented elsewhere.
    case Version2:
      return serializedSizeV2();  // Implemented elsewhere.
    }
    llvm_unreachable("unsupported MemProf version");
  }

If so, I like the style. As you point out, it makes it easy to delete an old version later. If there is a lot of code duplication between two versions, we can always factor it out into a helper function, so duplication shouldn't be a big concern.

@kazutakahirata
Contributor Author

I've split serialize and its friends into separate functions, one for each version. Please take a look. Thanks!

@kazutakahirata
Contributor Author

I've moved version-specific implementation to MemProf.cpp. Please take a look. Thanks!

@kazutakahirata
Contributor Author

I've included changes to RecordLookupTrait and RecordWriterTrait in the latest iteration. Please take a look. Thanks!

Contributor

@teresajohnson teresajohnson left a comment


lgtm but I do have one question below.

@kazutakahirata kazutakahirata merged commit d89914f into llvm:main Apr 4, 2024
@kazutakahirata kazutakahirata deleted the pr_memprof_version2 branch April 4, 2024 04:48
Labels
PGO Profile Guided Optimizations