[memprof] Add Version2 of IndexedMemProfRecord serialization #87455


Merged
merged 7 commits into from
Apr 4, 2024

Conversation

kazutakahirata
Contributor

I'm currently developing a new version of the indexed memprof format
that deduplicates call stacks in IndexedAllocationInfo::CallStack
and IndexedMemProfRecord::CallSites. We refer to call stacks by
integer IDs, namely CallStackId, just as we refer to Frames by
FrameId. The deduplication cuts the profile file size by about 80%
on a large memprof file of mine.

As a step toward the goal, this patch teaches
IndexedMemProfRecord::{serialize,deserialize} to speak Version2. A
subsequent patch will add Version2 support to llvm-profdata.

The essence of the patch is to replace the serialization of a call
stack, a vector of FrameIds, with that of a CallStackId. That is:

  const IndexedAllocationInfo &N = ...;
  ...
  LE.write<uint64_t>(N.CallStack.size());
  for (const FrameId &Id : N.CallStack)
    LE.write<FrameId>(Id);

becomes:

  LE.write<CallStackId>(N.CSId);

Contributor

@teresajohnson teresajohnson left a comment


> A subsequent patch will add Version2 support to llvm-profdata.

Do you generally mean the reader / writer?

@kazutakahirata
Contributor Author

> > A subsequent patch will add Version2 support to llvm-profdata.
>
> Do you generally mean the reader / writer?

Yes.

Removed CallSites-based comparison.

Made Version a required parameter in
serialize/deserialize/serializedSize.
@llvmbot llvmbot added the PGO Profile Guided Optimizations label Apr 3, 2024
@llvmbot
Member

llvmbot commented Apr 3, 2024

@llvm/pr-subscribers-pgo

Author: Kazu Hirata (kazutakahirata)

Full diff: https://github.com/llvm/llvm-project/pull/87455.diff

3 Files Affected:

  • (modified) llvm/include/llvm/ProfileData/MemProf.h (+40-28)
  • (modified) llvm/lib/ProfileData/MemProf.cpp (+51-26)
  • (modified) llvm/unittests/ProfileData/MemProfTest.cpp (+38-3)
diff --git a/llvm/include/llvm/ProfileData/MemProf.h b/llvm/include/llvm/ProfileData/MemProf.h
index ff00900a1466a8..3e93f086cd0f20 100644
--- a/llvm/include/llvm/ProfileData/MemProf.h
+++ b/llvm/include/llvm/ProfileData/MemProf.h
@@ -22,6 +22,8 @@ enum IndexedVersion : uint64_t {
   Version0 = 0,
   // Version 1: Added a version field to the header.
   Version1 = 1,
+  // Version 2: Added a call stack table.  Under development.
+  Version2 = 2,
 };
 
 constexpr uint64_t MinimumSupportedVersion = Version0;
@@ -289,23 +291,28 @@ struct IndexedAllocationInfo {
       : CallStack(CS.begin(), CS.end()), CSId(CSId), Info(MB) {}
 
   // Returns the size in bytes when this allocation info struct is serialized.
-  size_t serializedSize() const {
-    return sizeof(uint64_t) + // The number of frames to serialize.
-           sizeof(FrameId) * CallStack.size() +    // The callstack frame ids.
-           PortableMemInfoBlock::serializedSize(); // The size of the payload.
+  size_t serializedSize(IndexedVersion Version) const {
+    size_t Size = 0;
+    if (Version <= Version1) {
+      // The number of frames to serialize.
+      Size += sizeof(uint64_t);
+      // The callstack frame ids.
+      Size += sizeof(FrameId) * CallStack.size();
+    } else {
+      // The CallStackId
+      Size += sizeof(CallStackId);
+    }
+    // The size of the payload.
+    Size += PortableMemInfoBlock::serializedSize();
+    return Size;
   }
 
   bool operator==(const IndexedAllocationInfo &Other) const {
     if (Other.Info != Info)
       return false;
 
-    if (Other.CallStack.size() != CallStack.size())
+    if (Other.CSId != CSId)
       return false;
-
-    for (size_t J = 0; J < Other.CallStack.size(); J++) {
-      if (Other.CallStack[J] != CallStack[J])
-        return false;
-    }
     return true;
   }
 
@@ -357,6 +364,9 @@ struct IndexedMemProfRecord {
   // inline location list may include additional entries, users should pick
   // the last entry in the list with the same function GUID.
   llvm::SmallVector<llvm::SmallVector<FrameId>> CallSites;
+  // Conceptually the same as above.  We are going to keep both CallSites and
+  // CallSiteIds while we are transitioning from CallSites to CallSitesIds.
+  llvm::SmallVector<CallStackId> CallSiteIds;
 
   void clear() {
     AllocSites.clear();
@@ -370,17 +380,22 @@ struct IndexedMemProfRecord {
     CallSites.append(Other.CallSites);
   }
 
-  size_t serializedSize() const {
+  size_t serializedSize(IndexedVersion Version) const {
     size_t Result = sizeof(GlobalValue::GUID);
     for (const IndexedAllocationInfo &N : AllocSites)
-      Result += N.serializedSize();
+      Result += N.serializedSize(Version);
 
     // The number of callsites we have information for.
     Result += sizeof(uint64_t);
-    for (const auto &Frames : CallSites) {
-      // The number of frame ids to serialize.
-      Result += sizeof(uint64_t);
-      Result += Frames.size() * sizeof(FrameId);
+    if (Version <= Version1) {
+      for (const auto &Frames : CallSites) {
+        // The number of frame ids to serialize.
+        Result += sizeof(uint64_t);
+        Result += Frames.size() * sizeof(FrameId);
+      }
+    } else {
+      // The CallStackId
+      Result += CallSiteIds.size() * sizeof(CallStackId);
     }
     return Result;
   }
@@ -389,28 +404,25 @@ struct IndexedMemProfRecord {
     if (Other.AllocSites.size() != AllocSites.size())
       return false;
 
-    if (Other.CallSites.size() != CallSites.size())
-      return false;
-
     for (size_t I = 0; I < AllocSites.size(); I++) {
       if (AllocSites[I] != Other.AllocSites[I])
         return false;
     }
 
-    for (size_t I = 0; I < CallSites.size(); I++) {
-      if (CallSites[I] != Other.CallSites[I])
-        return false;
-    }
+    if (Other.CallSiteIds != CallSiteIds)
+      return false;
     return true;
   }
 
   // Serializes the memprof records in \p Records to the ostream \p OS based
   // on the schema provided in \p Schema.
-  void serialize(const MemProfSchema &Schema, raw_ostream &OS);
+  void serialize(const MemProfSchema &Schema, raw_ostream &OS,
+                 IndexedVersion Version);
 
   // Deserializes memprof records from the Buffer.
   static IndexedMemProfRecord deserialize(const MemProfSchema &Schema,
-                                          const unsigned char *Buffer);
+                                          const unsigned char *Buffer,
+                                          IndexedVersion Version);
 
   // Returns the GUID for the function name after canonicalization. For
   // memprof, we remove any .llvm suffix added by LTO. MemProfRecords are
@@ -507,7 +519,7 @@ class RecordLookupTrait {
 
   data_type ReadData(uint64_t K, const unsigned char *D,
                      offset_type /*Unused*/) {
-    Record = IndexedMemProfRecord::deserialize(Schema, D);
+    Record = IndexedMemProfRecord::deserialize(Schema, D, Version1);
     return Record;
   }
 
@@ -546,7 +558,7 @@ class RecordWriterTrait {
     endian::Writer LE(Out, llvm::endianness::little);
     offset_type N = sizeof(K);
     LE.write<offset_type>(N);
-    offset_type M = V.serializedSize();
+    offset_type M = V.serializedSize(Version1);
     LE.write<offset_type>(M);
     return std::make_pair(N, M);
   }
@@ -560,7 +572,7 @@ class RecordWriterTrait {
   void EmitData(raw_ostream &Out, key_type_ref /*Unused*/, data_type_ref V,
                 offset_type /*Unused*/) {
     assert(Schema != nullptr && "MemProf schema is not initialized!");
-    V.serialize(*Schema, Out);
+    V.serialize(*Schema, Out, Version1);
     // Clear the IndexedMemProfRecord which results in clearing/freeing its
     // vectors of allocs and callsites. This is owned by the associated on-disk
     // hash table, but unused after this point. See also the comment added to
diff --git a/llvm/lib/ProfileData/MemProf.cpp b/llvm/lib/ProfileData/MemProf.cpp
index 6c419811d59e2f..e5c5fb58c25ab9 100644
--- a/llvm/lib/ProfileData/MemProf.cpp
+++ b/llvm/lib/ProfileData/MemProf.cpp
@@ -12,31 +12,42 @@ namespace llvm {
 namespace memprof {
 
 void IndexedMemProfRecord::serialize(const MemProfSchema &Schema,
-                                     raw_ostream &OS) {
+                                     raw_ostream &OS, IndexedVersion Version) {
   using namespace support;
 
   endian::Writer LE(OS, llvm::endianness::little);
 
   LE.write<uint64_t>(AllocSites.size());
   for (const IndexedAllocationInfo &N : AllocSites) {
-    LE.write<uint64_t>(N.CallStack.size());
-    for (const FrameId &Id : N.CallStack)
-      LE.write<FrameId>(Id);
+    if (Version <= Version1) {
+      LE.write<uint64_t>(N.CallStack.size());
+      for (const FrameId &Id : N.CallStack)
+        LE.write<FrameId>(Id);
+    } else {
+      LE.write<CallStackId>(N.CSId);
+    }
     N.Info.serialize(Schema, OS);
   }
 
   // Related contexts.
-  LE.write<uint64_t>(CallSites.size());
-  for (const auto &Frames : CallSites) {
-    LE.write<uint64_t>(Frames.size());
-    for (const FrameId &Id : Frames)
-      LE.write<FrameId>(Id);
+  if (Version <= Version1) {
+    LE.write<uint64_t>(CallSites.size());
+    for (const auto &Frames : CallSites) {
+      LE.write<uint64_t>(Frames.size());
+      for (const FrameId &Id : Frames)
+        LE.write<FrameId>(Id);
+    }
+  } else {
+    LE.write<uint64_t>(CallSiteIds.size());
+    for (const auto &CSId : CallSiteIds)
+      LE.write<CallStackId>(CSId);
   }
 }
 
 IndexedMemProfRecord
 IndexedMemProfRecord::deserialize(const MemProfSchema &Schema,
-                                  const unsigned char *Ptr) {
+                                  const unsigned char *Ptr,
+                                  IndexedVersion Version) {
   using namespace support;
 
   IndexedMemProfRecord Record;
@@ -46,14 +57,20 @@ IndexedMemProfRecord::deserialize(const MemProfSchema &Schema,
       endian::readNext<uint64_t, llvm::endianness::little, unaligned>(Ptr);
   for (uint64_t I = 0; I < NumNodes; I++) {
     IndexedAllocationInfo Node;
-    const uint64_t NumFrames =
-        endian::readNext<uint64_t, llvm::endianness::little, unaligned>(Ptr);
-    for (uint64_t J = 0; J < NumFrames; J++) {
-      const FrameId Id =
-          endian::readNext<FrameId, llvm::endianness::little, unaligned>(Ptr);
-      Node.CallStack.push_back(Id);
+    if (Version <= Version1) {
+      const uint64_t NumFrames =
+          endian::readNext<uint64_t, llvm::endianness::little, unaligned>(Ptr);
+      for (uint64_t J = 0; J < NumFrames; J++) {
+        const FrameId Id =
+            endian::readNext<FrameId, llvm::endianness::little, unaligned>(Ptr);
+        Node.CallStack.push_back(Id);
+      }
+      Node.CSId = hashCallStack(Node.CallStack);
+    } else {
+      Node.CSId =
+          endian::readNext<CallStackId, llvm::endianness::little, unaligned>(
+              Ptr);
     }
-    Node.CSId = hashCallStack(Node.CallStack);
     Node.Info.deserialize(Schema, Ptr);
     Ptr += PortableMemInfoBlock::serializedSize();
     Record.AllocSites.push_back(Node);
@@ -63,16 +80,24 @@ IndexedMemProfRecord::deserialize(const MemProfSchema &Schema,
   const uint64_t NumCtxs =
       endian::readNext<uint64_t, llvm::endianness::little, unaligned>(Ptr);
   for (uint64_t J = 0; J < NumCtxs; J++) {
-    const uint64_t NumFrames =
-        endian::readNext<uint64_t, llvm::endianness::little, unaligned>(Ptr);
-    llvm::SmallVector<FrameId> Frames;
-    Frames.reserve(NumFrames);
-    for (uint64_t K = 0; K < NumFrames; K++) {
-      const FrameId Id =
-          endian::readNext<FrameId, llvm::endianness::little, unaligned>(Ptr);
-      Frames.push_back(Id);
+    if (Version <= Version1) {
+      const uint64_t NumFrames =
+          endian::readNext<uint64_t, llvm::endianness::little, unaligned>(Ptr);
+      llvm::SmallVector<FrameId> Frames;
+      Frames.reserve(NumFrames);
+      for (uint64_t K = 0; K < NumFrames; K++) {
+        const FrameId Id =
+            endian::readNext<FrameId, llvm::endianness::little, unaligned>(Ptr);
+        Frames.push_back(Id);
+      }
+      Record.CallSites.push_back(Frames);
+      Record.CallSiteIds.push_back(hashCallStack(Frames));
+    } else {
+      CallStackId CSId =
+          endian::readNext<CallStackId, llvm::endianness::little, unaligned>(
+              Ptr);
+      Record.CallSiteIds.push_back(CSId);
     }
-    Record.CallSites.push_back(Frames);
   }
 
   return Record;
diff --git a/llvm/unittests/ProfileData/MemProfTest.cpp b/llvm/unittests/ProfileData/MemProfTest.cpp
index 1cca44e9b03707..f1aa6f37aa399f 100644
--- a/llvm/unittests/ProfileData/MemProfTest.cpp
+++ b/llvm/unittests/ProfileData/MemProfTest.cpp
@@ -265,7 +265,9 @@ TEST(MemProf, PortableWrapper) {
   EXPECT_EQ(3UL, ReadBlock.getAllocCpuId());
 }
 
-TEST(MemProf, RecordSerializationRoundTrip) {
+// Version0 and Version1 serialize IndexedMemProfRecord in the same format, so
+// we share one test.
+TEST(MemProf, RecordSerializationRoundTripVersion0And1) {
   const MemProfSchema Schema = getFullSchema();
 
   MemInfoBlock Info(/*size=*/16, /*access_count=*/7, /*alloc_timestamp=*/1000,
@@ -284,14 +286,47 @@ TEST(MemProf, RecordSerializationRoundTrip) {
                                    Info);
   }
   Record.CallSites.assign(CallSites);
+  for (const auto &CS : CallSites)
+    Record.CallSiteIds.push_back(llvm::memprof::hashCallStack(CS));
 
   std::string Buffer;
   llvm::raw_string_ostream OS(Buffer);
-  Record.serialize(Schema, OS);
+  Record.serialize(Schema, OS, llvm::memprof::Version0);
   OS.flush();
 
   const IndexedMemProfRecord GotRecord = IndexedMemProfRecord::deserialize(
-      Schema, reinterpret_cast<const unsigned char *>(Buffer.data()));
+      Schema, reinterpret_cast<const unsigned char *>(Buffer.data()),
+      llvm::memprof::Version0);
+
+  EXPECT_EQ(Record, GotRecord);
+}
+
+TEST(MemProf, RecordSerializationRoundTripVerion2) {
+  const MemProfSchema Schema = getFullSchema();
+
+  MemInfoBlock Info(/*size=*/16, /*access_count=*/7, /*alloc_timestamp=*/1000,
+                    /*dealloc_timestamp=*/2000, /*alloc_cpu=*/3,
+                    /*dealloc_cpu=*/4);
+
+  llvm::SmallVector<llvm::memprof::CallStackId> CallStackIds = {0x123, 0x456};
+
+  llvm::SmallVector<llvm::memprof::CallStackId> CallSiteIds = {0x333, 0x444};
+
+  IndexedMemProfRecord Record;
+  for (const auto &CSId : CallStackIds) {
+    // Use the same info block for both allocation sites.
+    Record.AllocSites.emplace_back(llvm::SmallVector<FrameId>(), CSId, Info);
+  }
+  Record.CallSiteIds.assign(CallSiteIds);
+
+  std::string Buffer;
+  llvm::raw_string_ostream OS(Buffer);
+  Record.serialize(Schema, OS, llvm::memprof::Version2);
+  OS.flush();
+
+  const IndexedMemProfRecord GotRecord = IndexedMemProfRecord::deserialize(
+      Schema, reinterpret_cast<const unsigned char *>(Buffer.data()),
+      llvm::memprof::Version2);
 
   EXPECT_EQ(Record, GotRecord);
 }

@kazutakahirata
Contributor Author

I believe I've addressed all the comments. Please take a look at the latest iteration. Thanks!

Contributor

@snehasish snehasish left a comment


I wonder if it makes sense to have a higher level distinction of versioning instead of pushing the condition down into each method. I'm trying to think about how difficult it will be in the future to evolve to the next version. For example, say we have serializedSizeV1() as the old implementation and serializedSizeV2() as the new implementation and serializedSize() calls one or the other based on the version. Pushing it down one level makes it easy to just delete the old function when we introduce v3. In this case the difference is not large since the logic varies only a little but it may make sense for more complex code. What do you both think?

@kazutakahirata
Contributor Author

> I wonder if it makes sense to have a higher level distinction of versioning instead of pushing the condition down into each method. I'm trying to think about how difficult it will be in the future to evolve to the next version. For example, say we have serializedSizeV1() as the old implementation and serializedSizeV2() as the new implementation and serializedSize() calls one or the other based on the version. Pushing it down one level makes it easy to just delete the old function when we introduce v3. In this case the difference is not large since the logic varies only a little but it may make sense for more complex code. What do you both think?

Do you mean something like this?

  size_t serializedSize(IndexedVersion Version) const {
    switch (Version) {
    case Version0:
    case Version1:
      return serializedSizeV0();  // Implemented elsewhere.
    case Version2:
      return serializedSizeV2();  // Implemented elsewhere.
    }
    llvm_unreachable("unsupported MemProf version");
  }

If so, I like the style. As you point out, it makes it easy to delete an old version later. If there is a lot of code duplication between two versions, we can always factor it out into a helper function, so duplication shouldn't be a big concern.

@kazutakahirata
Contributor Author

I've split serialize and its friends into separate functions, one for each version. Please take a look. Thanks!

@kazutakahirata
Contributor Author

I've moved version-specific implementation to MemProf.cpp. Please take a look. Thanks!

@kazutakahirata
Contributor Author

I've included changes to RecordLookupTrait and RecordWriterTrait in the latest iteration. Please take a look. Thanks!

Contributor

@teresajohnson teresajohnson left a comment


lgtm but I do have one question below.

@kazutakahirata kazutakahirata merged commit d89914f into llvm:main Apr 4, 2024
@kazutakahirata kazutakahirata deleted the pr_memprof_version2 branch April 4, 2024 04:48
Labels
PGO Profile Guided Optimizations