Skip to content

[C++20][Modules] Fix non-determinism in serialized AST #110131

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

dmpolukhin
Copy link
Contributor

Summary:
#109167 serializes FunctionToLambdasMap in the order of pointers in DenseMap. It gives different order with different memory layouts. Fix this issue by using LocalDeclID instead of pointers.

Test Plan: check-clang

Summary:
llvm#109167 serialized
FunctionToLambdasMap in the order of pointers in DenseMap. It gives
different order with different memeory leayout. Fix this issue by using
LocalDeclID instead of pointers.

Test Plan: check-clang
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules labels Sep 26, 2024
@llvmbot
Copy link
Member

llvmbot commented Sep 26, 2024

@llvm/pr-subscribers-clang-modules

Author: Dmitry Polukhin (dmpolukhin)

Changes

Summary:
#109167 serializes FunctionToLambdasMap in the order of pointers in DenseMap. It gives different order with different memory layouts. Fix this issue by using LocalDeclID instead of pointers.

Test Plan: check-clang


Full diff: https://github.com/llvm/llvm-project/pull/110131.diff

4 Files Affected:

  • (modified) clang/include/clang/AST/DeclID.h (+22)
  • (modified) clang/include/clang/Serialization/ASTWriter.h (+3-3)
  • (modified) clang/lib/Serialization/ASTWriter.cpp (+1-2)
  • (modified) clang/lib/Serialization/ASTWriterDecl.cpp (+2-1)
diff --git a/clang/include/clang/AST/DeclID.h b/clang/include/clang/AST/DeclID.h
index f4607e42c4be38..49964b43c7d1d8 100644
--- a/clang/include/clang/AST/DeclID.h
+++ b/clang/include/clang/AST/DeclID.h
@@ -189,6 +189,7 @@ class LocalDeclID : public DeclIDBase {
   // Every Decl ID is a local decl ID to the module being writing in ASTWriter.
   friend class ASTWriter;
   friend class GlobalDeclID;
+  friend struct llvm::DenseMapInfo<clang::LocalDeclID>;
 
 public:
   LocalDeclID() : Base() {}
@@ -267,6 +268,27 @@ template <> struct DenseMapInfo<clang::GlobalDeclID> {
   }
 };
 
+template <> struct DenseMapInfo<clang::LocalDeclID> {
+  using LocalDeclID = clang::LocalDeclID;
+  using DeclID = LocalDeclID::DeclID;
+
+  static LocalDeclID getEmptyKey() {
+    return LocalDeclID(DenseMapInfo<DeclID>::getEmptyKey());
+  }
+
+  static LocalDeclID getTombstoneKey() {
+    return LocalDeclID(DenseMapInfo<DeclID>::getTombstoneKey());
+  }
+
+  static unsigned getHashValue(const LocalDeclID &Key) {
+    return DenseMapInfo<DeclID>::getHashValue(Key.getRawValue());
+  }
+
+  static bool isEqual(const LocalDeclID &L, const LocalDeclID &R) {
+    return L == R;
+  }
+};
+
 } // namespace llvm
 
 #endif
diff --git a/clang/include/clang/Serialization/ASTWriter.h b/clang/include/clang/Serialization/ASTWriter.h
index 760866fd9de938..e21d41c8673143 100644
--- a/clang/include/clang/Serialization/ASTWriter.h
+++ b/clang/include/clang/Serialization/ASTWriter.h
@@ -233,13 +233,13 @@ class ASTWriter : public ASTDeserializationListener,
   /// instead of comparing the result of `getDeclID()` or `GetDeclRef()`.
   llvm::SmallPtrSet<const Decl *, 32> PredefinedDecls;
 
-  /// Mapping from FunctionDecl to the list of lambda IDs inside the function.
+  /// Mapping from FunctionDecl ID to the list of lambda IDs inside the
+  /// function.
   ///
   /// These lambdas have to be loaded right after the function they belong to.
   /// In order to have canonical declaration for lambda class from the same
   /// module as enclosing function during deserialization.
-  llvm::DenseMap<const Decl *, SmallVector<LocalDeclID, 4>>
-      FunctionToLambdasMap;
+  llvm::DenseMap<LocalDeclID, SmallVector<LocalDeclID, 4>> FunctionToLambdasMap;
 
   /// Offset of each declaration in the bitstream, indexed by
   /// the declaration's ID.
diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp
index 223727366f61bd..7a40c5c65d39d9 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -5713,8 +5713,7 @@ void ASTWriter::WriteDeclAndTypes(ASTContext &Context) {
     // efficent becuase it allows lazy deserialization.
     RecordData FunctionToLambdasMapRecord;
     for (const auto &Pair : FunctionToLambdasMap) {
-      FunctionToLambdasMapRecord.push_back(
-          GetDeclRef(Pair.first).getRawValue());
+      FunctionToLambdasMapRecord.push_back(Pair.first.getRawValue());
       FunctionToLambdasMapRecord.push_back(Pair.second.size());
       for (const auto &Lambda : Pair.second)
         FunctionToLambdasMapRecord.push_back(Lambda.getRawValue());
diff --git a/clang/lib/Serialization/ASTWriterDecl.cpp b/clang/lib/Serialization/ASTWriterDecl.cpp
index 50c090b195d619..b9222a1b33fd74 100644
--- a/clang/lib/Serialization/ASTWriterDecl.cpp
+++ b/clang/lib/Serialization/ASTWriterDecl.cpp
@@ -1524,7 +1524,8 @@ void ASTDeclWriter::VisitCXXRecordDecl(CXXRecordDecl *D) {
     // For lambdas inside canonical FunctionDecl remember the mapping.
     if (auto FD = llvm::dyn_cast_or_null<FunctionDecl>(D->getDeclContext());
         FD && FD->isCanonicalDecl()) {
-      Writer.FunctionToLambdasMap[FD].push_back(Writer.GetDeclRef(D));
+      Writer.FunctionToLambdasMap[Writer.GetDeclRef(FD)].push_back(
+          Writer.GetDeclRef(D));
     }
   } else {
     Record.push_back(CXXRecNotTemplate);

@llvmbot
Copy link
Member

llvmbot commented Sep 26, 2024

@llvm/pr-subscribers-clang

Author: Dmitry Polukhin (dmpolukhin)

Changes

Summary:
#109167 serializes FunctionToLambdasMap in the order of pointers in DenseMap. It gives different order with different memory layouts. Fix this issue by using LocalDeclID instead of pointers.

Test Plan: check-clang


Full diff: https://github.com/llvm/llvm-project/pull/110131.diff

4 Files Affected:

  • (modified) clang/include/clang/AST/DeclID.h (+22)
  • (modified) clang/include/clang/Serialization/ASTWriter.h (+3-3)
  • (modified) clang/lib/Serialization/ASTWriter.cpp (+1-2)
  • (modified) clang/lib/Serialization/ASTWriterDecl.cpp (+2-1)
diff --git a/clang/include/clang/AST/DeclID.h b/clang/include/clang/AST/DeclID.h
index f4607e42c4be38..49964b43c7d1d8 100644
--- a/clang/include/clang/AST/DeclID.h
+++ b/clang/include/clang/AST/DeclID.h
@@ -189,6 +189,7 @@ class LocalDeclID : public DeclIDBase {
   // Every Decl ID is a local decl ID to the module being writing in ASTWriter.
   friend class ASTWriter;
   friend class GlobalDeclID;
+  friend struct llvm::DenseMapInfo<clang::LocalDeclID>;
 
 public:
   LocalDeclID() : Base() {}
@@ -267,6 +268,27 @@ template <> struct DenseMapInfo<clang::GlobalDeclID> {
   }
 };
 
+template <> struct DenseMapInfo<clang::LocalDeclID> {
+  using LocalDeclID = clang::LocalDeclID;
+  using DeclID = LocalDeclID::DeclID;
+
+  static LocalDeclID getEmptyKey() {
+    return LocalDeclID(DenseMapInfo<DeclID>::getEmptyKey());
+  }
+
+  static LocalDeclID getTombstoneKey() {
+    return LocalDeclID(DenseMapInfo<DeclID>::getTombstoneKey());
+  }
+
+  static unsigned getHashValue(const LocalDeclID &Key) {
+    return DenseMapInfo<DeclID>::getHashValue(Key.getRawValue());
+  }
+
+  static bool isEqual(const LocalDeclID &L, const LocalDeclID &R) {
+    return L == R;
+  }
+};
+
 } // namespace llvm
 
 #endif
diff --git a/clang/include/clang/Serialization/ASTWriter.h b/clang/include/clang/Serialization/ASTWriter.h
index 760866fd9de938..e21d41c8673143 100644
--- a/clang/include/clang/Serialization/ASTWriter.h
+++ b/clang/include/clang/Serialization/ASTWriter.h
@@ -233,13 +233,13 @@ class ASTWriter : public ASTDeserializationListener,
   /// instead of comparing the result of `getDeclID()` or `GetDeclRef()`.
   llvm::SmallPtrSet<const Decl *, 32> PredefinedDecls;
 
-  /// Mapping from FunctionDecl to the list of lambda IDs inside the function.
+  /// Mapping from FunctionDecl ID to the list of lambda IDs inside the
+  /// function.
   ///
   /// These lambdas have to be loaded right after the function they belong to.
   /// In order to have canonical declaration for lambda class from the same
   /// module as enclosing function during deserialization.
-  llvm::DenseMap<const Decl *, SmallVector<LocalDeclID, 4>>
-      FunctionToLambdasMap;
+  llvm::DenseMap<LocalDeclID, SmallVector<LocalDeclID, 4>> FunctionToLambdasMap;
 
   /// Offset of each declaration in the bitstream, indexed by
   /// the declaration's ID.
diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp
index 223727366f61bd..7a40c5c65d39d9 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -5713,8 +5713,7 @@ void ASTWriter::WriteDeclAndTypes(ASTContext &Context) {
     // efficent becuase it allows lazy deserialization.
     RecordData FunctionToLambdasMapRecord;
     for (const auto &Pair : FunctionToLambdasMap) {
-      FunctionToLambdasMapRecord.push_back(
-          GetDeclRef(Pair.first).getRawValue());
+      FunctionToLambdasMapRecord.push_back(Pair.first.getRawValue());
       FunctionToLambdasMapRecord.push_back(Pair.second.size());
       for (const auto &Lambda : Pair.second)
         FunctionToLambdasMapRecord.push_back(Lambda.getRawValue());
diff --git a/clang/lib/Serialization/ASTWriterDecl.cpp b/clang/lib/Serialization/ASTWriterDecl.cpp
index 50c090b195d619..b9222a1b33fd74 100644
--- a/clang/lib/Serialization/ASTWriterDecl.cpp
+++ b/clang/lib/Serialization/ASTWriterDecl.cpp
@@ -1524,7 +1524,8 @@ void ASTDeclWriter::VisitCXXRecordDecl(CXXRecordDecl *D) {
     // For lambdas inside canonical FunctionDecl remember the mapping.
     if (auto FD = llvm::dyn_cast_or_null<FunctionDecl>(D->getDeclContext());
         FD && FD->isCanonicalDecl()) {
-      Writer.FunctionToLambdasMap[FD].push_back(Writer.GetDeclRef(D));
+      Writer.FunctionToLambdasMap[Writer.GetDeclRef(FD)].push_back(
+          Writer.GetDeclRef(D));
     }
   } else {
     Record.push_back(CXXRecNotTemplate);

@dmpolukhin
Copy link
Contributor Author

@eaeltsin because I don't have a reproducer for the issue, could you please check that fix helps on your reproducer?

@eaeltsin
Copy link
Contributor

I cannot reproduce the non-determinism on top of this patch any more. Thanks for the fix!

Please submit at your earliest convenience.

Copy link
Contributor

@ivanmurashko ivanmurashko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eaeltsin confirmed that the commit fixes an issue introduced earlier by #109167. The fix LGTM

@@ -5713,8 +5713,7 @@ void ASTWriter::WriteDeclAndTypes(ASTContext &Context) {
// efficent becuase it allows lazy deserialization.
RecordData FunctionToLambdasMapRecord;
for (const auto &Pair : FunctionToLambdasMap) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now an iteration of a DenseMap of non-pointers, which I suppose is arbitrary, but determinstic. Are you sure it isn't preferable to write these in the order in which they were added to the map? That's the behavior you'd get from a MapVector, which seems better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rnk yes, the order is not important because we read the whole map into memory anyway. So IMHO the only thing we need to do here is to make sure that we order doesn't depend on pointer values.

Copy link
Member

@ChuanqiXu9 ChuanqiXu9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dmpolukhin dmpolukhin merged commit 9a36168 into llvm:main Sep 27, 2024
12 checks passed
Sterling-Augustine pushed a commit to Sterling-Augustine/llvm-project that referenced this pull request Sep 27, 2024
Summary:
llvm#109167 serializes
FunctionToLambdasMap in the order of pointers in DenseMap. It gives
different order with different memory layouts. Fix this issue by using
LocalDeclID instead of pointers.

Test Plan: check-clang
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants