Skip to content

[analyzer] Allow overriding Unknown memspaces using a ProgramState trait #123003

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 22, 2025

Conversation

Flandini
Copy link
Contributor

@Flandini Flandini commented Jan 15, 2025

In general, if we see an allocation, we associate the immutable memory
space with the constructed memory region.
This works fine if we see the allocation.
However, with symbolic regions it's not great because there we don't
know anything about their memory spaces, thus put them into the Unknown
space.

The unfortunate consequence is that once we learn about some aliasing
with this Symbolic Region, we can't change the memory space to the
deduced one.

In this patch, we open up the memory spaces as a trait, basically
allowing associating a better memory space with a memregion that
was created with the Unknown memory space.

As a side effect, this means that now queriing the memory space of a
region depends on the State, but many places in the analyzer, such as
the Store, doesn't have (and cannot have) access to the State by design.

This means that some uses must solely rely on the memspaces of the
region, but any other users should use the getter taking a State.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:static analyzer labels Jan 15, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 15, 2025

@llvm/pr-subscribers-clang-static-analyzer-1

Author: Michael Flanders (Flandini)

Changes

This trait associates MemSpaceRegions with MemRegions to allow refining or changing information known about memory regions after they are created, since they are immutable. This commit is an intermediate step towards moving MemSpaceRegion out of the MemRegion class hierarchy and moving all notion of MemSpaceRegion to the trait. The intermediate step is that for now, only MemRegions with UnknownSpaceRegion are mapped in the trait and checked in checkers/core.

Tagging @steakhal for review.


Patch is 28.50 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/123003.diff

13 Files Affected:

  • (added) clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemSpaces.h (+47)
  • (modified) clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp (+1)
  • (modified) clang/lib/StaticAnalyzer/Checkers/MacOSXAPIChecker.cpp (+6-4)
  • (modified) clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp (+23-9)
  • (modified) clang/lib/StaticAnalyzer/Checkers/MoveChecker.cpp (+23-18)
  • (modified) clang/lib/StaticAnalyzer/Checkers/PutenvStackArrayChecker.cpp (+8-2)
  • (modified) clang/lib/StaticAnalyzer/Checkers/RetainCountChecker/RetainCountDiagnostics.cpp (+9-3)
  • (modified) clang/lib/StaticAnalyzer/Checkers/StackAddrEscapeChecker.cpp (+18-8)
  • (modified) clang/lib/StaticAnalyzer/Checkers/UnixAPIChecker.cpp (+2-1)
  • (modified) clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp (+11-1)
  • (modified) clang/lib/StaticAnalyzer/Core/CMakeLists.txt (+1)
  • (modified) clang/lib/StaticAnalyzer/Core/ExprEngine.cpp (+10)
  • (added) clang/lib/StaticAnalyzer/Core/MemSpaces.cpp (+62)
diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemSpaces.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemSpaces.h
new file mode 100644
index 00000000000000..178a6b60c1cb1a
--- /dev/null
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemSpaces.h
@@ -0,0 +1,47 @@
+//===-- MemSpaces.h -----------------------------------------------*- C++ -*--//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// TODO
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_STATICANALYZER_CHECKERS_MEMSPACES_H
+#define LLVM_CLANG_STATICANALYZER_CHECKERS_MEMSPACES_H
+
+#include "clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/ProgramState_Fwd.h"
+
+namespace clang {
+namespace ento {
+
+class MemRegion;
+class MemSpaceRegion;
+
+namespace memspace {
+
+[[nodiscard]] ProgramStateRef setMemSpaceTrait(ProgramStateRef State,
+                                               const MemRegion *MR,
+                                               const MemSpaceRegion *MS);
+
+[[nodiscard]] const MemSpaceRegion *getMemSpaceTrait(ProgramStateRef State,
+                                                     const MemRegion *MR);
+
+[[nodiscard]] bool hasMemSpaceTrait(ProgramStateRef State, const MemRegion *MR);
+
+template <typename FirstT, typename... RestT>
+[[nodiscard]] bool isMemSpaceOrTrait(ProgramStateRef State,
+                                     const MemRegion *MR) {
+  return isa<FirstT, RestT...>(MR->getMemorySpace()) ||
+         isa_and_nonnull<FirstT, RestT...>(getMemSpaceTrait(State, MR));
+}
+
+} // namespace memspace
+} // namespace ento
+} // namespace clang
+
+#endif // LLVM_CLANG_STATICANALYZER_CHECKERS_MEMSPACES_H
diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
index 6422933c8828a9..e97be53fee4c7f 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -22,6 +22,7 @@
 #include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/DynamicExtent.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/MemSpaces.h"
 #include "llvm/ADT/APSInt.h"
 #include "llvm/ADT/SmallString.h"
 #include "llvm/Support/FormatVariadic.h"
diff --git a/clang/lib/StaticAnalyzer/Checkers/MacOSXAPIChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/MacOSXAPIChecker.cpp
index 754b167642961c..858b6a37551f5d 100644
--- a/clang/lib/StaticAnalyzer/Checkers/MacOSXAPIChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/MacOSXAPIChecker.cpp
@@ -22,6 +22,7 @@
 #include "clang/StaticAnalyzer/Core/Checker.h"
 #include "clang/StaticAnalyzer/Core/CheckerManager.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/MemSpaces.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ProgramStateTrait.h"
 #include "llvm/ADT/SmallString.h"
 #include "llvm/ADT/StringSwitch.h"
@@ -75,10 +76,11 @@ void MacOSXAPIChecker::CheckDispatchOnce(CheckerContext &C, const CallExpr *CE,
   if (!R)
     return;
 
+  ProgramStateRef State = C.getState();
+
   // Global variables are fine.
   const MemRegion *RB = R->getBaseRegion();
-  const MemSpaceRegion *RS = RB->getMemorySpace();
-  if (isa<GlobalsSpaceRegion>(RS))
+  if (memspace::isMemSpaceOrTrait<GlobalsSpaceRegion>(State, RB))
     return;
 
   // Handle _dispatch_once.  In some versions of the OS X SDK we have the case
@@ -117,9 +119,9 @@ void MacOSXAPIChecker::CheckDispatchOnce(CheckerContext &C, const CallExpr *CE,
     if (IVR != R)
       os << " memory within";
     os << " the instance variable '" << IVR->getDecl()->getName() << '\'';
-  } else if (isa<HeapSpaceRegion>(RS)) {
+  } else if (memspace::isMemSpaceOrTrait<HeapSpaceRegion>(State, RB)) {
     os << " heap-allocated memory";
-  } else if (isa<UnknownSpaceRegion>(RS)) {
+  } else if (isa<UnknownSpaceRegion>(RB->getMemorySpace())) {
     // Presence of an IVar superregion has priority over this branch, because
     // ObjC objects are on the heap even if the core doesn't realize this.
     // Presence of a block variable base region has priority over this branch,
diff --git a/clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
index 4166cf14391e2d..76fee2f1322e17 100644
--- a/clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
@@ -72,6 +72,7 @@
 #include "clang/StaticAnalyzer/Core/PathSensitive/CheckerHelpers.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/DynamicExtent.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ExplodedGraph.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/MemSpaces.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ProgramStateTrait.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ProgramState_Fwd.h"
@@ -784,7 +785,8 @@ class MallocChecker
                                              bool IsALeakCheck = false) const;
   ///@}
   static bool SummarizeValue(raw_ostream &os, SVal V);
-  static bool SummarizeRegion(raw_ostream &os, const MemRegion *MR);
+  static bool SummarizeRegion(ProgramStateRef State, raw_ostream &os,
+                              const MemRegion *MR);
 
   void HandleNonHeapDealloc(CheckerContext &C, SVal ArgVal, SourceRange Range,
                             const Expr *DeallocExpr,
@@ -2206,13 +2208,21 @@ MallocChecker::FreeMemAux(CheckerContext &C, const Expr *ArgExpr,
 
   // Parameters, locals, statics, globals, and memory returned by
   // __builtin_alloca() shouldn't be freed.
-  if (!isa<UnknownSpaceRegion, HeapSpaceRegion>(MS)) {
+  // Should skip this check if:
+  // - The region is in the heap
+  // - The region has unknown memspace and no trait for further clarification
+  //   (i.e., just unknown)
+  // - The region has unknown memspace and a heap memspace trait
+  const MemSpaceRegion *MSTrait = memspace::getMemSpaceTrait(State, R);
+  bool HasHeapOrUnknownTrait = !MSTrait || isa<HeapSpaceRegion>(MSTrait);
+  if (!(isa<HeapSpaceRegion>(MS) ||
+        (isa<UnknownSpaceRegion>(MS) && HasHeapOrUnknownTrait))) {
     // Regions returned by malloc() are represented by SymbolicRegion objects
     // within HeapSpaceRegion. Of course, free() can work on memory allocated
     // outside the current function, so UnknownSpaceRegion is also a
     // possibility here.
 
-    if (isa<AllocaRegion>(R))
+    if (isa<AllocaRegion>(R) || isa_and_nonnull<AllocaRegion>(MSTrait))
       HandleFreeAlloca(C, ArgVal, ArgExpr->getSourceRange());
     else
       HandleNonHeapDealloc(C, ArgVal, ArgExpr->getSourceRange(), ParentExpr,
@@ -2384,7 +2394,7 @@ bool MallocChecker::SummarizeValue(raw_ostream &os, SVal V) {
   return true;
 }
 
-bool MallocChecker::SummarizeRegion(raw_ostream &os,
+bool MallocChecker::SummarizeRegion(ProgramStateRef State, raw_ostream &os,
                                     const MemRegion *MR) {
   switch (MR->getKind()) {
   case MemRegion::FunctionCodeRegionKind: {
@@ -2404,8 +2414,10 @@ bool MallocChecker::SummarizeRegion(raw_ostream &os,
     return true;
   default: {
     const MemSpaceRegion *MS = MR->getMemorySpace();
+    const MemSpaceRegion *MSTrait = memspace::getMemSpaceTrait(State, MR);
 
-    if (isa<StackLocalsSpaceRegion>(MS)) {
+    if (isa<StackLocalsSpaceRegion>(MS) ||
+        isa_and_nonnull<StackLocalsSpaceRegion>(MSTrait)) {
       const VarRegion *VR = dyn_cast<VarRegion>(MR);
       const VarDecl *VD;
       if (VR)
@@ -2420,7 +2432,8 @@ bool MallocChecker::SummarizeRegion(raw_ostream &os,
       return true;
     }
 
-    if (isa<StackArgumentsSpaceRegion>(MS)) {
+    if (isa<StackArgumentsSpaceRegion>(MS) ||
+        isa_and_nonnull<StackArgumentsSpaceRegion>(MSTrait)) {
       const VarRegion *VR = dyn_cast<VarRegion>(MR);
       const VarDecl *VD;
       if (VR)
@@ -2435,7 +2448,8 @@ bool MallocChecker::SummarizeRegion(raw_ostream &os,
       return true;
     }
 
-    if (isa<GlobalsSpaceRegion>(MS)) {
+    if (isa<GlobalsSpaceRegion>(MS) ||
+        isa_and_nonnull<GlobalsSpaceRegion>(MSTrait)) {
       const VarRegion *VR = dyn_cast<VarRegion>(MR);
       const VarDecl *VD;
       if (VR)
@@ -2489,8 +2503,8 @@ void MallocChecker::HandleNonHeapDealloc(CheckerContext &C, SVal ArgVal,
       os << "deallocator";
 
     os << " is ";
-    bool Summarized = MR ? SummarizeRegion(os, MR)
-                         : SummarizeValue(os, ArgVal);
+    bool Summarized =
+        MR ? SummarizeRegion(C.getState(), os, MR) : SummarizeValue(os, ArgVal);
     if (Summarized)
       os << ", which is not memory allocated by ";
     else
diff --git a/clang/lib/StaticAnalyzer/Checkers/MoveChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/MoveChecker.cpp
index 52416e21399147..d4ff8186ea69c9 100644
--- a/clang/lib/StaticAnalyzer/Checkers/MoveChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/MoveChecker.cpp
@@ -22,6 +22,7 @@
 #include "clang/StaticAnalyzer/Core/CheckerManager.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/MemSpaces.h"
 #include "llvm/ADT/StringSet.h"
 
 using namespace clang;
@@ -145,12 +146,14 @@ class MoveChecker
 
   // Obtains ObjectKind of an object. Because class declaration cannot always
   // be easily obtained from the memory region, it is supplied separately.
-  ObjectKind classifyObject(const MemRegion *MR, const CXXRecordDecl *RD) const;
+  ObjectKind classifyObject(ProgramStateRef State, const MemRegion *MR,
+                            const CXXRecordDecl *RD) const;
 
   // Classifies the object and dumps a user-friendly description string to
   // the stream.
-  void explainObject(llvm::raw_ostream &OS, const MemRegion *MR,
-                     const CXXRecordDecl *RD, MisuseKind MK) const;
+  void explainObject(ProgramStateRef State, llvm::raw_ostream &OS,
+                     const MemRegion *MR, const CXXRecordDecl *RD,
+                     MisuseKind MK) const;
 
   bool belongsTo(const CXXRecordDecl *RD, const llvm::StringSet<> &Set) const;
 
@@ -299,12 +302,12 @@ MoveChecker::MovedBugVisitor::VisitNode(const ExplodedNode *N,
   SmallString<128> Str;
   llvm::raw_svector_ostream OS(Str);
 
-  ObjectKind OK = Chk.classifyObject(Region, RD);
+  ObjectKind OK = Chk.classifyObject(State, Region, RD);
   switch (OK.StdKind) {
     case SK_SmartPtr:
       if (MK == MK_Dereference) {
         OS << "Smart pointer";
-        Chk.explainObject(OS, Region, RD, MK);
+        Chk.explainObject(State, OS, Region, RD, MK);
         OS << " is reset to null when moved from";
         break;
       }
@@ -315,12 +318,12 @@ MoveChecker::MovedBugVisitor::VisitNode(const ExplodedNode *N,
     case SK_NonStd:
     case SK_Safe:
       OS << "Object";
-      Chk.explainObject(OS, Region, RD, MK);
+      Chk.explainObject(State, OS, Region, RD, MK);
       OS << " is moved";
       break;
     case SK_Unsafe:
       OS << "Object";
-      Chk.explainObject(OS, Region, RD, MK);
+      Chk.explainObject(State, OS, Region, RD, MK);
       OS << " is left in a valid but unspecified state after move";
       break;
   }
@@ -353,7 +356,7 @@ void MoveChecker::modelUse(ProgramStateRef State, const MemRegion *Region,
                            CheckerContext &C) const {
   assert(!C.isDifferent() && "No transitions should have been made by now");
   const RegionState *RS = State->get<TrackedRegionMap>(Region);
-  ObjectKind OK = classifyObject(Region, RD);
+  ObjectKind OK = classifyObject(State, Region, RD);
 
   // Just in case: if it's not a smart pointer but it does have operator *,
   // we shouldn't call the bug a dereference.
@@ -406,24 +409,25 @@ ExplodedNode *MoveChecker::tryToReportBug(const MemRegion *Region,
     // Creating the error message.
     llvm::SmallString<128> Str;
     llvm::raw_svector_ostream OS(Str);
+    ProgramStateRef ErrorNodeState = N->getState();
     switch(MK) {
       case MK_FunCall:
         OS << "Method called on moved-from object";
-        explainObject(OS, Region, RD, MK);
+        explainObject(ErrorNodeState, OS, Region, RD, MK);
         break;
       case MK_Copy:
         OS << "Moved-from object";
-        explainObject(OS, Region, RD, MK);
+        explainObject(ErrorNodeState, OS, Region, RD, MK);
         OS << " is copied";
         break;
       case MK_Move:
         OS << "Moved-from object";
-        explainObject(OS, Region, RD, MK);
+        explainObject(ErrorNodeState, OS, Region, RD, MK);
         OS << " is moved";
         break;
       case MK_Dereference:
         OS << "Dereference of null smart pointer";
-        explainObject(OS, Region, RD, MK);
+        explainObject(ErrorNodeState, OS, Region, RD, MK);
         break;
     }
 
@@ -482,7 +486,7 @@ void MoveChecker::checkPostCall(const CallEvent &Call,
     return;
 
   const CXXRecordDecl *RD = MethodDecl->getParent();
-  ObjectKind OK = classifyObject(ArgRegion, RD);
+  ObjectKind OK = classifyObject(State, ArgRegion, RD);
   if (shouldBeTracked(OK)) {
     // Mark object as moved-from.
     State = State->set<TrackedRegionMap>(ArgRegion, RegionState::getMoved());
@@ -549,7 +553,7 @@ bool MoveChecker::belongsTo(const CXXRecordDecl *RD,
 }
 
 MoveChecker::ObjectKind
-MoveChecker::classifyObject(const MemRegion *MR,
+MoveChecker::classifyObject(ProgramStateRef State, const MemRegion *MR,
                             const CXXRecordDecl *RD) const {
   // Local variables and local rvalue references are classified as "Local".
   // For the purposes of this checker, we classify move-safe STL types
@@ -557,7 +561,7 @@ MoveChecker::classifyObject(const MemRegion *MR,
   MR = unwrapRValueReferenceIndirection(MR);
   bool IsLocal =
       isa_and_nonnull<VarRegion, CXXLifetimeExtendedObjectRegion>(MR) &&
-      isa<StackSpaceRegion>(MR->getMemorySpace());
+      memspace::isMemSpaceOrTrait<StackSpaceRegion>(State, MR);
 
   if (!RD || !RD->getDeclContext()->isStdNamespace())
     return { IsLocal, SK_NonStd };
@@ -571,8 +575,9 @@ MoveChecker::classifyObject(const MemRegion *MR,
   return { IsLocal, SK_Unsafe };
 }
 
-void MoveChecker::explainObject(llvm::raw_ostream &OS, const MemRegion *MR,
-                                const CXXRecordDecl *RD, MisuseKind MK) const {
+void MoveChecker::explainObject(ProgramStateRef State, llvm::raw_ostream &OS,
+                                const MemRegion *MR, const CXXRecordDecl *RD,
+                                MisuseKind MK) const {
   // We may need a leading space every time we actually explain anything,
   // and we never know if we are to explain anything until we try.
   if (const auto DR =
@@ -581,7 +586,7 @@ void MoveChecker::explainObject(llvm::raw_ostream &OS, const MemRegion *MR,
     OS << " '" << RegionDecl->getDeclName() << "'";
   }
 
-  ObjectKind OK = classifyObject(MR, RD);
+  ObjectKind OK = classifyObject(State, MR, RD);
   switch (OK.StdKind) {
     case SK_NonStd:
     case SK_Safe:
diff --git a/clang/lib/StaticAnalyzer/Checkers/PutenvStackArrayChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/PutenvStackArrayChecker.cpp
index bf81d57bf82fd3..4449cb4ae68eb9 100644
--- a/clang/lib/StaticAnalyzer/Checkers/PutenvStackArrayChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/PutenvStackArrayChecker.cpp
@@ -21,6 +21,7 @@
 #include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/MemSpaces.h"
 
 using namespace clang;
 using namespace ento;
@@ -45,10 +46,15 @@ void PutenvStackArrayChecker::checkPostCall(const CallEvent &Call,
   SVal ArgV = Call.getArgSVal(0);
   const Expr *ArgExpr = Call.getArgExpr(0);
 
-  const auto *SSR =
-      dyn_cast<StackSpaceRegion>(ArgV.getAsRegion()->getMemorySpace());
+  const MemRegion *MR = ArgV.getAsRegion();
+
+  const auto *SSR = dyn_cast<StackSpaceRegion>(MR->getMemorySpace());
+  if (!SSR)
+    SSR = dyn_cast_if_present<StackSpaceRegion>(
+        memspace::getMemSpaceTrait(C.getState(), MR));
   if (!SSR)
     return;
+
   const auto *StackFrameFuncD =
       dyn_cast_or_null<FunctionDecl>(SSR->getStackFrame()->getDecl());
   if (StackFrameFuncD && StackFrameFuncD->isMain())
diff --git a/clang/lib/StaticAnalyzer/Checkers/RetainCountChecker/RetainCountDiagnostics.cpp b/clang/lib/StaticAnalyzer/Checkers/RetainCountChecker/RetainCountDiagnostics.cpp
index 456132ef0a0a22..ecc7563ceb4bba 100644
--- a/clang/lib/StaticAnalyzer/Checkers/RetainCountChecker/RetainCountDiagnostics.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/RetainCountChecker/RetainCountDiagnostics.cpp
@@ -13,6 +13,7 @@
 
 #include "RetainCountDiagnostics.h"
 #include "RetainCountChecker.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/MemSpaces.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/SmallVector.h"
 #include <optional>
@@ -690,9 +691,14 @@ static AllocationInfo GetAllocationSite(ProgramStateManager &StateMgr,
       const MemRegion *R = FB.getRegion();
       // Do not show local variables belonging to a function other than
       // where the error is reported.
-      if (auto MR = dyn_cast<StackSpaceRegion>(R->getMemorySpace()))
-        if (MR->getStackFrame() == LeakContext->getStackFrame())
-          FirstBinding = R;
+      const StackSpaceRegion *MR =
+          dyn_cast<StackSpaceRegion>(R->getMemorySpace());
+      if (!MR)
+        MR = dyn_cast_if_present<StackSpaceRegion>(
+            memspace::getMemSpaceTrait(St, R));
+
+      if (MR && MR->getStackFrame() == LeakContext->getStackFrame())
+        FirstBinding = R;
     }
 
     // AllocationNode is the last node in which the symbol was tracked.
diff --git a/clang/lib/StaticAnalyzer/Checkers/StackAddrEscapeChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/StackAddrEscapeChecker.cpp
index f4de3b500499c4..4d8c8eea14ea63 100644
--- a/clang/lib/StaticAnalyzer/Checkers/StackAddrEscapeChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/StackAddrEscapeChecker.cpp
@@ -19,6 +19,7 @@
 #include "clang/StaticAnalyzer/Core/CheckerManager.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/MemSpaces.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h"
 #include "llvm/ADT/SmallString.h"
 #include "llvm/Support/raw_ostream.h"
@@ -61,7 +62,7 @@ class StackAddrEscapeChecker
                              ASTContext &Ctx);
   static SmallVector<const MemRegion *, 4>
   getCapturedStackRegions(const BlockDataRegion &B, CheckerContext &C);
-  static bool isNotInCurrentFrame(const MemRegion *R, CheckerContext &C);
+  static bool isNotInCurrentFrame(const MemSpaceRegion *MS, CheckerContext &C);
 };
 } // namespace
 
@@ -117,9 +118,9 @@ SourceRange StackAddrEscapeChecker::genName(raw_ostream &os, const MemRegion *R,
   return range;
 }
 
-bool StackAddrEscapeChecker::isNotInCurrentFrame(const MemRegion *R,
+bool StackAddrEscapeChecker::isNotInCurrentFrame(const MemSpaceRegion *MS,
                                                  CheckerContext &C) {
-  const StackSpaceRegion *S = cast<StackSpaceRegion>(R->getMemorySpace());
+  const StackSpaceRegion *S = cast<StackSpaceRegion>(MS);
   return S->getStackFrame() != C.getStackFrame();
 }
 
@@ -138,10 +139,11 @@ SmallVector<const MemRegion *, 4>
 StackAddrEscapeChecker::getCapturedStackRegions(const BlockDataRegion &B,
                                                 CheckerContext &C) {
   SmallVector<const MemRegion *, 4> Regions;
+  ProgramStateRef State = C.getState();
   for (auto Var : B.referenced_vars()) {
-    SVal Val = C.getState()->getSVal(Var.getCapturedRegion());
+    SVal Val = State->getSVal(Var.getCapturedRegion());
     const MemRegion *Region = Val.getA...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Jan 15, 2025

@llvm/pr-subscribers-clang

Author: Michael Flanders (Flandini)

Changes

This trait associates MemSpaceRegions with MemRegions to allow refining or changing information known about memory regions after they are created, since they are immutable. This commit is an intermediate step towards moving MemSpaceRegion out of the MemRegion class hierarchy and moving all notion of MemSpaceRegion to the trait. The intermediate step is that for now, only MemRegions with UnknownSpaceRegion are mapped in the trait and checked in checkers/core.

Tagging @steakhal for review.


Patch is 28.50 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/123003.diff

13 Files Affected:

  • (added) clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemSpaces.h (+47)
  • (modified) clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp (+1)
  • (modified) clang/lib/StaticAnalyzer/Checkers/MacOSXAPIChecker.cpp (+6-4)
  • (modified) clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp (+23-9)
  • (modified) clang/lib/StaticAnalyzer/Checkers/MoveChecker.cpp (+23-18)
  • (modified) clang/lib/StaticAnalyzer/Checkers/PutenvStackArrayChecker.cpp (+8-2)
  • (modified) clang/lib/StaticAnalyzer/Checkers/RetainCountChecker/RetainCountDiagnostics.cpp (+9-3)
  • (modified) clang/lib/StaticAnalyzer/Checkers/StackAddrEscapeChecker.cpp (+18-8)
  • (modified) clang/lib/StaticAnalyzer/Checkers/UnixAPIChecker.cpp (+2-1)
  • (modified) clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp (+11-1)
  • (modified) clang/lib/StaticAnalyzer/Core/CMakeLists.txt (+1)
  • (modified) clang/lib/StaticAnalyzer/Core/ExprEngine.cpp (+10)
  • (added) clang/lib/StaticAnalyzer/Core/MemSpaces.cpp (+62)
diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemSpaces.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemSpaces.h
new file mode 100644
index 00000000000000..178a6b60c1cb1a
--- /dev/null
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemSpaces.h
@@ -0,0 +1,47 @@
+//===-- MemSpaces.h -----------------------------------------------*- C++ -*--//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// TODO
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_STATICANALYZER_CHECKERS_MEMSPACES_H
+#define LLVM_CLANG_STATICANALYZER_CHECKERS_MEMSPACES_H
+
+#include "clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/ProgramState_Fwd.h"
+
+namespace clang {
+namespace ento {
+
+class MemRegion;
+class MemSpaceRegion;
+
+namespace memspace {
+
+[[nodiscard]] ProgramStateRef setMemSpaceTrait(ProgramStateRef State,
+                                               const MemRegion *MR,
+                                               const MemSpaceRegion *MS);
+
+[[nodiscard]] const MemSpaceRegion *getMemSpaceTrait(ProgramStateRef State,
+                                                     const MemRegion *MR);
+
+[[nodiscard]] bool hasMemSpaceTrait(ProgramStateRef State, const MemRegion *MR);
+
+template <typename FirstT, typename... RestT>
+[[nodiscard]] bool isMemSpaceOrTrait(ProgramStateRef State,
+                                     const MemRegion *MR) {
+  return isa<FirstT, RestT...>(MR->getMemorySpace()) ||
+         isa_and_nonnull<FirstT, RestT...>(getMemSpaceTrait(State, MR));
+}
+
+} // namespace memspace
+} // namespace ento
+} // namespace clang
+
+#endif // LLVM_CLANG_STATICANALYZER_CHECKERS_MEMSPACES_H
diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
index 6422933c8828a9..e97be53fee4c7f 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -22,6 +22,7 @@
 #include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/DynamicExtent.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/MemSpaces.h"
 #include "llvm/ADT/APSInt.h"
 #include "llvm/ADT/SmallString.h"
 #include "llvm/Support/FormatVariadic.h"
diff --git a/clang/lib/StaticAnalyzer/Checkers/MacOSXAPIChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/MacOSXAPIChecker.cpp
index 754b167642961c..858b6a37551f5d 100644
--- a/clang/lib/StaticAnalyzer/Checkers/MacOSXAPIChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/MacOSXAPIChecker.cpp
@@ -22,6 +22,7 @@
 #include "clang/StaticAnalyzer/Core/Checker.h"
 #include "clang/StaticAnalyzer/Core/CheckerManager.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/MemSpaces.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ProgramStateTrait.h"
 #include "llvm/ADT/SmallString.h"
 #include "llvm/ADT/StringSwitch.h"
@@ -75,10 +76,11 @@ void MacOSXAPIChecker::CheckDispatchOnce(CheckerContext &C, const CallExpr *CE,
   if (!R)
     return;
 
+  ProgramStateRef State = C.getState();
+
   // Global variables are fine.
   const MemRegion *RB = R->getBaseRegion();
-  const MemSpaceRegion *RS = RB->getMemorySpace();
-  if (isa<GlobalsSpaceRegion>(RS))
+  if (memspace::isMemSpaceOrTrait<GlobalsSpaceRegion>(State, RB))
     return;
 
   // Handle _dispatch_once.  In some versions of the OS X SDK we have the case
@@ -117,9 +119,9 @@ void MacOSXAPIChecker::CheckDispatchOnce(CheckerContext &C, const CallExpr *CE,
     if (IVR != R)
       os << " memory within";
     os << " the instance variable '" << IVR->getDecl()->getName() << '\'';
-  } else if (isa<HeapSpaceRegion>(RS)) {
+  } else if (memspace::isMemSpaceOrTrait<HeapSpaceRegion>(State, RB)) {
     os << " heap-allocated memory";
-  } else if (isa<UnknownSpaceRegion>(RS)) {
+  } else if (isa<UnknownSpaceRegion>(RB->getMemorySpace())) {
     // Presence of an IVar superregion has priority over this branch, because
     // ObjC objects are on the heap even if the core doesn't realize this.
     // Presence of a block variable base region has priority over this branch,
diff --git a/clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
index 4166cf14391e2d..76fee2f1322e17 100644
--- a/clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
@@ -72,6 +72,7 @@
 #include "clang/StaticAnalyzer/Core/PathSensitive/CheckerHelpers.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/DynamicExtent.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ExplodedGraph.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/MemSpaces.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ProgramStateTrait.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ProgramState_Fwd.h"
@@ -784,7 +785,8 @@ class MallocChecker
                                              bool IsALeakCheck = false) const;
   ///@}
   static bool SummarizeValue(raw_ostream &os, SVal V);
-  static bool SummarizeRegion(raw_ostream &os, const MemRegion *MR);
+  static bool SummarizeRegion(ProgramStateRef State, raw_ostream &os,
+                              const MemRegion *MR);
 
   void HandleNonHeapDealloc(CheckerContext &C, SVal ArgVal, SourceRange Range,
                             const Expr *DeallocExpr,
@@ -2206,13 +2208,21 @@ MallocChecker::FreeMemAux(CheckerContext &C, const Expr *ArgExpr,
 
   // Parameters, locals, statics, globals, and memory returned by
   // __builtin_alloca() shouldn't be freed.
-  if (!isa<UnknownSpaceRegion, HeapSpaceRegion>(MS)) {
+  // Should skip this check if:
+  // - The region is in the heap
+  // - The region has unknown memspace and no trait for further clarification
+  //   (i.e., just unknown)
+  // - The region has unknown memspace and a heap memspace trait
+  const MemSpaceRegion *MSTrait = memspace::getMemSpaceTrait(State, R);
+  bool HasHeapOrUnknownTrait = !MSTrait || isa<HeapSpaceRegion>(MSTrait);
+  if (!(isa<HeapSpaceRegion>(MS) ||
+        (isa<UnknownSpaceRegion>(MS) && HasHeapOrUnknownTrait))) {
     // Regions returned by malloc() are represented by SymbolicRegion objects
     // within HeapSpaceRegion. Of course, free() can work on memory allocated
     // outside the current function, so UnknownSpaceRegion is also a
     // possibility here.
 
-    if (isa<AllocaRegion>(R))
+    if (isa<AllocaRegion>(R) || isa_and_nonnull<AllocaRegion>(MSTrait))
       HandleFreeAlloca(C, ArgVal, ArgExpr->getSourceRange());
     else
       HandleNonHeapDealloc(C, ArgVal, ArgExpr->getSourceRange(), ParentExpr,
@@ -2384,7 +2394,7 @@ bool MallocChecker::SummarizeValue(raw_ostream &os, SVal V) {
   return true;
 }
 
-bool MallocChecker::SummarizeRegion(raw_ostream &os,
+bool MallocChecker::SummarizeRegion(ProgramStateRef State, raw_ostream &os,
                                     const MemRegion *MR) {
   switch (MR->getKind()) {
   case MemRegion::FunctionCodeRegionKind: {
@@ -2404,8 +2414,10 @@ bool MallocChecker::SummarizeRegion(raw_ostream &os,
     return true;
   default: {
     const MemSpaceRegion *MS = MR->getMemorySpace();
+    const MemSpaceRegion *MSTrait = memspace::getMemSpaceTrait(State, MR);
 
-    if (isa<StackLocalsSpaceRegion>(MS)) {
+    if (isa<StackLocalsSpaceRegion>(MS) ||
+        isa_and_nonnull<StackLocalsSpaceRegion>(MSTrait)) {
       const VarRegion *VR = dyn_cast<VarRegion>(MR);
       const VarDecl *VD;
       if (VR)
@@ -2420,7 +2432,8 @@ bool MallocChecker::SummarizeRegion(raw_ostream &os,
       return true;
     }
 
-    if (isa<StackArgumentsSpaceRegion>(MS)) {
+    if (isa<StackArgumentsSpaceRegion>(MS) ||
+        isa_and_nonnull<StackArgumentsSpaceRegion>(MSTrait)) {
       const VarRegion *VR = dyn_cast<VarRegion>(MR);
       const VarDecl *VD;
       if (VR)
@@ -2435,7 +2448,8 @@ bool MallocChecker::SummarizeRegion(raw_ostream &os,
       return true;
     }
 
-    if (isa<GlobalsSpaceRegion>(MS)) {
+    if (isa<GlobalsSpaceRegion>(MS) ||
+        isa_and_nonnull<GlobalsSpaceRegion>(MSTrait)) {
       const VarRegion *VR = dyn_cast<VarRegion>(MR);
       const VarDecl *VD;
       if (VR)
@@ -2489,8 +2503,8 @@ void MallocChecker::HandleNonHeapDealloc(CheckerContext &C, SVal ArgVal,
       os << "deallocator";
 
     os << " is ";
-    bool Summarized = MR ? SummarizeRegion(os, MR)
-                         : SummarizeValue(os, ArgVal);
+    bool Summarized =
+        MR ? SummarizeRegion(C.getState(), os, MR) : SummarizeValue(os, ArgVal);
     if (Summarized)
       os << ", which is not memory allocated by ";
     else
diff --git a/clang/lib/StaticAnalyzer/Checkers/MoveChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/MoveChecker.cpp
index 52416e21399147..d4ff8186ea69c9 100644
--- a/clang/lib/StaticAnalyzer/Checkers/MoveChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/MoveChecker.cpp
@@ -22,6 +22,7 @@
 #include "clang/StaticAnalyzer/Core/CheckerManager.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/MemSpaces.h"
 #include "llvm/ADT/StringSet.h"
 
 using namespace clang;
@@ -145,12 +146,14 @@ class MoveChecker
 
   // Obtains ObjectKind of an object. Because class declaration cannot always
   // be easily obtained from the memory region, it is supplied separately.
-  ObjectKind classifyObject(const MemRegion *MR, const CXXRecordDecl *RD) const;
+  ObjectKind classifyObject(ProgramStateRef State, const MemRegion *MR,
+                            const CXXRecordDecl *RD) const;
 
   // Classifies the object and dumps a user-friendly description string to
   // the stream.
-  void explainObject(llvm::raw_ostream &OS, const MemRegion *MR,
-                     const CXXRecordDecl *RD, MisuseKind MK) const;
+  void explainObject(ProgramStateRef State, llvm::raw_ostream &OS,
+                     const MemRegion *MR, const CXXRecordDecl *RD,
+                     MisuseKind MK) const;
 
   bool belongsTo(const CXXRecordDecl *RD, const llvm::StringSet<> &Set) const;
 
@@ -299,12 +302,12 @@ MoveChecker::MovedBugVisitor::VisitNode(const ExplodedNode *N,
   SmallString<128> Str;
   llvm::raw_svector_ostream OS(Str);
 
-  ObjectKind OK = Chk.classifyObject(Region, RD);
+  ObjectKind OK = Chk.classifyObject(State, Region, RD);
   switch (OK.StdKind) {
     case SK_SmartPtr:
       if (MK == MK_Dereference) {
         OS << "Smart pointer";
-        Chk.explainObject(OS, Region, RD, MK);
+        Chk.explainObject(State, OS, Region, RD, MK);
         OS << " is reset to null when moved from";
         break;
       }
@@ -315,12 +318,12 @@ MoveChecker::MovedBugVisitor::VisitNode(const ExplodedNode *N,
     case SK_NonStd:
     case SK_Safe:
       OS << "Object";
-      Chk.explainObject(OS, Region, RD, MK);
+      Chk.explainObject(State, OS, Region, RD, MK);
       OS << " is moved";
       break;
     case SK_Unsafe:
       OS << "Object";
-      Chk.explainObject(OS, Region, RD, MK);
+      Chk.explainObject(State, OS, Region, RD, MK);
       OS << " is left in a valid but unspecified state after move";
       break;
   }
@@ -353,7 +356,7 @@ void MoveChecker::modelUse(ProgramStateRef State, const MemRegion *Region,
                            CheckerContext &C) const {
   assert(!C.isDifferent() && "No transitions should have been made by now");
   const RegionState *RS = State->get<TrackedRegionMap>(Region);
-  ObjectKind OK = classifyObject(Region, RD);
+  ObjectKind OK = classifyObject(State, Region, RD);
 
   // Just in case: if it's not a smart pointer but it does have operator *,
   // we shouldn't call the bug a dereference.
@@ -406,24 +409,25 @@ ExplodedNode *MoveChecker::tryToReportBug(const MemRegion *Region,
     // Creating the error message.
     llvm::SmallString<128> Str;
     llvm::raw_svector_ostream OS(Str);
+    ProgramStateRef ErrorNodeState = N->getState();
     switch(MK) {
       case MK_FunCall:
         OS << "Method called on moved-from object";
-        explainObject(OS, Region, RD, MK);
+        explainObject(ErrorNodeState, OS, Region, RD, MK);
         break;
       case MK_Copy:
         OS << "Moved-from object";
-        explainObject(OS, Region, RD, MK);
+        explainObject(ErrorNodeState, OS, Region, RD, MK);
         OS << " is copied";
         break;
       case MK_Move:
         OS << "Moved-from object";
-        explainObject(OS, Region, RD, MK);
+        explainObject(ErrorNodeState, OS, Region, RD, MK);
         OS << " is moved";
         break;
       case MK_Dereference:
         OS << "Dereference of null smart pointer";
-        explainObject(OS, Region, RD, MK);
+        explainObject(ErrorNodeState, OS, Region, RD, MK);
         break;
     }
 
@@ -482,7 +486,7 @@ void MoveChecker::checkPostCall(const CallEvent &Call,
     return;
 
   const CXXRecordDecl *RD = MethodDecl->getParent();
-  ObjectKind OK = classifyObject(ArgRegion, RD);
+  ObjectKind OK = classifyObject(State, ArgRegion, RD);
   if (shouldBeTracked(OK)) {
     // Mark object as moved-from.
     State = State->set<TrackedRegionMap>(ArgRegion, RegionState::getMoved());
@@ -549,7 +553,7 @@ bool MoveChecker::belongsTo(const CXXRecordDecl *RD,
 }
 
 MoveChecker::ObjectKind
-MoveChecker::classifyObject(const MemRegion *MR,
+MoveChecker::classifyObject(ProgramStateRef State, const MemRegion *MR,
                             const CXXRecordDecl *RD) const {
   // Local variables and local rvalue references are classified as "Local".
   // For the purposes of this checker, we classify move-safe STL types
@@ -557,7 +561,7 @@ MoveChecker::classifyObject(const MemRegion *MR,
   MR = unwrapRValueReferenceIndirection(MR);
   bool IsLocal =
       isa_and_nonnull<VarRegion, CXXLifetimeExtendedObjectRegion>(MR) &&
-      isa<StackSpaceRegion>(MR->getMemorySpace());
+      memspace::isMemSpaceOrTrait<StackSpaceRegion>(State, MR);
 
   if (!RD || !RD->getDeclContext()->isStdNamespace())
     return { IsLocal, SK_NonStd };
@@ -571,8 +575,9 @@ MoveChecker::classifyObject(const MemRegion *MR,
   return { IsLocal, SK_Unsafe };
 }
 
-void MoveChecker::explainObject(llvm::raw_ostream &OS, const MemRegion *MR,
-                                const CXXRecordDecl *RD, MisuseKind MK) const {
+void MoveChecker::explainObject(ProgramStateRef State, llvm::raw_ostream &OS,
+                                const MemRegion *MR, const CXXRecordDecl *RD,
+                                MisuseKind MK) const {
   // We may need a leading space every time we actually explain anything,
   // and we never know if we are to explain anything until we try.
   if (const auto DR =
@@ -581,7 +586,7 @@ void MoveChecker::explainObject(llvm::raw_ostream &OS, const MemRegion *MR,
     OS << " '" << RegionDecl->getDeclName() << "'";
   }
 
-  ObjectKind OK = classifyObject(MR, RD);
+  ObjectKind OK = classifyObject(State, MR, RD);
   switch (OK.StdKind) {
     case SK_NonStd:
     case SK_Safe:
diff --git a/clang/lib/StaticAnalyzer/Checkers/PutenvStackArrayChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/PutenvStackArrayChecker.cpp
index bf81d57bf82fd3..4449cb4ae68eb9 100644
--- a/clang/lib/StaticAnalyzer/Checkers/PutenvStackArrayChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/PutenvStackArrayChecker.cpp
@@ -21,6 +21,7 @@
 #include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/MemSpaces.h"
 
 using namespace clang;
 using namespace ento;
@@ -45,10 +46,15 @@ void PutenvStackArrayChecker::checkPostCall(const CallEvent &Call,
   SVal ArgV = Call.getArgSVal(0);
   const Expr *ArgExpr = Call.getArgExpr(0);
 
-  const auto *SSR =
-      dyn_cast<StackSpaceRegion>(ArgV.getAsRegion()->getMemorySpace());
+  const MemRegion *MR = ArgV.getAsRegion();
+
+  const auto *SSR = dyn_cast<StackSpaceRegion>(MR->getMemorySpace());
+  if (!SSR)
+    SSR = dyn_cast_if_present<StackSpaceRegion>(
+        memspace::getMemSpaceTrait(C.getState(), MR));
   if (!SSR)
     return;
+
   const auto *StackFrameFuncD =
       dyn_cast_or_null<FunctionDecl>(SSR->getStackFrame()->getDecl());
   if (StackFrameFuncD && StackFrameFuncD->isMain())
diff --git a/clang/lib/StaticAnalyzer/Checkers/RetainCountChecker/RetainCountDiagnostics.cpp b/clang/lib/StaticAnalyzer/Checkers/RetainCountChecker/RetainCountDiagnostics.cpp
index 456132ef0a0a22..ecc7563ceb4bba 100644
--- a/clang/lib/StaticAnalyzer/Checkers/RetainCountChecker/RetainCountDiagnostics.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/RetainCountChecker/RetainCountDiagnostics.cpp
@@ -13,6 +13,7 @@
 
 #include "RetainCountDiagnostics.h"
 #include "RetainCountChecker.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/MemSpaces.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/SmallVector.h"
 #include <optional>
@@ -690,9 +691,14 @@ static AllocationInfo GetAllocationSite(ProgramStateManager &StateMgr,
       const MemRegion *R = FB.getRegion();
       // Do not show local variables belonging to a function other than
       // where the error is reported.
-      if (auto MR = dyn_cast<StackSpaceRegion>(R->getMemorySpace()))
-        if (MR->getStackFrame() == LeakContext->getStackFrame())
-          FirstBinding = R;
+      const StackSpaceRegion *MR =
+          dyn_cast<StackSpaceRegion>(R->getMemorySpace());
+      if (!MR)
+        MR = dyn_cast_if_present<StackSpaceRegion>(
+            memspace::getMemSpaceTrait(St, R));
+
+      if (MR && MR->getStackFrame() == LeakContext->getStackFrame())
+        FirstBinding = R;
     }
 
     // AllocationNode is the last node in which the symbol was tracked.
diff --git a/clang/lib/StaticAnalyzer/Checkers/StackAddrEscapeChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/StackAddrEscapeChecker.cpp
index f4de3b500499c4..4d8c8eea14ea63 100644
--- a/clang/lib/StaticAnalyzer/Checkers/StackAddrEscapeChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/StackAddrEscapeChecker.cpp
@@ -19,6 +19,7 @@
 #include "clang/StaticAnalyzer/Core/CheckerManager.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/MemSpaces.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h"
 #include "llvm/ADT/SmallString.h"
 #include "llvm/Support/raw_ostream.h"
@@ -61,7 +62,7 @@ class StackAddrEscapeChecker
                              ASTContext &Ctx);
   static SmallVector<const MemRegion *, 4>
   getCapturedStackRegions(const BlockDataRegion &B, CheckerContext &C);
-  static bool isNotInCurrentFrame(const MemRegion *R, CheckerContext &C);
+  static bool isNotInCurrentFrame(const MemSpaceRegion *MS, CheckerContext &C);
 };
 } // namespace
 
@@ -117,9 +118,9 @@ SourceRange StackAddrEscapeChecker::genName(raw_ostream &os, const MemRegion *R,
   return range;
 }
 
-bool StackAddrEscapeChecker::isNotInCurrentFrame(const MemRegion *R,
+bool StackAddrEscapeChecker::isNotInCurrentFrame(const MemSpaceRegion *MS,
                                                  CheckerContext &C) {
-  const StackSpaceRegion *S = cast<StackSpaceRegion>(R->getMemorySpace());
+  const StackSpaceRegion *S = cast<StackSpaceRegion>(MS);
   return S->getStackFrame() != C.getStackFrame();
 }
 
@@ -138,10 +139,11 @@ SmallVector<const MemRegion *, 4>
 StackAddrEscapeChecker::getCapturedStackRegions(const BlockDataRegion &B,
                                                 CheckerContext &C) {
   SmallVector<const MemRegion *, 4> Regions;
+  ProgramStateRef State = C.getState();
   for (auto Var : B.referenced_vars()) {
-    SVal Val = C.getState()->getSVal(Var.getCapturedRegion());
+    SVal Val = State->getSVal(Var.getCapturedRegion());
     const MemRegion *Region = Val.getA...
[truncated]

@Flandini
Copy link
Contributor Author

Flandini commented Jan 15, 2025

I think I changed all usages of memory spaces in Core/ and Checkers/. I looked for these with:

rg -i 'memspace|globalsspace|codespace|globalspace|globalsystemspace|globalimmutablespace|globalinternalspace|heapspace|stackspace|unknownspace' clang/include/clang/StaticAnalyzer/ clang/lib/StaticAnalyzer/

The only spot I thought was questionable to change is this lower bound check in ArrayBoundCheckerV2; I think it is fine to leave unchanged for now, thoughts?

// CHECK LOWER BOUND
const MemSpaceRegion *Space = Reg->getMemorySpace();
if (!(isa<SymbolicRegion>(Reg) && isa<UnknownSpaceRegion>(Space))) {

There is also currently no place where these traits are set, so everything should have the same behaviour as before.

Copy link
Contributor

@NagyDonat NagyDonat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly support the idea that we should eventually represent the memory space of a memory region by a state trait (instead of the "fake" super-region) and I am grateful that you're working on this.

I added a few suggestions in inline comments that could simplify the logic and ensure better behavior w.r.t. element and field subregions. (Also there are a few trivial nitpicks about comments.)

In my opinion it would be nice to have some code path that sets and uses these MemSpace traits (even in this initial commit), because that would let us test the code and verify that it's working correctly.

However, I'm not totally opposed to accepting this as an NFC (no functional change) commit, but in that case, please add [NFC] to the review/commit title (before merging the commit) as that helps developers who want to quickly find a commit which is responsible for a certain issue.

Copy link

github-actions bot commented Jan 15, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@Flandini Flandini changed the title [analyzer] Add MemSpace trait to program state [nfc][analyzer] Add MemSpace trait to program state Jan 15, 2025
@Flandini
Copy link
Contributor Author

@NagyDonat, I made the requested changes, lmk if this looks better.

Copy link
Collaborator

@Xazax-hun Xazax-hun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the direction of this PR but at the same time it makes memspaces a bit more error prone to use. Do you think we could find a way to prevent using isa on memspaces?

@Flandini
Copy link
Contributor Author

I like the direction of this PR but at the same time it makes memspaces a bit more error prone to use. Do you think we could find a way to prevent using isa on memspaces?

Do you mean moving away from a class hierarchy definition for memspaces and towards something else, maybe a enum Kind definition, or maybe just try to hide all the casting behind a different API? IIUC, from previous chat with @steakhal, the eventual goal is to get rid of the MemSpaceRegion part of the MemRegion class hierarchy and replace it with something else.

@Xazax-hun
Copy link
Collaborator

Do you mean moving away from a class hierarchy definition for memspaces and towards something else, maybe a enum Kind definition, or maybe just try to hide all the casting behind a different API?

I think it would be nice to do the latter in this patch so using isa directly will lead to a compilation error, would prevent some potential errors. We could do the former as a follow-up later.

@NagyDonat
Copy link
Contributor

@Xazax-hun
When you write that this commit "makes memspaces a bit more error prone to use. Do you think we could find a way to prevent using isa on memspaces?" do I understand it correctly you mean that it's error-prone to use isa<...>(MR->getMemorySpace()) directly? This expression is indeed error-prone, but the problem is not the use of isa but the use of the method getMemorySpace() which will sometimes return incorrect (not the most accurate) data after this commit.

I think and hope that this error-prone situation will be resolved soon in a follow-up commit that completely transfers the handling of memory spaces to the state trait, because after that point MemRegion won't have an internal not entirely accurate reference to its memory space and all memory space lookups would look like getMemSpace(MR, State) (and it will be fine to call isa on this function).

@Xazax-hun
Copy link
Collaborator

This expression is indeed error-prone, but the problem is not the use of isa but the use of the method getMemorySpace()

My bad! You are right, it is not the isa. :)

this error-prone situation will be resolved soon in a follow-up commit

I think it would be nice if we never had the error prone APIs to begin with since people can lose interest, or might get hit by a bus and we are left in an unfortunate state. That being said if there is no easy way to make this less error prone I am OK with delaying that to a follow up PR. But might worth exploring if this is something that can be done in a couple hours, why not.

@NagyDonat
Copy link
Contributor

The only spot I thought was questionable to change is this lower bound check in ArrayBoundCheckerV2; I think it is fine to leave unchanged for now, thoughts?

// CHECK LOWER BOUND
const MemSpaceRegion *Space = Reg->getMemorySpace();
if (!(isa<SymbolicRegion>(Reg) && isa<UnknownSpaceRegion>(Space))) {

I forgot to answer this earlier, but this location in ArrayBoundCheckerV2 is indeed a place where it's better to leave the code unchanged. In fact, if we consider a somewhat contrived example like

int globalArray[64];
void foo(int *p) {
  if (p == globalArray + 32) {
    p[-16] = 42
  }
}

then (1) p is initially a symbolic pointer in unknown space (we don't know anything about it) and then (2) within the if we can deduce that p is in the global memory space, but (3) we don't want to report p[-16] = 42 as an underflow error, because p is not the beginning of a memory area within the global space.

If the planned follow-up changes erase the distinction between "region constructed in a non-unknown space" and "initially unknown region whose memory space was deduced later", then we'll get regions (i.e. pointer values) that appear to be top-level regions (i.e. not an element or a field of something else) but point into the middle of an allocation. This would complicate the diagnosis of underflow errors in theory -- but in practice I think false positives from this source would be rare. Still, it would be nice to have a better solution than "just eat a few false positives".

Copy link
Contributor

@steakhal steakhal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a quick look at the PR, and over all it looks as expected. Good job!

Have you considered changing MemRegion::getMemorySpace() into MemRegion::getMemorySpace(ProgramStateRef)?

This should lessen the confusion of which APIs should a dev use to get the memspace of a region (aka. the getter). Speaking of the setter counter part, I think that's is special-purpose enough that most people shouldn't know it exist, so I don't think that would bring much confusion.

So the question is, would changing it into MemRegion::getMemorySpace(ProgramStateRef) be too intrusive to consider? (Maybe in some contexts where we currently call getMemorySpace we just don't have a State, and can't easily get one - and that would kill this approach.) Alternatively, we could say, just introduce this as an overload, and highlight that people should really really use this one. That would bring consistency while being pragmatic and accepting past design choices.

@Flandini
Copy link
Contributor Author

Have you considered changing MemRegion::getMemorySpace() into MemRegion::getMemorySpace(ProgramStateRef)?

I thought about this, but I decided against it since I am thinking that MemSpaces will eventually be their own separate thing, not part of the MemRegion hierarchy, i.e., MemSpaces will eventually just be a trait thing, maybe with its own checker depending on how much functionality can be added with this different implementation?

All this being said, I am happy to change the API of how we get memory spaces, I am not tied to the current approach/API, just that we can move towards better results for #115410.

(Maybe in some contexts where we currently call getMemorySpace we just don't have a State, and can't easily get one - and that would kill this approach.)

This has been the main thing I'm working through on this PR. This PR doesn't totally remove all getMemorySpace calls; this PR covers most of the files that use memory spaces, but there still remain some spots that are harder for me to change right now that I plan to work through. These are in RegionStore and ClusterAnalysis which don't take state, MemRegion (including the changes that would happen after removing MemSpace from the hierarchy) and all the print/dump methods which don't take state, and a dozen spots in StackAddrEscapeChecker that don't take state that I couldn't think of an immediate way to add state.

This is also related to @Xazax-hun's above question about whether it would be possible to make all the changes here at once, do you have thoughts on that @steakhal, I think you tried some stuff here?

@Flandini
Copy link
Contributor Author

Working on some benchmarking. Posted some problems in the discord channel, but working through it now, and benchmarking with the CodeChecker tool using only clangsa. Is there an already established set of benchmarks y'all use somewhere? IIRC, there was some github repo with some benchmarks that I saw linked earlier.

@steakhal
Copy link
Contributor

Have you considered changing MemRegion::getMemorySpace() into MemRegion::getMemorySpace(ProgramStateRef)?

I thought about this, but I decided against it since I am thinking that MemSpaces will eventually be their own separate thing, not part of the MemRegion hierarchy, i.e., MemSpaces will eventually just be a trait thing, maybe with its own checker depending on how much functionality can be added with this different implementation?

All this being said, I am happy to change the API of how we get memory spaces, I am not tied to the current approach/API, just that we can move towards better results for #115410.

Let express that I don't think we will ever drop memspaces from memregions, even if we strive for. The return of investment is not going to be there once we cover the unknownspace case - which you plan to fix the issue you referred to.
So I think we should aim for a consistent and better landscape to achieve that minimal goal.

I think having a single API (or overload set) is important for this. This is why I'd push for patching the getMemorySpace() overloads.

(Maybe in some contexts where we currently call getMemorySpace we just don't have a State, and can't easily get one - and that would kill this approach.)

This has been the main thing I'm working through on this PR. This PR doesn't totally remove all getMemorySpace calls; this PR covers most of the files that use memory spaces, but there still remain some spots that are harder for me to change right now that I plan to work through. These are in RegionStore and ClusterAnalysis which don't take state, MemRegion (including the changes that would happen after removing MemSpace from the hierarchy) and all the print/dump methods which don't take state, and a dozen spots in StackAddrEscapeChecker that don't take state that I couldn't think of an immediate way to add state.

Yea, in those cases I think we should just do what we currently do: don't care about dynamically deduced memspaces, aka. let them use the getMemorySpace() without the ProgramStateRef parameter. I'm not even sure if those usecases in the Store and the Cluster analysis are really justified. Probably they don't even bring any benefit, so we should worry about them too much.
Given that StackAddrEscapeChecker is a checker, we should always have some State. I wonder what do you have in mind there exactly.

This is also related to @Xazax-hun's above question about whether it would be possible to make all the changes here at once, do you have thoughts on that @steakhal, I think you tried some stuff here?

I think the best experience for everyone is if we make the smallest consistent step fixing #115410.
That is, be able to associate a deduced memspace with a memregion that was created with the Unknown memspace.

Once we have that, we can discuss how to move forward if we really want to move anywhere.

I've spent about one bit more than a day to completely get rid of memspaces from the memregions hierarchy. I should have chunk it up, but difficult to make incremental refactors with this so I gave up on that. I figured that I should spend my time on other things.

@Flandini
Copy link
Contributor Author

Flandini commented Jan 24, 2025

Benchmarking is taking a while for me, and I can only really run benchmarks with low run-time variance overnight. I will post more over time as I get more benchmark projects added.

Here is the result from benchmarking one C project, libsodium, so far. These results are for 13 runs after some warmup runs. This is with a debug build and default checkers enabled using CodeChecker:

base/main branch of clang:

  • Mean: 681.3 seconds
  • Stdev: 12.8 seconds

This branch:

  • Mean: 678.1 seconds
  • Stdev 15.4 seconds

edit: This is with the change that would not be merged: that there is a lookup in the trait for each memregion rather than only for those with UnknownSpaceRegion.

@steakhal
Copy link
Contributor

Working on some benchmarking. Posted some problems in the discord channel, but working through it now, and benchmarking with the CodeChecker tool using only clangsa. Is there an already established set of benchmarks y'all use somewhere? IIRC, there was some github repo with some benchmarks that I saw linked earlier.

This is a sad story. We don't really have the tooling. I mean, at Sonar, we have our internal tooling. Ericsson also has quite a bit of internal tooling built on top of https://github.com/Xazax-hun/csa-testbench.
There is some outdated dockerfile at https://github.com/llvm/llvm-project/tree/main/clang/utils/analyzer,

Like in most cases, we don't have the docs xD
The best I've found so far was this: https://reviews.llvm.org/D126196
You will see the last comment describes how they used it, so I guess that should work for you too.

Probably the SATest.py is the best you can reach for once you dust it off a bit.

@steakhal
Copy link
Contributor

Benchmarking is taking a while for me, and I can only really run benchmarks with low run-time variance overnight. I will post more over time as I get more benchmark projects added.

Here is the result from benchmarking one C project, libsodium, so far. These results are for 13 runs after some warmup runs. This is with a debug build and default checkers enabled using CodeChecker:

base/main branch of clang:

* Mean: 681.3 seconds

* Stdev: 12.8 seconds

This branch:

* Mean: 678.1 seconds

* Stdev 15.4 seconds

edit: This is with the change that would not be merged: that there is a lookup in the trait for each memregion rather than only for those with UnknownSpaceRegion.

I see no risk in this change to regress analysis time. Consequently, if I were you, I wouldn't run benchmarks. Unless my primary objective would be to dust off the SATest.py and get on hands experience using that tool.

@NagyDonat
Copy link
Contributor

NagyDonat commented Jan 27, 2025

Have you considered changing MemRegion::getMemorySpace() into MemRegion::getMemorySpace(ProgramStateRef)?

I thought about this, but I decided against it since I am thinking that MemSpaces will eventually be their own separate thing, not part of the MemRegion hierarchy, i.e., MemSpaces will eventually just be a trait thing, maybe with its own checker depending on how much functionality can be added with this different implementation?
All this being said, I am happy to change the API of how we get memory spaces, I am not tied to the current approach/API, just that we can move towards better results for #115410.

Let express that I don't think we will ever drop memspaces from memregions, even if we strive for. The return of investment is not going to be there once we cover the unknownspace case - which you plan to fix the issue you referred to. So I think we should aim for a consistent and better landscape to achieve that minimal goal.

I think having a single API (or overload set) is important for this. This is why I'd push for patching the getMemorySpace() overloads.

I would strongly support changing MemRegion::getMemorySpace() into MemRegion::getMemorySpace(ProgramStateRef) even within this commit if it's feasible (and perhaps leave behind a MemRegion::getRawMemorySpace() -- which is marked with a scary "deprecated" comment -- if it's needed e.g. for the ArrayBoundCheckerV2 lower bound check) -- simply because there is no reason to leave behind a method that appears to "do the job" but does only half of it, even temporarily. In cases like this when there are multiple functions "for the same job" it's important to (1) ensure that they are linked to each other (2) highlight the differences and deficiencies.

However, I would be very happy if this refactoring effort continued by dropping the memspaces from the memregions in follow-up commits -- I don't think that it would be terribly difficult and it would make it easier to reason about memregions. (If nobody else does it, I will eventually do it when I get fed up on some confusing MemRegion-related code.)

@Flandini Flandini requested a review from Xazax-hun January 27, 2025 20:56
@Flandini
Copy link
Contributor Author

Made the above changes. I moved it all into MemRegion and changed the names of and added deprecation comments to the old MemRegion instance methods that queried mem space info.

Copy link
Contributor

@NagyDonat NagyDonat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates!

I added some inline remarks, but overall I'm satisfied with the state of this PR and I hope that it can be merged soon.

However, I became a bit uncertain about the plans for follow-up changes, because it seems that there are several locations where the difference between "inherently in <memory space X>" and "originally in UnknownSpace, but we deduced that it's in <memory space X>" is significant. (I'm not entirely certain about this, because e.g. the schematics and applications of "deducing a memory space" are not yet fixed. This might deserve some additional thinking.)

Previously I strongly supported the idea that in the future we should store the MemSpace in a single location, as a state trait -- but if the inherent/acquired difference turns out to be relevant, then perhaps we should "stop after this PR" and stabilize this "hybrid representation" of memory spaces.

@@ -353,7 +352,7 @@ static std::string getRegionName(const SubRegion *Region) {
return "the memory returned by 'alloca'";

if (isa<SymbolicRegion>(Region) &&
isa<HeapSpaceRegion>(Region->getMemorySpace()))
isa<HeapSpaceRegion>(Region->getRawMemorySpace()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should use getMemorySpace(State) if it's not too cumbersome to pass a state to this function. (However, this is not an important change, just marginal prettification of a report message.)

@@ -1544,7 +1544,7 @@ SVal RegionStoreManager::getBinding(RegionBindingsConstRef B, Loc L, QualType T)
// The location does not have a bound value. This means that it has
// the value it had upon its creation and/or entry to the analyzed
// function/method. These are either symbolic values or 'undefined'.
if (R->hasStackNonParametersStorage()) {
if (R->hasRawStackNonParametersStorage()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remark: This is another example (in addition to e.g. the ArrayBoundV2 lower bound check) where it seems to be important to distinguish the "raw" (inherent) memory space and the "acquired" memory space: this logic applies a default to "real" local stack variables, that should not apply to initially unknown regions where we later conclude that they're on the stack (but perhaps don't bind anything as their content).

No action expected in this commit, but this could be another reason for keeping the separate "inherent" memory space field in the future.

Comment on lines 951 to 952
const MemSpaceRegion *LeftMS = LeftBase->getRawMemorySpace();
const MemSpaceRegion *RightMS = RightBase->getRawMemorySpace();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to use the non-raw memory space here, if possible.

@steakhal
Copy link
Contributor

steakhal commented Feb 2, 2025

I've pushed a couple of commits to sync this PR with what I had in mind.
I hope you don't mind @Flandini.

Please anyone do a thorough review.


I've considered uplifiting a couple of the VarRegion::getStackFrame(), and similar getStackFrame() functions, because internally they would use the memory space to resolve to the stack frame. I still consider it, but it would need a similar migration: introduce a new overload that takes the State, and try to eradicate the overload that doesn't take any State. Then keep the cases that cannot be uplifted. I found this tedious, and not strictly necessary as a first step to merge this PR. It would also "pollute" the API surface with the leftover APIs, which isn't ideal.

@steakhal steakhal requested a review from NagyDonat February 2, 2025 19:33
@steakhal
Copy link
Contributor

steakhal commented Feb 3, 2025

I've checked just now that once this lands, we could add a checker callback like this to start modeling the memoryspaces transfers to unknown spaced regions:

void checkPreStmt(const BinaryOperator *BO, CheckerContext &C) const {
  if (CanQualType Ty = BO->getType()->getCanonicalTypeUnqualified();
      Ty != C.getSValBuilder().getConditionType())
    return;
  if (auto Op = BO->getOpcode(); Op != BO_EQ && Op != BO_NE)
    return;

  const Expr *LExp = BO->getLHS();
  const Expr *RExp = BO->getRHS();

  if (!LExp->getType()->isAnyPointerType() ||
      !RExp->getType()->isAnyPointerType())
    return;

  const auto *L = C.getSVal(LExp).getAsRegion();
  const auto *R = C.getSVal(RExp).getAsRegion();
  if (!L || !R)
    return;

  auto State = C.getState();
  const auto *LSpace = L->getMemorySpace(State);
  const auto *RSpace = R->getMemorySpace(State);
  if (isa<UnknownSpaceRegion>(LSpace) == isa<UnknownSpaceRegion>(RSpace))
    return;

  C.addTransition(isa<UnknownSpaceRegion>(LSpace)
                      ? L->setMemorySpace(State, RSpace)
                      : R->setMemorySpace(State, LSpace));
}

I can confirm that it would work (kindof). There are problems with it of course, as it should be part of a post callback. But the ExprEngineC.cpp:VisitBinaryOperator would sink the aliasing case because it would call the SVB.evalBinOp that would conclude that these pointers can't alias. So we should probably patch that code instead of add the trait before we would assume aliasing in SVB.

@steakhal
Copy link
Contributor

steakhal commented Feb 16, 2025

Ping. I wonder if there is anything missing for this to be merged.
@NagyDonat

Ah, yea, sure a rebase xD

Copy link
Collaborator

@Xazax-hun Xazax-hun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine going ahead with this.

Copy link
Contributor

@NagyDonat NagyDonat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the improvements :)

In the follow-up commits let's pay attention to the ArrayBound lower bound check where the code says that "A symbolic region in unknown space represents an unknown pointer that may point into the middle of an array, so we don't look for underflows." and assumes that e.g. a symbolic region in a known space necessarily points to the beginning of a memory area (i.e. negative indices are erroneous).

In general, if we see an allocation, we associate the immutable memory
space with the constructed memory region.
This works fine if we see the allocation.
However, with symbolic regions it's not great because there we don't
know anything about their memory spaces, thus put them into the Unknown
space.

The unfortunate consequence is that once we learn about some aliasing
with this Symbolic Region, we can't change the memory space to the
deduced one.

In this patch, we open up the memory spaces as a trait, basically
allowing associating a better memory space with a memregion that
was created with the Unknown memory space.

As a side effect, this means that now queriing the memory space of a
region depends on the State, but many places in the analyzer, such as
the Store, doesn't have  (and cannot have) access to the State by design.

This means that some uses must solely rely on the memspaces of the
region, but any other users should use the getter taking a State.
@steakhal steakhal changed the title [nfc][analyzer] Add MemSpace trait to program state [analyzer] Allow overriding Unknown memspaces using a ProgramState trait Feb 22, 2025
@steakhal steakhal force-pushed the flandini/unk-only-memspace-trait branch from 0c900fa to b59b7c7 Compare February 22, 2025 10:52
@steakhal steakhal merged commit cbd3801 into llvm:main Feb 22, 2025
11 checks passed
@vitalybuka
Copy link
Collaborator

@steakhal
Copy link
Contributor

vitalybuka added a commit that referenced this pull request Feb 22, 2025
In #123003 make_first_range was applied to temporarily.
@vitalybuka
Copy link
Collaborator

Should be fixed with #128372

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:static analyzer clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants