Skip to content

[analyzer] Consolidate array bound checkers #125534

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 6, 2025

Conversation

NagyDonat
Copy link
Contributor

@NagyDonat NagyDonat commented Feb 3, 2025

Before this commit, there were two alpha checkers that used different algorithms/logic for detecting out of bounds memory access: the old alpha.security.ArrayBound and the experimental, more complex alpha.security.ArrayBoundV2.

After lots of quality improvement commits ArrayBoundV2 is now stable enough to be moved out of the alpha stage. As indexing (and dereference) are common operations, it still produces a significant amount of false positives, but not much more than e.g. core.NullDereference or core.UndefinedBinaryOperatorResult, so it should be acceptable as a non-core checker.

At this point alpha.security.ArrayBound became obsolete (there is a better tool for the same task), so I'm removing it from the codebase. With this I can eliminate the ugly "V2" version mark almost everywhere and rename alpha.security.ArrayBoundV2 to security.ArrayBound.

(The version mark is preserved in the filename "ArrayBoundCheckerV2", to ensure a clear git history. I'll rename it to "ArrayBoundChecker.cpp" in a separate commit.)

This commit adapts the unit tests of alpha.security.ArrayBound to testing the new security.ArrayBound (= old ArrayBoundV2). Currently the names of the test files are very haphazard, I'll probably create a separate followup commit that consolidates this.


The effects of enabling this checker (compared to a run where only the other stable checkers are running):

Project New Reports Resolved Reports Notes
memcached 2 new reports 0 resolved reports
tmux 1 new reports 0 resolved reports
curl 5 new reports 2 resolved reports
twin 15 new reports 2 resolved reports
vim 57 new reports 1 resolved reports
openssl 21 new reports 0 resolved reports
sqlite 16 new reports 3 resolved reports
ffmpeg 187 new reports 27 resolved reports 80 of the new reports (all reports in this file) are all duplicates coming from the same macro
postgres 66 new reports 6 resolved reports
tinyxml2 1 new reports 0 resolved reports
libwebm 23 new reports 0 resolved reports
xerces 4 new reports 2 resolved reports
bitcoin 9 new reports 0 resolved reports
protobuf 10 new reports 2 resolved reports
qtbase 69 new reports 5 resolved reports

Before this commit, there were two alpha checkers that used different
algorithms/logic for detecting out of bounds memory access: the old
`alpha.security.ArrayBound` and the experimental, more complex
`alpha.security.ArrayBoundV2`.

After lots of quality improvement commits ArrayBoundV2 is now stable
enough to be moved out of the alpha stage. As indexing (and dereference)
are common operations, it still produces a significant amount of false
positives, but not much more than e.g. `core.NullDereference` or
`core.UndefinedBinaryOperatorResult`, so it should be acceptable as a
non-`core` checker.

At this point `alpha.security.ArrayBound` became obsolete (there is a
better tool for the same task), so I'm removing it from the codebase.
With this I can eliminate the ugly "V2" version mark almost everywhere
and rename `alpha.security.ArrayBoundV2` to `security.ArrayBound`.

(The version mark is preserved in the filename "ArrayBoundCheckerV2",
to ensure a clear git history. I'll rename it to "ArrayBoundChecker.cpp"
in a separate commit.)

This commit adapts the unit tests of `alpha.security.ArrayBound` to
testing the new `security.ArrayBound` (= old ArrayBoundV2). Currently
the names of the test files are very haphazard, I'll probably create a
separate followup commit that consolidates this.
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:static analyzer labels Feb 3, 2025
@llvmbot
Copy link
Member

llvmbot commented Feb 3, 2025

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-static-analyzer-1

Author: Donát Nagy (NagyDonat)

Changes

Before this commit, there were two alpha checkers that used different algorithms/logic for detecting out of bounds memory access: the old alpha.security.ArrayBound and the experimental, more complex alpha.security.ArrayBoundV2.

After lots of quality improvement commits ArrayBoundV2 is now stable enough to be moved out of the alpha stage. As indexing (and dereference) are common operations, it still produces a significant amount of false positives, but not much more than e.g. core.NullDereference or core.UndefinedBinaryOperatorResult, so it should be acceptable as a non-core checker.

At this point alpha.security.ArrayBound became obsolete (there is a better tool for the same task), so I'm removing it from the codebase. With this I can eliminate the ugly "V2" version mark almost everywhere and rename alpha.security.ArrayBoundV2 to security.ArrayBound.

(The version mark is preserved in the filename "ArrayBoundCheckerV2", to ensure a clear git history. I'll rename it to "ArrayBoundChecker.cpp" in a separate commit.)

This commit adapts the unit tests of alpha.security.ArrayBound to testing the new security.ArrayBound (= old ArrayBoundV2). Currently the names of the test files are very haphazard, I'll probably create a separate followup commit that consolidates this.


The effects of enabling this checker (compared to a run where only the other stable checkers are running):

Project New Reports Resolved Reports
memcached 2 new reports 0 resolved reports
tmux 1 new reports 0 resolved reports
curl 5 new reports 2 resolved reports
twin 15 new reports 2 resolved reports
vim 57 new reports 1 resolved reports
openssl 21 new reports 0 resolved reports
sqlite 16 new reports 3 resolved reports
ffmpeg 187 new reports 27 resolved reports
postgres 66 new reports 6 resolved reports
tinyxml2 1 new reports 0 resolved reports
libwebm 23 new reports 0 resolved reports
xerces 4 new reports 2 resolved reports
bitcoin 9 new reports 0 resolved reports
protobuf 10 new reports 2 resolved reports
qtbase 69 new reports 5 resolved reports

Patch is 38.92 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/125534.diff

26 Files Affected:

  • (modified) clang/docs/ReleaseNotes.rst (+5)
  • (modified) clang/docs/analyzer/checkers.rst (+60-73)
  • (modified) clang/include/clang/StaticAnalyzer/Checkers/Checkers.td (+35-32)
  • (removed) clang/lib/StaticAnalyzer/Checkers/ArrayBoundChecker.cpp (-92)
  • (modified) clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp (+28-22)
  • (modified) clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt (-1)
  • (modified) clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp (+4-2)
  • (modified) clang/lib/StaticAnalyzer/Core/MemRegion.cpp (+2-3)
  • (modified) clang/test/Analysis/index-type.c (+2-2)
  • (modified) clang/test/Analysis/misc-ps-region-store.m (+11-6)
  • (modified) clang/test/Analysis/no-outofbounds.c (+1-1)
  • (renamed) clang/test/Analysis/out-of-bounds-constraint-check.c (+1-1)
  • (modified) clang/test/Analysis/out-of-bounds-diagnostics.c (+1-1)
  • (modified) clang/test/Analysis/out-of-bounds-new.cpp (+1-1)
  • (modified) clang/test/Analysis/out-of-bounds-notes.c (+1-1)
  • (modified) clang/test/Analysis/out-of-bounds.c (+1-1)
  • (modified) clang/test/Analysis/outofbound-notwork.c (+4-4)
  • (modified) clang/test/Analysis/outofbound.c (+15-13)
  • (removed) clang/test/Analysis/rdar-6541136-region.c (-27)
  • (modified) clang/test/Analysis/runtime-regression.c (+1-1)
  • (modified) clang/test/Analysis/taint-diagnostic-visitor.c (+1-1)
  • (modified) clang/test/Analysis/taint-generic.c (+2-2)
  • (modified) clang/test/Analysis/taint-generic.cpp (+1-1)
  • (modified) clang/www/analyzer/open_projects.html (-13)
  • (modified) clang/www/analyzer/potential_checkers.html (+1-19)
  • (modified) llvm/utils/gn/secondary/clang/lib/StaticAnalyzer/Checkers/BUILD.gn (-1)
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index a220e57d0b3222..a3aad565472eb4 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -231,6 +231,11 @@ Improvements
 Moved checkers
 ^^^^^^^^^^^^^^
 
+- After lots of improvements, the checker ``alpha.security.ArrayBoundV2`` is
+  renamed to ``security.ArrayBound``. As this checker is stable now, the old
+  checker ``alpha.security.ArrayBound`` (which was searching for the same kind
+  of bugs with an different, simpler and less accurate algorithm) is removed.
+
 .. _release-notes-sanitizers:
 
 Sanitizers
diff --git a/clang/docs/analyzer/checkers.rst b/clang/docs/analyzer/checkers.rst
index e093b2d672a74e..707067358fdfe3 100644
--- a/clang/docs/analyzer/checkers.rst
+++ b/clang/docs/analyzer/checkers.rst
@@ -1332,10 +1332,69 @@ security
 
 Security related checkers.
 
+.. _security-ArrayBound:
+
+security.ArrayBound (C, C++)
+""""""""""""""""""""""""""""
+Report out of bounds access to memory that is before the start or after the end
+of the accessed region (array, heap-allocated region, string literal etc.).
+This usually means incorrect indexing, but the checker also detects access via
+the operators ``*`` and ``->``.
+
+.. code-block:: c
+
+ void test_underflow(int x) {
+   int buf[100][100];
+   if (x < 0)
+     buf[0][x] = 1; // warn
+ }
+
+ void test_overflow() {
+   int buf[100];
+   int *p = buf + 100;
+   *p = 1; // warn
+ }
+
+If checkers like :ref:`unix-Malloc` or :ref:`cplusplus-NewDelete` are enabled
+to model the behavior of ``malloc()``, ``operator new`` and similar
+allocators), then this checker can also reports out of bounds access to
+dynamically allocated memory:
+
+.. code-block:: cpp
+
+ int *test_dynamic() {
+   int *mem = new int[100];
+   mem[-1] = 42; // warn
+   return mem;
+ }
+
+In uncertain situations (when the checker can neither prove nor disprove that
+overflow occurs), the checker assumes that the the index (more precisely, the
+memory offeset) is within bounds.
+
+However, if :ref:`optin-taint-GenericTaint` is enabled and the index/offset is
+tainted (i.e. it is influenced by an untrusted souce), then this checker
+reports the potential out of bounds access:
+
+.. code-block:: c
+
+ void test_with_tainted_index() {
+   char s[] = "abc";
+   int x = getchar();
+   char c = s[x]; // warn: potential out of bounds access with tainted index
+ }
+
+.. note::
+
+  This checker is an improved and renamed version of the checker that was
+  previously known as ``alpha.security.ArrayBoundV2``. The old checker
+  ``alpha.security.ArrayBound`` was removed when the (previously
+  "experimental") V2 variant became stable enough for regular use.
+
 .. _security-cert-env-InvalidPtr:
 
 security.cert.env.InvalidPtr
-""""""""""""""""""""""""""""""""""
+""""""""""""""""""""""""""""
 
 Corresponds to SEI CERT Rules `ENV31-C <https://wiki.sei.cmu.edu/confluence/display/c/ENV31-C.+Do+not+rely+on+an+environment+pointer+following+an+operation+that+may+invalidate+it>`_ and `ENV34-C <https://wiki.sei.cmu.edu/confluence/display/c/ENV34-C.+Do+not+store+pointers+returned+by+certain+functions>`_.
 
@@ -3216,78 +3275,6 @@ Warns against using one vs. many plural pattern in code when generating localize
 alpha.security
 ^^^^^^^^^^^^^^
 
-.. _alpha-security-ArrayBound:
-
-alpha.security.ArrayBound (C)
-"""""""""""""""""""""""""""""
-Warn about buffer overflows (older checker).
-
-.. code-block:: c
-
- void test() {
-   char *s = "";
-   char c = s[1]; // warn
- }
-
- struct seven_words {
-   int c[7];
- };
-
- void test() {
-   struct seven_words a, *p;
-   p = &a;
-   p[0] = a;
-   p[1] = a;
-   p[2] = a; // warn
- }
-
- // note: requires unix.Malloc or
- // alpha.unix.MallocWithAnnotations checks enabled.
- void test() {
-   int *p = malloc(12);
-   p[3] = 4; // warn
- }
-
- void test() {
-   char a[2];
-   int *b = (int*)a;
-   b[1] = 3; // warn
- }
-
-.. _alpha-security-ArrayBoundV2:
-
-alpha.security.ArrayBoundV2 (C)
-"""""""""""""""""""""""""""""""
-Warn about buffer overflows (newer checker).
-
-.. code-block:: c
-
- void test() {
-   char *s = "";
-   char c = s[1]; // warn
- }
-
- void test() {
-   int buf[100];
-   int *p = buf;
-   p = p + 99;
-   p[1] = 1; // warn
- }
-
- // note: compiler has internal check for this.
- // Use -Wno-array-bounds to suppress compiler warning.
- void test() {
-   int buf[100][100];
-   buf[0][-1] = 1; // warn
- }
-
- // note: requires optin.taint check turned on.
- void test() {
-   char s[] = "abc";
-   int x = getchar();
-   char c = s[x]; // warn: index is tainted
- }
-
 .. _alpha-security-ReturnPtrRange:
 
 alpha.security.ReturnPtrRange (C)
diff --git a/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td b/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
index 1361da46c3c81d..9bf491eac1e0eb 100644
--- a/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
+++ b/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
@@ -989,30 +989,41 @@ def decodeValueOfObjCType : Checker<"decodeValueOfObjCType">,
 
 let ParentPackage = Security in {
 
-def FloatLoopCounter : Checker<"FloatLoopCounter">,
-  HelpText<"Warn on using a floating point value as a loop counter (CERT: "
-           "FLP30-C, FLP30-CPP)">,
-  Dependencies<[SecuritySyntaxChecker]>,
-  Documentation<HasDocumentation>;
-
-def MmapWriteExecChecker : Checker<"MmapWriteExec">,
-  HelpText<"Warn on mmap() calls with both writable and executable access">,
-  Documentation<HasDocumentation>;
-
-def PointerSubChecker : Checker<"PointerSub">,
-  HelpText<"Check for pointer subtractions on two pointers pointing to "
-           "different memory chunks">,
-  Documentation<HasDocumentation>;
-
-def PutenvStackArray : Checker<"PutenvStackArray">,
-  HelpText<"Finds calls to the function 'putenv' which pass a pointer to "
-           "an automatic (stack-allocated) array as the argument.">,
-  Documentation<HasDocumentation>;
-
-def SetgidSetuidOrderChecker : Checker<"SetgidSetuidOrder">,
-  HelpText<"Warn on possible reversed order of 'setgid(getgid()))' and "
-           "'setuid(getuid())' (CERT: POS36-C)">,
-  Documentation<HasDocumentation>;
+  def ArrayBoundChecker : Checker<"ArrayBound">,
+                          HelpText<"Warn about out of bounds access to memory">,
+                          Documentation<HasDocumentation>;
+
+  def FloatLoopCounter
+      : Checker<"FloatLoopCounter">,
+        HelpText<
+            "Warn on using a floating point value as a loop counter (CERT: "
+            "FLP30-C, FLP30-CPP)">,
+        Dependencies<[SecuritySyntaxChecker]>,
+        Documentation<HasDocumentation>;
+
+  def MmapWriteExecChecker
+      : Checker<"MmapWriteExec">,
+        HelpText<
+            "Warn on mmap() calls with both writable and executable access">,
+        Documentation<HasDocumentation>;
+
+  def PointerSubChecker
+      : Checker<"PointerSub">,
+        HelpText<"Check for pointer subtractions on two pointers pointing to "
+                 "different memory chunks">,
+        Documentation<HasDocumentation>;
+
+  def PutenvStackArray
+      : Checker<"PutenvStackArray">,
+        HelpText<"Finds calls to the function 'putenv' which pass a pointer to "
+                 "an automatic (stack-allocated) array as the argument.">,
+        Documentation<HasDocumentation>;
+
+  def SetgidSetuidOrderChecker
+      : Checker<"SetgidSetuidOrder">,
+        HelpText<"Warn on possible reversed order of 'setgid(getgid()))' and "
+                 "'setuid(getuid())' (CERT: POS36-C)">,
+        Documentation<HasDocumentation>;
 
 } // end "security"
 
@@ -1035,14 +1046,6 @@ let ParentPackage = ENV in {
 
 let ParentPackage = SecurityAlpha in {
 
-def ArrayBoundChecker : Checker<"ArrayBound">,
-  HelpText<"Warn about buffer overflows (older checker)">,
-  Documentation<HasDocumentation>;
-
-def ArrayBoundCheckerV2 : Checker<"ArrayBoundV2">,
-  HelpText<"Warn about buffer overflows (newer checker)">,
-  Documentation<HasDocumentation>;
-
 def ReturnPointerRangeChecker : Checker<"ReturnPtrRange">,
   HelpText<"Check for an out-of-bound pointer being returned to callers">,
   Documentation<HasDocumentation>;
diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundChecker.cpp
deleted file mode 100644
index c990ad138f8905..00000000000000
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundChecker.cpp
+++ /dev/null
@@ -1,92 +0,0 @@
-//== ArrayBoundChecker.cpp ------------------------------*- C++ -*--==//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-//
-// This file defines ArrayBoundChecker, which is a path-sensitive check
-// which looks for an out-of-bound array element access.
-//
-//===----------------------------------------------------------------------===//
-
-#include "clang/StaticAnalyzer/Checkers/BuiltinCheckerRegistration.h"
-#include "clang/StaticAnalyzer/Core/BugReporter/BugType.h"
-#include "clang/StaticAnalyzer/Core/Checker.h"
-#include "clang/StaticAnalyzer/Core/CheckerManager.h"
-#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
-#include "clang/StaticAnalyzer/Core/PathSensitive/DynamicExtent.h"
-#include "clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h"
-
-using namespace clang;
-using namespace ento;
-
-namespace {
-class ArrayBoundChecker :
-    public Checker<check::Location> {
-  const BugType BT{this, "Out-of-bound array access"};
-
-public:
-  void checkLocation(SVal l, bool isLoad, const Stmt* S,
-                     CheckerContext &C) const;
-};
-}
-
-void ArrayBoundChecker::checkLocation(SVal l, bool isLoad, const Stmt* LoadS,
-                                      CheckerContext &C) const {
-  // Check for out of bound array element access.
-  const MemRegion *R = l.getAsRegion();
-  if (!R)
-    return;
-
-  const ElementRegion *ER = dyn_cast<ElementRegion>(R);
-  if (!ER)
-    return;
-
-  // Get the index of the accessed element.
-  DefinedOrUnknownSVal Idx = ER->getIndex().castAs<DefinedOrUnknownSVal>();
-
-  // Zero index is always in bound, this also passes ElementRegions created for
-  // pointer casts.
-  if (Idx.isZeroConstant())
-    return;
-
-  ProgramStateRef state = C.getState();
-
-  // Get the size of the array.
-  DefinedOrUnknownSVal ElementCount = getDynamicElementCount(
-      state, ER->getSuperRegion(), C.getSValBuilder(), ER->getValueType());
-
-  ProgramStateRef StInBound, StOutBound;
-  std::tie(StInBound, StOutBound) = state->assumeInBoundDual(Idx, ElementCount);
-  if (StOutBound && !StInBound) {
-    ExplodedNode *N = C.generateErrorNode(StOutBound);
-    if (!N)
-      return;
-
-    // FIXME: It would be nice to eventually make this diagnostic more clear,
-    // e.g., by referencing the original declaration or by saying *why* this
-    // reference is outside the range.
-
-    // Generate a report for this bug.
-    auto report = std::make_unique<PathSensitiveBugReport>(
-        BT, "Access out-of-bound array element (buffer overflow)", N);
-
-    report->addRange(LoadS->getSourceRange());
-    C.emitReport(std::move(report));
-    return;
-  }
-
-  // Array bound check succeeded.  From this point forward the array bound
-  // should always succeed.
-  C.addTransition(StInBound);
-}
-
-void ento::registerArrayBoundChecker(CheckerManager &mgr) {
-  mgr.registerChecker<ArrayBoundChecker>();
-}
-
-bool ento::shouldRegisterArrayBoundChecker(const CheckerManager &mgr) {
-  return true;
-}
diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
index 6422933c8828a9..6f8d6dbd573f40 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -6,11 +6,17 @@
 //
 //===----------------------------------------------------------------------===//
 //
-// This file defines ArrayBoundCheckerV2, which is a path-sensitive check
-// which looks for an out-of-bound array element access.
+// This file defines security.ArrayBound, which is a path-sensitive checker
+// that looks for out of bounds access of memory regions.
 //
 //===----------------------------------------------------------------------===//
 
+// NOTE: The name of this file ends with "V2" because previously
+// "ArrayBoundChecker.cpp" contained the implementation of another (older and
+// simpler) checker that was called `alpha.security.ArrayBound`.
+// TODO: Rename this file to "ArrayBoundChecker.cpp" when it won't be confused
+// with that older file.
+
 #include "clang/AST/CharUnits.h"
 #include "clang/AST/ParentMapContext.h"
 #include "clang/StaticAnalyzer/Checkers/BuiltinCheckerRegistration.h"
@@ -124,9 +130,9 @@ struct Messages {
 // callbacks, we'd need to duplicate the logic that evaluates these
 // expressions.) The `MemberExpr` callback would work as `PreStmt` but it's
 // defined as `PostStmt` for the sake of consistency with the other callbacks.
-class ArrayBoundCheckerV2 : public Checker<check::PostStmt<ArraySubscriptExpr>,
-                                           check::PostStmt<UnaryOperator>,
-                                           check::PostStmt<MemberExpr>> {
+class ArrayBoundChecker : public Checker<check::PostStmt<ArraySubscriptExpr>,
+                                         check::PostStmt<UnaryOperator>,
+                                         check::PostStmt<MemberExpr>> {
   BugType BT{this, "Out-of-bound access"};
   BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
@@ -547,7 +553,7 @@ bool StateUpdateReporter::providesInformationAboutInteresting(
   return false;
 }
 
-void ArrayBoundCheckerV2::performCheck(const Expr *E, CheckerContext &C) const {
+void ArrayBoundChecker::performCheck(const Expr *E, CheckerContext &C) const {
   const SVal Location = C.getSVal(E);
 
   // The header ctype.h (from e.g. glibc) implements the isXXXXX() macros as
@@ -670,9 +676,9 @@ void ArrayBoundCheckerV2::performCheck(const Expr *E, CheckerContext &C) const {
   C.addTransition(State, SUR.createNoteTag(C));
 }
 
-void ArrayBoundCheckerV2::markPartsInteresting(PathSensitiveBugReport &BR,
-                                               ProgramStateRef ErrorState,
-                                               NonLoc Val, bool MarkTaint) {
+void ArrayBoundChecker::markPartsInteresting(PathSensitiveBugReport &BR,
+                                             ProgramStateRef ErrorState,
+                                             NonLoc Val, bool MarkTaint) {
   if (SymbolRef Sym = Val.getAsSymbol()) {
     // If the offset is a symbolic value, iterate over its "parts" with
     // `SymExpr::symbols()` and mark each of them as interesting.
@@ -693,10 +699,10 @@ void ArrayBoundCheckerV2::markPartsInteresting(PathSensitiveBugReport &BR,
   }
 }
 
-void ArrayBoundCheckerV2::reportOOB(CheckerContext &C,
-                                    ProgramStateRef ErrorState, Messages Msgs,
-                                    NonLoc Offset, std::optional<NonLoc> Extent,
-                                    bool IsTaintBug /*=false*/) const {
+void ArrayBoundChecker::reportOOB(CheckerContext &C, ProgramStateRef ErrorState,
+                                  Messages Msgs, NonLoc Offset,
+                                  std::optional<NonLoc> Extent,
+                                  bool IsTaintBug /*=false*/) const {
 
   ExplodedNode *ErrorNode = C.generateErrorNode(ErrorState);
   if (!ErrorNode)
@@ -725,7 +731,7 @@ void ArrayBoundCheckerV2::reportOOB(CheckerContext &C,
   C.emitReport(std::move(BR));
 }
 
-bool ArrayBoundCheckerV2::isFromCtypeMacro(const Stmt *S, ASTContext &ACtx) {
+bool ArrayBoundChecker::isFromCtypeMacro(const Stmt *S, ASTContext &ACtx) {
   SourceLocation Loc = S->getBeginLoc();
   if (!Loc.isMacroID())
     return false;
@@ -744,7 +750,7 @@ bool ArrayBoundCheckerV2::isFromCtypeMacro(const Stmt *S, ASTContext &ACtx) {
           (MacroName == "isupper") || (MacroName == "isxdigit"));
 }
 
-bool ArrayBoundCheckerV2::isInAddressOf(const Stmt *S, ASTContext &ACtx) {
+bool ArrayBoundChecker::isInAddressOf(const Stmt *S, ASTContext &ACtx) {
   ParentMapContext &ParentCtx = ACtx.getParentMapContext();
   do {
     const DynTypedNodeList Parents = ParentCtx.getParents(*S);
@@ -756,10 +762,10 @@ bool ArrayBoundCheckerV2::isInAddressOf(const Stmt *S, ASTContext &ACtx) {
   return UnaryOp && UnaryOp->getOpcode() == UO_AddrOf;
 }
 
-bool ArrayBoundCheckerV2::isIdiomaticPastTheEndPtr(const Expr *E,
-                                                   ProgramStateRef State,
-                                                   NonLoc Offset, NonLoc Limit,
-                                                   CheckerContext &C) {
+bool ArrayBoundChecker::isIdiomaticPastTheEndPtr(const Expr *E,
+                                                 ProgramStateRef State,
+                                                 NonLoc Offset, NonLoc Limit,
+                                                 CheckerContext &C) {
   if (isa<ArraySubscriptExpr>(E) && isInAddressOf(E, C.getASTContext())) {
     auto [EqualsToThreshold, NotEqualToThreshold] = compareValueToThreshold(
         State, Offset, Limit, C.getSValBuilder(), /*CheckEquality=*/true);
@@ -768,10 +774,10 @@ bool ArrayBoundCheckerV2::isIdiomaticPastTheEndPtr(const Expr *E,
   return false;
 }
 
-void ento::registerArrayBoundCheckerV2(CheckerManager &mgr) {
-  mgr.registerChecker<ArrayBoundCheckerV2>();
+void ento::registerArrayBoundChecker(CheckerManager &mgr) {
+  mgr.registerChecker<ArrayBoundChecker>();
 }
 
-bool ento::shouldRegisterArrayBoundCheckerV2(const CheckerManager &mgr) {
+bool ento::shouldRegisterArrayBoundChecker(const CheckerManager &mgr) {
   return true;
 }
diff --git a/clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt b/clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
index fcbe8b864b6e41..ccff5d0ac3b964 100644
--- a/clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
+++ b/clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
@@ -7,7 +7,6 @@ set(LLVM_LINK_COMPONENTS
 add_clang_library(clangStaticAnalyzerCheckers
   AnalysisOrderChecker.cpp
   AnalyzerStatsChecker.cpp
-  ArrayBoundChecker.cpp
   ArrayBoundCheckerV2.cpp
   BasicObjCFoundationChecks.cpp
   BitwiseShiftChecker.cpp
diff --git a/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
index 1a14f38e34f0e1..39dcaf02dbe258 100644
--- a/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
@@ -547,8 +547,10 @@ ProgramStateRef CStringChecker::checkInit(CheckerContext &C,
   }
   return State;
 }
-
-// FIXME: This was originally copied from ArrayBoundChecker.cpp. Refactor?
+// FIXME: The root of this logic was copied from the old checker
+// alpha.security.ArrayBound (which is removed within this commit).
+// It should be refactored to use the different, more sophisticated bounds
+// checking logic used by the new checker ``security.ArrayBound``.
 ProgramStateRef CStringChecker::CheckLocation(CheckerContext &C,
                                               ProgramStateRef state,
                                               AnyArgExpr Buffer, SVal Element,
diff --git a/clang/lib/StaticAnalyzer/Core/MemRegion.cpp b/clang/lib/StaticAnalyzer/Core/MemRegion.cpp
index 2c5cd2cf7630f6..c81bb632b83e94 100644
--- a/clang/lib/StaticAnalyzer/Core/MemRegion.cpp
+++ b/clang/lib/StaticAnalyzer/Core/MemRegion.cpp
@@ -891,9 +891,8 @@ DefinedOrUnknownSVal MemRegionManager::getStaticSize(const MemRegion *MR,
 
     return Size;
   }
-    // FIXME: The following are being used in 'SimpleSValBuilder' and in
-    // 'ArrayBoundChecker::checkLocation' because there is no symbol to
-    // represent the regions more appropriately.
+    // FIXME: The following are being used in 'SimpleSValBuilder' because there
+    // is no symbol to represent the regions more appropriately.
   case MemRegion::BlockDataRegionKind:
   case MemRegion::BlockCodeRegionKind:
   case MemRegion::FunctionCodeRegionKind:
diff --git a/clang/test/Analysis/index-type.c b/clang/test/Analysis/index-type.c
index 997d45c1e5abad..818806c4aff3b5 100644
--- a/clang/test/Analysis/index...
[truncated]

@NagyDonat NagyDonat requested a review from gamesh411 February 4, 2025 09:48
Copy link
Collaborator

@Xazax-hun Xazax-hun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG! Do you have an intuition where the rest of the false positives are coming from? Are there infeasible paths? Insufficient solvers? Would cross-checking with Z3 reduce the FPs emitted from this check?

@NagyDonat
Copy link
Contributor Author

NagyDonat commented Feb 4, 2025

I evaluated a sample of 10 random ArrayBoundV2 results from FFMPEG and I found the following:

  • (1) Unjustified assumption of third iteration because my recent "don't assume third iteration" change doesn't handle the case when there is a short-circuiting operator in the loop condition.
    • IIRC this covers several (5-10??) false positives in this particular project, but should be significantly less common elsewhere (FFMPEG has an unusually high concentration of 2-element arrays).
    • Eliminating these would be a technically complex task, but has no fundamental obstacles. However they're rare enough that I'm not sure if they're worth the effort.
  • (2) Non-realistic overflow where the analyzer assumes that the code was able to read $\approx 2^{31}$ arguments from the input (which can't happen, but the analyzer cannot deduce this precondition of the analyzed function).
  • (3) Correlated if conditions: assuming that the av_clip call in line 1103 returns -s->ps.sps->qp_bd_offset and then assuming that this result is > 51 produces an array underflow. I suspect that this can't happen (e.g. I'd guess that s->ps.sps->qp_bd_offset is always nonnegative), but I don't think that Z3 crosscheck would be helpful.
  • (4) A classical "assuming skipped loop" error which would be silenced by the new assume-one-iteration option or a well-placed assertion. (Enabling that option would silence 34 of the 187 ArrayBoundV2 errors on this project.)
  • (5) Technically justified, but unwanted assumption of a third iteration: first there is an if where the analyzer can assume ctx->texture_count != 2 so later when the analyzer assumes that ctx->texture_count >= 2 (i.e. it may enter the second iteration of the loop), it gets "enough fuel" to also enter the third iteration (which is an overflow error). This is a correlated assumption error (similar to the "correlated if" issue), so solving it is blocked by fundamental theoretical obstacles.
  • (6) Yet another "assuming skipped loop" issue, very similar (but somewhat more complex) than the previous one.
  • (7) Technically a true positive where the analyzer assumes that the second element of static const char *opt_name_reinit_filters[] = {"reinit_filter", NULL}; may be non-NULL when the analyzed function is executed. (For a programmer it's obvious that this is a constant, but the analyzer cannot deduce this. The programmer could resolve this by adding an extra const qualifier.)
  • (8-10) The remaining three issues in my random sample are all duplicates of (7) -- they appear as separate reports because the error is within a macro that's expanded in several locations and apparently produces 80 (!) ArrayBoundV2 reports (which correspond to other analogous arrays instead of opt_name_reinit_filters).

These reports don't look like solver issues, I don't think that Z3 refutation or similar improvements would help with any of them. ( 🤔 In fact, perhaps Z3 refutation was already enabled -- I don't know what's the default setting in our CI.)

In my opinion the only actionable conclusion is that we should make report deduplication more aggressive around macros -- e.g. I could imagine a heuristic that uses the location within the macro (instead of the location of the particular macro expansion) for deduplication if the complexity/length of the macro is above a certain threshold. (Otherwise unrelated reports "coming from" a small macro like MAX or something similar shouldn't be conflated, but reports coming from the same part of a "big" macro should be unified.) This would consolidate almost half of the reports into a single report (which is arguably a true positive, so shouldn't be silenced). However this improvement is independent of ArrayBoundV2 and IMO shouldn't block this PR.

After that, assume-one-iteration can eliminate 34 additional reports which are mostly unwanted -- however that change would also cause the loss of true positives (on projects that are less stable and have true positives...), so it's likely that we don't want to enable it.

The remaining 74 reports are probably false positives that are caused by incompatible assumptions ("correlated if" issues), invisible preconditions (the programmer knows something about some argument/input of the toplevel analyzed function, but the analyzer cannot deduce this) or other random issues.

@Xazax-hun
Copy link
Collaborator

my recent "#119388" change doesn't handle the case when there is a short-circuiting operator in the loop condition

Is this something you plan to address or is it an inherent limitation?

@NagyDonat
Copy link
Contributor Author

NagyDonat commented Feb 5, 2025

my recent "#119388" change doesn't handle the case when there is a short-circuiting operator in the loop condition

Is this something you plan to address or is it an inherent limitation?

This is not an inherent limitation, solving it would be definitely possible, but it would require a significant amount (perhaps 100 lines?) of very complicated code and I'm not sure that it's worth the effort.

Loops over small arrays are very rare in projects other than FFMPEG (1-2 "assuming third iteration" FPs / project) and even in FFMPEG where there were 100+ "assuming third iteration" FPs, there were only a few where the loop condition contained a short-circuiting operator which meant that the FP "survived" my change.

Personally I was planning to leave this issue "in the backlog" indefinitely, or until it surfaces in a real-world bug report; but if you feel that I should work on this, then I can try to squeeze it onto my TODO list (or others can also try to implement it).

@gamesh411
Copy link
Contributor

LGTM.
I assume you intentionally did not rename the ArrayBoundCheckerV2.cpp to ArrayBoundChecker.cpp to avoid the generating a diff that is much harder to understand than just the modification and the deletion of the old one. So consistency will be restored in trivial follow-up commit (which I think is the way to go as well).

@NagyDonat
Copy link
Contributor Author

I assume you intentionally did not rename the ArrayBoundCheckerV2.cpp to ArrayBoundChecker.cpp to avoid the generating a diff that is much harder to understand than just the modification and the deletion of the old one. So consistency will be restored in trivial follow-up commit (which I think is the way to go as well).

Yes, and I mentioned this in the commit message.

Copy link
Contributor

@steakhal steakhal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@NagyDonat NagyDonat merged commit 6e17ed9 into llvm:main Feb 6, 2025
12 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Feb 7, 2025

LLVM Buildbot has detected a new failure on builder lld-x86_64-win running on as-worker-93 while building clang,llvm at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/146/builds/2243

Here is the relevant piece of the build log for the reference
Step 7 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM-Unit :: Support/./SupportTests.exe/38/87' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe-LLVM-Unit-20460-38-87.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=87 GTEST_SHARD_INDEX=38 C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe
--

Script:
--
C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe --gtest_filter=ProgramEnvTest.CreateProcessLongPath
--
C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp(160): error: Expected equality of these values:
  0
  RC
    Which is: -2

C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp(163): error: fs::remove(Twine(LongPath)): did not return errc::success.
error number: 13
error message: permission denied



C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp:160
Expected equality of these values:
  0
  RC
    Which is: -2

C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp:163
fs::remove(Twine(LongPath)): did not return errc::success.
error number: 13
error message: permission denied




********************


@NagyDonat
Copy link
Contributor Author

The unittest failure seems to be random nonsense, apparently the Python package psutil is not available on that windows machine:

llvm-lit.py: C:\a\lld-x86_64-win\build\utils\lit\tests\lit.cfg:111: warning: Setting a timeout per test not supported. Requires the Python psutil module but it could not be found. Try installing it via pip or via your operating system's package manager.

My commit doesn't change any lit test configuration and does not touch any test that has timeouts.

Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
Before this commit, there were two alpha checkers that used different
algorithms/logic for detecting out of bounds memory access: the old
`alpha.security.ArrayBound` and the experimental, more complex
`alpha.security.ArrayBoundV2`.

After lots of quality improvement commits ArrayBoundV2 is now stable
enough to be moved out of the alpha stage. As indexing (and dereference)
are common operations, it still produces a significant amount of false
positives, but not much more than e.g. `core.NullDereference` or
`core.UndefinedBinaryOperatorResult`, so it should be acceptable as a
non-`core` checker.

At this point `alpha.security.ArrayBound` became obsolete (there is a
better tool for the same task), so I'm removing it from the codebase.
With this I can eliminate the ugly "V2" version mark almost everywhere
and rename `alpha.security.ArrayBoundV2` to `security.ArrayBound`.

(The version mark is preserved in the filename "ArrayBoundCheckerV2", to
ensure a clear git history. I'll rename it to "ArrayBoundChecker.cpp" in
a separate commit.)

This commit adapts the unit tests of `alpha.security.ArrayBound` to
testing the new `security.ArrayBound` (= old ArrayBoundV2). Currently
the names of the test files are very haphazard, I'll probably create a
separate followup commit that consolidates this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:static analyzer clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants