Skip to content

[clang][analyzer] Improved PointerSubChecker #93676

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jun 10, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
34 changes: 20 additions & 14 deletions clang/lib/StaticAnalyzer/Checkers/PointerSubChecker.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@ class PointerSubChecker
void PointerSubChecker::checkPreStmt(const BinaryOperator *B,
CheckerContext &C) const {
// When doing pointer subtraction, if the two pointers do not point to the
// same memory chunk, emit a warning.
// same array, emit a warning.
if (B->getOpcode() != BO_Sub)
return;

Expand All @@ -44,24 +44,30 @@ void PointerSubChecker::checkPreStmt(const BinaryOperator *B,

const MemRegion *LR = LV.getAsRegion();
const MemRegion *RR = RV.getAsRegion();

if (!(LR && RR))
return;

const MemRegion *BaseLR = LR->getBaseRegion();
const MemRegion *BaseRR = RR->getBaseRegion();

if (BaseLR == BaseRR)
if (!LR || !RR)
return;

// Allow arithmetic on different symbolic regions.
if (isa<SymbolicRegion>(BaseLR) || isa<SymbolicRegion>(BaseRR))
return;
const auto *ElemLR = dyn_cast<ElementRegion>(LR);
const auto *ElemRR = dyn_cast<ElementRegion>(RR);
// FIXME: We want to verify that these are elements of an array.
// Because behavior of ElementRegion it may be confused with a cast.
// There is not a simple way to distinguish it from array element (check the
// types?). Because this missing check a warning is missing in the rare case
// when two casted pointers to the same region (variable) are subtracted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If two pointers point to the same variable, it's valid to subtract them.

By the way, it's also valid to declare a long long variable, point a char * pointer at it (char * may point anywhere) and use it to iterate over the memory region of the long long as if it was a char[8]. (By the way in this case you have ElementRegions and your code behaves correctly.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me it is not clear what rules to apply.

int a[3][3];
int *b = a;
int d = &(a[2][2]) - &(b[3]);
long long l;
char *a = &l;
short *b = &l;
int d = &a[3] - (char *)&l;
d = (char *)&b[1] - &a[1];
int a[3][3];
int d = &a[2][2] - &a[1][1];

I only found that the operands must point into the same "array object" but not what exactly an "array object" is. Are the sub-arrays of a multidimensional array separate "array objects"? If an address (of a variable) is converted to a char * is this the same array object as the original variable (for example l)? Invalid indexing like int a[3][3]; int x = a[0][4]; is undefined behavior, but why should then be allowed to use pointers to such an object and index it like an one-dimensional array, or convert an array pointer into an array with different type and index it?

We should allow anything that points into the same memory block (that is a variable or any array), or only allow pointers to the same array with same type and same size.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discussed your examples and questions with @whisperity and Zoltán Porkoláb, and I'm trying to summarize what we determined. Note that I'm mostly working from the most recent C++ draft standard (http://eel.is/c++draft/), so some the conclusions may be invalid in C or older versions of C++ (but I tried to highlight the differences that I know about).

(1) In your first example the step int *b = a is invalid because the int[3][3] array a decays to an int (*)[3] (pointer to array of 3 ints) and that type is not interconvertible with a plain int * (see [basic.compound] note 5 in the C++ draft standard). (And if you define b as int (*b)[3] = a, then the pointer subtraction will become invalid.)

(2) In the second example using a long * to initialize char * or short * variables is usually an error, it's accepted under old C, but in C++ you need an explicit cast and e.g. modern GCC also produces an error (-Wincompatible-pointer-types a warning that's by default an error) when it compiles this.

I'd guess that the actual pointer arithmetic is standard-compliant, but the relevant parts of the standard ([expr.reinterpret.cast] 7, [expr.static.cast] 14)are too vague to give a confident answer.

Note that under C++ accessing the element short b[1] would be a violation of [basic.lval] part 11, but based on [defns.access] I'd say that the expression &b[1] is not an access of b[1].

(3) In your third example &a[2][2] - &a[1][1] is clearly working in practically all implementations, but we're fairly sure that it's not compliant with the C++ standard, because [expr.add] part 5.2 speaks about "array elements i and j of the same array object x" (and later "i - j" which shows that "i" and "j" are scalar numbers) -- while in your example a[2][2] and a[1][1] are not (numerically indexed) elements within the same array. (This is coherent with [dcl.array] where an "array" implicitly means a one-dimensional array structure (whose elements may be other arrays).)

Nevertheless, it may be reasonable to avoid emitting warnings on this kind of code, because that could be better for the users.

(4) The definition of "what exactly an "array object" is" and "what is an element of an array" primarily appears at [dcl.array] part 6:

An object of type “array of N U” consists of a contiguously allocated non-empty set of N subobjects of type U, known as the elements of the array, and numbered 0 to N-1.

(This clearly shows that the element of an element of an array is not an element of the array. The notion of "multi-dimensional arrays" only appears in the Example [dcl.array] Example 4 with word choices that suggests that this is just a mathematical/intuitive notion and not something that's well-defined in the standard.)

This general definition is augmented by [basic.compound] part 3, sentence 11 which states that:

For purposes of pointer arithmetic ([expr.add]) and comparison ([expr.rel], [expr.eq]), a pointer past the end of the last element of an array x of n elements is considered to be equivalent to a pointer to a hypothetical array element n of x and an object of type T that is not an array element is considered to belong to an array with one element of type T.

(5) Yes, the sub-arrays of a multidimensional array are separate "array objects" that have their own elements -- but they are also single elements within the same bigger array. This means that if we have int arr[3][3], then

  • int x = &(arr[2]) - &(arr[1]) is valid as the difference of two int (*)[3] pointers that point to elements of the same array arr
  • int y = &(arr[2][1]) - &(arr[2][2]) is valid as the difference of two int * pointers that point to elements of the same array arr[2]
  • int z = &(arr[1][1]) - &(arr[2][2]) is invalid, because one pointer points to an element of arr[1], while the other points to element of arr[2]
  • int w = arr[2] - arr[1] is also invalid, because these arrays decay to int * pointer and this difference is equivalent to &(arr[2][0]) - &(arr[1][0])

(6) "If an address (of a variable) is converted to a char * is this the same array object as the original variable (for example l)?" -- My intuition is that here we're dealing with an imagined char[] array object that "covers" the same memory region as the original variable, and any pointer trickery that start from the original object and produce char * pointers pointing into the original object essentially produces pointers to elements of this imagined char[] array. (We might say that this imagined char[] array is the object representation of the array, although that's probably not exactly accurate.)

(7)

Invalid indexing like int a[3][3]; int x = a[0][4]; is undefined behavior but why should then be allowed to use pointers to such an object and index it like an one-dimensional array, [...]

I'm not exactly sure what you're speaking about here. I completely agree that int a[3][3]; int x = a[0][4]; is undefined behavior, but there may be multiple ways to convert the multidimensional array into a single-dimensional one and some of them might be legitimate.

(8)

or convert an array pointer into an array with different type and index it

This seems to be forbidden by [expr.add] part 6. Note that the definition of the subscript operator is based on pointer addition.

(9)

We should allow anything that points into the same memory block (that is a variable or any array), or only allow pointers to the same array with same type and same size.
I don't understand this sentence; but note that [expr.add] part 2.2 demands that in pointer subtraction the two operands must be pointers to (cv-qualified or cv-unqualified versions of) the same completely-defined object type.

if (ElemLR && ElemRR) {
const MemRegion *SuperLR = ElemLR->getSuperRegion();
const MemRegion *SuperRR = ElemRR->getSuperRegion();
if (SuperLR == SuperRR)
return;
// Allow arithmetic on different symbolic regions.
if (isa<SymbolicRegion>(SuperLR) || isa<SymbolicRegion>(SuperRR))
return;
}

if (ExplodedNode *N = C.generateNonFatalErrorNode()) {
constexpr llvm::StringLiteral Msg =
"Subtraction of two pointers that do not point to the same memory "
"chunk may cause incorrect result.";
"Subtraction of two pointers that do not point into the same array "
"is undefined behavior.";
auto R = std::make_unique<PathSensitiveBugReport>(BT, Msg, N);
R->addRange(B->getSourceRange());
C.emitReport(std::move(R));
Expand Down
74 changes: 74 additions & 0 deletions clang/test/Analysis/pointer-sub.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
// RUN: %clang_analyze_cc1 -analyzer-checker=alpha.core.PointerSub -verify %s

void f1(void) {
int x, y, z[10];
int d = &y - &x; // expected-warning{{Subtraction of two pointers that do not point into the same array is undefined behavior}}
d = z - &y; // expected-warning{{Subtraction of two pointers that do not point into the same array is undefined behavior}}
d = &x - &x; // expected-warning{{Subtraction of two pointers that do not point into the same array is undefined behavior}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This corner case is explicitly allowed by the standard: non-array variables act as if they were single-element arrays and it's valid to do (trivial) pointer arithmetic on them.

Even calculating the past-the-end pointer of a non-array object (e.g. &x + 1) is a valid (but unusual) operation; however the description on cppreference suggests that it's UB to take the difference (&x+1)-&x. You should probably look this up in the standard itself if you want to be accurate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests look good, except for &x - &x, which should be valid.
Wrt. &x - 1, that should be valid as well, given that the resulting pointer is not dereferenced. However, clang's constrant interpreter disagrees with gcc's on that case: https://godbolt.org/z/dMd1rMazY
However, like I said, I believe gcc's right by accepting that code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrt. &x - 1, that should be valid as well, given that the resulting pointer is not dereferenced.

The standard explicitly disallows this, see [expr.add] part 4.2 and 4.3 (in the most recent C++ draft standard, other versions may be different).

d = (long*)&x - (long*)&x;
}

void f2(void) {
int a[10], b[10], c;
int *p = &a[2];
int *q = &a[8];
int d = q - p; // no-warning

q = &b[3];
d = q - p; // expected-warning{{Subtraction of two pointers that}}

q = a + 10;
d = q - p; // no warning (use of pointer to one after the end is allowed)
d = &a[4] - a; // no warning

q = a + 11;
d = q - a; // ?

d = &c - p; // expected-warning{{Subtraction of two pointers that}}
}

void f3(void) {
int a[3][4];
int d;

d = &(a[2]) - &(a[1]);
d = a[2] - a[1]; // expected-warning{{Subtraction of two pointers that}}
d = a[1] - a[1];
d = &(a[1][2]) - &(a[1][0]);
d = &(a[1][2]) - &(a[0][0]); // expected-warning{{Subtraction of two pointers that}}
}

void f4(void) {
int n = 4, m = 3;
int a[n][m];
int (*p)[m] = a; // p == &a[0]
p += 1; // p == &a[1]
int d = p - a; // d == 1 // expected-warning{{subtraction of pointers to type 'int[m]' of zero size has undefined behavior}}

d = &(a[2]) - &(a[1]); // expected-warning{{subtraction of pointers to type 'int[m]' of zero size has undefined behavior}}
d = a[2] - a[1]; // expected-warning{{Subtraction of two pointers that}}
}

typedef struct {
int a;
int b;
int c[10];
int d[10];
} S;

void f5(void) {
S s;
int y;
int d;

d = &s.b - &s.a; // expected-warning{{Subtraction of two pointers that}}
d = &s.c[0] - &s.a; // expected-warning{{Subtraction of two pointers that}}
d = &s.b - &y; // expected-warning{{Subtraction of two pointers that}}
d = &s.c[3] - &s.c[2];
d = &s.d[3] - &s.c[2]; // expected-warning{{Subtraction of two pointers that}}
d = s.d - s.c; // expected-warning{{Subtraction of two pointers that}}

S sa[10];
d = &sa[2] - &sa[1];
d = &sa[2].a - &sa[1].b; // expected-warning{{Subtraction of two pointers that}}
}
14 changes: 2 additions & 12 deletions clang/test/Analysis/ptr-arith.c
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
// RUN: %clang_analyze_cc1 -analyzer-checker=alpha.core.FixedAddr,alpha.core.PointerArithm,alpha.core.PointerSub,debug.ExprInspection -Wno-pointer-to-int-cast -verify -triple x86_64-apple-darwin9 -Wno-tautological-pointer-compare -analyzer-config eagerly-assume=false %s
// RUN: %clang_analyze_cc1 -analyzer-checker=alpha.core.FixedAddr,alpha.core.PointerArithm,alpha.core.PointerSub,debug.ExprInspection -Wno-pointer-to-int-cast -verify -triple i686-apple-darwin9 -Wno-tautological-pointer-compare -analyzer-config eagerly-assume=false %s
// RUN: %clang_analyze_cc1 -analyzer-checker=alpha.core.FixedAddr,alpha.core.PointerArithm,debug.ExprInspection -Wno-pointer-to-int-cast -verify -triple x86_64-apple-darwin9 -Wno-tautological-pointer-compare -analyzer-config eagerly-assume=false %s
// RUN: %clang_analyze_cc1 -analyzer-checker=alpha.core.FixedAddr,alpha.core.PointerArithm,debug.ExprInspection -Wno-pointer-to-int-cast -verify -triple i686-apple-darwin9 -Wno-tautological-pointer-compare -analyzer-config eagerly-assume=false %s

void clang_analyzer_eval(int);
void clang_analyzer_dump(int);
Expand Down Expand Up @@ -35,16 +35,6 @@ domain_port (const char *domain_b, const char *domain_e,
return port;
}

void f3(void) {
int x, y;
int d = &y - &x; // expected-warning{{Subtraction of two pointers that do not point to the same memory chunk may cause incorrect result}}

int a[10];
int *p = &a[2];
int *q = &a[8];
d = q-p; // no-warning
}

void f4(void) {
int *p;
p = (int*) 0x10000; // expected-warning{{Using a fixed address is not portable because that address will probably not be valid in all environments or platforms}}
Expand Down