[clang][analyzer] Improved PointerSubChecker #93676
@@ -0,0 +1,74 @@
// RUN: %clang_analyze_cc1 -analyzer-checker=alpha.core.PointerSub -verify %s

void f1(void) {
  int x, y, z[10];
  int d = &y - &x; // expected-warning{{Subtraction of two pointers that do not point into the same array is undefined behavior}}
  d = z - &y; // expected-warning{{Subtraction of two pointers that do not point into the same array is undefined behavior}}
  d = &x - &x; // expected-warning{{Subtraction of two pointers that do not point into the same array is undefined behavior}}
This corner case is explicitly allowed by the standard: non-array variables act as if they were single-element arrays and it's valid to do (trivial) pointer arithmetic on them. Even calculating the past-the-end pointer of a non-array object (e.g.
The tests look good, except for
The standard explicitly disallows this, see [expr.add] part 4.2 and 4.3 (in the most recent C++ draft standard, other versions may be different).
  d = (long*)&x - (long*)&x;
}

void f2(void) {
  int a[10], b[10], c;
  int *p = &a[2];
  int *q = &a[8];
  int d = q - p; // no-warning

  q = &b[3];
  d = q - p; // expected-warning{{Subtraction of two pointers that}}

  q = a + 10;
  d = q - p; // no warning (use of pointer to one after the end is allowed)
  d = &a[4] - a; // no warning

  q = a + 11;
  d = q - a; // ?

  d = &c - p; // expected-warning{{Subtraction of two pointers that}}
}

void f3(void) {
  int a[3][4];
  int d;

  d = &(a[2]) - &(a[1]);
  d = a[2] - a[1]; // expected-warning{{Subtraction of two pointers that}}
  d = a[1] - a[1];
  d = &(a[1][2]) - &(a[1][0]);
  d = &(a[1][2]) - &(a[0][0]); // expected-warning{{Subtraction of two pointers that}}
}

void f4(void) {
  int n = 4, m = 3;
  int a[n][m];
  int (*p)[m] = a; // p == &a[0]
  p += 1; // p == &a[1]
  int d = p - a; // d == 1 // expected-warning{{subtraction of pointers to type 'int[m]' of zero size has undefined behavior}}

  d = &(a[2]) - &(a[1]); // expected-warning{{subtraction of pointers to type 'int[m]' of zero size has undefined behavior}}
  d = a[2] - a[1]; // expected-warning{{Subtraction of two pointers that}}
}

typedef struct {
  int a;
  int b;
  int c[10];
  int d[10];
} S;

void f5(void) {
  S s;
  int y;
  int d;

  d = &s.b - &s.a; // expected-warning{{Subtraction of two pointers that}}
  d = &s.c[0] - &s.a; // expected-warning{{Subtraction of two pointers that}}
  d = &s.b - &y; // expected-warning{{Subtraction of two pointers that}}
  d = &s.c[3] - &s.c[2];
  d = &s.d[3] - &s.c[2]; // expected-warning{{Subtraction of two pointers that}}
  d = s.d - s.c; // expected-warning{{Subtraction of two pointers that}}

  S sa[10];
  d = &sa[2] - &sa[1];
  d = &sa[2].a - &sa[1].b; // expected-warning{{Subtraction of two pointers that}}
}
If two pointers point to the same variable, it's valid to subtract them.
By the way, it's also valid to declare a `long long` variable, point a `char *` pointer at it (`char *` may point anywhere) and use it to iterate over the memory region of the `long long` as if it was a `char[8]`. (By the way, in this case you have `ElementRegion`s and your code behaves correctly.)
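A minimal C sketch of the two cases described in the comment above; the function names are hypothetical and are not part of the PR's test file. It assumes that `unsigned char *` is an acceptable stand-in for the `char *` mentioned in the comment, and it uses `sizeof` rather than hard-coding 8 bytes for `long long`.

```c
#include <assert.h>
#include <stddef.h>

/* Both operands point to the same non-array object. For pointer
   arithmetic the object behaves like a one-element array, so the
   difference is 0. */
ptrdiff_t same_object_diff(void) {
  int x = 0;
  int *p = &x;
  int *q = &x;
  return q - p;
}

/* The one-past-the-end pointer of a single object may be formed and
   used in a subtraction, but it must not be dereferenced. */
ptrdiff_t past_the_end_diff(void) {
  int x = 0;
  return (&x + 1) - &x;
}

/* Walk the bytes of a long long through a character pointer and
   return the number of bytes between the two ends of the walk. */
ptrdiff_t byte_walk_length(void) {
  long long v = 0x0102030405060708LL;
  const unsigned char *begin = (const unsigned char *)&v;
  const unsigned char *end = begin + sizeof v;
  unsigned sum = 0;
  for (const unsigned char *it = begin; it != end; ++it)
    sum += *it; /* reading each byte of the object is allowed */
  (void)sum;
  return end - begin;
}
```

If the reasoning in the comment holds, a checker for this rule would not be expected to warn on any of the three subtractions.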
For me it is not clear what rules to apply.
I only found that the operands must point into the same "array object" but not what exactly an "array object" is. Are the sub-arrays of a multidimensional array separate "array objects"? If an address (of a variable) is converted to a `char *` is this the same array object as the original variable (for example `l`)? Invalid indexing like `int a[3][3]; int x = a[0][4];` is undefined behavior, but why should it then be allowed to use pointers to such an object and index it like a one-dimensional array, or convert an array pointer into an array with a different type and index it?
We should either allow anything that points into the same memory block (that is a variable or any array), or only allow pointers to the same array with the same type and same size.
I discussed your examples and questions with @whisperity and Zoltán Porkoláb, and I'm trying to summarize what we determined. Note that I'm mostly working from the most recent C++ draft standard (http://eel.is/c++draft/), so some of the conclusions may be invalid in C or older versions of C++ (but I tried to highlight the differences that I know about).
(1) In your first example the step `int *b = a` is invalid because the `int[3][3]` array `a` decays to an `int (*)[3]` (pointer to array of 3 `int`s) and that type is not interconvertible with a plain `int *` (see [basic.compound] note 5 in the C++ draft standard). (And if you define `b` as `int (*b)[3] = a`, then the pointer subtraction will become invalid.)

(2) In the second example using a `long *` to initialize `char *` or `short *` variables is usually an error, it's accepted under old C, but in C++ you need an explicit cast and e.g. modern GCC also produces an error (`-Wincompatible-pointer-types`, a warning that's by default an error) when it compiles this.

I'd guess that the actual pointer arithmetic is standard-compliant, but the relevant parts of the standard ([expr.reinterpret.cast] 7, [expr.static.cast] 14) are too vague to give a confident answer.

Note that under C++ accessing the element `short b[1]` would be a violation of [basic.lval] part 11, but based on [defns.access] I'd say that the expression `&b[1]` is not an access of `b[1]`.

(3) In your third example `&a[2][2] - &a[1][1]` is clearly working in practically all implementations, but we're fairly sure that it's not compliant with the C++ standard, because [expr.add] part 5.2 speaks about "array elements i and j of the same array object x" (and later "i - j" which shows that "i" and "j" are scalar numbers) -- while in your example `a[2][2]` and `a[1][1]` are not (numerically indexed) elements within the same array. (This is coherent with [dcl.array] where an "array" implicitly means a one-dimensional array structure (whose elements may be other arrays).)

Nevertheless, it may be reasonable to avoid emitting warnings on this kind of code, because that could be better for the users.

(4) The definition of "what exactly an "array object" is" and "what is an element of an array" primarily appears at [dcl.array] part 6:
(This clearly shows that an element of an element of an array is not an element of the array. The notion of "multi-dimensional arrays" only appears in [dcl.array] Example 4, with word choices that suggest that this is just a mathematical/intuitive notion and not something that's well-defined in the standard.)
This general definition is augmented by [basic.compound] part 3, sentence 11 which states that:
(5) Yes, the sub-arrays of a multidimensional array are separate "array objects" that have their own elements -- but they are also single elements within the same bigger array. This means that if we have `int arr[3][3]`, then
- `int x = &(arr[2]) - &(arr[1])` is valid as the difference of two `int (*)[3]` pointers that point to elements of the same array `arr`
- `int y = &(arr[2][1]) - &(arr[2][2])` is valid as the difference of two `int *` pointers that point to elements of the same array `arr[2]`
- `int z = &(arr[1][1]) - &(arr[2][2])` is invalid, because one pointer points to an element of `arr[1]`, while the other points to an element of `arr[2]`
- `int w = arr[2] - arr[1]` is also invalid, because these arrays decay to `int *` pointers and this difference is equivalent to `&(arr[2][0]) - &(arr[1][0])`
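
The two valid cases from the list above can be written out as a small C sketch; the function names are hypothetical. The two invalid cases appear only as comments, because evaluating them would be undefined behavior under the reading given above.

```c
#include <assert.h>
#include <stddef.h>

/* Difference of two int (*)[3] pointers that point to elements
   of the same array arr. */
ptrdiff_t row_diff(void) {
  int arr[3][3] = {{0}};
  return &(arr[2]) - &(arr[1]);
}

/* Difference of two int * pointers that point to elements of the
   same array arr[2]. */
ptrdiff_t element_diff(void) {
  int arr[3][3] = {{0}};
  return &(arr[2][1]) - &(arr[2][2]);
}

/* Not evaluated here (invalid according to the list above):
     &(arr[1][1]) - &(arr[2][2])   elements of different sub-arrays
     arr[2] - arr[1]               same as &(arr[2][0]) - &(arr[1][0]) */
```

Here `row_diff()` yields 1 and `element_diff()` yields -1, since the second operand of the latter is one element further along in `arr[2]`.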
(6) "If an address (of a variable) is converted to a char * is this the same array object as the original variable (for example l)?" -- My intuition is that here we're dealing with an imagined `char[]` array object that "covers" the same memory region as the original variable, and any pointer trickery that starts from the original object and produces `char *` pointers pointing into the original object essentially produces pointers to elements of this imagined `char[]` array. (We might say that this imagined `char[]` array is the object representation of the array, although that's probably not exactly accurate.)

(7)
I'm not exactly sure what you're speaking about here. I completely agree that `int a[3][3]; int x = a[0][4];` is undefined behavior, but there may be multiple ways to convert the multidimensional array into a single-dimensional one and some of them might be legitimate.

(8)
This seems to be forbidden by [expr.add] part 6. Note that the definition of the subscript operator is based on pointer addition.
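
The remark about the subscript operator can be illustrated with a short C sketch (the function name is hypothetical): `a[i]` is defined as `*(a + i)`, so any restriction on pointer addition carries over to indexing.

```c
#include <assert.h>

/* Returns 1 if a[i] and *(a + i) read the same element for every
   valid index of a four-element array, 0 otherwise. */
int subscript_matches_addition(void) {
  int a[4] = {10, 20, 30, 40};
  for (int i = 0; i < 4; ++i) {
    if (a[i] != *(a + i))
      return 0;
  }
  return 1;
}
```

Only in-range indices are used here; forming `a + 5` for this array would already be outside what pointer addition allows.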
(9)