-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[analyzer] Suppress out of bounds reports after weak loop assumptions #109804
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
23b2737
77fb227
8ae4b67
cdc73ab
e7e6ded
cbb46e5
e952f05
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -121,6 +121,34 @@ struct EvalCallOptions { | |
EvalCallOptions() {} | ||
}; | ||
|
||
/// Simple control flow statements like `if` can only produce a single two-way | ||
/// state split, so when the analyzer cannot determine the value of the | ||
/// condition, it can assume either of the two options, because the fact that | ||
/// they are in the source code implies that the programmer thought that they | ||
/// are possible (at least under some conditions). | ||
/// (Note that this heuristic is not entirely correct when there are _several_ | ||
/// `if` statements with unmarked logical connections between them, but it's | ||
/// still good enough and the analyzer heavily relies on it.) | ||
/// In contrast with this, a single loop statement can produce multiple state | ||
/// splits, and we cannot always single out safe assumptions where we can say | ||
/// that "the programmer included this loop in the source code, so they clearly | ||
/// thought that this execution path is possible". | ||
/// However, the analyzer wants to explore the code in and after the loop, so | ||
/// it makes assumptions about the loop condition (to get a concrete execution | ||
/// path) even when they are not justified. | ||
/// This function is called by the engine to mark the `State` when it makes an | ||
/// assumption which is "weak". Checkers may use this heuristical mark to | ||
/// discard the result and reduce the amount of false positives. | ||
/// TODO: Instead of just marking these branches for checker-specific handling, | ||
/// we could discard them completely. I suspect that this could eliminate some | ||
/// false positives without suppressing too many true positives, but I didn't | ||
/// have time to measure its effects. | ||
ProgramStateRef recordWeakLoopAssumption(ProgramStateRef State); | ||
|
||
/// Returns true if `recordWeakLoopAssumption()` was called on the execution | ||
/// path which produced `State`. | ||
bool seenWeakLoopAssumption(ProgramStateRef State); | ||
|
||
class ExprEngine { | ||
void anchor(); | ||
|
||
|
@@ -322,13 +350,14 @@ class ExprEngine { | |
ExplodedNode *Pred); | ||
|
||
/// ProcessBranch - Called by CoreEngine. Used to generate successor | ||
/// nodes by processing the 'effects' of a branch condition. | ||
void processBranch(const Stmt *Condition, | ||
NodeBuilderContext& BuilderCtx, | ||
ExplodedNode *Pred, | ||
ExplodedNodeSet &Dst, | ||
const CFGBlock *DstT, | ||
const CFGBlock *DstF); | ||
/// nodes by processing the 'effects' of a branch condition. | ||
/// If the branch condition is a loop condition, IterationsFinishedInLoop is | ||
/// the number of already finished iterations (0, 1, 2, ...); otherwise it's | ||
/// std::nullopt. | ||
void processBranch(const Stmt *Condition, NodeBuilderContext &BuilderCtx, | ||
ExplodedNode *Pred, ExplodedNodeSet &Dst, | ||
const CFGBlock *DstT, const CFGBlock *DstF, | ||
std::optional<unsigned> IterationsFinishedInLoop); | ||
|
||
/// Called by CoreEngine. | ||
/// Used to generate successor nodes for temporary destructors depending | ||
|
@@ -583,11 +612,11 @@ class ExprEngine { | |
ExplodedNode *Pred, | ||
ExplodedNodeSet &Dst); | ||
|
||
/// evalEagerlyAssumeBinOpBifurcation - Given the nodes in 'Src', eagerly assume symbolic | ||
/// expressions of the form 'x != 0' and generate new nodes (stored in Dst) | ||
/// with those assumptions. | ||
void evalEagerlyAssumeBinOpBifurcation(ExplodedNodeSet &Dst, ExplodedNodeSet &Src, | ||
const Expr *Ex); | ||
/// evalEagerlyAssumeOpBifurcation - Given the nodes in 'Src', eagerly assume | ||
/// symbolic expressions of the form 'x != 0' or '!x' and generate new nodes | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please don't rename this method. The functionality of the method doesn't change, it still does what it used to do, so the renaming just creates noise in the diff. If you think the name is wrong, you can open a separate NFC to address that. Since more than one method would need to be renamed after this one, I suggest you leave this alone. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The name of this method is incorrect and misleading, which wasted my time while I was debugging some aspect of my patch, so I want to rename it. I'm not interested in renaming other methods, and I think it's a big waste of time to create a separate commit for four lines of completely NFC changes. (Also, creating a separate patch for every small change is polluting the diff log with small inconsequential commits.) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My point was that some other methods that this method calls also have the same misleading and incorrect name like This is why I suggested a separate patch, or if you want to do it in this patch just keep them in 2 different commits so they are separated when the PR is merged. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As you've already seen, I created a separate NFC PR for this renaming and a few other things: #112209 Once that's merged, I'll rebase this patch onto it. |
||
/// (stored in Dst) with those assumptions. | ||
void evalEagerlyAssumeOpBifurcation(ExplodedNodeSet &Dst, | ||
ExplodedNodeSet &Src, const Expr *Ex); | ||
|
||
static std::pair<const ProgramPointTag *, const ProgramPointTag *> | ||
geteagerlyAssumeBinOpBifurcationTags(); | ||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -212,6 +212,24 @@ typedef llvm::ImmutableMap<const LocationContext *, unsigned> | |||||
REGISTER_TRAIT_WITH_PROGRAMSTATE(PendingArrayDestruction, | ||||||
PendingArrayDestructionMap) | ||||||
|
||||||
// This trait is used to heuristically filter out results produced from | ||||||
// execution paths that took "weak" assumptions within a loop. | ||||||
REGISTER_TRAIT_WITH_PROGRAMSTATE(SeenWeakLoopAssumption, bool) | ||||||
|
||||||
ProgramStateRef clang::ento::recordWeakLoopAssumption(ProgramStateRef State) { | ||||||
return State->set<SeenWeakLoopAssumption>(true); | ||||||
} | ||||||
|
||||||
bool clang::ento::seenWeakLoopAssumption(ProgramStateRef State) { | ||||||
return State->get<SeenWeakLoopAssumption>(); | ||||||
} | ||||||
Comment on lines
+219
to
+225
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Every registered trait should be removed when they are no longer needed to stop polluting the states. Also, aren't the traits part of the exploded graph? If they are and they are not cleaned up, they pollute the egraph too. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This bit doesn't have any "no longer needed" point -- if we made some shaky assumption for a loop condition, I want to suppress all the ArrayBoundV2 reports that are reached through a path which takes that particular branch. (By the way, each There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Does it still worth to keep them once we finished processing the loop? The loop condition only gives us information while we are inside the loop the condition belongs to, right? Once the loop is exited, the trait could be safely removed I think. I only see the pattern that every state trait that we have now comes with a method that sets it and another one that removes it, so I assume we remove them for a reason. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There is nothing to be removed here as this trait is a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
No, there is a common false positive pattern where an unfounded assumption within the loop is used after the loop and produces an unwanted result at that point. See e.g. the testcase There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, but with this left turned on, the coverage drop is huge even in cases that are not affected by the assumption. void foo(int x, int y) {
for (unsigned i = 0; i < x; i++) ; // split the state and set SeenWeakLoopAssumption to 'true'
if (x != 0) return; // drop the 'true' branch
// no warnings are reported from this point on
int buf[1] = {0};
for (int i = 0; i < y; i++)
buf[i] = 1; // SeenWeakLoopAssumption is 'true', so the warning is suppressed
} This goes on through multiple function calls too. void a() {}
void b() { a(); }
void c() { b(); }
void d() { c(); }
void main() {
for (unsigned i = 0; i < x; i++) ;
if (x != 0) return;
// no warnings are reported from this point on
d();
} If a warning is found inside any of Since we generate a sink it is just 1 false negative though, while relying on an unfounded assumption might trigger a false positive in one of the nested functions, so I guess we can live with this. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The example void foo(int x, int y) {
for (unsigned i = 0; i < x; i++) ; // split the state and set SeenWeakLoopAssumption to 'true'
if (x != 0) return; // drop the 'true' branch
// no warnings are reported from this point on
} is a very good point and I'll probably add it to the tests to highlight this limitation of the heuristic. However, I've seen {{ArrayBoundV2}} reports where lots of stuff happens between the point where we assume that a loop can have 0 iterations (i.e. some length/size variable is equal to 0) and the point where this triggers an unwanted report; so I don't think that we can have a natural cutoff where the "SeenWeakLoopAssumption" bit may be safely cleared. I don't see a way to avoid these kinds of false negatives without a completely different loop handling approach, so I think we should accept them in the foreseeable future. (There are already lots of precedents for losing coverage after loops.)
Comment on lines
+219
to
+225
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could you please move these definitions to the place, where the other similar definitions for other state traits are found, so that they stay grouped together? |
||||||
|
||||||
// This trait points to the last expression (logical operator) where an eager | ||||||
// assumption introduced a state split (i.e. both cases were feasible). This is | ||||||
// used by the WeakLoopAssumption heuristic to find situations where an eager | ||||||
// assumption introduces a state split in the evaluation of a loop condition. | ||||||
REGISTER_TRAIT_WITH_PROGRAMSTATE(LastEagerlyAssumeAssumptionAt, const Expr *) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I tried to emphasize that I'm only marking the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I suggested a renaming only because I'm not sure that |
||||||
|
||||||
//===----------------------------------------------------------------------===// | ||||||
// Engine construction and deletion. | ||||||
//===----------------------------------------------------------------------===// | ||||||
|
@@ -2128,7 +2146,7 @@ void ExprEngine::Visit(const Stmt *S, ExplodedNode *Pred, | |||||
(B->isRelationalOp() || B->isEqualityOp())) { | ||||||
ExplodedNodeSet Tmp; | ||||||
VisitBinaryOperator(cast<BinaryOperator>(S), Pred, Tmp); | ||||||
evalEagerlyAssumeBinOpBifurcation(Dst, Tmp, cast<Expr>(S)); | ||||||
evalEagerlyAssumeOpBifurcation(Dst, Tmp, cast<Expr>(S)); | ||||||
} | ||||||
else | ||||||
VisitBinaryOperator(cast<BinaryOperator>(S), Pred, Dst); | ||||||
|
@@ -2401,7 +2419,7 @@ void ExprEngine::Visit(const Stmt *S, ExplodedNode *Pred, | |||||
if (AMgr.options.ShouldEagerlyAssume && (U->getOpcode() == UO_LNot)) { | ||||||
ExplodedNodeSet Tmp; | ||||||
VisitUnaryOperator(U, Pred, Tmp); | ||||||
evalEagerlyAssumeBinOpBifurcation(Dst, Tmp, U); | ||||||
evalEagerlyAssumeOpBifurcation(Dst, Tmp, U); | ||||||
} | ||||||
else | ||||||
VisitUnaryOperator(U, Pred, Dst); | ||||||
|
@@ -2761,12 +2779,10 @@ assumeCondition(const Stmt *Condition, ExplodedNode *N) { | |||||
return State->assume(V); | ||||||
} | ||||||
|
||||||
void ExprEngine::processBranch(const Stmt *Condition, | ||||||
NodeBuilderContext& BldCtx, | ||||||
ExplodedNode *Pred, | ||||||
ExplodedNodeSet &Dst, | ||||||
const CFGBlock *DstT, | ||||||
const CFGBlock *DstF) { | ||||||
void ExprEngine::processBranch( | ||||||
const Stmt *Condition, NodeBuilderContext &BldCtx, ExplodedNode *Pred, | ||||||
ExplodedNodeSet &Dst, const CFGBlock *DstT, const CFGBlock *DstF, | ||||||
std::optional<unsigned> IterationsFinishedInLoop) { | ||||||
assert((!Condition || !isa<CXXBindTemporaryExpr>(Condition)) && | ||||||
"CXXBindTemporaryExprs are handled by processBindTemporary."); | ||||||
const LocationContext *LCtx = Pred->getLocationContext(); | ||||||
|
@@ -2804,31 +2820,59 @@ void ExprEngine::processBranch(const Stmt *Condition, | |||||
ProgramStateRef PrevState = PredN->getState(); | ||||||
|
||||||
ProgramStateRef StTrue, StFalse; | ||||||
if (const auto KnownCondValueAssumption = assumeCondition(Condition, PredN)) | ||||||
StTrue = StFalse = PrevState; | ||||||
|
||||||
if (const auto KnownCondValueAssumption = | ||||||
assumeCondition(Condition, PredN)) { | ||||||
std::tie(StTrue, StFalse) = *KnownCondValueAssumption; | ||||||
else { | ||||||
assert(!isa<ObjCForCollectionStmt>(Condition)); | ||||||
builder.generateNode(PrevState, true, PredN); | ||||||
builder.generateNode(PrevState, false, PredN); | ||||||
continue; | ||||||
|
||||||
if (!StTrue) | ||||||
builder.markInfeasible(true); | ||||||
|
||||||
if (!StFalse) | ||||||
builder.markInfeasible(false); | ||||||
} | ||||||
if (StTrue && StFalse) | ||||||
bool BothFeasible = builder.isFeasible(true) && builder.isFeasible(false); | ||||||
|
||||||
if (BothFeasible) | ||||||
assert(!isa<ObjCForCollectionStmt>(Condition)); | ||||||
|
||||||
const Expr *EagerlyAssumeExpr = | ||||||
PrevState->get<LastEagerlyAssumeAssumptionAt>(); | ||||||
bool DidEagerlyAssume = EagerlyAssumeExpr == dyn_cast<Expr>(Condition); | ||||||
|
||||||
// Process the true branch. | ||||||
if (builder.isFeasible(true)) { | ||||||
if (StTrue) | ||||||
builder.generateNode(StTrue, true, PredN); | ||||||
else | ||||||
builder.markInfeasible(true); | ||||||
if ((BothFeasible || DidEagerlyAssume) && IterationsFinishedInLoop && | ||||||
*IterationsFinishedInLoop >= 2) { | ||||||
// When programmers write a loop, they imply that at least two | ||||||
// iterations are possible (otherwise they would just write an `if`), | ||||||
// but the third iteration is not implied: there are situations where | ||||||
// the programmer knows that there won't be a third iteration, but | ||||||
// this is not marked in the source code. (For example, the ffmpeg | ||||||
// project has 2-element arrays which are accessed from loops where | ||||||
// the number of steps is opaque and the analyzer cannot deduce that | ||||||
// there are <= 2 iterations.) | ||||||
// Checkers may use this heuristic mark to discard results found on | ||||||
// branches that contain this "weak" assumption. | ||||||
StTrue = recordWeakLoopAssumption(StTrue); | ||||||
} | ||||||
builder.generateNode(StTrue, true, PredN); | ||||||
} | ||||||
|
||||||
// Process the false branch. | ||||||
if (builder.isFeasible(false)) { | ||||||
if (StFalse) | ||||||
builder.generateNode(StFalse, false, PredN); | ||||||
else | ||||||
builder.markInfeasible(false); | ||||||
if ((BothFeasible || DidEagerlyAssume) && IterationsFinishedInLoop && | ||||||
*IterationsFinishedInLoop == 0) { | ||||||
// There are many situations where the programmers know that there | ||||||
// will be at least one iteration in a loop (e.g. a structure is not | ||||||
// empty) but the analyzer cannot deduce this and reports false | ||||||
// positives after skipping the loop. | ||||||
// Checkers may use this heuristic mark to discard results found on | ||||||
// branches that contain this "weak" assumption. | ||||||
StFalse = recordWeakLoopAssumption(StFalse); | ||||||
} | ||||||
builder.generateNode(StFalse, false, PredN); | ||||||
} | ||||||
} | ||||||
currBldrCtx = nullptr; | ||||||
|
@@ -3752,9 +3796,9 @@ ExprEngine::geteagerlyAssumeBinOpBifurcationTags() { | |||||
&eagerlyAssumeBinOpBifurcationFalse); | ||||||
} | ||||||
|
||||||
void ExprEngine::evalEagerlyAssumeBinOpBifurcation(ExplodedNodeSet &Dst, | ||||||
ExplodedNodeSet &Src, | ||||||
const Expr *Ex) { | ||||||
void ExprEngine::evalEagerlyAssumeOpBifurcation(ExplodedNodeSet &Dst, | ||||||
ExplodedNodeSet &Src, | ||||||
const Expr *Ex) { | ||||||
StmtNodeBuilder Bldr(Src, Dst, *currBldrCtx); | ||||||
|
||||||
for (const auto Pred : Src) { | ||||||
|
@@ -3776,6 +3820,11 @@ void ExprEngine::evalEagerlyAssumeBinOpBifurcation(ExplodedNodeSet &Dst, | |||||
ProgramStateRef StateTrue, StateFalse; | ||||||
std::tie(StateTrue, StateFalse) = state->assume(*SEV); | ||||||
|
||||||
if (StateTrue && StateFalse) { | ||||||
StateTrue = StateTrue->set<LastEagerlyAssumeAssumptionAt>(Ex); | ||||||
StateFalse = StateFalse->set<LastEagerlyAssumeAssumptionAt>(Ex); | ||||||
} | ||||||
isuckatcs marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
|
||||||
// First assume that the condition is true. | ||||||
if (StateTrue) { | ||||||
SVal Val = svalBuilder.makeIntVal(1U, Ex->getType()); | ||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While browsing through the codebase, I've seen that these methods for the other traits are defined inside
class ExprEngine
asstatic
methods with setters and deleters beingprivate
and getters beingpublic
.IIUC, they are in the global scope so that they can be accessed from
ArrayBoundCheckerV2
. Tbh, I would makerecordWeakLoopAssumption()
aprivate
static
method of theExprEngine
class, as only this class is able to assume anything about the condition, so it should be the only one, that can set any state traits related to those assumptions, whileseenWeakLoopAssumption()
could be apublic
static
member function ofExprEngine
to follow the pattern.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I prefer plain functions instead of
static
methods that are only vaguely connected to a class, but I can move them toExprEngine
to follow the precedents.Making
recordWeakLoopAssumption()
private is also a good point -- I'll either do so or just inline its short one-line definition at the few locations where it's called. (I think the transparentState->set<>
would be clearer than just calling some random method. If we need more complex logic later, we can reintroduce a method like it.)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these methods are strongly connected to this class, as they only make sense in the context of
ExprEngine
, which is also indicated by them being inExprEngine.h
. AlsoExprEngine
is the only class that can set this trait, as no one else has access to the required information to do so.