Skip to content

[analyzer] Suppress out of bounds reports after weak loop assumptions #109804

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
53 changes: 41 additions & 12 deletions clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
Original file line number Diff line number Diff line change
Expand Up @@ -121,6 +121,34 @@ struct EvalCallOptions {
EvalCallOptions() {}
};

/// Simple control flow statements like `if` can only produce a single two-way
/// state split, so when the analyzer cannot determine the value of the
/// condition, it can assume either of the two options, because the fact that
/// they are in the source code implies that the programmer thought that they
/// are possible (at least under some conditions).
/// (Note that this heuristic is not entirely correct when there are _several_
/// `if` statements with unmarked logical connections between them, but it's
/// still good enough and the analyzer heavily relies on it.)
/// In contrast with this, a single loop statement can produce multiple state
/// splits, and we cannot always single out safe assumptions where we can say
/// that "the programmer included this loop in the source code, so they clearly
/// thought that this execution path is possible".
/// However, the analyzer wants to explore the code in and after the loop, so
/// it makes assumptions about the loop condition (to get a concrete execution
/// path) even when they are not justified.
/// This function is called by the engine to mark the `State` when it makes an
/// assumption which is "weak". Checkers may use this heuristical mark to
/// discard the result and reduce the amount of false positives.
/// TODO: Instead of just marking these branches for checker-specific handling,
/// we could discard them completely. I suspect that this could eliminate some
/// false positives without suppressing too many true positives, but I didn't
/// have time to measure its effects.
ProgramStateRef recordWeakLoopAssumption(ProgramStateRef State);

/// Returns true if `recordWeakLoopAssumption()` was called on the execution
/// path which produced `State`.
bool seenWeakLoopAssumption(ProgramStateRef State);
Comment on lines +146 to +150
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While browsing through the codebase, I've seen that these methods for the other traits are defined inside class ExprEngine as static methods with setters and deleters being private and getters being public.

IIUC, they are in the global scope so that they can be accessed from ArrayBoundCheckerV2. Tbh, I would make recordWeakLoopAssumption() a private static method of the ExprEngine class, as only this class is able to assume anything about the condition, so it should be the only one, that can set any state traits related to those assumptions, while seenWeakLoopAssumption() could be a public static member function of ExprEngine to follow the pattern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I prefer plain functions instead of static methods that are only vaguely connected to a class, but I can move them to ExprEngine to follow the precedents.

Making recordWeakLoopAssumption() private is also a good point -- I'll either do so or just inline its short one-line definition at the few locations where it's called. (I think the transparent State->set<> would be clearer than just calling some random method. If we need more complex logic later, we can reintroduce a method like it.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

methods that are only vaguely connected to a class

I think these methods are strongly connected to this class, as they only make sense in the context of ExprEngine, which is also indicated by them being in ExprEngine.h. Also ExprEngine is the only class that can set this trait, as no one else has access to the required information to do so.


class ExprEngine {
void anchor();

Expand Down Expand Up @@ -322,13 +350,14 @@ class ExprEngine {
ExplodedNode *Pred);

/// ProcessBranch - Called by CoreEngine. Used to generate successor
/// nodes by processing the 'effects' of a branch condition.
void processBranch(const Stmt *Condition,
NodeBuilderContext& BuilderCtx,
ExplodedNode *Pred,
ExplodedNodeSet &Dst,
const CFGBlock *DstT,
const CFGBlock *DstF);
/// nodes by processing the 'effects' of a branch condition.
/// If the branch condition is a loop condition, IterationsFinishedInLoop is
/// the number of already finished iterations (0, 1, 2, ...); otherwise it's
/// std::nullopt.
void processBranch(const Stmt *Condition, NodeBuilderContext &BuilderCtx,
ExplodedNode *Pred, ExplodedNodeSet &Dst,
const CFGBlock *DstT, const CFGBlock *DstF,
std::optional<unsigned> IterationsFinishedInLoop);

/// Called by CoreEngine.
/// Used to generate successor nodes for temporary destructors depending
Expand Down Expand Up @@ -583,11 +612,11 @@ class ExprEngine {
ExplodedNode *Pred,
ExplodedNodeSet &Dst);

/// evalEagerlyAssumeBinOpBifurcation - Given the nodes in 'Src', eagerly assume symbolic
/// expressions of the form 'x != 0' and generate new nodes (stored in Dst)
/// with those assumptions.
void evalEagerlyAssumeBinOpBifurcation(ExplodedNodeSet &Dst, ExplodedNodeSet &Src,
const Expr *Ex);
/// evalEagerlyAssumeOpBifurcation - Given the nodes in 'Src', eagerly assume
/// symbolic expressions of the form 'x != 0' or '!x' and generate new nodes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't rename this method. The functionality of the method doesn't change, it still does what it used to do, so the renaming just creates noise in the diff. If you think the name is wrong, you can open a separate NFC to address that.

Since more than one method would need to be renamed after this one, I suggest you leave this alone.

Copy link
Contributor Author

@NagyDonat NagyDonat Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this method is incorrect and misleading, which wasted my time while I was debugging some aspect of my patch, so I want to rename it. I'm not interested in renaming other methods, and I think it's a big waste of time to create a separate commit for four lines of completely NFC changes. (Also, creating a separate patch for every small change is polluting the diff log with small inconsequential commits.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point was that some other methods that this method calls also have the same misleading and incorrect name like geteagerlyAssumeBinOpBifurcationTags(). I think if we rename this method, every other similar method should also be renamed.

This is why I suggested a separate patch, or if you want to do it in this patch just keep them in 2 different commits so they are separated when the PR is merged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you've already seen, I created a separate NFC PR for this renaming and a few other things: #112209

Once that's merged, I'll rebase this patch onto it.

/// (stored in Dst) with those assumptions.
void evalEagerlyAssumeOpBifurcation(ExplodedNodeSet &Dst,
ExplodedNodeSet &Src, const Expr *Ex);

static std::pair<const ProgramPointTag *, const ProgramPointTag *>
geteagerlyAssumeBinOpBifurcationTags();
Expand Down
5 changes: 5 additions & 0 deletions clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -697,6 +697,11 @@ void ArrayBoundCheckerV2::reportOOB(CheckerContext &C,
ProgramStateRef ErrorState, Messages Msgs,
NonLoc Offset, std::optional<NonLoc> Extent,
bool IsTaintBug /*=false*/) const {
// Suppress results found through execution paths where in some loop the
// analyzer arbitrarily assumed either that the loop is skipped (0 iterations)
// or that 3 or more iterations are executed.
if (seenWeakLoopAssumption(ErrorState))
return;

ExplodedNode *ErrorNode = C.generateErrorNode(ErrorState);
if (!ErrorNode)
Expand Down
24 changes: 23 additions & 1 deletion clang/lib/StaticAnalyzer/Core/CoreEngine.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -441,10 +441,32 @@ void CoreEngine::HandleCallEnter(const CallEnter &CE, ExplodedNode *Pred) {
void CoreEngine::HandleBranch(const Stmt *Cond, const Stmt *Term,
const CFGBlock * B, ExplodedNode *Pred) {
assert(B->succ_size() == 2);

const LocationContext *LC = Pred->getLocationContext();
BlockCounter Counter = WList->getBlockCounter();
unsigned BlockCount =
Counter.getNumVisited(LC->getStackFrame(), B->getBlockID());
std::optional<unsigned> IterationsFinishedInLoop = std::nullopt;
if (isa<ForStmt, WhileStmt, CXXForRangeStmt>(Term)) {
// FIXME: This code approximates the number of finished iterations based on
// the block count, i.e. the number of evaluations of the terminator block
// on the current execution path (which includes the current evaluation, so
// is always >= 1). This is probably acceptable for the checker-specific
// false positive suppression that currently uses this value, but it would
// be better to calcuate an accurate count of iterations.
assert(BlockCount >= 1);
IterationsFinishedInLoop = BlockCount - 1;
} else if (isa<DoStmt>(Term)) {
// FIXME: The fixme note in the previous branch also applies here.
// In a do-while loop one iteration happens before the first evaluation of
// the loop condition, so we don't subtract one from the block count.
IterationsFinishedInLoop = BlockCount;
}

NodeBuilderContext Ctx(*this, B, Pred);
ExplodedNodeSet Dst;
ExprEng.processBranch(Cond, Ctx, Pred, Dst, *(B->succ_begin()),
*(B->succ_begin() + 1));
*(B->succ_begin() + 1), IterationsFinishedInLoop);
// Enqueue the new frontier onto the worklist.
enqueue(Dst);
}
Expand Down
101 changes: 75 additions & 26 deletions clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -212,6 +212,24 @@ typedef llvm::ImmutableMap<const LocationContext *, unsigned>
REGISTER_TRAIT_WITH_PROGRAMSTATE(PendingArrayDestruction,
PendingArrayDestructionMap)

// This trait is used to heuristically filter out results produced from
// execution paths that took "weak" assumptions within a loop.
REGISTER_TRAIT_WITH_PROGRAMSTATE(SeenWeakLoopAssumption, bool)

ProgramStateRef clang::ento::recordWeakLoopAssumption(ProgramStateRef State) {
return State->set<SeenWeakLoopAssumption>(true);
}

bool clang::ento::seenWeakLoopAssumption(ProgramStateRef State) {
return State->get<SeenWeakLoopAssumption>();
}
Comment on lines +219 to +225
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every registered trait should be removed when they are no longer needed to stop polluting the states. Also, aren't the traits part of the exploded graph? If they are and they are not cleaned up, they pollute the egraph too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit doesn't have any "no longer needed" point -- if we made some shaky assumption for a loop condition, I want to suppress all the ArrayBoundV2 reports that are reached through a path which takes that particular branch.

(By the way, each ExplodedNode has an associated State, which is essentially a big heap of traits that are introduced by various parts of the engine and various checkers; so yes, traits are contained in the egraph as well.)

Copy link
Member

@isuckatcs isuckatcs Oct 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit doesn't have any "no longer needed" point

Does it still worth to keep them once we finished processing the loop? The loop condition only gives us information while we are inside the loop the condition belongs to, right? Once the loop is exited, the trait could be safely removed I think.

I only see the pattern that every state trait that we have now comes with a method that sets it and another one that removes it, so I assume we remove them for a reason.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is nothing to be removed here as this trait is a bool.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it still worth to keep them once we finished processing the loop? The loop condition only gives us information while we are inside the loop the condition belongs to, right? Once the loop is exited, the trait could be safely removed I think.

No, there is a common false positive pattern where an unfounded assumption within the loop is used after the loop and produces an unwanted result at that point. See e.g. the testcase loop_suppress_after_zero_iterations where the assumption in the loop introduces len == 0 as a separate branch, while there is no justified reason to handle this separately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but with this left turned on, the coverage drop is huge even in cases that are not affected by the assumption.

void foo(int x, int y) {
  for (unsigned i = 0; i < x; i++) ; // split the state and set SeenWeakLoopAssumption to 'true'
  if (x != 0) return;                // drop the 'true' branch

  // no warnings are reported from this point on

  int buf[1] = {0};
  for (int i = 0; i < y; i++)
    buf[i] = 1;                      // SeenWeakLoopAssumption is 'true', so the warning is suppressed
}

This goes on through multiple function calls too.

void a() {}

void b() { a(); }

void c() { b(); }

void d() { c(); }

void main() {
  for (unsigned i = 0; i < x; i++)  ;
  if (x != 0) return;

  // no warnings are reported from this point on

  d();
}

If a warning is found inside any of a(), b(), c() or d(), it is suppressed because the trait is set on the top of the execution path.

Since we generate a sink it is just 1 false negative though, while relying on an unfounded assumption might trigger a false positive in one of the nested functions, so I guess we can live with this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example

void foo(int x, int y) {
  for (unsigned i = 0; i < x; i++) ; // split the state and set SeenWeakLoopAssumption to 'true'
  if (x != 0) return;                // drop the 'true' branch

  // no warnings are reported from this point on
}

is a very good point and I'll probably add it to the tests to highlight this limitation of the heuristic.

However, I've seen {{ArrayBoundV2}} reports where lots of stuff happens between the point where we assume that a loop can have 0 iterations (i.e. some length/size variable is equal to 0) and the point where this triggers an unwanted report; so I don't think that we can have a natural cutoff where the "SeenWeakLoopAssumption" bit may be safely cleared.

I don't see a way to avoid these kinds of false negatives without a completely different loop handling approach, so I think we should accept them in the foreseeable future. (There are already lots of precedents for losing coverage after loops.)

Comment on lines +219 to +225
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please move these definitions to the place, where the other similar definitions for other state traits are found, so that they stay grouped together?


// This trait points to the last expression (logical operator) where an eager
// assumption introduced a state split (i.e. both cases were feasible). This is
// used by the WeakLoopAssumption heuristic to find situations where an eager
// assumption introduces a state split in the evaluation of a loop condition.
REGISTER_TRAIT_WITH_PROGRAMSTATE(LastEagerlyAssumeAssumptionAt, const Expr *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
REGISTER_TRAIT_WITH_PROGRAMSTATE(LastEagerlyAssumeAssumptionAt, const Expr *)
REGISTER_TRAIT_WITH_PROGRAMSTATE(LastEagerAssumptionAt, const Expr *)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to emphasize that I'm only marking the EagerlyAssume checks that did assume something new (i.e. they introduce a state split). I'll probably rename this to LastEagerAssumptionStateSplitAt or something similar.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggested a renaming only because I'm not sure that EagerlyAssumeAssumption is a correct english phrase.


//===----------------------------------------------------------------------===//
// Engine construction and deletion.
//===----------------------------------------------------------------------===//
Expand Down Expand Up @@ -2128,7 +2146,7 @@ void ExprEngine::Visit(const Stmt *S, ExplodedNode *Pred,
(B->isRelationalOp() || B->isEqualityOp())) {
ExplodedNodeSet Tmp;
VisitBinaryOperator(cast<BinaryOperator>(S), Pred, Tmp);
evalEagerlyAssumeBinOpBifurcation(Dst, Tmp, cast<Expr>(S));
evalEagerlyAssumeOpBifurcation(Dst, Tmp, cast<Expr>(S));
}
else
VisitBinaryOperator(cast<BinaryOperator>(S), Pred, Dst);
Expand Down Expand Up @@ -2401,7 +2419,7 @@ void ExprEngine::Visit(const Stmt *S, ExplodedNode *Pred,
if (AMgr.options.ShouldEagerlyAssume && (U->getOpcode() == UO_LNot)) {
ExplodedNodeSet Tmp;
VisitUnaryOperator(U, Pred, Tmp);
evalEagerlyAssumeBinOpBifurcation(Dst, Tmp, U);
evalEagerlyAssumeOpBifurcation(Dst, Tmp, U);
}
else
VisitUnaryOperator(U, Pred, Dst);
Expand Down Expand Up @@ -2761,12 +2779,10 @@ assumeCondition(const Stmt *Condition, ExplodedNode *N) {
return State->assume(V);
}

void ExprEngine::processBranch(const Stmt *Condition,
NodeBuilderContext& BldCtx,
ExplodedNode *Pred,
ExplodedNodeSet &Dst,
const CFGBlock *DstT,
const CFGBlock *DstF) {
void ExprEngine::processBranch(
const Stmt *Condition, NodeBuilderContext &BldCtx, ExplodedNode *Pred,
ExplodedNodeSet &Dst, const CFGBlock *DstT, const CFGBlock *DstF,
std::optional<unsigned> IterationsFinishedInLoop) {
assert((!Condition || !isa<CXXBindTemporaryExpr>(Condition)) &&
"CXXBindTemporaryExprs are handled by processBindTemporary.");
const LocationContext *LCtx = Pred->getLocationContext();
Expand Down Expand Up @@ -2804,31 +2820,59 @@ void ExprEngine::processBranch(const Stmt *Condition,
ProgramStateRef PrevState = PredN->getState();

ProgramStateRef StTrue, StFalse;
if (const auto KnownCondValueAssumption = assumeCondition(Condition, PredN))
StTrue = StFalse = PrevState;

if (const auto KnownCondValueAssumption =
assumeCondition(Condition, PredN)) {
std::tie(StTrue, StFalse) = *KnownCondValueAssumption;
else {
assert(!isa<ObjCForCollectionStmt>(Condition));
builder.generateNode(PrevState, true, PredN);
builder.generateNode(PrevState, false, PredN);
continue;

if (!StTrue)
builder.markInfeasible(true);

if (!StFalse)
builder.markInfeasible(false);
}
if (StTrue && StFalse)
bool BothFeasible = builder.isFeasible(true) && builder.isFeasible(false);

if (BothFeasible)
assert(!isa<ObjCForCollectionStmt>(Condition));

const Expr *EagerlyAssumeExpr =
PrevState->get<LastEagerlyAssumeAssumptionAt>();
bool DidEagerlyAssume = EagerlyAssumeExpr == dyn_cast<Expr>(Condition);

// Process the true branch.
if (builder.isFeasible(true)) {
if (StTrue)
builder.generateNode(StTrue, true, PredN);
else
builder.markInfeasible(true);
if ((BothFeasible || DidEagerlyAssume) && IterationsFinishedInLoop &&
*IterationsFinishedInLoop >= 2) {
// When programmers write a loop, they imply that at least two
// iterations are possible (otherwise they would just write an `if`),
// but the third iteration is not implied: there are situations where
// the programmer knows that there won't be a third iteration, but
// this is not marked in the source code. (For example, the ffmpeg
// project has 2-element arrays which are accessed from loops where
// the number of steps is opaque and the analyzer cannot deduce that
// there are <= 2 iterations.)
// Checkers may use this heuristic mark to discard results found on
// branches that contain this "weak" assumption.
StTrue = recordWeakLoopAssumption(StTrue);
}
builder.generateNode(StTrue, true, PredN);
}

// Process the false branch.
if (builder.isFeasible(false)) {
if (StFalse)
builder.generateNode(StFalse, false, PredN);
else
builder.markInfeasible(false);
if ((BothFeasible || DidEagerlyAssume) && IterationsFinishedInLoop &&
*IterationsFinishedInLoop == 0) {
// There are many situations where the programmers know that there
// will be at least one iteration in a loop (e.g. a structure is not
// empty) but the analyzer cannot deduce this and reports false
// positives after skipping the loop.
// Checkers may use this heuristic mark to discard results found on
// branches that contain this "weak" assumption.
StFalse = recordWeakLoopAssumption(StFalse);
}
builder.generateNode(StFalse, false, PredN);
}
}
currBldrCtx = nullptr;
Expand Down Expand Up @@ -3752,9 +3796,9 @@ ExprEngine::geteagerlyAssumeBinOpBifurcationTags() {
&eagerlyAssumeBinOpBifurcationFalse);
}

void ExprEngine::evalEagerlyAssumeBinOpBifurcation(ExplodedNodeSet &Dst,
ExplodedNodeSet &Src,
const Expr *Ex) {
void ExprEngine::evalEagerlyAssumeOpBifurcation(ExplodedNodeSet &Dst,
ExplodedNodeSet &Src,
const Expr *Ex) {
StmtNodeBuilder Bldr(Src, Dst, *currBldrCtx);

for (const auto Pred : Src) {
Expand All @@ -3776,6 +3820,11 @@ void ExprEngine::evalEagerlyAssumeBinOpBifurcation(ExplodedNodeSet &Dst,
ProgramStateRef StateTrue, StateFalse;
std::tie(StateTrue, StateFalse) = state->assume(*SEV);

if (StateTrue && StateFalse) {
StateTrue = StateTrue->set<LastEagerlyAssumeAssumptionAt>(Ex);
StateFalse = StateFalse->set<LastEagerlyAssumeAssumptionAt>(Ex);
}

// First assume that the condition is true.
if (StateTrue) {
SVal Val = svalBuilder.makeIntVal(1U, Ex->getType());
Expand Down
6 changes: 3 additions & 3 deletions clang/test/Analysis/loop-unrolling.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -349,7 +349,7 @@ int simple_unknown_bound_loop() {
#ifdef DFS
clang_analyzer_numTimesReached(); // expected-warning {{16}}
#else
clang_analyzer_numTimesReached(); // expected-warning {{8}}
clang_analyzer_numTimesReached(); // expected-warning {{10}}
#endif
}
return 0;
Expand All @@ -369,9 +369,9 @@ int nested_inlined_no_unroll1() {
int k;
for (int i = 0; i < 9; i++) {
#ifdef DFS
clang_analyzer_numTimesReached(); // expected-warning {{18}}
clang_analyzer_numTimesReached(); // expected-warning {{20}}
#else
clang_analyzer_numTimesReached(); // expected-warning {{14}}
clang_analyzer_numTimesReached(); // expected-warning {{18}}
#endif
k = simple_unknown_bound_loop(); // reevaluation without inlining, splits the state as well
}
Expand Down
Loading
Loading