-
Notifications
You must be signed in to change notification settings - Fork 14.3k
Introduce intra-procedural lifetime analysis in Clang #142313
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
c8f6277
to
3624359
Compare
✅ With the latest revision this PR passed the Python code formatter. |
668e329
to
7b929ee
Compare
136238f
to
942648e
Compare
942648e
to
cb7bd7f
Compare
cb7bd7f
to
0cd187b
Compare
@llvm/pr-subscribers-clang-analysis Author: Utkarsh Saxena (usx95) ChangesThis patch introduces the initial implementation of the intra-procedural, flow-sensitive lifetime analysis for Clang, as proposed in the recent RFC: https://discourse.llvm.org/t/rfc-intra-procedural-lifetime-analysis-in-clang/86291 The primary goal of this initial submission is to establish the core dataflow framework and gather feedback on the overall design, fact representation, and testing strategy. The focus is on the dataflow mechanism itself rather than exhaustively covering all C++ AST edge cases, which will be addressed in subsequent patches. Key Components
Next Steps(Not covered in this PR but planned for subsequent patches) The following functionality is planned for the upcoming patches to build upon this foundation and make the analysis usable in practice:
<details> Performance on pathological test cases: </summary> The pathological case arise when we have struct MyObj {
int id;
~MyObj() {}
};
void long_cycle(bool condition) {
MyObj v1{1};
MyObj v2{1};
MyObj v3{1};
MyObj v4{1};
MyObj* p1 = &v1;
MyObj* p2 = &v2;
MyObj* p3 = &v3;
MyObj* p4 = &v4;
while (condition) {
MyObj* temp = p1;
p1 = p2;
p2 = p3;
p3 = p4;
p4 = temp;
}
} </details> Patch is 41.22 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/142313.diff 5 Files Affected:
diff --git a/clang/include/clang/Analysis/Analyses/LifetimeSafety.h b/clang/include/clang/Analysis/Analyses/LifetimeSafety.h
new file mode 100644
index 0000000000000..daf24fff72b9b
--- /dev/null
+++ b/clang/include/clang/Analysis/Analyses/LifetimeSafety.h
@@ -0,0 +1,13 @@
+#ifndef LLVM_CLANG_ANALYSIS_ANALYSES_LIFETIME_SAFETY_H
+#define LLVM_CLANG_ANALYSIS_ANALYSES_LIFETIME_SAFETY_H
+#include "clang/AST/DeclBase.h"
+#include "clang/Analysis/AnalysisDeclContext.h"
+#include "clang/Analysis/CFG.h"
+namespace clang {
+
+void runLifetimeAnalysis(const DeclContext &DC, const CFG &Cfg,
+ AnalysisDeclContext &AC);
+
+} // namespace clang
+
+#endif // LLVM_CLANG_ANALYSIS_ANALYSES_LIFETIME_SAFETY_H
diff --git a/clang/lib/Analysis/CMakeLists.txt b/clang/lib/Analysis/CMakeLists.txt
index 8cd3990db4c3e..0523d92480cb3 100644
--- a/clang/lib/Analysis/CMakeLists.txt
+++ b/clang/lib/Analysis/CMakeLists.txt
@@ -21,6 +21,7 @@ add_clang_library(clangAnalysis
FixitUtil.cpp
IntervalPartition.cpp
IssueHash.cpp
+ LifetimeSafety.cpp
LiveVariables.cpp
MacroExpansionContext.cpp
ObjCNoReturn.cpp
diff --git a/clang/lib/Analysis/LifetimeSafety.cpp b/clang/lib/Analysis/LifetimeSafety.cpp
new file mode 100644
index 0000000000000..8dbb4dc37026c
--- /dev/null
+++ b/clang/lib/Analysis/LifetimeSafety.cpp
@@ -0,0 +1,728 @@
+#include "clang/Analysis/Analyses/LifetimeSafety.h"
+#include "clang/AST/Decl.h"
+#include "clang/AST/Expr.h"
+#include "clang/AST/StmtVisitor.h"
+#include "clang/AST/Type.h"
+#include "clang/Analysis/AnalysisDeclContext.h"
+#include "clang/Analysis/CFG.h"
+#include "clang/Analysis/FlowSensitive/DataflowWorklist.h"
+#include "llvm/ADT/ImmutableMap.h"
+#include "llvm/ADT/ImmutableSet.h"
+#include "llvm/ADT/PointerUnion.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/TimeProfiler.h"
+#include <vector>
+
+namespace clang {
+namespace {
+
+struct Point {
+ const clang::CFGBlock *Block;
+ /// Index into Block->Elements().
+ unsigned ElementIndex;
+
+ Point(const clang::CFGBlock *B = nullptr, unsigned Idx = 0)
+ : Block(B), ElementIndex(Idx) {}
+
+ bool operator==(const Point &Other) const {
+ return Block == Other.Block && ElementIndex == Other.ElementIndex;
+ }
+};
+
+/// Represents the storage location being borrowed, e.g., a specific stack
+/// variable.
+/// TODO: Handle member accesseslike `s.y`.
+struct Path {
+ const clang::ValueDecl *D;
+
+ enum class Kind : uint8_t {
+ StackVariable,
+ Heap, // TODO: Handle.
+ Field, // TODO: Handle.
+ ArrayElement, // TODO: Handle.
+ TemporaryObject, // TODO: Handle.
+ StaticOrGlobal, // TODO: Handle.
+ };
+
+ Kind PathKind;
+
+ Path(const clang::ValueDecl *D, Kind K) : D(D), PathKind(K) {}
+};
+
+using LoanID = uint32_t;
+using OriginID = uint32_t;
+
+/// Information about a single borrow, or "Loan". A loan is created when a
+/// reference or pointer is taken.
+struct LoanInfo {
+ /// TODO: Represent opaque loans.
+ /// TODO: Represent nullptr: loans to no path. Accessing it UB! Currently it
+ /// is represented as empty LoanSet
+ LoanID ID;
+ Path SourcePath;
+ SourceLocation IssueLoc;
+
+ LoanInfo(LoanID id, Path path, SourceLocation loc)
+ : ID(id), SourcePath(path), IssueLoc(loc) {}
+};
+
+enum class OriginKind : uint8_t { Variable, ExpressionResult };
+
+/// An Origin is a symbolic identifier that represents the set of possible
+/// loans a pointer-like object could hold at any given time.
+/// TODO: Also represent Origins of complex types (fields, inner types).
+struct OriginInfo {
+ OriginID ID;
+ OriginKind Kind;
+ union {
+ const clang::ValueDecl *Decl;
+ const clang::Expr *Expression;
+ };
+ OriginInfo(OriginID id, OriginKind kind, const clang::ValueDecl *D)
+ : ID(id), Kind(kind), Decl(D) {}
+ OriginInfo(OriginID id, OriginKind kind, const clang::Expr *E)
+ : ID(id), Kind(kind), Expression(E) {}
+};
+
+class LoanManager {
+public:
+ LoanManager() = default;
+
+ LoanInfo &addLoanInfo(Path path, SourceLocation loc) {
+ NextLoanIDVal++;
+ AllLoans.emplace_back(NextLoanIDVal, path, loc);
+ return AllLoans.back();
+ }
+
+ const LoanInfo *getLoanInfo(LoanID id) const {
+ if (id < AllLoans.size())
+ return &AllLoans[id];
+ return nullptr;
+ }
+ llvm::ArrayRef<LoanInfo> getLoanInfos() const { return AllLoans; }
+
+private:
+ LoanID NextLoanIDVal = 0;
+ llvm::SmallVector<LoanInfo> AllLoans;
+};
+
+class OriginManager {
+public:
+ OriginManager() = default;
+
+ OriginID getNextOriginID() { return NextOriginIDVal++; }
+ OriginInfo &addOriginInfo(OriginID id, const clang::ValueDecl *D) {
+ assert(D != nullptr);
+ AllOrigins.emplace_back(id, OriginKind::Variable, D);
+ return AllOrigins.back();
+ }
+ OriginInfo &addOriginInfo(OriginID id, const clang::Expr *E) {
+ assert(E != nullptr);
+ AllOrigins.emplace_back(id, OriginKind::ExpressionResult, E);
+ return AllOrigins.back();
+ }
+
+ OriginID getOrCreate(const Expr *E) {
+ auto It = ExprToOriginID.find(E);
+ if (It != ExprToOriginID.end())
+ return It->second;
+
+ if (const auto *DRE = dyn_cast<DeclRefExpr>(E)) {
+ // Origin of DeclRefExpr is that of the declaration it refers to.
+ return getOrCreate(DRE->getDecl());
+ }
+ OriginID NewID = getNextOriginID();
+ addOriginInfo(NewID, E);
+ ExprToOriginID[E] = NewID;
+ return NewID;
+ }
+
+ const OriginInfo *getOriginInfo(OriginID id) const {
+ if (id < AllOrigins.size())
+ return &AllOrigins[id];
+ return nullptr;
+ }
+
+ llvm::ArrayRef<OriginInfo> getOriginInfos() const { return AllOrigins; }
+
+ OriginID getOrCreate(const ValueDecl *D) {
+ auto It = DeclToOriginID.find(D);
+ if (It != DeclToOriginID.end())
+ return It->second;
+ OriginID NewID = getNextOriginID();
+ addOriginInfo(NewID, D);
+ DeclToOriginID[D] = NewID;
+ return NewID;
+ }
+
+private:
+ OriginID NextOriginIDVal = 0;
+ llvm::SmallVector<OriginInfo> AllOrigins;
+ llvm::DenseMap<const clang::ValueDecl *, OriginID> DeclToOriginID;
+ llvm::DenseMap<const clang::Expr *, OriginID> ExprToOriginID;
+};
+
+/// An abstract base class for a single, atomic lifetime-relevant event.
+class Fact {
+
+public:
+ enum class Kind : uint8_t {
+ /// A new loan is issued from a borrow expression (e.g., &x).
+ Issue,
+ /// A loan expires as its underlying storage is freed (e.g., variable goes
+ /// out of scope).
+ Expire,
+ /// An origin is propagated from a source to a destination (e.g., p = q).
+ AssignOrigin,
+ /// An origin is part of a function's return value.
+ ReturnOfOrigin
+ };
+
+private:
+ Kind K;
+
+protected:
+ Point P;
+ Fact(Kind K, Point Pt) : K(K), P(Pt) {}
+
+public:
+ virtual ~Fact() = default;
+ Kind getKind() const { return K; }
+ Point getPoint() const { return P; }
+
+ template <typename T> const T *getAs() const {
+ if (T::classof(this))
+ return static_cast<const T *>(this);
+ return nullptr;
+ }
+
+ virtual void dump(llvm::raw_ostream &OS) const {
+ OS << "Fact (Kind: " << static_cast<int>(K) << ", Point: B"
+ << P.Block->getBlockID() << ":" << P.ElementIndex << ")\n";
+ }
+};
+
+class IssueFact : public Fact {
+ LoanID LID;
+ OriginID OID;
+
+public:
+ IssueFact(LoanID LID, OriginID OID, Point Pt)
+ : Fact(Kind::Issue, Pt), LID(LID), OID(OID) {}
+ LoanID getLoanID() const { return LID; }
+ OriginID getOriginID() const { return OID; }
+ static bool classof(const Fact *F) { return F->getKind() == Kind::Issue; }
+ void dump(llvm::raw_ostream &OS) const override {
+ OS << "Issue (LoanID: " << getLoanID() << ", OriginID: " << getOriginID()
+ << ")\n";
+ }
+};
+
+class ExpireFact : public Fact {
+ LoanID LID;
+
+public:
+ ExpireFact(LoanID LID, Point Pt) : Fact(Kind::Expire, Pt), LID(LID) {}
+ LoanID getLoanID() const { return LID; }
+ static bool classof(const Fact *F) { return F->getKind() == Kind::Expire; }
+ void dump(llvm::raw_ostream &OS) const override {
+ OS << "Expire (LoanID: " << getLoanID() << ")\n";
+ }
+};
+
+class AssignOriginFact : public Fact {
+ OriginID OIDDest;
+ OriginID OIDSrc;
+
+public:
+ AssignOriginFact(OriginID OIDDest, OriginID OIDSrc, Point Pt)
+ : Fact(Kind::AssignOrigin, Pt), OIDDest(OIDDest), OIDSrc(OIDSrc) {}
+ OriginID getDestOriginID() const { return OIDDest; }
+ OriginID getSrcOriginID() const { return OIDSrc; }
+ static bool classof(const Fact *F) {
+ return F->getKind() == Kind::AssignOrigin;
+ }
+ void dump(llvm::raw_ostream &OS) const override {
+ OS << "AssignOrigin (DestID: " << getDestOriginID()
+ << ", SrcID: " << getSrcOriginID() << ")\n";
+ }
+};
+
+class ReturnOfOriginFact : public Fact {
+ OriginID OID;
+
+public:
+ ReturnOfOriginFact(OriginID OID, Point Pt)
+ : Fact(Kind::ReturnOfOrigin, Pt), OID(OID) {}
+ OriginID getReturnedOriginID() const { return OID; }
+ static bool classof(const Fact *F) {
+ return F->getKind() == Kind::ReturnOfOrigin;
+ }
+ void dump(llvm::raw_ostream &OS) const override {
+ OS << "ReturnOfOrigin (OriginID: " << getReturnedOriginID() << ")\n";
+ }
+};
+
+class FactManager {
+public:
+ llvm::ArrayRef<Fact *> getFacts(const CFGBlock *B) const {
+ auto It = BlockToFactsMap.find(B);
+ if (It != BlockToFactsMap.end())
+ return It->second;
+ return {};
+ }
+
+ void addFact(const CFGBlock *B, Fact *NewFact) {
+ BlockToFactsMap[B].push_back(NewFact);
+ }
+
+ template <typename FactType, typename... Args>
+ FactType *createFact(Args &&...args) {
+ void *Mem = FactAllocator.Allocate<FactType>();
+ return new (Mem) FactType(std::forward<Args>(args)...);
+ }
+
+ void dump(const CFG &Cfg, AnalysisDeclContext &AC) const {
+ llvm::dbgs() << "==========================================\n";
+ llvm::dbgs() << " Lifetime Analysis Facts:\n";
+ llvm::dbgs() << "==========================================\n";
+ if (const Decl *D = AC.getDecl()) {
+ if (const auto *ND = dyn_cast<NamedDecl>(D))
+ llvm::dbgs() << "Function: " << ND->getQualifiedNameAsString() << "\n";
+ }
+ // Print blocks in the order as they appear in code for a stable ordering.
+ ForwardDataflowWorklist worklist(Cfg, AC);
+ for (const CFGBlock *B : Cfg.const_nodes())
+ worklist.enqueueBlock(B);
+ while (const CFGBlock *B = worklist.dequeue()) {
+ llvm::dbgs() << " Block B" << B->getBlockID() << ":\n";
+ auto It = BlockToFactsMap.find(B);
+ if (It != BlockToFactsMap.end()) {
+ for (const Fact *F : It->second) {
+ llvm::dbgs() << " ";
+ F->dump(llvm::dbgs());
+ }
+ }
+ llvm::dbgs() << " End of Block\n";
+ }
+ }
+
+private:
+ llvm::DenseMap<const clang::CFGBlock *, llvm::SmallVector<Fact *>>
+ BlockToFactsMap;
+ llvm::BumpPtrAllocator FactAllocator;
+};
+
+struct FactsContext {
+ FactManager Facts;
+ LoanManager Loans;
+ OriginManager Origins;
+};
+
+class FactGenerator : public ConstStmtVisitor<FactGenerator> {
+
+public:
+ FactGenerator(const CFG &Cfg, FactsContext &FactsCtx, AnalysisDeclContext &AC)
+ : FactsCtx(FactsCtx), Cfg(Cfg), AC(AC), CurrentBlock(nullptr) {}
+
+ void run() {
+ llvm::TimeTraceScope TimeProfile("Fact Generation");
+ // Iterate through the CFG blocks in pre-order to ensure that
+ // initializations and destructions are processed in the correct sequence.
+ // TODO: A pre-order traversal utility should be provided by Dataflow
+ // framework.
+ ForwardDataflowWorklist Worklist(Cfg, AC);
+ for (const CFGBlock *B : Cfg.const_nodes())
+ Worklist.enqueueBlock(B);
+ while (const CFGBlock *Block = Worklist.dequeue()) {
+ CurrentBlock = Block;
+ for (unsigned I = 0; I < Block->size(); ++I) {
+ const CFGElement &Element = Block->Elements[I];
+ CurrentPoint = Point(Block, I);
+ if (std::optional<CFGStmt> CS = Element.getAs<CFGStmt>())
+ Visit(CS->getStmt());
+ else if (std::optional<CFGAutomaticObjDtor> DtorOpt =
+ Element.getAs<CFGAutomaticObjDtor>())
+ handleDestructor(*DtorOpt);
+ }
+ }
+ }
+
+ void VisitDeclStmt(const DeclStmt *DS) {
+ for (const Decl *D : DS->decls()) {
+ if (const auto *VD = dyn_cast<VarDecl>(D)) {
+ if (hasOrigin(VD->getType())) {
+ if (const Expr *InitExpr = VD->getInit()) {
+ OriginID DestOID = FactsCtx.Origins.getOrCreate(VD);
+ OriginID SrcOID = FactsCtx.Origins.getOrCreate(InitExpr);
+ FactsCtx.Facts.addFact(CurrentBlock,
+ FactsCtx.Facts.createFact<AssignOriginFact>(
+ DestOID, SrcOID, CurrentPoint));
+ }
+ }
+ }
+ }
+ }
+
+ void VisitImplicitCastExpr(const ImplicitCastExpr *ICE) {
+ if (!hasOrigin(ICE->getType()))
+ return;
+ // An ImplicitCastExpr node itself gets an origin, which flows from the
+ // origin of its sub-expression (after stripping its own parens/casts).
+ if (ICE->getCastKind() == CK_LValueToRValue) {
+ OriginID IceOID = FactsCtx.Origins.getOrCreate(ICE);
+ OriginID SubExprOID = FactsCtx.Origins.getOrCreate(ICE->getSubExpr());
+ FactsCtx.Facts.addFact(CurrentBlock,
+ FactsCtx.Facts.createFact<AssignOriginFact>(
+ IceOID, SubExprOID, CurrentPoint));
+ }
+ }
+
+ void VisitUnaryOperator(const UnaryOperator *UO) {
+ if (UO->getOpcode() == UO_AddrOf) {
+ const Expr *SubExpr = UO->getSubExpr();
+ if (const auto *DRE = dyn_cast<DeclRefExpr>(SubExpr)) {
+ if (const auto *VD = dyn_cast<VarDecl>(DRE->getDecl())) {
+ // Check if it's a local variable.
+ if (VD->hasLocalStorage()) {
+ OriginID OID = FactsCtx.Origins.getOrCreate(UO);
+ Path AddrOfLocalVarPath(VD, Path::Kind::StackVariable);
+ LoanInfo &Loan = FactsCtx.Loans.addLoanInfo(AddrOfLocalVarPath,
+ UO->getOperatorLoc());
+ FactsCtx.Facts.addFact(CurrentBlock,
+ FactsCtx.Facts.createFact<IssueFact>(
+ Loan.ID, OID, CurrentPoint));
+ }
+ }
+ }
+ }
+ }
+
+ void VisitReturnStmt(const ReturnStmt *RS) {
+ if (const Expr *RetExpr = RS->getRetValue()) {
+ if (hasOrigin(RetExpr->getType())) {
+ OriginID OID = FactsCtx.Origins.getOrCreate(RetExpr);
+ FactsCtx.Facts.addFact(
+ CurrentBlock,
+ FactsCtx.Facts.createFact<ReturnOfOriginFact>(OID, CurrentPoint));
+ }
+ }
+ }
+
+ void VisitBinaryOperator(const BinaryOperator *BO) {
+ if (BO->isAssignmentOp()) {
+ const Expr *LHSExpr = BO->getLHS();
+ const Expr *RHSExpr = BO->getRHS();
+
+ // We are interested in assignments like `ptr1 = ptr2` or `ptr = &var`
+ // LHS must be a pointer/reference type that can be an origin.
+ // RHS must also represent an origin (either another pointer/ref or an
+ // address-of).
+ if (const auto *DRE_LHS = dyn_cast<DeclRefExpr>(LHSExpr)) {
+ if (const auto *VD_LHS =
+ dyn_cast<ValueDecl>(DRE_LHS->getDecl()->getCanonicalDecl());
+ VD_LHS && hasOrigin(VD_LHS->getType())) {
+ OriginID DestOID = FactsCtx.Origins.getOrCreate(VD_LHS);
+ OriginID SrcOID = FactsCtx.Origins.getOrCreate(RHSExpr);
+ FactsCtx.Facts.addFact(CurrentBlock,
+ FactsCtx.Facts.createFact<AssignOriginFact>(
+ DestOID, SrcOID, CurrentPoint));
+ }
+ }
+ }
+ }
+
+private:
+ // Check if a type have an origin.
+ bool hasOrigin(QualType QT) { return QT->isPointerOrReferenceType(); }
+
+ void handleDestructor(const CFGAutomaticObjDtor &DtorOpt) {
+ /// TODO: Also handle trivial destructors (e.g., for `int`
+ // variables) which will never have a CFGAutomaticObjDtor node.
+ /// TODO: Handle loans to temporaries.
+ const VarDecl *DestructedVD = DtorOpt.getVarDecl();
+ if (!DestructedVD)
+ return;
+ // Iterate through all loans to see if any expire.
+ for (const LoanInfo &Loan : FactsCtx.Loans.getLoanInfos()) {
+ const Path &LoanPath = Loan.SourcePath;
+ // Check if the loan is for a stack variable and if that variable
+ // is the one being destructed.
+ if (LoanPath.PathKind == Path::Kind::StackVariable) {
+ if (LoanPath.D == DestructedVD) {
+ FactsCtx.Facts.addFact(
+ CurrentBlock,
+ FactsCtx.Facts.createFact<ExpireFact>(Loan.ID, CurrentPoint));
+ }
+ }
+ }
+ }
+
+ FactsContext &FactsCtx;
+ const CFG &Cfg;
+ AnalysisDeclContext &AC;
+ const CFGBlock *CurrentBlock;
+ Point CurrentPoint;
+};
+
+// ========================================================================= //
+// The Dataflow Lattice
+// ========================================================================= //
+
+// Using LLVM's immutable collections is efficient for dataflow analysis
+// as it avoids deep copies during state transitions.
+// TODO(opt): Consider using a bitset to represent the set of loans.
+using LoanSet = llvm::ImmutableSet<LoanID>;
+using OriginLoanMap = llvm::ImmutableMap<OriginID, LoanSet>;
+
+/// A context object to hold the factories for immutable collections, ensuring
+/// that all created states share the same underlying memory management.
+struct LifetimeFactory {
+ OriginLoanMap::Factory OriginMapFact;
+ LoanSet::Factory LoanSetFact;
+
+ LoanSet createLoanSet(LoanID LID) {
+ return LoanSetFact.add(LoanSetFact.getEmptySet(), LID);
+ }
+};
+
+/// LifetimeLattice represents the state of our analysis at a given program
+/// point. It is an immutable object, and all operations produce a new
+/// instance rather than modifying the existing one.
+struct LifetimeLattice {
+ /// The map from an origin to the set of loans it contains.
+ /// TODO(opt): To reduce the lattice size, propagate origins of declarations,
+ /// not expressions, because expressions are not visible across blocks.
+ OriginLoanMap Origins = OriginLoanMap(nullptr);
+
+ explicit LifetimeLattice(const OriginLoanMap &S) : Origins(S) {}
+ LifetimeLattice() = default;
+
+ bool operator==(const LifetimeLattice &Other) const {
+ return Origins == Other.Origins;
+ }
+ bool operator!=(const LifetimeLattice &Other) const {
+ return !(*this == Other);
+ }
+
+ LoanSet getLoans(OriginID OID, LifetimeFactory &Factory) const {
+ if (auto *Loans = Origins.lookup(OID))
+ return *Loans;
+ return Factory.LoanSetFact.getEmptySet();
+ }
+
+ /// Computes the union of two lattices by performing a key-wise merge of
+ /// their OriginLoanMaps.
+ LifetimeLattice merge(const LifetimeLattice &Other,
+ LifetimeFactory &Factory) const {
+ /// Merge the smaller map into the larger one ensuring we iterate over the
+ /// smaller map.
+ if (Origins.getHeight() < Other.Origins.getHeight())
+ return Other.merge(*this, Factory);
+
+ OriginLoanMap MergedState = Origins;
+ // For each origin in the other map, union its loan set with ours.
+ for (const auto &Entry : Other.Origins) {
+ OriginID OID = Entry.first;
+ LoanSet OtherLoanSet = Entry.second;
+ MergedState = Factory.OriginMapFact.add(
+ MergedState, OID,
+ merge(getLoans(OID, Factory), OtherLoanSet, Factory));
+ }
+ return LifetimeLattice(MergedState);
+ }
+
+ LoanSet merge(LoanSet a, LoanSet b, LifetimeFactory &Factory) const {
+ /// Merge the smaller set into the larger one ensuring we iterate over the
+ /// smaller set.
+ if (a.getHeight() < b.getHeight())
+ std::swap(a, b);
+ LoanSet Result = a;
+ for (LoanID LID : b) {
+ /// TODO(opt): Profiling shows that this loop is a major performance
+ /// bottleneck. Investigate using a BitVector to represent the set of
+ /// loans for improved merge performance.
+ Result = Factory.LoanSetFact.add(Result, LID);
+ }
+ return Result;
+ }
+
+ void dump(llvm::raw_ostream &OS) const {
+ OS << "LifetimeLattice State:\n";
+ if (Origins.isEmpty())
+ OS << " <empty>\n";
+ for (const auto &Entry : Origi...
[truncated]
|
@llvm/pr-subscribers-clang Author: Utkarsh Saxena (usx95) ChangesThis patch introduces the initial implementation of the intra-procedural, flow-sensitive lifetime analysis for Clang, as proposed in the recent RFC: https://discourse.llvm.org/t/rfc-intra-procedural-lifetime-analysis-in-clang/86291 The primary goal of this initial submission is to establish the core dataflow framework and gather feedback on the overall design, fact representation, and testing strategy. The focus is on the dataflow mechanism itself rather than exhaustively covering all C++ AST edge cases, which will be addressed in subsequent patches. Key Components
Next Steps(Not covered in this PR but planned for subsequent patches) The following functionality is planned for the upcoming patches to build upon this foundation and make the analysis usable in practice:
<details> Performance on pathological test cases: </summary> The pathological case arise when we have struct MyObj {
int id;
~MyObj() {}
};
void long_cycle(bool condition) {
MyObj v1{1};
MyObj v2{1};
MyObj v3{1};
MyObj v4{1};
MyObj* p1 = &v1;
MyObj* p2 = &v2;
MyObj* p3 = &v3;
MyObj* p4 = &v4;
while (condition) {
MyObj* temp = p1;
p1 = p2;
p2 = p3;
p3 = p4;
p4 = temp;
}
} </details> Patch is 41.22 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/142313.diff 5 Files Affected:
diff --git a/clang/include/clang/Analysis/Analyses/LifetimeSafety.h b/clang/include/clang/Analysis/Analyses/LifetimeSafety.h
new file mode 100644
index 0000000000000..daf24fff72b9b
--- /dev/null
+++ b/clang/include/clang/Analysis/Analyses/LifetimeSafety.h
@@ -0,0 +1,13 @@
+#ifndef LLVM_CLANG_ANALYSIS_ANALYSES_LIFETIME_SAFETY_H
+#define LLVM_CLANG_ANALYSIS_ANALYSES_LIFETIME_SAFETY_H
+#include "clang/AST/DeclBase.h"
+#include "clang/Analysis/AnalysisDeclContext.h"
+#include "clang/Analysis/CFG.h"
+namespace clang {
+
+void runLifetimeAnalysis(const DeclContext &DC, const CFG &Cfg,
+ AnalysisDeclContext &AC);
+
+} // namespace clang
+
+#endif // LLVM_CLANG_ANALYSIS_ANALYSES_LIFETIME_SAFETY_H
diff --git a/clang/lib/Analysis/CMakeLists.txt b/clang/lib/Analysis/CMakeLists.txt
index 8cd3990db4c3e..0523d92480cb3 100644
--- a/clang/lib/Analysis/CMakeLists.txt
+++ b/clang/lib/Analysis/CMakeLists.txt
@@ -21,6 +21,7 @@ add_clang_library(clangAnalysis
FixitUtil.cpp
IntervalPartition.cpp
IssueHash.cpp
+ LifetimeSafety.cpp
LiveVariables.cpp
MacroExpansionContext.cpp
ObjCNoReturn.cpp
diff --git a/clang/lib/Analysis/LifetimeSafety.cpp b/clang/lib/Analysis/LifetimeSafety.cpp
new file mode 100644
index 0000000000000..8dbb4dc37026c
--- /dev/null
+++ b/clang/lib/Analysis/LifetimeSafety.cpp
@@ -0,0 +1,728 @@
+#include "clang/Analysis/Analyses/LifetimeSafety.h"
+#include "clang/AST/Decl.h"
+#include "clang/AST/Expr.h"
+#include "clang/AST/StmtVisitor.h"
+#include "clang/AST/Type.h"
+#include "clang/Analysis/AnalysisDeclContext.h"
+#include "clang/Analysis/CFG.h"
+#include "clang/Analysis/FlowSensitive/DataflowWorklist.h"
+#include "llvm/ADT/ImmutableMap.h"
+#include "llvm/ADT/ImmutableSet.h"
+#include "llvm/ADT/PointerUnion.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/TimeProfiler.h"
+#include <vector>
+
+namespace clang {
+namespace {
+
+struct Point {
+ const clang::CFGBlock *Block;
+ /// Index into Block->Elements().
+ unsigned ElementIndex;
+
+ Point(const clang::CFGBlock *B = nullptr, unsigned Idx = 0)
+ : Block(B), ElementIndex(Idx) {}
+
+ bool operator==(const Point &Other) const {
+ return Block == Other.Block && ElementIndex == Other.ElementIndex;
+ }
+};
+
+/// Represents the storage location being borrowed, e.g., a specific stack
+/// variable.
+/// TODO: Handle member accesseslike `s.y`.
+struct Path {
+ const clang::ValueDecl *D;
+
+ enum class Kind : uint8_t {
+ StackVariable,
+ Heap, // TODO: Handle.
+ Field, // TODO: Handle.
+ ArrayElement, // TODO: Handle.
+ TemporaryObject, // TODO: Handle.
+ StaticOrGlobal, // TODO: Handle.
+ };
+
+ Kind PathKind;
+
+ Path(const clang::ValueDecl *D, Kind K) : D(D), PathKind(K) {}
+};
+
+using LoanID = uint32_t;
+using OriginID = uint32_t;
+
+/// Information about a single borrow, or "Loan". A loan is created when a
+/// reference or pointer is taken.
+struct LoanInfo {
+ /// TODO: Represent opaque loans.
+ /// TODO: Represent nullptr: loans to no path. Accessing it UB! Currently it
+ /// is represented as empty LoanSet
+ LoanID ID;
+ Path SourcePath;
+ SourceLocation IssueLoc;
+
+ LoanInfo(LoanID id, Path path, SourceLocation loc)
+ : ID(id), SourcePath(path), IssueLoc(loc) {}
+};
+
+enum class OriginKind : uint8_t { Variable, ExpressionResult };
+
+/// An Origin is a symbolic identifier that represents the set of possible
+/// loans a pointer-like object could hold at any given time.
+/// TODO: Also represent Origins of complex types (fields, inner types).
+struct OriginInfo {
+ OriginID ID;
+ OriginKind Kind;
+ union {
+ const clang::ValueDecl *Decl;
+ const clang::Expr *Expression;
+ };
+ OriginInfo(OriginID id, OriginKind kind, const clang::ValueDecl *D)
+ : ID(id), Kind(kind), Decl(D) {}
+ OriginInfo(OriginID id, OriginKind kind, const clang::Expr *E)
+ : ID(id), Kind(kind), Expression(E) {}
+};
+
+class LoanManager {
+public:
+ LoanManager() = default;
+
+ LoanInfo &addLoanInfo(Path path, SourceLocation loc) {
+ NextLoanIDVal++;
+ AllLoans.emplace_back(NextLoanIDVal, path, loc);
+ return AllLoans.back();
+ }
+
+ const LoanInfo *getLoanInfo(LoanID id) const {
+ if (id < AllLoans.size())
+ return &AllLoans[id];
+ return nullptr;
+ }
+ llvm::ArrayRef<LoanInfo> getLoanInfos() const { return AllLoans; }
+
+private:
+ LoanID NextLoanIDVal = 0;
+ llvm::SmallVector<LoanInfo> AllLoans;
+};
+
+class OriginManager {
+public:
+ OriginManager() = default;
+
+ OriginID getNextOriginID() { return NextOriginIDVal++; }
+ OriginInfo &addOriginInfo(OriginID id, const clang::ValueDecl *D) {
+ assert(D != nullptr);
+ AllOrigins.emplace_back(id, OriginKind::Variable, D);
+ return AllOrigins.back();
+ }
+ OriginInfo &addOriginInfo(OriginID id, const clang::Expr *E) {
+ assert(E != nullptr);
+ AllOrigins.emplace_back(id, OriginKind::ExpressionResult, E);
+ return AllOrigins.back();
+ }
+
+ OriginID getOrCreate(const Expr *E) {
+ auto It = ExprToOriginID.find(E);
+ if (It != ExprToOriginID.end())
+ return It->second;
+
+ if (const auto *DRE = dyn_cast<DeclRefExpr>(E)) {
+ // Origin of DeclRefExpr is that of the declaration it refers to.
+ return getOrCreate(DRE->getDecl());
+ }
+ OriginID NewID = getNextOriginID();
+ addOriginInfo(NewID, E);
+ ExprToOriginID[E] = NewID;
+ return NewID;
+ }
+
+ const OriginInfo *getOriginInfo(OriginID id) const {
+ if (id < AllOrigins.size())
+ return &AllOrigins[id];
+ return nullptr;
+ }
+
+ llvm::ArrayRef<OriginInfo> getOriginInfos() const { return AllOrigins; }
+
+ OriginID getOrCreate(const ValueDecl *D) {
+ auto It = DeclToOriginID.find(D);
+ if (It != DeclToOriginID.end())
+ return It->second;
+ OriginID NewID = getNextOriginID();
+ addOriginInfo(NewID, D);
+ DeclToOriginID[D] = NewID;
+ return NewID;
+ }
+
+private:
+ OriginID NextOriginIDVal = 0;
+ llvm::SmallVector<OriginInfo> AllOrigins;
+ llvm::DenseMap<const clang::ValueDecl *, OriginID> DeclToOriginID;
+ llvm::DenseMap<const clang::Expr *, OriginID> ExprToOriginID;
+};
+
+/// An abstract base class for a single, atomic lifetime-relevant event.
+class Fact {
+
+public:
+ enum class Kind : uint8_t {
+ /// A new loan is issued from a borrow expression (e.g., &x).
+ Issue,
+ /// A loan expires as its underlying storage is freed (e.g., variable goes
+ /// out of scope).
+ Expire,
+ /// An origin is propagated from a source to a destination (e.g., p = q).
+ AssignOrigin,
+ /// An origin is part of a function's return value.
+ ReturnOfOrigin
+ };
+
+private:
+ Kind K;
+
+protected:
+ Point P;
+ Fact(Kind K, Point Pt) : K(K), P(Pt) {}
+
+public:
+ virtual ~Fact() = default;
+ Kind getKind() const { return K; }
+ Point getPoint() const { return P; }
+
+ template <typename T> const T *getAs() const {
+ if (T::classof(this))
+ return static_cast<const T *>(this);
+ return nullptr;
+ }
+
+ virtual void dump(llvm::raw_ostream &OS) const {
+ OS << "Fact (Kind: " << static_cast<int>(K) << ", Point: B"
+ << P.Block->getBlockID() << ":" << P.ElementIndex << ")\n";
+ }
+};
+
+class IssueFact : public Fact {
+ LoanID LID;
+ OriginID OID;
+
+public:
+ IssueFact(LoanID LID, OriginID OID, Point Pt)
+ : Fact(Kind::Issue, Pt), LID(LID), OID(OID) {}
+ LoanID getLoanID() const { return LID; }
+ OriginID getOriginID() const { return OID; }
+ static bool classof(const Fact *F) { return F->getKind() == Kind::Issue; }
+ void dump(llvm::raw_ostream &OS) const override {
+ OS << "Issue (LoanID: " << getLoanID() << ", OriginID: " << getOriginID()
+ << ")\n";
+ }
+};
+
+class ExpireFact : public Fact {
+ LoanID LID;
+
+public:
+ ExpireFact(LoanID LID, Point Pt) : Fact(Kind::Expire, Pt), LID(LID) {}
+ LoanID getLoanID() const { return LID; }
+ static bool classof(const Fact *F) { return F->getKind() == Kind::Expire; }
+ void dump(llvm::raw_ostream &OS) const override {
+ OS << "Expire (LoanID: " << getLoanID() << ")\n";
+ }
+};
+
+class AssignOriginFact : public Fact {
+ OriginID OIDDest;
+ OriginID OIDSrc;
+
+public:
+ AssignOriginFact(OriginID OIDDest, OriginID OIDSrc, Point Pt)
+ : Fact(Kind::AssignOrigin, Pt), OIDDest(OIDDest), OIDSrc(OIDSrc) {}
+ OriginID getDestOriginID() const { return OIDDest; }
+ OriginID getSrcOriginID() const { return OIDSrc; }
+ static bool classof(const Fact *F) {
+ return F->getKind() == Kind::AssignOrigin;
+ }
+ void dump(llvm::raw_ostream &OS) const override {
+ OS << "AssignOrigin (DestID: " << getDestOriginID()
+ << ", SrcID: " << getSrcOriginID() << ")\n";
+ }
+};
+
+class ReturnOfOriginFact : public Fact {
+ OriginID OID;
+
+public:
+ ReturnOfOriginFact(OriginID OID, Point Pt)
+ : Fact(Kind::ReturnOfOrigin, Pt), OID(OID) {}
+ OriginID getReturnedOriginID() const { return OID; }
+ static bool classof(const Fact *F) {
+ return F->getKind() == Kind::ReturnOfOrigin;
+ }
+ void dump(llvm::raw_ostream &OS) const override {
+ OS << "ReturnOfOrigin (OriginID: " << getReturnedOriginID() << ")\n";
+ }
+};
+
+class FactManager {
+public:
+ llvm::ArrayRef<Fact *> getFacts(const CFGBlock *B) const {
+ auto It = BlockToFactsMap.find(B);
+ if (It != BlockToFactsMap.end())
+ return It->second;
+ return {};
+ }
+
+ void addFact(const CFGBlock *B, Fact *NewFact) {
+ BlockToFactsMap[B].push_back(NewFact);
+ }
+
+ template <typename FactType, typename... Args>
+ FactType *createFact(Args &&...args) {
+ void *Mem = FactAllocator.Allocate<FactType>();
+ return new (Mem) FactType(std::forward<Args>(args)...);
+ }
+
+ void dump(const CFG &Cfg, AnalysisDeclContext &AC) const {
+ llvm::dbgs() << "==========================================\n";
+ llvm::dbgs() << " Lifetime Analysis Facts:\n";
+ llvm::dbgs() << "==========================================\n";
+ if (const Decl *D = AC.getDecl()) {
+ if (const auto *ND = dyn_cast<NamedDecl>(D))
+ llvm::dbgs() << "Function: " << ND->getQualifiedNameAsString() << "\n";
+ }
+ // Print blocks in the order as they appear in code for a stable ordering.
+ ForwardDataflowWorklist worklist(Cfg, AC);
+ for (const CFGBlock *B : Cfg.const_nodes())
+ worklist.enqueueBlock(B);
+ while (const CFGBlock *B = worklist.dequeue()) {
+ llvm::dbgs() << " Block B" << B->getBlockID() << ":\n";
+ auto It = BlockToFactsMap.find(B);
+ if (It != BlockToFactsMap.end()) {
+ for (const Fact *F : It->second) {
+ llvm::dbgs() << " ";
+ F->dump(llvm::dbgs());
+ }
+ }
+ llvm::dbgs() << " End of Block\n";
+ }
+ }
+
+private:
+ llvm::DenseMap<const clang::CFGBlock *, llvm::SmallVector<Fact *>>
+ BlockToFactsMap;
+ llvm::BumpPtrAllocator FactAllocator;
+};
+
+struct FactsContext {
+ FactManager Facts;
+ LoanManager Loans;
+ OriginManager Origins;
+};
+
+class FactGenerator : public ConstStmtVisitor<FactGenerator> {
+
+public:
+ FactGenerator(const CFG &Cfg, FactsContext &FactsCtx, AnalysisDeclContext &AC)
+ : FactsCtx(FactsCtx), Cfg(Cfg), AC(AC), CurrentBlock(nullptr) {}
+
+ void run() {
+ llvm::TimeTraceScope TimeProfile("Fact Generation");
+ // Iterate through the CFG blocks in pre-order to ensure that
+ // initializations and destructions are processed in the correct sequence.
+ // TODO: A pre-order traversal utility should be provided by Dataflow
+ // framework.
+ ForwardDataflowWorklist Worklist(Cfg, AC);
+ for (const CFGBlock *B : Cfg.const_nodes())
+ Worklist.enqueueBlock(B);
+ while (const CFGBlock *Block = Worklist.dequeue()) {
+ CurrentBlock = Block;
+ for (unsigned I = 0; I < Block->size(); ++I) {
+ const CFGElement &Element = Block->Elements[I];
+ CurrentPoint = Point(Block, I);
+ if (std::optional<CFGStmt> CS = Element.getAs<CFGStmt>())
+ Visit(CS->getStmt());
+ else if (std::optional<CFGAutomaticObjDtor> DtorOpt =
+ Element.getAs<CFGAutomaticObjDtor>())
+ handleDestructor(*DtorOpt);
+ }
+ }
+ }
+
+ void VisitDeclStmt(const DeclStmt *DS) {
+ for (const Decl *D : DS->decls()) {
+ if (const auto *VD = dyn_cast<VarDecl>(D)) {
+ if (hasOrigin(VD->getType())) {
+ if (const Expr *InitExpr = VD->getInit()) {
+ OriginID DestOID = FactsCtx.Origins.getOrCreate(VD);
+ OriginID SrcOID = FactsCtx.Origins.getOrCreate(InitExpr);
+ FactsCtx.Facts.addFact(CurrentBlock,
+ FactsCtx.Facts.createFact<AssignOriginFact>(
+ DestOID, SrcOID, CurrentPoint));
+ }
+ }
+ }
+ }
+ }
+
+ void VisitImplicitCastExpr(const ImplicitCastExpr *ICE) {
+ if (!hasOrigin(ICE->getType()))
+ return;
+ // An ImplicitCastExpr node itself gets an origin, which flows from the
+ // origin of its sub-expression (after stripping its own parens/casts).
+ if (ICE->getCastKind() == CK_LValueToRValue) {
+ OriginID IceOID = FactsCtx.Origins.getOrCreate(ICE);
+ OriginID SubExprOID = FactsCtx.Origins.getOrCreate(ICE->getSubExpr());
+ FactsCtx.Facts.addFact(CurrentBlock,
+ FactsCtx.Facts.createFact<AssignOriginFact>(
+ IceOID, SubExprOID, CurrentPoint));
+ }
+ }
+
+ void VisitUnaryOperator(const UnaryOperator *UO) {
+ if (UO->getOpcode() == UO_AddrOf) {
+ const Expr *SubExpr = UO->getSubExpr();
+ if (const auto *DRE = dyn_cast<DeclRefExpr>(SubExpr)) {
+ if (const auto *VD = dyn_cast<VarDecl>(DRE->getDecl())) {
+ // Check if it's a local variable.
+ if (VD->hasLocalStorage()) {
+ OriginID OID = FactsCtx.Origins.getOrCreate(UO);
+ Path AddrOfLocalVarPath(VD, Path::Kind::StackVariable);
+ LoanInfo &Loan = FactsCtx.Loans.addLoanInfo(AddrOfLocalVarPath,
+ UO->getOperatorLoc());
+ FactsCtx.Facts.addFact(CurrentBlock,
+ FactsCtx.Facts.createFact<IssueFact>(
+ Loan.ID, OID, CurrentPoint));
+ }
+ }
+ }
+ }
+ }
+
+ void VisitReturnStmt(const ReturnStmt *RS) {
+ if (const Expr *RetExpr = RS->getRetValue()) {
+ if (hasOrigin(RetExpr->getType())) {
+ OriginID OID = FactsCtx.Origins.getOrCreate(RetExpr);
+ FactsCtx.Facts.addFact(
+ CurrentBlock,
+ FactsCtx.Facts.createFact<ReturnOfOriginFact>(OID, CurrentPoint));
+ }
+ }
+ }
+
+ void VisitBinaryOperator(const BinaryOperator *BO) {
+ if (BO->isAssignmentOp()) {
+ const Expr *LHSExpr = BO->getLHS();
+ const Expr *RHSExpr = BO->getRHS();
+
+ // We are interested in assignments like `ptr1 = ptr2` or `ptr = &var`
+ // LHS must be a pointer/reference type that can be an origin.
+ // RHS must also represent an origin (either another pointer/ref or an
+ // address-of).
+ if (const auto *DRE_LHS = dyn_cast<DeclRefExpr>(LHSExpr)) {
+ if (const auto *VD_LHS =
+ dyn_cast<ValueDecl>(DRE_LHS->getDecl()->getCanonicalDecl());
+ VD_LHS && hasOrigin(VD_LHS->getType())) {
+ OriginID DestOID = FactsCtx.Origins.getOrCreate(VD_LHS);
+ OriginID SrcOID = FactsCtx.Origins.getOrCreate(RHSExpr);
+ FactsCtx.Facts.addFact(CurrentBlock,
+ FactsCtx.Facts.createFact<AssignOriginFact>(
+ DestOID, SrcOID, CurrentPoint));
+ }
+ }
+ }
+ }
+
+private:
+ // Check if a type have an origin.
+ bool hasOrigin(QualType QT) { return QT->isPointerOrReferenceType(); }
+
+ void handleDestructor(const CFGAutomaticObjDtor &DtorOpt) {
+ /// TODO: Also handle trivial destructors (e.g., for `int`
+ // variables) which will never have a CFGAutomaticObjDtor node.
+ /// TODO: Handle loans to temporaries.
+ const VarDecl *DestructedVD = DtorOpt.getVarDecl();
+ if (!DestructedVD)
+ return;
+ // Iterate through all loans to see if any expire.
+ for (const LoanInfo &Loan : FactsCtx.Loans.getLoanInfos()) {
+ const Path &LoanPath = Loan.SourcePath;
+ // Check if the loan is for a stack variable and if that variable
+ // is the one being destructed.
+ if (LoanPath.PathKind == Path::Kind::StackVariable) {
+ if (LoanPath.D == DestructedVD) {
+ FactsCtx.Facts.addFact(
+ CurrentBlock,
+ FactsCtx.Facts.createFact<ExpireFact>(Loan.ID, CurrentPoint));
+ }
+ }
+ }
+ }
+
+ FactsContext &FactsCtx;
+ const CFG &Cfg;
+ AnalysisDeclContext &AC;
+ const CFGBlock *CurrentBlock;
+ Point CurrentPoint;
+};
+
+// ========================================================================= //
+// The Dataflow Lattice
+// ========================================================================= //
+
+// Using LLVM's immutable collections is efficient for dataflow analysis
+// as it avoids deep copies during state transitions.
+// TODO(opt): Consider using a bitset to represent the set of loans.
+using LoanSet = llvm::ImmutableSet<LoanID>;
+using OriginLoanMap = llvm::ImmutableMap<OriginID, LoanSet>;
+
+/// A context object to hold the factories for immutable collections, ensuring
+/// that all created states share the same underlying memory management.
+struct LifetimeFactory {
+ OriginLoanMap::Factory OriginMapFact;
+ LoanSet::Factory LoanSetFact;
+
+ LoanSet createLoanSet(LoanID LID) {
+ return LoanSetFact.add(LoanSetFact.getEmptySet(), LID);
+ }
+};
+
+/// LifetimeLattice represents the state of our analysis at a given program
+/// point. It is an immutable object, and all operations produce a new
+/// instance rather than modifying the existing one.
+struct LifetimeLattice {
+ /// The map from an origin to the set of loans it contains.
+ /// TODO(opt): To reduce the lattice size, propagate origins of declarations,
+ /// not expressions, because expressions are not visible across blocks.
+ OriginLoanMap Origins = OriginLoanMap(nullptr);
+
+ explicit LifetimeLattice(const OriginLoanMap &S) : Origins(S) {}
+ LifetimeLattice() = default;
+
+ bool operator==(const LifetimeLattice &Other) const {
+ return Origins == Other.Origins;
+ }
+ bool operator!=(const LifetimeLattice &Other) const {
+ return !(*this == Other);
+ }
+
+ LoanSet getLoans(OriginID OID, LifetimeFactory &Factory) const {
+ if (auto *Loans = Origins.lookup(OID))
+ return *Loans;
+ return Factory.LoanSetFact.getEmptySet();
+ }
+
+ /// Computes the union of two lattices by performing a key-wise merge of
+ /// their OriginLoanMaps.
+ LifetimeLattice merge(const LifetimeLattice &Other,
+ LifetimeFactory &Factory) const {
+ /// Merge the smaller map into the larger one ensuring we iterate over the
+ /// smaller map.
+ if (Origins.getHeight() < Other.Origins.getHeight())
+ return Other.merge(*this, Factory);
+
+ OriginLoanMap MergedState = Origins;
+ // For each origin in the other map, union its loan set with ours.
+ for (const auto &Entry : Other.Origins) {
+ OriginID OID = Entry.first;
+ LoanSet OtherLoanSet = Entry.second;
+ MergedState = Factory.OriginMapFact.add(
+ MergedState, OID,
+ merge(getLoans(OID, Factory), OtherLoanSet, Factory));
+ }
+ return LifetimeLattice(MergedState);
+ }
+
+ LoanSet merge(LoanSet a, LoanSet b, LifetimeFactory &Factory) const {
+ /// Merge the smaller set into the larger one ensuring we iterate over the
+ /// smaller set.
+ if (a.getHeight() < b.getHeight())
+ std::swap(a, b);
+ LoanSet Result = a;
+ for (LoanID LID : b) {
+ /// TODO(opt): Profiling shows that this loop is a major performance
+ /// bottleneck. Investigate using a BitVector to represent the set of
+ /// loans for improved merge performance.
+ Result = Factory.LoanSetFact.add(Result, LID);
+ }
+ return Result;
+ }
+
+ void dump(llvm::raw_ostream &OS) const {
+ OS << "LifetimeLattice State:\n";
+ if (Origins.isEmpty())
+ OS << " <empty>\n";
+ for (const auto &Entry : Origi...
[truncated]
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a very rudimentary first pass, will take a second look a bit later.
@@ -2842,6 +2844,12 @@ void clang::sema::AnalysisBasedWarnings::IssueWarnings( | |||
} | |||
} | |||
|
|||
DEBUG_WITH_TYPE( | |||
"ExperimentalLifetimeAnalysis", if (S.getLangOpts().CPlusPlus) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is restricting this to C++ a temporary or is it not planned to support other languages like C?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is temporary to initially scope it to C++. Added a TODO to enable it later.
namespace clang { | ||
namespace { | ||
|
||
struct Point { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nice to add a comment to specify how to interpret this point. Is this the program point before or after the element?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I wonder if Point
is self-descriptive enough or whether this should be ProgramPoint
. Does not have a strong feeling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed Point
at the moment. I think this can be added later when this actually looks useful. At the moment it is primarily mapping Facts to CFGStmt which is not used.
struct Path { | ||
const clang::ValueDecl *D; | ||
|
||
enum class Kind : uint8_t { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes we do not know where does the storage location live. Do we plan to have Unknown
kinds?
Also, sometimes we might learn more about a location during analysis. E.g., in a branch if (p == q)
, the two pointers must have the same Kind. Or if we see delete p;
we learned that p
must be a heap location.
But I am wondering what is this information used for? If we use it to derive lifetime facts, knowing that a variable is on the stack might not be sufficient. We might want to know the exact scope information, see the difference between p
and q
:
{ int p; }
int q;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we plan to have Unknown kinds?
Yes. I have a TODO
to represent Opaque
loans in LoanInfo
where the AccessPath
is not known.
But I am wondering what is this information used for?
The Path is primarily used to generate 2 facts: IssueLoan and Expireloan. When we see a destructor/lifetime end, we find the loans with paths which just got destroyed. These loans should not expire. Each of the entities like stack variable, temporaries, fields (of stack variables) would need to be handled slightly differently and this is only where this PathKind would be used.
Also, sometimes we might learn more about a location during analysis. E.g., in a branch if (p == q), the two pointers must have the same Kind. Or if we see delete p; we learned that p must be a heap location.
I do not have plans to modify the loans and facts based on the control flow or the path. I would prefer to not consider path-sensitive information to keep the analysis simple. Once the facts are generated, the dataflow would not generate new facts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each of the entities like stack variable, temporaries, fields (of stack variables) would need to be handled slightly differently and this is only where this PathKind would be used.
I am not yet convinced that we will actually need this, but this may be because we only have one kind implemented so far:
if (LoanPath.PathKind == AccessPath::Kind::StackVariable) {
if (LoanPath.D == DestructedVD) {
Here, I'd imagine when we issue the ExpireLoan, we know exactly what kind of entity is expiring. The AST/CFG should tell us if it was a stack variable, a heap location, or a temporary. After that, I'd expect that the AccessPath
should have all the information we need to check what Loans should expire here (e.g., the fields of an expired loan should also expire).
}; | ||
OriginInfo(OriginID id, OriginKind kind, const clang::ValueDecl *D) | ||
: ID(id), Kind(kind), Decl(D) {} | ||
OriginInfo(OriginID id, OriginKind kind, const clang::Expr *E) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to create a Variable OriginKind with an Expr? If not, I think the ctor should set the right OriginKind and not leave this to the caller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
private: | ||
LoanID NextLoanIDVal = 0; | ||
llvm::SmallVector<LoanInfo> AllLoans; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not important to address now, but I wonder whether small buffer optimisations are beneficial here. I can imagine that most functions might end up generating enough loans for us to heap allocate. But we can measure this later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SG. Added a TODO to reconsider this during performance evaluations.
|
||
private: | ||
OriginID NextOriginIDVal = 0; | ||
llvm::SmallVector<OriginInfo> AllOrigins; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same concerns about the usefulness of small buffer optimisation here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a todo to profile and reconsider later.
|
||
OriginID getNextOriginID() { return NextOriginIDVal++; } | ||
OriginInfo &addOriginInfo(OriginID id, const clang::ValueDecl *D) { | ||
assert(D != nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we take a reference instead if it cannot be null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Dataflow.run(); | ||
DEBUG_WITH_TYPE("LifetimeDataflow", Dataflow.dump()); | ||
} | ||
} // namespace clang |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: missing new line at the end of the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I don't have any major comments on the design, I have mostly some nits inline.
namespace clang { | ||
namespace { | ||
|
||
struct Point { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I wonder if Point
is self-descriptive enough or whether this should be ProgramPoint
. Does not have a strong feeling.
/// Index into Block->Elements(). | ||
unsigned ElementIndex; | ||
|
||
Point(const clang::CFGBlock *B = nullptr, unsigned Idx = 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: do we need this ctor? Alternatively, we could just have some default member initializers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added default member init and removed ctor.
Point(const clang::CFGBlock *B = nullptr, unsigned Idx = 0) | ||
: Block(B), ElementIndex(Idx) {} | ||
|
||
bool operator==(const Point &Other) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Point is fairly small, we could take it by value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(obsolete)
|
||
class FactManager { | ||
public: | ||
llvm::ArrayRef<Fact *> getFacts(const CFGBlock *B) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could facts change after they are added? Should we always have const Fact *
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point. No they cannot change. Done.
return; | ||
// An ImplicitCastExpr node itself gets an origin, which flows from the | ||
// origin of its sub-expression (after stripping its own parens/casts). | ||
if (ICE->getCastKind() == CK_LValueToRValue) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a test so I can see how this works?
int a;
int *p = &a;
int **pp = &p;
int *q = *pp;
bool hasOrigin(QualType QT) { return QT->isPointerOrReferenceType(); } | ||
|
||
void handleDestructor(const CFGAutomaticObjDtor &DtorOpt) { | ||
/// TODO: Also handle trivial destructors (e.g., for `int` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you looked at clang::CFG::BuildOptions::AddLifetime
? That should generate CFGElement::LifetimeEnds
for primitive types too.
struct LifetimeLattice { | ||
/// The map from an origin to the set of loans it contains. | ||
/// TODO(opt): To reduce the lattice size, propagate origins of declarations, | ||
/// not expressions, because expressions are not visible across blocks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because expressions are not visible across blocks
This is not entirely true, e.g., f(cond ? *p : *q)
; Here, *p
and *q
are not declarations but they are visible across blocks. That being said, there are a limited number of ways this can happen, so we still can make the optimization suggested here. It is just a bit more elaborate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to take a look at how this is handled in the dataflow framework. Martin Braenne worked this out with care -- it's not trivial.
|
||
OriginLoanMap MergedState = Origins; | ||
// For each origin in the other map, union its loan set with ours. | ||
for (const auto &Entry : Other.Origins) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I wonder if the merging could be more efficient if it was implemented in ImmutableMap
proper. Not for this PR but a potential optimization in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I completely agree. I highly suspect the current merge performance will become a bottleneck for meeting default enablement standards. Merging is by far our most expensive operation.
A merge inside ImmutableMap would be the right place for this optimization. However, it can't be implemented efficiently with the current llvm::ImmutableMap
, which is backed by an AVL tree. The issue is that an AVL tree's structure is heavily dependent on the history of insertions and deletions due to self-balancing rotations. Two maps with many shared keys can have completely different tree shapes, making it impossible to identify and reuse identical subtrees.
For an efficient merge, a persistent HAMT (Hash Array Mapped Trie) would be the ideal backing data structure. An HAMT's structure is determined by its keys' hashes, not insertion order. This property allows a merge algorithm to find identical subtrees and share them, performing work proportional only to the O(difference)
between the two maps. In our use case, this difference is expected to be small: roughly the size of the branch being merged (as compared to the size of the function).
I will mention this and leave it for a future PR.
@ymand was this the data structure you were referring to last time we talked ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what is the relationship between HAMT and Patricia Trees, but the latter one is frequently used in dataflow analysis for its great merge performance.
There is one implementation here with a reference to the original paper: https://github.com/facebook/SPARTA/blob/42d4ccd2e5e0ea31ee7969578a685e1d0407c206/include/sparta/PatriciaTreeCore.h#L35
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Both of them are essentially tries, Patricia is a compressed binary trie while HAMT has a wide branching factor so should be subtly comparable. I can imagine both of them to support this set-difference-based merging algorithm. Will keep in mind for later. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of Union-Find, but sounds like Patricia Tries are the way to go (I remember really liking this data structure when I first learned about it my first year in college! :))
Transferer Xfer; | ||
|
||
/// Stores the merged analysis state at the entry of each CFG block. | ||
llvm::DenseMap<const CFGBlock *, LifetimeLattice> BlockEntryStates; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we save lookups if we had something akin to llvm::DenseMap<const CFGBlock *, std::pair<LifetimeLattice, LifetimeLattice>>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have one question though. I sort of expected this to require a liveness analysis. Is this something we will need down the line? Or is that not required for polonius?
Good point. This is actually going to be a part of the error reporting. And yes, polonius does have a notion of live origins/loans.
I prefer approach 1 as this more directly works on the invalid memory as compared to origins carrying invalid loans to a potential use. |
This patch is already big enough. But I think a small comment of how reporting will work down the line might be useful. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looks very good. I would recommend you consider splitting this into multiple smaller PRs, so reviews can focus on a specific aspect. E.g. the FactGenerator and the LifetimeDataflow seems to deserve focused reviews.
/// Represents the storage location being borrowed, e.g., a specific stack | ||
/// variable. | ||
/// TODO: Handle member accesseslike `s.y`. | ||
struct Path { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like Gabor's point above, maybe a more specific name, like AccessPath?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
using LoanID = uint32_t; | ||
using OriginID = uint32_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From experience, it's better to use a proper data type for IDs. Using integers directly almost always causes pain later (say if you need to change the underlying type, or if you want to prevent arithmetic, etc.).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
: ID(id), SourcePath(path), IssueLoc(loc) {} | ||
}; | ||
|
||
enum class OriginKind : uint8_t { Variable, ExpressionResult }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since a variable (reference) is an expression, this is bit surprising. Is the Variable
case referring to the variable declaration? If so, maybe make that expression (like Declaration vs Expression instead)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aah yes. This might be confusing. Yes we are referring to the Declaration here.
OriginKind Kind; | ||
union { | ||
const clang::ValueDecl *Decl; | ||
const clang::Expr *Expression; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you consider using a std::variant
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aah thanks. I think I missed this one. Switched to the llvm::PointerUnion
return AllLoans.back(); | ||
} | ||
|
||
const LoanInfo *getLoanInfo(LoanID id) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any situation in which you expect it valid to provide an id
that's not in AllLoans
? If not, consider returning a reference instead and CHECK failing on bad ids.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
if (hasOrigin(VD->getType())) { | ||
if (const Expr *InitExpr = VD->getInit()) { | ||
OriginID DestOID = FactsCtx.Origins.getOrCreate(VD); | ||
OriginID SrcOID = FactsCtx.Origins.getOrCreate(InitExpr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we expect that InitExpr
has already been visited at this point? Put another way, why the flexible getOrCreate
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Good point. I think check-failing on non-existing source originID would help us catch the parts of unhandled AST. Added.
struct LifetimeLattice { | ||
/// The map from an origin to the set of loans it contains. | ||
/// TODO(opt): To reduce the lattice size, propagate origins of declarations, | ||
/// not expressions, because expressions are not visible across blocks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to take a look at how this is handled in the dataflow framework. Martin Braenne worked this out with care -- it's not trivial.
|
||
/// Computes the union of two lattices by performing a key-wise merge of | ||
/// their OriginLoanMaps. | ||
LifetimeLattice merge(const LifetimeLattice &Other, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: since this is a dataflow computation, I would stick w/ standard terminology (assuming it fits): join
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
LifetimeLattice NewSuccEntryState = | ||
OldSuccEntryState.merge(ExitState, LifetimeFact); | ||
if (SuccIt == BlockEntryStates.end() || | ||
NewSuccEntryState != OldSuccEntryState) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performance optimization (potentially for a separate PR): If this equality operation isn't cheap, then consider changing the merge
operation to indicate whether it produced any changes. This is often cheaper to compute as part of merging (if a little more of a nuisance).
/// Orchestrates the analysis by iterating over the CFG using a worklist | ||
/// algorithm. It computes a fixed point by propagating the LifetimeLattice | ||
/// state through each block until the state no longer changes. | ||
/// TODO: Maybe use the dataflow framework! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in the current form, where you use CFGBlocks to drive the computation, using the framework would make sense. But, if you're attached to the current algorithm (comparing at the before state rather than the after state), you'll need to implement it as an alternative (or modify the current one), as we discussed in our last meeting.
|
||
LoanInfo &addLoanInfo(Path path, SourceLocation loc) { | ||
NextLoanIDVal++; | ||
AllLoans.emplace_back(NextLoanIDVal, path, loc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it the case that this starts at LoanId 1 instead of 0 (because the NextLoanIDVal++
happens first)?
Does that affect the getLoanInfo() indexing?
// Origin of DeclRefExpr is that of the declaration it refers to. | ||
return getOrCreate(DRE->getDecl()); | ||
} | ||
OriginID NewID = getNextOriginID(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar question -- would this start at 1 instead of 0, and does that affect getOriginInfo(id) indexing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed few comments.
/// Index into Block->Elements(). | ||
unsigned ElementIndex; | ||
|
||
Point(const clang::CFGBlock *B = nullptr, unsigned Idx = 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added default member init and removed ctor.
namespace clang { | ||
namespace { | ||
|
||
struct Point { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed Point
at the moment. I think this can be added later when this actually looks useful. At the moment it is primarily mapping Facts to CFGStmt which is not used.
}; | ||
OriginInfo(OriginID id, OriginKind kind, const clang::ValueDecl *D) | ||
: ID(id), Kind(kind), Decl(D) {} | ||
OriginInfo(OriginID id, OriginKind kind, const clang::Expr *E) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -2842,6 +2844,12 @@ void clang::sema::AnalysisBasedWarnings::IssueWarnings( | |||
} | |||
} | |||
|
|||
DEBUG_WITH_TYPE( | |||
"ExperimentalLifetimeAnalysis", if (S.getLangOpts().CPlusPlus) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is temporary to initially scope it to C++. Added a TODO to enable it later.
Dataflow.run(); | ||
DEBUG_WITH_TYPE("LifetimeDataflow", Dataflow.dump()); | ||
} | ||
} // namespace clang |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
void addFact(const CFGBlock *B, Fact *NewFact) { | ||
BlockToFactsMap[B].push_back(NewFact); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Done.
|
||
struct FactsContext { | ||
FactManager Facts; | ||
LoanManager Loans; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Moved LoanManager and OriginManager into FactManager.
// TODO: A pre-order traversal utility should be provided by Dataflow | ||
// framework. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I fell for "Reverse postordering is not the same as preordering". I meant reverse post-order which gives a topological order.
What I expect is for the dataflow framework or the CFG to provide this iteration order utility. Right now I have to use a worklist to get this effect which involves queuing all the nodes followed by deqeueing all nodes.
if (hasOrigin(VD->getType())) { | ||
if (const Expr *InitExpr = VD->getInit()) { | ||
OriginID DestOID = FactsCtx.Origins.getOrCreate(VD); | ||
OriginID SrcOID = FactsCtx.Origins.getOrCreate(InitExpr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Good point. I think check-failing on non-existing source originID would help us catch the parts of unhandled AST. Added.
struct LoanTag {}; | ||
struct OriginTag {}; | ||
|
||
using LoanID = ID<LoanTag>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could even do using LoanID = ID<struct LoanTag>;
, so no separate definition is required for these tags.
public: | ||
OriginManager() = default; | ||
|
||
OriginID getNextOriginID() { return ++NextOriginID; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this and the below APIs to be public? I'd expect users to only rely on the higher level get
and getOrCreate
methods.
} | ||
|
||
void VisitCXXNullPtrLiteralExpr(const CXXNullPtrLiteralExpr *N) { | ||
/// TODO: Handle nullptr expr as a special 'null' loan. Uninintialed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: Uninintialed -> Uninitialized
struct Path { | ||
const clang::ValueDecl *D; | ||
|
||
enum class Kind : uint8_t { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each of the entities like stack variable, temporaries, fields (of stack variables) would need to be handled slightly differently and this is only where this PathKind would be used.
I am not yet convinced that we will actually need this, but this may be because we only have one kind implemented so far:
if (LoanPath.PathKind == AccessPath::Kind::StackVariable) {
if (LoanPath.D == DestructedVD) {
Here, I'd imagine when we issue the ExpireLoan, we know exactly what kind of entity is expiring. The AST/CFG should tell us if it was a stack variable, a heap location, or a temporary. After that, I'd expect that the AccessPath
should have all the information we need to check what Loans should expire here (e.g., the fields of an expired loan should also expire).
// TODO: A pre-order traversal utility should be provided by Dataflow | ||
// framework. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could do that via: AnalysisDeclContext::getAnalysis<PostOrderCFGView>()
. But maybe it would be worth to have a convenience method added to the worklist implementation.
This patch introduces the initial implementation of the intra-procedural, flow-sensitive lifetime analysis for Clang, as proposed in the recent RFC: https://discourse.llvm.org/t/rfc-intra-procedural-lifetime-analysis-in-clang/86291
The primary goal of this initial submission is to establish the core dataflow framework and gather feedback on the overall design, fact representation, and testing strategy. The focus is on the dataflow mechanism itself rather than exhaustively covering all C++ AST edge cases, which will be addressed in subsequent patches.
Key Components
Loan
,Origin
, andPath
to model memory borrows and the lifetime of pointers.Origin
to the set ofLoans
it may contain at any given program point.llvm-lit
tests validate the analysis by checking the generated facts and final dataflow state.Next Steps
(Not covered in this PR but planned for subsequent patches)
The following functionality is planned for the upcoming patches to build upon this foundation and make the analysis usable in practice:
[[clang::lifetimebound]]
annotations and by conservatively handling opaque/un-annotated functions.Performance on pathological test cases:
The pathological case arise when we have
N
origins initially holdingN
loans and we have a cyclic assignment of these origins in a loop. The fixed point reaches afterN
iterations when all the origins contain all the loans.For
N
= 4, the test case would like like: