Skip to content

(New) Add option to generate additional debug info for expression dereferencing pointer to pointers #95298

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 15, 2024

Conversation

huangjd
Copy link
Contributor

@huangjd huangjd commented Jun 12, 2024

This is a different implementation to #94100, which has been reverted.

When -fdebug-info-for-profiling is specified, for any Load expression if the pointer operand is not a declared variable, clang will emit debug info describing the type of the pointer operand (which can be an intermediate expr)

…eferencing pointer to pointers

This is a different implementation to llvm#94100, which has been reverted.

When -fdebug-info-for-profiling is specified, for any Load expression if
the pointer operand is not a declared variable, clang will emit debug
info describing the type of the pointer operand (which can be an
intermediate expr)
@huangjd huangjd added clang:codegen IR generation bugs: mangling, exceptions, etc. debuginfo labels Jun 12, 2024
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Jun 12, 2024
@llvmbot
Copy link
Member

llvmbot commented Jun 12, 2024

@llvm/pr-subscribers-clang
@llvm/pr-subscribers-debuginfo

@llvm/pr-subscribers-clang-codegen

Author: William Junda Huang (huangjd)

Changes

This is a different implementation to #94100, which has been reverted.

When -fdebug-info-for-profiling is specified, for any Load expression if the pointer operand is not a declared variable, clang will emit debug info describing the type of the pointer operand (which can be an intermediate expr)


Full diff: https://github.com/llvm/llvm-project/pull/95298.diff

4 Files Affected:

  • (modified) clang/lib/CodeGen/CGDebugInfo.cpp (+51)
  • (modified) clang/lib/CodeGen/CGDebugInfo.h (+6)
  • (modified) clang/lib/CodeGen/CGExprScalar.cpp (+20-1)
  • (added) clang/test/CodeGenCXX/debug-info-ptr-to-ptr.cpp (+115)
diff --git a/clang/lib/CodeGen/CGDebugInfo.cpp b/clang/lib/CodeGen/CGDebugInfo.cpp
index 99e12da0081af..d3b754d14b19e 100644
--- a/clang/lib/CodeGen/CGDebugInfo.cpp
+++ b/clang/lib/CodeGen/CGDebugInfo.cpp
@@ -5746,6 +5746,57 @@ void CGDebugInfo::EmitExternalVariable(llvm::GlobalVariable *Var,
   Var->addDebugInfo(GVE);
 }
 
+void CGDebugInfo::EmitPseudoVariable(CGBuilderTy &Builder,
+                                     llvm::Instruction *Value, QualType Ty) {
+  // Only when -g2 or above is specified, debug info for variables will be
+  // generated.
+  if (CGM.getCodeGenOpts().getDebugInfo() <=
+      llvm::codegenoptions::DebugLineTablesOnly)
+    return;
+
+  llvm::DILocation *DIL = Value->getDebugLoc().get();
+  if (!DIL)
+    return;
+
+  llvm::DIFile *Unit = DIL->getFile();
+  llvm::DIType *Type = getOrCreateType(Ty, Unit);
+
+  // Check if Value is already a declared variable and has debug info, in this
+  // case we have nothing to do. Clang emits declared variable as alloca, and
+  // it is loaded upon use, so we identify such pattern here.
+  if (llvm::LoadInst *Load = dyn_cast<llvm::LoadInst>(Value)) {
+    llvm::Value *Var = Load->getPointerOperand();
+    // There can be implicit type cast applied on a variable if it is an opaque
+    // ptr, in this case its debug info may not match the actual type of object
+    // being used as in the next instruction, so we will need to emit a pseudo
+    // variable for type-casted value.
+    auto DeclareTypeMatches = [&](auto *DbgDeclare) {
+      return DbgDeclare->getVariable()->getType() == Type;
+    };
+    if (any_of(llvm::findDbgDeclares(Var), DeclareTypeMatches) ||
+        any_of(llvm::findDVRDeclares(Var), DeclareTypeMatches))
+      return;
+  }
+
+  // Find the correct location to insert the debug value.
+  llvm::BasicBlock *InsertBB = Value->getParent();
+  llvm::Instruction *InsertBefore = Value->getIterator()->getNextNode();
+  if (llvm::InvokeInst *Invoke = dyn_cast<llvm::InvokeInst>(Value)) {
+    InsertBB = Invoke->getNormalDest();
+    InsertBefore = InsertBB->size() > 0 ? &(InsertBB->front()) : nullptr;
+  }
+
+  llvm::DILocalVariable *D = DBuilder.createAutoVariable(
+      LexicalBlockStack.back(), "", nullptr, 0, Type, false,
+      llvm::DINode::FlagArtificial);
+  if (InsertBefore)
+    DBuilder.insertDbgValueIntrinsic(Value, D, DBuilder.createExpression(), DIL,
+                                   InsertBefore);
+  else
+    DBuilder.insertDbgValueIntrinsic(Value, D, DBuilder.createExpression(), DIL,
+                                     InsertBB);
+}
+
 void CGDebugInfo::EmitGlobalAlias(const llvm::GlobalValue *GV,
                                   const GlobalDecl GD) {
 
diff --git a/clang/lib/CodeGen/CGDebugInfo.h b/clang/lib/CodeGen/CGDebugInfo.h
index 8fe738be21568..da466837aa3c3 100644
--- a/clang/lib/CodeGen/CGDebugInfo.h
+++ b/clang/lib/CodeGen/CGDebugInfo.h
@@ -530,6 +530,12 @@ class CGDebugInfo {
   /// Emit information about an external variable.
   void EmitExternalVariable(llvm::GlobalVariable *GV, const VarDecl *Decl);
 
+  /// Emit a pseudo variable and debug info for an intermediate value if it does
+  /// not correspond to a variable in the source code, so that a profiler can
+  /// track more accurate usage of certain instructions of interest.
+  void EmitPseudoVariable(CGBuilderTy &Builder, llvm::Instruction *Value,
+                          QualType Ty);
+
   /// Emit information about global variable alias.
   void EmitGlobalAlias(const llvm::GlobalValue *GV, const GlobalDecl Decl);
 
diff --git a/clang/lib/CodeGen/CGExprScalar.cpp b/clang/lib/CodeGen/CGExprScalar.cpp
index 1b144c178ce96..58f0a3113b4f8 100644
--- a/clang/lib/CodeGen/CGExprScalar.cpp
+++ b/clang/lib/CodeGen/CGExprScalar.cpp
@@ -1937,7 +1937,26 @@ Value *ScalarExprEmitter::VisitMemberExpr(MemberExpr *E) {
     }
   }
 
-  return EmitLoadOfLValue(E);
+  llvm::Value *Result = EmitLoadOfLValue(E);
+
+  // If -fdebug-info-for-profiling is specified, emit a pseudo variable and its
+  // debug info for the pointer, even if there is no variable associated with
+  // the pointer's expression.
+  if (CGF.CGM.getCodeGenOpts().DebugInfoForProfiling && CGF.getDebugInfo()) {
+    if (llvm::LoadInst *Load = dyn_cast<llvm::LoadInst>(Result)) {
+      if (llvm::GetElementPtrInst *GEP =
+              dyn_cast<llvm::GetElementPtrInst>(Load->getPointerOperand())) {
+        if (llvm::Instruction *Pointer =
+                dyn_cast<llvm::Instruction>(GEP->getPointerOperand())) {
+          QualType Ty = E->getBase()->getType();
+          if (!E->isArrow())
+            Ty = CGF.getContext().getPointerType(Ty);
+          CGF.getDebugInfo()->EmitPseudoVariable(Builder, Pointer, Ty);
+        }
+      }
+    }
+  }
+  return Result;
 }
 
 Value *ScalarExprEmitter::VisitArraySubscriptExpr(ArraySubscriptExpr *E) {
diff --git a/clang/test/CodeGenCXX/debug-info-ptr-to-ptr.cpp b/clang/test/CodeGenCXX/debug-info-ptr-to-ptr.cpp
new file mode 100644
index 0000000000000..51ce0f107b5e3
--- /dev/null
+++ b/clang/test/CodeGenCXX/debug-info-ptr-to-ptr.cpp
@@ -0,0 +1,115 @@
+// Test debug info for intermediate value of a chained pointer deferencing
+// expression when the flag -fdebug-info-for-pointer-type is enabled.
+// RUN: %clang_cc1 -emit-llvm -triple x86_64-linux-gnu %s -fdebug-info-for-profiling -debug-info-kind=constructor -o - | FileCheck %s
+
+class A {
+public:
+  int i;
+  char c;
+  void *p;
+  int arr[3];
+};
+
+class B {
+public:
+  A* a;
+};
+
+class C {
+public:
+  B* b;
+  A* a;
+  A arr[10];
+};
+
+// CHECK-LABEL: define dso_local noundef i32 @{{.*}}func1{{.*}}(
+// CHECK:         [[A_ADDR:%.*]] = getelementptr inbounds %class.B, ptr {{%.*}}, i32 0, i32 0, !dbg [[DBG1:![0-9]+]]
+// CHECK-NEXT:    [[A:%.*]] = load ptr, ptr [[A_ADDR]], align {{.*}}, !dbg [[DBG1]]
+// CHECK-NEXT:    call void @llvm.dbg.value(metadata ptr [[A]], metadata [[META1:![0-9]+]], metadata !DIExpression()), !dbg [[DBG1]]
+// CHECK-NEXT:    {{%.*}} = getelementptr inbounds %class.A, ptr [[A]], i32 0, i32 0,
+int func1(B *b) {
+  return b->a->i;
+}
+
+// Should generate a pseudo variable when pointer is type-casted.
+// CHECK-LABEL: define dso_local noundef ptr @{{.*}}func2{{.*}}(
+// CHECK:         call void @llvm.dbg.declare(metadata ptr [[B_ADDR:%.*]], metadata [[META2:![0-9]+]], metadata !DIExpression())
+// CHECK-NEXT:    [[B:%.*]] = load ptr, ptr [[B_ADDR]],
+// CHECK-NEXT:    call void @llvm.dbg.value(metadata ptr [[B]], metadata [[META3:![0-9]+]], metadata !DIExpression())
+// CHECK-NEXT:    {{%.*}} = getelementptr inbounds %class.B, ptr [[B]], i32 0,
+A* func2(void *b) {
+  return ((B*)b)->a;
+}
+
+// Should not generate pseudo variable in this case.
+// CHECK-LABEL: define dso_local noundef i32 @{{.*}}func3{{.*}}(
+// CHECK:    call void @llvm.dbg.declare(metadata ptr [[B_ADDR:%.*]], metadata [[META4:![0-9]+]], metadata !DIExpression())
+// CHECK:    call void @llvm.dbg.declare(metadata ptr [[LOCAL1:%.*]], metadata [[META5:![0-9]+]], metadata !DIExpression())
+// CHECK-NOT: call void @llvm.dbg.value(metadata ptr
+int func3(B *b) {
+  A *local1 = b->a;
+  return local1->i;
+}
+
+// CHECK-LABEL: define dso_local noundef signext i8 @{{.*}}func4{{.*}}(
+// CHECK:         [[A_ADDR:%.*]] = getelementptr inbounds %class.C, ptr {{%.*}}, i32 0, i32 1
+// CHECK-NEXT:    [[A:%.*]] = load ptr, ptr [[A_ADDR]],
+// CHECK-NEXT:    call void @llvm.dbg.value(metadata ptr [[A]], metadata [[META6:![0-9]+]], metadata !DIExpression())
+// CHECK-NEXT:    {{%.*}} = getelementptr inbounds %class.A, ptr [[A]], i32 0, i32 0,
+// CHECK:         [[CALL:%.*]] = call noundef ptr @{{.*}}foo{{.*}}(
+// CHECK-NEXT:    call void @llvm.dbg.value(metadata ptr [[CALL]], metadata [[META6]], metadata !DIExpression())
+// CHECK-NEXT:    [[I1:%.*]] = getelementptr inbounds %class.A, ptr [[CALL]], i32 0, i32 1
+char func4(C *c) {
+  extern A* foo(int x);
+  return foo(c->a->i)->c;
+}
+
+// CHECK-LABEL: define dso_local noundef signext i8 @{{.*}}func5{{.*}}(
+// CHECK:         call void @llvm.dbg.declare(metadata ptr {{%.*}}, metadata [[META7:![0-9]+]], metadata !DIExpression())
+// CHECK:         call void @llvm.dbg.declare(metadata ptr {{%.*}}, metadata [[META8:![0-9]+]], metadata !DIExpression())
+// CHECK:         [[A_ADDR:%.*]] = getelementptr inbounds %class.A, ptr {{%.*}}, i64 {{%.*}},
+// CHECK-NEXT:    call void @llvm.dbg.value(metadata ptr [[A_ADDR]], metadata [[META9:![0-9]+]], metadata !DIExpression())
+// CHECK-NEXT:    {{%.*}} = getelementptr inbounds %class.A, ptr [[A_ADDR]], i32 0, i32 1,
+char func5(void *arr, int n) {
+  return ((A*)arr)[n].c;
+}
+
+// CHECK-LABEL: define dso_local noundef i32 @{{.*}}func6{{.*}}(
+// CHECK:         call void @llvm.dbg.declare(metadata ptr {{%.*}}, metadata [[META10:![0-9]+]], metadata !DIExpression())
+// CHECK:         call void @llvm.dbg.value(metadata ptr {{%.*}}, metadata [[META11:![0-9]+]], metadata !DIExpression())
+int func6(B &b) {
+  return reinterpret_cast<A&>(b).i;
+}
+
+// CHECK-LABEL: define dso_local noundef i32 @{{.*}}global{{.*}}(
+// CHECK:         [[GA:%.*]] = load ptr, ptr @ga
+// CHECK-NEXT:    call void @llvm.dbg.value(metadata ptr [[GA]], metadata [[META12:![0-9]+]], metadata !DIExpression())
+A *ga;
+int global() {
+  return ga->i;
+}
+
+
+// CHECK-DAG: [[META_A:![0-9]+]] = distinct !DICompositeType(tag: DW_TAG_class_type, name: "A",
+// CHECK-DAG: [[META_AP:![0-9]+]] = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: [[META_A]],
+// CHECK-DAG: [[META_B:![0-9]+]] = distinct !DICompositeType(tag: DW_TAG_class_type, name: "B",
+// CHECK-DAG: [[META_BP:![0-9]+]] = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: [[META_B]],
+// CHECK-DAG: [[META_C:![0-9]+]] = distinct !DICompositeType(tag: DW_TAG_class_type, name: "C",
+// CHECK-DAG: [[META_CP:![0-9]+]] = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: [[META_C]],
+// CHECK-DAG: [[META_VP:![0-9]+]] = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: null,
+// CHECK-DAG: [[META_I32:![0-9]+]] = !DIBasicType(name: "int", size: 32,
+// CHECK-DAG: [[META_BR:![0-9]+]] = !DIDerivedType(tag: DW_TAG_reference_type, baseType: [[META_B]],
+
+// CHECK-DAG: [[DBG1]] = !DILocation(line: 31, column: 13,
+// CHECK-DAG: [[META1]] = !DILocalVariable(scope: {{.*}}, type: [[META_AP]], flags: DIFlagArtificial)
+// CHECK-DAG: [[META2]] = !DILocalVariable(name: "b", arg: 1, scope: {{.*}}, file: {{.*}}, line: 40, type: [[META_VP]])
+// CHECK-DAG: [[META3]] = !DILocalVariable(scope: {{.*}}, type: [[META_BP]], flags: DIFlagArtificial)
+// CHECK-DAG: [[META4]] = !DILocalVariable(name: "b", arg: 1, scope: {{.*}}, file: {{.*}}, line: 49, type: [[META_BP]])
+// CHECK-DAG: [[META5]] = !DILocalVariable(name: "local1", scope: {{.*}}, file: {{.*}}, line: 50, type: [[META_AP]])
+// CHECK-DAG: [[META6]] = !DILocalVariable(scope: {{.*}}, type: [[META_AP]], flags: DIFlagArtificial)
+// CHECK-DAG: [[META7]] = !DILocalVariable(name: "arr", arg: 1, scope: {{.*}}, file: {{.*}}, line: 73, type: [[META_VP]])
+// CHECK-DAG: [[META8]] = !DILocalVariable(name: "n", arg: 2, scope: {{.*}}, file: {{.*}}, line: 73, type: [[META_I32]])
+// CHECK-DAG: [[META9]] = !DILocalVariable(scope: {{.*}}, type: [[META_AP]], flags: DIFlagArtificial)
+// CHECK-DAG: [[META10]] = !DILocalVariable(name: "b", arg: 1, scope: {{.*}}, file: {{.*}}, line: 80, type: [[META_BR]])
+// CHECK-DAG: [[META11]] = !DILocalVariable(scope: {{.*}}, type: [[META_AP]], flags: DIFlagArtificial)
+// CHECK-DAG: [[META12]] = !DILocalVariable(scope: {{.*}}, type: [[META_AP]], flags: DIFlagArtificial)

@huangjd
Copy link
Contributor Author

huangjd commented Jun 12, 2024

It seems the restriction for dbg.value appearing in the front end was lifted recently, so now I can just emit dbg.value directly

Copy link

github-actions bot commented Jun 12, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@pogo59
Copy link
Collaborator

pogo59 commented Jun 13, 2024

You should have a look at this imminent patch which will affect your test.

Copy link
Member

@jmorse jmorse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in principle, with a couple of nits.

Comment on lines 5781 to 5787
// Find the correct location to insert the debug value.
llvm::BasicBlock *InsertBB = Value->getParent();
llvm::Instruction *InsertBefore = Value->getIterator()->getNextNode();
if (llvm::InvokeInst *Invoke = dyn_cast<llvm::InvokeInst>(Value)) {
InsertBB = Invoke->getNormalDest();
InsertBefore = InsertBB->size() > 0 ? &(InsertBB->front()) : nullptr;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of the move towards debug-info away from intrinsics involves not inserting with raw instruction pointers, but instead iterators [0]. In practice I believe all that needs to be done is using InsertBB->begin() to fetch an iterator, you'll likely need to take end() and check it below for the empty-block case.

I understand #94224 provides a more graceful solution by allowing insertion at end(), however that won't land for a short while.

[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939

Copy link
Contributor

@SLTozer SLTozer Jun 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add a little extra to this, it looks like what this code is trying to do is the same as llvm::Instruction::getInsertionPointAfterDef - would that be a suitable substitution?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

llvm::DIType *Type = getOrCreateType(Ty, Unit);

// Check if Value is already a declared variable and has debug info, in this
// case we have nothing to do. Clang emits declared variable as alloca, and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// case we have nothing to do. Clang emits declared variable as alloca, and
// case we have nothing to do. Clang emits declared variable as alloca, and
Suggested change
// case we have nothing to do. Clang emits declared variable as alloca, and
// case we have nothing to do. Clang emits declared variables as alloca, and

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@huangjd huangjd requested a review from jmorse June 14, 2024 02:13
Copy link
Member

@jmorse jmorse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for rapidly rewriting this. I think dereferencing the insertion iterator with &** will eventually need addressing with the move away from intrinsics and insertion, but that can be a matter for when the DIBuilder APIs get deprecated / retired.

@SLTozer
Copy link
Contributor

SLTozer commented Jun 14, 2024

Small recommendation that the patch @pogo59 mentioned above has now landed, so before merging this you should rebase and update the test accordingly.

@huangjd huangjd merged commit 675d8d6 into llvm:main Jun 15, 2024
7 checks passed
@dyung
Copy link
Collaborator

dyung commented Jun 15, 2024

@huangjd, I'm not sure the buildbot emails on failure are working, but the test you added seems to be failing on many bots. Can you take a look and revert if you need time to investigate?

huangjd added a commit to huangjd/llvm-project that referenced this pull request Jun 15, 2024
Fix test case in llvm#95298 because another recent submitted patch removed
llvm.dbg intrinsics, updated test case accordingly
@huangjd
Copy link
Contributor Author

huangjd commented Jun 15, 2024

@huangjd, I'm not sure the buildbot emails on failure are working, but the test you added seems to be failing on many bots. Can you take a look and revert if you need time to investigate?

Test updated #95637

jayfoad pushed a commit that referenced this pull request Jun 15, 2024
Fix test case in #95298 because another recent submitted patch removed
llvm.dbg intrinsics, updated test case accordingly
@huangjd huangjd deleted the debuginfoforpointertype branch June 17, 2024 07:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category debuginfo
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants