Skip to content

[clang][DebugInfo] Don't mark structured bindings as artificial #100355

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Aug 9, 2024

Conversation

Michael137
Copy link
Member

@Michael137 Michael137 commented Jul 24, 2024

This patch is motivated by the debug-info issue in #48909. Clang is currently emitting the DW_AT_artificial attribute on debug-info entries for structured bindings whereas GCC does not. GCC's interpretation of the DWARF spec is more user-friendly in this regard, so we would like to do the same in Clang. CGDebugInfo uses isImplicit to decide which variables to mark artificial (e.g., ImplicitParamDecls, compiler-generated variables, etc.). But for the purposes of debug-info, when we say "artificial", what we really mean in many cases is: "not explicitly spelled out in source". VarDecls that back tuple-like bindings are technically compiler-generated, but that is a confusing notion for debug-info, since these bindings are spelled out in source. The documentation for isImplicit does to some extent imply that implicit variables aren't written in source.

This patch adds another condition to deciding whether a VarDecl should be marked artificial. Specifically, don't treat structured bindings as artificial.

Main alternatives considered

  1. Don't use isImplicit in CGDebugInfo when determining whether to add DW_AT_artificial. Instead use some other property of the AST that would tell us whether a node was explicitly spelled out in source or not
  • Considered using SourceRange or SourceLocation to tell us this, but didn't find a good way to, e.g., correctly identify that the implicit this parameter wasn't spelled out in source (as opposed to an unnamed parameter in a function declaration)
  1. We could've also added a bit to VarDeclBitFields that indicates that a VarDecl is a holding var, but the use-case didn't feel like good enough justification for this
  2. Don't set the VarDecl introduced as part of a tuple-like decomposition as implicit.
  • This may affect AST matching/traversal and this specific use-case wasn't enough to justify such a change

Fixes #48909

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Jul 24, 2024
@llvmbot
Copy link
Member

llvmbot commented Jul 24, 2024

@llvm/pr-subscribers-debuginfo

@llvm/pr-subscribers-clang

Author: Michael Buch (Michael137)

Changes

This patch is motivated by the debug-info issue in #48909. Clang is currently emitting the DW_AT_artificial attribute on debug-info entries for structured bindings whereas GCC does not. GCC's interpretation of the DWARF spec is more user-friendly in this regard, so we would like to do the same in Clang. CGDebugInfo uses isImplicit to decide which variables to mark artificial (e.g., ImplicitParamDecls, compiler-generated variables, etc.). But for the purposes of debug-info, when we say "artificial", what we really mean in many cases is: "not explicitly spelled out in source". VarDecls that back tuple-like bindings are technically compiler-generated, but that is a confusing notion for debug-info, since these bindings are spelled out in source. The documentation for isImplicit does to some extent imply that implicit variables aren't written in source.

This patch avoids marking the variables introduced as part of a decomposition as artificial, resolving the debug-info confusion.

Affects on AST matchers/traversal

  1. Matchers/visitors that previously ignored these VarDecls using TK_IgnoreUnlessSpelledInSource now wouldn't (this is tested in the newly added unit-test)
  2. In MatchASTVisitor::TraverseDecl, the children of BindingDecls aren't visited anyway in TK_IgnoreUnlessSpelledInSource mode, so not having implicit VarDecls won't affect visitation of the BindingDecls they back. This is already tested in the ASTTraverser unit-tests.

Main alternatives considered

  1. Don't use isImplicit in CGDebugInfo when determining whether to add DW_AT_artificial. Instead use some other property of the AST that would tell us whether a node was explicitly spelled out in source or not
    • Considered using SourceRange or SourceLocation to tell us this, but didn't find a good way to, e.g., correctly identify that the implicit this parameter wasn't spelled out in source (as opposed to an unnamed parameter in a function declaration)
  2. In CGDebugInfo, don't mark VarDecls as artificial if they are the holding vars for a BindingDecl
    • We could detect that a VarDecl is actually a tuple-like decomposition using something like the following (modulo error handling/null-checks):
auto * RefExpr = llvm::cast<DeclRefExpr>(Decl->getInit()->IgnoreUnlessSpelledInSource());
bool IsDecomposition = llvm::isa<DecompositionDecl>(RefExpr->getDecl());

Though this felt like we'd be over-relying on the structure of the AST (but maybe that's stable enough?). Alternatively we could've also added a bit to VarDeclBitFields that indicates that a VarDecl is a holding var, but the use-case didn't feel like good enough justification for this.

Fixes #48909


Full diff: https://github.com/llvm/llvm-project/pull/100355.diff

4 Files Affected:

  • (modified) clang/lib/AST/DeclCXX.cpp (-1)
  • (modified) clang/lib/Sema/SemaDeclCXX.cpp (-1)
  • (modified) clang/test/CodeGenCXX/debug-info-structured-binding.cpp (+31-5)
  • (modified) clang/unittests/ASTMatchers/ASTMatchersTraversalTest.cpp (+57)
diff --git a/clang/lib/AST/DeclCXX.cpp b/clang/lib/AST/DeclCXX.cpp
index 72d68f39a97a5..2c22f42a5e0a8 100644
--- a/clang/lib/AST/DeclCXX.cpp
+++ b/clang/lib/AST/DeclCXX.cpp
@@ -3327,7 +3327,6 @@ VarDecl *BindingDecl::getHoldingVar() const {
     return nullptr;
 
   auto *VD = cast<VarDecl>(DRE->getDecl());
-  assert(VD->isImplicit() && "holding var for binding decl not implicit");
   return VD;
 }
 
diff --git a/clang/lib/Sema/SemaDeclCXX.cpp b/clang/lib/Sema/SemaDeclCXX.cpp
index f24912cde275a..f4cc1fc4583aa 100644
--- a/clang/lib/Sema/SemaDeclCXX.cpp
+++ b/clang/lib/Sema/SemaDeclCXX.cpp
@@ -1316,7 +1316,6 @@ static bool checkTupleLikeDecomposition(Sema &S,
         S.Context.getTrivialTypeSourceInfo(T, Loc), Src->getStorageClass());
     RefVD->setLexicalDeclContext(Src->getLexicalDeclContext());
     RefVD->setTSCSpec(Src->getTSCSpec());
-    RefVD->setImplicit();
     if (Src->isInlineSpecified())
       RefVD->setInlineSpecified();
     RefVD->getLexicalDeclContext()->addHiddenDecl(RefVD);
diff --git a/clang/test/CodeGenCXX/debug-info-structured-binding.cpp b/clang/test/CodeGenCXX/debug-info-structured-binding.cpp
index c88a5cdaa20e7..4a3126a36598d 100644
--- a/clang/test/CodeGenCXX/debug-info-structured-binding.cpp
+++ b/clang/test/CodeGenCXX/debug-info-structured-binding.cpp
@@ -5,20 +5,46 @@
 // CHECK: #dbg_declare(ptr %{{[0-9]+}}, ![[VAR_2:[0-9]+]], !DIExpression(DW_OP_plus_uconst, 4),
 // CHECK: #dbg_declare(ptr %{{[0-9]+}}, ![[VAR_3:[0-9]+]], !DIExpression(DW_OP_deref),
 // CHECK: #dbg_declare(ptr %{{[0-9]+}}, ![[VAR_4:[0-9]+]], !DIExpression(DW_OP_deref, DW_OP_plus_uconst, 4),
+// CHECK: #dbg_declare(ptr %z1, ![[VAR_5:[0-9]+]], !DIExpression()
+// CHECK: #dbg_declare(ptr %z2, ![[VAR_6:[0-9]+]], !DIExpression()
 // CHECK: ![[VAR_0]] = !DILocalVariable(name: "a"
-// CHECK: ![[VAR_1]] = !DILocalVariable(name: "x1"
-// CHECK: ![[VAR_2]] = !DILocalVariable(name: "y1"
-// CHECK: ![[VAR_3]] = !DILocalVariable(name: "x2"
-// CHECK: ![[VAR_4]] = !DILocalVariable(name: "y2"
+// CHECK: ![[VAR_1]] = !DILocalVariable(name: "x1", scope: !{{[0-9]+}}, file: !{{[0-9]+}}, line: {{[0-9]+}}, type: !{{[0-9]+}})
+// CHECK: ![[VAR_2]] = !DILocalVariable(name: "y1", scope: !{{[0-9]+}}, file: !{{[0-9]+}}, line: {{[0-9]+}}, type: !{{[0-9]+}})
+// CHECK: ![[VAR_3]] = !DILocalVariable(name: "x2", scope: !{{[0-9]+}}, file: !{{[0-9]+}}, line: {{[0-9]+}}, type: !{{[0-9]+}})
+// CHECK: ![[VAR_4]] = !DILocalVariable(name: "y2", scope: !{{[0-9]+}}, file: !{{[0-9]+}}, line: {{[0-9]+}}, type: !{{[0-9]+}})
+// CHECK: ![[VAR_5]] = !DILocalVariable(name: "z1", scope: !{{[0-9]+}}, file: !{{[0-9]+}}, line: {{[0-9]+}}, type: !{{[0-9]+}})
+// CHECK: ![[VAR_6]] = !DILocalVariable(name: "z2", scope: !{{[0-9]+}}, file: !{{[0-9]+}}, line: {{[0-9]+}}, type: !{{[0-9]+}})
 
 struct A {
   int x;
   int y;
 };
 
+struct B {
+  int w;
+  int z;
+  template<int> int get();
+  template<> int get<0>() { return w; }
+  template<> int get<1>() { return z; }
+};
+
+namespace std {
+template<typename T> struct tuple_size {
+};
+template<>
+struct tuple_size<B> {
+    static constexpr int value = 2;
+};
+
+template<int, typename T> struct tuple_element {};
+template<> struct tuple_element<0, B> { using type = int; };
+template<> struct tuple_element<1, B> { using type = int; };
+}
+
 int f() {
   A a{10, 20};
   auto [x1, y1] = a;
   auto &[x2, y2] = a;
-  return x1 + y1 + x2 + y2;
+  auto [z1, z2] = B{1, 2};
+  return x1 + y1 + x2 + y2 + z1 + z2;
 }
diff --git a/clang/unittests/ASTMatchers/ASTMatchersTraversalTest.cpp b/clang/unittests/ASTMatchers/ASTMatchersTraversalTest.cpp
index 47a71134d5027..06ac07cb12509 100644
--- a/clang/unittests/ASTMatchers/ASTMatchersTraversalTest.cpp
+++ b/clang/unittests/ASTMatchers/ASTMatchersTraversalTest.cpp
@@ -3163,6 +3163,63 @@ void foo()
         matchesConditionally(Code, traverse(TK_IgnoreUnlessSpelledInSource, M),
                              true, {"-std=c++17"}));
   }
+
+  Code = R"cpp(
+struct Pair
+{
+    int x, y;
+};
+
+// Note: these utilities are required to force binding to tuple like structure
+namespace std
+{
+    template <typename E>
+    struct tuple_size
+    {
+    };
+
+    template <>
+    struct tuple_size<Pair>
+    {
+        static constexpr unsigned value = 2;
+    };
+
+    template <unsigned I, class T>
+    struct tuple_element
+    {
+        using type = int;
+    };
+
+};
+
+template <unsigned I>
+int &&get(Pair &&p);
+
+void decompTuple()
+{
+    Pair p{1, 2};
+    auto [a, b] = p;
+
+    a = 3;
+}
+  )cpp";
+
+  {
+    auto M = bindingDecl(hasName("a"));
+    EXPECT_TRUE(
+        matchesConditionally(Code, traverse(TK_AsIs, M), true, {"-std=c++17"}));
+    EXPECT_TRUE(
+        matchesConditionally(Code, traverse(TK_IgnoreUnlessSpelledInSource, M),
+                             true, {"-std=c++17"}));
+  }
+  {
+    auto M = varDecl(hasName("a"));
+    EXPECT_TRUE(
+        matchesConditionally(Code, traverse(TK_AsIs, M), true, {"-std=c++17"}));
+    EXPECT_TRUE(
+        matchesConditionally(Code, traverse(TK_IgnoreUnlessSpelledInSource, M),
+                             true, {"-std=c++17"}));
+  }
 }
 
 TEST(Traversal, traverseNoImplicit) {

@cor3ntin
Copy link
Contributor

Thanks for the patch!

I would actually prefer option 2 here.

isImplicit has other uses - namely for AST matchers and refactoring tools that also rely on knowing whether something was spelled in source.

Note that I cannot comment on what make sense for debugs info, but I don't think modifying properties of the AST is reasonable, and your suggested option 2 is fine.

The structure of the AST is not stable but whoever change is would have to fix the code anyway as it's all the same project so it's a non-issue (as long as you add sufficient test coverage)

@AaronBallman
Copy link
Collaborator

CC @zygoloid for design opinions.

This holding variable is really weird in that it's both not spelled in source (the user doesn't give it a name) but is spelled in source (only exists because of a source construct). I think that makes it hard to know whether it should or should not be marked as implicit.

@shafik
Copy link
Collaborator

shafik commented Jul 24, 2024

The structure of the AST is not stable but whoever change is would have to fix the code anyway as it's all the same project so it's a non-issue (as long as you add sufficient test coverage)

I agree relying on the AST is fine w/ sufficient testing.

@Michael137
Copy link
Member Author

Michael137 commented Jul 24, 2024

Thanks for the patch!

I would actually prefer option 2 here.

isImplicit has other uses - namely for AST matchers and refactoring tools that also rely on knowing whether something was spelled in source.

Note that I cannot comment on what make sense for debugs info, but I don't think modifying properties of the AST is reasonable, and your suggested option 2 is fine.

The structure of the AST is not stable but whoever change is would have to fix the code anyway as it's all the same project so it's a non-issue (as long as you add sufficient test coverage)

Thanks all for the input! Happy to go with option 2 instead. I agree that changing the AST here just for DW_AT_artificial emission does feel a bit drastic/is not obviously the right thing to do. I guess if other consumers of isImplicit bump into the same issue in the future we could consider a more general solution.

@llvmbot llvmbot added clang:codegen IR generation bugs: mangling, exceptions, etc. debuginfo labels Jul 24, 2024
@Michael137 Michael137 changed the title [clang][Sema] Don't mark VarDecls of bindings in tuple-like decompositions as implicit [clang][DebugInfo] Don't emit DW_AT_artificial for bindings of tuple-like decompositions Jul 24, 2024
@Michael137
Copy link
Member Author

Updated the PR and description with the alternative approach

@Michael137 Michael137 changed the title [clang][DebugInfo] Don't emit DW_AT_artificial for bindings of tuple-like decompositions [clang][DebugInfo] Don't mark structured bindings as artificial Jul 24, 2024
@zygoloid
Copy link
Collaborator

It looks like this is working for all other kinds of structured binding because EmitDeclare(BindingDecl*, ...) emits a proper DILocalVariable for them. But it skips BindingDecls that have a holding variable for some reason. Perhaps a cleaner approach would be to make it handle that case too, and leave the holding variables themselves as artificial?

@Michael137
Copy link
Member Author

Michael137 commented Jul 25, 2024

It looks like this is working for all other kinds of structured binding because EmitDeclare(BindingDecl*, ...) emits a proper DILocalVariable for them. But it skips BindingDecls that have a holding variable for some reason. Perhaps a cleaner approach would be to make it handle that case too, and leave the holding variables themselves as artificial?

IIUC the suggestion is to unify the tuple and non-tuple cases in CGDebugInfo::EmitDeclare(BindingDecl*, ...)? I was thinking of doing this, but wasn't quite sure how to best re-architect the codegen for structured bindings to make this happen.

We currently emit debug-info for structured bindings in 2 steps. The DecompositionDecl itself gets codegen'd first. Which is what calls into CGDebugInfo::EmitDeclare(BindingDecl*, ...). This creates a debug-info entry for the unnamed decomposition pair. But then CodeGenFunction::EmitDecl calls EmitVarDecl for each holding var. This then goes through various layers of codegen until we hit CGDebugInfo::EmitDeclare(VarDecl*,...). Hence we bail out early in the case of tuple-like bindings in step 1, and defer to CGDebugInfo::EmitDeclare(VarDecl*) to emit debug-info for each individual holding var.

AFAICT, at some point in the codegen process we need to bail out early in the case of structured bindings and we have two options for doing so:

  1. Detect the fact that we're dealing with holding vars before calling CGDebugInfo::EmitDeclare(VarDecl*) . And just defer emission all the debug-info for a decomposition to CGDebugInfo::EmitDeclare(BindingDecl*, ...). I'm actually not sure this would buy us much. One way or another we still need to emit a DILocalVariable for each holding var. Might be missing something here.
  2. OR, detect that we're emitting holding vars in CGDebugInfo::EmitDeclare(BindingDecl*, ...), bail out, and emit the debug-info for them in CGDebugInfo::EmitDeclare(VarDecl*, ...). Which is what the current patch is doing.

Despite having this weird 2-step emission for structured bindings, option 2 seems a bit more natural to me (also I'm not sure option 1 would change anything).

@Michael137
Copy link
Member Author

ping

Copy link
Collaborator

@adrian-prantl adrian-prantl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this patch makes structured bindings not artificial, so they aren't hidden in debuggers by default. That sounds right to me.

@Michael137 Michael137 merged commit 310a9f3 into llvm:main Aug 9, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category debuginfo
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DW_AT_artificial for structured bindings is incorrectly set
7 participants