Skip to content

DW_AT_artificial for structured bindings is incorrectly set #48909

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
llvmbot opened this issue Mar 12, 2021 · 12 comments · Fixed by #100355
Closed

DW_AT_artificial for structured bindings is incorrectly set #48909

llvmbot opened this issue Mar 12, 2021 · 12 comments · Fixed by #100355
Labels
bugzilla Issues migrated from bugzilla c++17 clang:codegen IR generation bugs: mangling, exceptions, etc. debuginfo

Comments

@llvmbot
Copy link
Member

llvmbot commented Mar 12, 2021

Bugzilla Link 49565
Version trunk
OS Linux
Attachments Test case
Reporter LLVM Bugzilla Contributor
CC @dwblaikie,@pogo59,@zygoloid

Extended Description

With structured bindings:

  auto [test_i, test_s] = GetTest();

Clang generates three variables in the DWARF symbols, one unnamed one for the std::pair returned from GetTest(), and then test_i and test_s. But test_i and test_s are marked as DW_AT_artifical even though they were user-declared, and the unnamed one is not, even though it was not user-declared:

0x000048c6:     DW_TAG_variable
                  DW_AT_location	(DW_OP_fbreg -40)
                  DW_AT_decl_file	("/home/brettw/eraseme3.cc")
                  DW_AT_decl_line	(13)
                  DW_AT_type	(0x00002b98 "pair<int, std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>>>")

0x000048d0:     DW_TAG_variable
                  DW_AT_location	(DW_OP_fbreg -48)
                  DW_AT_name	("test_i")
                  DW_AT_type	(0x0000484a "int&&")
                  DW_AT_artificial	(true)

0x000048dc:     DW_TAG_variable
                  DW_AT_location	(DW_OP_fbreg -56)
                  DW_AT_name	("test_s")
                  DW_AT_type	(0x000030db "basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>>&&")
                  DW_AT_artificial	(true)

This is backwards. The unnamed pair should be marked "artificial" while the two named variables in the structured bindings should not.

GCC encodes the artificial tags in way I propose:

0x00005ed0:     DW_TAG_variable
                  DW_AT_type	(0x00003316 "pair<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >")
                  DW_AT_artificial	(true)
                  DW_AT_location	(DW_OP_fbreg -96)

0x00005ed9:     DW_TAG_variable
                  DW_AT_name	("test_i")
                  DW_AT_decl_file	("/home/brettw/eraseme3.cc")
                  DW_AT_decl_line	(13)
                  DW_AT_decl_column	(0x09)
                  DW_AT_type	(0x00005ef8 "type&&")
                  DW_AT_location	(DW_OP_fbreg -40)

0x00005ee8:     DW_TAG_variable
                  DW_AT_name	("test_s")
                  DW_AT_decl_file	("/home/brettw/eraseme3.cc")
                  DW_AT_decl_line	(13)
                  DW_AT_decl_column	(0x11)
                  DW_AT_type	(0x00005efe "type&&")
                  DW_AT_location	(DW_OP_fbreg -48)

Fuchsia clang version 11.0.0 (https://fuchsia.googlesource.com/a/third_party/llvm-project 65f5887)

@dwblaikie
Copy link
Collaborator

Sure, sounds reasonable to me.

Do you have a DWARF Consumer that's using the artificial attribute for anything?

@llvmbot
Copy link
Member Author

llvmbot commented Mar 13, 2021

I just worked around this in the Fuchsia debugger:
https://fuchsia-review.googlesource.com/c/fuchsia/+/500813

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021
@llvmbot
Copy link
Member Author

llvmbot commented Apr 14, 2024

@llvm/issue-subscribers-debuginfo

Author: None (llvmbot)

| | | | --- | --- | | Bugzilla Link | [49565](https://llvm.org/bz49565) | | Version | trunk | | OS | Linux | | Attachments | [Test case](https://user-images.githubusercontent.com/60944935/143761896-ffd25b9a-11e8-4f33-be1d-e13e4a4b03f9.gz) | | Reporter | LLVM Bugzilla Contributor | | CC | @dwblaikie,@pogo59,@zygoloid |

Extended Description

With structured bindings:

  auto [test_i, test_s] = GetTest();

Clang generates three variables in the DWARF symbols, one unnamed one for the std::pair returned from GetTest(), and then test_i and test_s. But test_i and test_s are marked as DW_AT_artifical even though they were user-declared, and the unnamed one is not, even though it was not user-declared:

0x000048c6:     DW_TAG_variable
                  DW_AT_location	(DW_OP_fbreg -40)
                  DW_AT_decl_file	("/home/brettw/eraseme3.cc")
                  DW_AT_decl_line	(13)
                  DW_AT_type	(0x00002b98 "pair&lt;int, std::__2::basic_string&lt;char, std::__2::char_traits&lt;char&gt;, std::__2::allocator&lt;char&gt;&gt;&gt;")

0x000048d0:     DW_TAG_variable
                  DW_AT_location	(DW_OP_fbreg -48)
                  DW_AT_name	("test_i")
                  DW_AT_type	(0x0000484a "int&amp;&amp;")
                  DW_AT_artificial	(true)

0x000048dc:     DW_TAG_variable
                  DW_AT_location	(DW_OP_fbreg -56)
                  DW_AT_name	("test_s")
                  DW_AT_type	(0x000030db "basic_string&lt;char, std::__2::char_traits&lt;char&gt;, std::__2::allocator&lt;char&gt;&gt;&amp;&amp;")
                  DW_AT_artificial	(true)

This is backwards. The unnamed pair should be marked "artificial" while the two named variables in the structured bindings should not.

GCC encodes the artificial tags in way I propose:

0x00005ed0:     DW_TAG_variable
                  DW_AT_type	(0x00003316 "pair&lt;int, std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; &gt;")
                  DW_AT_artificial	(true)
                  DW_AT_location	(DW_OP_fbreg -96)

0x00005ed9:     DW_TAG_variable
                  DW_AT_name	("test_i")
                  DW_AT_decl_file	("/home/brettw/eraseme3.cc")
                  DW_AT_decl_line	(13)
                  DW_AT_decl_column	(0x09)
                  DW_AT_type	(0x00005ef8 "type&amp;&amp;")
                  DW_AT_location	(DW_OP_fbreg -40)

0x00005ee8:     DW_TAG_variable
                  DW_AT_name	("test_s")
                  DW_AT_decl_file	("/home/brettw/eraseme3.cc")
                  DW_AT_decl_line	(13)
                  DW_AT_decl_column	(0x11)
                  DW_AT_type	(0x00005efe "type&amp;&amp;")
                  DW_AT_location	(DW_OP_fbreg -48)

Fuchsia clang version 11.0.0 (https://fuchsia.googlesource.com/a/third_party/llvm-project 65f5887)

@OCHyams
Copy link
Contributor

OCHyams commented Jul 18, 2024

Just a quick comment as this popped up in my notifications recently:

Looks like this still reproduces, except that we don't emit the anonymous aggregate variable: https://godbolt.org/z/Mf734WG6E

Curiously using a struct rather than std::pair doesn't reproduce it: https://godbolt.org/z/G3dzf6KPh

@Michael137
Copy link
Member

Michael137 commented Jul 18, 2024

Just a quick comment as this popped up in my notifications recently:

Looks like this still reproduces, except that we don't emit the anonymous aggregate variable: https://godbolt.org/z/Mf734WG6E

Hmmm that seems like a regression we'd want to fix. In LLDB we got away with the DW_AT_artificial flag being on the BindingDecls by at least displaying the anonymous pair:

(lldb) n                                                                                   
Process 94216 stopped                                                                      
* thread #1, queue = 'com.apple.main-thread', stop reason = step over                      
    frame #0: 0x0000000100003ee4 a.out`main at binding.cpp:7:5                             
   4                                                                                       
   5    int main() {                                                                       
   6        auto [test_x, test_y] = GetTest();                                             
-> 7        return 0;                                                                      
   8    }                                                                                  
   9                                                                                       
Target 0: (a.out) stopped.                                                                 
(lldb) v                                                                                   
(std::pair<int, int>)  = (first = 97, second = 5)                                          

Now we wouldn't show anything (since LLDB by default hides artificial vars in the variable view):

(lldb) n
Process 94542 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
    frame #0: 0x0000000100003ee4 a.out`main at binding.cpp:7:5
   4   
   5    int main() {
   6        auto [test_x, test_y] = GetTest();
-> 7        return 0;
   8    }
   9   
Target 0: (a.out) stopped.
(lldb) v
(lldb) v test_x
(int &&) test_x = 0x000000016fdfe924 (&test_x = 97)

Looks like this was explicitly changed in 2399c77c8593b0ed3bd3988a5b72f8bf6b30b0de (#69681). I don't think we can do that without fixing the DW_AT_artificial placement. Sorry should've mentioned it on said review, but I didn't see it fly by.

@Michael137
Copy link
Member

Michael137 commented Jul 18, 2024

I guess there's an argument to be made that since the user can't reference the unnamed pair anyway, maybe LLDB should be smarter about what it shows in frame var? I'm not sure what this would look like in Xcode, worth a try. If we don't show any variable now for the decomposition, I think we should revert to old behaviour.

Curiously using a struct rather than std::pair doesn't reproduce it: https://godbolt.org/z/G3dzf6KPh

This is because they're being handled in a different codepath. For the std::pair/std::tuple case the EmitDeclare(BindingDecl*...) bails out early. And instead is handled in a dedicated branch in CodeGenFunction::EmitDecl. There we emit the "holding variable" VarDecl for the binding via a regular call to EmitDeclareOfAutoVariable. But because VarDecl::isImplicit is true, we deduce that this is an artificial variable.

if (VD->isImplicit())
Flags |= llvm::DINode::FlagArtificial;

Marking isImplicit variables as artificial seems fine because they are compiler-generated, but ends up being interpreted by LLDB incorrectly for this case since the user did write the binding variables out in source, so they should be treated as regular variables. If we didn't mark the bindings as artificial (which would align with GCC), that'd resolve the LLDB issue. So I think we should look into how we might do that instead of working around it in LLDB.

@EugeneZelenko EugeneZelenko added the clang:codegen IR generation bugs: mangling, exceptions, etc. label Jul 18, 2024
@llvmbot
Copy link
Member Author

llvmbot commented Jul 18, 2024

@llvm/issue-subscribers-clang-codegen

Author: None (llvmbot)

| | | | --- | --- | | Bugzilla Link | [49565](https://llvm.org/bz49565) | | Version | trunk | | OS | Linux | | Attachments | [Test case](https://user-images.githubusercontent.com/60944935/143761896-ffd25b9a-11e8-4f33-be1d-e13e4a4b03f9.gz) | | Reporter | LLVM Bugzilla Contributor | | CC | @dwblaikie,@pogo59,@zygoloid |

Extended Description

With structured bindings:

  auto [test_i, test_s] = GetTest();

Clang generates three variables in the DWARF symbols, one unnamed one for the std::pair returned from GetTest(), and then test_i and test_s. But test_i and test_s are marked as DW_AT_artifical even though they were user-declared, and the unnamed one is not, even though it was not user-declared:

0x000048c6:     DW_TAG_variable
                  DW_AT_location	(DW_OP_fbreg -40)
                  DW_AT_decl_file	("/home/brettw/eraseme3.cc")
                  DW_AT_decl_line	(13)
                  DW_AT_type	(0x00002b98 "pair&lt;int, std::__2::basic_string&lt;char, std::__2::char_traits&lt;char&gt;, std::__2::allocator&lt;char&gt;&gt;&gt;")

0x000048d0:     DW_TAG_variable
                  DW_AT_location	(DW_OP_fbreg -48)
                  DW_AT_name	("test_i")
                  DW_AT_type	(0x0000484a "int&amp;&amp;")
                  DW_AT_artificial	(true)

0x000048dc:     DW_TAG_variable
                  DW_AT_location	(DW_OP_fbreg -56)
                  DW_AT_name	("test_s")
                  DW_AT_type	(0x000030db "basic_string&lt;char, std::__2::char_traits&lt;char&gt;, std::__2::allocator&lt;char&gt;&gt;&amp;&amp;")
                  DW_AT_artificial	(true)

This is backwards. The unnamed pair should be marked "artificial" while the two named variables in the structured bindings should not.

GCC encodes the artificial tags in way I propose:

0x00005ed0:     DW_TAG_variable
                  DW_AT_type	(0x00003316 "pair&lt;int, std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; &gt;")
                  DW_AT_artificial	(true)
                  DW_AT_location	(DW_OP_fbreg -96)

0x00005ed9:     DW_TAG_variable
                  DW_AT_name	("test_i")
                  DW_AT_decl_file	("/home/brettw/eraseme3.cc")
                  DW_AT_decl_line	(13)
                  DW_AT_decl_column	(0x09)
                  DW_AT_type	(0x00005ef8 "type&amp;&amp;")
                  DW_AT_location	(DW_OP_fbreg -40)

0x00005ee8:     DW_TAG_variable
                  DW_AT_name	("test_s")
                  DW_AT_decl_file	("/home/brettw/eraseme3.cc")
                  DW_AT_decl_line	(13)
                  DW_AT_decl_column	(0x11)
                  DW_AT_type	(0x00005efe "type&amp;&amp;")
                  DW_AT_location	(DW_OP_fbreg -48)

Fuchsia clang version 11.0.0 (https://fuchsia.googlesource.com/a/third_party/llvm-project 65f5887)

@Michael137
Copy link
Member

Michael137 commented Jul 18, 2024

@AaronBallman Should we avoid setting the VarDecls that are held by the BindingDecls as implicit, given they exist in the source?

RefVD->setImplicit();

Technically it doesn't contradict what isImplicit says, but is a bit confusing:

/// isImplicit - Indicates whether the declaration was implicitly
/// generated by the implementation. If false, this declaration  
/// was written explicitly in the source code.                   
bool isImplicit() const { return Implicit; }

The more I think about it the more I lean towards the bindings shouldn't be implicit. The DecompositionDecl itself is the implicit one really. The bindings themselves we get from a DecompositionDeclarator which has a SourceLocation and IdentifierInfo for each.

@AaronBallman
Copy link
Collaborator

What exists in the source is actually the BindingDecl and not the underlying VarDecl (that's an implementation detail, basically). Each of the bindings introduces an identifier that refers back to something in the DecompositionDecl (which is an actual VarDecl but isn't one the user named).

So there's two things a debugger could expose: the names of the binding declarations and the value of the decomposition declaration. I would expect the binding names to show up in the debugger for sure, and I think users tend to think of them sort of like variables. The decomposition declaration is really only interesting in that it basically gives you the returned object from the initializer, so maybe that's also worth exposing.

Now, whether this means you should avoid the VarDecl in a BindingDecl or not; I'm not certain how DWARF works, but those variables are implicit as far as the compiler is concerned (the VarDecl doesn't exist for every BindingDecl) and so we want to retain that marking so things like AST matchers continue to behave how folks expect.

Does that help? :-D

@Michael137
Copy link
Member

Michael137 commented Jul 19, 2024

What exists in the source is actually the BindingDecl and not the underlying VarDecl (that's an implementation detail, basically). Each of the bindings introduces an identifier that refers back to something in the DecompositionDecl (which is an actual VarDecl but isn't one the user named).

So there's two things a debugger could expose: the names of the binding declarations and the value of the decomposition declaration. I would expect the binding names to show up in the debugger for sure, and I think users tend to think of them sort of like variables. The decomposition declaration is really only interesting in that it basically gives you the returned object from the initializer, so maybe that's also worth exposing.

Now, whether this means you should avoid the VarDecl in a BindingDecl or not; I'm not certain how DWARF works, but those variables are implicit as far as the compiler is concerned (the VarDecl doesn't exist for every BindingDecl) and so we want to retain that marking so things like AST matchers continue to behave how folks expect.

Does that help? :-D

What exists in the source is actually the BindingDecl and not the underlying VarDecl (that's an implementation detail, basically). Each of the bindings introduces an identifier that refers back to something in the DecompositionDecl (which is an actual VarDecl but isn't one the user named).
...
Now, whether this means you should avoid the VarDecl in a BindingDecl or not; I'm not certain how DWARF works, but those variables are implicit as far as the compiler is concerned (the VarDecl doesn't exist for every BindingDecl) and so we want to retain that marking so things like AST matchers continue to behave how folks expect.

I see, that makes sense. Thanks for clarifying!

AFAICT there are two options to addressing the LLDB issue:

  1. Align Clang with GCC's DWARF output and don't put the DW_AT_artifical (which is what this issue is about).
  2. Make LLDB show the binding variables despite them being artificial. Currently LLDB defaults to ignoring all artificial variables with a few exemptions (for coroutines and the this function parameter). We could flip this logic and allow all artificial variables to be displayed and ignore the ones we don't want to show (AFAICT there's only 1, which is __vla_expr).

For (1), we currently mark anything that is isImplicit as DW_AT_artificial. The DWARFv5 spec says artificial variables are ones which are

artificially generated by a compiler and not explicitly declared by the source

From your description of how the binding variables are introduced, in a very technical sense isImplicit does comply with the DWARF definition of "artificial". But I'm not sure whether this technical definition is what the DWARF spec really meant by "not explicitly declared by the source". I wonder if there's a more appropriate heuristic we want to use to mark variables as artificial. Is there an API we could use that would tell use whether a variable was actually spelled out in the source or not? @dwblaikie @adrian-prantl to get a second pair of eyes on this.

On the other hand, (2) would be the easier way to resolve this particular LLDB issue I suppose.

@dwblaikie
Copy link
Collaborator

A quick/rough estimate from me would be to do what gcc does. Both for consistency and because it feels like the right thing to me.

As for how to implement it, again as a rough guess I'd want to label the variable non-artificial if the variables name was chosen by the user, but I'm not sure if/how to do that with clangs ast. I think it would be possible to check some property of the identifier info

@AaronBallman
Copy link
Collaborator

Is there an API we could use that would tell use whether a variable was actually spelled out in the source or not?

Not directly. It turns out that NamedDecl doesn't track a DeclarationNameInfo (which has source location information for the name), but instead relies on the SourceLocation for the declaration to be the source location of the identifier. So you could probably try out something like:

if (const NamedDecl *ND = dyn_cast<NamedDecl>(YourDecl)) {
  if (ND->getDeclName().isIdentifier()) {
    SourceLocation NameLoc = ND->getLocation();
    if (!SrcMgr.isWrittenInBuiltinFile(NameLoc) &&
        !SrcMgr.isWrittenInCommandLineFile(NameLoc) &&
        !SrcMgr.isWrittenInScratchSpace(NameLoc)) {
      // User-provided enough for our purposes.
    }
  }
}

but this might run into troubles for implicitly-generated code that provides valid source locations based on other user-provided tokens.

Michael137 added a commit that referenced this issue Aug 9, 2024
)

This patch is motivated by the debug-info issue in
#48909. Clang is currently
emitting the `DW_AT_artificial` attribute on debug-info entries for
structured bindings whereas GCC does not. GCC's interpretation of the
DWARF spec is more user-friendly in this regard, so we would like to do
the same in Clang. [`CGDebugInfo` uses `isImplicit` to decide which
variables to mark
artificial](https://github.com/llvm/llvm-project/blob/0c4023ae3b64c54ff51947e9776aee0e963c5635/clang/lib/CodeGen/CGDebugInfo.cpp#L4783-L4784)
(e.g., `ImplicitParamDecls`, compiler-generated variables, etc.). But
for the purposes of debug-info, when we say "artificial", what we really
mean in many cases is: "not explicitly spelled out in source".
`VarDecl`s that back tuple-like bindings are [technically
compiler-generated](#48909 (comment)),
but that is a confusing notion for debug-info, since these bindings
*are* spelled out in source. The [documentation for
`isImplicit`](https://github.com/llvm/llvm-project/blob/68a0d0c76223736351fd7c452bca3ba9d80ca342/clang/include/clang/AST/DeclBase.h#L596-L600)
does to some extent imply that implicit variables aren't written in
source.

This patch adds another condition to deciding whether a `VarDecl` should
be marked artificial. Specifically, don't treat structured bindings as
artificial.

**Main alternatives considered**
1. Don't use `isImplicit` in `CGDebugInfo` when determining whether to
add `DW_AT_artificial`. Instead use some other property of the AST that
would tell us whether a node was explicitly spelled out in source or not
* Considered using `SourceRange` or `SourceLocation` to tell us this,
but didn't find a good way to, e.g., correctly identify that the
implicit `this` parameter wasn't spelled out in source (as opposed to an
unnamed parameter in a function declaration)
2. We could've also added a bit to `VarDeclBitFields` that indicates
that a `VarDecl` is a holding var, but the use-case didn't feel like
good enough justification for this
3. Don't set the `VarDecl` introduced as part of a tuple-like
decomposition as implicit.
* This may affect AST matching/traversal and this specific use-case
wasn't enough to justify such a change

Fixes #48909
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla c++17 clang:codegen IR generation bugs: mangling, exceptions, etc. debuginfo
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants