Skip to content

[clang][RecordLayoutBuilder] Be stricter about inferring packed-ness in ExternalLayouts #97443

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Michael137
Copy link
Member

@Michael137 Michael137 commented Jul 2, 2024

This patch is motivated by the LLDB support required for: #93069

In the presence of [[no_unique_address]], LLDB may ask Clang to lay out types with overlapping field offsets. Because we don't have attributes such as packed or no_unique_address in the LLDB AST, the RecordLayoutBuilder supports an InferAlignment mode, which, in the past, attempted to detect layouts which came from packed structures in order to provide a conservative estimate of alignment for it (since DW_AT_alignment isn't emitted unless explicitly changed with alignas, etc.).

However, in the presence of overlapping fields due to no_unique_address, InferAlignment would set the alignment of structures to 1 for which that's incorrect. This poses issues in some LLDB formatters that synthesize new Clang types and rely on the layout builder to get the FieldOffset of structures right that we did have DWARF offsets for. The result of this is that if we get the alignment wrong, LLDB reads out garbage data from the wrong field offsets.

There are a couple of solutions to this that we considered:

  1. Make LLDB formatters not do the above, and make them more robust to inaccurate alignment.
  2. Remove InferAlignment entirely and rely on Clang emitting DW_AT_alignment for packed structures.
  3. Remove InferAlignment and detect packedness from within LLDB.
  4. Make the InferAlignment logic account for overlapping fields.

Option (1) turned out quite hairy and it's not clear we can achieve this with the tools available for certain STL formatters (particularly std::map). But I would still very much like to simplify this if we can.

Option (2) wouldn't help with GCC-compiled binaries, and if we can get away with LLDB not needing the alignment, then we wouldn't need to increase debug-info size.

Option (3), AFAICT, would require us to reimplement some of the layout logic in the layout builder. Would be great if we can avoid this added complexity.

Option (4) seemed like the best option in the interim. As part of this change I also removed one of the InferAlignment blocks. The test-cases associated with this code-path pass regardless, and from the description of the change that introduced it it's not clear to me how it implies packedness. But happy to be proven wrong. Ultimately it would be great if we can get rid of the InferAlignment infrastructure and support our use-cases in LLDB or DWARF instead.

@Michael137 Michael137 requested a review from JDevlieghere as a code owner July 2, 2024 17:10
@llvmbot llvmbot added clang Clang issues not falling into any other category lldb clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Jul 2, 2024
@llvmbot
Copy link
Member

llvmbot commented Jul 2, 2024

@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-lldb

@llvm/pr-subscribers-clang

Author: Michael Buch (Michael137)

Changes

This patch is motivated by the LLDB support required for: #93069

In the presence of [[no_unique_address]], LLDB may ask Clang to lay out types with overlapping field offsets. Because we don't have attributes such as packed or no_unique_address in the LLDB AST, the RecordLayoutBuilder supports an InferAlignment mode, which, in the past, attempted to detect layouts which came from packed structures in order to provide a conservative estimate of alignment for it (since DW_AT_alignment isn't emitted unless explicitly changed with alignas, etc.).

However, in the presence of overlapping fields due to no_unique_address, InferAlignment would set the alignment of structures to 1 for which that's incorrect. This poses issues in some LLDB formatters that synthesize new Clang types and rely on the layout builder to get the FieldOffset of structures right that we did have DWARF offsets for. The result of this is that if we get the alignment wrong, LLDB reads out garbage data from the wrong field offsets.

There are a couple of solutions to this that we considered:

  1. Make LLDB formatters not do the above, and make them more robust to inaccurate alignment.
  2. Remove InferAlignment entirely and rely on Clang emitting DW_AT_alignment for packed structures.
  3. Remove InferAlignment and detect packedness from within LLDB.
  4. Make the InferAlignment logic account for overlapping fields.

Option (1) turned out quite hairy and it's not clear we can achieve this with the tools available for certain STL formatters (particularly std::map). But I would still very much like to simplify this if we can.

Option (2) wouldn't help with GCC-compiled binaries, and if we can get away with LLDB not needing the alignment, then we wouldn't need to increase debug-info size.

Option (3), AFAICT, would require us to reimplement some of the layout logic in the layout builder. Would be great if we can avoid this added complexity.

Option (4) seemed like the best option in the interim. As part of this change I also removed one of the InferAlignment blocks. The test-cases associated with this code-path pass regardless, and from the description of the change that introduced it it's not clear why specifically the base offsets would influence the Alignment field, and how it would imply packedness. But happy to be proven wrong. Ultimately it would be great if we can get rid of the InferAlignment infrastructure and support our use-cases in LLDB or DWARF instead.


Full diff: https://github.com/llvm/llvm-project/pull/97443.diff

3 Files Affected:

  • (modified) clang/lib/AST/RecordLayoutBuilder.cpp (+17-17)
  • (modified) lldb/test/Shell/SymbolFile/DWARF/no_unique_address-alignment.cpp (-2)
  • (modified) lldb/test/Shell/SymbolFile/DWARF/no_unique_address-base-alignment.cpp (-2)
diff --git a/clang/lib/AST/RecordLayoutBuilder.cpp b/clang/lib/AST/RecordLayoutBuilder.cpp
index d9bf62c2bbb04..8dbf69c310cbb 100644
--- a/clang/lib/AST/RecordLayoutBuilder.cpp
+++ b/clang/lib/AST/RecordLayoutBuilder.cpp
@@ -802,7 +802,8 @@ class ItaniumRecordLayoutBuilder {
   /// \param Field The field whose offset is being queried.
   /// \param ComputedOffset The offset that we've computed for this field.
   uint64_t updateExternalFieldOffset(const FieldDecl *Field,
-                                     uint64_t ComputedOffset);
+                                     uint64_t ComputedOffset,
+                                     uint64_t PreviousOffset);
 
   void CheckFieldPadding(uint64_t Offset, uint64_t UnpaddedOffset,
                           uint64_t UnpackedOffset, unsigned UnpackedAlign,
@@ -1296,13 +1297,6 @@ ItaniumRecordLayoutBuilder::LayoutBase(const BaseSubobjectInfo *Base) {
     bool Allowed = EmptySubobjects->CanPlaceBaseAtOffset(Base, Offset);
     (void)Allowed;
     assert(Allowed && "Base subobject externally placed at overlapping offset");
-
-    if (InferAlignment && Offset < getDataSize().alignTo(AlignTo)) {
-      // The externally-supplied base offset is before the base offset we
-      // computed. Assume that the structure is packed.
-      Alignment = CharUnits::One();
-      InferAlignment = false;
-    }
   }
 
   if (!Base->Class->isEmpty()) {
@@ -1770,7 +1764,8 @@ void ItaniumRecordLayoutBuilder::LayoutBitField(const FieldDecl *D) {
   // If we're using external layout, give the external layout a chance
   // to override this information.
   if (UseExternalLayout)
-    FieldOffset = updateExternalFieldOffset(D, FieldOffset);
+    FieldOffset = updateExternalFieldOffset(
+        D, FieldOffset, FieldOffsets.empty() ? 0 : FieldOffsets.back());
 
   // Okay, place the bitfield at the calculated offset.
   FieldOffsets.push_back(FieldOffset);
@@ -2063,8 +2058,9 @@ void ItaniumRecordLayoutBuilder::LayoutField(const FieldDecl *D,
   UnpackedFieldOffset = UnpackedFieldOffset.alignTo(UnpackedFieldAlign);
 
   if (UseExternalLayout) {
-    FieldOffset = Context.toCharUnitsFromBits(
-        updateExternalFieldOffset(D, Context.toBits(FieldOffset)));
+    FieldOffset = Context.toCharUnitsFromBits(updateExternalFieldOffset(
+        D, Context.toBits(FieldOffset),
+        FieldOffsets.empty() ? 0 : FieldOffsets.back()));
 
     if (!IsUnion && EmptySubobjects) {
       // Record the fact that we're placing a field at this offset.
@@ -2250,14 +2246,18 @@ void ItaniumRecordLayoutBuilder::UpdateAlignment(
   }
 }
 
-uint64_t
-ItaniumRecordLayoutBuilder::updateExternalFieldOffset(const FieldDecl *Field,
-                                                      uint64_t ComputedOffset) {
+uint64_t ItaniumRecordLayoutBuilder::updateExternalFieldOffset(
+    const FieldDecl *Field, uint64_t ComputedOffset, uint64_t PreviousOffset) {
   uint64_t ExternalFieldOffset = External.getExternalFieldOffset(Field);
 
-  if (InferAlignment && ExternalFieldOffset < ComputedOffset) {
-    // The externally-supplied field offset is before the field offset we
-    // computed. Assume that the structure is packed.
+  // If the externally-supplied field offset is before the field offset we
+  // computed. Check against the previous field offset to make sure we don't
+  // misinterpret overlapping fields as packedness of the structure.
+  const bool assume_packed = ExternalFieldOffset > 0 &&
+                             ExternalFieldOffset < ComputedOffset &&
+                             ExternalFieldOffset > PreviousOffset;
+
+  if (InferAlignment && assume_packed) {
     Alignment = CharUnits::One();
     PreferredAlignment = CharUnits::One();
     InferAlignment = false;
diff --git a/lldb/test/Shell/SymbolFile/DWARF/no_unique_address-alignment.cpp b/lldb/test/Shell/SymbolFile/DWARF/no_unique_address-alignment.cpp
index 1488199a3ad2d..ecefa66474772 100644
--- a/lldb/test/Shell/SymbolFile/DWARF/no_unique_address-alignment.cpp
+++ b/lldb/test/Shell/SymbolFile/DWARF/no_unique_address-alignment.cpp
@@ -1,5 +1,3 @@
-// XFAIL: *
-
 // RUN: %clangxx_host -gdwarf -o %t %s
 // RUN: %lldb %t \
 // RUN:   -o "expr alignof(OverlappingFields)" \
diff --git a/lldb/test/Shell/SymbolFile/DWARF/no_unique_address-base-alignment.cpp b/lldb/test/Shell/SymbolFile/DWARF/no_unique_address-base-alignment.cpp
index 15d8de0e3ee98..9f574f9846e35 100644
--- a/lldb/test/Shell/SymbolFile/DWARF/no_unique_address-base-alignment.cpp
+++ b/lldb/test/Shell/SymbolFile/DWARF/no_unique_address-base-alignment.cpp
@@ -1,5 +1,3 @@
-// XFAIL: *
-
 // RUN: %clangxx_host -gdwarf -o %t %s
 // RUN: %lldb %t \
 // RUN:   -o "expr alignof(OverlappingDerived)" \

// misinterpret overlapping fields as packedness of the structure.
const bool assume_packed = ExternalFieldOffset > 0 &&
ExternalFieldOffset < ComputedOffset &&
ExternalFieldOffset > PreviousOffset;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically "overlapping" would have to account for size of the previous field

Michael137 added a commit to Michael137/llvm-project that referenced this pull request Jul 3, 2024
Depends on:
* llvm#97544
* llvm#97549
* llvm#97551

This patch tries to simplify the way in which the
`std::map` formatter goes from the root `__tree`
pointer to a specific key/value pair.

Previously we would synthesize a structure that
mimicked what `__iter_pointer` looked like
in memory, then called `GetChildCompilerTypeAtIndex`
on it to find the byte offset into that structure
at which the pair was located at, and finally
use that offset through a call to `GetSyntheticChildAtOffset`
to retrieve that pair. Not only was this logic hard to follow,
and encoded the libc++ layout in non-obvious ways, it was also
fragile to alignment miscalculations
(llvm#97443); this would
break after the new layout of std::map landed as part of
inhttps://github.com/llvm/issues/93069.

Instead, this patch simply casts the `__iter_pointer` to
the `__node_pointer` and uses a straightforward
`GetChildMemberWithName("__value_")` to get to the key/value
we care about. This allows us to get rid of some support
infrastructure/class state.

Ideally we would fix the underlying alignment issue, but
this unblocks the libc++ refactor in the interim,
while also benefitting the formatter in terms of readability (in my opinion).
Copy link
Collaborator

@efriedma-quic efriedma-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding correctly, the way this currently works is that you do normal field layout, then if you discover that the actual offset of a field is less than the offset normal field layout would produce, you assume the struct is packed. This misses cases where a struct is packed, but the packing doesn't affect the offset of any of the fields. But as you note, this can't be fixed without adjusting the overall architecture.

There's an issue with the current implementation: it skips fields which actually are packed, I think. Consider the following:

struct Empty {};
struct __attribute((packed)) S {
  [[no_unique_address]] Empty a,b,c,d;
  char x;
  int y;
};
S s;

In this case, the field "y" is both overlapping, and at a packed offset. Really, you don't want to check for overlap; you want to ignore empty fields. (Non-empty fields can't overlap.)

Michael137 added a commit to Michael137/llvm-project that referenced this pull request Jul 8, 2024
Depends on:
* llvm#97544
* llvm#97549
* llvm#97551

This patch tries to simplify the way in which the
`std::map` formatter goes from the root `__tree`
pointer to a specific key/value pair.

Previously we would synthesize a structure that
mimicked what `__iter_pointer` looked like
in memory, then called `GetChildCompilerTypeAtIndex`
on it to find the byte offset into that structure
at which the pair was located at, and finally
use that offset through a call to `GetSyntheticChildAtOffset`
to retrieve that pair. Not only was this logic hard to follow,
and encoded the libc++ layout in non-obvious ways, it was also
fragile to alignment miscalculations
(llvm#97443); this would
break after the new layout of std::map landed as part of
inhttps://github.com/llvm/issues/93069.

Instead, this patch simply casts the `__iter_pointer` to
the `__node_pointer` and uses a straightforward
`GetChildMemberWithName("__value_")` to get to the key/value
we care about. This allows us to get rid of some support
infrastructure/class state.

Ideally we would fix the underlying alignment issue, but
this unblocks the libc++ refactor in the interim,
while also benefitting the formatter in terms of readability (in my opinion).
Michael137 added a commit to Michael137/llvm-project that referenced this pull request Jul 8, 2024
Depends on:
* llvm#97544
* llvm#97549
* llvm#97551

This patch tries to simplify the way in which the
`std::map` formatter goes from the root `__tree`
pointer to a specific key/value pair.

Previously we would synthesize a structure that
mimicked what `__iter_pointer` looked like
in memory, then called `GetChildCompilerTypeAtIndex`
on it to find the byte offset into that structure
at which the pair was located at, and finally
use that offset through a call to `GetSyntheticChildAtOffset`
to retrieve that pair. Not only was this logic hard to follow,
and encoded the libc++ layout in non-obvious ways, it was also
fragile to alignment miscalculations
(llvm#97443); this would
break after the new layout of std::map landed as part of
inhttps://github.com/llvm/issues/93069.

Instead, this patch simply casts the `__iter_pointer` to
the `__node_pointer` and uses a straightforward
`GetChildMemberWithName("__value_")` to get to the key/value
we care about. This allows us to get rid of some support
infrastructure/class state.

Ideally we would fix the underlying alignment issue, but
this unblocks the libc++ refactor in the interim,
while also benefitting the formatter in terms of readability (in my opinion).
Michael137 added a commit that referenced this pull request Jul 8, 2024
…#97754)

Depends on #97752

This patch changes the way we retrieve the key/value pair in the
`std::unordered_map::iterator` formatter (similar to how we are changing
it for `std::map::iterator` in
#97713, the motivations being
the same).

The old logic was not very easy to follow, and encoded the libc++ layout
in non-obvious ways. But mainly it was also fragile to alignment
miscalculations (#97443); this
would break once the new layout of `std::unordered_map` landed as part
of #93069.

Instead, this patch simply casts the `__hash_iterator` to a
`__node_pointer` (which is what libc++ does too) and uses a
straightforward `GetChildMemberWithName("__value_")` to get to the
key/value we care about.

The `std::unordered_map` already does it this way, so we align the
iterator counterpart to do the same. We can eventually re-use the
core-part of the `std::unordered_map` and `std::unordered_map::iterator`
formatters. But it will be an easier to change to review once both
simplifications landed.
Michael137 added a commit that referenced this pull request Jul 8, 2024
…97713)

Depends on #97687

Similar to #97579, this patch
simplifies the way in which we retrieve the key/value pair of a
`std::map` (in this case of the `std::map::iterator`).

We do this for the same reason: not only was the old logic hard to
follow, and encoded the libc++ layout in non-obvious ways, it was also
fragile to alignment miscalculations
(#97443); this would break once
the new layout of std::map landed as part of
#93069.

Instead, this patch simply casts the `__iter_pointer` to the
`__node_pointer` and uses a straightforward
`GetChildMemberWithName("__value_")` to get to the key/value we care
about.

We can eventually re-use the core-part of the `std::map` and
`std::map::iterator` formatters. But it will be an easier to change to
review once both simplifications landed.
Michael137 added a commit that referenced this pull request Jul 8, 2024
Depends on:
* #97544
* #97549
* #97551

This patch tries to simplify the way in which the `std::map` formatter
goes from the root `__tree` pointer to a specific key/value pair.

Previously we would:
1. synthesize a structure that mimicked what `__iter_pointer` looked
like in memory
2. call `GetChildCompilerTypeAtIndex` on it to find the byte offset at
which the pair was located in the synthesized structure
3. finally, use that offset through a call to
`GetSyntheticChildAtOffset` to retrieve the key/value pair

Not only was this logic hard to follow, and encoded the libc++ layout in
non-obvious ways, it was also fragile to alignment miscalculations
(#97443); this would break once
the new layout of std::map landed as part of
#93069.

Instead, this patch simply casts the `__iter_pointer` to the
`__node_pointer` and uses a straightforward
`GetChildMemberWithName("__value_")` to get to the key/value we care
about. This allows us to get rid of some support infrastructure/class
state.

Ideally we would fix the underlying alignment issue, but this unblocks
the libc++ refactor in the interim, while also benefitting the formatter
in terms of readability (in my opinion).
@Michael137
Copy link
Member Author

If I'm understanding correctly, the way this currently works is that you do normal field layout, then if you discover that the actual offset of a field is less than the offset normal field layout would produce, you assume the struct is packed. This misses cases where a struct is packed, but the packing doesn't affect the offset of any of the fields. But as you note, this can't be fixed without adjusting the overall architecture.

There's an issue with the current implementation: it skips fields which actually are packed, I think. Consider the following:

struct Empty {};
struct __attribute((packed)) S {
  [[no_unique_address]] Empty a,b,c,d;
  char x;
  int y;
};
S s;

In this case, the field "y" is both overlapping, and at a packed offset. Really, you don't want to check for overlap; you want to ignore empty fields. (Non-empty fields can't overlap.)

Good point, we would still infer packedness in this example, but for the wrong reasons.

Skipping empty fields does seem like a better heuristic here

@labath
Copy link
Collaborator

labath commented Oct 2, 2024

Skipping empty fields does seem like a better heuristic here

FWIW, I (independently) came to the same conclusion when investigating the fallout of #76756, though it's not fully clear to me whether the PR has been updated to do that.

@Michael137
Copy link
Member Author

Skipping empty fields does seem like a better heuristic here

FWIW, I (independently) came to the same conclusion when investigating the fallout of #76756, though it's not fully clear to me whether the PR has been updated to do that.

Not yet, but will have a look at this today

@llvmbot llvmbot added the clang:codegen IR generation bugs: mangling, exceptions, etc. label Oct 2, 2024
Copy link

github-actions bot commented Oct 2, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@Michael137 Michael137 force-pushed the clang/lldb/inferalignment-removal branch from 6d55789 to 7d54c5c Compare October 2, 2024 13:03
@Michael137
Copy link
Member Author

If I'm understanding correctly, the way this currently works is that you do normal field layout, then if you discover that the actual offset of a field is less than the offset normal field layout would produce, you assume the struct is packed. This misses cases where a struct is packed, but the packing doesn't affect the offset of any of the fields. But as you note, this can't be fixed without adjusting the overall architecture.

There's an issue with the current implementation: it skips fields which actually are packed, I think. Consider the following:

struct Empty {};
struct __attribute((packed)) S {
  [[no_unique_address]] Empty a,b,c,d;
  char x;
  int y;
};
S s;

In this case, the field "y" is both overlapping, and at a packed offset. Really, you don't want to check for overlap; you want to ignore empty fields. (Non-empty fields can't overlap.)

Updated the patch to ignore empty fields instead of checking overlap with previous fields.

@Michael137
Copy link
Member Author

ping

(btw, @labath is the alignment miscalculation still causing issues for you internally? or did you find a workaround?)

@labath
Copy link
Collaborator

labath commented Oct 11, 2024

(btw, @labath is the alignment miscalculation still causing issues for you internally? or did you find a workaround?)

The fire has been extinguished by #110648, but it's not ideal that any structure with a [[no_unique_address]] is treated as packed. It's not causing any issues right now, but I think we ought to fix this. (The change makes sense to me, but I don't feel it's my place to approve this)

Copy link
Member Author

@Michael137 Michael137 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gentle ping

@efriedma-quic mind taking another look? Latest iteration skips over empty fields that overlap.

…in ExternalLayouts

This patch is motivated by the LLDB support required for:
llvm#93069

In the presence of `[[no_unique_address]]`, LLDB may ask Clang
to lay out types with overlapping field offsets. Because we don't
have attributes such as `packed` or `no_unique_address` in the LLDB
AST, the `RecordLayoutBuilder` supports an `InferAlignment` mode, which,
in the past, attempted to detect layouts which came from packed
structures in order to provide a conservative estimate of alignment for
it (since `DW_AT_alignment` isn't emitted unless explicitly changed with
`alignas`, etc.).

However, in the presence of overlapping fields due to `no_unique_address`,
`InferAlignment` would set the alignment of structures to `1` for which
that's incorrect. This poses issues in some LLDB formatters that
synthesize new Clang types and rely on the layout builder to get the
`FieldOffset` of structures right that we did have DWARF offsets for.
The result of this is that if we get the alignment wrong, LLDB reads
out garbage data from the wrong field offsets.

There are a couple of solutions to this that we considered:
1. Make LLDB formatters not do the above, and make them more robust
   to inaccurate alignment.
2. Remove `InferAlignment` entirely and rely on Clang emitting
   `DW_AT_alignment` for packed structures.
3. Remove `InferAlignment` and detect packedness from within LLDB.
4. Make the `InferAlignment` logic account for overlapping fields.

Option (1) turned out quite hairy and it's not clear we can achieve
this with the tools available for certain STL formatters (particularly
`std::map`). But I would still very much like to simplify this if we
can.

Option (2) wouldn't help with GCC-compiled binaries, and if we can
get away with LLDB not needing the alignment, then we wouldn't need
to increase debug-info size.

Option (3), AFAICT, would require us to reimplement some of the layout
logic in the layout builder. Would be great if we can avoid this added
complexity.

Option (4) seemed like the best option in the interim. As part of this
change I also removed one of the `InferAlignment` blocks. The test-cases
associated with this code-path pass regardless, and from the description
of the change that introduced it it's not clear why specifically the
base offsets would influence the `Alignment` field, and how it would
imply packedness. But happy to be proven wrong. Ultimately it would be
great if we can get rid of the `InferAlignment` infrastructure and
support our use-cases in LLDB or DWARF instead.
@Michael137 Michael137 force-pushed the clang/lldb/inferalignment-removal branch from 5d4634f to 2d9dc34 Compare October 21, 2024 10:09
@labath
Copy link
Collaborator

labath commented Dec 23, 2024

FWIW, I came across another no_unique_address-related crash today:

$ cat a.cc
struct S {
private:
  int i;
  short s;
};
static_assert(sizeof(S) == 8);

struct T {
  [[no_unique_address]] S s;
  char c;
};
static_assert(sizeof(T) == 8);

T t;
$ clang++ -c -o /tmp/a.out /tmp/a.cc -g
$ lldb /tmp/a.out -o "expr t" -x
(lldb) target create "/tmp/a.out"
Current executable set to '/tmp/a.out' (x86_64).
(lldb) expr t
assertion failed at clang/lib/CodeGen/CGRecordLayoutBuilder.cpp:960 in void (anonymous namespace)::CGRecordLowering::checkBitfieldClipping(bool) const: M.Offset >= Tail && "Bitfield access unit is not clipped"
...

I was hoping this PR would fix it, but it doesn't appear to help.

@Michael137
Copy link
Member Author

FWIW, I came across another no_unique_address-related crash today:

$ cat a.cc
struct S {
private:
  int i;
  short s;
};
static_assert(sizeof(S) == 8);

struct T {
  [[no_unique_address]] S s;
  char c;
};
static_assert(sizeof(T) == 8);

T t;
$ clang++ -c -o /tmp/a.out /tmp/a.cc -g
$ lldb /tmp/a.out -o "expr t" -x
(lldb) target create "/tmp/a.out"
Current executable set to '/tmp/a.out' (x86_64).
(lldb) expr t
assertion failed at clang/lib/CodeGen/CGRecordLayoutBuilder.cpp:960 in void (anonymous namespace)::CGRecordLowering::checkBitfieldClipping(bool) const: M.Offset >= Tail && "Bitfield access unit is not clipped"
...

I was hoping this PR would fix it, but it doesn't appear to help.

Hmmm interesting, yea that looks more like something that #96422 should've addressed. Though this is a special case where the attribute is applied to a non-empty type. Would need to step through the CGRecordLayoutBuilder to see why Clang believes the offsets from DWARF aren't correct.

@Michael137
Copy link
Member Author

Michael137 commented Dec 23, 2024

FWIW, I came across another no_unique_address-related crash today:

$ cat a.cc
struct S {
private:
  int i;
  short s;
};
static_assert(sizeof(S) == 8);

struct T {
  [[no_unique_address]] S s;
  char c;
};
static_assert(sizeof(T) == 8);

T t;
$ clang++ -c -o /tmp/a.out /tmp/a.cc -g
$ lldb /tmp/a.out -o "expr t" -x
(lldb) target create "/tmp/a.out"
Current executable set to '/tmp/a.out' (x86_64).
(lldb) expr t
assertion failed at clang/lib/CodeGen/CGRecordLayoutBuilder.cpp:960 in void (anonymous namespace)::CGRecordLowering::checkBitfieldClipping(bool) const: M.Offset >= Tail && "Bitfield access unit is not clipped"
...

I was hoping this PR would fix it, but it doesn't appear to help.

Hmmm interesting, yea that looks more like something that #96422 should've addressed. Though this is a special case where the attribute is applied to a non-empty type. Would need to step through the CGRecordLayoutBuilder to see why Clang believes the offsets from DWARF aren't correct.

Yea I took a quick look at this and the problem is the following check:

Field->isPotentiallyOverlapping()

S is potentially overlapping because it has the NoUniqueAddress attribute, but that isn't present in the DWARF AST. So when we compute the record layout using the DWARF AST we push_back the incorrect MemberInfo here:

(lldb) p getStorageType(Field->getType()->getAsCXXRecordDecl())->dump()                                                                                                                           
%struct.S.base = type <{ i32, i16 }>                                                                                                                                                              
(lldb) p getStorageType(*Field)->dump()                                                                                                                                                                                                                                                                   
%struct.S = type <{ i32, i16, [2 x i8] }>                                                                 
(lldb) n                                                                                                  

In the isPotentiallyOverlapping case we would've picked the layout without the tail padding.

So we're back to "how do we encode this isPotentiallyOverlapping property in DWARF". OR, maybe we can get away with removing that dependency on the AST in the same way we did with isZeroSize..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category lldb
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants