-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[clang][RecordLayoutBuilder] Be stricter about inferring packed-ness in ExternalLayouts #97443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[clang][RecordLayoutBuilder] Be stricter about inferring packed-ness in ExternalLayouts #97443
Conversation
@llvm/pr-subscribers-clang-codegen @llvm/pr-subscribers-clang Author: Michael Buch (Michael137) ChangesThis patch is motivated by the LLDB support required for: #93069 In the presence of However, in the presence of overlapping fields due to There are a couple of solutions to this that we considered:
Option (1) turned out quite hairy and it's not clear we can achieve this with the tools available for certain STL formatters (particularly Option (2) wouldn't help with GCC-compiled binaries, and if we can get away with LLDB not needing the alignment, then we wouldn't need to increase debug-info size. Option (3), AFAICT, would require us to reimplement some of the layout logic in the layout builder. Would be great if we can avoid this added complexity. Option (4) seemed like the best option in the interim. As part of this change I also removed one of the Full diff: https://github.com/llvm/llvm-project/pull/97443.diff 3 Files Affected:
diff --git a/clang/lib/AST/RecordLayoutBuilder.cpp b/clang/lib/AST/RecordLayoutBuilder.cpp
index d9bf62c2bbb04..8dbf69c310cbb 100644
--- a/clang/lib/AST/RecordLayoutBuilder.cpp
+++ b/clang/lib/AST/RecordLayoutBuilder.cpp
@@ -802,7 +802,8 @@ class ItaniumRecordLayoutBuilder {
/// \param Field The field whose offset is being queried.
/// \param ComputedOffset The offset that we've computed for this field.
uint64_t updateExternalFieldOffset(const FieldDecl *Field,
- uint64_t ComputedOffset);
+ uint64_t ComputedOffset,
+ uint64_t PreviousOffset);
void CheckFieldPadding(uint64_t Offset, uint64_t UnpaddedOffset,
uint64_t UnpackedOffset, unsigned UnpackedAlign,
@@ -1296,13 +1297,6 @@ ItaniumRecordLayoutBuilder::LayoutBase(const BaseSubobjectInfo *Base) {
bool Allowed = EmptySubobjects->CanPlaceBaseAtOffset(Base, Offset);
(void)Allowed;
assert(Allowed && "Base subobject externally placed at overlapping offset");
-
- if (InferAlignment && Offset < getDataSize().alignTo(AlignTo)) {
- // The externally-supplied base offset is before the base offset we
- // computed. Assume that the structure is packed.
- Alignment = CharUnits::One();
- InferAlignment = false;
- }
}
if (!Base->Class->isEmpty()) {
@@ -1770,7 +1764,8 @@ void ItaniumRecordLayoutBuilder::LayoutBitField(const FieldDecl *D) {
// If we're using external layout, give the external layout a chance
// to override this information.
if (UseExternalLayout)
- FieldOffset = updateExternalFieldOffset(D, FieldOffset);
+ FieldOffset = updateExternalFieldOffset(
+ D, FieldOffset, FieldOffsets.empty() ? 0 : FieldOffsets.back());
// Okay, place the bitfield at the calculated offset.
FieldOffsets.push_back(FieldOffset);
@@ -2063,8 +2058,9 @@ void ItaniumRecordLayoutBuilder::LayoutField(const FieldDecl *D,
UnpackedFieldOffset = UnpackedFieldOffset.alignTo(UnpackedFieldAlign);
if (UseExternalLayout) {
- FieldOffset = Context.toCharUnitsFromBits(
- updateExternalFieldOffset(D, Context.toBits(FieldOffset)));
+ FieldOffset = Context.toCharUnitsFromBits(updateExternalFieldOffset(
+ D, Context.toBits(FieldOffset),
+ FieldOffsets.empty() ? 0 : FieldOffsets.back()));
if (!IsUnion && EmptySubobjects) {
// Record the fact that we're placing a field at this offset.
@@ -2250,14 +2246,18 @@ void ItaniumRecordLayoutBuilder::UpdateAlignment(
}
}
-uint64_t
-ItaniumRecordLayoutBuilder::updateExternalFieldOffset(const FieldDecl *Field,
- uint64_t ComputedOffset) {
+uint64_t ItaniumRecordLayoutBuilder::updateExternalFieldOffset(
+ const FieldDecl *Field, uint64_t ComputedOffset, uint64_t PreviousOffset) {
uint64_t ExternalFieldOffset = External.getExternalFieldOffset(Field);
- if (InferAlignment && ExternalFieldOffset < ComputedOffset) {
- // The externally-supplied field offset is before the field offset we
- // computed. Assume that the structure is packed.
+ // If the externally-supplied field offset is before the field offset we
+ // computed. Check against the previous field offset to make sure we don't
+ // misinterpret overlapping fields as packedness of the structure.
+ const bool assume_packed = ExternalFieldOffset > 0 &&
+ ExternalFieldOffset < ComputedOffset &&
+ ExternalFieldOffset > PreviousOffset;
+
+ if (InferAlignment && assume_packed) {
Alignment = CharUnits::One();
PreferredAlignment = CharUnits::One();
InferAlignment = false;
diff --git a/lldb/test/Shell/SymbolFile/DWARF/no_unique_address-alignment.cpp b/lldb/test/Shell/SymbolFile/DWARF/no_unique_address-alignment.cpp
index 1488199a3ad2d..ecefa66474772 100644
--- a/lldb/test/Shell/SymbolFile/DWARF/no_unique_address-alignment.cpp
+++ b/lldb/test/Shell/SymbolFile/DWARF/no_unique_address-alignment.cpp
@@ -1,5 +1,3 @@
-// XFAIL: *
-
// RUN: %clangxx_host -gdwarf -o %t %s
// RUN: %lldb %t \
// RUN: -o "expr alignof(OverlappingFields)" \
diff --git a/lldb/test/Shell/SymbolFile/DWARF/no_unique_address-base-alignment.cpp b/lldb/test/Shell/SymbolFile/DWARF/no_unique_address-base-alignment.cpp
index 15d8de0e3ee98..9f574f9846e35 100644
--- a/lldb/test/Shell/SymbolFile/DWARF/no_unique_address-base-alignment.cpp
+++ b/lldb/test/Shell/SymbolFile/DWARF/no_unique_address-base-alignment.cpp
@@ -1,5 +1,3 @@
-// XFAIL: *
-
// RUN: %clangxx_host -gdwarf -o %t %s
// RUN: %lldb %t \
// RUN: -o "expr alignof(OverlappingDerived)" \
|
// misinterpret overlapping fields as packedness of the structure. | ||
const bool assume_packed = ExternalFieldOffset > 0 && | ||
ExternalFieldOffset < ComputedOffset && | ||
ExternalFieldOffset > PreviousOffset; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically "overlapping" would have to account for size of the previous field
Depends on: * llvm#97544 * llvm#97549 * llvm#97551 This patch tries to simplify the way in which the `std::map` formatter goes from the root `__tree` pointer to a specific key/value pair. Previously we would synthesize a structure that mimicked what `__iter_pointer` looked like in memory, then called `GetChildCompilerTypeAtIndex` on it to find the byte offset into that structure at which the pair was located at, and finally use that offset through a call to `GetSyntheticChildAtOffset` to retrieve that pair. Not only was this logic hard to follow, and encoded the libc++ layout in non-obvious ways, it was also fragile to alignment miscalculations (llvm#97443); this would break after the new layout of std::map landed as part of inhttps://github.com/llvm/issues/93069. Instead, this patch simply casts the `__iter_pointer` to the `__node_pointer` and uses a straightforward `GetChildMemberWithName("__value_")` to get to the key/value we care about. This allows us to get rid of some support infrastructure/class state. Ideally we would fix the underlying alignment issue, but this unblocks the libc++ refactor in the interim, while also benefitting the formatter in terms of readability (in my opinion).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm understanding correctly, the way this currently works is that you do normal field layout, then if you discover that the actual offset of a field is less than the offset normal field layout would produce, you assume the struct is packed. This misses cases where a struct is packed, but the packing doesn't affect the offset of any of the fields. But as you note, this can't be fixed without adjusting the overall architecture.
There's an issue with the current implementation: it skips fields which actually are packed, I think. Consider the following:
struct Empty {};
struct __attribute((packed)) S {
[[no_unique_address]] Empty a,b,c,d;
char x;
int y;
};
S s;
In this case, the field "y" is both overlapping, and at a packed offset. Really, you don't want to check for overlap; you want to ignore empty fields. (Non-empty fields can't overlap.)
Depends on: * llvm#97544 * llvm#97549 * llvm#97551 This patch tries to simplify the way in which the `std::map` formatter goes from the root `__tree` pointer to a specific key/value pair. Previously we would synthesize a structure that mimicked what `__iter_pointer` looked like in memory, then called `GetChildCompilerTypeAtIndex` on it to find the byte offset into that structure at which the pair was located at, and finally use that offset through a call to `GetSyntheticChildAtOffset` to retrieve that pair. Not only was this logic hard to follow, and encoded the libc++ layout in non-obvious ways, it was also fragile to alignment miscalculations (llvm#97443); this would break after the new layout of std::map landed as part of inhttps://github.com/llvm/issues/93069. Instead, this patch simply casts the `__iter_pointer` to the `__node_pointer` and uses a straightforward `GetChildMemberWithName("__value_")` to get to the key/value we care about. This allows us to get rid of some support infrastructure/class state. Ideally we would fix the underlying alignment issue, but this unblocks the libc++ refactor in the interim, while also benefitting the formatter in terms of readability (in my opinion).
Depends on: * llvm#97544 * llvm#97549 * llvm#97551 This patch tries to simplify the way in which the `std::map` formatter goes from the root `__tree` pointer to a specific key/value pair. Previously we would synthesize a structure that mimicked what `__iter_pointer` looked like in memory, then called `GetChildCompilerTypeAtIndex` on it to find the byte offset into that structure at which the pair was located at, and finally use that offset through a call to `GetSyntheticChildAtOffset` to retrieve that pair. Not only was this logic hard to follow, and encoded the libc++ layout in non-obvious ways, it was also fragile to alignment miscalculations (llvm#97443); this would break after the new layout of std::map landed as part of inhttps://github.com/llvm/issues/93069. Instead, this patch simply casts the `__iter_pointer` to the `__node_pointer` and uses a straightforward `GetChildMemberWithName("__value_")` to get to the key/value we care about. This allows us to get rid of some support infrastructure/class state. Ideally we would fix the underlying alignment issue, but this unblocks the libc++ refactor in the interim, while also benefitting the formatter in terms of readability (in my opinion).
…#97754) Depends on #97752 This patch changes the way we retrieve the key/value pair in the `std::unordered_map::iterator` formatter (similar to how we are changing it for `std::map::iterator` in #97713, the motivations being the same). The old logic was not very easy to follow, and encoded the libc++ layout in non-obvious ways. But mainly it was also fragile to alignment miscalculations (#97443); this would break once the new layout of `std::unordered_map` landed as part of #93069. Instead, this patch simply casts the `__hash_iterator` to a `__node_pointer` (which is what libc++ does too) and uses a straightforward `GetChildMemberWithName("__value_")` to get to the key/value we care about. The `std::unordered_map` already does it this way, so we align the iterator counterpart to do the same. We can eventually re-use the core-part of the `std::unordered_map` and `std::unordered_map::iterator` formatters. But it will be an easier to change to review once both simplifications landed.
…97713) Depends on #97687 Similar to #97579, this patch simplifies the way in which we retrieve the key/value pair of a `std::map` (in this case of the `std::map::iterator`). We do this for the same reason: not only was the old logic hard to follow, and encoded the libc++ layout in non-obvious ways, it was also fragile to alignment miscalculations (#97443); this would break once the new layout of std::map landed as part of #93069. Instead, this patch simply casts the `__iter_pointer` to the `__node_pointer` and uses a straightforward `GetChildMemberWithName("__value_")` to get to the key/value we care about. We can eventually re-use the core-part of the `std::map` and `std::map::iterator` formatters. But it will be an easier to change to review once both simplifications landed.
Depends on: * #97544 * #97549 * #97551 This patch tries to simplify the way in which the `std::map` formatter goes from the root `__tree` pointer to a specific key/value pair. Previously we would: 1. synthesize a structure that mimicked what `__iter_pointer` looked like in memory 2. call `GetChildCompilerTypeAtIndex` on it to find the byte offset at which the pair was located in the synthesized structure 3. finally, use that offset through a call to `GetSyntheticChildAtOffset` to retrieve the key/value pair Not only was this logic hard to follow, and encoded the libc++ layout in non-obvious ways, it was also fragile to alignment miscalculations (#97443); this would break once the new layout of std::map landed as part of #93069. Instead, this patch simply casts the `__iter_pointer` to the `__node_pointer` and uses a straightforward `GetChildMemberWithName("__value_")` to get to the key/value we care about. This allows us to get rid of some support infrastructure/class state. Ideally we would fix the underlying alignment issue, but this unblocks the libc++ refactor in the interim, while also benefitting the formatter in terms of readability (in my opinion).
Good point, we would still infer packedness in this example, but for the wrong reasons. Skipping empty fields does seem like a better heuristic here |
FWIW, I (independently) came to the same conclusion when investigating the fallout of #76756, though it's not fully clear to me whether the PR has been updated to do that. |
Not yet, but will have a look at this today |
✅ With the latest revision this PR passed the C/C++ code formatter. |
6d55789
to
7d54c5c
Compare
Updated the patch to ignore empty fields instead of checking overlap with previous fields. |
ping (btw, @labath is the alignment miscalculation still causing issues for you internally? or did you find a workaround?) |
The fire has been extinguished by #110648, but it's not ideal that any structure with a [[no_unique_address]] is treated as packed. It's not causing any issues right now, but I think we ought to fix this. (The change makes sense to me, but I don't feel it's my place to approve this) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gentle ping
@efriedma-quic mind taking another look? Latest iteration skips over empty fields that overlap.
…in ExternalLayouts This patch is motivated by the LLDB support required for: llvm#93069 In the presence of `[[no_unique_address]]`, LLDB may ask Clang to lay out types with overlapping field offsets. Because we don't have attributes such as `packed` or `no_unique_address` in the LLDB AST, the `RecordLayoutBuilder` supports an `InferAlignment` mode, which, in the past, attempted to detect layouts which came from packed structures in order to provide a conservative estimate of alignment for it (since `DW_AT_alignment` isn't emitted unless explicitly changed with `alignas`, etc.). However, in the presence of overlapping fields due to `no_unique_address`, `InferAlignment` would set the alignment of structures to `1` for which that's incorrect. This poses issues in some LLDB formatters that synthesize new Clang types and rely on the layout builder to get the `FieldOffset` of structures right that we did have DWARF offsets for. The result of this is that if we get the alignment wrong, LLDB reads out garbage data from the wrong field offsets. There are a couple of solutions to this that we considered: 1. Make LLDB formatters not do the above, and make them more robust to inaccurate alignment. 2. Remove `InferAlignment` entirely and rely on Clang emitting `DW_AT_alignment` for packed structures. 3. Remove `InferAlignment` and detect packedness from within LLDB. 4. Make the `InferAlignment` logic account for overlapping fields. Option (1) turned out quite hairy and it's not clear we can achieve this with the tools available for certain STL formatters (particularly `std::map`). But I would still very much like to simplify this if we can. Option (2) wouldn't help with GCC-compiled binaries, and if we can get away with LLDB not needing the alignment, then we wouldn't need to increase debug-info size. Option (3), AFAICT, would require us to reimplement some of the layout logic in the layout builder. Would be great if we can avoid this added complexity. Option (4) seemed like the best option in the interim. As part of this change I also removed one of the `InferAlignment` blocks. The test-cases associated with this code-path pass regardless, and from the description of the change that introduced it it's not clear why specifically the base offsets would influence the `Alignment` field, and how it would imply packedness. But happy to be proven wrong. Ultimately it would be great if we can get rid of the `InferAlignment` infrastructure and support our use-cases in LLDB or DWARF instead.
5d4634f
to
2d9dc34
Compare
FWIW, I came across another no_unique_address-related crash today:
I was hoping this PR would fix it, but it doesn't appear to help. |
Hmmm interesting, yea that looks more like something that #96422 should've addressed. Though this is a special case where the attribute is applied to a non-empty type. Would need to step through the |
Yea I took a quick look at this and the problem is the following check:
In the So we're back to "how do we encode this isPotentiallyOverlapping property in DWARF". OR, maybe we can get away with removing that dependency on the AST in the same way we did with |
This patch is motivated by the LLDB support required for: #93069
In the presence of
[[no_unique_address]]
, LLDB may ask Clang to lay out types with overlapping field offsets. Because we don't have attributes such aspacked
orno_unique_address
in the LLDB AST, theRecordLayoutBuilder
supports anInferAlignment
mode, which, in the past, attempted to detect layouts which came from packed structures in order to provide a conservative estimate of alignment for it (sinceDW_AT_alignment
isn't emitted unless explicitly changed withalignas
, etc.).However, in the presence of overlapping fields due to
no_unique_address
,InferAlignment
would set the alignment of structures to1
for which that's incorrect. This poses issues in some LLDB formatters that synthesize new Clang types and rely on the layout builder to get theFieldOffset
of structures right that we did have DWARF offsets for. The result of this is that if we get the alignment wrong, LLDB reads out garbage data from the wrong field offsets.There are a couple of solutions to this that we considered:
InferAlignment
entirely and rely on Clang emittingDW_AT_alignment
for packed structures.InferAlignment
and detect packedness from within LLDB.InferAlignment
logic account for overlapping fields.Option (1) turned out quite hairy and it's not clear we can achieve this with the tools available for certain STL formatters (particularly
std::map
). But I would still very much like to simplify this if we can.Option (2) wouldn't help with GCC-compiled binaries, and if we can get away with LLDB not needing the alignment, then we wouldn't need to increase debug-info size.
Option (3), AFAICT, would require us to reimplement some of the layout logic in the layout builder. Would be great if we can avoid this added complexity.
Option (4) seemed like the best option in the interim. As part of this change I also removed one of the
InferAlignment
blocks. The test-cases associated with this code-path pass regardless, and from the description of the change that introduced it it's not clear to me how it implies packedness. But happy to be proven wrong. Ultimately it would be great if we can get rid of theInferAlignment
infrastructure and support our use-cases in LLDB or DWARF instead.