-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[DebugInfo] Add DW_OP_LLVM_extract_bits #93990
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DebugInfo] Add DW_OP_LLVM_extract_bits #93990
Conversation
This operation extracts a number of bits at a given offset and sign or zero extends them, which is done by emitting it as a left shift followed by a right shift. This is being added for use in clang for C++ structured bindings of bitfields that have offset or size that aren't a byte multiple. A new operation is being added, instead of shifts being used directly, as it makes correctly handling it in optimisations (which will be done in a later patch) much easier.
@llvm/pr-subscribers-debuginfo @llvm/pr-subscribers-llvm-ir Author: John Brawn (john-brawn-arm) ChangesThis operation extracts a number of bits at a given offset and sign or zero extends them, which is done by emitting it as a left shift followed by a right shift. This is being added for use in clang for C++ structured bindings of bitfields that have offset or size that aren't a byte multiple. A new operation is being added, instead of shifts being used directly, as it makes correctly handling it in optimisations (which will be done in a later patch) much easier. Full diff: https://github.com/llvm/llvm-project/pull/93990.diff 7 Files Affected:
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index c58f7f7140e47..7b4e91d09f342 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -6312,6 +6312,13 @@ The current supported opcode vocabulary is limited:
(``16`` and ``DW_ATE_signed`` here, respectively) to which the top of the
expression stack is to be converted. Maps into a ``DW_OP_convert`` operation
that references a base type constructed from the supplied values.
+- ``DW_OP_LLVM_extract_bits, 16, 8, DW_ATE_signed`` specifies the offset, size,
+ and encoding (``16``, ``8``, and ``DW_ATE_signed`` here, respectively) of bits
+ that are to be extracted from the value at the top of the expression stack.
+ If the top of the expression stack is a memory location then these bits are
+ extracted from the value pointed to by that memory location. Maps into a
+ ``DW_OP_shl`` followed by ``DW_OP_shr`` or ``DW_OP_shra`` (depending on
+ encoding).
- ``DW_OP_LLVM_tag_offset, tag_offset`` specifies that a memory tag should be
optionally applied to the pointer. The memory tag is derived from the
given tag offset in an implementation-defined manner.
diff --git a/llvm/include/llvm/BinaryFormat/Dwarf.h b/llvm/include/llvm/BinaryFormat/Dwarf.h
index 74c4d6ff3a716..7ae265484be58 100644
--- a/llvm/include/llvm/BinaryFormat/Dwarf.h
+++ b/llvm/include/llvm/BinaryFormat/Dwarf.h
@@ -144,6 +144,7 @@ enum LocationAtom {
DW_OP_LLVM_entry_value = 0x1003, ///< Only used in LLVM metadata.
DW_OP_LLVM_implicit_pointer = 0x1004, ///< Only used in LLVM metadata.
DW_OP_LLVM_arg = 0x1005, ///< Only used in LLVM metadata.
+ DW_OP_LLVM_extract_bits = 0x1006, ///< Only used in LLVM metadata.
};
enum LlvmUserLocationAtom {
diff --git a/llvm/lib/BinaryFormat/Dwarf.cpp b/llvm/lib/BinaryFormat/Dwarf.cpp
index 7324266172684..d9668dffabec6 100644
--- a/llvm/lib/BinaryFormat/Dwarf.cpp
+++ b/llvm/lib/BinaryFormat/Dwarf.cpp
@@ -155,6 +155,8 @@ StringRef llvm::dwarf::OperationEncodingString(unsigned Encoding) {
return "DW_OP_LLVM_implicit_pointer";
case DW_OP_LLVM_arg:
return "DW_OP_LLVM_arg";
+ case DW_OP_LLVM_extract_bits:
+ return "DW_OP_LLVM_extract_bits";
}
}
@@ -169,6 +171,7 @@ unsigned llvm::dwarf::getOperationEncoding(StringRef OperationEncodingString) {
.Case("DW_OP_LLVM_entry_value", DW_OP_LLVM_entry_value)
.Case("DW_OP_LLVM_implicit_pointer", DW_OP_LLVM_implicit_pointer)
.Case("DW_OP_LLVM_arg", DW_OP_LLVM_arg)
+ .Case("DW_OP_LLVM_extract_bits", DW_OP_LLVM_extract_bits)
.Default(0);
}
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfExpression.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfExpression.cpp
index a74d43897d45b..87beeb7d6bc9a 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfExpression.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfExpression.cpp
@@ -18,6 +18,7 @@
#include "llvm/CodeGen/Register.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"
#include "llvm/IR/DataLayout.h"
+#include "llvm/MC/MCAsmInfo.h"
#include "llvm/Support/ErrorHandling.h"
#include <algorithm>
@@ -546,6 +547,37 @@ bool DwarfExpression::addExpression(
LocationKind = Unknown;
return true;
}
+ case dwarf::DW_OP_LLVM_extract_bits: {
+ unsigned SizeInBits = Op->getArg(1);
+ unsigned BitOffset = Op->getArg(0);
+ dwarf::TypeKind Encoding = static_cast<dwarf::TypeKind>(Op->getArg(2));
+
+ // If we have a memory location then dereference to get the value
+ if (isMemoryLocation())
+ emitOp(dwarf::DW_OP_deref);
+
+ // Extract the bits by a shift left (to shift out the bits after what we
+ // want to extract) followed by shift right (to shift the bits to position
+ // 0 and also sign/zero extend). These operations are done in the DWARF
+ // "generic type" whose size is the size of a pointer.
+ unsigned PtrSizeInBytes = CU.getAsmPrinter()->MAI->getCodePointerSize();
+ unsigned LeftShift = PtrSizeInBytes * 8 - (SizeInBits + BitOffset);
+ unsigned RightShift = LeftShift + BitOffset;
+ if (LeftShift) {
+ emitOp(dwarf::DW_OP_constu);
+ emitUnsigned(LeftShift);
+ emitOp(dwarf::DW_OP_shl);
+ }
+ emitOp(dwarf::DW_OP_constu);
+ emitUnsigned(RightShift);
+ emitOp(Encoding == dwarf::DW_ATE_signed ? dwarf::DW_OP_shra
+ : dwarf::DW_OP_shr);
+
+ // The value is now at the top of the stack, so set the location to
+ // implicit so that we get a stack_value at the end.
+ LocationKind = Implicit;
+ break;
+ }
case dwarf::DW_OP_plus_uconst:
assert(!isRegisterLocation());
emitOp(dwarf::DW_OP_plus_uconst);
diff --git a/llvm/lib/IR/AsmWriter.cpp b/llvm/lib/IR/AsmWriter.cpp
index 8b1a21f962b08..4f5935de42bb0 100644
--- a/llvm/lib/IR/AsmWriter.cpp
+++ b/llvm/lib/IR/AsmWriter.cpp
@@ -2468,6 +2468,10 @@ static void writeDIExpression(raw_ostream &Out, const DIExpression *N,
if (Op.getOp() == dwarf::DW_OP_LLVM_convert) {
Out << FS << Op.getArg(0);
Out << FS << dwarf::AttributeEncodingString(Op.getArg(1));
+ } else if (Op.getOp() == dwarf::DW_OP_LLVM_extract_bits) {
+ Out << FS << Op.getArg(0);
+ Out << FS << Op.getArg(1);
+ Out << FS << dwarf::AttributeEncodingString(Op.getArg(2));
} else {
for (unsigned A = 0, AE = Op.getNumArgs(); A != AE; ++A)
Out << FS << Op.getArg(A);
diff --git a/llvm/lib/IR/DebugInfoMetadata.cpp b/llvm/lib/IR/DebugInfoMetadata.cpp
index 9bd1d7880c9f8..5e69192d5c52f 100644
--- a/llvm/lib/IR/DebugInfoMetadata.cpp
+++ b/llvm/lib/IR/DebugInfoMetadata.cpp
@@ -1402,6 +1402,8 @@ unsigned DIExpression::ExprOperand::getSize() const {
return 2;
switch (Op) {
+ case dwarf::DW_OP_LLVM_extract_bits:
+ return 4;
case dwarf::DW_OP_LLVM_convert:
case dwarf::DW_OP_LLVM_fragment:
case dwarf::DW_OP_bregx:
@@ -1474,6 +1476,7 @@ bool DIExpression::isValid() const {
case dwarf::DW_OP_LLVM_convert:
case dwarf::DW_OP_LLVM_arg:
case dwarf::DW_OP_LLVM_tag_offset:
+ case dwarf::DW_OP_LLVM_extract_bits:
case dwarf::DW_OP_constu:
case dwarf::DW_OP_plus_uconst:
case dwarf::DW_OP_plus:
diff --git a/llvm/test/DebugInfo/X86/DW_OP_LLVM_extract_bits.ll b/llvm/test/DebugInfo/X86/DW_OP_LLVM_extract_bits.ll
new file mode 100644
index 0000000000000..da0eec669b50c
--- /dev/null
+++ b/llvm/test/DebugInfo/X86/DW_OP_LLVM_extract_bits.ll
@@ -0,0 +1,99 @@
+; RUN: llc -mtriple=x86_64-unknown-linux-gnu %s -o %t -filetype=obj
+; RUN: llvm-dwarfdump --debug-info %t | FileCheck %s
+
+%struct.struct_t = type { i8 }
+
+@g = dso_local global %struct.struct_t zeroinitializer, align 1, !dbg !0
+
+; CHECK-LABEL: DW_TAG_subprogram
+; CHECK: DW_AT_name ("test1")
+; CHECK: DW_TAG_variable
+; CHECK: DW_AT_location (DW_OP_fbreg -1, DW_OP_deref, DW_OP_constu 0x3d, DW_OP_shl, DW_OP_constu 0x3d, DW_OP_shr, DW_OP_stack_value)
+; CHECK: DW_AT_name ("x")
+; CHECK: DW_TAG_variable
+; CHECK: DW_AT_location (DW_OP_fbreg -1, DW_OP_deref, DW_OP_constu 0x39, DW_OP_shl, DW_OP_constu 0x3c, DW_OP_shra, DW_OP_stack_value)
+; CHECK: DW_AT_name ("y")
+
+define i32 @test1() !dbg !13 {
+entry:
+ %0 = alloca %struct.struct_t, align 1
+ tail call void @llvm.dbg.declare(metadata ptr %0, metadata !17, metadata !DIExpression(DW_OP_LLVM_extract_bits, 0, 3, DW_ATE_unsigned)), !dbg !18
+ tail call void @llvm.dbg.declare(metadata ptr %0, metadata !19, metadata !DIExpression(DW_OP_LLVM_extract_bits, 3, 4, DW_ATE_signed)), !dbg !21
+ ret i32 0, !dbg !22
+}
+
+; CHECK-LABEL: DW_TAG_subprogram
+; CHECK: DW_AT_name ("test2")
+; CHECK: DW_TAG_variable
+; CHECK: DW_AT_location (DW_OP_breg0 RAX+0, DW_OP_constu 0xff, DW_OP_and, DW_OP_constu 0x3d, DW_OP_shl, DW_OP_constu 0x3d, DW_OP_shr, DW_OP_stack_value)
+; CHECK: DW_AT_name ("x")
+; CHECK: DW_TAG_variable
+; CHECK: DW_AT_location (DW_OP_breg0 RAX+0, DW_OP_constu 0xff, DW_OP_and, DW_OP_constu 0x39, DW_OP_shl, DW_OP_constu 0x3c, DW_OP_shra, DW_OP_stack_value)
+; CHECK: DW_AT_name ("y")
+
+define i8 @test2() !dbg !23 {
+entry:
+ %0 = load i8, ptr @g, align 1
+ tail call void @llvm.dbg.value(metadata i8 %0, metadata !24, metadata !DIExpression(DW_OP_LLVM_extract_bits, 0, 3, DW_ATE_unsigned)), !dbg !25
+ tail call void @llvm.dbg.value(metadata i8 %0, metadata !26, metadata !DIExpression(DW_OP_LLVM_extract_bits, 3, 4, DW_ATE_signed)), !dbg !27
+ ret i8 %0, !dbg !28
+}
+
+; CHECK-LABEL: DW_TAG_subprogram
+; CHECK: DW_AT_name ("test3")
+; CHECK: DW_TAG_variable
+; CHECK: DW_AT_location (DW_OP_breg0 RAX+0, DW_OP_constu 0x3f, DW_OP_shr, DW_OP_stack_value)
+; CHECK: DW_AT_name ("x")
+; CHECK: DW_TAG_variable
+; CHECK: DW_AT_location (DW_OP_breg0 RAX+0, DW_OP_constu 0x3f, DW_OP_shra, DW_OP_stack_value)
+; CHECK: DW_AT_name ("y")
+
+define i64 @test3(ptr %p) !dbg !29 {
+entry:
+ %0 = load i64, ptr %p, align 8
+ tail call void @llvm.dbg.value(metadata i64 %0, metadata !33, metadata !DIExpression(DW_OP_LLVM_extract_bits, 63, 1, DW_ATE_unsigned)), !dbg !30
+ tail call void @llvm.dbg.value(metadata i64 %0, metadata !34, metadata !DIExpression(DW_OP_LLVM_extract_bits, 63, 1, DW_ATE_signed)), !dbg !31
+ ret i64 %0, !dbg !32
+}
+
+declare void @llvm.dbg.declare(metadata, metadata, metadata)
+declare void @llvm.dbg.value(metadata, metadata, metadata)
+
+!llvm.dbg.cu = !{!2}
+!llvm.module.flags = !{!11, !12}
+
+!0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression())
+!1 = distinct !DIGlobalVariable(name: "g", scope: !2, file: !3, line: 6, type: !5, isLocal: false, isDefinition: true)
+!2 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !3, isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, globals: !4, splitDebugInlining: false, nameTableKind: None)
+!3 = !DIFile(filename: "DW_OP_bit_piece.cpp", directory: "./")
+!4 = !{!0}
+!5 = !DIDerivedType(tag: DW_TAG_typedef, name: "struct_t", file: !3, line: 4, baseType: !6)
+!6 = distinct !DICompositeType(tag: DW_TAG_structure_type, file: !3, line: 1, size: 8, flags: DIFlagTypePassByValue, elements: !7, identifier: "_ZTS8struct_t")
+!7 = !{!8, !10}
+!8 = !DIDerivedType(tag: DW_TAG_member, name: "x", scope: !6, file: !3, line: 2, baseType: !9, size: 3, flags: DIFlagBitField, extraData: i64 0)
+!9 = !DIBasicType(name: "unsigned int", size: 32, encoding: DW_ATE_unsigned)
+!10 = !DIDerivedType(tag: DW_TAG_member, name: "y", scope: !6, file: !3, line: 3, baseType: !9, size: 4, offset: 3, flags: DIFlagBitField, extraData: i64 0)
+!11 = !{i32 7, !"Dwarf Version", i32 5}
+!12 = !{i32 2, !"Debug Info Version", i32 3}
+!13 = distinct !DISubprogram(name: "test1", linkageName: "test1", scope: !3, file: !3, line: 8, type: !14, scopeLine: 8, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !2, retainedNodes: !16)
+!14 = !DISubroutineType(types: !15)
+!15 = !{!9}
+!16 = !{}
+!17 = !DILocalVariable(name: "x", scope: !13, file: !3, line: 9, type: !9)
+!18 = !DILocation(line: 9, column: 9, scope: !13)
+!19 = !DILocalVariable(name: "y", scope: !13, file: !3, line: 9, type: !20)
+!20 = !DIBasicType(name: "signed int", size: 32, encoding: DW_ATE_signed)
+!21 = !DILocation(line: 9, column: 12, scope: !13)
+!22 = !DILocation(line: 10, column: 3, scope: !13)
+!23 = distinct !DISubprogram(name: "test2", linkageName: "test2", scope: !3, file: !3, line: 8, type: !14, scopeLine: 8, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !2, retainedNodes: !16)
+!24 = !DILocalVariable(name: "x", scope: !23, file: !3, line: 9, type: !9)
+!25 = !DILocation(line: 9, column: 9, scope: !23)
+!26 = !DILocalVariable(name: "y", scope: !23, file: !3, line: 9, type: !20)
+!27 = !DILocation(line: 9, column: 12, scope: !23)
+!28 = !DILocation(line: 10, column: 3, scope: !23)
+!29 = distinct !DISubprogram(name: "test3", linkageName: "test3", scope: !3, file: !3, line: 8, type: !14, scopeLine: 8, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !2, retainedNodes: !16)
+!30 = !DILocation(line: 9, column: 9, scope: !29)
+!31 = !DILocation(line: 9, column: 12, scope: !29)
+!32 = !DILocation(line: 10, column: 3, scope: !29)
+!33 = !DILocalVariable(name: "x", scope: !29, file: !3, line: 9, type: !9)
+!34 = !DILocalVariable(name: "y", scope: !29, file: !3, line: 9, type: !20)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this approach for its readability and because it's a lot simpler than mixing this up with the composition operators.
Can you explain why the signedness matters?
; CHECK-LABEL: DW_TAG_subprogram | ||
; CHECK: DW_AT_name ("test3") | ||
; CHECK: DW_TAG_variable | ||
; CHECK: DW_AT_location (DW_OP_breg0 RAX+0, DW_OP_constu 0x3f, DW_OP_shr, DW_OP_stack_value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test seems to hardcode some register allocation assumptions. What do you think about converting it to MIR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like MIR doesn't handle multiple debug values in the same stack location. The MIR generated for test1 (by running DW_OP_LLVM_extract_bits.ll through llc with --stop-after=patchable-function) has
stack:
- { id: 0, name: '', type: default, offset: -9, size: 1, alignment: 1,
stack-id: default, callee-saved-register: '', callee-saved-restored: true,
debug-info-variable: '!17!19', debug-info-expression: '!DIExpression(DW_OP_LLVM_extract_bits, 0, 3, DW_ATE_unsigned)!DIExpression(DW_OP_LLVM_extract_bits, 3, 4, DW_ATE_signed)',
debug-info-location: '!18!21' }
which llc then gives an error for on reading it:
error: tmp.mir:130:32: expected end of string after the metadata node
debug-info-variable: '!17!19', debug-info-expression: '!DIExpression(DW_OP_LLVM_extract_bits, 0, 3, DW_ATE_unsigned)!DIExpression(DW_OP_LLVM_extract_bits, 3, 4, DW_ATE_signed)',
^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, this is a silly bug - I remember seeing it some time ago, I thought it had been fixed! This isn't a technical limitation of MIR in-memory, but an error in how it is printed (and possibly parsed) that should be fixed soon; I'll take a look at it soon. In the meanwhile, you could use regex (something like {{R[^+]+}}
?) to match any 64 bit register.
If we have
then if y isn't sign-extended, looking at the values in a debugger we have
which is wrong. If it's sign-extended we have
which is correct. |
I see: the value of the 4-bit signed field is I think I'm fine with this proposal! It would be good to hear at least one more LGTM from others in the @llvm/pr-subscribers-debuginfo group before landing this. |
In case it's relevant to you have you seen #61981? Variable locations for structured bindings are currently broken if the alloca gets SROA'd. I've got a private patch that needs some reworking that I've not had time to put upstream. I think there are other issues too (I can't recall the details, but I think it's related to converting address offsets to bit-shifts after mem2reg, or the lack thereof). I think it makes sense to add "high level" DIExpression operations like this and this one seem reasonable. I'd feel more confident about the patch if I could see the whole stack, but github does make that somewhat difficult (perhaps you could share a link to a branch with all the patches you plan on submitting, if it's not too much of a faff?). LGTM from me too but I'd prefer to wait a few days before accepting to see if anyone else wants to chip in. |
I agree with this, we aren't squeezed for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Broadly LGTM, with the question about the operator encoding outstanding.
; CHECK-LABEL: DW_TAG_subprogram | ||
; CHECK: DW_AT_name ("test3") | ||
; CHECK: DW_TAG_variable | ||
; CHECK: DW_AT_location (DW_OP_breg0 RAX+0, DW_OP_constu 0x3f, DW_OP_shr, DW_OP_stack_value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, this is a silly bug - I remember seeing it some time ago, I thought it had been fixed! This isn't a technical limitation of MIR in-memory, but an error in how it is printed (and possibly parsed) that should be fixed soon; I'll take a look at it soon. In the meanwhile, you could use regex (something like {{R[^+]+}}
?) to match any 64 bit register.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM.
This operation extracts a number of bits at a given offset and sign or zero extends them, which is done by emitting it as a left shift followed by a right shift.
This is being added for use in clang for C++ structured bindings of bitfields that have offset or size that aren't a byte multiple. A new operation is being added, instead of shifts being used directly, as it makes correctly handling it in optimisations (which will be done in a later patch) much easier.