Commit b19cfb9

[llvm-debuginfo-analyzer] Add support for WebAssembly binary format. (#82588)
Add support for the WebAssembly binary format and be able to generate logical views. #69181 The README.txt includes information about how to build the test cases.
1 parent aff0570 commit b19cfb9

25 files changed, 3286 additions and 8 deletions

llvm/docs/CommandGuide/llvm-debuginfo-analyzer.rst

Lines changed: 313 additions & 3 deletions
@@ -16,7 +16,7 @@ DESCRIPTION
1616
binary object files and prints their contents in a logical view, which
1717
is a human readable representation that closely matches the structure
1818
of the original user source code. Supported object file formats include
19-
ELF, Mach-O, PDB and COFF.
19+
ELF, Mach-O, WebAssembly, PDB and COFF.
2020

2121
The **logical view** abstracts the complexity associated with the
2222
different low-level representations of the debugging information that
@@ -468,8 +468,9 @@ If the <pattern> criteria is too general, a more selective option can
468468
be specified to target a particular category of elements:
469469
lines (:option:`--select-lines`), scopes (:option:`--select-scopes`),
470470
symbols (:option:`--select-symbols`) and types (:option:`--select-types`).
471+
471472
These options require knowledge of the debug information format (DWARF,
472-
CodeView, COFF), as the given **kind** describes a very specific type
473+
CodeView), as the given **kind** describes a very specific type
473474
of element.
474475

475476
LINES
@@ -598,7 +599,7 @@ When comparing logical views created from different debug formats, its
598599
accuracy depends on how close the debug information represents the
599600
user code. For instance, a logical view created from a binary file with
600601
DWARF debug information may include more detailed data than a logical
601-
view created from a binary file with CodeView/COFF debug information.
602+
view created from a binary file with CodeView debug information.
602603

603604
The following options describe the elements to compare.
604605

@@ -1952,6 +1953,315 @@ The **{Coverage}** and **{Location}** attributes describe the debug
19521953
location and coverage for logical symbols. For optimized code, the
19531954
coverage value decreases and it affects the program debuggability.
19541955

1956+
WEBASSEMBLY SUPPORT
1957+
~~~~~~~~~~~~~~~~~~~
1958+
The following example shows the WebAssembly output generated by
1959+
:program:`llvm-debuginfo-analyzer`. The example was compiled for a
1960+
WebAssembly 32-bit target with Clang (-O0 -g --target=wasm32):
1961+
1962+
.. code-block:: c++
1963+
1964+
1 using INTPTR = const int *;
1965+
2 int foo(INTPTR ParamPtr, unsigned ParamUnsigned, bool ParamBool) {
1966+
3 if (ParamBool) {
1967+
4 typedef int INTEGER;
1968+
5 const INTEGER CONSTANT = 7;
1969+
6 return CONSTANT;
1970+
7 }
1971+
8 return ParamUnsigned;
1972+
9 }
1973+
1974+
PRINT BASIC DETAILS
1975+
^^^^^^^^^^^^^^^^^^^
1976+
The following command prints basic details for all the logical elements,
1977+
sorted by their internal debug information offset; the output includes
1978+
each element's lexical level and the debug info format.
1979+
1980+
.. code-block:: none
1981+
1982+
llvm-debuginfo-analyzer --attribute=level,format
1983+
--output-sort=offset
1984+
--print=scopes,symbols,types,lines,instructions
1985+
test-clang.wasm
1986+
1987+
or
1988+
1989+
.. code-block:: none
1990+
1991+
llvm-debuginfo-analyzer --attribute=level,format
1992+
--output-sort=offset
1993+
--print=elements
1994+
test-clang.wasm
1995+
1996+
Each row represents an element that is present within the debug
1997+
information. The first column represents the scope level, followed by
1998+
the associated line number (if any), and finally the description of
1999+
the element.
2000+
2001+
.. code-block:: none
2002+
2003+
Logical View:
2004+
[000] {File} 'test-clang.wasm' -> WASM
2005+
2006+
[001] {CompileUnit} 'test.cpp'
2007+
[002] 2 {Function} extern not_inlined 'foo' -> 'int'
2008+
[003] 2 {Parameter} 'ParamPtr' -> 'INTPTR'
2009+
[003] 2 {Parameter} 'ParamUnsigned' -> 'unsigned int'
2010+
[003] 2 {Parameter} 'ParamBool' -> 'bool'
2011+
[003] {Block}
2012+
[004] 5 {Variable} 'CONSTANT' -> 'const INTEGER'
2013+
[004] 5 {Line}
2014+
[004] {Code} 'i32.const 7'
2015+
[004] {Code} 'local.set 10'
2016+
[004] {Code} 'local.get 5'
2017+
[004] {Code} 'local.get 10'
2018+
[004] {Code} 'i32.store 12'
2019+
[004] 6 {Line}
2020+
[004] {Code} 'i32.const 7'
2021+
[004] {Code} 'local.set 11'
2022+
[004] {Code} 'local.get 5'
2023+
[004] {Code} 'local.get 11'
2024+
[004] {Code} 'i32.store 28'
2025+
[004] {Code} 'br 1'
2026+
[004] - {Line}
2027+
[004] {Code} 'end'
2028+
[003] 4 {TypeAlias} 'INTEGER' -> 'int'
2029+
[003] 2 {Line}
2030+
[003] {Code} 'nop'
2031+
[003] {Code} 'end'
2032+
[003] {Code} 'i64.div_s'
2033+
[003] {Code} 'global.get 0'
2034+
[003] {Code} 'local.set 3'
2035+
[003] {Code} 'i32.const 32'
2036+
[003] {Code} 'local.set 4'
2037+
[003] {Code} 'local.get 3'
2038+
[003] {Code} 'local.get 4'
2039+
[003] {Code} 'i32.sub'
2040+
[003] {Code} 'local.set 5'
2041+
[003] {Code} 'local.get 5'
2042+
[003] {Code} 'local.get 0'
2043+
[003] {Code} 'i32.store 24'
2044+
[003] {Code} 'local.get 5'
2045+
[003] {Code} 'local.get 1'
2046+
[003] {Code} 'i32.store 20'
2047+
[003] {Code} 'local.get 2'
2048+
[003] {Code} 'local.set 6'
2049+
[003] {Code} 'local.get 5'
2050+
[003] {Code} 'local.get 6'
2051+
[003] {Code} 'i32.store8 19'
2052+
[003] 3 {Line}
2053+
[003] {Code} 'local.get 5'
2054+
[003] {Code} 'i32.load8_u 19'
2055+
[003] {Code} 'local.set 7'
2056+
[003] 3 {Line}
2057+
[003] {Code} 'i32.const 1'
2058+
[003] {Code} 'local.set 8'
2059+
[003] {Code} 'local.get 7'
2060+
[003] {Code} 'local.get 8'
2061+
[003] {Code} 'i32.and'
2062+
[003] {Code} 'local.set 9'
2063+
[003] {Code} 'block'
2064+
[003] {Code} 'block'
2065+
[003] {Code} 'local.get 9'
2066+
[003] {Code} 'i32.eqz'
2067+
[003] {Code} 'br_if 0'
2068+
[003] 8 {Line}
2069+
[003] {Code} 'local.get 5'
2070+
[003] {Code} 'i32.load 20'
2071+
[003] {Code} 'local.set 12'
2072+
[003] 8 {Line}
2073+
[003] {Code} 'local.get 5'
2074+
[003] {Code} 'local.get 12'
2075+
[003] {Code} 'i32.store 28'
2076+
[003] - {Line}
2077+
[003] {Code} 'end'
2078+
[003] 9 {Line}
2079+
[003] {Code} 'local.get 5'
2080+
[003] {Code} 'i32.load 28'
2081+
[003] {Code} 'local.set 13'
2082+
[003] {Code} 'local.get 13'
2083+
[003] {Code} 'return'
2084+
[003] {Code} 'end'
2085+
[003] 9 {Line}
2086+
[003] {Code} 'unreachable'
2087+
[002] 1 {TypeAlias} 'INTPTR' -> '* const int'
2088+
2089+
SELECT LOGICAL ELEMENTS
2090+
^^^^^^^^^^^^^^^^^^^^^^^
2091+
The following prints all *instructions*, *symbols* and *types* that
2092+
contain **'block'** or **'.store'** in their names or types, using a tab
2093+
layout and giving the number of matches.
2094+
2095+
.. code-block:: none
2096+
2097+
llvm-debuginfo-analyzer --attribute=level
2098+
--select-nocase --select-regex
2099+
--select=BLOCK --select=.store
2100+
--report=list
2101+
--print=symbols,types,instructions,summary
2102+
test-clang.wasm
2103+
2104+
Logical View:
2105+
[000] {File} 'test-clang.wasm'
2106+
2107+
[001] {CompileUnit} 'test.cpp'
2108+
[003] {Code} 'block'
2109+
[003] {Code} 'block'
2110+
[004] {Code} 'i32.store 12'
2111+
[003] {Code} 'i32.store 20'
2112+
[003] {Code} 'i32.store 24'
2113+
[004] {Code} 'i32.store 28'
2114+
[003] {Code} 'i32.store 28'
2115+
[003] {Code} 'i32.store8 19'
2116+
2117+
-----------------------------
2118+
Element Total Printed
2119+
-----------------------------
2120+
Scopes 3 0
2121+
Symbols 4 0
2122+
Types 2 0
2123+
Lines 62 8
2124+
-----------------------------
2125+
Total 71 8
2126+
2127+
COMPARISON MODE
2128+
^^^^^^^^^^^^^^^
2129+
In the previous example, the scope location of the **'typedef int
2130+
INTEGER'** is invalid. This debug information issue was found by
2131+
comparing against the output of another compiler.
2132+
2133+
Using GCC to generate test-dwarf-gcc.o, we can apply a selection pattern
2134+
with the printing mode to obtain the following logical view output.
2135+
2136+
.. code-block:: none
2137+
2138+
llvm-debuginfo-analyzer --attribute=level
2139+
--select-regex --select-nocase --select=INTe
2140+
--report=list
2141+
--print=symbols,types
2142+
test-clang.wasm test-dwarf-gcc.o
2143+
2144+
Logical View:
2145+
[000] {File} 'test-clang.wasm'
2146+
2147+
[001] {CompileUnit} 'test.cpp'
2148+
[003] 4 {TypeAlias} 'INTEGER' -> 'int'
2149+
[004] 5 {Variable} 'CONSTANT' -> 'const INTEGER'
2150+
2151+
Logical View:
2152+
[000] {File} 'test-dwarf-gcc.o'
2153+
2154+
[001] {CompileUnit} 'test.cpp'
2155+
[004] 4 {TypeAlias} 'INTEGER' -> 'int'
2156+
[004] 5 {Variable} 'CONSTANT' -> 'const INTEGER'
2157+
2158+
The output shows that both objects contain the same elements, but the
2159+
**'typedef INTEGER'** is located at a different scope level. The
2160+
GCC-generated object shows **'4'**, which is the correct value.
2161+
2162+
There are two comparison methods: logical view and logical elements.
2163+
2164+
LOGICAL VIEW
2165+
""""""""""""
2166+
It compares the logical view as a whole unit; for a match, each compared
2167+
logical element must have the same parents and children.
2168+
2169+
The output shows, in view form, the **missing (-)** and **added (+)**
2170+
elements, giving more context by swapping the reference and target object files.
2171+
2172+
.. code-block:: none
2173+
2174+
llvm-debuginfo-analyzer --attribute=level
2175+
--compare=types
2176+
--report=view
2177+
--print=symbols,types
2178+
test-clang.wasm test-dwarf-gcc.o
2179+
2180+
Reference: 'test-clang.wasm'
2181+
Target: 'test-dwarf-gcc.o'
2182+
2183+
Logical View:
2184+
[000] {File} 'test-clang.wasm'
2185+
2186+
[001] {CompileUnit} 'test.cpp'
2187+
[002] 1 {TypeAlias} 'INTPTR' -> '* const int'
2188+
[002] 2 {Function} extern not_inlined 'foo' -> 'int'
2189+
[003] {Block}
2190+
[004] 5 {Variable} 'CONSTANT' -> 'const INTEGER'
2191+
+[004] 4 {TypeAlias} 'INTEGER' -> 'int'
2192+
[003] 2 {Parameter} 'ParamBool' -> 'bool'
2193+
[003] 2 {Parameter} 'ParamPtr' -> 'INTPTR'
2194+
[003] 2 {Parameter} 'ParamUnsigned' -> 'unsigned int'
2195+
-[003] 4 {TypeAlias} 'INTEGER' -> 'int'
2196+
2197+
The output shows the merging view path (reference and target) with the
2198+
missing and added elements.
2199+
2200+
LOGICAL ELEMENTS
2201+
""""""""""""""""
2202+
It compares individual logical elements without considering whether their
2203+
parents are the same. For both comparison methods, the equality criteria
2204+
include the name, source code location, type and lexical scope level.
2205+
2206+
.. code-block:: none
2207+
2208+
llvm-debuginfo-analyzer --attribute=level
2209+
--compare=types
2210+
--report=list
2211+
--print=symbols,types,summary
2212+
test-clang.wasm test-dwarf-gcc.o
2213+
2214+
Reference: 'test-clang.wasm'
2215+
Target: 'test-dwarf-gcc.o'
2216+
2217+
(1) Missing Types:
2218+
-[003] 4 {TypeAlias} 'INTEGER' -> 'int'
2219+
2220+
(1) Added Types:
2221+
+[004] 4 {TypeAlias} 'INTEGER' -> 'int'
2222+
2223+
----------------------------------------
2224+
Element Expected Missing Added
2225+
----------------------------------------
2226+
Scopes 4 0 0
2227+
Symbols 0 0 0
2228+
Types 2 1 1
2229+
Lines 0 0 0
2230+
----------------------------------------
2231+
Total 6 1 1
2232+
2233+
Changing the *Reference* and *Target* order:
2234+
2235+
.. code-block:: none
2236+
2237+
llvm-debuginfo-analyzer --attribute=level
2238+
--compare=types
2239+
--report=list
2240+
--print=symbols,types,summary
2241+
test-dwarf-gcc.o test-clang.wasm
2242+
2243+
Reference: 'test-dwarf-gcc.o'
2244+
Target: 'test-clang.wasm'
2245+
2246+
(1) Missing Types:
2247+
-[004] 4 {TypeAlias} 'INTEGER' -> 'int'
2248+
2249+
(1) Added Types:
2250+
+[003] 4 {TypeAlias} 'INTEGER' -> 'int'
2251+
2252+
----------------------------------------
2253+
Element Expected Missing Added
2254+
----------------------------------------
2255+
Scopes 4 0 0
2256+
Symbols 0 0 0
2257+
Types 2 1 1
2258+
Lines 0 0 0
2259+
----------------------------------------
2260+
Total 6 1 1
2261+
2262+
As the *Reference* and *Target* are switched, the *Added Types* from
2263+
the first case are now listed as *Missing Types*.
2264+
19552265
EXIT STATUS
19562266
-----------
19572267
:program:`llvm-debuginfo-analyzer` returns 0 if the input files were

llvm/include/llvm/DebugInfo/LogicalView/Readers/LVBinaryReader.h

Lines changed: 42 additions & 0 deletions
@@ -122,6 +122,48 @@ class LVBinaryReader : public LVReader {
122122
std::unique_ptr<MCContext> MC;
123123
std::unique_ptr<MCInstPrinter> MIP;
124124

125+
// https://yurydelendik.github.io/webassembly-dwarf/
126+
// 2. Consuming and Generating DWARF for WebAssembly Code
127+
// Note: Some DWARF constructs don't map one-to-one onto WebAssembly
128+
// constructs. We strive to enumerate and resolve any ambiguities here.
129+
//
130+
// 2.1. Code Addresses
131+
// Note: DWARF associates various bits of debug info
132+
// with particular locations in the program via its code address (instruction
133+
// pointer or PC). However, WebAssembly's linear memory address space does not
134+
// contain WebAssembly instructions.
135+
//
136+
// Wherever a code address (see 2.17 of [DWARF]) is used in DWARF for
137+
// WebAssembly, it must be the offset of an instruction relative within the
138+
// Code section of the WebAssembly file. The DWARF is considered malformed if
139+
// a PC offset is between instruction boundaries within the Code section.
140+
//
141+
// Note: It is expected that a DWARF consumer does not know how to decode
142+
// WebAssembly instructions. The instruction pointer is selected as the offset
143+
// in the binary file of the first byte of the instruction, and it is
144+
// consistent with the WebAssembly Web API conventions definition of the code
145+
// location.
146+
//
147+
// EXAMPLE: .DEBUG_LINE INSTRUCTION POINTERS
148+
// The .debug_line DWARF section maps instruction pointers to source
149+
// locations. With WebAssembly, the .debug_line section maps Code
150+
// section-relative instruction offsets to source locations.
151+
//
152+
// EXAMPLE: DW_AT_* ATTRIBUTES
153+
// For entities with a single associated code address, DWARF uses
154+
// the DW_AT_low_pc attribute to specify the associated code address value.
155+
// For WebAssembly, the DW_AT_low_pc's value is a Code section-relative
156+
// instruction offset.
157+
//
158+
// For entities with a single contiguous range of code, DWARF uses a
159+
// pair of DW_AT_low_pc and DW_AT_high_pc attributes to specify the associated
160+
// contiguous range of code address values. For WebAssembly, these attributes
161+
// are Code section-relative instruction offsets.
162+
//
163+
// For entities with multiple ranges of code, DWARF uses the DW_AT_ranges
164+
// attribute, which refers to the array located at the .debug_ranges section.
165+
LVAddress WasmCodeSectionOffset = 0;
166+
125167
// Loads all info for the architecture of the provided object file.
126168
Error loadGenericTargetInfo(StringRef TheTriple, StringRef TheFeatures);
127169
