-
Notifications
You must be signed in to change notification settings - Fork 15.3k
[bolt][aarch64] simplify rodata/literal load for X86 & AArch64 #165723
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@llvm/pr-subscribers-bolt Author: Alexey Moksyakov (yavtuk) Changesldr reg, literal instruction is limited +/- 1MB range, emitCI put the constants by the end of function and the one is out of available range. Full diff: https://github.com/llvm/llvm-project/pull/165723.diff 1 Files Affected:
diff --git a/bolt/test/AArch64/materialize-constant.s b/bolt/test/AArch64/materialize-constant.s
new file mode 100644
index 0000000000000..1c15626b09594
--- /dev/null
+++ b/bolt/test/AArch64/materialize-constant.s
@@ -0,0 +1,74 @@
+// this test checks a load literal instructions changed to movk
+
+// REQUIRES: system-linux
+
+# RUN: llvm-mc -filetype=obj -triple aarch64-unknown-unknown %s -o %t.o
+
+# RUN: link_fdata %s %t.o %t.fdata
+# RUN: %clang %cflags -pie %t.o -o %t.exe -Wl,-q -Wl,-z,relro -Wl,-z,now
+# RUN: llvm-bolt %t.exe -o %t.bolt -data %t.fdata \
+# RUN: --keep-nops --eliminate-unreachable=false
+# RUN: llvm-objdump --disassemble-symbols=foo %t.bolt | FileCheck %s
+
+# CHECK: mov{{.*}} w19, #0
+# CHECK-NEXT: mov{{.*}} w22, #0
+# CHECK-NEXT: movk{{.*}} w23, #0, lsl #16
+# CHECK-NEXT: movk{{.*}} w23, #100
+# CHECK-NEXT: movk{{.*}} w24, #0, lsl #16
+# CHECK-NEXT: movk{{.*}} w24, #3
+
+ .text
+ .align 4
+ .local foo
+ .type foo, %function
+foo:
+# FDATA: 1 main 0 1 foo 0 0 10
+ stp x29, x30, [sp, #-32]!
+ stp x19, x20, [sp, #16]
+ mov x29, sp
+
+ mov w19, #0 // counter = 0
+ mov w22, #0 // result = 0
+
+ ldr w23, .Llimit
+ ldr w24, .LStep
+ b .LStub
+
+.LConstants:
+ .Llimit: .word 100
+ .LStep: .word 3
+
+.LStub:
+.rep 0x100000
+ nop
+.endr
+ b .Lmain_loop
+
+.Lmain_loop:
+ madd w22, w19, w24, w22 // result += counter * increment
+
+ add w19, w19, #1
+ cmp w19, w23
+ b.lt .Lmain_loop
+
+ mov w0, w22
+
+ b .Lreturn_point
+
+.Lreturn_point:
+ ldp x19, x20, [sp, #16]
+ ldp x29, x30, [sp], #32
+ ret
+.size foo, .-foo
+
+
+ .global main
+ .type main, %function
+main:
+ mov x0, #0
+ bl foo
+ mov x0, 0
+ mov w8, #93
+ svc #0
+
+ .size main, .-main
|
|
hi folks, I've faced with the issue related to ldr reg, literal instruction inside function with constant island. after emitCI the constants are placed by the end of the function and the distance to target for ldr is out of range. I think such cases are possibly for ADR instruction either and also after long jump pass. |
|
Thanks for the patch! Wondering if it would be useful to combine this "ldr->movk" transformation with some sort of ldr relaxation (as in #165787), particularly for cases where the constant island content has (dynamic) relocation and need to be fixed up? |
|
@yavtuk, great idea! If this is integrated into Extra step on top will be elimination of the constant island that is no longer referenced. Running |
|
It would be good to handle this in the suggested pass, also reporting how many times it was applied. From discussion, this doesn't appear to be a common case. |
ldr reg, literal instruction is limited +/- 1MB range, emitCI put the constants by the end of function and the one is out of available range.
82564f4 to
95e84b2
Compare
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
🐧 Linux x64 Test Results
Failed Tests(click on a test name to see its output) BOLTBOLT.AArch64/materialize-constant.sBOLT.X86/rodata-simpl-loads.testBOLT.X86/srol-bug.testIf these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the |
ec9d1a5 to
54f11ca
Compare
54f11ca to
4ef587a
Compare
|
@maksfb @yozhu @paschalis-mpeis please take a look when you have a time |
| uint32_t Offset = TargetAddress - DataSection->getAddress(); | ||
| StringRef ConstantData = DataSection->getContents(); | ||
| const InstructionListType Instrs = | ||
| MIB->materializeConstant(Inst, ConstantData, Offset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For x86, could we keep the original code which directly update the Instr in place? The new code calling materializeConstant() will call replaceMemOperandWithImm(), create a new Instr and copy over the new instruction operand by operand, find the original instruction in basic block, and then replace it with the new instruction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to use extra wrapper to unify for all platform, replaceMemOperandWithImm is used in another optimisations passes related to X86
| I = {8, 4, AArch64::MOVKXi}; | ||
| break; | ||
| default: | ||
| llvm_unreachable("unexpected ldr instruction"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could there be any other PC relative LDR instructions hit the default case, like byte or half-word load? Just in case, maybe replace llvm_unreachable() with a comment saying unsupported and then return Insts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't find instructions related to byte or half wold, now only 2 instructions are handled.
we can analyse the apps to figure out which and how much are used, I faced only 4 and 8 bytes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we will be able to add extra logic as necessary
| b .LStub | ||
|
|
||
| // CI moved by emitCI function to the end of the function | ||
| // without materialization CI is outside available range (+/-1MB) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "without" -> "Without"
|
|
||
| .size _start, .-_start | ||
|
|
||
| #--- materialize-ci-outside-func.s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this test case is almost the same as the previous one, except for the many nop's under .LStub. Could we keep the first copy and wrap the .LStub block with #if defined(...) and #endif?
Thanks for the patch, and sorry for the delay on review. I left some comments in the code; and overall LGTM. |
4ef587a to
d80a5e6
Compare
|
@yozhu do we need one more verification pass to check that the distance is valid for target? |
b2d22b9 to
e96818d
Compare
This patch fixed the issue related to load literal for AArch64 (bolt/test/AArch64/materialize-constant.s), address range for literal is limited +/- 1MB, emitCI puts the constants by the end of function and the one is out of available range. SimplifyRODataLoads is enabled by default for X86 & AArch64 Signed-off-by: Moksyakov Alexey <[email protected]>
e96818d to
de57782
Compare
ldr reg, literal instruction is limited +/- 1MB range, emitCI put the constants by the end of function and the one is out of available range.