mirrored from git://gcc.gnu.org/git/gcc.git
Merge latest upstream changes into master #16
Closed
Conversation
update teams branch from upstream
also add checksums
Merge upstream gcc-mirror/gcc into sourceryinstitute/gcc
Merge sourceryinstitute/master into sourceryinstitute/teams
Merge master into download-opencoarrays-mpich
habemus-papadum pushed a commit to habemus-papadum/gcc that referenced this pull request on Nov 29, 2017:
This patch improves alloca alignment. Currently alloca reserves too much space, as it aligns twice and generates unnecessary stack alignment code. When the requested alignment is lower than the stack alignment, no extra alignment is needed. If the requested alignment is higher, we need to increase the size by the difference between the requested alignment and the stack alignment. As a result, the alloca alignment is exactly as expected:

alloca (16):

        sub     sp, sp, #16
        mov     x1, sp

alloca (x):

        add     x0, x0, 15
        and     x0, x0, -16
        sub     sp, sp, x0
        mov     x0, sp

__builtin_alloca_with_align (x, 512):

        add     x0, x0, 63
        and     x0, x0, -16
        sub     sp, sp, x0
        add     x0, sp, 63
        and     x0, x0, -64

gcc/
        * explow.c (get_dynamic_stack_size): Improve dynamic alignment.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@251713 138bc75d-0d04-0410-961f-82ee72b054a4
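The three shapes above correspond to GCC's dynamic stack allocation builtins. As a minimal, hedged C++ sketch of source that exercises them (the function names and the external consume() sink are illustrative, not from the patch):

    // Compile with g++ -O2 on aarch64 and inspect the generated asm.
    extern void consume(void *p);   // assumed sink so the allocas survive

    void fixed_size(void) {
      consume(__builtin_alloca(16));            // the "alloca (16)" case
    }

    void variable_size(unsigned long n) {
      consume(__builtin_alloca(n));             // the "alloca (x)" case
    }

    void over_aligned(unsigned long n) {
      // The alignment argument is in bits: 512 bits = 64 bytes, which is
      // higher than the 16-byte stack alignment, so extra rounding code
      // is still required (the last sequence above).
      consume(__builtin_alloca_with_align(n, 512));
    }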
hubot pushed a commit that referenced this pull request on Feb 8, 2018:
The shrinkwrap optimization added in GCC 7 allows each callee-save to be delayed and done only across blocks which need a particular callee-save. Although this reduces unnecessary memory traffic on code paths that need few callee-saves, it typically uses LDR/STR rather than LDP/STP. This means more memory accesses and increased codesize, ~1.0% on average. To improve this, if a particular callee-save must be saved/restored, also add the adjacent callee-save to allow use of LDP/STP. This significantly reduces codesize (for example gcc_r, povray_r, parest_r, xalancbmk_r are 1% smaller).

This is a simple fix which can be backported. A more advanced approach would scan blocks for pairs of callee-saves, but that requires a full rewrite of all the callee-save code, which is too late at this stage.

An example epilog in a shrinkwrapped function before:

        ldp     x21, x22, [sp,#16]
        ldr     x23, [sp,#32]
        ldr     x24, [sp,#40]
        ldp     x25, x26, [sp,#48]
        ldr     x27, [sp,#64]
        ldr     x28, [sp,#72]
        ldr     x30, [sp,#80]
        ldr     d8, [sp,#88]
        ldp     x19, x20, [sp],#96
        ret

And after this patch:

        ldr     d8, [sp,#88]
        ldp     x21, x22, [sp,#16]
        ldp     x23, x24, [sp,#32]
        ldp     x25, x26, [sp,#48]
        ldp     x27, x28, [sp,#64]
        ldr     x30, [sp,#80]
        ldp     x19, x20, [sp],#96
        ret

gcc/
        * config/aarch64/aarch64.c (aarch64_components_for_bb): Increase LDP/STP opportunities by adding adjacent callee-saves.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@257482 138bc75d-0d04-0410-961f-82ee72b054a4
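As a hedged C++ sketch (not from the patch) of the kind of function where shrinkwrapping, and hence this pairing heuristic, matters: the early return needs no callee-saves, while the cold path keeps enough values live across a call to force several of them:

    extern long heavy_work(long, long, long, long,
                           long, long, long, long);

    long maybe_work(const long *p) {
      if (p[0] == 0)
        return 0;            // fast path: no callee-saved registers used
      // Cold path: these values are live across the call, so they land in
      // callee-saved registers; with the patch their saves/restores can be
      // emitted as LDP/STP pairs instead of single LDR/STR.
      long a = p[1], b = p[2], c = p[3], d = p[4];
      long e = p[5], f = p[6], g = p[7], h = p[8];
      long r = heavy_work(a, b, c, d, e, f, g, h);
      return r + a + b + c + d + e + f + g + h;
    }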
kraj pushed a commit to kraj/gcc that referenced this pull request on Oct 12, 2020:
Prevents the following UBSAN error:
./xgcc -B. /home/marxin/Programming/gcc/gcc/testsuite/g++.dg/torture/pr49770.C -O2 -c
/home/marxin/Programming/gcc2/gcc/ipa-modref-tree.h:482:22: runtime error: load of value 2, which is not a valid value for type 'bool'
#0 0x1fdb4d1 in modref_tree<int>::merge(modref_tree<int>*, vec<modref_parm_map, va_heap, vl_ptr>*) /home/marxin/Programming/gcc2/gcc/ipa-modref-tree.h:482
#1 0x1fcadaa in merge_call_side_effects(modref_summary*, gimple*, modref_summary*, bool) /home/marxin/Programming/gcc2/gcc/ipa-modref.c:511
#2 0x1fcbadd in analyze_call /home/marxin/Programming/gcc2/gcc/ipa-modref.c:642
#3 0x1fcc061 in analyze_stmt /home/marxin/Programming/gcc2/gcc/ipa-modref.c:732
#4 0x1fccf31 in analyze_function /home/marxin/Programming/gcc2/gcc/ipa-modref.c:823
#5 0x1fd17e5 in execute /home/marxin/Programming/gcc2/gcc/ipa-modref.c:1441
#6 0x25cca6e in execute_one_pass(opt_pass*) /home/marxin/Programming/gcc2/gcc/passes.c:2509
#7 0x25cd39b in execute_pass_list_1 /home/marxin/Programming/gcc2/gcc/passes.c:2597
#8 0x25cd450 in execute_pass_list_1 /home/marxin/Programming/gcc2/gcc/passes.c:2598
#9 0x25cd4ee in execute_pass_list(function*, opt_pass*) /home/marxin/Programming/gcc2/gcc/passes.c:2608
#10 0x25c7a5a in do_per_function_toporder(void (*)(function*, void*), void*) /home/marxin/Programming/gcc2/gcc/passes.c:1726
#11 0x25cfa3f in execute_ipa_pass_list(opt_pass*) /home/marxin/Programming/gcc2/gcc/passes.c:2941
#12 0x173572d in ipa_passes /home/marxin/Programming/gcc2/gcc/cgraphunit.c:2642
#13 0x17364ee in symbol_table::compile() /home/marxin/Programming/gcc2/gcc/cgraphunit.c:2777
#14 0x17372d9 in symbol_table::finalize_compilation_unit() /home/marxin/Programming/gcc2/gcc/cgraphunit.c:3022
#15 0x2a1f00a in compile_file /home/marxin/Programming/gcc2/gcc/toplev.c:485
#16 0x2a27dc8 in do_compile /home/marxin/Programming/gcc2/gcc/toplev.c:2321
#17 0x2a283cc in toplev::main(int, char**) /home/marxin/Programming/gcc2/gcc/toplev.c:2460
#18 0x54f21cd in main /home/marxin/Programming/gcc2/gcc/main.c:39
#19 0x7ffff6f0de09 in __libc_start_main ../csu/libc-start.c:314
#20 0x9eac09 in _start (/home/marxin/Programming/gcc2/objdir/gcc/cc1plus+0x9eac09)
gcc/ChangeLog:
* ipa-modref.c (merge_call_side_effects): Clear modref_parm_map
fields in the vector.
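As a standalone C++ sketch of the bug class (using std::vector in place of GCC's vec, and mirroring only the two relevant modref_parm_map fields): pushing elements with indeterminate members and later reading a bool member is undefined behavior, which UBSAN reports exactly as above; the fix is to clear the fields before use:

    #include <vector>

    struct parm_map {
      int parm_index;
      bool parm_offset_known;   // reading this while uninitialized is UB
    };

    void fill(std::vector<parm_map> &v, int n) {
      for (int i = 0; i < n; i++) {
        parm_map m{};           // value-initialization clears every field
        m.parm_index = i;       // parm_offset_known is now a valid 'false'
        v.push_back(m);
      }
    }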
mablinov pushed a commit to mablinov/gcc that referenced this pull request on Oct 29, 2021:
darwinpcs packs some stack items, which means that one cannot guarantee that they are aligned to DI. Check for these cases and reject PRFM instructions then. Note that this generally results in use of an extra temporary reg. clang uses 'PRFUM' instructions in those cases, so we have a missed optimisation opportunity (low priority). Fixes issue #16.
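A toy C++ illustration of the shape of the check (names are illustrative; the real patch inspects the alignment GCC can prove for the memory operand): the aligned-offset PRFM form is only valid when the address is known to be DImode (8-byte) aligned, so darwinpcs-packed stack slots must take the fallback path:

    // Accept the PRFM addressing form only for provably 8-byte-aligned
    // accesses; otherwise the backend must use PRFUM or a temporary reg.
    bool can_use_prfm(unsigned known_align_bytes) {
      return known_align_bytes >= 8;   // DImode alignment on AArch64
    }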
nstester pushed a commit to nstester/gcc that referenced this pull request on Nov 1, 2021:
This patch gets CSE to re-use constants already inside a vector rather than
re-materializing the constant again.
Basically consider the following case:
#include <stdint.h>
#include <arm_neon.h>
uint64_t
test (uint64_t a, uint64x2_t b, uint64x2_t* rt)
{
uint64_t arr[2] = { 0x0942430810234076UL, 0x0942430810234076UL};
uint64_t res = a | arr[0];
uint64x2_t val = vld1q_u64 (arr);
*rt = vaddq_u64 (val, b);
return res;
}
The actual behavior is inconsequential; however, notice that the same constants are used in the vector (arr and later val) and in the calculation of res. The code we generate for this, however, is quite sub-optimal:
test:
adrp x2, .LC0
sub sp, sp, #16
ldr q1, [x2, #:lo12:.LC0]
mov x2, 16502
movk x2, 0x1023, lsl 16
movk x2, 0x4308, lsl 32
add v1.2d, v1.2d, v0.2d
movk x2, 0x942, lsl 48
orr x0, x0, x2
str q1, [x1]
add sp, sp, 16
ret
.LC0:
.xword 667169396713799798
.xword 667169396713799798
Essentially we materialize the same constant twice. The reason is that the front-end lowers the constant extracted from arr[0] quite early on. If you look into the result of fre you'll find:
<bb 2> :
arr[0] = 667169396713799798;
arr[1] = 667169396713799798;
res_7 = a_6(D) | 667169396713799798;
_16 = __builtin_aarch64_ld1v2di (&arr);
_17 = VIEW_CONVERT_EXPR<uint64x2_t>(_16);
_11 = b_10(D) + _17;
*rt_12(D) = _11;
arr ={v} {CLOBBER};
return res_7;
This makes sense for further optimization. However, come expand time, if the constant isn't representable on the target architecture it will be assigned to a register again:
(insn 8 5 9 2 (set (reg:V2DI 99)
(const_vector:V2DI [
(const_int 667169396713799798 [0x942430810234076]) repeated x2
])) "cse.c":7:12 -1
(nil))
...
(insn 14 13 15 2 (set (reg:DI 103)
(const_int 667169396713799798 [0x942430810234076])) "cse.c":8:12 -1
(nil))
(insn 15 14 16 2 (set (reg:DI 102 [ res ])
(ior:DI (reg/v:DI 96 [ a ])
(reg:DI 103))) "cse.c":8:12 -1
(nil))
And since it's out of the immediate range of the scalar instruction used, combine won't be able to do anything here. This then triggers the re-materialization of the constant twice.
To fix this, this patch extends CSE to be able to generate an extract of a constant from another vector, or to make a vector for a constant by duplicating another constant.
Whether this transformation is done depends entirely on the target's costing for the different constants and operations.
I initially also investigated doing this in PRE, but PRE requires at least two BBs to work, does not currently have any way to remove redundancies within a single BB, and it did not look easy to add such support.
gcc/ChangeLog:
* cse.c (add_to_set): New.
(find_sets_in_insn): Register constants in sets.
(canonicalize_insn): Use auto_vec instead.
(cse_insn): Try materializing using vec_dup.
* rtl.h (simplify_context::simplify_gen_vec_select,
simplify_gen_vec_select): New.
* simplify-rtx.c (simplify_context::simplify_gen_vec_select): New.
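As a hedged C++ illustration (using GNU vector extensions, not the patch itself) of the effect being enabled: once the V2DI constant is materialized for the vector use, the scalar use can become a lane extract instead of a second materialization of the 64-bit immediate:

    #include <cstdint>

    typedef uint64_t v2di __attribute__((vector_size(16)));

    uint64_t test(uint64_t a, v2di b, v2di *rt) {
      const v2di dup = { 0x0942430810234076ULL, 0x0942430810234076ULL };
      *rt = dup + b;       // vector use materializes the constant once
      return a | dup[0];   // scalar use can now reuse lane 0 of 'dup'
    }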
mablinov pushed a commit to mablinov/gcc that referenced this pull request on Nov 10, 2021:
darwinpcs packs some stack items, which means that one cannot guarantee that they are aligned to DI. Check for these cases and reject PRFM instructions then. Note that this generally results in use of an extra temporary reg. clang uses 'PRFUM' instructions in those cases, so we have a missed optimisation opportunity (low priority). Fixes issue #16.
fxcoudert pushed a commit to fxcoudert/gcc that referenced this pull request on Nov 23, 2021:
darwinpcs packs some stack items, which means that one cannot guarantee that they are aligned to DI. Check for these cases and reject PRFM instructions then. Note that this generally results in use of an extra temporary reg. clang uses 'PRFUM' instructions in those cases, so we have a missed optimisation opportunity (low priority). Fixes issue #16.
xionghul pushed a commit to xionghul/gcc that referenced this pull request on Feb 22, 2022:
The problem in this PR is that we call VPSEL with a mask of vector
type instead of HImode. This happens because operand 3 in vcond_mask
is the pre-computed vector comparison and has vector type.
This patch fixes it by implementing TARGET_VECTORIZE_GET_MASK_MODE,
returning the appropriate VxBI mode when targeting MVE. In turn, this
implies implementing vec_cmp<mode><MVE_vpred>,
vec_cmpu<mode><MVE_vpred> and vcond_mask_<mode><MVE_vpred>, and we can
move vec_cmp<mode><v_cmp_result>, vec_cmpu<mode><mode> and
vcond_mask_<mode><v_cmp_result> back to neon.md since they are not
used by MVE anymore. The new *<MVE_vpred> patterns listed above are
implemented in mve.md since they are only valid for MVE. However this
may make maintenance/comparison more painful than having all of them
in vec-common.md.
In the process, we can get rid of the recently added vcond_mve
parameter of arm_expand_vector_compare.
Compared to neon.md's vcond_mask_<mode><v_cmp_result> before my "arm:
Auto-vectorization for MVE: vcmp" patch (r12-834), it keeps the VDQWH
iterator added in r12-835 (to have V4HF/V8HF support), as well as the
(!<Is_float_mode> || flag_unsafe_math_optimizations) condition which
was not present before r12-834 although SF modes were enabled by VDQW
(I think this was a bug).
Using TARGET_VECTORIZE_GET_MASK_MODE has the advantage that we no
longer need to generate vpsel with vectors of 0 and 1: the masks are
now merged via scalar 'ands' instructions operating on 16-bit masks
after converting the boolean vectors.
In addition, this patch fixes a problem in arm_expand_vcond() where
the result would be a vector of 0 or 1 instead of operand 1 or 2.
Since we want to skip gcc.dg/signbit-2.c for MVE, we also add a new
arm_mve effective target.
Reducing the number of iterations in pr100757-3.c from 32 to 8, we
generate the code below:
float a[32];
float fn1(int d) {
float c = 4.0f;
for (int b = 0; b < 8; b++)
if (a[b] != 2.0f)
c = 5.0f;
return c;
}
fn1:
ldr r3, .L3+48
vldr.64 d4, .L3 // q2=(2.0,2.0,2.0,2.0)
vldr.64 d5, .L3+8
vldrw.32 q0, [r3] // q0=a(0..3)
adds r3, r3, #16
vcmp.f32 eq, q0, q2 // cmp a(0..3) == (2.0,2.0,2.0,2.0)
vldrw.32 q1, [r3] // q1=a(4..7)
vmrs r3, P0
vcmp.f32 eq, q1, q2 // cmp a(4..7) == (2.0,2.0,2.0,2.0)
vmrs r2, P0 @ movhi
ands r3, r3, r2 // r3=select(a(0..3)) & select(a(4..7))
vldr.64 d4, .L3+16 // q2=(5.0,5.0,5.0,5.0)
vldr.64 d5, .L3+24
vmsr P0, r3
vldr.64 d6, .L3+32 // q3=(4.0,4.0,4.0,4.0)
vldr.64 d7, .L3+40
vpsel q3, q3, q2 // q3=vcond_mask(4.0,5.0)
vmov.32 r2, q3[1] // keep the scalar max
vmov.32 r0, q3[3]
vmov.32 r3, q3[2]
vmov.f32 s11, s12
vmov s15, r2
vmov s14, r3
vmaxnm.f32 s15, s11, s15
vmaxnm.f32 s15, s15, s14
vmov s14, r0
vmaxnm.f32 s15, s15, s14
vmov r0, s15
bx lr
.L4:
.align 3
.L3:
.word 1073741824 // 2.0f
.word 1073741824
.word 1073741824
.word 1073741824
.word 1084227584 // 5.0f
.word 1084227584
.word 1084227584
.word 1084227584
.word 1082130432 // 4.0f
.word 1082130432
.word 1082130432
.word 1082130432
This patch adds tests that trigger an ICE without this fix.
The pr100757*.c testcases are derived from
gcc.c-torture/compile/20160205-1.c, forcing the use of MVE, and using
various types and return values different from 0 and 1 to avoid
commonalization with boolean masks. In addition, since we should not
need these masks, the tests make sure they are not present.
Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.
2022-02-22 Christophe Lyon <[email protected]>
PR target/100757
gcc/
* config/arm/arm-protos.h (arm_get_mask_mode): New prototype.
(arm_expand_vector_compare): Update prototype.
* config/arm/arm.cc (TARGET_VECTORIZE_GET_MASK_MODE): New.
(arm_vector_mode_supported_p): Add support for VxBI modes.
(arm_expand_vector_compare): Remove useless generation of vpsel.
(arm_expand_vcond): Fix select operands.
(arm_get_mask_mode): New.
* config/arm/mve.md (vec_cmp<mode><MVE_vpred>): New.
(vec_cmpu<mode><MVE_vpred>): New.
(vcond_mask_<mode><MVE_vpred>): New.
* config/arm/vec-common.md (vec_cmp<mode><v_cmp_result>)
(vec_cmpu<mode><mode>, vcond_mask_<mode><v_cmp_result>): Move to ...
* config/arm/neon.md (vec_cmp<mode><v_cmp_result>)
(vec_cmpu<mode><mode>, vcond_mask_<mode><v_cmp_result>): ... here
and disable for MVE.
* doc/sourcebuild.texi (arm_mve): Document new effective-target.
gcc/testsuite/
PR target/100757
* gcc.target/arm/simd/pr100757-2.c: New.
* gcc.target/arm/simd/pr100757-3.c: New.
* gcc.target/arm/simd/pr100757-4.c: New.
* gcc.target/arm/simd/pr100757.c: New.
* gcc.dg/signbit-2.c: Skip when targeting ARM/MVE.
* lib/target-supports.exp (check_effective_target_arm_mve): New.
catap pushed a commit to catap/gcc that referenced this pull request on Feb 22, 2022:
darwinpcs packs some stack items, which means that one cannot guarantee that they are aligned to DI. Check for these cases and reject PRFM instructions then. Note that this generally results in use of an extra temporary reg. clang uses 'PRFUM' instructions in those cases, so we have a missed optimisation opportunity (low priority). Fixes issue #16.
(cherry picked from commit 534aad5033dc224ed96118b67a84d496bba500ca)
markmentovai pushed a commit to markmentovai/gcc that referenced this pull request on Jun 13, 2022:
darwinpcs packs some stack items, which means that one cannot guarantee that they are aligned to DI. Check for these cases and reject PRFM instructions then. Note that this generally results in use of an extra temporary reg. clang uses 'PRFUM' instructions in those cases, so we have a missed optimisation opportunity (low priority). Fixes issue #16.
catap pushed a commit to catap/gcc that referenced this pull request on May 3, 2023:
darwinpcs packs some stack items, which means that one cannot guarantee that they are aligned to DI. Check for these cases and reject PRFM instructions then. Note that this generally results in use of an extra temporary reg. clang uses 'PRFUM' instructions in those cases, so we have a missed optimisation opportunity (low priority). Fixes issue #16.
(cherry picked from commit 534aad5033dc224ed96118b67a84d496bba500ca)
catap pushed a commit to catap/gcc that referenced this pull request on May 3, 2023:
darwinpcs packs some stack items, which means that one cannot guarantee that they are aligned to DI. Check for these cases and reject PRFM instructions then. Note that this generally results in use of an extra temporary reg. clang uses 'PRFUM' instructions in those cases, so we have a missed optimisation opportunity (low priority). Fixes issue #16.
(cherry picked from commit 534aad5033dc224ed96118b67a84d496bba500ca)
catap pushed a commit to catap/gcc that referenced this pull request on May 3, 2023:
darwinpcs packs some stack items, which means that one cannot guarantee that they are aligned to DI. Check for these cases and reject PRFM instructions then. Note that this generally results in use of an extra temporary reg. clang uses 'PRFUM' instructions in those cases, so we have a missed optimisation opportunity (low priority). Fixes issue #16.
(cherry picked from commit 534aad5033dc224ed96118b67a84d496bba500ca)
catap pushed a commit to catap/gcc that referenced this pull request on May 3, 2023:
darwinpcs packs some stack items, which means that one cannot guarantee that they are aligned to DI. Check for these cases and reject PRFM instructions then. Note that this generally results in use of an extra temporary reg. clang uses 'PRFUM' instructions in those cases, so we have a missed optimisation opportunity (low priority). Fixes issue #16.
(cherry picked from commit 534aad5033dc224ed96118b67a84d496bba500ca)
catap pushed a commit to catap/gcc that referenced this pull request on May 3, 2023:
darwinpcs packs some stack items, which means that one cannot guarantee that they are aligned to DI. Check for these cases and reject PRFM instructions then. Note that this generally results in use of an extra temporary reg. clang uses 'PRFUM' instructions in those cases, so we have a missed optimisation opportunity (low priority). Fixes issue #16.
(cherry picked from commit 534aad5033dc224ed96118b67a84d496bba500ca)
catap pushed a commit to catap/gcc that referenced this pull request on May 4, 2023:
darwinpcs packs some stack items, which means that one cannot guarantee that they are aligned to DI. Check for these cases and reject PRFM instructions then. Note that this generally results in use of an extra temporary reg. clang uses 'PRFUM' instructions in those cases, so we have a missed optimisation opportunity (low priority). Fixes issue #16.
(cherry picked from commit 534aad5033dc224ed96118b67a84d496bba500ca)
catap pushed a commit to catap/gcc that referenced this pull request on May 4, 2023:
darwinpcs packs some stack items, which means that one cannot guarantee that they are aligned to DI. Check for these cases and reject PRFM instructions then. Note that this generally results in use of an extra temporary reg. clang uses 'PRFUM' instructions in those cases, so we have a missed optimisation opportunity (low priority). Fixes issue #16.
(cherry picked from commit 534aad5033dc224ed96118b67a84d496bba500ca)
Signed-off-by: Kirill A. Korinsky <[email protected]>
catap pushed a commit to catap/gcc that referenced this pull request on Nov 12, 2023:
darwinpcs packs some stack items, which means that one cannot guarantee that they are aligned to DI. Check for these cases and reject PRFM instructions then. Note that this generally results in use of an extra temporary reg. clang uses 'PRFUM' instructions in those cases, so we have a missed optimisation opportunity (low priority). Fixes issue #16.
(cherry picked from commit 534aad5033dc224ed96118b67a84d496bba500ca)
Signed-off-by: Kirill A. Korinsky <[email protected]>
catap pushed a commit to catap/gcc that referenced this pull request on Nov 14, 2023:
darwinpcs packs some stack items, which means that one cannot guarantee that they are aligned to DI. Check for these cases and reject PRFM instructions then. Note that this generally results in use of an extra temporary reg. clang uses 'PRFUM' instructions in those cases, so we have a missed optimisation opportunity (low priority). Fixes issue #16.
(cherry picked from commit 534aad5033dc224ed96118b67a84d496bba500ca)
Signed-off-by: Kirill A. Korinsky <[email protected]>
NinaRanns referenced this pull request in NinaRanns/gcc on Jul 31, 2024:
infiniloop and contract_specifier_seq parsing
hubot pushed a commit that referenced this pull request on Dec 6, 2024:
This test would fail if GCC is configured with non-default options, such as -mtune=cortex-a9. This 'unexpected' scheduling makes the DLSTP optimization generate

        subs    lr, #16
        bhi     .L4
        lctp
        pop     {r4, r5, pc}
.L4:
        sub     ip, ip, #16
        b       <loop-begin>

instead of the expected

        sub     ip, ip, #16
        letp    lr, <loop-begin>

Although GCC still optimizes all 144 loops, only 96 use letp; the other 48 use lctp. The patch simply forces -mtune=cortex-m55 to avoid this unexpected issue.

gcc/testsuite/ChangeLog:
        * gcc.target/arm/mve/dlstp-compile-asm-1.c: Add -mtune=cortex-m55.
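A hedged sketch of what the testsuite change amounts to; directives other than the added -mtune option may differ from the real dlstp-compile-asm-1.c:

    /* { dg-options "-O2 -mtune=cortex-m55" } */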
kees added a commit to kees/gcc that referenced this pull request on Aug 14, 2025:
Implement AArch64-specific KCFI backend providing runtime validation of indirect function calls with ARM exception handling infrastructure.

Core AArch64 KCFI Features:
* Function preamble generation using .word directives for type ID storage at -4 byte offset from function entry point (no prefix NOPs needed due to 4-byte instruction alignment)
* Enhanced debugging through ESR (Exception Syndrome Register) encoding in BRK instruction immediate values for precise failure analysis
* Scratch register allocation using w16/w17 (x16/x17) following the AArch64 procedure call standard for intra-procedure-call registers
* Support for both regular calls (BLR) and sibling calls (BR) with appropriate register usage and jump instructions

Assembly Code Generation:
* Atomic bundled KCFI check + call/branch sequences using UNSPECV_KCFI_CHECK to prevent optimizer separation and maintain security properties
* Constant loading for type IDs using MOV/MOVK instruction pairs for values requiring 32-bit representation
* Direct comparison approach using CMP instruction for type validation without arithmetic operations (contrast with x86_64's additive approach)

Assembly Code Pattern for AArch64:

        ldur    w16, [target, #-4]              ; Load actual type ID from preamble
        mov     w17, #type_id_low               ; Load expected type (lower 16 bits)
        movk    w17, #type_id_high, lsl #16     ; Load upper 16 bits if needed
        cmp     w16, w17                        ; Compare type IDs directly
        b.eq    .Lpass                          ; Branch if types match
.Ltrap:
        brk     #esr_value                      ; Enhanced trap with register info
.Lpass:
        blr/br  target                          ; Execute validated indirect transfer

ESR (Exception Syndrome Register) Integration:
* BRK instruction immediate encoding format: 0x8000 | ((TypeIndex & 31) << 5) | (AddrIndex & 31)
* TypeIndex indicates which W register contains the expected type (W17 = 17)
* AddrIndex indicates which X register contains the target address (0-30)
* Example: brk #33313 (0x8221) = expected type in W17, target address in X1
* Enables kernel exception handlers to precisely identify KCFI violation context
* Supports advanced debugging and forensic analysis of control flow attacks

AArch64-Specific Optimizations:
* No prefix NOP calculation needed due to natural 4-byte instruction alignment
* Type ID storage using a single .word directive in function preambles
* Register allocator integration via explicit w16/w17 clobber annotations
* Function label emission coordination through ASM_OUTPUT_FUNCTION_LABEL macro redirection to aarch64_declare_function_name() for preamble integration
* Support for large immediate values with MOV/MOVK instruction generation

Target Hook Implementation:
* aarch64_kcfi_calculate_prefix_nops(): Returns 0 (no alignment needed)
* aarch64_kcfi_gen_checked_call(): Bundled check+call RTL generation
* aarch64_kcfi_emit_type_id_instruction(): .word directive emission
* aarch64_kcfi_add_clobbers(): w16/w17 register constraint management
* Integration in aarch64_override_options() for initialization

Machine Description Integration:
* UNSPECV_KCFI_CHECK unspec for atomic check+call bundling
* Support for both regular calls and sibling calls with distinct patterns
* Runtime ESR value calculation for accurate register encoding
* Clobber specifications for w16 (loaded type) and w17 (expected type)

Security Properties:
* Direct comparison-based type validation with immediate trap on mismatch
* Enhanced exception context through ESR encoding for precise failure analysis
* Tamper-resistant type ID storage with ARM exception infrastructure integration
* Support for cross-compilation and accurate register allocation across targets

Build and run tested with Linux kernel ARCH=arm64.

Signed-off-by: Kees Cook <[email protected]>
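A small self-checking C++ rendition of the ESR immediate formula quoted above, confirming the commit's worked example (expected type in W17 and target address in X1 gives brk #33313, i.e. 0x8221):

    #include <cstdio>

    // 0x8000 | ((TypeIndex & 31) << 5) | (AddrIndex & 31)
    constexpr unsigned kcfi_esr(unsigned type_reg, unsigned addr_reg) {
      return 0x8000u | ((type_reg & 31u) << 5) | (addr_reg & 31u);
    }

    static_assert(kcfi_esr(17, 1) == 0x8221, "matches the commit example");

    int main() {
      std::printf("brk #%u (0x%x)\n", kcfi_esr(17, 1), kcfi_esr(17, 1));
      return 0;
    }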
kees added a commit to kees/gcc that referenced this pull request on Aug 19, 2025:
Implement AArch64-specific KCFI backend providing runtime validation of indirect function calls with ARM exception handling infrastructure. (Identical in substance to the Aug 14 commit above: core AArch64 KCFI features, assembly code generation, the same KCFI check assembly pattern, ESR integration with the 0x8000 | ((TypeIndex & 31) << 5) | (AddrIndex & 31) encoding, AArch64-specific optimizations, target hooks, machine description integration, and security properties.)

Build and run tested with Linux kernel ARCH=arm64.

Signed-off-by: Kees Cook <[email protected]>
kees added a commit to kees/gcc that referenced this pull request on Sep 1, 2025:
Implement AArch64-specific KCFI backend.

- Function preamble generation using .word directives for type ID storage at an offset from the function entry point (no prefix NOPs needed due to 4-byte instruction alignment).
- Trap debugging through ESR (Exception Syndrome Register) encoding in BRK instruction immediate values for precise failure analysis.
- Scratch register allocation using w16/w17 (x16/x17) following the AArch64 procedure call standard for intra-procedure-call registers.
- Support for both regular calls (BLR) and sibling calls (BR) with appropriate register usage and jump instructions.
- Atomic bundled KCFI check + call/branch sequences using UNSPECV_KCFI_CHECK to prevent optimizer separation and maintain security properties.

Assembly Code Pattern for AArch64:

        ldur    w16, [target, #-4]              ; Load actual type ID from preamble
        mov     w17, #type_id_low               ; Load expected type (lower 16 bits)
        movk    w17, #type_id_high, lsl #16     ; Load upper 16 bits if needed
        cmp     w16, w17                        ; Compare type IDs directly
        b.eq    .Lpass                          ; Branch if types match
.Ltrap:
        brk     #esr_value                      ; Enhanced trap with register info
.Lpass:
        blr/br  target                          ; Execute validated indirect transfer

ESR (Exception Syndrome Register) Integration:
- BRK instruction immediate encoding format: 0x8000 | ((TypeIndex & 31) << 5) | (AddrIndex & 31)
- TypeIndex indicates which W register contains the expected type (W17 = 17)
- AddrIndex indicates which X register contains the target address (0-30)
- Example: brk #33313 (0x8221) = expected type in W17, target address in X1

Like x86, the callback initialization in aarch64_override_options() seems hacky. Is there a better place for this?

Build and run tested with Linux kernel ARCH=arm64.

Signed-off-by: Kees Cook <[email protected]>
kees added a commit to kees/gcc that referenced this pull request on Sep 5, 2025:
Implement AArch64-specific KCFI backend.

- Function preamble generation using .word directives for type ID storage at an offset from the function entry point (no default alignment NOPs needed due to the fixed 4-byte instruction size).
- Trap debugging through ESR (Exception Syndrome Register) encoding in BRK instruction immediate values.
- Scratch register allocation using w16/w17 (x16/x17) following the AArch64 procedure call standard for intra-procedure-call registers.

Assembly Code Pattern for AArch64:

        ldur    w16, [target, #-4]              ; Load actual type ID from preamble
        mov     w17, #type_id_low               ; Load expected type (lower 16 bits)
        movk    w17, #type_id_high, lsl #16     ; Load upper 16 bits if needed
        cmp     w16, w17                        ; Compare type IDs directly
        b.eq    .Lpass                          ; Branch if types match
.Ltrap:
        brk     #esr_value                      ; Enhanced trap with register info
.Lpass:
        blr/br  target                          ; Execute validated indirect transfer

ESR (Exception Syndrome Register) Integration:
- BRK instruction immediate encoding format: 0x8000 | ((TypeIndex & 31) << 5) | (AddrIndex & 31)
- TypeIndex indicates which W register contains the expected type (W17 = 17)
- AddrIndex indicates which X register contains the target address (0-30)
- Example: brk #33313 (0x8221) = expected type in W17, target address in X1

Build and run tested with Linux kernel ARCH=arm64.

gcc/ChangeLog:
        * config/aarch64/aarch64-protos.h: Declare aarch64_indirect_branch_asm and KCFI helpers.
        * config/aarch64/aarch64.cc (aarch64_expand_call): Wrap CALLs in KCFI, with clobbers.
        (aarch64_indirect_branch_asm): New function; extract common logic for branch asm, like the existing call asm helper.
        (aarch64_output_kcfi_insn): Emit KCFI assembly.
        * config/aarch64/aarch64.md: Add KCFI RTL patterns and replace open-coded branch emission with aarch64_indirect_branch_asm.
        * doc/invoke.texi: Document aarch64 nuances.

Signed-off-by: Kees Cook <[email protected]>
kees added a commit to kees/gcc that referenced this pull request on Sep 5, 2025:
Implement AArch64-specific KCFI backend. (Identical in substance to the previous Sep 5 commit above: preamble .word type IDs, ESR-encoded BRK trap debugging, w16/w17 scratch registers, the same KCFI check assembly pattern and ESR encoding, and the same gcc/ChangeLog entries.)

Build and run tested with Linux kernel ARCH=arm64.

Signed-off-by: Kees Cook <[email protected]>