Description
As discovered recently by CKI, somewhere between df202b4 and 9886142 ftrace became broken on ppc64le clang-built kernels.
The first perf_event_open(2) on the ftrace event succeeds and prints a message like the following to the console:
[ 503.027859] ftrace-powerpc: Unexpected call sequence at 0000000020516279: 00000000 00000000
[ 503.027870] ------------[ ftrace bug ]------------
[ 503.027871] ftrace failed to modify
[ 503.027873] [<c0080000010e002c>] ipmi_addr_src_to_str+0x24/0x70 [ipmi_msghandler]
[ 503.027885] actual: 08:00:00:48
[ 503.027894] Setting ftrace call site to call ftrace function
[ 503.027895] ftrace record flags: 80000001
[ 503.027897] (1)
[ 503.027897] expected tramp: c000000000078f94
[ 503.027904] ------------[ cut here ]------------
[ 503.027906] WARNING: CPU: 2 PID: 17017 at kernel/trace/ftrace.c:2084 ftrace_bug+0x154/0x310
[ 503.027954] Modules linked in: tun af_key crypto_user scsi_transport_iscsi xt_multiport ip_gre ip_tunnel gre bluetooth ecdh_generic sctp ip6_udp_tunnel udp_tunnel overlay xt_CONNSECMARK xt_SECMARK xt_conntrack nft_compat ah6 ah4 nft_objref nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ip_tables vfat fat jfs rfkill bonding tls sunrpc pseries_rng crct10dif_vpmsum ibmveth drm fuse drm_panel_orientation_quirks zram xfs ibmvscsi nx_compress_pseries nx_compress scsi_transport_srp vmx_crypto crc32c_vpmsum ipmi_devintf ipmi_msghandler [last unloaded: setest_module_request]
[ 503.028105] CPU: 2 PID: 17017 Comm: perf_event Tainted: G OE 5.19.0-rc1 #1
[ 503.028114] NIP: c0000000002e8d14 LR: c0000000002e8df8 CTR: c000000000332f40
[ 503.028121] REGS: c0000000166ff5f0 TRAP: 0700 Tainted: G OE (5.19.0-rc1)
[ 503.028129] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 4a822022 XER: 00000000
[ 503.028157] CFAR: c0000000002e8dfc IRQMASK: 0
[ 503.028157] GPR00: c0000000002e8df8 c0000000166ff890 c000000002959000 0000000000000022
[ 503.028157] GPR04: 0000000000000001 0000000000000001 0000000000000027 ffffffffffffffff
[ 503.028157] GPR08: 0000000000000000 0000000000000000 0000000000000063 c0000000166ff6b0
[ 503.028157] GPR12: 0000000000800000 c00000001ecad680 0000000000000000 0000000000000000
[ 503.028157] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 503.028157] GPR20: c000000002b09000 c00000009ceece40 0000000000000000 0000000000000000
[ 503.028157] GPR24: c000000002993060 c000000002997898 c000000002b09000 c000000002ac9000
[ 503.028157] GPR28: ffffffffffffffea c0000000015611a1 c000000016540000 c0000000166ffb20
[ 503.028278] NIP [c0000000002e8d14] ftrace_bug+0x154/0x310
[ 503.028286] LR [c0000000002e8df8] ftrace_bug+0x238/0x310
[ 503.028293] Call Trace:
[ 503.028297] [c0000000166ff890] [c0000000002e8df8] ftrace_bug+0x238/0x310 (unreliable)
[ 503.028310] [c0000000166ff920] [c0000000002e9990] ftrace_modify_all_code+0x80/0x1e0
[ 503.028321] [c0000000166ff960] [c0000000000786c8] arch_ftrace_update_code+0x18/0x40
[ 503.028333] [c0000000166ff980] [c0000000002e9d0c] ftrace_startup+0x10c/0x1a0
[ 503.028343] [c0000000166ff9d0] [c0000000002eef0c] register_ftrace_function+0x7c/0xc0
[ 503.028354] [c0000000166ffa10] [c000000000332ebc] perf_ftrace_event_register+0x6c/0xf0
[ 503.028365] [c0000000166ffa50] [c0000000003321f0] perf_trace_event_init+0xb0/0x420
[ 503.028375] [c0000000166ffac0] [c0000000003320ec] perf_trace_init+0x10c/0x160
[ 503.028386] [c0000000166ffb20] [c0000000003fb694] perf_tp_event_init+0x54/0xa0
[ 503.028396] [c0000000166ffb50] [c0000000003fdf8c] perf_try_init_event+0x9c/0x1b0
[ 503.028407] [c0000000166ffba0] [c0000000003fd9b8] perf_init_event+0x118/0x200
[ 503.028417] [c0000000166ffc00] [c0000000003f268c] perf_event_alloc+0x41c/0x940
[ 503.028428] [c0000000166ffc90] [c0000000003f18f4] __do_sys_perf_event_open+0x2f4/0x9d0
[ 503.028439] [c0000000166ffda0] [c00000000002c6c0] system_call_exception+0x1b0/0x330
[ 503.028451] [c0000000166ffe10] [c00000000000c63c] system_call_common+0xec/0x250
[ 503.028462] --- interrupt: c00 at 0x7fff9a1bd4a4
[ 503.028470] NIP: 00007fff9a1bd4a4 LR: 0000000010010550 CTR: 0000000000000000
[ 503.028477] REGS: c0000000166ffe80 TRAP: 0c00 Tainted: G OE (5.19.0-rc1)
[ 503.028484] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 24002848 XER: 00000000
[ 503.028521] IRQMASK: 0
[ 503.028521] GPR00: 000000000000013f 00007ffffab17250 00007fff9a2c6e00 00007ffffab17388
[ 503.028521] GPR04: ffffffffffffffff 0000000000000000 ffffffffffffffff 0000000000000000
[ 503.028521] GPR08: ffffffffffffffff 0000000000000000 0000000000000000 0000000000000000
[ 503.028521] GPR12: 0000000000000000 00007fff9a40b360 0000000000000000 0000000000000000
[ 503.028521] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 503.028521] GPR20: 0000000000000000 0000000000000000 0000000000000000 000000001001057c
[ 503.028521] GPR24: 00007fff9a3ff7d0 00007fff9a400000 00007ffffab17888 0000000000000003
[ 503.028521] GPR28: 00007ffffab17bd8 00007ffffab17868 0000000000000003 00007ffffab17250
[ 503.028638] NIP [00007fff9a1bd4a4] 0x7fff9a1bd4a4
[ 503.028644] LR [0000000010010550] 0x10010550
[ 503.028651] --- interrupt: c00
[ 503.028656] Instruction dump:
[ 503.028663] 7fa5eb78 7fa6eb78 38635f64 3884a9d5 4bf29cdd 60000000 3c620017 806362b8
[ 503.028688] 3863ffff 28030003 4081000c 408a0028 <0fe00000> 7c6307b4 3c82fff8 3884d2b8
[ 503.028712] ---[ end trace 0000000000000000 ]---
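The `actual: 08:00:00:48` line in the report above can be decoded by hand, which may help when comparing it against what ftrace expected to find at the call site. The following is a minimal sketch, assuming the bytes are printed in memory order and the kernel is little-endian (the `LE` flag in the MSR line suggests so). The helper names `decode_le_insn` and `describe_branch` are hypothetical and exist only for this illustration; only I-form branches (primary opcode 18) are handled.

```shell
# Turn colon-separated bytes in memory order (as printed after "actual:")
# into a 32-bit instruction word, assuming a little-endian kernel.
decode_le_insn() {
    old_ifs=$IFS
    IFS=:
    set -- $1            # split "08:00:00:48" into $1..$4
    IFS=$old_ifs
    printf '0x%s%s%s%s\n' "$4" "$3" "$2" "$1"
}

# Describe a PowerPC I-form branch (primary opcode 18): b, bl, ba, bla.
# Returns non-zero for any other instruction.
describe_branch() {
    word=$(( $1 ))
    opcode=$(( (word >> 26) & 0x3f ))
    if [ "$opcode" -ne 18 ]; then
        echo "not an I-form branch (primary opcode $opcode)"
        return 1
    fi
    disp=$(( word & 0x03fffffc ))
    # Sign-extend the 26-bit displacement.
    if [ $(( disp & 0x02000000 )) -ne 0 ]; then
        disp=$(( disp - 0x04000000 ))
    fi
    mnemonic=b
    if [ $(( word & 1 )) -eq 1 ]; then mnemonic="${mnemonic}l"; fi   # LK bit
    if [ $(( (word >> 1) & 1 )) -eq 1 ]; then
        # AA bit set: the target is an absolute address.
        printf '%sa 0x%x\n' "$mnemonic" "$disp"
    else
        printf '%s .%+d\n' "$mnemonic" "$disp"
    fi
}

describe_branch "$(decode_le_insn 08:00:00:48)"   # prints: b .+8
```

Under these assumptions the instruction at the failing call site is the word `0x48000008`, an unconditional relative branch that skips the following instruction.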
Subsequent attempts to open the event then fail with ENODEV. One can trigger such a perf_event_open(2) call as follows:
dnf install -y libselinux-devel # (or similar)
curl -Lo perf_event.c https://github.com/SELinuxProject/selinux-testsuite/raw/master/tests/perf_event/perf_event.c
cc -o perf_event perf_event.c -lselinux
./perf_event 0 1
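When re-running the reproducer on several kernel builds (for example, while narrowing down the commit range), it can be handy to check the kernel log for the failure banner automatically. This is only a sketch: the helper name `has_ftrace_bug` is hypothetical, and it assumes the banner text matches the log shown in this report and that `dmesg` is readable by the current user.

```shell
# Succeeds if the kernel log supplied on stdin contains the ftrace failure
# banner seen in this report.
has_ftrace_bug() {
    grep -q -e 'ftrace bug' -e 'ftrace failed to modify'
}

# Possible usage after running the reproducer (not executed here):
#   ./perf_event 0 1
#   if dmesg | has_ftrace_bug; then echo "ftrace is broken on this kernel"; fi
```

Because the banner is only printed on the first failing attempt, the check is most reliable right after a fresh boot.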
Sample kernel config: http://s3.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/2022/07/07/582362614/redhat:582362614/redhat:582362614_ppc64le/.config
Some of the clang versions with which the issue has been observed:
- clang version 14.0.0 (Fedora 14.0.0-1.fc37)
- clang version 14.0.5 (Fedora 14.0.5-2.fc37)
Given that both the last good and the first bad builds were built using an identical clang 14.0.0 build, this does not seem to be a regression in clang; rather, some kernel change appears to have caused it.