Conversation

@PlaidCat
Collaborator

Bug fix requested by a SIG-CLOUD user to address user needs around GPUs, CUDA and HMM.
This PR integrates the upstream fix and adds a temporary workaround so the HMM kselftest no longer gets stuck in an infinite loop.

Internal Ticket: SECO-170

BUILD

[maple@r9-sigcloud-builder kernel-src-tree]$ ../kernel-tools/kernel_build.sh
/mnt/code/kernel-src-tree
  CLEAN   arch/x86/boot/compressed
  CLEAN   arch/x86/boot

  CLEAN   include/config include/generated arch/x86/include/generated .config .config.old .version Module.symvers certs/signing_key.pem certs/signing_key.x509 certs/x509.genkey
[TIMER]{MRPROPER}: 7s
x86_64 architecture detected, copying config
'configs/kernel-x86_64-rhel.config' -> '.config'
Setting Local Version for build
CONFIG_LOCALVERSION="-jmaple_cuda_bug_fix"
Making olddefconfig
  HOSTCC  scripts/basic/fixdep
  HOSTCC  scripts/kconfig/conf.o
  HOSTCC  scripts/kconfig/confdata.o
  HOSTCC  scripts/kconfig/expr.o
  LEX     scripts/kconfig/lexer.lex.c
  YACC    scripts/kconfig/parser.tab.[ch]
  HOSTCC  scripts/kconfig/lexer.lex.o
  HOSTCC  scripts/kconfig/menu.o
  HOSTCC  scripts/kconfig/parser.tab.o
  HOSTCC  scripts/kconfig/preprocess.o
  HOSTCC  scripts/kconfig/symbol.o
  HOSTCC  scripts/kconfig/util.o
  HOSTLD  scripts/kconfig/conf
#
# configuration written to .config
#
Starting Build

  BTF [M] sound/xen/snd_xen_front.ko
  BTF [M] virt/lib/irqbypass.ko
[TIMER]{BUILD}: 1494s
Making Modules
  INSTALL /lib/modules/5.14.0-jmaple_cuda_bug_fix+/kernel/arch/x86/crypto/blake2s-x86_64.ko
  STRIP   /lib/modules/5.14.0-jmaple_cuda_bug_fix+/kernel/arch/x86/crypto/blake2s-x86_64.ko
  SIGN    /lib/modules/5.14.0-jmaple_cuda_bug_fix+/kernel/arch/x86/crypto/blake2s-x86_64.ko

  SIGN    /lib/modules/5.14.0-jmaple_cuda_bug_fix+/kernel/virt/lib/irqbypass.ko
  DEPMOD  /lib/modules/5.14.0-jmaple_cuda_bug_fix+
[TIMER]{MODULES}: 31s
Making Install
sh ./arch/x86/boot/install.sh 5.14.0-jmaple_cuda_bug_fix+ \
	arch/x86/boot/bzImage System.map "/boot"
[TIMER]{INSTALL}: 20s
Checking kABI
kABI check passed
Setting Default Kernel to /boot/vmlinuz-5.14.0-jmaple_cuda_bug_fix+ and Index to 4
The default is /boot/loader/entries/9d52c46c412f49e287269f012ed1d6ff-5.14.0-jmaple_cuda_bug_fix+.conf with index 4 and kernel /boot/vmlinuz-5.14.0-jmaple_cuda_bug_fix+
The default is /boot/loader/entries/9d52c46c412f49e287269f012ed1d6ff-5.14.0-jmaple_cuda_bug_fix+.conf with index 4 and kernel /boot/vmlinuz-5.14.0-jmaple_cuda_bug_fix+
Generating grub configuration file ...
Adding boot menu entry for UEFI Firmware Settings ...
done
Hopefully Grub2.0 took everything ... rebooting after time metrices
[TIMER]{MRPROPER}: 7s
[TIMER]{BUILD}: 1494s
[TIMER]{MODULES}: 31s
[TIMER]{INSTALL}: 20s
[TIMER]{TOTAL} 1556s

kABI Check

See in the build log

Checking kABI
kABI check passed

Kernel Self Tests

There have been issues with kselftests on Rocky9, so in this case we're only testing the HMM suite, which also has its own problems in TEARDOWN on failure.

5 runs each, pre-update and post-update.

hmm_update_wrong.txt is intentionally wrong, to demonstrate that the diff does surface differing data.

[jmaple@devbox code]$ for i in $(seq 1 $(( ${#files[@]} - 1)) ); do echo ${files[i]}; diff ${files[0]} ${files[i]}; done
hmm_default_test2.txt
hmm_default_test3.txt
hmm_default_test4.txt
hmm_default_test5.txt
hmm_update_test1.txt
hmm_update_test2.txt
hmm_update_test3.txt
hmm_update_test4.txt
hmm_update_test5.txt
hmm_update_wrong.txt
240a241
> # Maple changed below here
242,245c243,246
< #            OK  hmm2.hmm2_device_coherent.double_map
< ok 56 hmm2.hmm2_device_coherent.double_map
< # FAILED: 53 / 56 tests passed.
< # Totals: pass:51 fail:3 xfail:0 xpass:0 skip:2 error:0
---
> #            FAIL  hmm2.hmm2_device_coherent.double_map
> fail 56 hmm2.hmm2_device_coherent.double_map
> # FAILED: 52 / 56 tests passed.
> # Totals: pass:50 fail:4 xfail:0 xpass:0 skip:2 error:0

Example

[maple@r9-sigcloud-builder mm]$ cat /mnt/code/hmm_default_test1.txt
running: bash
--------------------------------
running bash ./test_hmm.sh smoke
--------------------------------
Running smoke test. Note, this test provides basic coverage.
TAP version 13
1..56
# Starting 56 tests from 5 test cases.
#  RUN           hmm.hmm_device_private.open_close ...
#            OK  hmm.hmm_device_private.open_close
ok 1 hmm.hmm_device_private.open_close
#  RUN           hmm.hmm_device_private.anon_read ...
#            OK  hmm.hmm_device_private.anon_read
ok 2 hmm.hmm_device_private.anon_read
#  RUN           hmm.hmm_device_private.anon_read_prot ...
#            OK  hmm.hmm_device_private.anon_read_prot
ok 3 hmm.hmm_device_private.anon_read_prot
#  RUN           hmm.hmm_device_private.anon_write ...
#            OK  hmm.hmm_device_private.anon_write
ok 4 hmm.hmm_device_private.anon_write
#  RUN           hmm.hmm_device_private.anon_write_prot ...
#            OK  hmm.hmm_device_private.anon_write_prot
ok 5 hmm.hmm_device_private.anon_write_prot
#  RUN           hmm.hmm_device_private.anon_write_child ...
#            OK  hmm.hmm_device_private.anon_write_child
ok 6 hmm.hmm_device_private.anon_write_child
#  RUN           hmm.hmm_device_private.anon_write_child_shared ...
#            OK  hmm.hmm_device_private.anon_write_child_shared
ok 7 hmm.hmm_device_private.anon_write_child_shared
#  RUN           hmm.hmm_device_private.anon_write_huge ...
#            OK  hmm.hmm_device_private.anon_write_huge
ok 8 hmm.hmm_device_private.anon_write_huge
#  RUN           hmm.hmm_device_private.anon_write_hugetlbfs ...
#            OK  hmm.hmm_device_private.anon_write_hugetlbfs
ok 9 # SKIP Huge page could not be allocated
#  RUN           hmm.hmm_device_private.file_read ...
#            OK  hmm.hmm_device_private.file_read
ok 10 hmm.hmm_device_private.file_read
#  RUN           hmm.hmm_device_private.file_write ...
#            OK  hmm.hmm_device_private.file_write
ok 11 hmm.hmm_device_private.file_write
#  RUN           hmm.hmm_device_private.migrate ...
#            OK  hmm.hmm_device_private.migrate
ok 12 hmm.hmm_device_private.migrate
#  RUN           hmm.hmm_device_private.migrate_fault ...
#            OK  hmm.hmm_device_private.migrate_fault
ok 13 hmm.hmm_device_private.migrate_fault
#  RUN           hmm.hmm_device_private.migrate_release ...
#            OK  hmm.hmm_device_private.migrate_release
ok 14 hmm.hmm_device_private.migrate_release
#  RUN           hmm.hmm_device_private.migrate_shared ...
#            OK  hmm.hmm_device_private.migrate_shared
ok 15 hmm.hmm_device_private.migrate_shared
#  RUN           hmm.hmm_device_private.migrate_multiple ...
#            OK  hmm.hmm_device_private.migrate_multiple
ok 16 hmm.hmm_device_private.migrate_multiple
#  RUN           hmm.hmm_device_private.anon_read_multiple ...
#            OK  hmm.hmm_device_private.anon_read_multiple
ok 17 hmm.hmm_device_private.anon_read_multiple
#  RUN           hmm.hmm_device_private.anon_teardown ...
#            OK  hmm.hmm_device_private.anon_teardown
ok 18 hmm.hmm_device_private.anon_teardown
#  RUN           hmm.hmm_device_private.mixedmap ...
#            OK  hmm.hmm_device_private.mixedmap
ok 19 hmm.hmm_device_private.mixedmap
#  RUN           hmm.hmm_device_private.compound ...
#            OK  hmm.hmm_device_private.compound
ok 20 hmm.hmm_device_private.compound
#  RUN           hmm.hmm_device_private.exclusive ...
#          FAIL  hmm.hmm_device_private.exclusive
not ok 21 hmm.hmm_device_private.exclusive
#  RUN           hmm.hmm_device_private.exclusive_mprotect ...
#          FAIL  hmm.hmm_device_private.exclusive_mprotect
not ok 22 hmm.hmm_device_private.exclusive_mprotect
#  RUN           hmm.hmm_device_private.exclusive_cow ...
#          FAIL  hmm.hmm_device_private.exclusive_cow
not ok 23 hmm.hmm_device_private.exclusive_cow
#  RUN           hmm.hmm_device_private.hmm_gup_test ...
#            OK  hmm.hmm_device_private.hmm_gup_test
ok 24 # SKIP Skipping test, could not find gup_test driver
#  RUN           hmm.hmm_device_private.hmm_cow_in_device ...
#            OK  hmm.hmm_device_private.hmm_cow_in_device
ok 25 hmm.hmm_device_private.hmm_cow_in_device
#  RUN           hmm.hmm_device_coherent.open_close ...
#            OK  hmm.hmm_device_coherent.open_close
ok 26 hmm.hmm_device_coherent.open_close
#  RUN           hmm.hmm_device_coherent.anon_read ...
#            OK  hmm.hmm_device_coherent.anon_read
ok 27 hmm.hmm_device_coherent.anon_read
#  RUN           hmm.hmm_device_coherent.anon_read_prot ...
#            OK  hmm.hmm_device_coherent.anon_read_prot
ok 28 hmm.hmm_device_coherent.anon_read_prot
#  RUN           hmm.hmm_device_coherent.anon_write ...
#            OK  hmm.hmm_device_coherent.anon_write
ok 29 hmm.hmm_device_coherent.anon_write
#  RUN           hmm.hmm_device_coherent.anon_write_prot ...
#            OK  hmm.hmm_device_coherent.anon_write_prot
ok 30 hmm.hmm_device_coherent.anon_write_prot
#  RUN           hmm.hmm_device_coherent.anon_write_child ...
#            OK  hmm.hmm_device_coherent.anon_write_child
ok 31 hmm.hmm_device_coherent.anon_write_child
#  RUN           hmm.hmm_device_coherent.anon_write_child_shared ...
#            OK  hmm.hmm_device_coherent.anon_write_child_shared
ok 32 hmm.hmm_device_coherent.anon_write_child_shared
#  RUN           hmm.hmm_device_coherent.anon_write_huge ...
#            OK  hmm.hmm_device_coherent.anon_write_huge
ok 33 hmm.hmm_device_coherent.anon_write_huge
#  RUN           hmm.hmm_device_coherent.anon_write_hugetlbfs ...
#            OK  hmm.hmm_device_coherent.anon_write_hugetlbfs
ok 34 hmm.hmm_device_coherent.anon_write_hugetlbfs
#  RUN           hmm.hmm_device_coherent.file_read ...
#            OK  hmm.hmm_device_coherent.file_read
ok 35 hmm.hmm_device_coherent.file_read
#  RUN           hmm.hmm_device_coherent.file_write ...
#            OK  hmm.hmm_device_coherent.file_write
ok 36 hmm.hmm_device_coherent.file_write
#  RUN           hmm.hmm_device_coherent.migrate ...
#            OK  hmm.hmm_device_coherent.migrate
ok 37 hmm.hmm_device_coherent.migrate
#  RUN           hmm.hmm_device_coherent.migrate_fault ...
#            OK  hmm.hmm_device_coherent.migrate_fault
ok 38 hmm.hmm_device_coherent.migrate_fault
#  RUN           hmm.hmm_device_coherent.migrate_release ...
#            OK  hmm.hmm_device_coherent.migrate_release
ok 39 hmm.hmm_device_coherent.migrate_release
#  RUN           hmm.hmm_device_coherent.migrate_shared ...
#            OK  hmm.hmm_device_coherent.migrate_shared
ok 40 hmm.hmm_device_coherent.migrate_shared
#  RUN           hmm.hmm_device_coherent.migrate_multiple ...
#            OK  hmm.hmm_device_coherent.migrate_multiple
ok 41 hmm.hmm_device_coherent.migrate_multiple
#  RUN           hmm.hmm_device_coherent.anon_read_multiple ...
#            OK  hmm.hmm_device_coherent.anon_read_multiple
ok 42 hmm.hmm_device_coherent.anon_read_multiple
#  RUN           hmm.hmm_device_coherent.anon_teardown ...
#            OK  hmm.hmm_device_coherent.anon_teardown
ok 43 hmm.hmm_device_coherent.anon_teardown
#  RUN           hmm.hmm_device_coherent.mixedmap ...
#            OK  hmm.hmm_device_coherent.mixedmap
ok 44 hmm.hmm_device_coherent.mixedmap
#  RUN           hmm.hmm_device_coherent.compound ...
#            OK  hmm.hmm_device_coherent.compound
ok 45 hmm.hmm_device_coherent.compound
#  RUN           hmm.hmm_device_coherent.exclusive ...
#            OK  hmm.hmm_device_coherent.exclusive
ok 46 hmm.hmm_device_coherent.exclusive
#  RUN           hmm.hmm_device_coherent.exclusive_mprotect ...
#            OK  hmm.hmm_device_coherent.exclusive_mprotect
ok 47 hmm.hmm_device_coherent.exclusive_mprotect
#  RUN           hmm.hmm_device_coherent.exclusive_cow ...
#            OK  hmm.hmm_device_coherent.exclusive_cow
ok 48 hmm.hmm_device_coherent.exclusive_cow
#  RUN           hmm.hmm_device_coherent.hmm_gup_test ...
#            OK  hmm.hmm_device_coherent.hmm_gup_test
ok 49 hmm.hmm_device_coherent.hmm_gup_test
#  RUN           hmm.hmm_device_coherent.hmm_cow_in_device ...
#            OK  hmm.hmm_device_coherent.hmm_cow_in_device
ok 50 hmm.hmm_device_coherent.hmm_cow_in_device
#  RUN           hmm2.hmm2_device_private.migrate_mixed ...
#            OK  hmm2.hmm2_device_private.migrate_mixed
ok 51 hmm2.hmm2_device_private.migrate_mixed
#  RUN           hmm2.hmm2_device_private.snapshot ...
#            OK  hmm2.hmm2_device_private.snapshot
ok 52 hmm2.hmm2_device_private.snapshot
#  RUN           hmm2.hmm2_device_private.double_map ...
#            OK  hmm2.hmm2_device_private.double_map
ok 53 hmm2.hmm2_device_private.double_map
#  RUN           hmm2.hmm2_device_coherent.migrate_mixed ...
#            OK  hmm2.hmm2_device_coherent.migrate_mixed
ok 54 hmm2.hmm2_device_coherent.migrate_mixed
#  RUN           hmm2.hmm2_device_coherent.snapshot ...
#            OK  hmm2.hmm2_device_coherent.snapshot
ok 55 hmm2.hmm2_device_coherent.snapshot
#  RUN           hmm2.hmm2_device_coherent.double_map ...
#            OK  hmm2.hmm2_device_coherent.double_map
ok 56 hmm2.hmm2_device_coherent.double_map
# FAILED: 53 / 56 tests passed.
# Totals: pass:51 fail:3 xfail:0 xpass:0 skip:2 error:0
[PASS]
0
SUMMARY: PASS=1 SKIP=0 FAIL=0

jira SECO-170
bugfix cuda hangs
commit-author Thomas Gleixner <[email protected]>
commit ea72ce5
upstream-diff Missing CONFIG_KMSAN commits in
arch/x86/include/asm/pgtable_64_types.h

iounmap() on x86 occasionally fails to unmap because the provided valid
ioremap address is not below high_memory. It turned out that this
happens due to KASLR.

KASLR uses the full address space between PAGE_OFFSET and vaddr_end to
randomize the starting points of the direct map, vmalloc and vmemmap
regions.  It thereby limits the size of the direct map by using the
installed memory size plus an extra configurable margin for hot-plug
memory.  This limitation is done to gain more randomization space
because otherwise only the holes between the direct map, vmalloc,
vmemmap and vaddr_end would be usable for randomizing.

The limited direct map size is not exposed to the rest of the kernel, so
the memory hot-plug and resource management related code paths still
operate under the assumption that the available address space can be
determined with MAX_PHYSMEM_BITS.

request_free_mem_region() allocates from (1 << MAX_PHYSMEM_BITS) - 1
downwards.  That means the first allocation happens past the end of the
direct map and if unlucky this address is in the vmalloc space, which
causes high_memory to become greater than VMALLOC_START and consequently
causes iounmap() to fail for valid ioremap addresses.

MAX_PHYSMEM_BITS cannot be changed for that because the randomization
does not align with address bit boundaries and there are other places
which actually require to know the maximum number of address bits.  All
remaining usage sites of MAX_PHYSMEM_BITS have been analyzed and found
to be correct.

Cure this by exposing the end of the direct map via PHYSMEM_END and use
that for the memory hot-plug and resource management related places
instead of relying on MAX_PHYSMEM_BITS. In the KASLR case PHYSMEM_END
maps to a variable which is initialized by the KASLR initialization and
otherwise it is based on MAX_PHYSMEM_BITS as before.

To prevent future hiccups add a check into add_pages() to catch callers
trying to add memory above PHYSMEM_END.
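
A minimal sketch of the shape of that change, assuming illustrative names for everything except PHYSMEM_END and add_pages() (the real patch spans the x86 headers, the KASLR setup code and mm code):

/* Without KASLR the usable end is still derived from MAX_PHYSMEM_BITS;
 * with KASLR it comes from a variable set during randomization
 * (variable name is illustrative). */
#ifdef CONFIG_RANDOMIZE_MEMORY
extern unsigned long physmem_end;
# define PHYSMEM_END    physmem_end
#else
# define PHYSMEM_END    ((1ULL << MAX_PHYSMEM_BITS) - 1)
#endif

/* add_pages(): refuse attempts to hot-add memory above the end of the
 * direct map instead of silently breaking the layout. */
int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
              struct mhp_params *params)
{
        unsigned long end = ((start_pfn + nr_pages) << PAGE_SHIFT) - 1;

        if (WARN_ON_ONCE(end > PHYSMEM_END))
                return -ERANGE;

        /* ... existing arch code continues unchanged ... */
        return __add_pages(nid, start_pfn, nr_pages, params);
}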

Fixes: 0483e1f ("x86/mm: Implement ASLR for kernel memory regions")
	Reported-by: Max Ramanouski <[email protected]>
	Reported-by: Alistair Popple <[email protected]>
	Signed-off-by: Thomas Gleixner <[email protected]>
Tested-By: Max Ramanouski <[email protected]>
	Tested-by: Alistair Popple <[email protected]>
	Reviewed-by: Dan Williams <[email protected]>
	Reviewed-by: Alistair Popple <[email protected]>
	Reviewed-by: Kees Cook <[email protected]>
	Cc: [email protected]
Link: https://lore.kernel.org/all/87ed6soy3z.ffs@tglx
(cherry picked from commit ea72ce5)
	Signed-off-by: Jonathan Maple <[email protected]>

jira SECO-170

In Rocky9, running ./run_vmtests.sh -t hmm fails and causes an infinite
loop on ASSERTs in FIXTURE_TEARDOWN().
This temporary fix is based on the discussion here:
https://patchwork.kernel.org/project/linux-kselftest/patch/[email protected]/#25046055

We will investigate further kselftest updates that will resolve the root
causes of this.
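
A minimal sketch of the kind of teardown change involved, assuming the workaround is to stop using ASSERT_*() inside FIXTURE_TEARDOWN(); the body shown is illustrative, not the exact hmm-tests.c code:

/* kselftest harness: an ASSERT_*() that fails inside FIXTURE_TEARDOWN()
 * makes the harness re-enter TEARDOWN, which asserts again, looping
 * forever.  EXPECT_*() only records the failure and lets teardown finish. */
FIXTURE_TEARDOWN(hmm)
{
        int ret = close(self->fd);

        EXPECT_EQ(ret, 0);      /* was ASSERT_EQ(ret, 0) */
        self->fd = -1;
}
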
@PlaidCat PlaidCat requested a review from gvrose8192 October 23, 2024 20:51

@gvrose8192 gvrose8192 left a comment

LGTM. Thanks!

@PlaidCat PlaidCat merged commit 9b46c72 into sig-cloud-9/5.14.0-427.40.1.el9_4 Oct 23, 2024
@PlaidCat PlaidCat deleted the jmaple_r9-sigcloud-cuda_bug_fix branch October 23, 2024 21:05
@PlaidCat PlaidCat restored the jmaple_r9-sigcloud-cuda_bug_fix branch October 24, 2024 14:26
@PlaidCat PlaidCat deleted the jmaple_r9-sigcloud-cuda_bug_fix branch October 24, 2024 14:26
PlaidCat added a commit that referenced this pull request Nov 1, 2024
jira LE-2015
cve CVE-2024-40904
Rebuild_History Non-Buildable kernel-5.14.0-427.42.1.el9_4
commit-author Alan Stern <[email protected]>
commit 22f0081

The syzbot fuzzer found that the interrupt-URB completion callback in
the cdc-wdm driver was taking too long, and the driver's immediate
resubmission of interrupt URBs with -EPROTO status combined with the
dummy-hcd emulation to cause a CPU lockup:

cdc_wdm 1-1:1.0: nonzero urb status received: -71
cdc_wdm 1-1:1.0: wdm_int_callback - 0 bytes
watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [syz-executor782:6625]
CPU#0 Utilization every 4s during lockup:
	#1:  98% system,	  0% softirq,	  3% hardirq,	  0% idle
	#2:  98% system,	  0% softirq,	  3% hardirq,	  0% idle
	#3:  98% system,	  0% softirq,	  3% hardirq,	  0% idle
	#4:  98% system,	  0% softirq,	  3% hardirq,	  0% idle
	#5:  98% system,	  1% softirq,	  3% hardirq,	  0% idle
Modules linked in:
irq event stamp: 73096
hardirqs last  enabled at (73095): [<ffff80008037bc00>] console_emit_next_record kernel/printk/printk.c:2935 [inline]
hardirqs last  enabled at (73095): [<ffff80008037bc00>] console_flush_all+0x650/0xb74 kernel/printk/printk.c:2994
hardirqs last disabled at (73096): [<ffff80008af10b00>] __el1_irq arch/arm64/kernel/entry-common.c:533 [inline]
hardirqs last disabled at (73096): [<ffff80008af10b00>] el1_interrupt+0x24/0x68 arch/arm64/kernel/entry-common.c:551
softirqs last  enabled at (73048): [<ffff8000801ea530>] softirq_handle_end kernel/softirq.c:400 [inline]
softirqs last  enabled at (73048): [<ffff8000801ea530>] handle_softirqs+0xa60/0xc34 kernel/softirq.c:582
softirqs last disabled at (73043): [<ffff800080020de8>] __do_softirq+0x14/0x20 kernel/softirq.c:588
CPU: 0 PID: 6625 Comm: syz-executor782 Tainted: G        W          6.10.0-rc2-syzkaller-g8867bbd4a056 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024

Testing showed that the problem did not occur if the two error
messages -- the first two lines above -- were removed; apparently adding
material to the kernel log takes a surprisingly large amount of time.

In any case, the best approach for preventing these lockups and to
avoid spamming the log with thousands of error messages per second is
to ratelimit the two dev_err() calls.  Therefore we replace them with
dev_err_ratelimited().
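
The change itself is mechanical; a sketch of what the two call sites end up looking like, paraphrased from the description and the log messages above rather than quoted from the upstream diff:

/* drivers/usb/class/cdc-wdm.c, wdm_int_callback() -- sketch */

/* 1) URB status error path: was dev_err(), now rate-limited */
dev_err_ratelimited(&desc->intf->dev,
                    "nonzero urb status received: %d\n", status);

/* 2) Unexpected-length message path: was dev_err(), now rate-limited */
dev_err_ratelimited(&desc->intf->dev,
                    "wdm_int_callback - %d bytes\n", res);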

	Signed-off-by: Alan Stern <[email protected]>
	Suggested-by: Greg KH <[email protected]>
Reported-and-tested-by: [email protected]
Closes: https://lore.kernel.org/linux-usb/[email protected]/
Reported-and-tested-by: [email protected]
Closes: https://lore.kernel.org/linux-usb/[email protected]/
Fixes: 9908a32 ("USB: remove err() macro from usb class drivers")
Link: https://lore.kernel.org/linux-usb/[email protected]/
	Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
	Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 22f0081)
	Signed-off-by: Jonathan Maple <[email protected]>
PlaidCat added a commit that referenced this pull request Dec 17, 2024
jira LE-2169
cve CVE-2024-38540
Rebuild_History Non-Buildable kernel-4.18.0-553.27.1.el8_10
commit-author Michal Schmidt <[email protected]>
commit 78cfd17
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.27.1.el8_10/78cfd171.failed

Undefined behavior is triggered when bnxt_qplib_alloc_init_hwq is called
with hwq_attr->aux_depth != 0 and hwq_attr->aux_stride == 0.
In that case, "roundup_pow_of_two(hwq_attr->aux_stride)" gets called.
roundup_pow_of_two is documented as undefined for 0.

Fix it in the one caller that had this combination.
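
A sketch of the guard in the caller, assuming the aux stride comes from a per-QP PSN size and that the depth value shown is illustrative (the real change is in bnxt_qplib_create_qp() in drivers/infiniband/hw/bnxt_re/qplib_fp.c):

/* Only ask for aux entries when the aux stride is non-zero, so the hwq
 * allocator never reaches roundup_pow_of_two(0), which is undefined. */
hwq_attr.aux_stride = psn_sz;
hwq_attr.aux_depth  = psn_sz ? requested_aux_depth : 0;  /* illustrative */
rc = bnxt_qplib_alloc_init_hwq(&sq->hwq, &hwq_attr);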

The undefined behavior was detected by UBSAN:
  UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
  shift exponent 64 is too large for 64-bit type 'long unsigned int'
  CPU: 24 PID: 1075 Comm: (udev-worker) Not tainted 6.9.0-rc6+ #4
  Hardware name: Abacus electric, s.r.o. - [email protected] Super Server/H12SSW-iN, BIOS 2.7 10/25/2023
  Call Trace:
   <TASK>
   dump_stack_lvl+0x5d/0x80
   ubsan_epilogue+0x5/0x30
   __ubsan_handle_shift_out_of_bounds.cold+0x61/0xec
   __roundup_pow_of_two+0x25/0x35 [bnxt_re]
   bnxt_qplib_alloc_init_hwq+0xa1/0x470 [bnxt_re]
   bnxt_qplib_create_qp+0x19e/0x840 [bnxt_re]
   bnxt_re_create_qp+0x9b1/0xcd0 [bnxt_re]
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __kmalloc+0x1b6/0x4f0
   ? create_qp.part.0+0x128/0x1c0 [ib_core]
   ? __pfx_bnxt_re_create_qp+0x10/0x10 [bnxt_re]
   create_qp.part.0+0x128/0x1c0 [ib_core]
   ib_create_qp_kernel+0x50/0xd0 [ib_core]
   create_mad_qp+0x8e/0xe0 [ib_core]
   ? __pfx_qp_event_handler+0x10/0x10 [ib_core]
   ib_mad_init_device+0x2be/0x680 [ib_core]
   add_client_context+0x10d/0x1a0 [ib_core]
   enable_device_and_get+0xe0/0x1d0 [ib_core]
   ib_register_device+0x53c/0x630 [ib_core]
   ? srso_alias_return_thunk+0x5/0xfbef5
   bnxt_re_probe+0xbd8/0xe50 [bnxt_re]
   ? __pfx_bnxt_re_probe+0x10/0x10 [bnxt_re]
   auxiliary_bus_probe+0x49/0x80
   ? driver_sysfs_add+0x57/0xc0
   really_probe+0xde/0x340
   ? pm_runtime_barrier+0x54/0x90
   ? __pfx___driver_attach+0x10/0x10
   __driver_probe_device+0x78/0x110
   driver_probe_device+0x1f/0xa0
   __driver_attach+0xba/0x1c0
   bus_for_each_dev+0x8f/0xe0
   bus_add_driver+0x146/0x220
   driver_register+0x72/0xd0
   __auxiliary_driver_register+0x6e/0xd0
   ? __pfx_bnxt_re_mod_init+0x10/0x10 [bnxt_re]
   bnxt_re_mod_init+0x3e/0xff0 [bnxt_re]
   ? __pfx_bnxt_re_mod_init+0x10/0x10 [bnxt_re]
   do_one_initcall+0x5b/0x310
   do_init_module+0x90/0x250
   init_module_from_file+0x86/0xc0
   idempotent_init_module+0x121/0x2b0
   __x64_sys_finit_module+0x5e/0xb0
   do_syscall_64+0x82/0x160
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? syscall_exit_to_user_mode_prepare+0x149/0x170
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? syscall_exit_to_user_mode+0x75/0x230
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? do_syscall_64+0x8e/0x160
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __count_memcg_events+0x69/0x100
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? count_memcg_events.constprop.0+0x1a/0x30
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? handle_mm_fault+0x1f0/0x300
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? do_user_addr_fault+0x34e/0x640
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
  RIP: 0033:0x7f4e5132821d
  Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e3 db 0c 00 f7 d8 64 89 01 48
  RSP: 002b:00007ffca9c906a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
  RAX: ffffffffffffffda RBX: 0000563ec8a8f130 RCX: 00007f4e5132821d
  RDX: 0000000000000000 RSI: 00007f4e518fa07d RDI: 000000000000003b
  RBP: 00007ffca9c90760 R08: 00007f4e513f6b20 R09: 00007ffca9c906f0
  R10: 0000563ec8a8faa0 R11: 0000000000000246 R12: 00007f4e518fa07d
  R13: 0000000000020000 R14: 0000563ec8409e90 R15: 0000563ec8a8fa60
   </TASK>
  ---[ end trace ]---

Fixes: 0c4dcd6 ("RDMA/bnxt_re: Refactor hardware queue memory allocation")
	Signed-off-by: Michal Schmidt <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
	Acked-by: Selvin Xavier <[email protected]>
	Signed-off-by: Leon Romanovsky <[email protected]>
(cherry picked from commit 78cfd17)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	drivers/infiniband/hw/bnxt_re/qplib_fp.c
PlaidCat pushed a commit that referenced this pull request Dec 19, 2024
It's used from trace__run(), for the 'perf trace' live mode, i.e. its
strace-like, non-perf.data file processing mode, the most common one.

The trace__run() function will set trace->host using machine__new_host()
that is supposed to give a machine instance representing the running
machine, and since we'll use perf_env__arch_strerrno() to get the right
errno -> string table, we need to use machine->env, so initialize it in
machine__new_host().
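
A minimal sketch of that initialization, assuming the global 'perf_env' host environment from tools/perf/util/env.c; the surrounding function body is abridged:

/* tools/perf/util/machine.c -- sketch */
struct machine *machine__new_host(void)
{
        struct machine *machine = malloc(sizeof(*machine));

        if (machine != NULL) {
                machine__init(machine, "", HOST_KERNEL_ID);

                /* Live mode has no perf.data-provided env, so point at the
                 * host env so perf_env__arch_strerrno() never sees NULL. */
                machine->env = &perf_env;

                if (machine__create_kernel_maps(machine) < 0) {
                        free(machine);
                        return NULL;
                }
        }
        return machine;
}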

Before the patch:

  (gdb) run trace --errno-summary -a sleep 1
  <SNIP>
   Summary of events:

   gvfs-afc-volume (3187), 2 events, 0.0%

     syscall            calls  errors  total       min       avg       max       stddev
                                       (msec)    (msec)    (msec)    (msec)        (%)
     --------------- --------  ------ -------- --------- --------- ---------     ------
     pselect6               1      0     0.000     0.000     0.000     0.000      0.00%

   GUsbEventThread (3519), 2 events, 0.0%

     syscall            calls  errors  total       min       avg       max       stddev
                                       (msec)    (msec)    (msec)    (msec)        (%)
     --------------- --------  ------ -------- --------- --------- ---------     ------
     poll                   1      0     0.000     0.000     0.000     0.000      0.00%
  <SNIP>
  Program received signal SIGSEGV, Segmentation fault.
  0x00000000005caba0 in perf_env__arch_strerrno (env=0x0, err=110) at util/env.c:478
  478		if (env->arch_strerrno == NULL)
  (gdb) bt
  #0  0x00000000005caba0 in perf_env__arch_strerrno (env=0x0, err=110) at util/env.c:478
  #1  0x00000000004b75d2 in thread__dump_stats (ttrace=0x14f58f0, trace=0x7fffffffa5b0, fp=0x7ffff6ff74e0 <_IO_2_1_stderr_>) at builtin-trace.c:4673
  #2  0x00000000004b78bf in trace__fprintf_thread (fp=0x7ffff6ff74e0 <_IO_2_1_stderr_>, thread=0x10fa0b0, trace=0x7fffffffa5b0) at builtin-trace.c:4708
  #3  0x00000000004b7ad9 in trace__fprintf_thread_summary (trace=0x7fffffffa5b0, fp=0x7ffff6ff74e0 <_IO_2_1_stderr_>) at builtin-trace.c:4747
  #4  0x00000000004b656e in trace__run (trace=0x7fffffffa5b0, argc=2, argv=0x7fffffffde60) at builtin-trace.c:4456
  #5  0x00000000004ba43e in cmd_trace (argc=2, argv=0x7fffffffde60) at builtin-trace.c:5487
  #6  0x00000000004c0414 in run_builtin (p=0xec3068 <commands+648>, argc=5, argv=0x7fffffffde60) at perf.c:351
  #7  0x00000000004c06bb in handle_internal_command (argc=5, argv=0x7fffffffde60) at perf.c:404
  #8  0x00000000004c0814 in run_argv (argcp=0x7fffffffdc4c, argv=0x7fffffffdc40) at perf.c:448
  #9  0x00000000004c0b5d in main (argc=5, argv=0x7fffffffde60) at perf.c:560
  (gdb)

After:

  root@number:~# perf trace -a --errno-summary sleep 1
  <SNIP>
     pw-data-loop (2685), 1410 events, 16.0%

     syscall            calls  errors  total       min       avg       max       stddev
                                       (msec)    (msec)    (msec)    (msec)        (%)
     --------------- --------  ------ -------- --------- --------- ---------     ------
     epoll_wait           188      0   983.428     0.000     5.231    15.595      8.68%
     ioctl                 94      0     0.811     0.004     0.009     0.016      2.82%
     read                 188      0     0.322     0.001     0.002     0.006      5.15%
     write                141      0     0.280     0.001     0.002     0.018      8.39%
     timerfd_settime       94      0     0.138     0.001     0.001     0.007      6.47%

   gnome-control-c (179406), 1848 events, 20.9%

     syscall            calls  errors  total       min       avg       max       stddev
                                       (msec)    (msec)    (msec)    (msec)        (%)
     --------------- --------  ------ -------- --------- --------- ---------     ------
     poll                 222      0   959.577     0.000     4.322    21.414     11.40%
     recvmsg              150      0     0.539     0.001     0.004     0.013      5.12%
     write                300      0     0.442     0.001     0.001     0.007      3.29%
     read                 150      0     0.183     0.001     0.001     0.009      5.53%
     getpid               102      0     0.101     0.000     0.001     0.008      7.82%

  root@number:~#

Fixes: 54373b5 ("perf env: Introduce perf_env__arch_strerrno()")
Reported-by: Veronika Molnarova <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Acked-by: Veronika Molnarova <[email protected]>
Acked-by: Michael Petlan <[email protected]>
Tested-by: Michael Petlan <[email protected]>
Link: https://lore.kernel.org/r/Z0XffUgNSv_9OjOi@x1
Signed-off-by: Namhyung Kim <[email protected]>
PlaidCat pushed a commit that referenced this pull request Dec 19, 2024
…s_lock

For storing a value to a queue attribute, the queue_attr_store function
first freezes the queue (->q_usage_counter(io)) and then acquire
->sysfs_lock. This seems not correct as the usual ordering should be to
acquire ->sysfs_lock before freezing the queue. This incorrect ordering
causes the following lockdep splat which we are able to reproduce always
simply by accessing /sys/kernel/debug file using ls command:

[   57.597146] WARNING: possible circular locking dependency detected
[   57.597154] 6.12.0-10553-gb86545e02e8c #20 Tainted: G        W
[   57.597162] ------------------------------------------------------
[   57.597168] ls/4605 is trying to acquire lock:
[   57.597176] c00000003eb56710 (&mm->mmap_lock){++++}-{4:4}, at: __might_fault+0x58/0xc0
[   57.597200]
               but task is already holding lock:
[   57.597207] c0000018e27c6810 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: iterate_dir+0x94/0x1d4
[   57.597226]
               which lock already depends on the new lock.

[   57.597233]
               the existing dependency chain (in reverse order) is:
[   57.597241]
               -> #5 (&sb->s_type->i_mutex_key#3){++++}-{4:4}:
[   57.597255]        down_write+0x6c/0x18c
[   57.597264]        start_creating+0xb4/0x24c
[   57.597274]        debugfs_create_dir+0x2c/0x1e8
[   57.597283]        blk_register_queue+0xec/0x294
[   57.597292]        add_disk_fwnode+0x2e4/0x548
[   57.597302]        brd_alloc+0x2c8/0x338
[   57.597309]        brd_init+0x100/0x178
[   57.597317]        do_one_initcall+0x88/0x3e4
[   57.597326]        kernel_init_freeable+0x3cc/0x6e0
[   57.597334]        kernel_init+0x34/0x1cc
[   57.597342]        ret_from_kernel_user_thread+0x14/0x1c
[   57.597350]
               -> #4 (&q->debugfs_mutex){+.+.}-{4:4}:
[   57.597362]        __mutex_lock+0xfc/0x12a0
[   57.597370]        blk_register_queue+0xd4/0x294
[   57.597379]        add_disk_fwnode+0x2e4/0x548
[   57.597388]        brd_alloc+0x2c8/0x338
[   57.597395]        brd_init+0x100/0x178
[   57.597402]        do_one_initcall+0x88/0x3e4
[   57.597410]        kernel_init_freeable+0x3cc/0x6e0
[   57.597418]        kernel_init+0x34/0x1cc
[   57.597426]        ret_from_kernel_user_thread+0x14/0x1c
[   57.597434]
               -> #3 (&q->sysfs_lock){+.+.}-{4:4}:
[   57.597446]        __mutex_lock+0xfc/0x12a0
[   57.597454]        queue_attr_store+0x9c/0x110
[   57.597462]        sysfs_kf_write+0x70/0xb0
[   57.597471]        kernfs_fop_write_iter+0x1b0/0x2ac
[   57.597480]        vfs_write+0x3dc/0x6e8
[   57.597488]        ksys_write+0x84/0x140
[   57.597495]        system_call_exception+0x130/0x360
[   57.597504]        system_call_common+0x160/0x2c4
[   57.597516]
               -> #2 (&q->q_usage_counter(io)#21){++++}-{0:0}:
[   57.597530]        __submit_bio+0x5ec/0x828
[   57.597538]        submit_bio_noacct_nocheck+0x1e4/0x4f0
[   57.597547]        iomap_readahead+0x2a0/0x448
[   57.597556]        xfs_vm_readahead+0x28/0x3c
[   57.597564]        read_pages+0x88/0x41c
[   57.597571]        page_cache_ra_unbounded+0x1ac/0x2d8
[   57.597580]        filemap_get_pages+0x188/0x984
[   57.597588]        filemap_read+0x13c/0x4bc
[   57.597596]        xfs_file_buffered_read+0x88/0x17c
[   57.597605]        xfs_file_read_iter+0xac/0x158
[   57.597614]        vfs_read+0x2d4/0x3b4
[   57.597622]        ksys_read+0x84/0x144
[   57.597629]        system_call_exception+0x130/0x360
[   57.597637]        system_call_common+0x160/0x2c4
[   57.597647]
               -> #1 (mapping.invalidate_lock#2){++++}-{4:4}:
[   57.597661]        down_read+0x6c/0x220
[   57.597669]        filemap_fault+0x870/0x100c
[   57.597677]        xfs_filemap_fault+0xc4/0x18c
[   57.597684]        __do_fault+0x64/0x164
[   57.597693]        __handle_mm_fault+0x1274/0x1dac
[   57.597702]        handle_mm_fault+0x248/0x484
[   57.597711]        ___do_page_fault+0x428/0xc0c
[   57.597719]        hash__do_page_fault+0x30/0x68
[   57.597727]        do_hash_fault+0x90/0x35c
[   57.597736]        data_access_common_virt+0x210/0x220
[   57.597745]        _copy_from_user+0xf8/0x19c
[   57.597754]        sel_write_load+0x178/0xd54
[   57.597762]        vfs_write+0x108/0x6e8
[   57.597769]        ksys_write+0x84/0x140
[   57.597777]        system_call_exception+0x130/0x360
[   57.597785]        system_call_common+0x160/0x2c4
[   57.597794]
               -> #0 (&mm->mmap_lock){++++}-{4:4}:
[   57.597806]        __lock_acquire+0x17cc/0x2330
[   57.597814]        lock_acquire+0x138/0x400
[   57.597822]        __might_fault+0x7c/0xc0
[   57.597830]        filldir64+0xe8/0x390
[   57.597839]        dcache_readdir+0x80/0x2d4
[   57.597846]        iterate_dir+0xd8/0x1d4
[   57.597855]        sys_getdents64+0x88/0x2d4
[   57.597864]        system_call_exception+0x130/0x360
[   57.597872]        system_call_common+0x160/0x2c4
[   57.597881]
               other info that might help us debug this:

[   57.597888] Chain exists of:
                 &mm->mmap_lock --> &q->debugfs_mutex --> &sb->s_type->i_mutex_key#3

[   57.597905]  Possible unsafe locking scenario:

[   57.597911]        CPU0                    CPU1
[   57.597917]        ----                    ----
[   57.597922]   rlock(&sb->s_type->i_mutex_key#3);
[   57.597932]                                lock(&q->debugfs_mutex);
[   57.597940]                                lock(&sb->s_type->i_mutex_key#3);
[   57.597950]   rlock(&mm->mmap_lock);
[   57.597958]
                *** DEADLOCK ***

[   57.597965] 2 locks held by ls/4605:
[   57.597971]  #0: c0000000137c12f8 (&f->f_pos_lock){+.+.}-{4:4}, at: fdget_pos+0xcc/0x154
[   57.597989]  #1: c0000018e27c6810 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: iterate_dir+0x94/0x1d4

Prevent the above lockdep warning by acquiring ->sysfs_lock before
freezing the queue while storing a queue attribute in queue_attr_store
function. Later, we also found[1] another function __blk_mq_update_nr_
hw_queues where we first freeze queue and then acquire the ->sysfs_lock.
So we've also updated lock ordering in __blk_mq_update_nr_hw_queues
function and ensured that in all code paths we follow the correct lock
ordering i.e. acquire ->sysfs_lock before freezing the queue.

[1] https://lore.kernel.org/all/CAFj5m9Ke8+EHKQBs_Nk6hqd=LGXtk4mUxZUN5==ZcCjnZSBwHw@mail.gmail.com/
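
A sketch of the corrected ordering in queue_attr_store() (block/blk-sysfs.c); the store-callback invocation is abridged and only the lock ordering is the point here:

/* Before: blk_mq_freeze_queue(q); then mutex_lock(&q->sysfs_lock); ...
 * After:  take ->sysfs_lock first, then freeze, so every path uses the
 * same lock order and the lockdep chain above cannot form. */
mutex_lock(&q->sysfs_lock);
blk_mq_freeze_queue(q);
res = entry->store(disk, page, length);   /* abridged */
blk_mq_unfreeze_queue(q);
mutex_unlock(&q->sysfs_lock);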

Reported-by: [email protected]
Fixes: af28141 ("block: freeze the queue in queue_attr_store")
Tested-by: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Nilay Shroff <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
gvrose8192 added a commit that referenced this pull request Jan 13, 2025
jira VULN-6730
cve CVE-2023-4921
commit-author valis <[email protected]>
commit 8fc134f

When the plug qdisc is used as a class of the qfq qdisc it could trigger a
UAF. This issue can be reproduced with following commands:

  tc qdisc add dev lo root handle 1: qfq
  tc class add dev lo parent 1: classid 1:1 qfq weight 1 maxpkt 512
  tc qdisc add dev lo parent 1:1 handle 2: plug
  tc filter add dev lo parent 1: basic classid 1:1
  ping -c1 127.0.0.1

and boom:

[  285.353793] BUG: KASAN: slab-use-after-free in qfq_dequeue+0xa7/0x7f0
[  285.354910] Read of size 4 at addr ffff8880bad312a8 by task ping/144
[  285.355903]
[  285.356165] CPU: 1 PID: 144 Comm: ping Not tainted 6.5.0-rc3+ #4
[  285.357112] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[  285.358376] Call Trace:
[  285.358773]  <IRQ>
[  285.359109]  dump_stack_lvl+0x44/0x60
[  285.359708]  print_address_description.constprop.0+0x2c/0x3c0
[  285.360611]  kasan_report+0x10c/0x120
[  285.361195]  ? qfq_dequeue+0xa7/0x7f0
[  285.361780]  qfq_dequeue+0xa7/0x7f0
[  285.362342]  __qdisc_run+0xf1/0x970
[  285.362903]  net_tx_action+0x28e/0x460
[  285.363502]  __do_softirq+0x11b/0x3de
[  285.364097]  do_softirq.part.0+0x72/0x90
[  285.364721]  </IRQ>
[  285.365072]  <TASK>
[  285.365422]  __local_bh_enable_ip+0x77/0x90
[  285.366079]  __dev_queue_xmit+0x95f/0x1550
[  285.366732]  ? __pfx_csum_and_copy_from_iter+0x10/0x10
[  285.367526]  ? __pfx___dev_queue_xmit+0x10/0x10
[  285.368259]  ? __build_skb_around+0x129/0x190
[  285.368960]  ? ip_generic_getfrag+0x12c/0x170
[  285.369653]  ? __pfx_ip_generic_getfrag+0x10/0x10
[  285.370390]  ? csum_partial+0x8/0x20
[  285.370961]  ? raw_getfrag+0xe5/0x140
[  285.371559]  ip_finish_output2+0x539/0xa40
[  285.372222]  ? __pfx_ip_finish_output2+0x10/0x10
[  285.372954]  ip_output+0x113/0x1e0
[  285.373512]  ? __pfx_ip_output+0x10/0x10
[  285.374130]  ? icmp_out_count+0x49/0x60
[  285.374739]  ? __pfx_ip_finish_output+0x10/0x10
[  285.375457]  ip_push_pending_frames+0xf3/0x100
[  285.376173]  raw_sendmsg+0xef5/0x12d0
[  285.376760]  ? do_syscall_64+0x40/0x90
[  285.377359]  ? __static_call_text_end+0x136578/0x136578
[  285.378173]  ? do_syscall_64+0x40/0x90
[  285.378772]  ? kasan_enable_current+0x11/0x20
[  285.379469]  ? __pfx_raw_sendmsg+0x10/0x10
[  285.380137]  ? __sock_create+0x13e/0x270
[  285.380673]  ? __sys_socket+0xf3/0x180
[  285.381174]  ? __x64_sys_socket+0x3d/0x50
[  285.381725]  ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  285.382425]  ? __rcu_read_unlock+0x48/0x70
[  285.382975]  ? ip4_datagram_release_cb+0xd8/0x380
[  285.383608]  ? __pfx_ip4_datagram_release_cb+0x10/0x10
[  285.384295]  ? preempt_count_sub+0x14/0xc0
[  285.384844]  ? __list_del_entry_valid+0x76/0x140
[  285.385467]  ? _raw_spin_lock_bh+0x87/0xe0
[  285.386014]  ? __pfx__raw_spin_lock_bh+0x10/0x10
[  285.386645]  ? release_sock+0xa0/0xd0
[  285.387148]  ? preempt_count_sub+0x14/0xc0
[  285.387712]  ? freeze_secondary_cpus+0x348/0x3c0
[  285.388341]  ? aa_sk_perm+0x177/0x390
[  285.388856]  ? __pfx_aa_sk_perm+0x10/0x10
[  285.389441]  ? check_stack_object+0x22/0x70
[  285.390032]  ? inet_send_prepare+0x2f/0x120
[  285.390603]  ? __pfx_inet_sendmsg+0x10/0x10
[  285.391172]  sock_sendmsg+0xcc/0xe0
[  285.391667]  __sys_sendto+0x190/0x230
[  285.392168]  ? __pfx___sys_sendto+0x10/0x10
[  285.392727]  ? kvm_clock_get_cycles+0x14/0x30
[  285.393328]  ? set_normalized_timespec64+0x57/0x70
[  285.393980]  ? _raw_spin_unlock_irq+0x1b/0x40
[  285.394578]  ? __x64_sys_clock_gettime+0x11c/0x160
[  285.395225]  ? __pfx___x64_sys_clock_gettime+0x10/0x10
[  285.395908]  ? _copy_to_user+0x3e/0x60
[  285.396432]  ? exit_to_user_mode_prepare+0x1a/0x120
[  285.397086]  ? syscall_exit_to_user_mode+0x22/0x50
[  285.397734]  ? do_syscall_64+0x71/0x90
[  285.398258]  __x64_sys_sendto+0x74/0x90
[  285.398786]  do_syscall_64+0x64/0x90
[  285.399273]  ? exit_to_user_mode_prepare+0x1a/0x120
[  285.399949]  ? syscall_exit_to_user_mode+0x22/0x50
[  285.400605]  ? do_syscall_64+0x71/0x90
[  285.401124]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  285.401807] RIP: 0033:0x495726
[  285.402233] Code: ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 09
[  285.404683] RSP: 002b:00007ffcc25fb618 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[  285.405677] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 0000000000495726
[  285.406628] RDX: 0000000000000040 RSI: 0000000002518750 RDI: 0000000000000000
[  285.407565] RBP: 00000000005205ef R08: 00000000005f8838 R09: 000000000000001c
[  285.408523] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000002517634
[  285.409460] R13: 00007ffcc25fb6f0 R14: 0000000000000003 R15: 0000000000000000
[  285.410403]  </TASK>
[  285.410704]
[  285.410929] Allocated by task 144:
[  285.411402]  kasan_save_stack+0x1e/0x40
[  285.411926]  kasan_set_track+0x21/0x30
[  285.412442]  __kasan_slab_alloc+0x55/0x70
[  285.412973]  kmem_cache_alloc_node+0x187/0x3d0
[  285.413567]  __alloc_skb+0x1b4/0x230
[  285.414060]  __ip_append_data+0x17f7/0x1b60
[  285.414633]  ip_append_data+0x97/0xf0
[  285.415144]  raw_sendmsg+0x5a8/0x12d0
[  285.415640]  sock_sendmsg+0xcc/0xe0
[  285.416117]  __sys_sendto+0x190/0x230
[  285.416626]  __x64_sys_sendto+0x74/0x90
[  285.417145]  do_syscall_64+0x64/0x90
[  285.417624]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  285.418306]
[  285.418531] Freed by task 144:
[  285.418960]  kasan_save_stack+0x1e/0x40
[  285.419469]  kasan_set_track+0x21/0x30
[  285.419988]  kasan_save_free_info+0x27/0x40
[  285.420556]  ____kasan_slab_free+0x109/0x1a0
[  285.421146]  kmem_cache_free+0x1c2/0x450
[  285.421680]  __netif_receive_skb_core+0x2ce/0x1870
[  285.422333]  __netif_receive_skb_one_core+0x97/0x140
[  285.423003]  process_backlog+0x100/0x2f0
[  285.423537]  __napi_poll+0x5c/0x2d0
[  285.424023]  net_rx_action+0x2be/0x560
[  285.424510]  __do_softirq+0x11b/0x3de
[  285.425034]
[  285.425254] The buggy address belongs to the object at ffff8880bad31280
[  285.425254]  which belongs to the cache skbuff_head_cache of size 224
[  285.426993] The buggy address is located 40 bytes inside of
[  285.426993]  freed 224-byte region [ffff8880bad31280, ffff8880bad31360)
[  285.428572]
[  285.428798] The buggy address belongs to the physical page:
[  285.429540] page:00000000f4b77674 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xbad31
[  285.430758] flags: 0x100000000000200(slab|node=0|zone=1)
[  285.431447] page_type: 0xffffffff()
[  285.431934] raw: 0100000000000200 ffff88810094a8c0 dead000000000122 0000000000000000
[  285.432757] raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000
[  285.433562] page dumped because: kasan: bad access detected
[  285.434144]
[  285.434320] Memory state around the buggy address:
[  285.434828]  ffff8880bad31180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  285.435580]  ffff8880bad31200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  285.436264] >ffff8880bad31280: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  285.436777]                                   ^
[  285.437106]  ffff8880bad31300: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[  285.437616]  ffff8880bad31380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  285.438126] ==================================================================
[  285.438662] Disabling lock debugging due to kernel taint

Fix this by (sketched below):
1. Changing sch_plug's .peek handler to qdisc_peek_dequeued(), a
function compatible with non-work-conserving qdiscs
2. Checking the return value of qdisc_dequeue_peeked() in sch_qfq.
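
A sketch of both changes, with illustrative surrounding context (the real patch in net/sched/sch_plug.c and net/sched/sch_qfq.c also adjusts queue-length accounting when the peeked packet has gone away):

/* 1) sch_plug: peek through a helper that is safe for
 *    non-work-conserving qdiscs. */
static struct Qdisc_ops plug_qdisc_ops __read_mostly = {
        .id     = "plug",
        .peek   = qdisc_peek_dequeued,
        /* remaining ops unchanged */
};

/* 2) sch_qfq, qfq_dequeue(): the packet seen at peek time may no longer
 *    exist at dequeue time, so check before using it. */
skb = qdisc_dequeue_peeked(cl->qdisc);
if (unlikely(!skb))
        return NULL;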

Fixes: 462dbc9 ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
	Reported-by: valis <[email protected]>
	Signed-off-by: valis <[email protected]>
	Signed-off-by: Jamal Hadi Salim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
	Signed-off-by: Paolo Abeni <[email protected]>
(cherry picked from commit 8fc134f)
	Signed-off-by: Greg Rose <[email protected]>
gvrose8192 added a commit that referenced this pull request Jan 13, 2025
gvrose8192 added a commit that referenced this pull request Jan 14, 2025
pvts-mat pushed a commit to pvts-mat/kernel-src-tree that referenced this pull request Jan 14, 2025
jira LE-1907
Rebuild_History Non-Buildable kernel-rt-5.14.0-284.30.1.rt14.315.el9_2
commit-author minoura makoto <[email protected]>
commit b18cba0

Commit 9130b8d ("SUNRPC: allow for upcalls for the same uid
but different gss service") introduced `auth` argument to
__gss_find_upcall(), but in gss_pipe_downcall() it was left as NULL
since it (and auth->service) was not (yet) determined.

When multiple upcalls with the same uid and different service are
ongoing, it could happen that __gss_find_upcall(), which returns the
first match found in the pipe->in_downcall list, could not find the
correct gss_msg corresponding to the downcall we are looking for.
Moreover, it might return a msg which is not sent to rpc.gssd yet.

We could see the mount.nfs process hung in D state when multiple mount.nfs
instances were executed in parallel.  The call trace below is of CentOS 7.9
kernel-3.10.0-1160.24.1.el7.x86_64 but we observed the same hang w/
elrepo kernel-ml-6.0.7-1.el7.

PID: 71258  TASK: ffff91ebd4be0000  CPU: 36  COMMAND: "mount.nfs"
 #0 [ffff9203ca3234f8] __schedule at ffffffffa3b8899f
 #1 [ffff9203ca323580] schedule at ffffffffa3b88eb9
 #2 [ffff9203ca323590] gss_cred_init at ffffffffc0355818 [auth_rpcgss]
 #3 [ffff9203ca323658] rpcauth_lookup_credcache at ffffffffc0421ebc
[sunrpc]
 #4 [ffff9203ca3236d8] gss_lookup_cred at ffffffffc0353633 [auth_rpcgss]
 #5 [ffff9203ca3236e8] rpcauth_lookupcred at ffffffffc0421581 [sunrpc]
 #6 [ffff9203ca323740] rpcauth_refreshcred at ffffffffc04223d3 [sunrpc]
 #7 [ffff9203ca3237a0] call_refresh at ffffffffc04103dc [sunrpc]
 #8 [ffff9203ca3237b8] __rpc_execute at ffffffffc041e1c9 [sunrpc]
 #9 [ffff9203ca323820] rpc_execute at ffffffffc0420a48 [sunrpc]

The scenario is like this. Let's say there are two upcalls for
services A and B, A -> B in pipe->in_downcall, B -> A in pipe->pipe.

When rpc.gssd reads the pipe to get the upcall msg corresponding to
service B from pipe->pipe and then writes the response, in
gss_pipe_downcall the msg corresponding to service A will be picked,
because only the uid is used to find the msg and it comes before the one
for B in pipe->in_downcall.  The process waiting for the msg
corresponding to service A will then be woken up.

Actual scheduling of that process might happen after rpc.gssd processes
the next msg.  In rpc_pipe_generic_upcall it clears msg->errno (for A).
When the process is eventually scheduled, it sees gss_msg->ctx == NULL and
gss_msg->msg.errno == 0, therefore it cannot break the loop in
gss_create_upcall and is never woken up after that.

This patch adds a simple check to ensure that a msg which is not
sent to rpc.gssd yet is not chosen as the matching upcall upon
receiving a downcall.

	Signed-off-by: minoura makoto <[email protected]>
	Signed-off-by: Hiroshi Shimamoto <[email protected]>
	Tested-by: Hiroshi Shimamoto <[email protected]>
	Cc: Trond Myklebust <[email protected]>
Fixes: 9130b8d ("SUNRPC: allow for upcalls for same uid but different gss service")
	Signed-off-by: Trond Myklebust <[email protected]>
(cherry picked from commit b18cba0)
	Signed-off-by: Jonathan Maple <[email protected]>
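
A minimal sketch of the kind of check described above, in the style of __gss_find_upcall(); the helper gss_msg_was_read_by_gssd() and the simplified matching are assumptions for illustration, not the exact upstream patch:

/* Sketch only: skip upcall messages that rpc.gssd has not read yet, so a
 * downcall can never be matched to a not-yet-sent upcall. The helper
 * gss_msg_was_read_by_gssd() is hypothetical; the real patch derives the
 * same information from the pipe message state. */
static struct gss_upcall_msg *
__gss_find_upcall(struct rpc_pipe *pipe, kuid_t uid, const struct gss_auth *auth)
{
	struct gss_upcall_msg *pos;

	list_for_each_entry(pos, &pipe->in_downcall, list) {
		if (!uid_eq(pos->uid, uid))
			continue;
		if (auth && pos->auth->service != auth->service)
			continue;
		if (!gss_msg_was_read_by_gssd(&pos->msg))
			continue;	/* not in flight yet: cannot match this downcall */
		refcount_inc(&pos->count);
		return pos;
	}
	return NULL;
}
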
pvts-mat pushed a commit to pvts-mat/kernel-src-tree that referenced this pull request Jan 14, 2025
jira LE-1907
Rebuild_History Non-Buildable kernel-rt-5.14.0-284.30.1.rt14.315.el9_2
commit-author Stefan Assmann <[email protected]>
commit 4e264be

When a system with E810 with existing VFs gets rebooted the following
hang may be observed.

 Pid 1 is hung in iavf_remove(), part of a network driver:
 PID: 1        TASK: ffff965400e5a340  CPU: 24   COMMAND: "systemd-shutdow"
  #0 [ffffaad04005fa50] __schedule at ffffffff8b3239cb
  #1 [ffffaad04005fae8] schedule at ffffffff8b323e2d
  #2 [ffffaad04005fb00] schedule_hrtimeout_range_clock at ffffffff8b32cebc
  #3 [ffffaad04005fb80] usleep_range_state at ffffffff8b32c930
  #4 [ffffaad04005fbb0] iavf_remove at ffffffffc12b9b4c [iavf]
  #5 [ffffaad04005fbf0] pci_device_remove at ffffffff8add7513
  #6 [ffffaad04005fc10] device_release_driver_internal at ffffffff8af08baa
  #7 [ffffaad04005fc40] pci_stop_bus_device at ffffffff8adcc5fc
  #8 [ffffaad04005fc60] pci_stop_and_remove_bus_device at ffffffff8adcc81e
  #9 [ffffaad04005fc70] pci_iov_remove_virtfn at ffffffff8adf9429
 #10 [ffffaad04005fca8] sriov_disable at ffffffff8adf98e4
 #11 [ffffaad04005fcc8] ice_free_vfs at ffffffffc04bb2c8 [ice]
 #12 [ffffaad04005fd10] ice_remove at ffffffffc04778fe [ice]
 #13 [ffffaad04005fd38] ice_shutdown at ffffffffc0477946 [ice]
 #14 [ffffaad04005fd50] pci_device_shutdown at ffffffff8add58f1
 #15 [ffffaad04005fd70] device_shutdown at ffffffff8af05386
 #16 [ffffaad04005fd98] kernel_restart at ffffffff8a92a870
 #17 [ffffaad04005fda8] __do_sys_reboot at ffffffff8a92abd6
 #18 [ffffaad04005fee0] do_syscall_64 at ffffffff8b317159
 #19 [ffffaad04005ff08] __context_tracking_enter at ffffffff8b31b6fc
 #20 [ffffaad04005ff18] syscall_exit_to_user_mode at ffffffff8b31b50d
 #21 [ffffaad04005ff28] do_syscall_64 at ffffffff8b317169
 #22 [ffffaad04005ff50] entry_SYSCALL_64_after_hwframe at ffffffff8b40009b
     RIP: 00007f1baa5c13d7  RSP: 00007fffbcc55a98  RFLAGS: 00000202
     RAX: ffffffffffffffda  RBX: 0000000000000000  RCX: 00007f1baa5c13d7
     RDX: 0000000001234567  RSI: 0000000028121969  RDI: 00000000fee1dead
     RBP: 00007fffbcc55ca0   R8: 0000000000000000   R9: 00007fffbcc54e90
     R10: 00007fffbcc55050  R11: 0000000000000202  R12: 0000000000000005
     R13: 0000000000000000  R14: 00007fffbcc55af0  R15: 0000000000000000
     ORIG_RAX: 00000000000000a9  CS: 0033  SS: 002b

During reboot all drivers' PM shutdown callbacks are invoked.
In iavf_shutdown() the adapter state is changed to __IAVF_REMOVE.
In ice_shutdown() the call chain above is executed, which at some point
calls iavf_remove(). However iavf_remove() expects the VF to be in one
of the states __IAVF_RUNNING, __IAVF_DOWN or __IAVF_INIT_FAILED. If
that's not the case it sleeps forever.
So if iavf_shutdown() gets invoked before iavf_remove() the system will
hang indefinitely because the adapter is already in state __IAVF_REMOVE.

Fix this by returning from iavf_remove() if the state is __IAVF_REMOVE,
as we already went through iavf_shutdown().

Fixes: 9745780 ("iavf: Add waiting so the port is initialized in remove")
Fixes: a841733 ("iavf: Fix race condition between iavf_shutdown and iavf_remove")
	Reported-by: Marius Cornea <[email protected]>
	Signed-off-by: Stefan Assmann <[email protected]>
	Reviewed-by: Michal Kubiak <[email protected]>
	Tested-by: Rafal Romanowski <[email protected]>
	Signed-off-by: Tony Nguyen <[email protected]>
(cherry picked from commit 4e264be)
	Signed-off-by: Jonathan Maple <[email protected]>
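
A minimal sketch of the early return described above; the exact way the adapter is looked up from the pci_dev and the surrounding teardown are simplified assumptions:

/* Sketch only: if the shutdown path already moved the adapter to
 * __IAVF_REMOVE, there is nothing left to wait for, so bail out instead
 * of polling forever for __IAVF_RUNNING/__IAVF_DOWN/__IAVF_INIT_FAILED. */
static void iavf_remove(struct pci_dev *pdev)
{
	struct net_device *netdev = pci_get_drvdata(pdev); /* assumed drvdata layout */
	struct iavf_adapter *adapter = netdev_priv(netdev);

	if (adapter->state == __IAVF_REMOVE)
		return;	/* iavf_shutdown() already ran */

	/* ... usual wait for a quiescent state and the rest of remove ... */
}
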
pvts-mat pushed a commit to pvts-mat/kernel-src-tree that referenced this pull request Jan 14, 2025
jira LE-1907
Rebuild_History Non-Buildable kernel-rt-5.14.0-284.30.1.rt14.315.el9_2
commit-author Michael Ellerman <[email protected]>
commit 6d65028

As reported by Alan, the CFI (Call Frame Information) in the VDSO time
routines is incorrect since commit ce7d805 ("powerpc/vdso: Prepare
for switching VDSO to generic C implementation.").

DWARF has a concept called the CFA (Canonical Frame Address), which on
powerpc is calculated as an offset from the stack pointer (r1). That
means when the stack pointer is changed there must be a corresponding
CFI directive to update the calculation of the CFA.

The current code is missing those directives for the changes to r1,
which prevents gdb from being able to generate a backtrace from inside
VDSO functions, eg:

  Breakpoint 1, 0x00007ffff7f804dc in __kernel_clock_gettime ()
  (gdb) bt
  #0  0x00007ffff7f804dc in __kernel_clock_gettime ()
  #1  0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
  #2  0x00007fffffffd960 in ?? ()
  #3  0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
  Backtrace stopped: frame did not save the PC

Alan helpfully describes some rules for correctly maintaining the CFI information:

  1) Every adjustment to the current frame address reg (ie. r1) must be
     described, and exactly at the instruction where r1 changes. Why?
     Because stack unwinding might want to access previous frames.

  2) If a function changes LR or any non-volatile register, the save
     location for those regs must be given. The CFI can be at any
     instruction after the saves up to the point that the reg is
     changed.
     (Exception: the LR save should be described before a bl, not after.)

  3) If asynchronous unwind info is needed then restores of LR and
     non-volatile regs must also be described. The CFI can be at any
     instruction after the reg is restored up to the point where the
     save location is (potentially) trashed.

Fix the inability to backtrace by adding CFI directives describing the
changes to r1, ie. satisfying rule 1.

Also change the information for LR to point to the copy saved on the
stack, not the value in r0 that will be overwritten by the function
call.

Finally, add CFI directives describing the save/restore of r2.

With the fix gdb can correctly back trace and navigate up and down the stack:

  Breakpoint 1, 0x00007ffff7f804dc in __kernel_clock_gettime ()
  (gdb) bt
  #0  0x00007ffff7f804dc in __kernel_clock_gettime ()
  #1  0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
  #2  0x0000000100015b60 in gettime ()
  #3  0x000000010000c8bc in print_long_format ()
  #4  0x000000010000d180 in print_current_files ()
  #5  0x00000001000054ac in main ()
  (gdb) up
  #1  0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
  (gdb)
  #2  0x0000000100015b60 in gettime ()
  (gdb)
  #3  0x000000010000c8bc in print_long_format ()
  (gdb)
  #4  0x000000010000d180 in print_current_files ()
  (gdb)
  #5  0x00000001000054ac in main ()
  (gdb)
  Initial frame selected; you cannot go up.
  (gdb) down
  #4  0x000000010000d180 in print_current_files ()
  (gdb)
  #3  0x000000010000c8bc in print_long_format ()
  (gdb)
  #2  0x0000000100015b60 in gettime ()
  (gdb)
  #1  0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
  (gdb)
  #0  0x00007ffff7f804dc in __kernel_clock_gettime ()
  (gdb)

Fixes: ce7d805 ("powerpc/vdso: Prepare for switching VDSO to generic C implementation.")
	Cc: [email protected] # v5.11+
	Reported-by: Alan Modra <[email protected]>
	Signed-off-by: Michael Ellerman <[email protected]>
	Reviewed-by: Segher Boessenkool <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
(cherry picked from commit 6d65028)
	Signed-off-by: Jonathan Maple <[email protected]>
pvts-mat pushed a commit to pvts-mat/kernel-src-tree that referenced this pull request Jan 14, 2025
jira LE-1907
Rebuild_History Non-Buildable kernel-rt-5.14.0-284.30.1.rt14.315.el9_2
commit-author Eelco Chaudron <[email protected]>
commit de9df6c

Currently, the per cpu upcall counters are allocated after the vport is
created and inserted into the system. This could lead to the datapath
accessing the counters before they are allocated, resulting in a kernel
Oops.

Here is an example:

  PID: 59693    TASK: ffff0005f4f51500  CPU: 0    COMMAND: "ovs-vswitchd"
   #0 [ffff80000a39b5b0] __switch_to at ffffb70f0629f2f4
   #1 [ffff80000a39b5d0] __schedule at ffffb70f0629f5cc
   #2 [ffff80000a39b650] preempt_schedule_common at ffffb70f0629fa60
   #3 [ffff80000a39b670] dynamic_might_resched at ffffb70f0629fb58
   #4 [ffff80000a39b680] mutex_lock_killable at ffffb70f062a1388
   #5 [ffff80000a39b6a0] pcpu_alloc at ffffb70f0594460c
   #6 [ffff80000a39b750] __alloc_percpu_gfp at ffffb70f05944e68
   #7 [ffff80000a39b760] ovs_vport_cmd_new at ffffb70ee6961b90 [openvswitch]
   ...

  PID: 58682    TASK: ffff0005b2f0bf00  CPU: 0    COMMAND: "kworker/0:3"
   #0 [ffff80000a5d2f40] machine_kexec at ffffb70f056a0758
   #1 [ffff80000a5d2f70] __crash_kexec at ffffb70f057e2994
   #2 [ffff80000a5d3100] crash_kexec at ffffb70f057e2ad8
   #3 [ffff80000a5d3120] die at ffffb70f0628234c
   #4 [ffff80000a5d31e0] die_kernel_fault at ffffb70f062828a8
   #5 [ffff80000a5d3210] __do_kernel_fault at ffffb70f056a31f4
   #6 [ffff80000a5d3240] do_bad_area at ffffb70f056a32a4
   #7 [ffff80000a5d3260] do_translation_fault at ffffb70f062a9710
   #8 [ffff80000a5d3270] do_mem_abort at ffffb70f056a2f74
   #9 [ffff80000a5d32a0] el1_abort at ffffb70f06297dac
  #10 [ffff80000a5d32d0] el1h_64_sync_handler at ffffb70f06299b24
  #11 [ffff80000a5d3410] el1h_64_sync at ffffb70f056812dc
  #12 [ffff80000a5d3430] ovs_dp_upcall at ffffb70ee6963c84 [openvswitch]
  #13 [ffff80000a5d3470] ovs_dp_process_packet at ffffb70ee6963fdc [openvswitch]
  #14 [ffff80000a5d34f0] ovs_vport_receive at ffffb70ee6972c78 [openvswitch]
  #15 [ffff80000a5d36f0] netdev_port_receive at ffffb70ee6973948 [openvswitch]
  #16 [ffff80000a5d3720] netdev_frame_hook at ffffb70ee6973a28 [openvswitch]
  #17 [ffff80000a5d3730] __netif_receive_skb_core.constprop.0 at ffffb70f06079f90

To solve this, we moved the per-CPU upcall counter allocation and freeing
into the existing vport alloc and free functions.

Fixes: 95637d9 ("net: openvswitch: release vport resources on failure")
Fixes: 1933ea3 ("net: openvswitch: Add support to count upcall packets")
	Signed-off-by: Eelco Chaudron <[email protected]>
	Reviewed-by: Simon Horman <[email protected]>
	Acked-by: Aaron Conole <[email protected]>
	Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit de9df6c)
	Signed-off-by: Jonathan Maple <[email protected]>
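
A minimal sketch of moving the per-CPU counter allocation into the vport alloc/free pair, as described above; field names and error handling are simplified:

/* Sketch only: allocate the per-CPU upcall counters before the vport can
 * be reached by the datapath, and free them together with the vport. */
struct vport *ovs_vport_alloc(int priv_size, const struct vport_ops *ops,
			      const struct vport_parms *parms)
{
	struct vport *vport;

	vport = kzalloc(ALIGN(sizeof(struct vport), VPORT_ALIGN) + priv_size,
			GFP_KERNEL);
	if (!vport)
		return ERR_PTR(-ENOMEM);

	vport->upcall_stats = netdev_alloc_pcpu_stats(struct vport_upcall_stats_percpu);
	if (!vport->upcall_stats) {
		kfree(vport);
		return ERR_PTR(-ENOMEM);
	}

	/* ... remaining initialisation unchanged ... */
	return vport;
}

void ovs_vport_free(struct vport *vport)
{
	free_percpu(vport->upcall_stats);
	kfree(vport);	/* real code also releases the other vport resources */
}
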
gvrose8192 added a commit that referenced this pull request Jan 16, 2025
jira VULN-6730
cve CVE-2023-4921
commit-author valis <[email protected]>
commit 8fc134f

When the plug qdisc is used as a class of the qfq qdisc it could trigger a
UAF. This issue can be reproduced with the following commands:

  tc qdisc add dev lo root handle 1: qfq
  tc class add dev lo parent 1: classid 1:1 qfq weight 1 maxpkt 512
  tc qdisc add dev lo parent 1:1 handle 2: plug
  tc filter add dev lo parent 1: basic classid 1:1
  ping -c1 127.0.0.1

and boom:

[  285.353793] BUG: KASAN: slab-use-after-free in qfq_dequeue+0xa7/0x7f0
[  285.354910] Read of size 4 at addr ffff8880bad312a8 by task ping/144
[  285.355903]
[  285.356165] CPU: 1 PID: 144 Comm: ping Not tainted 6.5.0-rc3+ #4
[  285.357112] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[  285.358376] Call Trace:
[  285.358773]  <IRQ>
[  285.359109]  dump_stack_lvl+0x44/0x60
[  285.359708]  print_address_description.constprop.0+0x2c/0x3c0
[  285.360611]  kasan_report+0x10c/0x120
[  285.361195]  ? qfq_dequeue+0xa7/0x7f0
[  285.361780]  qfq_dequeue+0xa7/0x7f0
[  285.362342]  __qdisc_run+0xf1/0x970
[  285.362903]  net_tx_action+0x28e/0x460
[  285.363502]  __do_softirq+0x11b/0x3de
[  285.364097]  do_softirq.part.0+0x72/0x90
[  285.364721]  </IRQ>
[  285.365072]  <TASK>
[  285.365422]  __local_bh_enable_ip+0x77/0x90
[  285.366079]  __dev_queue_xmit+0x95f/0x1550
[  285.366732]  ? __pfx_csum_and_copy_from_iter+0x10/0x10
[  285.367526]  ? __pfx___dev_queue_xmit+0x10/0x10
[  285.368259]  ? __build_skb_around+0x129/0x190
[  285.368960]  ? ip_generic_getfrag+0x12c/0x170
[  285.369653]  ? __pfx_ip_generic_getfrag+0x10/0x10
[  285.370390]  ? csum_partial+0x8/0x20
[  285.370961]  ? raw_getfrag+0xe5/0x140
[  285.371559]  ip_finish_output2+0x539/0xa40
[  285.372222]  ? __pfx_ip_finish_output2+0x10/0x10
[  285.372954]  ip_output+0x113/0x1e0
[  285.373512]  ? __pfx_ip_output+0x10/0x10
[  285.374130]  ? icmp_out_count+0x49/0x60
[  285.374739]  ? __pfx_ip_finish_output+0x10/0x10
[  285.375457]  ip_push_pending_frames+0xf3/0x100
[  285.376173]  raw_sendmsg+0xef5/0x12d0
[  285.376760]  ? do_syscall_64+0x40/0x90
[  285.377359]  ? __static_call_text_end+0x136578/0x136578
[  285.378173]  ? do_syscall_64+0x40/0x90
[  285.378772]  ? kasan_enable_current+0x11/0x20
[  285.379469]  ? __pfx_raw_sendmsg+0x10/0x10
[  285.380137]  ? __sock_create+0x13e/0x270
[  285.380673]  ? __sys_socket+0xf3/0x180
[  285.381174]  ? __x64_sys_socket+0x3d/0x50
[  285.381725]  ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  285.382425]  ? __rcu_read_unlock+0x48/0x70
[  285.382975]  ? ip4_datagram_release_cb+0xd8/0x380
[  285.383608]  ? __pfx_ip4_datagram_release_cb+0x10/0x10
[  285.384295]  ? preempt_count_sub+0x14/0xc0
[  285.384844]  ? __list_del_entry_valid+0x76/0x140
[  285.385467]  ? _raw_spin_lock_bh+0x87/0xe0
[  285.386014]  ? __pfx__raw_spin_lock_bh+0x10/0x10
[  285.386645]  ? release_sock+0xa0/0xd0
[  285.387148]  ? preempt_count_sub+0x14/0xc0
[  285.387712]  ? freeze_secondary_cpus+0x348/0x3c0
[  285.388341]  ? aa_sk_perm+0x177/0x390
[  285.388856]  ? __pfx_aa_sk_perm+0x10/0x10
[  285.389441]  ? check_stack_object+0x22/0x70
[  285.390032]  ? inet_send_prepare+0x2f/0x120
[  285.390603]  ? __pfx_inet_sendmsg+0x10/0x10
[  285.391172]  sock_sendmsg+0xcc/0xe0
[  285.391667]  __sys_sendto+0x190/0x230
[  285.392168]  ? __pfx___sys_sendto+0x10/0x10
[  285.392727]  ? kvm_clock_get_cycles+0x14/0x30
[  285.393328]  ? set_normalized_timespec64+0x57/0x70
[  285.393980]  ? _raw_spin_unlock_irq+0x1b/0x40
[  285.394578]  ? __x64_sys_clock_gettime+0x11c/0x160
[  285.395225]  ? __pfx___x64_sys_clock_gettime+0x10/0x10
[  285.395908]  ? _copy_to_user+0x3e/0x60
[  285.396432]  ? exit_to_user_mode_prepare+0x1a/0x120
[  285.397086]  ? syscall_exit_to_user_mode+0x22/0x50
[  285.397734]  ? do_syscall_64+0x71/0x90
[  285.398258]  __x64_sys_sendto+0x74/0x90
[  285.398786]  do_syscall_64+0x64/0x90
[  285.399273]  ? exit_to_user_mode_prepare+0x1a/0x120
[  285.399949]  ? syscall_exit_to_user_mode+0x22/0x50
[  285.400605]  ? do_syscall_64+0x71/0x90
[  285.401124]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  285.401807] RIP: 0033:0x495726
[  285.402233] Code: ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 09
[  285.404683] RSP: 002b:00007ffcc25fb618 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[  285.405677] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 0000000000495726
[  285.406628] RDX: 0000000000000040 RSI: 0000000002518750 RDI: 0000000000000000
[  285.407565] RBP: 00000000005205ef R08: 00000000005f8838 R09: 000000000000001c
[  285.408523] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000002517634
[  285.409460] R13: 00007ffcc25fb6f0 R14: 0000000000000003 R15: 0000000000000000
[  285.410403]  </TASK>
[  285.410704]
[  285.410929] Allocated by task 144:
[  285.411402]  kasan_save_stack+0x1e/0x40
[  285.411926]  kasan_set_track+0x21/0x30
[  285.412442]  __kasan_slab_alloc+0x55/0x70
[  285.412973]  kmem_cache_alloc_node+0x187/0x3d0
[  285.413567]  __alloc_skb+0x1b4/0x230
[  285.414060]  __ip_append_data+0x17f7/0x1b60
[  285.414633]  ip_append_data+0x97/0xf0
[  285.415144]  raw_sendmsg+0x5a8/0x12d0
[  285.415640]  sock_sendmsg+0xcc/0xe0
[  285.416117]  __sys_sendto+0x190/0x230
[  285.416626]  __x64_sys_sendto+0x74/0x90
[  285.417145]  do_syscall_64+0x64/0x90
[  285.417624]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  285.418306]
[  285.418531] Freed by task 144:
[  285.418960]  kasan_save_stack+0x1e/0x40
[  285.419469]  kasan_set_track+0x21/0x30
[  285.419988]  kasan_save_free_info+0x27/0x40
[  285.420556]  ____kasan_slab_free+0x109/0x1a0
[  285.421146]  kmem_cache_free+0x1c2/0x450
[  285.421680]  __netif_receive_skb_core+0x2ce/0x1870
[  285.422333]  __netif_receive_skb_one_core+0x97/0x140
[  285.423003]  process_backlog+0x100/0x2f0
[  285.423537]  __napi_poll+0x5c/0x2d0
[  285.424023]  net_rx_action+0x2be/0x560
[  285.424510]  __do_softirq+0x11b/0x3de
[  285.425034]
[  285.425254] The buggy address belongs to the object at ffff8880bad31280
[  285.425254]  which belongs to the cache skbuff_head_cache of size 224
[  285.426993] The buggy address is located 40 bytes inside of
[  285.426993]  freed 224-byte region [ffff8880bad31280, ffff8880bad31360)
[  285.428572]
[  285.428798] The buggy address belongs to the physical page:
[  285.429540] page:00000000f4b77674 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xbad31
[  285.430758] flags: 0x100000000000200(slab|node=0|zone=1)
[  285.431447] page_type: 0xffffffff()
[  285.431934] raw: 0100000000000200 ffff88810094a8c0 dead000000000122 0000000000000000
[  285.432757] raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000
[  285.433562] page dumped because: kasan: bad access detected
[  285.434144]
[  285.434320] Memory state around the buggy address:
[  285.434828]  ffff8880bad31180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  285.435580]  ffff8880bad31200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  285.436264] >ffff8880bad31280: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  285.436777]                                   ^
[  285.437106]  ffff8880bad31300: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[  285.437616]  ffff8880bad31380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  285.438126] ==================================================================
[  285.438662] Disabling lock debugging due to kernel taint

Fix this by:
1. Changing sch_plug's .peek handler to qdisc_peek_dequeued(), a
function compatible with non-work-conserving qdiscs
2. Checking the return value of qdisc_dequeue_peeked() in sch_qfq.

Fixes: 462dbc9 ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
	Reported-by: valis <[email protected]>
	Signed-off-by: valis <[email protected]>
	Signed-off-by: Jamal Hadi Salim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
	Signed-off-by: Paolo Abeni <[email protected]>
(cherry picked from commit 8fc134f)
	Signed-off-by: Greg Rose <[email protected]>
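
A minimal sketch of the two-part fix listed above; the fragments are simplified (class selection and accounting elided) and are not the verbatim patch:

/* 1) sch_plug: advertise a peek helper that works for a
 *    non-work-conserving qdisc instead of peeking the head directly. */
struct Qdisc_ops plug_qdisc_ops __read_mostly = {
	.id        = "plug",
	.priv_size = sizeof(struct plug_sched_data),
	.enqueue   = plug_enqueue,
	.dequeue   = plug_dequeue,
	.peek      = qdisc_peek_dequeued,	/* key change */
	.init      = plug_init,
	.change    = plug_change,
	.reset     = qdisc_reset_queue,
	.owner     = THIS_MODULE,
};

/* 2) sch_qfq: the previously peeked skb may no longer be dequeueable
 *    (the plug may have closed in between), so check for NULL. */
static struct sk_buff *qfq_dequeue(struct Qdisc *sch)
{
	struct qfq_class *cl;
	struct sk_buff *skb;

	/* ... aggregate/class selection elided ... */

	skb = qdisc_dequeue_peeked(cl->qdisc);
	if (unlikely(!skb))
		return NULL;	/* do not keep using the stale, possibly freed skb */

	/* ... byte/packet accounting elided ... */
	return skb;
}
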
PlaidCat added a commit that referenced this pull request Sep 4, 2025
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Li Lingfeng <[email protected]>
commit b313a8c
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/b313a8c8.failed

Lockdep reported a warning in Linux version 6.6:

[  414.344659] ================================
[  414.345155] WARNING: inconsistent lock state
[  414.345658] 6.6.0-07439-gba2303cacfda #6 Not tainted
[  414.346221] --------------------------------
[  414.346712] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[  414.347545] kworker/u10:3/1152 [HC0[0]:SC0[0]:HE0:SE1] takes:
[  414.349245] ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0
[  414.351204] {IN-SOFTIRQ-W} state was registered at:
[  414.351751]   lock_acquire+0x18d/0x460
[  414.352218]   _raw_spin_lock_irqsave+0x39/0x60
[  414.352769]   __wake_up_common_lock+0x22/0x60
[  414.353289]   sbitmap_queue_wake_up+0x375/0x4f0
[  414.353829]   sbitmap_queue_clear+0xdd/0x270
[  414.354338]   blk_mq_put_tag+0xdf/0x170
[  414.354807]   __blk_mq_free_request+0x381/0x4d0
[  414.355335]   blk_mq_free_request+0x28b/0x3e0
[  414.355847]   __blk_mq_end_request+0x242/0xc30
[  414.356367]   scsi_end_request+0x2c1/0x830
[  414.345155] WARNING: inconsistent lock state
[  414.345658] 6.6.0-07439-gba2303cacfda #6 Not tainted
[  414.346221] --------------------------------
[  414.346712] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[  414.347545] kworker/u10:3/1152 [HC0[0]:SC0[0]:HE0:SE1] takes:
[  414.349245] ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0
[  414.351204] {IN-SOFTIRQ-W} state was registered at:
[  414.351751]   lock_acquire+0x18d/0x460
[  414.352218]   _raw_spin_lock_irqsave+0x39/0x60
[  414.352769]   __wake_up_common_lock+0x22/0x60
[  414.353289]   sbitmap_queue_wake_up+0x375/0x4f0
[  414.353829]   sbitmap_queue_clear+0xdd/0x270
[  414.354338]   blk_mq_put_tag+0xdf/0x170
[  414.354807]   __blk_mq_free_request+0x381/0x4d0
[  414.355335]   blk_mq_free_request+0x28b/0x3e0
[  414.355847]   __blk_mq_end_request+0x242/0xc30
[  414.356367]   scsi_end_request+0x2c1/0x830
[  414.356863]   scsi_io_completion+0x177/0x1610
[  414.357379]   scsi_complete+0x12f/0x260
[  414.357856]   blk_complete_reqs+0xba/0xf0
[  414.358338]   __do_softirq+0x1b0/0x7a2
[  414.358796]   irq_exit_rcu+0x14b/0x1a0
[  414.359262]   sysvec_call_function_single+0xaf/0xc0
[  414.359828]   asm_sysvec_call_function_single+0x1a/0x20
[  414.360426]   default_idle+0x1e/0x30
[  414.360873]   default_idle_call+0x9b/0x1f0
[  414.361390]   do_idle+0x2d2/0x3e0
[  414.361819]   cpu_startup_entry+0x55/0x60
[  414.362314]   start_secondary+0x235/0x2b0
[  414.362809]   secondary_startup_64_no_verify+0x18f/0x19b
[  414.363413] irq event stamp: 428794
[  414.363825] hardirqs last  enabled at (428793): [<ffffffff816bfd1c>] ktime_get+0x1dc/0x200
[  414.364694] hardirqs last disabled at (428794): [<ffffffff85470177>] _raw_spin_lock_irq+0x47/0x50
[  414.365629] softirqs last  enabled at (428444): [<ffffffff85474780>] __do_softirq+0x540/0x7a2
[  414.366522] softirqs last disabled at (428419): [<ffffffff813f65ab>] irq_exit_rcu+0x14b/0x1a0
[  414.367425]
               other info that might help us debug this:
[  414.368194]  Possible unsafe locking scenario:
[  414.368900]        CPU0
[  414.369225]        ----
[  414.369548]   lock(&sbq->ws[i].wait);
[  414.370000]   <Interrupt>
[  414.370342]     lock(&sbq->ws[i].wait);
[  414.370802]
                *** DEADLOCK ***
[  414.371569] 5 locks held by kworker/u10:3/1152:
[  414.372088]  #0: ffff88810130e938 ((wq_completion)writeback){+.+.}-{0:0}, at: process_scheduled_works+0x357/0x13f0
[  414.373180]  #1: ffff88810201fdb8 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: process_scheduled_works+0x3a3/0x13f0
[  414.374384]  #2: ffffffff86ffbdc0 (rcu_read_lock){....}-{1:2}, at: blk_mq_run_hw_queue+0x637/0xa00
[  414.375342]  #3: ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0
[  414.376377]  #4: ffff888106205a08 (&hctx->dispatch_wait_lock){+.-.}-{2:2}, at: blk_mq_dispatch_rq_list+0x1337/0x1ee0
[  414.378607]
               stack backtrace:
[  414.379177] CPU: 0 PID: 1152 Comm: kworker/u10:3 Not tainted 6.6.0-07439-gba2303cacfda #6
[  414.380032] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[  414.381177] Workqueue: writeback wb_workfn (flush-253:0)
[  414.381805] Call Trace:
[  414.382136]  <TASK>
[  414.382429]  dump_stack_lvl+0x91/0xf0
[  414.382884]  mark_lock_irq+0xb3b/0x1260
[  414.383367]  ? __pfx_mark_lock_irq+0x10/0x10
[  414.383889]  ? stack_trace_save+0x8e/0xc0
[  414.384373]  ? __pfx_stack_trace_save+0x10/0x10
[  414.384903]  ? graph_lock+0xcf/0x410
[  414.385350]  ? save_trace+0x3d/0xc70
[  414.385808]  mark_lock.part.20+0x56d/0xa90
[  414.386317]  mark_held_locks+0xb0/0x110
[  414.386791]  ? __pfx_do_raw_spin_lock+0x10/0x10
[  414.387320]  lockdep_hardirqs_on_prepare+0x297/0x3f0
[  414.387901]  ? _raw_spin_unlock_irq+0x28/0x50
[  414.388422]  trace_hardirqs_on+0x58/0x100
[  414.388917]  _raw_spin_unlock_irq+0x28/0x50
[  414.389422]  __blk_mq_tag_busy+0x1d6/0x2a0
[  414.389920]  __blk_mq_get_driver_tag+0x761/0x9f0
[  414.390899]  blk_mq_dispatch_rq_list+0x1780/0x1ee0
[  414.391473]  ? __pfx_blk_mq_dispatch_rq_list+0x10/0x10
[  414.392070]  ? sbitmap_get+0x2b8/0x450
[  414.392533]  ? __blk_mq_get_driver_tag+0x210/0x9f0
[  414.393095]  __blk_mq_sched_dispatch_requests+0xd99/0x1690
[  414.393730]  ? elv_attempt_insert_merge+0x1b1/0x420
[  414.394302]  ? __pfx___blk_mq_sched_dispatch_requests+0x10/0x10
[  414.394970]  ? lock_acquire+0x18d/0x460
[  414.395456]  ? blk_mq_run_hw_queue+0x637/0xa00
[  414.395986]  ? __pfx_lock_acquire+0x10/0x10
[  414.396499]  blk_mq_sched_dispatch_requests+0x109/0x190
[  414.397100]  blk_mq_run_hw_queue+0x66e/0xa00
[  414.397616]  blk_mq_flush_plug_list.part.17+0x614/0x2030
[  414.398244]  ? __pfx_blk_mq_flush_plug_list.part.17+0x10/0x10
[  414.398897]  ? writeback_sb_inodes+0x241/0xcc0
[  414.399429]  blk_mq_flush_plug_list+0x65/0x80
[  414.399957]  __blk_flush_plug+0x2f1/0x530
[  414.400458]  ? __pfx___blk_flush_plug+0x10/0x10
[  414.400999]  blk_finish_plug+0x59/0xa0
[  414.401467]  wb_writeback+0x7cc/0x920
[  414.401935]  ? __pfx_wb_writeback+0x10/0x10
[  414.402442]  ? mark_held_locks+0xb0/0x110
[  414.402931]  ? __pfx_do_raw_spin_lock+0x10/0x10
[  414.403462]  ? lockdep_hardirqs_on_prepare+0x297/0x3f0
[  414.404062]  wb_workfn+0x2b3/0xcf0
[  414.404500]  ? __pfx_wb_workfn+0x10/0x10
[  414.404989]  process_scheduled_works+0x432/0x13f0
[  414.405546]  ? __pfx_process_scheduled_works+0x10/0x10
[  414.406139]  ? do_raw_spin_lock+0x101/0x2a0
[  414.406641]  ? assign_work+0x19b/0x240
[  414.407106]  ? lock_is_held_type+0x9d/0x110
[  414.407604]  worker_thread+0x6f2/0x1160
[  414.408075]  ? __kthread_parkme+0x62/0x210
[  414.408572]  ? lockdep_hardirqs_on_prepare+0x297/0x3f0
[  414.409168]  ? __kthread_parkme+0x13c/0x210
[  414.409678]  ? __pfx_worker_thread+0x10/0x10
[  414.410191]  kthread+0x33c/0x440
[  414.410602]  ? __pfx_kthread+0x10/0x10
[  414.411068]  ret_from_fork+0x4d/0x80
[  414.411526]  ? __pfx_kthread+0x10/0x10
[  414.411993]  ret_from_fork_asm+0x1b/0x30
[  414.412489]  </TASK>

When interrupts are turned on while a lock taken by spin_lock_irq is
still held, lockdep throws a warning because of a potential deadlock.

blk_mq_prep_dispatch_rq
 blk_mq_get_driver_tag
  __blk_mq_get_driver_tag
   __blk_mq_alloc_driver_tag
    blk_mq_tag_busy -> tag is already busy
    // failed to get driver tag
 blk_mq_mark_tag_wait
  spin_lock_irq(&wq->lock) -> lock A (&sbq->ws[i].wait)
  __add_wait_queue(wq, wait) -> wait queue active
  blk_mq_get_driver_tag
  __blk_mq_tag_busy
-> 1) tag must be idle, which means there can't be inflight IO
   spin_lock_irq(&tags->lock) -> lock B (hctx->tags)
   spin_unlock_irq(&tags->lock) -> unlock B, turn on interrupt accidentally
-> 2) context must be preempted by an IO interrupt to trigger the deadlock.

As shown above, the deadlock is not possible in theory, but the warning
still needs to be fixed.

Fix it by using spin_lock_irqsave to take lock B instead of spin_lock_irq.

Fixes: 4f1731d ("blk-mq: fix potential io hang by wrong 'wake_batch'")
	Signed-off-by: Li Lingfeng <[email protected]>
	Reviewed-by: Ming Lei <[email protected]>
	Reviewed-by: Yu Kuai <[email protected]>
	Reviewed-by: Bart Van Assche <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
	Signed-off-by: Jens Axboe <[email protected]>
(cherry picked from commit b313a8c)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	block/blk-mq-tag.c
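
A minimal sketch of the spin_lock_irq -> spin_lock_irqsave change described above, with the surrounding bookkeeping elided:

/* Sketch only: saving and restoring the IRQ state means reaching this
 * path with interrupts already disabled (e.g. under &sbq->ws[i].wait)
 * no longer re-enables them on unlock. */
void __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
{
	struct blk_mq_tags *tags = hctx->tags;
	unsigned long flags;

	/* ... shared-tagset checks elided ... */

	spin_lock_irqsave(&tags->lock, flags);		/* was spin_lock_irq() */
	/* ... update tags->active_queues ... */
	spin_unlock_irqrestore(&tags->lock, flags);	/* was spin_unlock_irq() */
}
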
github-actions bot pushed a commit that referenced this pull request Sep 5, 2025
… context

JIRA: https://issues.redhat.com/browse/RHEL-102692

commit 9ca7a42
Author: Sudeep Holla <[email protected]>
Date: Wed, 28 May 2025 09:49:43 +0100

    The current use of a mutex to protect the notifier hashtable accesses
    can lead to issues in the atomic context. It results in the below
    kernel warnings:

      |  BUG: sleeping function called from invalid context at kernel/locking/mutex.c:258
      |  in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 9, name: kworker/0:0
      |  preempt_count: 1, expected: 0
      |  RCU nest depth: 0, expected: 0
      |  CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.14.0 #4
      |  Workqueue: ffa_pcpu_irq_notification notif_pcpu_irq_work_fn
      |  Call trace:
      |   show_stack+0x18/0x24 (C)
      |   dump_stack_lvl+0x78/0x90
      |   dump_stack+0x18/0x24
      |   __might_resched+0x114/0x170
      |   __might_sleep+0x48/0x98
      |   mutex_lock+0x24/0x80
      |   handle_notif_callbacks+0x54/0xe0
      |   notif_get_and_handle+0x40/0x88
      |   generic_exec_single+0x80/0xc0
      |   smp_call_function_single+0xfc/0x1a0
      |   notif_pcpu_irq_work_fn+0x2c/0x38
      |   process_one_work+0x14c/0x2b4
      |   worker_thread+0x2e4/0x3e0
      |   kthread+0x13c/0x210
      |   ret_from_fork+0x10/0x20

    To address this, replace the mutex with an rwlock to protect the notifier
    hashtable accesses. This ensures that read-side locking does not sleep and
    multiple readers can acquire the lock concurrently, avoiding unnecessary
    contention and potential deadlocks. Writer access remains exclusive,
    preserving correctness.

    This change resolves warnings from lockdep about potential sleep in
    atomic context.

    Cc: Jens Wiklander <[email protected]>
    Reported-by: Jérôme Forissier <[email protected]>
    Closes: OP-TEE/optee_os#7394
    Fixes: e057344 ("firmware: arm_ffa: Add interfaces to request notification callbacks")
    Message-Id: <[email protected]>
    Reviewed-by: Jens Wiklander <[email protected]>
    Tested-by: Jens Wiklander <[email protected]>
    Signed-off-by: Sudeep Holla <[email protected]>

Signed-off-by: Marcin Juszkiewicz <[email protected]>
Message-ID: <d2632a513f05ec79f4fffa92d618792eb3cea12f.1752653152.git.mjuszkiewicz@redhat.com>
Message-ID: <b4e3e2e1248ca9909bdea280d9009bdb659d0c81.1752659027.git.mjuszkiewicz@redhat.com>
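
A minimal sketch of the mutex-to-rwlock conversion described above; the entry type and lookup are illustrative of the pattern rather than the exact driver code:

#include <linux/spinlock.h>
#include <linux/hashtable.h>

struct ffa_notifier_cb {			/* illustrative entry type */
	struct hlist_node node;
	u32 notify_id;
	void (*callback)(int notify_id, void *cb_data);
	void *cb_data;
};

static DEFINE_RWLOCK(notify_lock);		/* was: a mutex */
static DEFINE_HASHTABLE(notifier_hash, 8);

/* Called from the per-CPU notification IRQ path: must not sleep. */
static void handle_notif_callback(u32 notify_id)
{
	struct ffa_notifier_cb *cb;

	read_lock(&notify_lock);
	hash_for_each_possible(notifier_hash, cb, node, notify_id)
		if (cb->notify_id == notify_id)
			cb->callback(notify_id, cb->cb_data);
	read_unlock(&notify_lock);
}

/* Registration/unregistration stays exclusive. */
static void add_notifier_cb(struct ffa_notifier_cb *cb)
{
	write_lock(&notify_lock);
	hash_add(notifier_hash, &cb->node, cb->notify_id);
	write_unlock(&notify_lock);
}

read_lock() never sleeps, so the notification path is safe in atomic context, while writers still get exclusive access.
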
github-actions bot pushed a commit that referenced this pull request Sep 7, 2025
… context

JIRA: https://issues.redhat.com/browse/RHEL-102691

commit 9ca7a42
Author: Sudeep Holla <[email protected]>
Date: Wed, 28 May 2025 09:49:43 +0100

    The current use of a mutex to protect the notifier hashtable accesses
    can lead to issues in the atomic context. It results in the below
    kernel warnings:

      |  BUG: sleeping function called from invalid context at kernel/locking/mutex.c:258
      |  in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 9, name: kworker/0:0
      |  preempt_count: 1, expected: 0
      |  RCU nest depth: 0, expected: 0
      |  CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.14.0 #4
      |  Workqueue: ffa_pcpu_irq_notification notif_pcpu_irq_work_fn
      |  Call trace:
      |   show_stack+0x18/0x24 (C)
      |   dump_stack_lvl+0x78/0x90
      |   dump_stack+0x18/0x24
      |   __might_resched+0x114/0x170
      |   __might_sleep+0x48/0x98
      |   mutex_lock+0x24/0x80
      |   handle_notif_callbacks+0x54/0xe0
      |   notif_get_and_handle+0x40/0x88
      |   generic_exec_single+0x80/0xc0
      |   smp_call_function_single+0xfc/0x1a0
      |   notif_pcpu_irq_work_fn+0x2c/0x38
      |   process_one_work+0x14c/0x2b4
      |   worker_thread+0x2e4/0x3e0
      |   kthread+0x13c/0x210
      |   ret_from_fork+0x10/0x20

    To address this, replace the mutex with an rwlock to protect the notifier
    hashtable accesses. This ensures that read-side locking does not sleep and
    multiple readers can acquire the lock concurrently, avoiding unnecessary
    contention and potential deadlocks. Writer access remains exclusive,
    preserving correctness.

    This change resolves warnings from lockdep about potential sleep in
    atomic context.

    Cc: Jens Wiklander <[email protected]>
    Reported-by: Jérôme Forissier <[email protected]>
    Closes: OP-TEE/optee_os#7394
    Fixes: e057344 ("firmware: arm_ffa: Add interfaces to request notification callbacks")
    Message-Id: <[email protected]>
    Reviewed-by: Jens Wiklander <[email protected]>
    Tested-by: Jens Wiklander <[email protected]>
    Signed-off-by: Sudeep Holla <[email protected]>

Signed-off-by: Marcin Juszkiewicz <[email protected]>
PlaidCat added a commit that referenced this pull request Sep 10, 2025
jira VULN-33329
jira VULN-33328
cve CVE-2022-48919
commit-author Ronnie Sahlberg <[email protected]>
commit 3d6cc98

When cifs_get_root() fails during cifs_smb3_do_mount() we call
deactivate_locked_super() which eventually will call delayed_free() which
will free the context.
In this situation we should not proceed to enter the out: section in
cifs_smb3_do_mount() and free the same resources a second time.

[Thu Feb 10 12:59:06 2022] BUG: KASAN: use-after-free in rcu_cblist_dequeue+0x32/0x60
[Thu Feb 10 12:59:06 2022] Read of size 8 at addr ffff888364f4d110 by task swapper/1/0

[Thu Feb 10 12:59:06 2022] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G           OE     5.17.0-rc3+ #4
[Thu Feb 10 12:59:06 2022] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.0 12/17/2019
[Thu Feb 10 12:59:06 2022] Call Trace:
[Thu Feb 10 12:59:06 2022]  <IRQ>
[Thu Feb 10 12:59:06 2022]  dump_stack_lvl+0x5d/0x78
[Thu Feb 10 12:59:06 2022]  print_address_description.constprop.0+0x24/0x150
[Thu Feb 10 12:59:06 2022]  ? rcu_cblist_dequeue+0x32/0x60
[Thu Feb 10 12:59:06 2022]  kasan_report.cold+0x7d/0x117
[Thu Feb 10 12:59:06 2022]  ? rcu_cblist_dequeue+0x32/0x60
[Thu Feb 10 12:59:06 2022]  __asan_load8+0x86/0xa0
[Thu Feb 10 12:59:06 2022]  rcu_cblist_dequeue+0x32/0x60
[Thu Feb 10 12:59:06 2022]  rcu_core+0x547/0xca0
[Thu Feb 10 12:59:06 2022]  ? call_rcu+0x3c0/0x3c0
[Thu Feb 10 12:59:06 2022]  ? __this_cpu_preempt_check+0x13/0x20
[Thu Feb 10 12:59:06 2022]  ? lock_is_held_type+0xea/0x140
[Thu Feb 10 12:59:06 2022]  rcu_core_si+0xe/0x10
[Thu Feb 10 12:59:06 2022]  __do_softirq+0x1d4/0x67b
[Thu Feb 10 12:59:06 2022]  __irq_exit_rcu+0x100/0x150
[Thu Feb 10 12:59:06 2022]  irq_exit_rcu+0xe/0x30
[Thu Feb 10 12:59:06 2022]  sysvec_hyperv_stimer0+0x9d/0xc0
...
[Thu Feb 10 12:59:07 2022] Freed by task 58179:
[Thu Feb 10 12:59:07 2022]  kasan_save_stack+0x26/0x50
[Thu Feb 10 12:59:07 2022]  kasan_set_track+0x25/0x30
[Thu Feb 10 12:59:07 2022]  kasan_set_free_info+0x24/0x40
[Thu Feb 10 12:59:07 2022]  ____kasan_slab_free+0x137/0x170
[Thu Feb 10 12:59:07 2022]  __kasan_slab_free+0x12/0x20
[Thu Feb 10 12:59:07 2022]  slab_free_freelist_hook+0xb3/0x1d0
[Thu Feb 10 12:59:07 2022]  kfree+0xcd/0x520
[Thu Feb 10 12:59:07 2022]  cifs_smb3_do_mount+0x149/0xbe0 [cifs]
[Thu Feb 10 12:59:07 2022]  smb3_get_tree+0x1a0/0x2e0 [cifs]
[Thu Feb 10 12:59:07 2022]  vfs_get_tree+0x52/0x140
[Thu Feb 10 12:59:07 2022]  path_mount+0x635/0x10c0
[Thu Feb 10 12:59:07 2022]  __x64_sys_mount+0x1bf/0x210
[Thu Feb 10 12:59:07 2022]  do_syscall_64+0x5c/0xc0
[Thu Feb 10 12:59:07 2022]  entry_SYSCALL_64_after_hwframe+0x44/0xae

[Thu Feb 10 12:59:07 2022] Last potentially related work creation:
[Thu Feb 10 12:59:07 2022]  kasan_save_stack+0x26/0x50
[Thu Feb 10 12:59:07 2022]  __kasan_record_aux_stack+0xb6/0xc0
[Thu Feb 10 12:59:07 2022]  kasan_record_aux_stack_noalloc+0xb/0x10
[Thu Feb 10 12:59:07 2022]  call_rcu+0x76/0x3c0
[Thu Feb 10 12:59:07 2022]  cifs_umount+0xce/0xe0 [cifs]
[Thu Feb 10 12:59:07 2022]  cifs_kill_sb+0xc8/0xe0 [cifs]
[Thu Feb 10 12:59:07 2022]  deactivate_locked_super+0x5d/0xd0
[Thu Feb 10 12:59:07 2022]  cifs_smb3_do_mount+0xab9/0xbe0 [cifs]
[Thu Feb 10 12:59:07 2022]  smb3_get_tree+0x1a0/0x2e0 [cifs]
[Thu Feb 10 12:59:07 2022]  vfs_get_tree+0x52/0x140
[Thu Feb 10 12:59:07 2022]  path_mount+0x635/0x10c0
[Thu Feb 10 12:59:07 2022]  __x64_sys_mount+0x1bf/0x210
[Thu Feb 10 12:59:07 2022]  do_syscall_64+0x5c/0xc0
[Thu Feb 10 12:59:07 2022]  entry_SYSCALL_64_after_hwframe+0x44/0xae

	Reported-by: Shyam Prasad N <[email protected]>
	Reviewed-by: Shyam Prasad N <[email protected]>
	Signed-off-by: Ronnie Sahlberg <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit 3d6cc98)
	Signed-off-by: Jonathan Maple <[email protected]>
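
A minimal sketch of the control-flow change described above; the setup around cifs_get_root() is elided and names are simplified:

/* Sketch only: once deactivate_locked_super() has run, the fs context is
 * already queued for delayed_free(), so return immediately rather than
 * falling through to the shared "out:" cleanup that frees it again. */
static struct dentry *cifs_smb3_do_mount(struct file_system_type *fs_type,
					 int flags, struct smb3_fs_context *old_ctx)
{
	struct smb3_fs_context *ctx;
	struct super_block *sb;
	struct dentry *root;

	/* ... ctx/cifs_sb allocation and sget_fc() elided ... */

	root = cifs_get_root(ctx, sb);
	if (IS_ERR(root)) {
		deactivate_locked_super(sb);
		return root;	/* do NOT goto out: ctx is freed via delayed_free() */
	}

	/* ... success path, plus the shared "out:" error cleanup, elided ... */
	return root;
}
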
github-actions bot pushed a commit that referenced this pull request Sep 16, 2025
JIRA: https://issues.redhat.com/browse/RHEL-106845

commit a9c83a0
Author: Jens Axboe <[email protected]>
Date:   Mon Dec 30 14:15:17 2024 -0700

    io_uring/timeout: flush timeouts outside of the timeout lock

    syzbot reports that a recent fix causes nesting issues between the (now)
    raw timeoutlock and the eventfd locking:

    =============================
    [ BUG: Invalid wait context ]
    6.13.0-rc4-00080-g9828a4c0901f #29 Not tainted
    -----------------------------
    kworker/u32:0/68094 is trying to lock:
    ffff000014d7a520 (&ctx->wqh#2){..-.}-{3:3}, at: eventfd_signal_mask+0x64/0x180
    other info that might help us debug this:
    context-{5:5}
    6 locks held by kworker/u32:0/68094:
     #0: ffff0000c1d98148 ((wq_completion)iou_exit){+.+.}-{0:0}, at: process_one_work+0x4e8/0xfc0
     #1: ffff80008d927c78 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x53c/0xfc0
     #2: ffff0000c59bc3d8 (&ctx->completion_lock){+.+.}-{3:3}, at: io_kill_timeouts+0x40/0x180
     #3: ffff0000c59bc358 (&ctx->timeout_lock){-.-.}-{2:2}, at: io_kill_timeouts+0x48/0x180
     #4: ffff800085127aa0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x8/0x38
     #5: ffff800085127aa0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x8/0x38
    stack backtrace:
    CPU: 7 UID: 0 PID: 68094 Comm: kworker/u32:0 Not tainted 6.13.0-rc4-00080-g9828a4c0901f #29
    Hardware name: linux,dummy-virt (DT)
    Workqueue: iou_exit io_ring_exit_work
    Call trace:
     show_stack+0x1c/0x30 (C)
     __dump_stack+0x24/0x30
     dump_stack_lvl+0x60/0x80
     dump_stack+0x14/0x20
     __lock_acquire+0x19f8/0x60c8
     lock_acquire+0x1a4/0x540
     _raw_spin_lock_irqsave+0x90/0xd0
     eventfd_signal_mask+0x64/0x180
     io_eventfd_signal+0x64/0x108
     io_req_local_work_add+0x294/0x430
     __io_req_task_work_add+0x1c0/0x270
     io_kill_timeout+0x1f0/0x288
     io_kill_timeouts+0xd4/0x180
     io_uring_try_cancel_requests+0x2e8/0x388
     io_ring_exit_work+0x150/0x550
     process_one_work+0x5e8/0xfc0
     worker_thread+0x7ec/0xc80
     kthread+0x24c/0x300
     ret_from_fork+0x10/0x20

    because after the preempt-rt fix for the timeout lock nesting inside
    the io-wq lock, we now have the eventfd spinlock nesting inside the
    raw timeout spinlock.

    Rather than play whack-a-mole with other nesting on the timeout lock,
    split the deletion and killing of timeouts so queueing the task_work
    for the timeout cancelations can get done outside of the timeout lock.

    Reported-by: [email protected]
    Fixes: 020b40f ("io_uring: make ctx->timeout_lock a raw spinlock")
    Signed-off-by: Jens Axboe <[email protected]>

Signed-off-by: Ming Lei <[email protected]>
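
A minimal sketch of splitting the delete step from the kill step, as described above; helper names follow the io_uring internals but details (completion_lock, matching rules) are simplified:

/* Sketch only: detach matching timeouts onto a private list under the raw
 * timeout lock, then queue their cancellation task_work after dropping it,
 * so the eventfd lock is never taken inside the raw spinlock. */
__cold bool io_kill_timeouts(struct io_ring_ctx *ctx, struct io_uring_task *tctx,
			     bool cancel_all)
{
	struct io_timeout *timeout, *tmp;
	LIST_HEAD(list);
	int canceled = 0;

	raw_spin_lock_irq(&ctx->timeout_lock);
	list_for_each_entry_safe(timeout, tmp, &ctx->timeout_list, list) {
		struct io_kiocb *req = cmd_to_io_kiocb(timeout);

		/* io_kill_timeout() here only cancels the timer and moves
		 * the entry onto 'list'; it no longer queues task_work. */
		if (io_match_task(req, tctx, cancel_all) &&
		    io_kill_timeout(req, &list))
			canceled++;
	}
	raw_spin_unlock_irq(&ctx->timeout_lock);

	/* Now outside the raw lock: safe to signal the eventfd etc. */
	list_for_each_entry_safe(timeout, tmp, &list, list)
		io_req_queue_tw_complete(cmd_to_io_kiocb(timeout), -ECANCELED);

	return canceled != 0;
}
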
github-actions bot pushed a commit that referenced this pull request Sep 16, 2025
…xit()

JIRA: https://issues.redhat.com/browse/RHEL-106845

commit 78c2713
Author: Ming Lei <[email protected]>
Date:   Mon May 5 22:18:03 2025 +0800

    block: move wbt_enable_default() out of queue freezing from sched ->exit()

    scheduler's ->exit() is called with queue frozen and elevator lock is held, and
    wbt_enable_default() can't be called with queue frozen, otherwise the
    following lockdep warning is triggered:

            #6 (&q->rq_qos_mutex){+.+.}-{4:4}:
            #5 (&eq->sysfs_lock){+.+.}-{4:4}:
            #4 (&q->elevator_lock){+.+.}-{4:4}:
            #3 (&q->q_usage_counter(io)#3){++++}-{0:0}:
            #2 (fs_reclaim){+.+.}-{0:0}:
            #1 (&sb->s_type->i_mutex_key#3){+.+.}-{4:4}:
            #0 (&q->debugfs_mutex){+.+.}-{4:4}:

    Fix the issue by moving wbt_enable_default() out of bfq's exit(), and
    call it from elevator_change_done().

    Meantime add disk->rqos_state_mutex for covering wbt state change, which
    matches the purpose more than ->elevator_lock.

    Reviewed-by: Hannes Reinecke <[email protected]>
    Reviewed-by: Nilay Shroff <[email protected]>
    Signed-off-by: Ming Lei <[email protected]>
    Reviewed-by: Christoph Hellwig <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>

Signed-off-by: Ming Lei <[email protected]>
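
A minimal sketch of the reordering described above; the flag name and the completion-path signature are assumptions, not the upstream code:

/* Sketch only: the scheduler's ->exit() merely records that wbt should be
 * re-enabled; the elevator-switch completion path does it after the queue
 * has been unfrozen, where wbt_enable_default() is allowed to run. */
static void elevator_change_done(struct request_queue *q, bool enable_wbt_on_exit)
{
	/* ... switch bookkeeping elided; the queue is unfrozen here ... */
	if (enable_wbt_on_exit)
		wbt_enable_default(q->disk);
}
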
PlaidCat added a commit that referenced this pull request Sep 17, 2025
jira LE-4159
Rebuild_History Non-Buildable kernel-5.14.0-570.41.1.el9_6
commit-author Dave Marquardt <[email protected]>
commit 053f3ff

v2:
- Created a single error handling unlock and exit in veth_pool_store
- Greatly expanded commit message with previous explanatory-only text

Summary: Use rtnl_mutex to synchronize veth_pool_store with itself,
ibmveth_close and ibmveth_open, preventing multiple calls in a row to
napi_disable.

Background: Two (or more) threads could call veth_pool_store through
writing to /sys/devices/vio/30000002/pool*/*. You can do this easily
with a little shell script. This causes a hang.

I configured LOCKDEP, compiled ibmveth.c with DEBUG, and built a new
kernel. I ran this test again and saw:

    Setting pool0/active to 0
    Setting pool1/active to 1
    [   73.911067][ T4365] ibmveth 30000002 eth0: close starting
    Setting pool1/active to 1
    Setting pool1/active to 0
    [   73.911367][ T4366] ibmveth 30000002 eth0: close starting
    [   73.916056][ T4365] ibmveth 30000002 eth0: close complete
    [   73.916064][ T4365] ibmveth 30000002 eth0: open starting
    [  110.808564][  T712] systemd-journald[712]: Sent WATCHDOG=1 notification.
    [  230.808495][  T712] systemd-journald[712]: Sent WATCHDOG=1 notification.
    [  243.683786][  T123] INFO: task stress.sh:4365 blocked for more than 122 seconds.
    [  243.683827][  T123]       Not tainted 6.14.0-01103-g2df0c02dab82-dirty #8
    [  243.683833][  T123] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [  243.683838][  T123] task:stress.sh       state:D stack:28096 pid:4365  tgid:4365  ppid:4364   task_flags:0x400040 flags:0x00042000
    [  243.683852][  T123] Call Trace:
    [  243.683857][  T123] [c00000000c38f690] [0000000000000001] 0x1 (unreliable)
    [  243.683868][  T123] [c00000000c38f840] [c00000000001f908] __switch_to+0x318/0x4e0
    [  243.683878][  T123] [c00000000c38f8a0] [c000000001549a70] __schedule+0x500/0x12a0
    [  243.683888][  T123] [c00000000c38f9a0] [c00000000154a878] schedule+0x68/0x210
    [  243.683896][  T123] [c00000000c38f9d0] [c00000000154ac80] schedule_preempt_disabled+0x30/0x50
    [  243.683904][  T123] [c00000000c38fa00] [c00000000154dbb0] __mutex_lock+0x730/0x10f0
    [  243.683913][  T123] [c00000000c38fb10] [c000000001154d40] napi_enable+0x30/0x60
    [  243.683921][  T123] [c00000000c38fb40] [c000000000f4ae94] ibmveth_open+0x68/0x5dc
    [  243.683928][  T123] [c00000000c38fbe0] [c000000000f4aa20] veth_pool_store+0x220/0x270
    [  243.683936][  T123] [c00000000c38fc70] [c000000000826278] sysfs_kf_write+0x68/0xb0
    [  243.683944][  T123] [c00000000c38fcb0] [c0000000008240b8] kernfs_fop_write_iter+0x198/0x2d0
    [  243.683951][  T123] [c00000000c38fd00] [c00000000071b9ac] vfs_write+0x34c/0x650
    [  243.683958][  T123] [c00000000c38fdc0] [c00000000071bea8] ksys_write+0x88/0x150
    [  243.683966][  T123] [c00000000c38fe10] [c0000000000317f4] system_call_exception+0x124/0x340
    [  243.683973][  T123] [c00000000c38fe50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
    ...
    [  243.684087][  T123] Showing all locks held in the system:
    [  243.684095][  T123] 1 lock held by khungtaskd/123:
    [  243.684099][  T123]  #0: c00000000278e370 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x50/0x248
    [  243.684114][  T123] 4 locks held by stress.sh/4365:
    [  243.684119][  T123]  #0: c00000003a4cd3f8 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x88/0x150
    [  243.684132][  T123]  #1: c000000041aea888 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x154/0x2d0
    [  243.684143][  T123]  #2: c0000000366fb9a8 (kn->active#64){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x160/0x2d0
    [  243.684155][  T123]  #3: c000000035ff4cb8 (&dev->lock){+.+.}-{3:3}, at: napi_enable+0x30/0x60
    [  243.684166][  T123] 5 locks held by stress.sh/4366:
    [  243.684170][  T123]  #0: c00000003a4cd3f8 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x88/0x150
    [  243.684183][  T123]  #1: c00000000aee2288 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x154/0x2d0
    [  243.684194][  T123]  #2: c0000000366f4ba8 (kn->active#64){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x160/0x2d0
    [  243.684205][  T123]  #3: c000000035ff4cb8 (&dev->lock){+.+.}-{3:3}, at: napi_disable+0x30/0x60
    [  243.684216][  T123]  #4: c0000003ff9bbf18 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0x138/0x12a0

From the ibmveth debug, two threads are calling veth_pool_store, which
calls ibmveth_close and ibmveth_open. Here's the sequence:

  T4365             T4366
  ----------------- ----------------- ---------
  veth_pool_store   veth_pool_store
                    ibmveth_close
  ibmveth_close
  napi_disable
                    napi_disable
  ibmveth_open
  napi_enable                         <- HANG

ibmveth_close calls napi_disable at the top and ibmveth_open calls
napi_enable at the top.

https://docs.kernel.org/networking/napi.html says

  The control APIs are not idempotent. Control API calls are safe
  against concurrent use of datapath APIs but an incorrect sequence of
  control API calls may result in crashes, deadlocks, or race
  conditions. For example, calling napi_disable() multiple times in a
  row will deadlock.

In the normal open and close paths, rtnl_mutex is acquired to prevent
other callers. This is missing from veth_pool_store. Using rtnl_mutex in
veth_pool_store fixes these hangs.

	Signed-off-by: Dave Marquardt <[email protected]>
Fixes: 860f242 ("[PATCH] ibmveth change buffer pools dynamically")
	Reviewed-by: Nick Child <[email protected]>
	Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
	Signed-off-by: Jakub Kicinski <[email protected]>
(cherry picked from commit 053f3ff)
	Signed-off-by: Jonathan Maple <[email protected]>
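
A minimal sketch of serializing veth_pool_store with RTNL, as described above; the kobject-to-netdev lookup (pool_kobj_to_netdev) is a hypothetical helper and the attribute handling is elided:

/* Sketch only: holding rtnl_lock() for the whole store keeps the
 * close/open pair atomic with respect to other writers and to the normal
 * ndo_open/ndo_stop paths, so napi_disable()/napi_enable() can never be
 * called twice in a row. */
static ssize_t veth_pool_store(struct kobject *kobj, struct attribute *attr,
			       const char *buf, size_t count)
{
	struct net_device *netdev = pool_kobj_to_netdev(kobj); /* hypothetical lookup */
	ssize_t rc = count;

	rtnl_lock();

	/* ... parse and validate the new pool setting ... */

	if (netif_running(netdev)) {
		long ret;

		ibmveth_close(netdev);		/* napi_disable() inside */
		/* ... apply the new setting to the buffer pool ... */
		ret = ibmveth_open(netdev);	/* napi_enable() inside */
		if (ret) {
			rc = ret;		/* single error path, per the v2 note */
			goto unlock;
		}
	}

unlock:
	rtnl_unlock();
	return rc;
}
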
github-actions bot pushed a commit that referenced this pull request Oct 1, 2025
…mplaint

JIRA: https://issues.redhat.com/browse/RHEL-36251

commit bbaa6ff
Author: Kai-Heng Feng <[email protected]>
Date:   Wed Sep 13 11:32:33 2023 +0800

AMD PMF driver can cause the following warning:
[  196.159546] ------------[ cut here ]------------
[  196.159556] Voluntary context switch within RCU read-side critical section!
[  196.159571] WARNING: CPU: 0 PID: 9 at kernel/rcu/tree_plugin.h:320 rcu_note_context_switch+0x43d/0x560
[  196.159604] Modules linked in: nvme_fabrics ccm rfcomm snd_hda_scodec_cs35l41_spi cmac algif_hash algif_skcipher af_alg bnep joydev btusb btrtl uvcvideo btintel btbcm videobuf2_vmalloc intel_rapl_msr btmtk videobuf2_memops uvc videobuf2_v4l2 intel_rapl_common binfmt_misc hid_sensor_als snd_sof_amd_vangogh hid_sensor_trigger bluetooth industrialio_triggered_buffer videodev snd_sof_amd_rembrandt hid_sensor_iio_common amdgpu ecdh_generic kfifo_buf videobuf2_common hp_wmi kvm_amd sparse_keymap snd_sof_amd_renoir wmi_bmof industrialio ecc mc nls_iso8859_1 kvm snd_sof_amd_acp irqbypass snd_sof_xtensa_dsp crct10dif_pclmul crc32_pclmul mt7921e snd_sof_pci snd_ctl_led polyval_clmulni mt7921_common polyval_generic snd_sof ghash_clmulni_intel mt792x_lib mt76_connac_lib sha512_ssse3 snd_sof_utils aesni_intel snd_hda_codec_realtek crypto_simd mt76 snd_hda_codec_generic cryptd snd_soc_core snd_hda_codec_hdmi rapl ledtrig_audio input_leds snd_compress i2c_algo_bit drm_ttm_helper mac80211 snd_pci_ps hid_multitouch ttm drm_exec
[  196.159970]  drm_suballoc_helper snd_rpl_pci_acp6x amdxcp drm_buddy snd_hda_intel snd_acp_pci snd_hda_scodec_cs35l41_i2c serio_raw gpu_sched snd_hda_scodec_cs35l41 snd_acp_legacy_common snd_intel_dspcfg snd_hda_cs_dsp_ctls snd_hda_codec libarc4 drm_display_helper snd_pci_acp6x cs_dsp snd_hwdep snd_soc_cs35l41_lib video k10temp snd_pci_acp5x thunderbolt snd_hda_core drm_kms_helper cfg80211 snd_seq snd_rn_pci_acp3x snd_pcm snd_acp_config cec snd_soc_acpi snd_seq_device rc_core ccp snd_pci_acp3x snd_timer snd soundcore wmi amd_pmf platform_profile amd_pmc mac_hid serial_multi_instantiate wireless_hotkey hid_sensor_hub sch_fq_codel msr parport_pc ppdev lp parport efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear dm_mirror dm_region_hash dm_log cdc_ether usbnet r8152 mii hid_generic nvme i2c_hid_acpi i2c_hid nvme_core i2c_piix4 xhci_pci amd_sfh drm xhci_pci_renesas nvme_common hid
[  196.160382] CPU: 0 PID: 9 Comm: kworker/0:1 Not tainted 6.6.0-rc1 #4
[  196.160397] Hardware name: HP HP EliteBook 845 14 inch G10 Notebook PC/8B6E, BIOS V82 Ver. 01.02.00 08/24/2023
[  196.160405] Workqueue: events power_supply_changed_work
[  196.160426] RIP: 0010:rcu_note_context_switch+0x43d/0x560
[  196.160440] Code: 00 48 89 be 40 08 00 00 48 89 86 48 08 00 00 48 89 10 e9 63 fe ff ff 48 c7 c7 10 e7 b0 9e c6 05 e8 d8 20 02 01 e8 13 0f f3 ff <0f> 0b e9 27 fc ff ff a9 ff ff ff 7f 0f 84 cf fc ff ff 65 48 8b 3c
[  196.160450] RSP: 0018:ffffc900001878f0 EFLAGS: 00010046
[  196.160462] RAX: 0000000000000000 RBX: ffff88885e834040 RCX: 0000000000000000
[  196.160470] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  196.160476] RBP: ffffc90000187910 R08: 0000000000000000 R09: 0000000000000000
[  196.160482] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  196.160488] R13: 0000000000000000 R14: ffff888100990000 R15: ffff888100990000
[  196.160495] FS:  0000000000000000(0000) GS:ffff88885e800000(0000) knlGS:0000000000000000
[  196.160504] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  196.160512] CR2: 000055cb053c8246 CR3: 000000013443a000 CR4: 0000000000750ef0
[  196.160520] PKRU: 55555554
[  196.160526] Call Trace:
[  196.160532]  <TASK>
[  196.160548]  ? show_regs+0x72/0x90
[  196.160570]  ? rcu_note_context_switch+0x43d/0x560
[  196.160580]  ? __warn+0x8d/0x160
[  196.160600]  ? rcu_note_context_switch+0x43d/0x560
[  196.160613]  ? report_bug+0x1bb/0x1d0
[  196.160637]  ? handle_bug+0x46/0x90
[  196.160658]  ? exc_invalid_op+0x19/0x80
[  196.160675]  ? asm_exc_invalid_op+0x1b/0x20
[  196.160709]  ? rcu_note_context_switch+0x43d/0x560
[  196.160727]  __schedule+0xb9/0x15f0
[  196.160746]  ? srso_alias_return_thunk+0x5/0x7f
[  196.160765]  ? srso_alias_return_thunk+0x5/0x7f
[  196.160778]  ? acpi_ns_search_one_scope+0xbe/0x270
[  196.160806]  schedule+0x68/0x110
[  196.160820]  schedule_timeout+0x151/0x160
[  196.160829]  ? srso_alias_return_thunk+0x5/0x7f
[  196.160842]  ? srso_alias_return_thunk+0x5/0x7f
[  196.160855]  ? acpi_ns_lookup+0x3c5/0xa90
[  196.160878]  __down_common+0xff/0x220
[  196.160905]  __down_timeout+0x16/0x30
[  196.160920]  down_timeout+0x64/0x70
[  196.160938]  acpi_os_wait_semaphore+0x85/0x200
[  196.160959]  acpi_ut_acquire_mutex+0x9e/0x280
[  196.160979]  acpi_ex_enter_interpreter+0x2d/0xb0
[  196.160992]  acpi_ns_evaluate+0x2f0/0x5f0
[  196.161005]  acpi_evaluate_object+0x172/0x490
[  196.161018]  ? acpi_os_signal_semaphore+0x8a/0xd0
[  196.161038]  acpi_evaluate_integer+0x52/0xe0
[  196.161055]  ? kfree+0x79/0x120
[  196.161071]  ? srso_alias_return_thunk+0x5/0x7f
[  196.161089]  acpi_ac_get_state.part.0+0x27/0x80
[  196.161110]  get_ac_property+0x5c/0x70
[  196.161127]  ? __pfx___power_supply_is_system_supplied+0x10/0x10
[  196.161146]  __power_supply_is_system_supplied+0x44/0xb0
[  196.161166]  class_for_each_device+0x124/0x160
[  196.161184]  ? acpi_ac_get_state.part.0+0x27/0x80
[  196.161203]  ? srso_alias_return_thunk+0x5/0x7f
[  196.161223]  power_supply_is_system_supplied+0x3c/0x70
[  196.161243]  amd_pmf_get_power_source+0xe/0x20 [amd_pmf]
[  196.161276]  amd_pmf_power_slider_update_event+0x49/0x90 [amd_pmf]
[  196.161310]  amd_pmf_pwr_src_notify_call+0xe7/0x100 [amd_pmf]
[  196.161340]  notifier_call_chain+0x5f/0xe0
[  196.161362]  atomic_notifier_call_chain+0x33/0x60
[  196.161378]  power_supply_changed_work+0x84/0x110
[  196.161394]  process_one_work+0x178/0x360
[  196.161412]  ? __pfx_worker_thread+0x10/0x10
[  196.161424]  worker_thread+0x307/0x430
[  196.161440]  ? __pfx_worker_thread+0x10/0x10
[  196.161451]  kthread+0xf4/0x130
[  196.161467]  ? __pfx_kthread+0x10/0x10
[  196.161486]  ret_from_fork+0x43/0x70
[  196.161502]  ? __pfx_kthread+0x10/0x10
[  196.161518]  ret_from_fork_asm+0x1b/0x30
[  196.161558]  </TASK>
[  196.161562] ---[ end trace 0000000000000000 ]---

Since there's no guarantee that all the callbacks can work in atomic
context, switch to using blocking_notifier_call_chain() to relax the
constraint.
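
For reference, the switch follows the standard kernel notifier API: replace the atomic chain with a blocking one so callbacks are allowed to sleep. A minimal sketch of the pattern (generic names, not the actual amd_pmf diff):

```
#include <linux/notifier.h>

/* Blocking chains may sleep in callbacks; atomic chains must not. */
static BLOCKING_NOTIFIER_HEAD(pwr_src_chain);	/* was ATOMIC_NOTIFIER_HEAD() */

static int pwr_src_event(struct notifier_block *nb, unsigned long event,
			 void *data)
{
	/* It is now safe to sleep here, e.g. while querying power_supply state. */
	return NOTIFY_OK;
}

static struct notifier_block pwr_src_nb = {
	.notifier_call = pwr_src_event,
};

static void pwr_src_register(void)
{
	blocking_notifier_chain_register(&pwr_src_chain, &pwr_src_nb);
}

static void pwr_src_notify(unsigned long event)
{
	/* Was atomic_notifier_call_chain(); callers must be able to sleep. */
	blocking_notifier_call_chain(&pwr_src_chain, event, NULL);
}
```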

Signed-off-by: Kai-Heng Feng <[email protected]>
Reported-by: Allen Zhong <[email protected]>
Fixes: 4c71ae4 ("platform/x86/amd/pmf: Add support SPS PMF feature")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217571
Reviewed-by: Mario Limonciello <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sebastian Reichel <[email protected]>
Signed-off-by: Mark Langsdorf <[email protected]>
github-actions bot pushed a commit that referenced this pull request Oct 7, 2025
Before disabling SR-IOV via config space accesses to the parent PF,
sriov_disable() first removes the PCI devices representing the VFs.

Since commit 9d16947 ("PCI: Add global pci_lock_rescan_remove()")
such removal operations are serialized against concurrent remove and
rescan using the pci_rescan_remove_lock. No such locking was ever added
in sriov_disable() however. In particular when commit 18f9e9d
("PCI/IOV: Factor out sriov_add_vfs()") factored out the PCI device
removal into sriov_del_vfs() there was still no locking around the
pci_iov_remove_virtfn() calls.

On s390 the lack of serialization in sriov_disable() may cause double
remove and list corruption with the below (amended) trace being observed:

  PSW:  0704c00180000000 0000000c914e4b38 (klist_put+56)
  GPRS: 000003800313fb48 0000000000000000 0000000100000001 0000000000000001
	00000000f9b520a8 0000000000000000 0000000000002fbd 00000000f4cc9480
	0000000000000001 0000000000000000 0000000000000000 0000000180692828
	00000000818e8000 000003800313fe2c 000003800313fb20 000003800313fad8
  #0 [3800313fb20] device_del at c9158ad5c
  #1 [3800313fb88] pci_remove_bus_device at c915105ba
  #2 [3800313fbd0] pci_iov_remove_virtfn at c9152f198
  #3 [3800313fc28] zpci_iov_remove_virtfn at c90fb67c0
  #4 [3800313fc60] zpci_bus_remove_device at c90fb6104
  #5 [3800313fca0] __zpci_event_availability at c90fb3dca
  #6 [3800313fd08] chsc_process_sei_nt0 at c918fe4a2
  #7 [3800313fd60] crw_collect_info at c91905822
  #8 [3800313fe10] kthread at c90feb390
  #9 [3800313fe68] __ret_from_fork at c90f6aa64
  #10 [3800313fe98] ret_from_fork at c9194f3f2.

This is because, in addition to sriov_disable() removing the VFs, the
platform also generates hot-unplug events for the VFs; these are the
reverse of the hotplug events generated by sriov_enable() and handled via
pdev->no_vf_scan. And while the event processing takes
pci_rescan_remove_lock and checks whether the struct pci_dev still exists,
the lack of synchronization makes this check racy.

Other races may also be possible, of course, though given that this lack
of locking persisted for so long, observable races seem very rare. Even on
s390 the list corruption was only observed with certain devices, since the
platform events are only triggered by config accesses after the removal;
as long as the removal finished synchronously they would not race. Either
way, the locking is missing, so fix this by adding it to the
sriov_del_vfs() helper.

Just like PCI rescan-remove, locking is also missing in sriov_add_vfs()
including for the error case where pci_stop_and_remove_bus_device() is
called without the PCI rescan-remove lock being held. Even in the non-error
case, adding new PCI devices and buses should be serialized via the PCI
rescan-remove lock. Add the necessary locking.
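
A minimal sketch of the kind of locking being added (illustrative only, not the exact upstream diff):

```
#include <linux/pci.h>

/*
 * Illustrative sketch: take the global rescan/remove lock around VF
 * removal so it is serialized against concurrent remove and rescan,
 * mirroring what other removal paths already do.
 */
static void sriov_del_vfs_sketch(struct pci_dev *dev, u16 num_vfs)
{
	int i;

	pci_lock_rescan_remove();
	for (i = 0; i < num_vfs; i++)
		pci_iov_remove_virtfn(dev, i);
	pci_unlock_rescan_remove();
}
```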

Fixes: 18f9e9d ("PCI/IOV: Factor out sriov_add_vfs()")
Signed-off-by: Niklas Schnelle <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Benjamin Block <[email protected]>
Reviewed-by: Farhan Ali <[email protected]>
Reviewed-by: Julian Ruess <[email protected]>
Cc: [email protected]
Link: https://patch.msgid.link/[email protected]
github-actions bot pushed a commit that referenced this pull request Oct 9, 2025
…uctions

JIRA: https://issues.redhat.com/browse/RHEL-78202

commit ff3afe5
Author: Peilin Ye <[email protected]>
Date:   Tue Mar 4 01:06:46 2025 +0000

    selftests/bpf: Add selftests for load-acquire and store-release instructions
    
    Add several ./test_progs tests:
    
      - arena_atomics/load_acquire
      - arena_atomics/store_release
      - verifier_load_acquire/*
      - verifier_store_release/*
      - verifier_precision/bpf_load_acquire
      - verifier_precision/bpf_store_release
    
    The last two tests are added to check if backtrack_insn() handles the
    new instructions correctly.
    
    Additionally, the last test also makes sure that the verifier
    "remembers" the value (in src_reg) we store-release into e.g. a stack
    slot.  For example, if we take a look at the test program:
    
        #0:  r1 = 8;
          /* store_release((u64 *)(r10 - 8), r1); */
        #1:  .8byte %[store_release];
        #2:  r1 = *(u64 *)(r10 - 8);
        #3:  r2 = r10;
        #4:  r2 += r1;
        #5:  r0 = 0;
        #6:  exit;
    
    At #1, if the verifier doesn't remember that we wrote 8 to the stack,
    then later at #4 we would be adding an unbounded scalar value to the
    stack pointer, which would cause the program to be rejected:
    
      VERIFIER LOG:
      =============
    ...
      math between fp pointer and register with unbounded min value is not allowed
    
    For easier CI integration, instead of using built-ins like
    __atomic_{load,store}_n() which depend on the new
    __BPF_FEATURE_LOAD_ACQ_STORE_REL pre-defined macro, manually craft
    load-acquire/store-release instructions using __imm_insn(), as suggested
    by Eduard.
    
    All new tests depend on:
    
      (1) Clang major version >= 18, and
      (2) ENABLE_ATOMICS_TESTS is defined (currently implies -mcpu=v3 or
          v4), and
      (3) JIT supports load-acquire/store-release (currently arm64 and
          x86-64)
    
    In .../progs/arena_atomics.c:
    
      /* 8-byte-aligned */
      __u8 __arena_global load_acquire8_value = 0x12;
      /* 1-byte hole */
      __u16 __arena_global load_acquire16_value = 0x1234;
    
    That 1-byte hole in the .addr_space.1 ELF section caused clang-17 to
    crash:
    
      fatal error: error in backend: unable to write nop sequence of 1 bytes
    
    To work around such llvm-17 CI job failures, conditionally define
    __arena_global variables as 64-bit if __clang_major__ < 18, to make sure
    .addr_space.1 has no holes.  Ideally we should avoid compiling this file
    using clang-17 at all (arena tests depend on
    __BPF_FEATURE_ADDR_SPACE_CAST, and are skipped for llvm-17 anyway), but
    that is a separate topic.
    
    Acked-by: Eduard Zingerman <[email protected]>
    Signed-off-by: Peilin Ye <[email protected]>
    Link: https://lore.kernel.org/r/1b46c6feaf0f1b6984d9ec80e500cc7383e9da1a.1741049567.git.yepeilin@google.com
    Signed-off-by: Alexei Starovoitov <[email protected]>

Signed-off-by: Gregory Bell <[email protected]>
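
As an aside on the crafted-instruction approach described in the commit message above: in the BPF selftests, a raw store-release instruction can be emitted with __imm_insn() roughly like this. This is a simplified sketch based on the selftest conventions; the exact encodings, macros and required CPU/JIT support should be checked against the tree.

```
/* Assumes the usual selftest includes: <linux/bpf.h>, linux/filter.h, bpf_misc.h. */
SEC("socket")
__success __retval(0)
__naked void store_release_sketch(void)
{
	asm volatile (
	"r1 = 8;"
	/* store_release((u64 *)(r10 - 8), r1); emitted as a raw instruction */
	".8byte %[store_release_insn];"
	"r0 = *(u64 *)(r10 - 8);"
	"r0 -= 8;"				/* expect retval 0 */
	"exit;"
	:
	: __imm_insn(store_release_insn,
		     BPF_ATOMIC_OP(BPF_DW, BPF_STORE_REL,
				   BPF_REG_10, BPF_REG_1, -8))
	: __clobber_all);
}
```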
github-actions bot pushed a commit that referenced this pull request Oct 9, 2025
The test starts a workload and then opens events. If the events fail
to open, for example because of perf_event_paranoid, the gopipe of the
workload is leaked and the file descriptor leak check fails when the
test exits. To avoid this, cancel the workload when opening the events
fails.
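
The general shape of such a fix (tear down the forked workload and close its pipe when a later setup step fails, so no file descriptors leak) can be shown with a small standalone userspace sketch; this is not perf code, just an illustration of the pattern:

```
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Standalone illustration: a "workload" child waits on a go-pipe. If a
 * later setup step fails, cancel the workload and close the pipe instead
 * of leaking the file descriptors.
 */
int main(void)
{
	int go_pipe[2];
	pid_t child;
	int setup_failed = 1;	/* pretend e.g. opening events failed */

	if (pipe(go_pipe))
		return 1;

	child = fork();
	if (child < 0)
		return 1;
	if (child == 0) {
		char buf;

		close(go_pipe[1]);
		if (read(go_pipe[0], &buf, 1) <= 0)	/* wait for "go" */
			_exit(1);
		_exit(0);
	}

	if (setup_failed) {
		/* Cancel the workload instead of leaking go_pipe. */
		kill(child, SIGTERM);
		waitpid(child, NULL, 0);
		close(go_pipe[0]);
		close(go_pipe[1]);
		fprintf(stderr, "setup failed, workload cancelled\n");
		return 2;
	}

	/* ... normal path would write to go_pipe[1] and run the test ... */
	return 0;
}
```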

Before:
```
$ perf test -vv 7
  7: PERF_RECORD_* events & perf_sample fields:
 --- start ---
test child forked, pid 1189568
Using CPUID GenuineIntel-6-B7-1
 ------------------------------------------------------------
perf_event_attr:
  type                    	   0 (PERF_TYPE_HARDWARE)
  config                  	   0xa00000000 (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                	   1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
sys_perf_event_open failed, error -13
 ------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  config                           0xa00000000 (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                         1
  exclude_kernel                   1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8 = 3
 ------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  config                           0x400000000 (cpu_core/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                         1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
sys_perf_event_open failed, error -13
 ------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  config                           0x400000000 (cpu_core/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                         1
  exclude_kernel                   1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8 = 3
Attempt to add: software/cpu-clock/
..after resolving event: software/config=0/
cpu-clock -> software/cpu-clock/
 ------------------------------------------------------------
perf_event_attr:
  type                             1 (PERF_TYPE_SOFTWARE)
  size                             136
  config                           0x9 (PERF_COUNT_SW_DUMMY)
  sample_type                      IP|TID|TIME|CPU
  read_format                      ID|LOST
  disabled                         1
  inherit                          1
  mmap                             1
  comm                             1
  enable_on_exec                   1
  task                             1
  sample_id_all                    1
  mmap2                            1
  comm_exec                        1
  ksymbol                          1
  bpf_event                        1
  { wakeup_events, wakeup_watermark } 1
 ------------------------------------------------------------
sys_perf_event_open: pid 1189569  cpu 0  group_fd -1  flags 0x8
sys_perf_event_open failed, error -13
perf_evlist__open: Permission denied
 ---- end(-2) ----
Leak of file descriptor 6 that opened: 'pipe:[14200347]'
 ---- unexpected signal (6) ----
iFailed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
    #0 0x565358f6666e in child_test_sig_handler builtin-test.c:311
    #1 0x7f29ce849df0 in __restore_rt libc_sigaction.c:0
    #2 0x7f29ce89e95c in __pthread_kill_implementation pthread_kill.c:44
    #3 0x7f29ce849cc2 in raise raise.c:27
    #4 0x7f29ce8324ac in abort abort.c:81
    #5 0x565358f662d4 in check_leaks builtin-test.c:226
    #6 0x565358f6682e in run_test_child builtin-test.c:344
    #7 0x565358ef7121 in start_command run-command.c:128
    #8 0x565358f67273 in start_test builtin-test.c:545
    #9 0x565358f6771d in __cmd_test builtin-test.c:647
    #10 0x565358f682bd in cmd_test builtin-test.c:849
    #11 0x565358ee5ded in run_builtin perf.c:349
    #12 0x565358ee6085 in handle_internal_command perf.c:401
    #13 0x565358ee61de in run_argv perf.c:448
    #14 0x565358ee6527 in main perf.c:555
    #15 0x7f29ce833ca8 in __libc_start_call_main libc_start_call_main.h:74
    #16 0x7f29ce833d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
    #17 0x565358e391c1 in _start perf[851c1]
  7: PERF_RECORD_* events & perf_sample fields                       : FAILED!
```

After:
```
$ perf test 7
  7: PERF_RECORD_* events & perf_sample fields                       : Skip (permissions)
```

Fixes: 16d00fe ("perf tests: Move test__PERF_RECORD into separate object")
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Chun-Tse Shao <[email protected]>
Cc: Howard Chu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
github-actions bot pushed a commit that referenced this pull request Oct 17, 2025
Since the blamed commit, unregister_netdevice_many_notify() takes the
netdev mutex if the device needs it.

If the device list is too long, this will lock more device mutexes than
lockdep can handle:

unshare -n \
 bash -c 'for i in $(seq 1 100);do ip link add foo$i type dummy;done'

BUG: MAX_LOCK_DEPTH too low!
turning off the locking correctness validator.
depth: 48  max: 48!
48 locks held by kworker/u16:1/69:
 #0: ..148 ((wq_completion)netns){+.+.}-{0:0}, at: process_one_work
 #1: ..d40 (net_cleanup_work){+.+.}-{0:0}, at: process_one_work
 #2: ..bd0 (pernet_ops_rwsem){++++}-{4:4}, at: cleanup_net
 #3: ..aa8 (rtnl_mutex){+.+.}-{4:4}, at: default_device_exit_batch
 #4: ..cb0 (&dev_instance_lock_key#3){+.+.}-{4:4}, at: unregister_netdevice_many_notify
[..]

Add a helper to close and then unlock a list of net_devices.
Devices that are not up have to be skipped: netif_close_many() always
removes them from the list without taking any other action, so they'd
remain in a locked state.

Close devices whenever we've used up half of the tracking slots, or once
we've processed the entire list without hitting the limit.
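
The batching idea (never hold more locks than the tracking budget allows; flush once half of it is used) can be sketched generically. The constants, types and helpers below are illustrative stand-ins, not the actual net code:

```
#include <pthread.h>
#include <stddef.h>

#define LOCK_BUDGET 48	/* stand-in for lockdep's MAX_LOCK_DEPTH */

struct dev_stub {
	pthread_mutex_t lock;
	int up;
};

/* Release everything collected so far ("close and unlock" the batch). */
static void flush_batch(struct dev_stub **batch, size_t *held)
{
	while (*held)
		pthread_mutex_unlock(&batch[--(*held)]->lock);
}

/* Walk the list but never hold more than LOCK_BUDGET / 2 locks at once. */
static void close_devices(struct dev_stub *devs, size_t count)
{
	struct dev_stub *batch[LOCK_BUDGET / 2];
	size_t held = 0;

	for (size_t i = 0; i < count; i++) {
		if (!devs[i].up)
			continue;	/* not up: nothing to close, keep it out of the batch */
		pthread_mutex_lock(&devs[i].lock);
		batch[held++] = &devs[i];
		if (held == LOCK_BUDGET / 2)
			flush_batch(batch, &held);
	}
	flush_batch(batch, &held);	/* flush the remainder after the full pass */
}
```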

Fixes: 7e4d784 ("net: hold netdev instance lock during rtnetlink operations")
Signed-off-by: Florian Westphal <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
github-actions bot pushed a commit that referenced this pull request Oct 20, 2025
[ Upstream commit 48918ca ]

The test starts a workload and then opens events. If the events fail
to open, for example because of perf_event_paranoid, the gopipe of the
workload is leaked and the file descriptor leak check fails when the
test exits. To avoid this, cancel the workload when opening the events
fails.

Before:
```
$ perf test -vv 7
  7: PERF_RECORD_* events & perf_sample fields:
 --- start ---
test child forked, pid 1189568
Using CPUID GenuineIntel-6-B7-1
 ------------------------------------------------------------
perf_event_attr:
  type                    	   0 (PERF_TYPE_HARDWARE)
  config                  	   0xa00000000 (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                	   1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
sys_perf_event_open failed, error -13
 ------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  config                           0xa00000000 (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                         1
  exclude_kernel                   1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8 = 3
 ------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  config                           0x400000000 (cpu_core/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                         1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
sys_perf_event_open failed, error -13
 ------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  config                           0x400000000 (cpu_core/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                         1
  exclude_kernel                   1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8 = 3
Attempt to add: software/cpu-clock/
..after resolving event: software/config=0/
cpu-clock -> software/cpu-clock/
 ------------------------------------------------------------
perf_event_attr:
  type                             1 (PERF_TYPE_SOFTWARE)
  size                             136
  config                           0x9 (PERF_COUNT_SW_DUMMY)
  sample_type                      IP|TID|TIME|CPU
  read_format                      ID|LOST
  disabled                         1
  inherit                          1
  mmap                             1
  comm                             1
  enable_on_exec                   1
  task                             1
  sample_id_all                    1
  mmap2                            1
  comm_exec                        1
  ksymbol                          1
  bpf_event                        1
  { wakeup_events, wakeup_watermark } 1
 ------------------------------------------------------------
sys_perf_event_open: pid 1189569  cpu 0  group_fd -1  flags 0x8
sys_perf_event_open failed, error -13
perf_evlist__open: Permission denied
 ---- end(-2) ----
Leak of file descriptor 6 that opened: 'pipe:[14200347]'
 ---- unexpected signal (6) ----
iFailed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
    #0 0x565358f6666e in child_test_sig_handler builtin-test.c:311
    #1 0x7f29ce849df0 in __restore_rt libc_sigaction.c:0
    #2 0x7f29ce89e95c in __pthread_kill_implementation pthread_kill.c:44
    #3 0x7f29ce849cc2 in raise raise.c:27
    #4 0x7f29ce8324ac in abort abort.c:81
    #5 0x565358f662d4 in check_leaks builtin-test.c:226
    #6 0x565358f6682e in run_test_child builtin-test.c:344
    #7 0x565358ef7121 in start_command run-command.c:128
    #8 0x565358f67273 in start_test builtin-test.c:545
    #9 0x565358f6771d in __cmd_test builtin-test.c:647
    #10 0x565358f682bd in cmd_test builtin-test.c:849
    #11 0x565358ee5ded in run_builtin perf.c:349
    #12 0x565358ee6085 in handle_internal_command perf.c:401
    #13 0x565358ee61de in run_argv perf.c:448
    #14 0x565358ee6527 in main perf.c:555
    #15 0x7f29ce833ca8 in __libc_start_call_main libc_start_call_main.h:74
    #16 0x7f29ce833d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
    #17 0x565358e391c1 in _start perf[851c1]
  7: PERF_RECORD_* events & perf_sample fields                       : FAILED!
```

After:
```
$ perf test 7
  7: PERF_RECORD_* events & perf_sample fields                       : Skip (permissions)
```

Fixes: 16d00fe ("perf tests: Move test__PERF_RECORD into separate object")
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Chun-Tse Shao <[email protected]>
Cc: Howard Chu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
github-actions bot pushed a commit that referenced this pull request Oct 20, 2025
commit 0570327 upstream.

Before disabling SR-IOV via config space accesses to the parent PF,
sriov_disable() first removes the PCI devices representing the VFs.

Since commit 9d16947 ("PCI: Add global pci_lock_rescan_remove()")
such removal operations are serialized against concurrent remove and
rescan using the pci_rescan_remove_lock. No such locking was ever added
in sriov_disable() however. In particular when commit 18f9e9d
("PCI/IOV: Factor out sriov_add_vfs()") factored out the PCI device
removal into sriov_del_vfs() there was still no locking around the
pci_iov_remove_virtfn() calls.

On s390 the lack of serialization in sriov_disable() may cause double
remove and list corruption with the below (amended) trace being observed:

  PSW:  0704c00180000000 0000000c914e4b38 (klist_put+56)
  GPRS: 000003800313fb48 0000000000000000 0000000100000001 0000000000000001
	00000000f9b520a8 0000000000000000 0000000000002fbd 00000000f4cc9480
	0000000000000001 0000000000000000 0000000000000000 0000000180692828
	00000000818e8000 000003800313fe2c 000003800313fb20 000003800313fad8
  #0 [3800313fb20] device_del at c9158ad5c
  #1 [3800313fb88] pci_remove_bus_device at c915105ba
  #2 [3800313fbd0] pci_iov_remove_virtfn at c9152f198
  #3 [3800313fc28] zpci_iov_remove_virtfn at c90fb67c0
  #4 [3800313fc60] zpci_bus_remove_device at c90fb6104
  #5 [3800313fca0] __zpci_event_availability at c90fb3dca
  #6 [3800313fd08] chsc_process_sei_nt0 at c918fe4a2
  #7 [3800313fd60] crw_collect_info at c91905822
  #8 [3800313fe10] kthread at c90feb390
  #9 [3800313fe68] __ret_from_fork at c90f6aa64
  #10 [3800313fe98] ret_from_fork at c9194f3f2.

This is because, in addition to sriov_disable() removing the VFs, the
platform also generates hot-unplug events for the VFs; these are the
reverse of the hotplug events generated by sriov_enable() and handled via
pdev->no_vf_scan. And while the event processing takes
pci_rescan_remove_lock and checks whether the struct pci_dev still exists,
the lack of synchronization makes this check racy.

Other races may also be possible, of course, though given that this lack
of locking persisted for so long, observable races seem very rare. Even on
s390 the list corruption was only observed with certain devices, since the
platform events are only triggered by config accesses after the removal;
as long as the removal finished synchronously they would not race. Either
way, the locking is missing, so fix this by adding it to the
sriov_del_vfs() helper.

Just like PCI rescan-remove, locking is also missing in sriov_add_vfs()
including for the error case where pci_stop_and_remove_bus_device() is
called without the PCI rescan-remove lock being held. Even in the non-error
case, adding new PCI devices and buses should be serialized via the PCI
rescan-remove lock. Add the necessary locking.

Fixes: 18f9e9d ("PCI/IOV: Factor out sriov_add_vfs()")
Signed-off-by: Niklas Schnelle <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Benjamin Block <[email protected]>
Reviewed-by: Farhan Ali <[email protected]>
Reviewed-by: Julian Ruess <[email protected]>
Cc: [email protected]
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
github-actions bot pushed a commit that referenced this pull request Oct 23, 2025
JIRA: https://issues.redhat.com/browse/RHEL-78203

commit ee684de
Author: Viktor Malik <[email protected]>
Date:   Tue Apr 15 17:50:14 2025 +0200

    libbpf: Fix buffer overflow in bpf_object__init_prog
    
    As shown in [1], it is possible to corrupt a BPF ELF file such that
    arbitrary BPF instructions are loaded by libbpf. This can be done by
    setting a symbol (BPF program) section offset to a large (unsigned)
    number such that <section start + symbol offset> overflows and points
    before the section data in memory.
    
    Consider the situation below where:
    - prog_start = sec_start + symbol_offset    <-- size_t overflow here
    - prog_end   = prog_start + prog_size
    
        prog_start        sec_start        prog_end        sec_end
            |                |                 |              |
            v                v                 v              v
        .....................|################################|............
    
    The report in [1] also provides a corrupted BPF ELF which can be used as
    a reproducer:
    
        $ readelf -S crash
        Section Headers:
          [Nr] Name              Type             Address           Offset
               Size              EntSize          Flags  Link  Info  Align
        ...
          [ 2] uretprobe.mu[...] PROGBITS         0000000000000000  00000040
               0000000000000068  0000000000000000  AX       0     0     8
    
        $ readelf -s crash
        Symbol table '.symtab' contains 8 entries:
           Num:    Value          Size Type    Bind   Vis      Ndx Name
        ...
             6: ffffffffffffffb8   104 FUNC    GLOBAL DEFAULT    2 handle_tp
    
    Here, the handle_tp prog has section offset ffffffffffffffb8, i.e. will
    point before the actual memory where section 2 is allocated.
    
    This is also reported by AddressSanitizer:
    
        =================================================================
        ==1232==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7c7302fe0000 at pc 0x7fc3046e4b77 bp 0x7ffe64677cd0 sp 0x7ffe64677490
        READ of size 104 at 0x7c7302fe0000 thread T0
            #0 0x7fc3046e4b76 in memcpy (/lib64/libasan.so.8+0xe4b76)
            #1 0x00000040df3e in bpf_object__init_prog /src/libbpf/src/libbpf.c:856
            #2 0x00000040df3e in bpf_object__add_programs /src/libbpf/src/libbpf.c:928
            #3 0x00000040df3e in bpf_object__elf_collect /src/libbpf/src/libbpf.c:3930
            #4 0x00000040df3e in bpf_object_open /src/libbpf/src/libbpf.c:8067
            #5 0x00000040f176 in bpf_object__open_file /src/libbpf/src/libbpf.c:8090
            #6 0x000000400c16 in main /poc/poc.c:8
            #7 0x7fc3043d25b4 in __libc_start_call_main (/lib64/libc.so.6+0x35b4)
            #8 0x7fc3043d2667 in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x3667)
            #9 0x000000400b34 in _start (/poc/poc+0x400b34)
    
        0x7c7302fe0000 is located 64 bytes before 104-byte region [0x7c7302fe0040,0x7c7302fe00a8)
        allocated by thread T0 here:
            #0 0x7fc3046e716b in malloc (/lib64/libasan.so.8+0xe716b)
            #1 0x7fc3045ee600 in __libelf_set_rawdata_wrlock (/lib64/libelf.so.1+0xb600)
            #2 0x7fc3045ef018 in __elf_getdata_rdlock (/lib64/libelf.so.1+0xc018)
            #3 0x00000040642f in elf_sec_data /src/libbpf/src/libbpf.c:3740
    
    The problem here is that currently, libbpf only checks that the program
    end is within the section bounds. There used to be a check
    `while (sec_off < sec_sz)` in bpf_object__add_programs, however, it was
    removed by commit 6245947 ("libbpf: Allow gaps in BPF program
    sections to support overriden weak functions").
    
    Add a check for detecting the overflow of `sec_off + prog_sz` to
    bpf_object__init_prog to fix this issue.
    
    [1] https://github.com/lmarch2/poc/blob/main/libbpf/libbpf.md
    
    Fixes: 6245947 ("libbpf: Allow gaps in BPF program sections to support overriden weak functions")
    Reported-by: lmarch2 <[email protected]>
    Signed-off-by: Viktor Malik <[email protected]>
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Reviewed-by: Shung-Hsi Yu <[email protected]>
    Link: https://github.com/lmarch2/poc/blob/main/libbpf/libbpf.md
    Link: https://lore.kernel.org/bpf/[email protected]

Signed-off-by: Viktor Malik <[email protected]>
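
The overflow check described in the commit message above boils down to validating `sec_off + prog_sz` before using it; a small self-contained sketch of that kind of bounds check (not the literal libbpf diff):

```
#include <stdbool.h>
#include <stddef.h>

/*
 * Sketch of the bounds check: reject a program whose [sec_off, sec_off +
 * prog_sz) range overflows or does not fit inside a section of size
 * sec_sz. Mirrors the idea in bpf_object__init_prog, not the exact code.
 */
static bool prog_range_ok(size_t sec_off, size_t prog_sz, size_t sec_sz)
{
	if (sec_off + prog_sz < sec_off)	/* size_t wrap-around */
		return false;
	if (sec_off + prog_sz > sec_sz)		/* runs past the section */
		return false;
	return true;
}
```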
github-actions bot pushed a commit that referenced this pull request Oct 24, 2025
JIRA: https://issues.redhat.com/browse/RHEL-112997

commit ffa1e7a
Author: Thomas Hellström <[email protected]>
Date:   Tue Mar 18 10:55:48 2025 +0100

    block: Make request_queue lockdep splats show up earlier

    In recent kernels, there are lockdep splats around the
    struct request_queue::io_lockdep_map, similar to [1], but they
    typically don't show up until reclaim with writeback happens.

    Having multiple kernel versions released with a known risk of kernel
    deadlock during reclaim writeback should IMHO be addressed and
    backported to -stable with the highest priority.

    In order to have these lockdep splats show up earlier,
    preferably during system initialization, prime the
    struct request_queue::io_lockdep_map as GFP_KERNEL reclaim-
    tainted. This will instead lead to lockdep splats looking similar
    to [2], but without the need for reclaim + writeback
    happening.

    [1]:
    [  189.762244] ======================================================
    [  189.762432] WARNING: possible circular locking dependency detected
    [  189.762441] 6.14.0-rc6-xe+ #6 Tainted: G     U
    [  189.762450] ------------------------------------------------------
    [  189.762459] kswapd0/119 is trying to acquire lock:
    [  189.762467] ffff888110ceb710 (&q->q_usage_counter(io)#26){++++}-{0:0}, at: __submit_bio+0x76/0x230
    [  189.762485]
                   but task is already holding lock:
    [  189.762494] ffffffff834c97c0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xbe/0xb00
    [  189.762507]
                   which lock already depends on the new lock.

    [  189.762519]
                   the existing dependency chain (in reverse order) is:
    [  189.762529]
                   -> #2 (fs_reclaim){+.+.}-{0:0}:
    [  189.762540]        fs_reclaim_acquire+0xc5/0x100
    [  189.762548]        kmem_cache_alloc_lru_noprof+0x4a/0x480
    [  189.762558]        alloc_inode+0xaa/0xe0
    [  189.762566]        iget_locked+0x157/0x330
    [  189.762573]        kernfs_get_inode+0x1b/0x110
    [  189.762582]        kernfs_get_tree+0x1b0/0x2e0
    [  189.762590]        sysfs_get_tree+0x1f/0x60
    [  189.762597]        vfs_get_tree+0x2a/0xf0
    [  189.762605]        path_mount+0x4cd/0xc00
    [  189.762613]        __x64_sys_mount+0x119/0x150
    [  189.762621]        x64_sys_call+0x14f2/0x2310
    [  189.762630]        do_syscall_64+0x91/0x180
    [  189.762637]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
    [  189.762647]
                   -> #1 (&root->kernfs_rwsem){++++}-{3:3}:
    [  189.762659]        down_write+0x3e/0xf0
    [  189.762667]        kernfs_remove+0x32/0x60
    [  189.762676]        sysfs_remove_dir+0x4f/0x60
    [  189.762685]        __kobject_del+0x33/0xa0
    [  189.762709]        kobject_del+0x13/0x30
    [  189.762716]        elv_unregister_queue+0x52/0x80
    [  189.762725]        elevator_switch+0x68/0x360
    [  189.762733]        elv_iosched_store+0x14b/0x1b0
    [  189.762756]        queue_attr_store+0x181/0x1e0
    [  189.762765]        sysfs_kf_write+0x49/0x80
    [  189.762773]        kernfs_fop_write_iter+0x17d/0x250
    [  189.762781]        vfs_write+0x281/0x540
    [  189.762790]        ksys_write+0x72/0xf0
    [  189.762798]        __x64_sys_write+0x19/0x30
    [  189.762807]        x64_sys_call+0x2a3/0x2310
    [  189.762815]        do_syscall_64+0x91/0x180
    [  189.762823]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
    [  189.762833]
                   -> #0 (&q->q_usage_counter(io)#26){++++}-{0:0}:
    [  189.762845]        __lock_acquire+0x1525/0x2760
    [  189.762854]        lock_acquire+0xca/0x310
    [  189.762861]        blk_mq_submit_bio+0x8a2/0xba0
    [  189.762870]        __submit_bio+0x76/0x230
    [  189.762878]        submit_bio_noacct_nocheck+0x323/0x430
    [  189.762888]        submit_bio_noacct+0x2cc/0x620
    [  189.762896]        submit_bio+0x38/0x110
    [  189.762904]        __swap_writepage+0xf5/0x380
    [  189.762912]        swap_writepage+0x3c7/0x600
    [  189.762920]        shmem_writepage+0x3da/0x4f0
    [  189.762929]        pageout+0x13f/0x310
    [  189.762937]        shrink_folio_list+0x61c/0xf60
    [  189.763261]        evict_folios+0x378/0xcd0
    [  189.763584]        try_to_shrink_lruvec+0x1b0/0x360
    [  189.763946]        shrink_one+0x10e/0x200
    [  189.764266]        shrink_node+0xc02/0x1490
    [  189.764586]        balance_pgdat+0x563/0xb00
    [  189.764934]        kswapd+0x1e8/0x430
    [  189.765249]        kthread+0x10b/0x260
    [  189.765559]        ret_from_fork+0x44/0x70
    [  189.765889]        ret_from_fork_asm+0x1a/0x30
    [  189.766198]
                   other info that might help us debug this:

    [  189.767089] Chain exists of:
                     &q->q_usage_counter(io)#26 --> &root->kernfs_rwsem --> fs_reclaim

    [  189.767971]  Possible unsafe locking scenario:

    [  189.768555]        CPU0                    CPU1
    [  189.768849]        ----                    ----
    [  189.769136]   lock(fs_reclaim);
    [  189.769421]                                lock(&root->kernfs_rwsem);
    [  189.769714]                                lock(fs_reclaim);
    [  189.770016]   rlock(&q->q_usage_counter(io)#26);
    [  189.770305]
                    *** DEADLOCK ***

    [  189.771167] 1 lock held by kswapd0/119:
    [  189.771453]  #0: ffffffff834c97c0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xbe/0xb00
    [  189.771770]
                   stack backtrace:
    [  189.772351] CPU: 4 UID: 0 PID: 119 Comm: kswapd0 Tainted: G     U             6.14.0-rc6-xe+ #6
    [  189.772353] Tainted: [U]=USER
    [  189.772354] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023
    [  189.772354] Call Trace:
    [  189.772355]  <TASK>
    [  189.772356]  dump_stack_lvl+0x6e/0xa0
    [  189.772359]  dump_stack+0x10/0x18
    [  189.772360]  print_circular_bug.cold+0x17a/0x1b7
    [  189.772363]  check_noncircular+0x13a/0x150
    [  189.772365]  ? __pfx_stack_trace_consume_entry+0x10/0x10
    [  189.772368]  __lock_acquire+0x1525/0x2760
    [  189.772368]  ? ret_from_fork_asm+0x1a/0x30
    [  189.772371]  lock_acquire+0xca/0x310
    [  189.772372]  ? __submit_bio+0x76/0x230
    [  189.772375]  ? lock_release+0xd5/0x2c0
    [  189.772376]  blk_mq_submit_bio+0x8a2/0xba0
    [  189.772378]  ? __submit_bio+0x76/0x230
    [  189.772380]  __submit_bio+0x76/0x230
    [  189.772382]  ? trace_hardirqs_on+0x1e/0xe0
    [  189.772384]  submit_bio_noacct_nocheck+0x323/0x430
    [  189.772386]  ? submit_bio_noacct_nocheck+0x323/0x430
    [  189.772387]  ? __might_sleep+0x58/0xa0
    [  189.772390]  submit_bio_noacct+0x2cc/0x620
    [  189.772391]  ? count_memcg_events+0x68/0x90
    [  189.772393]  submit_bio+0x38/0x110
    [  189.772395]  __swap_writepage+0xf5/0x380
    [  189.772396]  swap_writepage+0x3c7/0x600
    [  189.772397]  shmem_writepage+0x3da/0x4f0
    [  189.772401]  pageout+0x13f/0x310
    [  189.772406]  shrink_folio_list+0x61c/0xf60
    [  189.772409]  ? isolate_folios+0xe80/0x16b0
    [  189.772410]  ? mark_held_locks+0x46/0x90
    [  189.772412]  evict_folios+0x378/0xcd0
    [  189.772414]  ? evict_folios+0x34a/0xcd0
    [  189.772415]  ? lock_is_held_type+0xa3/0x130
    [  189.772417]  try_to_shrink_lruvec+0x1b0/0x360
    [  189.772420]  shrink_one+0x10e/0x200
    [  189.772421]  shrink_node+0xc02/0x1490
    [  189.772423]  ? shrink_node+0xa08/0x1490
    [  189.772424]  ? shrink_node+0xbd8/0x1490
    [  189.772425]  ? mem_cgroup_iter+0x366/0x480
    [  189.772427]  balance_pgdat+0x563/0xb00
    [  189.772428]  ? balance_pgdat+0x563/0xb00
    [  189.772430]  ? trace_hardirqs_on+0x1e/0xe0
    [  189.772431]  ? finish_task_switch.isra.0+0xcb/0x330
    [  189.772433]  ? __switch_to_asm+0x33/0x70
    [  189.772437]  kswapd+0x1e8/0x430
    [  189.772438]  ? __pfx_autoremove_wake_function+0x10/0x10
    [  189.772440]  ? __pfx_kswapd+0x10/0x10
    [  189.772441]  kthread+0x10b/0x260
    [  189.772443]  ? __pfx_kthread+0x10/0x10
    [  189.772444]  ret_from_fork+0x44/0x70
    [  189.772446]  ? __pfx_kthread+0x10/0x10
    [  189.772447]  ret_from_fork_asm+0x1a/0x30
    [  189.772450]  </TASK>

    [2]:
    [    8.760253] ======================================================
    [    8.760254] WARNING: possible circular locking dependency detected
    [    8.760255] 6.14.0-rc6-xe+ #7 Tainted: G     U
    [    8.760256] ------------------------------------------------------
    [    8.760257] (udev-worker)/674 is trying to acquire lock:
    [    8.760259] ffff888100e39148 (&root->kernfs_rwsem){++++}-{3:3}, at: kernfs_remove+0x32/0x60
    [    8.760265]
                   but task is already holding lock:
    [    8.760266] ffff888110dc7680 (&q->q_usage_counter(io)#27){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x12/0x30
    [    8.760272]
                   which lock already depends on the new lock.

    [    8.760272]
                   the existing dependency chain (in reverse order) is:
    [    8.760273]
                   -> #2 (&q->q_usage_counter(io)#27){++++}-{0:0}:
    [    8.760276]        blk_alloc_queue+0x30a/0x350
    [    8.760279]        blk_mq_alloc_queue+0x6b/0xe0
    [    8.760281]        scsi_alloc_sdev+0x276/0x3c0
    [    8.760284]        scsi_probe_and_add_lun+0x22a/0x440
    [    8.760286]        __scsi_scan_target+0x109/0x230
    [    8.760288]        scsi_scan_channel+0x65/0xc0
    [    8.760290]        scsi_scan_host_selected+0xff/0x140
    [    8.760292]        do_scsi_scan_host+0xa7/0xc0
    [    8.760293]        do_scan_async+0x1c/0x160
    [    8.760295]        async_run_entry_fn+0x32/0x150
    [    8.760299]        process_one_work+0x224/0x5f0
    [    8.760302]        worker_thread+0x1d4/0x3e0
    [    8.760304]        kthread+0x10b/0x260
    [    8.760306]        ret_from_fork+0x44/0x70
    [    8.760309]        ret_from_fork_asm+0x1a/0x30
    [    8.760312]
                   -> #1 (fs_reclaim){+.+.}-{0:0}:
    [    8.760315]        fs_reclaim_acquire+0xc5/0x100
    [    8.760317]        kmem_cache_alloc_lru_noprof+0x4a/0x480
    [    8.760319]        alloc_inode+0xaa/0xe0
    [    8.760322]        iget_locked+0x157/0x330
    [    8.760323]        kernfs_get_inode+0x1b/0x110
    [    8.760325]        kernfs_get_tree+0x1b0/0x2e0
    [    8.760327]        sysfs_get_tree+0x1f/0x60
    [    8.760329]        vfs_get_tree+0x2a/0xf0
    [    8.760332]        path_mount+0x4cd/0xc00
    [    8.760334]        __x64_sys_mount+0x119/0x150
    [    8.760336]        x64_sys_call+0x14f2/0x2310
    [    8.760338]        do_syscall_64+0x91/0x180
    [    8.760340]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
    [    8.760342]
                   -> #0 (&root->kernfs_rwsem){++++}-{3:3}:
    [    8.760345]        __lock_acquire+0x1525/0x2760
    [    8.760347]        lock_acquire+0xca/0x310
    [    8.760348]        down_write+0x3e/0xf0
    [    8.760350]        kernfs_remove+0x32/0x60
    [    8.760351]        sysfs_remove_dir+0x4f/0x60
    [    8.760353]        __kobject_del+0x33/0xa0
    [    8.760355]        kobject_del+0x13/0x30
    [    8.760356]        elv_unregister_queue+0x52/0x80
    [    8.760358]        elevator_switch+0x68/0x360
    [    8.760360]        elv_iosched_store+0x14b/0x1b0
    [    8.760362]        queue_attr_store+0x181/0x1e0
    [    8.760364]        sysfs_kf_write+0x49/0x80
    [    8.760366]        kernfs_fop_write_iter+0x17d/0x250
    [    8.760367]        vfs_write+0x281/0x540
    [    8.760370]        ksys_write+0x72/0xf0
    [    8.760372]        __x64_sys_write+0x19/0x30
    [    8.760374]        x64_sys_call+0x2a3/0x2310
    [    8.760376]        do_syscall_64+0x91/0x180
    [    8.760377]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
    [    8.760380]
                   other info that might help us debug this:

    [    8.760380] Chain exists of:
                     &root->kernfs_rwsem --> fs_reclaim --> &q->q_usage_counter(io)#27

    [    8.760384]  Possible unsafe locking scenario:

    [    8.760384]        CPU0                    CPU1
    [    8.760385]        ----                    ----
    [    8.760385]   lock(&q->q_usage_counter(io)#27);
    [    8.760387]                                lock(fs_reclaim);
    [    8.760388]                                lock(&q->q_usage_counter(io)#27);
    [    8.760390]   lock(&root->kernfs_rwsem);
    [    8.760391]
                    *** DEADLOCK ***

    [    8.760391] 6 locks held by (udev-worker)/674:
    [    8.760392]  #0: ffff8881209ac420 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0x72/0xf0
    [    8.760398]  #1: ffff88810c80f488 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x136/0x250
    [    8.760402]  #2: ffff888125d1d330 (kn->active#101){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x13f/0x250
    [    8.760406]  #3: ffff888110dc7bb0 (&q->sysfs_lock){+.+.}-{3:3}, at: queue_attr_store+0x148/0x1e0
    [    8.760411]  #4: ffff888110dc7680 (&q->q_usage_counter(io)#27){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x12/0x30
    [    8.760416]  #5: ffff888110dc76b8 (&q->q_usage_counter(queue)#27){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x12/0x30
    [    8.760421]
                   stack backtrace:
    [    8.760422] CPU: 7 UID: 0 PID: 674 Comm: (udev-worker) Tainted: G     U             6.14.0-rc6-xe+ #7
    [    8.760424] Tainted: [U]=USER
    [    8.760425] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023
    [    8.760426] Call Trace:
    [    8.760427]  <TASK>
    [    8.760428]  dump_stack_lvl+0x6e/0xa0
    [    8.760431]  dump_stack+0x10/0x18
    [    8.760433]  print_circular_bug.cold+0x17a/0x1b7
    [    8.760437]  check_noncircular+0x13a/0x150
    [    8.760441]  ? save_trace+0x54/0x360
    [    8.760445]  __lock_acquire+0x1525/0x2760
    [    8.760446]  ? irqentry_exit+0x3a/0xb0
    [    8.760448]  ? sysvec_apic_timer_interrupt+0x57/0xc0
    [    8.760452]  lock_acquire+0xca/0x310
    [    8.760453]  ? kernfs_remove+0x32/0x60
    [    8.760457]  down_write+0x3e/0xf0
    [    8.760459]  ? kernfs_remove+0x32/0x60
    [    8.760460]  kernfs_remove+0x32/0x60
    [    8.760462]  sysfs_remove_dir+0x4f/0x60
    [    8.760464]  __kobject_del+0x33/0xa0
    [    8.760466]  kobject_del+0x13/0x30
    [    8.760467]  elv_unregister_queue+0x52/0x80
    [    8.760470]  elevator_switch+0x68/0x360
    [    8.760472]  elv_iosched_store+0x14b/0x1b0
    [    8.760475]  queue_attr_store+0x181/0x1e0
    [    8.760479]  ? lock_acquire+0xca/0x310
    [    8.760480]  ? kernfs_fop_write_iter+0x13f/0x250
    [    8.760482]  ? lock_is_held_type+0xa3/0x130
    [    8.760485]  sysfs_kf_write+0x49/0x80
    [    8.760487]  kernfs_fop_write_iter+0x17d/0x250
    [    8.760489]  vfs_write+0x281/0x540
    [    8.760494]  ksys_write+0x72/0xf0
    [    8.760497]  __x64_sys_write+0x19/0x30
    [    8.760499]  x64_sys_call+0x2a3/0x2310
    [    8.760502]  do_syscall_64+0x91/0x180
    [    8.760504]  ? trace_hardirqs_off+0x5d/0xe0
    [    8.760506]  ? handle_softirqs+0x479/0x4d0
    [    8.760508]  ? hrtimer_interrupt+0x13f/0x280
    [    8.760511]  ? irqentry_exit_to_user_mode+0x8b/0x260
    [    8.760513]  ? clear_bhb_loop+0x15/0x70
    [    8.760515]  ? clear_bhb_loop+0x15/0x70
    [    8.760516]  ? clear_bhb_loop+0x15/0x70
    [    8.760518]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
    [    8.760520] RIP: 0033:0x7aa3bf2f5504
    [    8.760522] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d c5 8b 10 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
    [    8.760523] RSP: 002b:00007ffc1e3697d8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
    [    8.760526] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007aa3bf2f5504
    [    8.760527] RDX: 0000000000000003 RSI: 00007ffc1e369ae0 RDI: 000000000000001c
    [    8.760528] RBP: 00007ffc1e369800 R08: 00007aa3bf3f51c8 R09: 00007ffc1e3698b0
    [    8.760528] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000003
    [    8.760529] R13: 00007ffc1e369ae0 R14: 0000613ccf21f2f0 R15: 00007aa3bf3f4e80
    [    8.760533]  </TASK>

    v2:
    - Update a code comment to increase readability (Ming Lei).

    Cc: Jens Axboe <[email protected]>
    Cc: [email protected]
    Cc: [email protected]
    Cc: Ming Lei <[email protected]>
    Signed-off-by: Thomas Hellström <[email protected]>
    Reviewed-by: Ming Lei <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>

Signed-off-by: Ming Lei <[email protected]>
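
The "prime the lockdep map as reclaim-tainted" idea from the commit message above amounts to telling lockdep, once at initialization, that the lock can be taken inside GFP_KERNEL reclaim. An annotation-only sketch of that pattern (details may differ from the actual block-layer change):

```
#include <linux/gfp.h>
#include <linux/lockdep.h>
#include <linux/sched/mm.h>	/* fs_reclaim_acquire/release */

/*
 * Annotation-only sketch: record a "this lock nests inside GFP_KERNEL
 * reclaim" dependency at init time so lockdep can report inversions
 * immediately instead of waiting for real writeback under reclaim.
 */
static void prime_io_lock_dep(struct lockdep_map *io_lockdep_map)
{
	fs_reclaim_acquire(GFP_KERNEL);
	lock_map_acquire_read(io_lockdep_map);
	lock_map_release(io_lockdep_map);
	fs_reclaim_release(GFP_KERNEL);
}
```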
github-actions bot pushed a commit that referenced this pull request Oct 24, 2025
JIRA: https://issues.redhat.com/browse/RHEL-112997

commit 9730763
Author: Nilay Shroff <[email protected]>
Date:   Wed Mar 19 16:23:46 2025 +0530

    block: correct locking order for protecting blk-wbt parameters

    The commit '245618f8e45f ("block: protect wbt_lat_usec using
    q->elevator_lock")' introduced q->elevator_lock to protect updates
    to blk-wbt parameters when writing to the sysfs attribute
    wbt_lat_usec and the cgroup attribute io.cost.qos.  However, both
    these attributes also acquire q->rq_qos_mutex, leading to the
    following lockdep warning:

    ======================================================
    WARNING: possible circular locking dependency detected
    6.14.0-rc5+ #138 Not tainted
    ------------------------------------------------------
    bash/5902 is trying to acquire lock:
    c000000085d495a0 (&q->rq_qos_mutex){+.+.}-{4:4}, at: wbt_init+0x164/0x238

    but task is already holding lock:
    c000000085d498c8 (&q->elevator_lock){+.+.}-{4:4}, at: queue_wb_lat_store+0xb0/0x20c

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&q->elevator_lock){+.+.}-{4:4}:
            __mutex_lock+0xf0/0xa58
            ioc_qos_write+0x16c/0x85c
            cgroup_file_write+0xc4/0x32c
            kernfs_fop_write_iter+0x1b8/0x29c
            vfs_write+0x410/0x584
            ksys_write+0x84/0x140
            system_call_exception+0x134/0x360
            system_call_vectored_common+0x15c/0x2ec

    -> #0 (&q->rq_qos_mutex){+.+.}-{4:4}:
            __lock_acquire+0x1b6c/0x2ae0
            lock_acquire+0x140/0x430
            __mutex_lock+0xf0/0xa58
            wbt_init+0x164/0x238
            queue_wb_lat_store+0x1dc/0x20c
            queue_attr_store+0x12c/0x164
            sysfs_kf_write+0x6c/0xb0
            kernfs_fop_write_iter+0x1b8/0x29c
            vfs_write+0x410/0x584
            ksys_write+0x84/0x140
            system_call_exception+0x134/0x360
            system_call_vectored_common+0x15c/0x2ec

    other info that might help us debug this:

        Possible unsafe locking scenario:

            CPU0                    CPU1
            ----                    ----
        lock(&q->elevator_lock);
                                    lock(&q->rq_qos_mutex);
                                    lock(&q->elevator_lock);
        lock(&q->rq_qos_mutex);

        *** DEADLOCK ***

    6 locks held by bash/5902:
        #0: c000000051122400 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x84/0x140
        #1: c00000007383f088 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x174/0x29c
        #2: c000000008550428 (kn->active#182){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x180/0x29c
        #3: c000000085d493a8 (&q->q_usage_counter(io)#5){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x28/0x40
        #4: c000000085d493e0 (&q->q_usage_counter(queue)#5){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x28/0x40
        #5: c000000085d498c8 (&q->elevator_lock){+.+.}-{4:4}, at: queue_wb_lat_store+0xb0/0x20c

    stack backtrace:
    CPU: 17 UID: 0 PID: 5902 Comm: bash Kdump: loaded Not tainted 6.14.0-rc5+ #138
    Hardware name: IBM,9043-MRX POWER10 (architected) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_028) hv:phyp pSeries
    Call Trace:
    [c0000000721ef590] [c00000000118f8a8] dump_stack_lvl+0x108/0x18c (unreliable)
    [c0000000721ef5c0] [c00000000022563c] print_circular_bug+0x448/0x604
    [c0000000721ef670] [c000000000225a44] check_noncircular+0x24c/0x26c
    [c0000000721ef740] [c00000000022bf28] __lock_acquire+0x1b6c/0x2ae0
    [c0000000721ef870] [c000000000229240] lock_acquire+0x140/0x430
    [c0000000721ef970] [c0000000011cfbec] __mutex_lock+0xf0/0xa58
    [c0000000721efaa0] [c00000000096c46c] wbt_init+0x164/0x238
    [c0000000721efaf0] [c0000000008f8cd8] queue_wb_lat_store+0x1dc/0x20c
    [c0000000721efb50] [c0000000008f8fa0] queue_attr_store+0x12c/0x164
    [c0000000721efc60] [c0000000007c11cc] sysfs_kf_write+0x6c/0xb0
    [c0000000721efca0] [c0000000007bfa4c] kernfs_fop_write_iter+0x1b8/0x29c
    [c0000000721efcf0] [c0000000006a281c] vfs_write+0x410/0x584
    [c0000000721efdc0] [c0000000006a2cc8] ksys_write+0x84/0x140
    [c0000000721efe10] [c000000000031b64] system_call_exception+0x134/0x360
    [c0000000721efe50] [c00000000000cedc] system_call_vectored_common+0x15c/0x2ec

    From the above log it's apparent that the method which writes to the
    sysfs attr wbt_lat_usec acquires q->elevator_lock first, and then
    acquires q->rq_qos_mutex. However, the other method, which writes to
    io.cost.qos, acquires q->rq_qos_mutex first, and then acquires
    q->elevator_lock. So this could potentially cause a deadlock.

    A closer look at ioc_qos_write shows that correcting the lock order is
    non-trivial because q->rq_qos_mutex is acquired in blkg_conf_open_bdev
    and released in blkg_conf_exit. The function blkg_conf_open_bdev is
    responsible for parsing user input and finding the corresponding block
    device (bdev) from the user provided major:minor number.

    Since we do not know the bdev until blkg_conf_open_bdev completes, we
    cannot simply move the q->elevator_lock acquisition before
    blkg_conf_open_bdev. To address this, we introduce new helpers
    blkg_conf_open_bdev_frozen and blkg_conf_exit_frozen, which are just
    wrappers around blkg_conf_open_bdev and blkg_conf_exit respectively.
    The helper blkg_conf_open_bdev_frozen is similar to blkg_conf_open_bdev,
    but additionally freezes the queue, acquires q->elevator_lock and
    ensures the correct locking order is followed between q->elevator_lock
    and q->rq_qos_mutex. Similarly, the other helper, blkg_conf_exit_frozen,
    in addition to unfreezing the queue, ensures that we release the locks
    in the correct order.

    By using these helpers, now we maintain the same locking order in all
    code paths where we update blk-wbt parameters.

    Fixes: 245618f ("block: protect wbt_lat_usec using q->elevator_lock")
    Reported-by: kernel test robot <[email protected]>
    Closes: https://lore.kernel.org/oe-lkp/[email protected]
    Signed-off-by: Nilay Shroff <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>

Signed-off-by: Ming Lei <[email protected]>
github-actions bot pushed a commit that referenced this pull request Oct 24, 2025
…xit()

JIRA: https://issues.redhat.com/browse/RHEL-112997

commit 78c2713
Author: Ming Lei <[email protected]>
Date:   Mon May 5 22:18:03 2025 +0800

    block: move wbt_enable_default() out of queue freezing from sched ->exit()

    The scheduler's ->exit() is called with the queue frozen and the elevator
    lock held, and wbt_enable_default() can't be called with the queue frozen,
    otherwise the following lockdep warning is triggered:

            #6 (&q->rq_qos_mutex){+.+.}-{4:4}:
            #5 (&eq->sysfs_lock){+.+.}-{4:4}:
            #4 (&q->elevator_lock){+.+.}-{4:4}:
            #3 (&q->q_usage_counter(io)#3){++++}-{0:0}:
            #2 (fs_reclaim){+.+.}-{0:0}:
            #1 (&sb->s_type->i_mutex_key#3){+.+.}-{4:4}:
            #0 (&q->debugfs_mutex){+.+.}-{4:4}:

    Fix the issue by moving wbt_enable_default() out of bfq's exit(), and
    call it from elevator_change_done().

    Meanwhile, add disk->rqos_state_mutex to cover wbt state changes, which
    matches the purpose better than ->elevator_lock.

    Reviewed-by: Hannes Reinecke <[email protected]>
    Reviewed-by: Nilay Shroff <[email protected]>
    Signed-off-by: Ming Lei <[email protected]>
    Reviewed-by: Christoph Hellwig <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>

Signed-off-by: Ming Lei <[email protected]>
github-actions bot pushed a commit that referenced this pull request Oct 24, 2025
…ked_roots()

If the allocation of fs_info->super_copy or fs_info->super_for_commit
fails in btrfs_get_tree_subvol(), then there is no need to call
btrfs_free_fs_info(). Otherwise btrfs_check_leaked_roots() would access a
NULL pointer because fs_info->allocated_roots had not been initialised.
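
One way to picture the fix (a hypothetical sketch; the exact error path in btrfs_get_tree_subvol() may differ) is to free only what was allocated and bail out before the generic teardown ever runs:

```
/* Hypothetical sketch (not the actual btrfs diff); assumes the usual
 * btrfs headers and <linux/slab.h>. On failure, free only the buffers
 * instead of running btrfs_free_fs_info() on a half-initialised fs_info. */
static int alloc_supers_sketch(struct btrfs_fs_info *fs_info)
{
	fs_info->super_copy = kzalloc(BTRFS_SUPER_INFO_SIZE, GFP_KERNEL);
	fs_info->super_for_commit = kzalloc(BTRFS_SUPER_INFO_SIZE, GFP_KERNEL);
	if (!fs_info->super_copy || !fs_info->super_for_commit) {
		kfree(fs_info->super_copy);	/* kfree(NULL) is a no-op */
		kfree(fs_info->super_for_commit);
		return -ENOMEM;			/* caller frees the bare fs_info */
	}
	return 0;
}
```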

syzkaller reported the following information:
  ------------[ cut here ]------------
  BUG: unable to handle page fault for address: fffffffffffffbb0
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 64c9067 P4D 64c9067 PUD 64cb067 PMD 0
  Oops: Oops: 0000 [#1] SMP KASAN PTI
  CPU: 0 UID: 0 PID: 1402 Comm: syz.1.35 Not tainted 6.15.8 #4 PREEMPT(lazy)
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), (...)
  RIP: 0010:arch_atomic_read arch/x86/include/asm/atomic.h:23 [inline]
  RIP: 0010:raw_atomic_read include/linux/atomic/atomic-arch-fallback.h:457 [inline]
  RIP: 0010:atomic_read include/linux/atomic/atomic-instrumented.h:33 [inline]
  RIP: 0010:refcount_read include/linux/refcount.h:170 [inline]
  RIP: 0010:btrfs_check_leaked_roots+0x18f/0x2c0 fs/btrfs/disk-io.c:1230
  [...]
  Call Trace:
   <TASK>
   btrfs_free_fs_info+0x310/0x410 fs/btrfs/disk-io.c:1280
   btrfs_get_tree_subvol+0x592/0x6b0 fs/btrfs/super.c:2029
   btrfs_get_tree+0x63/0x80 fs/btrfs/super.c:2097
   vfs_get_tree+0x98/0x320 fs/super.c:1759
   do_new_mount+0x357/0x660 fs/namespace.c:3899
   path_mount+0x716/0x19c0 fs/namespace.c:4226
   do_mount fs/namespace.c:4239 [inline]
   __do_sys_mount fs/namespace.c:4450 [inline]
   __se_sys_mount fs/namespace.c:4427 [inline]
   __x64_sys_mount+0x28c/0x310 fs/namespace.c:4427
   do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
   do_syscall_64+0x92/0x180 arch/x86/entry/syscall_64.c:94
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
  RIP: 0033:0x7f032eaffa8d
  [...]

Fixes: 3bb17a2 ("btrfs: add get_tree callback for new mount API")
CC: [email protected] # 6.12+
Reviewed-by: Daniel Vacek <[email protected]>
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Dewei Meng <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
github-actions bot pushed a commit that referenced this pull request Oct 25, 2025
JIRA: https://issues.redhat.com/browse/RHEL-110301
Conflicts: Minor fuzz in core.c in hunk #4.

commit 676e8cf
Author: Peter Zijlstra <[email protected]>
Date:   Fri May 9 13:36:59 2025 +0200

    sched,livepatch: Untangle cond_resched() and live-patching

    With the goal of deprecating / removing VOLUNTARY preempt, live-patch
    needs to stop relying on cond_resched() to make forward progress.

    Instead, rely on schedule() with TASK_FREEZABLE set. Just like
    live-patching, the freezer needs to be able to stop tasks in a safe /
    known state.
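
    A tiny sketch of the pattern being relied on here (illustrative only,
    not the live-patching code itself; assume it sits inside a kthread's
    wait loop):

```
/*
 * Blocking with TASK_FREEZABLE set leaves the task in a known, safe
 * state for both the freezer and the live-patch transition, with no
 * cond_resched() involved.
 */
set_current_state(TASK_INTERRUPTIBLE | TASK_FREEZABLE);
if (!kthread_should_stop())
	schedule();
__set_current_state(TASK_RUNNING);
```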

    [bigeasy: use likely() in __klp_sched_try_switch() and update comments]

    Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
    Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
    Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
    Reviewed-by: Thomas Gleixner <[email protected]>
    Reviewed-by: Petr Mladek <[email protected]>
    Tested-by: Petr Mladek <[email protected]>
    Tested-by: Miroslav Benes <[email protected]>
    Acked-by: Miroslav Benes <[email protected]>
    Acked-by: Josh Poimboeuf <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]

Signed-off-by: Phil Auld <[email protected]>
github-actions bot pushed a commit that referenced this pull request Oct 26, 2025
The original code causes a circular locking dependency found by lockdep.

======================================================
WARNING: possible circular locking dependency detected
6.16.0-rc6-lgci-xe-xe-pw-151626v3+ #1 Tainted: G S   U
------------------------------------------------------
xe_fault_inject/5091 is trying to acquire lock:
ffff888156815688 ((work_completion)(&(&devcd->del_wk)->work)){+.+.}-{0:0}, at: __flush_work+0x25d/0x660

but task is already holding lock:

ffff888156815620 (&devcd->mutex){+.+.}-{3:3}, at: dev_coredump_put+0x3f/0xa0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&devcd->mutex){+.+.}-{3:3}:
       mutex_lock_nested+0x4e/0xc0
       devcd_data_write+0x27/0x90
       sysfs_kf_bin_write+0x80/0xf0
       kernfs_fop_write_iter+0x169/0x220
       vfs_write+0x293/0x560
       ksys_write+0x72/0xf0
       __x64_sys_write+0x19/0x30
       x64_sys_call+0x2bf/0x2660
       do_syscall_64+0x93/0xb60
       entry_SYSCALL_64_after_hwframe+0x76/0x7e
-> #1 (kn->active#236){++++}-{0:0}:
       kernfs_drain+0x1e2/0x200
       __kernfs_remove+0xae/0x400
       kernfs_remove_by_name_ns+0x5d/0xc0
       remove_files+0x54/0x70
       sysfs_remove_group+0x3d/0xa0
       sysfs_remove_groups+0x2e/0x60
       device_remove_attrs+0xc7/0x100
       device_del+0x15d/0x3b0
       devcd_del+0x19/0x30
       process_one_work+0x22b/0x6f0
       worker_thread+0x1e8/0x3d0
       kthread+0x11c/0x250
       ret_from_fork+0x26c/0x2e0
       ret_from_fork_asm+0x1a/0x30
-> #0 ((work_completion)(&(&devcd->del_wk)->work)){+.+.}-{0:0}:
       __lock_acquire+0x1661/0x2860
       lock_acquire+0xc4/0x2f0
       __flush_work+0x27a/0x660
       flush_delayed_work+0x5d/0xa0
       dev_coredump_put+0x63/0xa0
       xe_driver_devcoredump_fini+0x12/0x20 [xe]
       devm_action_release+0x12/0x30
       release_nodes+0x3a/0x120
       devres_release_all+0x8a/0xd0
       device_unbind_cleanup+0x12/0x80
       device_release_driver_internal+0x23a/0x280
       device_driver_detach+0x14/0x20
       unbind_store+0xaf/0xc0
       drv_attr_store+0x21/0x50
       sysfs_kf_write+0x4a/0x80
       kernfs_fop_write_iter+0x169/0x220
       vfs_write+0x293/0x560
       ksys_write+0x72/0xf0
       __x64_sys_write+0x19/0x30
       x64_sys_call+0x2bf/0x2660
       do_syscall_64+0x93/0xb60
       entry_SYSCALL_64_after_hwframe+0x76/0x7e
other info that might help us debug this:
Chain exists of: (work_completion)(&(&devcd->del_wk)->work) --> kn->active#236 --> &devcd->mutex
 Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&devcd->mutex);
                               lock(kn->active#236);
                               lock(&devcd->mutex);
  lock((work_completion)(&(&devcd->del_wk)->work));
 *** DEADLOCK ***
5 locks held by xe_fault_inject/5091:
 #0: ffff8881129f9488 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x72/0xf0
 #1: ffff88810c755078 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x123/0x220
 #2: ffff8881054811a0 (&dev->mutex){....}-{3:3}, at: device_release_driver_internal+0x55/0x280
 #3: ffff888156815620 (&devcd->mutex){+.+.}-{3:3}, at: dev_coredump_put+0x3f/0xa0
 #4: ffffffff8359e020 (rcu_read_lock){....}-{1:2}, at: __flush_work+0x72/0x660
stack backtrace:
CPU: 14 UID: 0 PID: 5091 Comm: xe_fault_inject Tainted: G S   U              6.16.0-rc6-lgci-xe-xe-pw-151626v3+ #1 PREEMPT_{RT,(lazy)}
Tainted: [S]=CPU_OUT_OF_SPEC, [U]=USER
Hardware name: Micro-Star International Co., Ltd. MS-7D25/PRO Z690-A DDR4(MS-7D25), BIOS 1.10 12/13/2021
Call Trace:
 <TASK>
 dump_stack_lvl+0x91/0xf0
 dump_stack+0x10/0x20
 print_circular_bug+0x285/0x360
 check_noncircular+0x135/0x150
 ? register_lock_class+0x48/0x4a0
 __lock_acquire+0x1661/0x2860
 lock_acquire+0xc4/0x2f0
 ? __flush_work+0x25d/0x660
 ? mark_held_locks+0x46/0x90
 ? __flush_work+0x25d/0x660
 __flush_work+0x27a/0x660
 ? __flush_work+0x25d/0x660
 ? trace_hardirqs_on+0x1e/0xd0
 ? __pfx_wq_barrier_func+0x10/0x10
 flush_delayed_work+0x5d/0xa0
 dev_coredump_put+0x63/0xa0
 xe_driver_devcoredump_fini+0x12/0x20 [xe]
 devm_action_release+0x12/0x30
 release_nodes+0x3a/0x120
 devres_release_all+0x8a/0xd0
 device_unbind_cleanup+0x12/0x80
 device_release_driver_internal+0x23a/0x280
 ? bus_find_device+0xa8/0xe0
 device_driver_detach+0x14/0x20
 unbind_store+0xaf/0xc0
 drv_attr_store+0x21/0x50
 sysfs_kf_write+0x4a/0x80
 kernfs_fop_write_iter+0x169/0x220
 vfs_write+0x293/0x560
 ksys_write+0x72/0xf0
 __x64_sys_write+0x19/0x30
 x64_sys_call+0x2bf/0x2660
 do_syscall_64+0x93/0xb60
 ? __f_unlock_pos+0x15/0x20
 ? __x64_sys_getdents64+0x9b/0x130
 ? __pfx_filldir64+0x10/0x10
 ? do_syscall_64+0x1a2/0xb60
 ? clear_bhb_loop+0x30/0x80
 ? clear_bhb_loop+0x30/0x80
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x76e292edd574
Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d d5 ea 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
RSP: 002b:00007fffe247a828 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000076e292edd574
RDX: 000000000000000c RSI: 00006267f6306063 RDI: 000000000000000b
RBP: 000000000000000c R08: 000076e292fc4b20 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 00006267f6306063
R13: 000000000000000b R14: 00006267e6859c00 R15: 000076e29322a000
 </TASK>
xe 0000:03:00.0: [drm] Xe device coredump has been deleted.

Fixes: 01daccf ("devcoredump : Serialize devcd_del work")
Cc: Mukesh Ojha <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Danilo Krummrich <[email protected]>
Cc: [email protected]
Cc: [email protected] # v6.1+
Signed-off-by: Maarten Lankhorst <[email protected]>
Cc: Matthew Brost <[email protected]>
Acked-by: Mukesh Ojha <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
github-actions bot pushed a commit that referenced this pull request Oct 30, 2025
JIRA: https://issues.redhat.com/browse/RHEL-115358

commit c98cc97
Author: Steven Rostedt <[email protected]>
Date:   Tue May 27 10:58:20 2025 -0400

    ring-buffer: Move cpus_read_lock() outside of buffer->mutex

    Running a modified trace-cmd record --nosplice where it does a mmap of the
    ring buffer when '--nosplice' is set, caused the following lockdep splat:

     ======================================================
     WARNING: possible circular locking dependency detected
     6.15.0-rc7-test-00002-gfb7d03d8a82f #551 Not tainted
     ------------------------------------------------------
     trace-cmd/1113 is trying to acquire lock:
     ffff888100062888 (&buffer->mutex){+.+.}-{4:4}, at: ring_buffer_map+0x11c/0xe70

     but task is already holding lock:
     ffff888100a5f9f8 (&cpu_buffer->mapping_lock){+.+.}-{4:4}, at: ring_buffer_map+0xcf/0xe70

     which lock already depends on the new lock.

     the existing dependency chain (in reverse order) is:

     -> #5 (&cpu_buffer->mapping_lock){+.+.}-{4:4}:
            __mutex_lock+0x192/0x18c0
            ring_buffer_map+0xcf/0xe70
            tracing_buffers_mmap+0x1c4/0x3b0
            __mmap_region+0xd8d/0x1f70
            do_mmap+0x9d7/0x1010
            vm_mmap_pgoff+0x20b/0x390
            ksys_mmap_pgoff+0x2e9/0x440
            do_syscall_64+0x79/0x1c0
            entry_SYSCALL_64_after_hwframe+0x76/0x7e

     -> #4 (&mm->mmap_lock){++++}-{4:4}:
            __might_fault+0xa5/0x110
            _copy_to_user+0x22/0x80
            _perf_ioctl+0x61b/0x1b70
            perf_ioctl+0x62/0x90
            __x64_sys_ioctl+0x134/0x190
            do_syscall_64+0x79/0x1c0
            entry_SYSCALL_64_after_hwframe+0x76/0x7e

     -> #3 (&cpuctx_mutex){+.+.}-{4:4}:
            __mutex_lock+0x192/0x18c0
            perf_event_init_cpu+0x325/0x7c0
            perf_event_init+0x52a/0x5b0
            start_kernel+0x263/0x3e0
            x86_64_start_reservations+0x24/0x30
            x86_64_start_kernel+0x95/0xa0
            common_startup_64+0x13e/0x141

     -> #2 (pmus_lock){+.+.}-{4:4}:
            __mutex_lock+0x192/0x18c0
            perf_event_init_cpu+0xb7/0x7c0
            cpuhp_invoke_callback+0x2c0/0x1030
            __cpuhp_invoke_callback_range+0xbf/0x1f0
            _cpu_up+0x2e7/0x690
            cpu_up+0x117/0x170
            cpuhp_bringup_mask+0xd5/0x120
            bringup_nonboot_cpus+0x13d/0x170
            smp_init+0x2b/0xf0
            kernel_init_freeable+0x441/0x6d0
            kernel_init+0x1e/0x160
            ret_from_fork+0x34/0x70
            ret_from_fork_asm+0x1a/0x30

     -> #1 (cpu_hotplug_lock){++++}-{0:0}:
            cpus_read_lock+0x2a/0xd0
            ring_buffer_resize+0x610/0x14e0
            __tracing_resize_ring_buffer.part.0+0x42/0x120
            tracing_set_tracer+0x7bd/0xa80
            tracing_set_trace_write+0x132/0x1e0
            vfs_write+0x21c/0xe80
            ksys_write+0xf9/0x1c0
            do_syscall_64+0x79/0x1c0
            entry_SYSCALL_64_after_hwframe+0x76/0x7e

     -> #0 (&buffer->mutex){+.+.}-{4:4}:
            __lock_acquire+0x1405/0x2210
            lock_acquire+0x174/0x310
            __mutex_lock+0x192/0x18c0
            ring_buffer_map+0x11c/0xe70
            tracing_buffers_mmap+0x1c4/0x3b0
            __mmap_region+0xd8d/0x1f70
            do_mmap+0x9d7/0x1010
            vm_mmap_pgoff+0x20b/0x390
            ksys_mmap_pgoff+0x2e9/0x440
            do_syscall_64+0x79/0x1c0
            entry_SYSCALL_64_after_hwframe+0x76/0x7e

     other info that might help us debug this:

     Chain exists of:
       &buffer->mutex --> &mm->mmap_lock --> &cpu_buffer->mapping_lock

      Possible unsafe locking scenario:

            CPU0                    CPU1
            ----                    ----
       lock(&cpu_buffer->mapping_lock);
                                    lock(&mm->mmap_lock);
                                    lock(&cpu_buffer->mapping_lock);
       lock(&buffer->mutex);

      *** DEADLOCK ***

     2 locks held by trace-cmd/1113:
      #0: ffff888106b847e0 (&mm->mmap_lock){++++}-{4:4}, at: vm_mmap_pgoff+0x192/0x390
      #1: ffff888100a5f9f8 (&cpu_buffer->mapping_lock){+.+.}-{4:4}, at: ring_buffer_map+0xcf/0xe70

     stack backtrace:
     CPU: 5 UID: 0 PID: 1113 Comm: trace-cmd Not tainted 6.15.0-rc7-test-00002-gfb7d03d8a82f #551 PREEMPT
     Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
     Call Trace:
      <TASK>
      dump_stack_lvl+0x6e/0xa0
      print_circular_bug.cold+0x178/0x1be
      check_noncircular+0x146/0x160
      __lock_acquire+0x1405/0x2210
      lock_acquire+0x174/0x310
      ? ring_buffer_map+0x11c/0xe70
      ? ring_buffer_map+0x11c/0xe70
      ? __mutex_lock+0x169/0x18c0
      __mutex_lock+0x192/0x18c0
      ? ring_buffer_map+0x11c/0xe70
      ? ring_buffer_map+0x11c/0xe70
      ? function_trace_call+0x296/0x370
      ? __pfx___mutex_lock+0x10/0x10
      ? __pfx_function_trace_call+0x10/0x10
      ? __pfx___mutex_lock+0x10/0x10
      ? _raw_spin_unlock+0x2d/0x50
      ? ring_buffer_map+0x11c/0xe70
      ? ring_buffer_map+0x11c/0xe70
      ? __mutex_lock+0x5/0x18c0
      ring_buffer_map+0x11c/0xe70
      ? do_raw_spin_lock+0x12d/0x270
      ? find_held_lock+0x2b/0x80
      ? _raw_spin_unlock+0x2d/0x50
      ? rcu_is_watching+0x15/0xb0
      ? _raw_spin_unlock+0x2d/0x50
      ? trace_preempt_on+0xd0/0x110
      tracing_buffers_mmap+0x1c4/0x3b0
      __mmap_region+0xd8d/0x1f70
      ? ring_buffer_lock_reserve+0x99/0xff0
      ? __pfx___mmap_region+0x10/0x10
      ? ring_buffer_lock_reserve+0x99/0xff0
      ? __pfx_ring_buffer_lock_reserve+0x10/0x10
      ? __pfx_ring_buffer_lock_reserve+0x10/0x10
      ? bpf_lsm_mmap_addr+0x4/0x10
      ? security_mmap_addr+0x46/0xd0
      ? lock_is_held_type+0xd9/0x130
      do_mmap+0x9d7/0x1010
      ? 0xffffffffc0370095
      ? __pfx_do_mmap+0x10/0x10
      vm_mmap_pgoff+0x20b/0x390
      ? __pfx_vm_mmap_pgoff+0x10/0x10
      ? 0xffffffffc0370095
      ksys_mmap_pgoff+0x2e9/0x440
      do_syscall_64+0x79/0x1c0
      entry_SYSCALL_64_after_hwframe+0x76/0x7e
     RIP: 0033:0x7fb0963a7de2
     Code: 00 00 00 0f 1f 44 00 00 41 f7 c1 ff 0f 00 00 75 27 55 89 cd 53 48 89 fb 48 85 ff 74 3b 41 89 ea 48 89 df b8 09 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 5b 5d c3 0f 1f 00 48 8b 05 e1 9f 0d 00 64
     RSP: 002b:00007ffdcc8fb878 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
     RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb0963a7de2
     RDX: 0000000000000001 RSI: 0000000000001000 RDI: 0000000000000000
     RBP: 0000000000000001 R08: 0000000000000006 R09: 0000000000000000
     R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
     R13: 00007ffdcc8fbe68 R14: 00007fb096628000 R15: 00005633e01a5c90
      </TASK>

    The issue is that cpus_read_lock() is taken within buffer->mutex. The
    memory mapped pages are taken with the mmap_lock held. The buffer->mutex
    is taken within the cpu_buffer->mapping_lock. There's quite a chain with
    all these locks, where the deadlock can be fixed by moving the
    cpus_read_lock() outside the taking of the buffer->mutex.
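
    A minimal sketch of the reordering (illustrative only; the real
    ring_buffer_resize()/ring_buffer_map() code is more involved):

```
/*
 * cpu_hotplug_lock is now taken before buffer->mutex, so the mmap
 * path, which wants buffer->mutex while holding other locks, no
 * longer closes a cycle through cpu_hotplug_lock.
 */
cpus_read_lock();
mutex_lock(&buffer->mutex);

/* ... resize the per-CPU buffers ... */

mutex_unlock(&buffer->mutex);
cpus_read_unlock();
```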

    Cc: [email protected]
    Cc: Masami Hiramatsu <[email protected]>
    Cc: Mathieu Desnoyers <[email protected]>
    Cc: Vincent Donnefort <[email protected]>
    Link: https://lore.kernel.org/[email protected]
    Fixes: 117c392 ("ring-buffer: Introducing ring-buffer mapping functions")
    Signed-off-by: Steven Rostedt (Google) <[email protected]>

Signed-off-by: Jerome Marchand <[email protected]>
github-actions bot pushed a commit that referenced this pull request Oct 30, 2025
…ked_roots()

commit 17679ac upstream.

If allocation of fs_info->super_copy or fs_info->super_for_commit fails in
btrfs_get_tree_subvol(), there is no need to call btrfs_free_fs_info().
Otherwise btrfs_check_leaked_roots() would access a NULL pointer because
fs_info->allocated_roots has not been initialised.

syzkaller reported the following information:
  ------------[ cut here ]------------
  BUG: unable to handle page fault for address: fffffffffffffbb0
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 64c9067 P4D 64c9067 PUD 64cb067 PMD 0
  Oops: Oops: 0000 [#1] SMP KASAN PTI
  CPU: 0 UID: 0 PID: 1402 Comm: syz.1.35 Not tainted 6.15.8 #4 PREEMPT(lazy)
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), (...)
  RIP: 0010:arch_atomic_read arch/x86/include/asm/atomic.h:23 [inline]
  RIP: 0010:raw_atomic_read include/linux/atomic/atomic-arch-fallback.h:457 [inline]
  RIP: 0010:atomic_read include/linux/atomic/atomic-instrumented.h:33 [inline]
  RIP: 0010:refcount_read include/linux/refcount.h:170 [inline]
  RIP: 0010:btrfs_check_leaked_roots+0x18f/0x2c0 fs/btrfs/disk-io.c:1230
  [...]
  Call Trace:
   <TASK>
   btrfs_free_fs_info+0x310/0x410 fs/btrfs/disk-io.c:1280
   btrfs_get_tree_subvol+0x592/0x6b0 fs/btrfs/super.c:2029
   btrfs_get_tree+0x63/0x80 fs/btrfs/super.c:2097
   vfs_get_tree+0x98/0x320 fs/super.c:1759
   do_new_mount+0x357/0x660 fs/namespace.c:3899
   path_mount+0x716/0x19c0 fs/namespace.c:4226
   do_mount fs/namespace.c:4239 [inline]
   __do_sys_mount fs/namespace.c:4450 [inline]
   __se_sys_mount fs/namespace.c:4427 [inline]
   __x64_sys_mount+0x28c/0x310 fs/namespace.c:4427
   do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
   do_syscall_64+0x92/0x180 arch/x86/entry/syscall_64.c:94
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
  RIP: 0033:0x7f032eaffa8d
  [...]

Fixes: 3bb17a2 ("btrfs: add get_tree callback for new mount API")
CC: [email protected] # 6.12+
Reviewed-by: Daniel Vacek <[email protected]>
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Dewei Meng <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
github-actions bot pushed a commit that referenced this pull request Oct 30, 2025
[ Upstream commit a91c809 ]

The original code causes a circular locking dependency found by lockdep.

======================================================
WARNING: possible circular locking dependency detected
6.16.0-rc6-lgci-xe-xe-pw-151626v3+ #1 Tainted: G S   U
------------------------------------------------------
xe_fault_inject/5091 is trying to acquire lock:
ffff888156815688 ((work_completion)(&(&devcd->del_wk)->work)){+.+.}-{0:0}, at: __flush_work+0x25d/0x660

but task is already holding lock:

ffff888156815620 (&devcd->mutex){+.+.}-{3:3}, at: dev_coredump_put+0x3f/0xa0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&devcd->mutex){+.+.}-{3:3}:
       mutex_lock_nested+0x4e/0xc0
       devcd_data_write+0x27/0x90
       sysfs_kf_bin_write+0x80/0xf0
       kernfs_fop_write_iter+0x169/0x220
       vfs_write+0x293/0x560
       ksys_write+0x72/0xf0
       __x64_sys_write+0x19/0x30
       x64_sys_call+0x2bf/0x2660
       do_syscall_64+0x93/0xb60
       entry_SYSCALL_64_after_hwframe+0x76/0x7e
-> #1 (kn->active#236){++++}-{0:0}:
       kernfs_drain+0x1e2/0x200
       __kernfs_remove+0xae/0x400
       kernfs_remove_by_name_ns+0x5d/0xc0
       remove_files+0x54/0x70
       sysfs_remove_group+0x3d/0xa0
       sysfs_remove_groups+0x2e/0x60
       device_remove_attrs+0xc7/0x100
       device_del+0x15d/0x3b0
       devcd_del+0x19/0x30
       process_one_work+0x22b/0x6f0
       worker_thread+0x1e8/0x3d0
       kthread+0x11c/0x250
       ret_from_fork+0x26c/0x2e0
       ret_from_fork_asm+0x1a/0x30
-> #0 ((work_completion)(&(&devcd->del_wk)->work)){+.+.}-{0:0}:
       __lock_acquire+0x1661/0x2860
       lock_acquire+0xc4/0x2f0
       __flush_work+0x27a/0x660
       flush_delayed_work+0x5d/0xa0
       dev_coredump_put+0x63/0xa0
       xe_driver_devcoredump_fini+0x12/0x20 [xe]
       devm_action_release+0x12/0x30
       release_nodes+0x3a/0x120
       devres_release_all+0x8a/0xd0
       device_unbind_cleanup+0x12/0x80
       device_release_driver_internal+0x23a/0x280
       device_driver_detach+0x14/0x20
       unbind_store+0xaf/0xc0
       drv_attr_store+0x21/0x50
       sysfs_kf_write+0x4a/0x80
       kernfs_fop_write_iter+0x169/0x220
       vfs_write+0x293/0x560
       ksys_write+0x72/0xf0
       __x64_sys_write+0x19/0x30
       x64_sys_call+0x2bf/0x2660
       do_syscall_64+0x93/0xb60
       entry_SYSCALL_64_after_hwframe+0x76/0x7e
other info that might help us debug this:
Chain exists of: (work_completion)(&(&devcd->del_wk)->work) --> kn->active#236 --> &devcd->mutex
 Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&devcd->mutex);
                               lock(kn->active#236);
                               lock(&devcd->mutex);
  lock((work_completion)(&(&devcd->del_wk)->work));
 *** DEADLOCK ***
5 locks held by xe_fault_inject/5091:
 #0: ffff8881129f9488 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x72/0xf0
 #1: ffff88810c755078 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x123/0x220
 #2: ffff8881054811a0 (&dev->mutex){....}-{3:3}, at: device_release_driver_internal+0x55/0x280
 #3: ffff888156815620 (&devcd->mutex){+.+.}-{3:3}, at: dev_coredump_put+0x3f/0xa0
 #4: ffffffff8359e020 (rcu_read_lock){....}-{1:2}, at: __flush_work+0x72/0x660
stack backtrace:
CPU: 14 UID: 0 PID: 5091 Comm: xe_fault_inject Tainted: G S   U              6.16.0-rc6-lgci-xe-xe-pw-151626v3+ #1 PREEMPT_{RT,(lazy)}
Tainted: [S]=CPU_OUT_OF_SPEC, [U]=USER
Hardware name: Micro-Star International Co., Ltd. MS-7D25/PRO Z690-A DDR4(MS-7D25), BIOS 1.10 12/13/2021
Call Trace:
 <TASK>
 dump_stack_lvl+0x91/0xf0
 dump_stack+0x10/0x20
 print_circular_bug+0x285/0x360
 check_noncircular+0x135/0x150
 ? register_lock_class+0x48/0x4a0
 __lock_acquire+0x1661/0x2860
 lock_acquire+0xc4/0x2f0
 ? __flush_work+0x25d/0x660
 ? mark_held_locks+0x46/0x90
 ? __flush_work+0x25d/0x660
 __flush_work+0x27a/0x660
 ? __flush_work+0x25d/0x660
 ? trace_hardirqs_on+0x1e/0xd0
 ? __pfx_wq_barrier_func+0x10/0x10
 flush_delayed_work+0x5d/0xa0
 dev_coredump_put+0x63/0xa0
 xe_driver_devcoredump_fini+0x12/0x20 [xe]
 devm_action_release+0x12/0x30
 release_nodes+0x3a/0x120
 devres_release_all+0x8a/0xd0
 device_unbind_cleanup+0x12/0x80
 device_release_driver_internal+0x23a/0x280
 ? bus_find_device+0xa8/0xe0
 device_driver_detach+0x14/0x20
 unbind_store+0xaf/0xc0
 drv_attr_store+0x21/0x50
 sysfs_kf_write+0x4a/0x80
 kernfs_fop_write_iter+0x169/0x220
 vfs_write+0x293/0x560
 ksys_write+0x72/0xf0
 __x64_sys_write+0x19/0x30
 x64_sys_call+0x2bf/0x2660
 do_syscall_64+0x93/0xb60
 ? __f_unlock_pos+0x15/0x20
 ? __x64_sys_getdents64+0x9b/0x130
 ? __pfx_filldir64+0x10/0x10
 ? do_syscall_64+0x1a2/0xb60
 ? clear_bhb_loop+0x30/0x80
 ? clear_bhb_loop+0x30/0x80
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x76e292edd574
Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d d5 ea 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
RSP: 002b:00007fffe247a828 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000076e292edd574
RDX: 000000000000000c RSI: 00006267f6306063 RDI: 000000000000000b
RBP: 000000000000000c R08: 000076e292fc4b20 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 00006267f6306063
R13: 000000000000000b R14: 00006267e6859c00 R15: 000076e29322a000
 </TASK>
xe 0000:03:00.0: [drm] Xe device coredump has been deleted.

Fixes: 01daccf ("devcoredump : Serialize devcd_del work")
Cc: Mukesh Ojha <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Danilo Krummrich <[email protected]>
Cc: [email protected]
Cc: [email protected] # v6.1+
Signed-off-by: Maarten Lankhorst <[email protected]>
Cc: Matthew Brost <[email protected]>
Acked-by: Mukesh Ojha <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
[ removed const qualifier from bin_attribute callback parameters ]
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
github-actions bot pushed a commit that referenced this pull request Nov 2, 2025
JIRA: https://issues.redhat.com/browse/RHEL-78200

upstream
========
commit ea04fe1
Author: Aditya Bodkhe <[email protected]>
Date: Tue Apr 29 12:21:32 2025 +0530

description
===========
The perf script tests fail with a segmentation fault as below:

  92: perf script tests:
  --- start ---
  test child forked, pid 103769
  DB test
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.012 MB /tmp/perf-test-script.7rbftEpOzX/perf.data (9 samples) ]
  /usr/libexec/perf-core/tests/shell/script.sh: line 35:
  103780 Segmentation fault      (core dumped)
  perf script -i "${perfdatafile}" -s "${db_test}"
  --- Cleaning up ---
  ---- end(-1) ----
  92: perf script tests                                               : FAILED!

Backtrace pointed to :
	#0  0x0000000010247dd0 in maps.machine ()
	#1  0x00000000101d178c in db_export.sample ()
	#2  0x00000000103412c8 in python_process_event ()
	#3  0x000000001004eb28 in process_sample_event ()
	#4  0x000000001024fcd0 in machines.deliver_event ()
	#5  0x000000001025005c in perf_session.deliver_event ()
	#6  0x00000000102568b0 in __ordered_events__flush.part.0 ()
	#7  0x0000000010251618 in perf_session.process_events ()
	#8  0x0000000010053620 in cmd_script ()
	#9  0x00000000100b5a28 in run_builtin ()
	#10 0x00000000100b5f94 in handle_internal_command ()
	#11 0x0000000010011114 in main ()

Further investigation reveals that this occurs in the `perf script tests`,
because it uses the `db_test.py` script. This script sets `perf_db_export_mode = True`.

With `perf_db_export_mode` enabled, if a sample originates from a hypervisor,
perf doesn't set maps for the "[H]" sample in the code. Consequently, `al->maps`
remains NULL when `maps__machine(al->maps)` is called from `db_export__sample`.

As al->maps can be NULL in the case of hypervisor samples, use thread->maps,
because even for a hypervisor sample the machine should exist. If we don't have
a machine for some reason, return -1 to avoid the segmentation fault.
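
A short sketch of the NULL-safe lookup described above (illustrative only; the
exact code in db_export__sample() may differ):

```
/*
 * For hypervisor ("[H]") samples al->maps is not set, so fall back to
 * the thread's maps; bail out if no machine can be resolved at all.
 */
struct maps *maps = al->maps ?: thread__maps(al->thread);
struct machine *machine = maps ? maps__machine(maps) : NULL;

if (!machine)
	return -1;
```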

    Reported-by: Disha Goel <[email protected]>
    Signed-off-by: Aditya Bodkhe <[email protected]>
    Reviewed-by: Adrian Hunter <[email protected]>
    Tested-by: Disha Goel <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Suggested-by: Adrian Hunter <[email protected]>
    Signed-off-by: Namhyung Kim <[email protected]>

Signed-off-by: Anubhav Shelat <[email protected]>
github-actions bot pushed a commit that referenced this pull request Nov 2, 2025
JIRA: https://issues.redhat.com/browse/RHEL-78200

upstream
========
commit c21986d
Author: Sergei Trofimovich <[email protected]>
Date: Mon May 5 18:44:19 2025 +0100

description
===========
Without the change `perf` hangs on character devices. On my system
it's enough to run a system-wide sampler for a few seconds to get the
hangup:

    $ perf record -a -g --call-graph=dwarf
    $ perf report
    # hung

`strace` shows that the hangup happens on a read from the character device
`/dev/dri/renderD128`:

    $ strace -y -f -p 2780484
    strace: Process 2780484 attached
    pread64(101</dev/dri/renderD128>, strace: Process 2780484 detached

Its call trace descends into `elfutils`:

    $ gdb -p 2780484
    (gdb) bt
    #0  0x00007f5e508f04b7 in __libc_pread64 (fd=101, buf=0x7fff9df7edb0, count=0, offset=0)
        at ../sysdeps/unix/sysv/linux/pread64.c:25
    #1  0x00007f5e52b79515 in read_file () from /<<NIX>>/elfutils-0.192/lib/libelf.so.1
    #2  0x00007f5e52b25666 in libdw_open_elf () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #3  0x00007f5e52b25907 in __libdw_open_file () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #4  0x00007f5e52b120a9 in dwfl_report_elf@@ELFUTILS_0.156 ()
       from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #5  0x000000000068bf20 in __report_module (al=al@entry=0x7fff9df80010, ip=ip@entry=139803237033216, ui=ui@entry=0x5369b5e0)
        at util/dso.h:537
    #6  0x000000000068c3d1 in report_module (ip=139803237033216, ui=0x5369b5e0) at util/unwind-libdw.c:114
    #7  frame_callback (state=0x535aef10, arg=0x5369b5e0) at util/unwind-libdw.c:242
    #8  0x00007f5e52b261d3 in dwfl_thread_getframes () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #9  0x00007f5e52b25bdb in get_one_thread_cb () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #10 0x00007f5e52b25faa in dwfl_getthreads () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #11 0x00007f5e52b26514 in dwfl_getthread_frames () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #12 0x000000000068c6ce in unwind__get_entries (cb=cb@entry=0x5d4620 <unwind_entry>, arg=arg@entry=0x10cd5fa0,
        thread=thread@entry=0x1076a290, data=data@entry=0x7fff9df80540, max_stack=max_stack@entry=127,
        best_effort=best_effort@entry=false) at util/thread.h:152
    #13 0x00000000005dae95 in thread__resolve_callchain_unwind (evsel=0x106006d0, thread=0x1076a290, cursor=0x10cd5fa0,
        sample=0x7fff9df80540, max_stack=127, symbols=true) at util/machine.c:2939
    #14 thread__resolve_callchain_unwind (thread=0x1076a290, cursor=0x10cd5fa0, evsel=0x106006d0, sample=0x7fff9df80540,
        max_stack=127, symbols=true) at util/machine.c:2920
    #15 __thread__resolve_callchain (thread=0x1076a290, cursor=0x10cd5fa0, evsel=0x106006d0, evsel@entry=0x7fff9df80440,
        sample=0x7fff9df80540, parent=parent@entry=0x7fff9df804a0, root_al=root_al@entry=0x7fff9df80440, max_stack=127, symbols=true)
        at util/machine.c:2970
    #16 0x00000000005d0cb2 in thread__resolve_callchain (thread=<optimized out>, cursor=<optimized out>, evsel=0x7fff9df80440,
        sample=<optimized out>, parent=0x7fff9df804a0, root_al=0x7fff9df80440, max_stack=127) at util/machine.h:198
    #17 sample__resolve_callchain (sample=<optimized out>, cursor=<optimized out>, parent=parent@entry=0x7fff9df804a0,
        evsel=evsel@entry=0x106006d0, al=al@entry=0x7fff9df80440, max_stack=max_stack@entry=127) at util/callchain.c:1127
    #18 0x0000000000617e08 in hist_entry_iter__add (iter=iter@entry=0x7fff9df80480, al=al@entry=0x7fff9df80440, max_stack_depth=127,
        arg=arg@entry=0x7fff9df81ae0) at util/hist.c:1255
    #19 0x000000000045d2d0 in process_sample_event (tool=0x7fff9df81ae0, event=<optimized out>, sample=0x7fff9df80540,
        evsel=0x106006d0, machine=<optimized out>) at builtin-report.c:334
    #20 0x00000000005e3bb1 in perf_session__deliver_event (session=0x105ff2c0, event=0x7f5c7d735ca0, tool=0x7fff9df81ae0,
        file_offset=2914716832, file_path=0x105ffbf0 "perf.data") at util/session.c:1367
    #21 0x00000000005e8d93 in do_flush (oe=0x105ffa50, show_progress=false) at util/ordered-events.c:245
    #22 __ordered_events__flush (oe=0x105ffa50, how=OE_FLUSH__ROUND, timestamp=<optimized out>) at util/ordered-events.c:324
    #23 0x00000000005e1f64 in perf_session__process_user_event (session=0x105ff2c0, event=0x7f5c7d752b18, file_offset=2914835224,
        file_path=0x105ffbf0 "perf.data") at util/session.c:1419
    #24 0x00000000005e47c7 in reader__read_event (rd=rd@entry=0x7fff9df81260, session=session@entry=0x105ff2c0,
    --Type <RET> for more, q to quit, c to continue without paging--
    quit
        prog=prog@entry=0x7fff9df81220) at util/session.c:2132
    #25 0x00000000005e4b37 in reader__process_events (rd=0x7fff9df81260, session=0x105ff2c0, prog=0x7fff9df81220)
        at util/session.c:2181
    #26 __perf_session__process_events (session=0x105ff2c0) at util/session.c:2226
    #27 perf_session__process_events (session=session@entry=0x105ff2c0) at util/session.c:2390
    #28 0x0000000000460add in __cmd_report (rep=0x7fff9df81ae0) at builtin-report.c:1076
    #29 cmd_report (argc=<optimized out>, argv=<optimized out>) at builtin-report.c:1827
    #30 0x00000000004c5a40 in run_builtin (p=p@entry=0xd8f7f8 <commands+312>, argc=argc@entry=1, argv=argv@entry=0x7fff9df844b0)
        at perf.c:351
    #31 0x00000000004c5d63 in handle_internal_command (argc=argc@entry=1, argv=argv@entry=0x7fff9df844b0) at perf.c:404
    #32 0x0000000000442de3 in run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:448
    #33 main (argc=<optimized out>, argv=0x7fff9df844b0) at perf.c:556

The hangup happens because nothing in `perf` or `elfutils` checks whether a
mapped file can be read without blocking.

The change conservatively skips all non-regular files.
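
The check itself is simple; here is a self-contained sketch of the kind of
guard described above (the helper name is hypothetical, not the function added
to perf):

```
#include <stdbool.h>
#include <sys/stat.h>

/* Only regular files are safe to hand to elfutils for reading. */
static bool is_regular_file(const char *path)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return false;
	return S_ISREG(st.st_mode);
}
```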

    Signed-off-by: Sergei Trofimovich <[email protected]>
    Acked-by: Namhyung Kim <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Namhyung Kim <[email protected]>

Signed-off-by: Anubhav Shelat <[email protected]>
github-actions bot pushed a commit that referenced this pull request Nov 2, 2025
JIRA: https://issues.redhat.com/browse/RHEL-78200

upstream
========
commit 9c9f4a2
Author: Ian Rogers <[email protected]>
Date: Tue Jun 24 14:05:00 2025 -0700

description
===========
Symbolize stack traces by creating a live machine. Add this
functionality to dump_stack and switch dump_stack users to use
it. Switch TUI to use it. Add stack traces to the child test function
which can be useful to diagnose blocked code.

Example output:
```
$ perf test -vv PERF_RECORD_
...
  7: PERF_RECORD_* events & perf_sample fields:
  7: PERF_RECORD_* events & perf_sample fields                       : Running (1 active)
^C
Signal (2) while running tests.
Terminating tests with the same signal
Internal test harness failure. Completing any started tests:
:  7: PERF_RECORD_* events & perf_sample fields:

---- unexpected signal (2) ----
    #0 0x55788c6210a3 in child_test_sig_handler builtin-test.c:0
    #1 0x7fc12fe49df0 in __restore_rt libc_sigaction.c:0
    #2 0x7fc12fe99687 in __internal_syscall_cancel cancellation.c:64
    #3 0x7fc12fee5f7a in clock_nanosleep@GLIBC_2.2.5 clock_nanosleep.c:72
    #4 0x7fc12fef1393 in __nanosleep nanosleep.c:26
    #5 0x7fc12ff02d68 in __sleep sleep.c:55
    #6 0x55788c63196b in test__PERF_RECORD perf-record.c:0
    #7 0x55788c620fb0 in run_test_child builtin-test.c:0
    #8 0x55788c5bd18d in start_command run-command.c:127
    #9 0x55788c621ef3 in __cmd_test builtin-test.c:0
    #10 0x55788c6225bf in cmd_test ??:0
    #11 0x55788c5afbd0 in run_builtin perf.c:0
    #12 0x55788c5afeeb in handle_internal_command perf.c:0
    #13 0x55788c52b383 in main ??:0
    #14 0x7fc12fe33ca8 in __libc_start_call_main libc_start_call_main.h:74
    #15 0x7fc12fe33d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
    #16 0x55788c52b9d1 in _start ??:0

---- unexpected signal (2) ----
    #0 0x55788c6210a3 in child_test_sig_handler builtin-test.c:0
    #1 0x7fc12fe49df0 in __restore_rt libc_sigaction.c:0
    #2 0x7fc12fea3a14 in pthread_sigmask@GLIBC_2.2.5 pthread_sigmask.c:45
    #3 0x7fc12fe49fd9 in __GI___sigprocmask sigprocmask.c:26
    #4 0x7fc12ff2601b in __longjmp_chk longjmp.c:36
    #5 0x55788c6210c0 in print_test_result.isra.0 builtin-test.c:0
    #6 0x7fc12fe49df0 in __restore_rt libc_sigaction.c:0
    #7 0x7fc12fe99687 in __internal_syscall_cancel cancellation.c:64
    #8 0x7fc12fee5f7a in clock_nanosleep@GLIBC_2.2.5 clock_nanosleep.c:72
    #9 0x7fc12fef1393 in __nanosleep nanosleep.c:26
    #10 0x7fc12ff02d68 in __sleep sleep.c:55
    #11 0x55788c63196b in test__PERF_RECORD perf-record.c:0
    #12 0x55788c620fb0 in run_test_child builtin-test.c:0
    #13 0x55788c5bd18d in start_command run-command.c:127
    #14 0x55788c621ef3 in __cmd_test builtin-test.c:0
    #15 0x55788c6225bf in cmd_test ??:0
    #16 0x55788c5afbd0 in run_builtin perf.c:0
    #17 0x55788c5afeeb in handle_internal_command perf.c:0
    #18 0x55788c52b383 in main ??:0
    #19 0x7fc12fe33ca8 in __libc_start_call_main libc_start_call_main.h:74
    #20 0x7fc12fe33d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
    #21 0x55788c52b9d1 in _start ??:0
  7: PERF_RECORD_* events & perf_sample fields                       : Skip (permissions)
```

    Signed-off-by: Ian Rogers <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Namhyung Kim <[email protected]>

Signed-off-by: Anubhav Shelat <[email protected]>
github-actions bot pushed a commit that referenced this pull request Nov 2, 2025
JIRA: https://issues.redhat.com/browse/RHEL-78200

upstream
========
commit c72bf82
Author: Thomas Falcon <[email protected]>
Date: Thu Jun 12 11:36:59 2025 -0500

description
===========
Calling perf top with branch filters enabled on Intel CPUs
with branch counter logging (a.k.a. LBR event logging [1]) support
results in a segfault.

$ perf top  -e '{cpu_core/cpu-cycles/,cpu_core/event=0xc6,umask=0x3,frontend=0x11,name=frontend_retired_dsb_miss/}' -j any,counter
...
Thread 27 "perf" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffafff76c0 (LWP 949003)]
perf_env__find_br_cntr_info (env=0xf66dc0 <perf_env>, nr=0x0, width=0x7fffafff62c0) at util/env.c:653
653			*width = env->cpu_pmu_caps ? env->br_cntr_width :
(gdb) bt
 #0  perf_env__find_br_cntr_info (env=0xf66dc0 <perf_env>, nr=0x0, width=0x7fffafff62c0) at util/env.c:653
 #1  0x00000000005b1599 in symbol__account_br_cntr (branch=0x7fffcc3db580, evsel=0xfea2d0, offset=12, br_cntr=8) at util/annotate.c:345
 #2  0x00000000005b17fb in symbol__account_cycles (addr=5658172, start=5658160, sym=0x7fffcc0ee420, cycles=539, evsel=0xfea2d0, br_cntr=8) at util/annotate.c:389
 #3  0x00000000005b1976 in addr_map_symbol__account_cycles (ams=0x7fffcd7b01d0, start=0x7fffcd7b02b0, cycles=539, evsel=0xfea2d0, br_cntr=8) at util/annotate.c:422
 #4  0x000000000068d57f in hist__account_cycles (bs=0x110d288, al=0x7fffafff6540, sample=0x7fffafff6760, nonany_branch_mode=false, total_cycles=0x0, evsel=0xfea2d0) at util/hist.c:2850
 #5  0x0000000000446216 in hist_iter__top_callback (iter=0x7fffafff6590, al=0x7fffafff6540, single=true, arg=0x7fffffff9e00) at builtin-top.c:737
 #6  0x0000000000689787 in hist_entry_iter__add (iter=0x7fffafff6590, al=0x7fffafff6540, max_stack_depth=127, arg=0x7fffffff9e00) at util/hist.c:1359
 #7  0x0000000000446710 in perf_event__process_sample (tool=0x7fffffff9e00, event=0x110d250, evsel=0xfea2d0, sample=0x7fffafff6760, machine=0x108c968) at builtin-top.c:845
 #8  0x0000000000447735 in deliver_event (qe=0x7fffffffa120, qevent=0x10fc200) at builtin-top.c:1211
 #9  0x000000000064ccae in do_flush (oe=0x7fffffffa120, show_progress=false) at util/ordered-events.c:245
 #10 0x000000000064d005 in __ordered_events__flush (oe=0x7fffffffa120, how=OE_FLUSH__TOP, timestamp=0) at util/ordered-events.c:324
 #11 0x000000000064d0ef in ordered_events__flush (oe=0x7fffffffa120, how=OE_FLUSH__TOP) at util/ordered-events.c:342
 #12 0x00000000004472a9 in process_thread (arg=0x7fffffff9e00) at builtin-top.c:1120
 #13 0x00007ffff6e7dba8 in start_thread (arg=<optimized out>) at pthread_create.c:448
 #14 0x00007ffff6f01b8c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

The cause is that perf_env__find_br_cntr_info() tries to dereference the
NULL pmu_caps pointer in the perf_env struct. A similar issue exists
for homogeneous core systems, which use the cpu_pmu_caps structure.

Fix this by populating cpu_pmu_caps and pmu_caps structures with
values from sysfs when calling perf top with branch stack sampling
enabled.

[1], LBR event logging introduced here:
https://lore.kernel.org/all/[email protected]/
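
As a rough illustration of where such capability values come from (the helper
below is hypothetical and not the perf-internal API; the sysfs layout is the
standard /sys/bus/event_source/devices/<pmu>/caps/ directory):

```
#include <stdio.h>

/*
 * Hypothetical helper, not the perf-internal API: read one PMU
 * capability value from sysfs, e.g.
 * read_pmu_cap("cpu_core", "branches", buf, sizeof(buf)).
 */
static int read_pmu_cap(const char *pmu, const char *cap,
			char *buf, int len)
{
	char path[256];
	FILE *f;
	int ret = -1;

	snprintf(path, sizeof(path),
		 "/sys/bus/event_source/devices/%s/caps/%s", pmu, cap);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fgets(buf, len, f))
		ret = 0;
	fclose(f);
	return ret;
}
```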

    Reviewed-by: Ian Rogers <[email protected]>
    Signed-off-by: Thomas Falcon <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Namhyung Kim <[email protected]>

Signed-off-by: Anubhav Shelat <[email protected]>
github-actions bot pushed a commit that referenced this pull request Nov 4, 2025
JIRA: https://issues.redhat.com/browse/RHEL-105612

commit a9c83a0
Author: Jens Axboe <[email protected]>
Date:   Mon Dec 30 14:15:17 2024 -0700

    io_uring/timeout: flush timeouts outside of the timeout lock
    
    syzbot reports that a recent fix causes nesting issues between the (now)
    raw timeout lock and the eventfd locking:
    
    =============================
    [ BUG: Invalid wait context ]
    6.13.0-rc4-00080-g9828a4c0901f #29 Not tainted
    -----------------------------
    kworker/u32:0/68094 is trying to lock:
    ffff000014d7a520 (&ctx->wqh#2){..-.}-{3:3}, at: eventfd_signal_mask+0x64/0x180
    other info that might help us debug this:
    context-{5:5}
    6 locks held by kworker/u32:0/68094:
     #0: ffff0000c1d98148 ((wq_completion)iou_exit){+.+.}-{0:0}, at: process_one_work+0x4e8/0xfc0
     #1: ffff80008d927c78 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x53c/0xfc0
     #2: ffff0000c59bc3d8 (&ctx->completion_lock){+.+.}-{3:3}, at: io_kill_timeouts+0x40/0x180
     #3: ffff0000c59bc358 (&ctx->timeout_lock){-.-.}-{2:2}, at: io_kill_timeouts+0x48/0x180
     #4: ffff800085127aa0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x8/0x38
     #5: ffff800085127aa0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x8/0x38
    stack backtrace:
    CPU: 7 UID: 0 PID: 68094 Comm: kworker/u32:0 Not tainted 6.13.0-rc4-00080-g9828a4c0901f #29
    Hardware name: linux,dummy-virt (DT)
    Workqueue: iou_exit io_ring_exit_work
    Call trace:
     show_stack+0x1c/0x30 (C)
     __dump_stack+0x24/0x30
     dump_stack_lvl+0x60/0x80
     dump_stack+0x14/0x20
     __lock_acquire+0x19f8/0x60c8
     lock_acquire+0x1a4/0x540
     _raw_spin_lock_irqsave+0x90/0xd0
     eventfd_signal_mask+0x64/0x180
     io_eventfd_signal+0x64/0x108
     io_req_local_work_add+0x294/0x430
     __io_req_task_work_add+0x1c0/0x270
     io_kill_timeout+0x1f0/0x288
     io_kill_timeouts+0xd4/0x180
     io_uring_try_cancel_requests+0x2e8/0x388
     io_ring_exit_work+0x150/0x550
     process_one_work+0x5e8/0xfc0
     worker_thread+0x7ec/0xc80
     kthread+0x24c/0x300
     ret_from_fork+0x10/0x20
    
    because after the preempt-rt fix for the timeout lock nesting inside
    the io-wq lock, we now have the eventfd spinlock nesting inside the
    raw timeout spinlock.
    
    Rather than play whack-a-mole with other nesting on the timeout lock,
    split the deletion and killing of timeouts so queueing the task_work
    for the timeout cancelations can get done outside of the timeout lock.
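
    The pattern is the usual one of unlinking under the lock and acting
    afterwards; a generic sketch follows (the entry type and the
    queue_cancel_task_work() helper are hypothetical stand-ins, not
    io_uring's real types):

```
/*
 * Unlink the expired timeouts while holding the raw timeout lock, then
 * queue their cancellation task_work only after the lock is dropped,
 * so nothing that takes other locks (eventfd, task_work) runs inside it.
 */
LIST_HEAD(local_list);
struct timeout_entry *t, *tmp;		/* hypothetical entry type */

raw_spin_lock_irq(&ctx->timeout_lock);
list_for_each_entry_safe(t, tmp, &ctx->timeout_list, list)
	list_move_tail(&t->list, &local_list);
raw_spin_unlock_irq(&ctx->timeout_lock);

list_for_each_entry_safe(t, tmp, &local_list, list)
	queue_cancel_task_work(t);	/* hypothetical helper */
```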
    
    Reported-by: [email protected]
    Fixes: 020b40f ("io_uring: make ctx->timeout_lock a raw spinlock")
    Signed-off-by: Jens Axboe <[email protected]>

Signed-off-by: Jeff Moyer <[email protected]>
github-actions bot pushed a commit that referenced this pull request Nov 7, 2025
Michael Chan says:

====================
bnxt_en: Bug fixes

Patches 1, 3, and 4 are bug fixes related to the FW log tracing driver
coredump feature recently added in 6.13.  Patch #1 adds the necessary
call to shutdown the FW logging DMA during PCI shutdown.  Patch #3 fixes
a possible null pointer derefernce when using early versions of the FW
with this feature.  Patch #4 adds the coredump header information
unconditionally to make it more robust.

Patch #2 fixes a possible memory leak during PTP shutdown.  Patch #5
eliminates a dmesg warning when doing devlink reload.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>