Skip to content

Gvrose fips legacy 8 compliant/4.18.0 425.13.1 #2

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

gvrose8192
Copy link

@gvrose8192 gvrose8192 commented Oct 21, 2024

Latest work on the FIPS 8 compliant kernel Various linked tasks:
VULN-429
VULN-4095
VULN-597
SECO-169
SECO-94

Kernel selftests have passed https://github.com/user-attachments/files/17512330/kernel-selftest.log with no change in results from previous run.

I've been running netfilter tests in a loop overnight - for i in {1..50000}; do sudo valgrind --log-file=valgrind-results$i.log ./run-tests.sh; done
Typical output, unchanged over any number runs.
nftables-test.log

Valgrind out put from any of hundreds of runs is all the same -
valgrind-results.log

With lockdep enabled and running sudo stress --cpu 28 --io 28 --vm 28 --vm-bytes 1G --timeout 3h I ran multiple passes of the nftables tests with valgrind: for i in {1..4}; do sudo valgrind --log-file=valgrind-results$i.log ./run-tests.sh; done

The nftables tests all pass with no difference from the original tests. Valgrind logs here:
valgrind-results1.log
valgrind-results2.log
valgrind-results3.log
valgrind-results4.log

No lockdep splats, any other splats, no OOMs, no panics, nor any other error messages during the run.

After pulling in the missing patch "netfilter: nf_tables: set backend .flush always succeeds" I reran the netfilter tests overnight with lockdep and kmemleak enabled in the kernel and running "sudo stress --cpu 28 --io 28 --vm 28 --vm-bytes 1G --timeout 8h" and found no issues. The logfiles are unchanged from the previous night's runs.

@gvrose8192 gvrose8192 requested a review from PlaidCat October 21, 2024 21:12
@gvrose8192
Copy link
Author

The 2nd commit is crap - the code is right but whacked the commit message metadata. I'll fix that up. Leaving the rest for your review.

@gvrose8192 gvrose8192 force-pushed the gvrose_fips-legacy-8-compliant/4.18.0-425.13.1 branch from 22f98fe to 3f2fa28 Compare October 21, 2024 21:23
@gvrose8192
Copy link
Author

The 2nd commit is crap - the code is right but whacked the commit message metadata. I'll fix that up. Leaving the rest for your review.

OK, fixed with a force push.

@gvrose8192 gvrose8192 force-pushed the gvrose_fips-legacy-8-compliant/4.18.0-425.13.1 branch 2 times, most recently from 8f50770 to 9cff1f7 Compare October 23, 2024 00:31
@gvrose8192
Copy link
Author

This PR is now ready for full review.
CVE's addressed by this PR:
CVE-2023-4244
CVE-2023-52581
CVE-2024-26925

Github actions build checks here:
https://github.com/ctrliq/kernel-src-tree/actions/runs/11505520259 Checked the PR for valid commit messages
https://github.com/ctrliq/kernel-src-tree/actions/runs/11505519609 Checked the compile/build for x86_64
https://github.com/ctrliq/kernel-src-tree/actions/runs/11505519601 Checked the compile/build for aarch64 - this is not really valid for fips8, but it demonstrates the code changes are portable.

Kernel selftest log shows no new errors or consistent discrepancies from the base kernel from before this PR. I.E things that failed before still fail, things that passed before, still pass.

kernel-selftest.log

What remains to do:

  1. This will require some close inspection of those commits marked with an 'upstream-diff' tag.
  2. Netfilter testsuite - will run it against the current fips-compliant8 branch and then against the same branch with this PR. Checking for no new errors. If something passes that didn't used to then that's great and I will note it in the PR conversation.

@PlaidCat This PR is ready for review. I'll be configuring and running the netfilter tests in parallel and record the results here.

@gvrose8192
Copy link
Author

Oh - totally forgot about this included patch: 5647beb

So that adds an additional CVE fixed by this PR - CVE-2024-39502

So we have 4 total CVEs addressed by this PR, not 3.

@gvrose8192
Copy link
Author

nft-test-results.log

Sample results from an nftables testsuite run on my dev system running Rocky 9.4. This was a sanity check to make sure I could actually build and install the nftables testsuite available here: https://git.netfilter.org/nftables/

Next step is collect the results from a run with the currently available fips8-compliant kernel and compare to the results when I run the kernel built from this PR.

Copy link
Collaborator

@PlaidCat PlaidCat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: previous version invalid do to wrong reference

Copy link
Collaborator

@PlaidCat PlaidCat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: previous version invalid do to wrong reference

@gvrose8192
Copy link
Author

"in nf_tables_commit under case NFT_MSG_NEWSETELEM: also uses nft_setelem_remove
https://github.com/ctrliq/kernel-src-tree/blob/centos_kernel-4.18.0-534.el8/net/netfilter/nf_tables_api.c#L8341

Same as above in __nf_tables_abort NFT_MSG_NEWSETELEM
https://github.com/ctrliq/kernel-src-tree/blob/centos_kernel-4.18.0-534.el8/net/netfilter/nf_tables_api.c#L8547"

Acked - requires more investigation.

@gvrose8192 gvrose8192 force-pushed the gvrose_fips-legacy-8-compliant/4.18.0-425.13.1 branch from 35ee0b2 to c7185b5 Compare October 26, 2024 18:18
@gvrose8192
Copy link
Author

Closing this pull request - will post an updated PR

@gvrose8192 gvrose8192 closed this Oct 28, 2024
@gvrose8192 gvrose8192 reopened this Oct 28, 2024
@gvrose8192 gvrose8192 force-pushed the gvrose_fips-legacy-8-compliant/4.18.0-425.13.1 branch 2 times, most recently from e82f39b to 240a26d Compare October 28, 2024 20:29
@gvrose8192
Copy link
Author

All kernel selftests continue to show no new errors or consistent discrepancies between the base version and with this patch series.
I am running the nftables testing run-tests.sh in continuous loop with valgrind. No memory leaks detected so far after hundreds of loops. The logs are too big to store but I ran a single loops manually and got the following results:

test-results.log
valgrind-results.log

I'll resume valgrind checking of the nftables nfct checks for an overnight run, make sure no long term (within a day) damage is found and to increase confidence in the PR.

Copy link
Collaborator

@PlaidCat PlaidCat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For netfilter: nf_tables: mark set as dead when unbinding anonymous set with timeout 240a26d

it should be jira VULN-835

@gvrose8192
Copy link
Author

For netfilter: nf_tables: mark set as dead when unbinding anonymous set with timeout 240a26d

it should be jira VULN-835

Good catch - I wondered about pulling that in with a different jira and meant to ask you but then got distracted by other work. I'll fix that up.

@gvrose8192
Copy link
Author

"in nf_tables_commit under case NFT_MSG_NEWSETELEM: also uses nft_setelem_remove https://github.com/ctrliq/kernel-src-tree/blob/centos_kernel-4.18.0-534.el8/net/netfilter/nf_tables_api.c#L8341

Same as above in __nf_tables_abort NFT_MSG_NEWSETELEM https://github.com/ctrliq/kernel-src-tree/blob/centos_kernel-4.18.0-534.el8/net/netfilter/nf_tables_api.c#L8547"

Acked - requires more investigation.

OK, yes. Found and fixed - just missed it in an otherwise large commit. Fix incoming with next branch force push.

Copy link
Collaborator

@PlaidCat PlaidCat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subsystem-sync netfilter:nf_tables 4.18.0-553
These should be:
subsystem-sync netfilter:nf_tables 4.18.0-534

Copy link
Collaborator

@PlaidCat PlaidCat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

netfilter: nf_tables: fix table flag updates 95b04bb

Is the upstream-diff here just contextual information in the fuzz?

github-actions bot pushed a commit that referenced this pull request Jul 5, 2025
JIRA: https://issues.redhat.com/browse/RHEL-93414

commit adf2de5
Author: Pawan Gupta <[email protected]>
Date:   Tue, 11 Mar 2025 08:02:52 -0700

    x86/cpu: Update x86_match_cpu() to also use cpu-type

    Non-hybrid CPU variants that share the same Family/Model could be
    differentiated by their cpu-type. x86_match_cpu() currently does not use
    cpu-type for CPU matching.

    Dave Hansen suggested to use below conditions to match CPU-type:

      1. If CPU_TYPE_ANY (the wildcard), then matched
      2. If hybrid, then matched
      3. If !hybrid, look at the boot CPU and compare the cpu-type to determine
         if it is a match.

      This special case for hybrid systems allows more compact vulnerability
      list.  Imagine that "Haswell" CPUs might or might not be hybrid and that
      only Atom cores are vulnerable to Meltdown.  That means there are three
      possibilities:

            1. P-core only
            2. Atom only
            3. Atom + P-core (aka. hybrid)

      One might be tempted to code up the vulnerability list like this:

            MATCH(     HASWELL, X86_FEATURE_HYBRID, MELTDOWN)
            MATCH_TYPE(HASWELL, ATOM,               MELTDOWN)

      Logically, this matches #2 and #3. But that's a little silly. You would
      only ask for the "ATOM" match in cases where there *WERE* hybrid cores in
      play. You shouldn't have to _also_ ask for hybrid cores explicitly.

      In short, assume that processors that enumerate Hybrid==1 have a
      vulnerable core type.

    Update x86_match_cpu() to also match cpu-type. Also treat hybrid systems as
    special, and match them to any cpu-type.

    Suggested-by: Dave Hansen <[email protected]>
    Signed-off-by: Pawan Gupta <[email protected]>
    Signed-off-by: Borislav Petkov (AMD) <[email protected]>
    Signed-off-by: Ingo Molnar <[email protected]>
    Acked-by: Dave Hansen <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]

Signed-off-by: Waiman Long <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jul 5, 2025
When I run the NVME over TCP test in virtme-ng, I get the following
"suspicious RCU usage" warning in nvme_mpath_add_sysfs_link():

'''
[    5.024557][   T44] nvmet: Created nvm controller 1 for subsystem nqn.2025-06.org.nvmexpress.mptcp for NQN nqn.2014-08.org.nvmexpress:uuid:f7f6b5e0-ff97-4894-98ac-c85309e0bc77.
[    5.027401][  T183] nvme nvme0: creating 2 I/O queues.
[    5.029017][  T183] nvme nvme0: mapped 2/0/0 default/read/poll queues.
[    5.032587][  T183] nvme nvme0: new ctrl: NQN "nqn.2025-06.org.nvmexpress.mptcp", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:f7f6b5e0-ff97-4894-98ac-c85309e0bc77
[    5.042214][   T25]
[    5.042440][   T25] =============================
[    5.042579][   T25] WARNING: suspicious RCU usage
[    5.042705][   T25] 6.16.0-rc3+ #23 Not tainted
[    5.042812][   T25] -----------------------------
[    5.042934][   T25] drivers/nvme/host/multipath.c:1203 RCU-list traversed in non-reader section!!
[    5.043111][   T25]
[    5.043111][   T25] other info that might help us debug this:
[    5.043111][   T25]
[    5.043341][   T25]
[    5.043341][   T25] rcu_scheduler_active = 2, debug_locks = 1
[    5.043502][   T25] 3 locks held by kworker/u9:0/25:
[    5.043615][   T25]  #0: ffff888008730948 ((wq_completion)async){+.+.}-{0:0}, at: process_one_work+0x7ed/0x1350
[    5.043830][   T25]  #1: ffffc900001afd40 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0xcf3/0x1350
[    5.044084][   T25]  #2: ffff888013ee0020 (&head->srcu){.+.+}-{0:0}, at: nvme_mpath_add_sysfs_link.part.0+0xb4/0x3a0
[    5.044300][   T25]
[    5.044300][   T25] stack backtrace:
[    5.044439][   T25] CPU: 0 UID: 0 PID: 25 Comm: kworker/u9:0 Not tainted 6.16.0-rc3+ #23 PREEMPT(full)
[    5.044441][   T25] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    5.044442][   T25] Workqueue: async async_run_entry_fn
[    5.044445][   T25] Call Trace:
[    5.044446][   T25]  <TASK>
[    5.044449][   T25]  dump_stack_lvl+0x6f/0xb0
[    5.044453][   T25]  lockdep_rcu_suspicious.cold+0x4f/0xb1
[    5.044457][   T25]  nvme_mpath_add_sysfs_link.part.0+0x2fb/0x3a0
[    5.044459][   T25]  ? queue_work_on+0x90/0xf0
[    5.044461][   T25]  ? lockdep_hardirqs_on+0x78/0x110
[    5.044466][   T25]  nvme_mpath_set_live+0x1e9/0x4f0
[    5.044470][   T25]  nvme_mpath_add_disk+0x240/0x2f0
[    5.044472][   T25]  ? __pfx_nvme_mpath_add_disk+0x10/0x10
[    5.044475][   T25]  ? add_disk_fwnode+0x361/0x580
[    5.044480][   T25]  nvme_alloc_ns+0x81c/0x17c0
[    5.044483][   T25]  ? kasan_quarantine_put+0x104/0x240
[    5.044487][   T25]  ? __pfx_nvme_alloc_ns+0x10/0x10
[    5.044495][   T25]  ? __pfx_nvme_find_get_ns+0x10/0x10
[    5.044496][   T25]  ? rcu_read_lock_any_held+0x45/0xa0
[    5.044498][   T25]  ? validate_chain+0x232/0x4f0
[    5.044503][   T25]  nvme_scan_ns+0x4c8/0x810
[    5.044506][   T25]  ? __pfx_nvme_scan_ns+0x10/0x10
[    5.044508][   T25]  ? find_held_lock+0x2b/0x80
[    5.044512][   T25]  ? ktime_get+0x16d/0x220
[    5.044517][   T25]  ? kvm_clock_get_cycles+0x18/0x30
[    5.044520][   T25]  ? __pfx_nvme_scan_ns_async+0x10/0x10
[    5.044522][   T25]  async_run_entry_fn+0x97/0x560
[    5.044523][   T25]  ? rcu_is_watching+0x12/0xc0
[    5.044526][   T25]  process_one_work+0xd3c/0x1350
[    5.044532][   T25]  ? __pfx_process_one_work+0x10/0x10
[    5.044536][   T25]  ? assign_work+0x16c/0x240
[    5.044539][   T25]  worker_thread+0x4da/0xd50
[    5.044545][   T25]  ? __pfx_worker_thread+0x10/0x10
[    5.044546][   T25]  kthread+0x356/0x5c0
[    5.044548][   T25]  ? __pfx_kthread+0x10/0x10
[    5.044549][   T25]  ? ret_from_fork+0x1b/0x2e0
[    5.044552][   T25]  ? __lock_release.isra.0+0x5d/0x180
[    5.044553][   T25]  ? ret_from_fork+0x1b/0x2e0
[    5.044555][   T25]  ? rcu_is_watching+0x12/0xc0
[    5.044557][   T25]  ? __pfx_kthread+0x10/0x10
[    5.044559][   T25]  ret_from_fork+0x218/0x2e0
[    5.044561][   T25]  ? __pfx_kthread+0x10/0x10
[    5.044562][   T25]  ret_from_fork_asm+0x1a/0x30
[    5.044570][   T25]  </TASK>
'''

This patch uses sleepable RCU version of helper list_for_each_entry_srcu()
instead of list_for_each_entry_rcu() to fix it.

Fixes: 4dbd2b2 ("nvme-multipath: Add visibility for round-robin io-policy")
Signed-off-by: Geliang Tang <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Nilay Shroff <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jul 7, 2025
[ Upstream commit 387602d ]

Don't set WDM_READ flag in wdm_in_callback() for ZLP-s, otherwise when
userspace tries to poll for available data, it might - incorrectly -
believe there is something available, and when it tries to non-blocking
read it, it might get stuck in the read loop.

For example this is what glib does for non-blocking read (briefly):

  1. poll()
  2. if poll returns with non-zero, starts a read data loop:
    a. loop on poll() (EINTR disabled)
    b. if revents was set, reads data
      I. if read returns with EINTR or EAGAIN, goto 2.a.
      II. otherwise return with data

So if ZLP sets WDM_READ (#1), we expect data, and try to read it (#2).
But as that was a ZLP, and we are doing non-blocking read, wdm_read()
returns with EAGAIN (#2.b.I), so loop again, and try to read again
(#2.a.).

With glib, we might stuck in this loop forever, as EINTR is disabled
(#2.a).

Signed-off-by: Robert Hodaszi <[email protected]>
Acked-by: Oliver Neukum <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jul 7, 2025
[ Upstream commit 711741f ]

Fix cifs_signal_cifsd_for_reconnect() to take the correct lock order
and prevent the following deadlock from happening

======================================================
WARNING: possible circular locking dependency detected
6.16.0-rc3-build2+ #1301 Tainted: G S      W
------------------------------------------------------
cifsd/6055 is trying to acquire lock:
ffff88810ad56038 (&tcp_ses->srv_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x134/0x200

but task is already holding lock:
ffff888119c64330 (&ret_buf->chan_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0xcf/0x200

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&ret_buf->chan_lock){+.+.}-{3:3}:
       validate_chain+0x1cf/0x270
       __lock_acquire+0x60e/0x780
       lock_acquire.part.0+0xb4/0x1f0
       _raw_spin_lock+0x2f/0x40
       cifs_setup_session+0x81/0x4b0
       cifs_get_smb_ses+0x771/0x900
       cifs_mount_get_session+0x7e/0x170
       cifs_mount+0x92/0x2d0
       cifs_smb3_do_mount+0x161/0x460
       smb3_get_tree+0x55/0x90
       vfs_get_tree+0x46/0x180
       do_new_mount+0x1b0/0x2e0
       path_mount+0x6ee/0x740
       do_mount+0x98/0xe0
       __do_sys_mount+0x148/0x180
       do_syscall_64+0xa4/0x260
       entry_SYSCALL_64_after_hwframe+0x76/0x7e

-> #1 (&ret_buf->ses_lock){+.+.}-{3:3}:
       validate_chain+0x1cf/0x270
       __lock_acquire+0x60e/0x780
       lock_acquire.part.0+0xb4/0x1f0
       _raw_spin_lock+0x2f/0x40
       cifs_match_super+0x101/0x320
       sget+0xab/0x270
       cifs_smb3_do_mount+0x1e0/0x460
       smb3_get_tree+0x55/0x90
       vfs_get_tree+0x46/0x180
       do_new_mount+0x1b0/0x2e0
       path_mount+0x6ee/0x740
       do_mount+0x98/0xe0
       __do_sys_mount+0x148/0x180
       do_syscall_64+0xa4/0x260
       entry_SYSCALL_64_after_hwframe+0x76/0x7e

-> #0 (&tcp_ses->srv_lock){+.+.}-{3:3}:
       check_noncircular+0x95/0xc0
       check_prev_add+0x115/0x2f0
       validate_chain+0x1cf/0x270
       __lock_acquire+0x60e/0x780
       lock_acquire.part.0+0xb4/0x1f0
       _raw_spin_lock+0x2f/0x40
       cifs_signal_cifsd_for_reconnect+0x134/0x200
       __cifs_reconnect+0x8f/0x500
       cifs_handle_standard+0x112/0x280
       cifs_demultiplex_thread+0x64d/0xbc0
       kthread+0x2f7/0x310
       ret_from_fork+0x2a/0x230
       ret_from_fork_asm+0x1a/0x30

other info that might help us debug this:

Chain exists of:
  &tcp_ses->srv_lock --> &ret_buf->ses_lock --> &ret_buf->chan_lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&ret_buf->chan_lock);
                               lock(&ret_buf->ses_lock);
                               lock(&ret_buf->chan_lock);
  lock(&tcp_ses->srv_lock);

 *** DEADLOCK ***

3 locks held by cifsd/6055:
 #0: ffffffff857de398 (&cifs_tcp_ses_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x7b/0x200
 #1: ffff888119c64060 (&ret_buf->ses_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x9c/0x200
 #2: ffff888119c64330 (&ret_buf->chan_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0xcf/0x200

Cc: [email protected]
Reported-by: David Howells <[email protected]>
Fixes: d7d7a66 ("cifs: avoid use of global locks for high contention data")
Reviewed-by: David Howells <[email protected]>
Tested-by: David Howells <[email protected]>
Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]>
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Steve French <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jul 8, 2025
JIRA: https://issues.redhat.com/browse/RHEL-74410

commit c5b6aba
Author: Maxim Levitsky <[email protected]>
Date:   Mon May 12 14:04:02 2025 -0400

    locking/mutex: implement mutex_trylock_nested

    Despite the fact that several lockdep-related checks are skipped when
    calling trylock* versions of the locking primitives, for example
    mutex_trylock, each time the mutex is acquired, a held_lock is still
    placed onto the lockdep stack by __lock_acquire() which is called
    regardless of whether the trylock* or regular locking API was used.

    This means that if the caller successfully acquires more than
    MAX_LOCK_DEPTH locks of the same class, even when using mutex_trylock,
    lockdep will still complain that the maximum depth of the held lock stack
    has been reached and disable itself.

    For example, the following error currently occurs in the ARM version
    of KVM, once the code tries to lock all vCPUs of a VM configured with more
    than MAX_LOCK_DEPTH vCPUs, a situation that can easily happen on modern
    systems, where having more than 48 CPUs is common, and it's also common to
    run VMs that have vCPU counts approaching that number:

    [  328.171264] BUG: MAX_LOCK_DEPTH too low!
    [  328.175227] turning off the locking correctness validator.
    [  328.180726] Please attach the output of /proc/lock_stat to the bug report
    [  328.187531] depth: 48  max: 48!
    [  328.190678] 48 locks held by qemu-kvm/11664:
    [  328.194957]  #0: ffff800086de5ba0 (&kvm->lock){+.+.}-{3:3}, at: kvm_ioctl_create_device+0x174/0x5b0
    [  328.204048]  #1: ffff0800e78800b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
    [  328.212521]  #2: ffff07ffeee51e98 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
    [  328.220991]  #3: ffff0800dc7d80b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
    [  328.229463]  #4: ffff07ffe0c980b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
    [  328.237934]  #5: ffff0800a3883c78 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
    [  328.246405]  #6: ffff07fffbe480b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0

    Luckily, in all instances that require locking all vCPUs, the
    'kvm->lock' is taken a priori, and that fact makes it possible to use
    the little known feature of lockdep, called a 'nest_lock', to avoid this
    warning and subsequent lockdep self-disablement.

    The action of 'nested lock' being provided to lockdep's lock_acquire(),
    causes the lockdep to detect that the top of the held lock stack contains
    a lock of the same class and then increment its reference counter instead
    of pushing a new held_lock item onto that stack.

    See __lock_acquire for more information.

    Signed-off-by: Maxim Levitsky <[email protected]>
    Acked-by: Peter Zijlstra (Intel) <[email protected]>
    Message-ID: <[email protected]>
    Signed-off-by: Paolo Bonzini <[email protected]>

Signed-off-by: Maxim Levitsky <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jul 8, 2025
JIRA: https://issues.redhat.com/browse/RHEL-74410

commit b586c5d
Author: Maxim Levitsky <[email protected]>
Date:   Mon May 12 14:04:06 2025 -0400

    KVM: arm64: use kvm_trylock_all_vcpus when locking all vCPUs

    Use kvm_trylock_all_vcpus instead of a custom implementation when locking
    all vCPUs of a VM, to avoid triggering a lockdep warning, in the case in
    which the VM is configured to have more than MAX_LOCK_DEPTH vCPUs.

    This fixes the following false lockdep warning:

    [  328.171264] BUG: MAX_LOCK_DEPTH too low!
    [  328.175227] turning off the locking correctness validator.
    [  328.180726] Please attach the output of /proc/lock_stat to the bug report
    [  328.187531] depth: 48  max: 48!
    [  328.190678] 48 locks held by qemu-kvm/11664:
    [  328.194957]  #0: ffff800086de5ba0 (&kvm->lock){+.+.}-{3:3}, at: kvm_ioctl_create_device+0x174/0x5b0
    [  328.204048]  #1: ffff0800e78800b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
    [  328.212521]  #2: ffff07ffeee51e98 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
    [  328.220991]  #3: ffff0800dc7d80b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
    [  328.229463]  #4: ffff07ffe0c980b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
    [  328.237934]  #5: ffff0800a3883c78 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
    [  328.246405]  #6: ffff07fffbe480b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0

    Suggested-by: Paolo Bonzini <[email protected]>
    Signed-off-by: Maxim Levitsky <[email protected]>
    Acked-by: Marc Zyngier <[email protected]>
    Acked-by: Peter Zijlstra (Intel) <[email protected]>
    Message-ID: <[email protected]>
    Signed-off-by: Paolo Bonzini <[email protected]>

Signed-off-by: Maxim Levitsky <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jul 9, 2025
JIRA: https://issues.redhat.com/browse/RHEL-78678
CVE: CVE-2025-21693

commit c11bcbc
Author: Yosry Ahmed <[email protected]>
Date:   Wed Feb 26 18:56:25 2025 +0000

    mm: zswap: fix crypto_free_acomp() deadlock in zswap_cpu_comp_dead()

    Currently, zswap_cpu_comp_dead() calls crypto_free_acomp() while holding
    the per-CPU acomp_ctx mutex.  crypto_free_acomp() then holds scomp_lock
    (through crypto_exit_scomp_ops_async()).

    On the other hand, crypto_alloc_acomp_node() holds the scomp_lock (through
    crypto_scomp_init_tfm()), and then allocates memory.  If the allocation
    results in reclaim, we may attempt to hold the per-CPU acomp_ctx mutex.

    The above dependencies can cause an ABBA deadlock.  For example in the
    following scenario:

    (1) Task A running on CPU #1:
        crypto_alloc_acomp_node()
          Holds scomp_lock
          Enters reclaim
          Reads per_cpu_ptr(pool->acomp_ctx, 1)

    (2) Task A is descheduled

    (3) CPU #1 goes offline
        zswap_cpu_comp_dead(CPU #1)
          Holds per_cpu_ptr(pool->acomp_ctx, 1))
          Calls crypto_free_acomp()
          Waits for scomp_lock

    (4) Task A running on CPU #2:
          Waits for per_cpu_ptr(pool->acomp_ctx, 1) // Read on CPU #1
          DEADLOCK

    Since there is no requirement to call crypto_free_acomp() with the per-CPU
    acomp_ctx mutex held in zswap_cpu_comp_dead(), move it after the mutex is
    unlocked.  Also move the acomp_request_free() and kfree() calls for
    consistency and to avoid any potential sublte locking dependencies in the
    future.

    With this, only setting acomp_ctx fields to NULL occurs with the mutex
    held.  This is similar to how zswap_cpu_comp_prepare() only initializes
    acomp_ctx fields with the mutex held, after performing all allocations
    before holding the mutex.

    Opportunistically, move the NULL check on acomp_ctx so that it takes place
    before the mutex dereference.

    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 12dcb0e ("mm: zswap: properly synchronize freeing resources during CPU hotunplug")
    Signed-off-by: Herbert Xu <[email protected]>
    Co-developed-by: Herbert Xu <[email protected]>
    Signed-off-by: Yosry Ahmed <[email protected]>
    Reported-by: [email protected]
    Closes: https://lore.kernel.org/all/[email protected]/
    Acked-by: Herbert Xu <[email protected]>
    Reviewed-by: Chengming Zhou <[email protected]>
    Reviewed-by: Nhat Pham <[email protected]>
    Tested-by: Nhat Pham <[email protected]>
    Cc: David S. Miller <[email protected]>
    Cc: Eric Biggers <[email protected]>
    Cc: Johannes Weiner <[email protected]>
    Cc: Chris Murphy <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>

Signed-off-by: Rafael Aquini <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jul 9, 2025
JIRA: https://issues.redhat.com/browse/RHEL-85486

commit 58f038e
Author: Hou Tao <[email protected]>
Date:   Fri Jan 17 18:18:15 2025 +0800

    bpf: Cancel the running bpf_timer through kworker for PREEMPT_RT

    During the update procedure, when overwrite element in a pre-allocated
    htab, the freeing of old_element is protected by the bucket lock. The
    reason why the bucket lock is necessary is that the old_element has
    already been stashed in htab->extra_elems after alloc_htab_elem()
    returns. If freeing the old_element after the bucket lock is unlocked,
    the stashed element may be reused by concurrent update procedure and the
    freeing of old_element will run concurrently with the reuse of the
    old_element. However, the invocation of check_and_free_fields() may
    acquire a spin-lock which violates the lockdep rule because its caller
    has already held a raw-spin-lock (bucket lock). The following warning
    will be reported when such race happens:

      BUG: scheduling while atomic: test_progs/676/0x00000003
      3 locks held by test_progs/676:
      #0: ffffffff864b0240 (rcu_read_lock_trace){....}-{0:0}, at: bpf_prog_test_run_syscall+0x2c0/0x830
      #1: ffff88810e961188 (&htab->lockdep_key){....}-{2:2}, at: htab_map_update_elem+0x306/0x1500
      #2: ffff8881f4eac1b8 (&base->softirq_expiry_lock){....}-{2:2}, at: hrtimer_cancel_wait_running+0xe9/0x1b0
      Modules linked in: bpf_testmod(O)
      Preemption disabled at:
      [<ffffffff817837a3>] htab_map_update_elem+0x293/0x1500
      CPU: 0 UID: 0 PID: 676 Comm: test_progs Tainted: G ... 6.12.0+ #11
      Tainted: [W]=WARN, [O]=OOT_MODULE
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)...
      Call Trace:
      <TASK>
      dump_stack_lvl+0x57/0x70
      dump_stack+0x10/0x20
      __schedule_bug+0x120/0x170
      __schedule+0x300c/0x4800
      schedule_rtlock+0x37/0x60
      rtlock_slowlock_locked+0x6d9/0x54c0
      rt_spin_lock+0x168/0x230
      hrtimer_cancel_wait_running+0xe9/0x1b0
      hrtimer_cancel+0x24/0x30
      bpf_timer_delete_work+0x1d/0x40
      bpf_timer_cancel_and_free+0x5e/0x80
      bpf_obj_free_fields+0x262/0x4a0
      check_and_free_fields+0x1d0/0x280
      htab_map_update_elem+0x7fc/0x1500
      bpf_prog_9f90bc20768e0cb9_overwrite_cb+0x3f/0x43
      bpf_prog_ea601c4649694dbd_overwrite_timer+0x5d/0x7e
      bpf_prog_test_run_syscall+0x322/0x830
      __sys_bpf+0x135d/0x3ca0
      __x64_sys_bpf+0x75/0xb0
      x64_sys_call+0x1b5/0xa10
      do_syscall_64+0x3b/0xc0
      entry_SYSCALL_64_after_hwframe+0x4b/0x53
      ...
      </TASK>

    It seems feasible to break the reuse and refill of per-cpu extra_elems
    into two independent parts: reuse the per-cpu extra_elems with bucket
    lock being held and refill the old_element as per-cpu extra_elems after
    the bucket lock is unlocked. However, it will make the concurrent
    overwrite procedures on the same CPU return unexpected -E2BIG error when
    the map is full.

    Therefore, the patch fixes the lock problem by breaking the cancelling
    of bpf_timer into two steps for PREEMPT_RT:
    1) use hrtimer_try_to_cancel() and check its return value
    2) if the timer is running, use hrtimer_cancel() through a kworker to
       cancel it again
    Considering that the current implementation of hrtimer_cancel() will try
    to acquire a being held softirq_expiry_lock when the current timer is
    running, these steps above are reasonable. However, it also has
    downside. When the timer is running, the cancelling of the timer is
    delayed when releasing the last map uref. The delay is also fixable
    (e.g., break the cancelling of bpf timer into two parts: one part in
    locked scope, another one in unlocked scope), it can be revised later if
    necessary.

    It is a bit hard to decide the right fix tag. One reason is that the
    problem depends on PREEMPT_RT which is enabled in v6.12. Considering the
    softirq_expiry_lock lock exists since v5.4 and bpf_timer is introduced
    in v5.15, the bpf_timer commit is used in the fixes tag and an extra
    depends-on tag is added to state the dependency on PREEMPT_RT.

    Fixes: b00628b ("bpf: Introduce bpf timers.")
    Depends-on: v6.12+ with PREEMPT_RT enabled
    Reported-by: Sebastian Andrzej Siewior <[email protected]>
    Closes: https://lore.kernel.org/bpf/[email protected]
    Signed-off-by: Hou Tao <[email protected]>
    Reviewed-by: Toke Høiland-Jørgensen <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>

Signed-off-by: Jerome Marchand <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jul 11, 2025
…-flight

Reject migration of SEV{-ES} state if either the source or destination VM
is actively creating a vCPU, i.e. if kvm_vm_ioctl_create_vcpu() is in the
section between incrementing created_vcpus and online_vcpus.  The bulk of
vCPU creation runs _outside_ of kvm->lock to allow creating multiple vCPUs
in parallel, and so sev_info.es_active can get toggled from false=>true in
the destination VM after (or during) svm_vcpu_create(), resulting in an
SEV{-ES} VM effectively having a non-SEV{-ES} vCPU.

The issue manifests most visibly as a crash when trying to free a vCPU's
NULL VMSA page in an SEV-ES VM, but any number of things can go wrong.

  BUG: unable to handle page fault for address: ffffebde00000000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: Oops: 0000 [#1] SMP KASAN NOPTI
  CPU: 227 UID: 0 PID: 64063 Comm: syz.5.60023 Tainted: G     U     O        6.15.0-smp-DEV #2 NONE
  Tainted: [U]=USER, [O]=OOT_MODULE
  Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 12.52.0-0 10/28/2024
  RIP: 0010:constant_test_bit arch/x86/include/asm/bitops.h:206 [inline]
  RIP: 0010:arch_test_bit arch/x86/include/asm/bitops.h:238 [inline]
  RIP: 0010:_test_bit include/asm-generic/bitops/instrumented-non-atomic.h:142 [inline]
  RIP: 0010:PageHead include/linux/page-flags.h:866 [inline]
  RIP: 0010:___free_pages+0x3e/0x120 mm/page_alloc.c:5067
  Code: <49> f7 06 40 00 00 00 75 05 45 31 ff eb 0c 66 90 4c 89 f0 4c 39 f0
  RSP: 0018:ffff8984551978d0 EFLAGS: 00010246
  RAX: 0000777f80000001 RBX: 0000000000000000 RCX: ffffffff918aeb98
  RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffebde00000000
  RBP: 0000000000000000 R08: ffffebde00000007 R09: 1ffffd7bc0000000
  R10: dffffc0000000000 R11: fffff97bc0000001 R12: dffffc0000000000
  R13: ffff8983e19751a8 R14: ffffebde00000000 R15: 1ffffd7bc0000000
  FS:  0000000000000000(0000) GS:ffff89ee661d3000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffffebde00000000 CR3: 000000793ceaa000 CR4: 0000000000350ef0
  DR0: 0000000000000000 DR1: 0000000000000b5f DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Call Trace:
   <TASK>
   sev_free_vcpu+0x413/0x630 arch/x86/kvm/svm/sev.c:3169
   svm_vcpu_free+0x13a/0x2a0 arch/x86/kvm/svm/svm.c:1515
   kvm_arch_vcpu_destroy+0x6a/0x1d0 arch/x86/kvm/x86.c:12396
   kvm_vcpu_destroy virt/kvm/kvm_main.c:470 [inline]
   kvm_destroy_vcpus+0xd1/0x300 virt/kvm/kvm_main.c:490
   kvm_arch_destroy_vm+0x636/0x820 arch/x86/kvm/x86.c:12895
   kvm_put_kvm+0xb8e/0xfb0 virt/kvm/kvm_main.c:1310
   kvm_vm_release+0x48/0x60 virt/kvm/kvm_main.c:1369
   __fput+0x3e4/0x9e0 fs/file_table.c:465
   task_work_run+0x1a9/0x220 kernel/task_work.c:227
   exit_task_work include/linux/task_work.h:40 [inline]
   do_exit+0x7f0/0x25b0 kernel/exit.c:953
   do_group_exit+0x203/0x2d0 kernel/exit.c:1102
   get_signal+0x1357/0x1480 kernel/signal.c:3034
   arch_do_signal_or_restart+0x40/0x690 arch/x86/kernel/signal.c:337
   exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
   exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
   __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
   syscall_exit_to_user_mode+0x67/0xb0 kernel/entry/common.c:218
   do_syscall_64+0x7c/0x150 arch/x86/entry/syscall_64.c:100
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
  RIP: 0033:0x7f87a898e969
   </TASK>
  Modules linked in: gq(O)
  gsmi: Log Shutdown Reason 0x03
  CR2: ffffebde00000000
  ---[ end trace 0000000000000000 ]---

Deliberately don't check for a NULL VMSA when freeing the vCPU, as crashing
the host is likely desirable due to the VMSA being consumed by hardware.
E.g. if KVM manages to allow VMRUN on the vCPU, hardware may read/write a
bogus VMSA page.  Accessing PFN 0 is "fine"-ish now that it's sequestered
away thanks to L1TF, but panicking in this scenario is preferable to
potentially running with corrupted state.

Reported-by: Alexander Potapenko <[email protected]>
Tested-by: Alexander Potapenko <[email protected]>
Fixes: 0b020f5 ("KVM: SEV: Add support for SEV-ES intra host migration")
Fixes: b566393 ("KVM: SEV: Add support for SEV intra host migration")
Cc: [email protected]
Cc: James Houghton <[email protected]>
Cc: Peter Gonda <[email protected]>
Reviewed-by: Liam Merwick <[email protected]>
Tested-by: Liam Merwick <[email protected]>
Reviewed-by: James Houghton <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jul 11, 2025
[ Upstream commit b2beb5b ]

With VIRTCHNL2_CAP_MACFILTER enabled, the following warning is generated
on module load:

[  324.701677] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:578
[  324.701684] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1582, name: NetworkManager
[  324.701689] preempt_count: 201, expected: 0
[  324.701693] RCU nest depth: 0, expected: 0
[  324.701697] 2 locks held by NetworkManager/1582:
[  324.701702]  #0: ffffffff9f7be770 (rtnl_mutex){....}-{3:3}, at: rtnl_newlink+0x791/0x21e0
[  324.701730]  #1: ff1100216c380368 (_xmit_ETHER){....}-{2:2}, at: __dev_open+0x3f0/0x870
[  324.701749] Preemption disabled at:
[  324.701752] [<ffffffff9cd23b9d>] __dev_open+0x3dd/0x870
[  324.701765] CPU: 30 UID: 0 PID: 1582 Comm: NetworkManager Not tainted 6.15.0-rc5+ #2 PREEMPT(voluntary)
[  324.701771] Hardware name: Intel Corporation M50FCP2SBSTD/M50FCP2SBSTD, BIOS SE5C741.86B.01.01.0001.2211140926 11/14/2022
[  324.701774] Call Trace:
[  324.701777]  <TASK>
[  324.701779]  dump_stack_lvl+0x5d/0x80
[  324.701788]  ? __dev_open+0x3dd/0x870
[  324.701793]  __might_resched.cold+0x1ef/0x23d
<..>
[  324.701818]  __mutex_lock+0x113/0x1b80
<..>
[  324.701917]  idpf_ctlq_clean_sq+0xad/0x4b0 [idpf]
[  324.701935]  ? kasan_save_track+0x14/0x30
[  324.701941]  idpf_mb_clean+0x143/0x380 [idpf]
<..>
[  324.701991]  idpf_send_mb_msg+0x111/0x720 [idpf]
[  324.702009]  idpf_vc_xn_exec+0x4cc/0x990 [idpf]
[  324.702021]  ? rcu_is_watching+0x12/0xc0
[  324.702035]  idpf_add_del_mac_filters+0x3ed/0xb50 [idpf]
<..>
[  324.702122]  __hw_addr_sync_dev+0x1cf/0x300
[  324.702126]  ? find_held_lock+0x32/0x90
[  324.702134]  idpf_set_rx_mode+0x317/0x390 [idpf]
[  324.702152]  __dev_open+0x3f8/0x870
[  324.702159]  ? __pfx___dev_open+0x10/0x10
[  324.702174]  __dev_change_flags+0x443/0x650
<..>
[  324.702208]  netif_change_flags+0x80/0x160
[  324.702218]  do_setlink.isra.0+0x16a0/0x3960
<..>
[  324.702349]  rtnl_newlink+0x12fd/0x21e0

The sequence is as follows:
	rtnl_newlink()->
	__dev_change_flags()->
	__dev_open()->
	dev_set_rx_mode() - >  # disables BH and grabs "dev->addr_list_lock"
	idpf_set_rx_mode() ->  # proceed only if VIRTCHNL2_CAP_MACFILTER is ON
	__dev_uc_sync() ->
	idpf_add_mac_filter ->
	idpf_add_del_mac_filters ->
	idpf_send_mb_msg() ->
	idpf_mb_clean() ->
	idpf_ctlq_clean_sq()   # mutex_lock(cq_lock)

Fix by converting cq_lock to a spinlock. All operations under the new
lock are safe except freeing the DMA memory, which may use vunmap(). Fix
by requesting a contiguous physical memory for the DMA mapping.

Fixes: a251eee ("idpf: add SRIOV support and other ndo_ops")
Reviewed-by: Aleksandr Loktionov <[email protected]>
Signed-off-by: Ahmed Zaki <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Samuel Salin <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jul 11, 2025
[ Upstream commit 2ed25aa ]

The issue arises when kzalloc() is invoked while holding umem_mutex or
any other lock acquired under umem_mutex. This is problematic because
kzalloc() can trigger fs_reclaim_aqcuire(), which may, in turn, invoke
mmu_notifier_invalidate_range_start(). This function can lead to
mlx5_ib_invalidate_range(), which attempts to acquire umem_mutex again,
resulting in a deadlock.

The problematic flow:
             CPU0                      |              CPU1
---------------------------------------|------------------------------------------------
mlx5_ib_dereg_mr()                     |
 → revoke_mr()                         |
   → mutex_lock(&umem_odp->umem_mutex) |
                                       | mlx5_mkey_cache_init()
                                       |  → mutex_lock(&dev->cache.rb_lock)
                                       |  → mlx5r_cache_create_ent_locked()
                                       |    → kzalloc(GFP_KERNEL)
                                       |      → fs_reclaim()
                                       |        → mmu_notifier_invalidate_range_start()
                                       |          → mlx5_ib_invalidate_range()
                                       |            → mutex_lock(&umem_odp->umem_mutex)
   → cache_ent_find_and_store()        |
     → mutex_lock(&dev->cache.rb_lock) |

Additionally, when kzalloc() is called from within
cache_ent_find_and_store(), we encounter the same deadlock due to
re-acquisition of umem_mutex.

Solve by releasing umem_mutex in dereg_mr() after umr_revoke_mr()
and before acquiring rb_lock. This ensures that we don't hold
umem_mutex while performing memory allocations that could trigger
the reclaim path.

This change prevents the deadlock by ensuring proper lock ordering and
avoiding holding locks during memory allocation operations that could
trigger the reclaim path.

The following lockdep warning demonstrates the deadlock:

 python3/20557 is trying to acquire lock:
 ffff888387542128 (&umem_odp->umem_mutex){+.+.}-{4:4}, at:
 mlx5_ib_invalidate_range+0x5b/0x550 [mlx5_ib]

 but task is already holding lock:
 ffffffff82f6b840 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}, at:
 unmap_vmas+0x7b/0x1a0

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #3 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
       fs_reclaim_acquire+0x60/0xd0
       mem_cgroup_css_alloc+0x6f/0x9b0
       cgroup_init_subsys+0xa4/0x240
       cgroup_init+0x1c8/0x510
       start_kernel+0x747/0x760
       x86_64_start_reservations+0x25/0x30
       x86_64_start_kernel+0x73/0x80
       common_startup_64+0x129/0x138

 -> #2 (fs_reclaim){+.+.}-{0:0}:
       fs_reclaim_acquire+0x91/0xd0
       __kmalloc_cache_noprof+0x4d/0x4c0
       mlx5r_cache_create_ent_locked+0x75/0x620 [mlx5_ib]
       mlx5_mkey_cache_init+0x186/0x360 [mlx5_ib]
       mlx5_ib_stage_post_ib_reg_umr_init+0x3c/0x60 [mlx5_ib]
       __mlx5_ib_add+0x4b/0x190 [mlx5_ib]
       mlx5r_probe+0xd9/0x320 [mlx5_ib]
       auxiliary_bus_probe+0x42/0x70
       really_probe+0xdb/0x360
       __driver_probe_device+0x8f/0x130
       driver_probe_device+0x1f/0xb0
       __driver_attach+0xd4/0x1f0
       bus_for_each_dev+0x79/0xd0
       bus_add_driver+0xf0/0x200
       driver_register+0x6e/0xc0
       __auxiliary_driver_register+0x6a/0xc0
       do_one_initcall+0x5e/0x390
       do_init_module+0x88/0x240
       init_module_from_file+0x85/0xc0
       idempotent_init_module+0x104/0x300
       __x64_sys_finit_module+0x68/0xc0
       do_syscall_64+0x6d/0x140
       entry_SYSCALL_64_after_hwframe+0x4b/0x53

 -> #1 (&dev->cache.rb_lock){+.+.}-{4:4}:
       __mutex_lock+0x98/0xf10
       __mlx5_ib_dereg_mr+0x6f2/0x890 [mlx5_ib]
       mlx5_ib_dereg_mr+0x21/0x110 [mlx5_ib]
       ib_dereg_mr_user+0x85/0x1f0 [ib_core]
       uverbs_free_mr+0x19/0x30 [ib_uverbs]
       destroy_hw_idr_uobject+0x21/0x80 [ib_uverbs]
       uverbs_destroy_uobject+0x60/0x3d0 [ib_uverbs]
       uobj_destroy+0x57/0xa0 [ib_uverbs]
       ib_uverbs_cmd_verbs+0x4d5/0x1210 [ib_uverbs]
       ib_uverbs_ioctl+0x129/0x230 [ib_uverbs]
       __x64_sys_ioctl+0x596/0xaa0
       do_syscall_64+0x6d/0x140
       entry_SYSCALL_64_after_hwframe+0x4b/0x53

 -> #0 (&umem_odp->umem_mutex){+.+.}-{4:4}:
       __lock_acquire+0x1826/0x2f00
       lock_acquire+0xd3/0x2e0
       __mutex_lock+0x98/0xf10
       mlx5_ib_invalidate_range+0x5b/0x550 [mlx5_ib]
       __mmu_notifier_invalidate_range_start+0x18e/0x1f0
       unmap_vmas+0x182/0x1a0
       exit_mmap+0xf3/0x4a0
       mmput+0x3a/0x100
       do_exit+0x2b9/0xa90
       do_group_exit+0x32/0xa0
       get_signal+0xc32/0xcb0
       arch_do_signal_or_restart+0x29/0x1d0
       syscall_exit_to_user_mode+0x105/0x1d0
       do_syscall_64+0x79/0x140
       entry_SYSCALL_64_after_hwframe+0x4b/0x53

 Chain exists of:
 &dev->cache.rb_lock --> mmu_notifier_invalidate_range_start -->
 &umem_odp->umem_mutex

 Possible unsafe locking scenario:

       CPU0                        CPU1
       ----                        ----
   lock(&umem_odp->umem_mutex);
                                lock(mmu_notifier_invalidate_range_start);
                                lock(&umem_odp->umem_mutex);
   lock(&dev->cache.rb_lock);

 *** DEADLOCK ***

Fixes: abb604a ("RDMA/mlx5: Fix a race for an ODP MR which leads to CQE with error")
Signed-off-by: Or Har-Toov <[email protected]>
Reviewed-by: Michael Guralnik <[email protected]>
Link: https://patch.msgid.link/3c8f225a8a9fade647d19b014df1172544643e4a.1750061612.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
jainanmol84 added a commit that referenced this pull request Jul 11, 2025
jira VULN-71367
cve CVE-2022-50066
commit-author Chia-Lin Kao (AceLan) <[email protected]>
commit 2ba5e47

upstream-diff There was a merge conflict due to the CentOS 7 tree being old but it was minor.
The final update statement of the for loop exceeds the array range, the dereference of self->aq_vec[i] is not checked and then leads to the index out of range error.
Also fixed this kind of coding style in other for loop.

[   97.937604] UBSAN: array-index-out-of-bounds in drivers/net/ethernet/aquantia/atlantic/aq_nic.c:1404:48
[   97.937607] index 8 is out of range for type 'aq_vec_s *[8]'
[   97.937608] CPU: 38 PID: 3767 Comm: kworker/u256:18 Not tainted 5.19.0+ #2
[   97.937610] Hardware name: Dell Inc. Precision 7865 Tower/, BIOS 1.0.0 06/12/2022
[   97.937611] Workqueue: events_unbound async_run_entry_fn
[   97.937616] Call Trace:
[   97.937617]  <TASK>
[   97.937619]  dump_stack_lvl+0x49/0x63
[   97.937624]  dump_stack+0x10/0x16
[   97.937626]  ubsan_epilogue+0x9/0x3f
[   97.937627]  __ubsan_handle_out_of_bounds.cold+0x44/0x49
[   97.937629]  ? __scm_send+0x348/0x440
[   97.937632]  ? aq_vec_stop+0x72/0x80 [atlantic]
[   97.937639]  aq_nic_stop+0x1b6/0x1c0 [atlantic]
[   97.937644]  aq_suspend_common+0x88/0x90 [atlantic]
[   97.937648]  aq_pm_suspend_poweroff+0xe/0x20 [atlantic]
[   97.937653]  pci_pm_suspend+0x7e/0x1a0
[   97.937655]  ? pci_pm_suspend_noirq+0x2b0/0x2b0
[   97.937657]  dpm_run_callback+0x54/0x190
[   97.937660]  __device_suspend+0x14c/0x4d0
[   97.937661]  async_suspend+0x23/0x70
[   97.937663]  async_run_entry_fn+0x33/0x120
[   97.937664]  process_one_work+0x21f/0x3f0
[   97.937666]  worker_thread+0x4a/0x3c0
[   97.937668]  ? process_one_work+0x3f0/0x3f0
[   97.937669]  kthread+0xf0/0x120
[   97.937671]  ? kthread_complete_and_exit+0x20/0x20
[   97.937672]  ret_from_fork+0x22/0x30
[   97.937676]  </TASK>

v2. fixed "warning: variable 'aq_vec' set but not used"

v3. simplified a for loop

Fixes: 97bde5c ("net: ethernet: aquantia: Support for NIC-specific code")
	Signed-off-by: Chia-Lin Kao (AceLan) <[email protected]>
	Acked-by: Sudarsana Reddy Kalluru <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
	Signed-off-by: Jakub Kicinski <[email protected]>
(cherry picked from commit 2ba5e47)
	Signed-off-by: Anmol Jain <[email protected]>
	Resolve merge conflicts
        Kept upstream logic and adjusted local error accordingly
github-actions bot pushed a commit that referenced this pull request Jul 15, 2025
JIRA: https://issues.redhat.com/browse/RHEL-80095

commit 3e64bb2
Author: Shradha Gupta <[email protected]>
Date:   Tue Mar 11 03:17:40 2025 -0700

    net: mana: cleanup mana struct after debugfs_remove()

    When on a MANA VM hibernation is triggered, as part of hibernate_snapshot(),
    mana_gd_suspend() and mana_gd_resume() are called. If during this
    mana_gd_resume(), a failure occurs with HWC creation, mana_port_debugfs
    pointer does not get reinitialized and ends up pointing to older,
    cleaned-up dentry.
    Further in the hibernation path, as part of power_down(), mana_gd_shutdown()
    is triggered. This call, unaware of the failures in resume, tries to cleanup
    the already cleaned up  mana_port_debugfs value and hits the following bug:

    [  191.359296] mana 7870:00:00.0: Shutdown was called
    [  191.359918] BUG: kernel NULL pointer dereference, address: 0000000000000098
    [  191.360584] #PF: supervisor write access in kernel mode
    [  191.361125] #PF: error_code(0x0002) - not-present page
    [  191.361727] PGD 1080ea067 P4D 0
    [  191.362172] Oops: Oops: 0002 [#1] SMP NOPTI
    [  191.362606] CPU: 11 UID: 0 PID: 1674 Comm: bash Not tainted 6.14.0-rc5+ #2
    [  191.363292] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/21/2024
    [  191.364124] RIP: 0010:down_write+0x19/0x50
    [  191.364537] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb e8 de cd ff ff 31 c0 ba 01 00 00 00 <f0> 48 0f b1 13 75 16 65 48 8b 05 88 24 4c 6a 48 89 43 08 48 8b 5d
    [  191.365867] RSP: 0000:ff45fbe0c1c037b8 EFLAGS: 00010246
    [  191.366350] RAX: 0000000000000000 RBX: 0000000000000098 RCX: ffffff8100000000
    [  191.366951] RDX: 0000000000000001 RSI: 0000000000000064 RDI: 0000000000000098
    [  191.367600] RBP: ff45fbe0c1c037c0 R08: 0000000000000000 R09: 0000000000000001
    [  191.368225] R10: ff45fbe0d2b01000 R11: 0000000000000008 R12: 0000000000000000
    [  191.368874] R13: 000000000000000b R14: ff43dc27509d67c0 R15: 0000000000000020
    [  191.369549] FS:  00007dbc5001e740(0000) GS:ff43dc663f380000(0000) knlGS:0000000000000000
    [  191.370213] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  191.370830] CR2: 0000000000000098 CR3: 0000000168e8e002 CR4: 0000000000b73ef0
    [  191.371557] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [  191.372192] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
    [  191.372906] Call Trace:
    [  191.373262]  <TASK>
    [  191.373621]  ? show_regs+0x64/0x70
    [  191.374040]  ? __die+0x24/0x70
    [  191.374468]  ? page_fault_oops+0x290/0x5b0
    [  191.374875]  ? do_user_addr_fault+0x448/0x800
    [  191.375357]  ? exc_page_fault+0x7a/0x160
    [  191.375971]  ? asm_exc_page_fault+0x27/0x30
    [  191.376416]  ? down_write+0x19/0x50
    [  191.376832]  ? down_write+0x12/0x50
    [  191.377232]  simple_recursive_removal+0x4a/0x2a0
    [  191.377679]  ? __pfx_remove_one+0x10/0x10
    [  191.378088]  debugfs_remove+0x44/0x70
    [  191.378530]  mana_detach+0x17c/0x4f0
    [  191.378950]  ? __flush_work+0x1e2/0x3b0
    [  191.379362]  ? __cond_resched+0x1a/0x50
    [  191.379787]  mana_remove+0xf2/0x1a0
    [  191.380193]  mana_gd_shutdown+0x3b/0x70
    [  191.380642]  pci_device_shutdown+0x3a/0x80
    [  191.381063]  device_shutdown+0x13e/0x230
    [  191.381480]  kernel_power_off+0x35/0x80
    [  191.381890]  hibernate+0x3c6/0x470
    [  191.382312]  state_store+0xcb/0xd0
    [  191.382734]  kobj_attr_store+0x12/0x30
    [  191.383211]  sysfs_kf_write+0x3e/0x50
    [  191.383640]  kernfs_fop_write_iter+0x140/0x1d0
    [  191.384106]  vfs_write+0x271/0x440
    [  191.384521]  ksys_write+0x72/0xf0
    [  191.384924]  __x64_sys_write+0x19/0x20
    [  191.385313]  x64_sys_call+0x2b0/0x20b0
    [  191.385736]  do_syscall_64+0x79/0x150
    [  191.386146]  ? __mod_memcg_lruvec_state+0xe7/0x240
    [  191.386676]  ? __lruvec_stat_mod_folio+0x79/0xb0
    [  191.387124]  ? __pfx_lru_add+0x10/0x10
    [  191.387515]  ? queued_spin_unlock+0x9/0x10
    [  191.387937]  ? do_anonymous_page+0x33c/0xa00
    [  191.388374]  ? __handle_mm_fault+0xcf3/0x1210
    [  191.388805]  ? __count_memcg_events+0xbe/0x180
    [  191.389235]  ? handle_mm_fault+0xae/0x300
    [  191.389588]  ? do_user_addr_fault+0x559/0x800
    [  191.390027]  ? irqentry_exit_to_user_mode+0x43/0x230
    [  191.390525]  ? irqentry_exit+0x1d/0x30
    [  191.390879]  ? exc_page_fault+0x86/0x160
    [  191.391235]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
    [  191.391745] RIP: 0033:0x7dbc4ff1c574
    [  191.392111] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d d5 ea 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
    [  191.393412] RSP: 002b:00007ffd95a23ab8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
    [  191.393990] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007dbc4ff1c574
    [  191.394594] RDX: 0000000000000005 RSI: 00005a6eeadb0ce0 RDI: 0000000000000001
    [  191.395215] RBP: 00007ffd95a23ae0 R08: 00007dbc50003b20 R09: 0000000000000000
    [  191.395805] R10: 0000000000000001 R11: 0000000000000202 R12: 0000000000000005
    [  191.396404] R13: 00005a6eeadb0ce0 R14: 00007dbc500045c0 R15: 00007dbc50001ee0
    [  191.396987]  </TASK>

    To fix this, we explicitly set such mana debugfs variables to NULL after
    debugfs_remove() is called.

    Fixes: 6607c17 ("net: mana: Enable debugfs files for MANA device")
    Cc: [email protected]
    Signed-off-by: Shradha Gupta <[email protected]>
    Reviewed-by: Haiyang Zhang <[email protected]>
    Reviewed-by: Michal Kubiak <[email protected]>
    Link: https://patch.msgid.link/1741688260-28922-1-git-send-email-shradhagupta@linux.microsoft.com
    Signed-off-by: Paolo Abeni <[email protected]>

Signed-off-by: Maxim Levitsky <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jul 15, 2025
If "try_verify_in_tasklet" is set for dm-verity, DM_BUFIO_CLIENT_NO_SLEEP
is enabled for dm-bufio. However, when bufio tries to evict buffers, there
is a chance to trigger scheduling in spin_lock_bh, the following warning
is hit:

BUG: sleeping function called from invalid context at drivers/md/dm-bufio.c:2745
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 123, name: kworker/2:2
preempt_count: 201, expected: 0
RCU nest depth: 0, expected: 0
4 locks held by kworker/2:2/123:
 #0: ffff88800a2d1548 ((wq_completion)dm_bufio_cache){....}-{0:0}, at: process_one_work+0xe46/0x1970
 #1: ffffc90000d97d20 ((work_completion)(&dm_bufio_replacement_work)){....}-{0:0}, at: process_one_work+0x763/0x1970
 #2: ffffffff8555b528 (dm_bufio_clients_lock){....}-{3:3}, at: do_global_cleanup+0x1ce/0x710
 #3: ffff88801d5820b8 (&c->spinlock){....}-{2:2}, at: do_global_cleanup+0x2a5/0x710
Preemption disabled at:
[<0000000000000000>] 0x0
CPU: 2 UID: 0 PID: 123 Comm: kworker/2:2 Not tainted 6.16.0-rc3-g90548c634bd0 #305 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Workqueue: dm_bufio_cache do_global_cleanup
Call Trace:
 <TASK>
 dump_stack_lvl+0x53/0x70
 __might_resched+0x360/0x4e0
 do_global_cleanup+0x2f5/0x710
 process_one_work+0x7db/0x1970
 worker_thread+0x518/0xea0
 kthread+0x359/0x690
 ret_from_fork+0xf3/0x1b0
 ret_from_fork_asm+0x1a/0x30
 </TASK>

That can be reproduced by:

  veritysetup format --data-block-size=4096 --hash-block-size=4096 /dev/vda /dev/vdb
  SIZE=$(blockdev --getsz /dev/vda)
  dmsetup create myverity -r --table "0 $SIZE verity 1 /dev/vda /dev/vdb 4096 4096 <data_blocks> 1 sha256 <root_hash> <salt> 1 try_verify_in_tasklet"
  mount /dev/dm-0 /mnt -o ro
  echo 102400 > /sys/module/dm_bufio/parameters/max_cache_size_bytes
  [read files in /mnt]

Cc: [email protected]	# v6.4+
Fixes: 450e8de ("dm bufio: improve concurrent IO performance")
Signed-off-by: Wang Shuai <[email protected]>
Signed-off-by: Sheng Yong <[email protected]>
Signed-off-by: Mikulas Patocka <[email protected]>
jainanmol84 added a commit that referenced this pull request Jul 15, 2025
jira VULN-71367
cve CVE-2022-50066
commit-author Chia-Lin Kao (AceLan) <[email protected]>
commit 2ba5e47
upstream-diff There was a merge conflict due to the CentOS 7 tree being old but it was minor.

The final update statement of the for loop exceeds the array range, the dereference of self->aq_vec[i] is not checked and then leads to the index out of range error.
Also fixed this kind of coding style in other for loop.

[   97.937604] UBSAN: array-index-out-of-bounds in drivers/net/ethernet/aquantia/atlantic/aq_nic.c:1404:48
[   97.937607] index 8 is out of range for type 'aq_vec_s *[8]'
[   97.937608] CPU: 38 PID: 3767 Comm: kworker/u256:18 Not tainted 5.19.0+ #2
[   97.937610] Hardware name: Dell Inc. Precision 7865 Tower/, BIOS 1.0.0 06/12/2022
[   97.937611] Workqueue: events_unbound async_run_entry_fn
[   97.937616] Call Trace:
[   97.937617]  <TASK>
[   97.937619]  dump_stack_lvl+0x49/0x63
[   97.937624]  dump_stack+0x10/0x16
[   97.937626]  ubsan_epilogue+0x9/0x3f
[   97.937627]  __ubsan_handle_out_of_bounds.cold+0x44/0x49
[   97.937629]  ? __scm_send+0x348/0x440
[   97.937632]  ? aq_vec_stop+0x72/0x80 [atlantic]
[   97.937639]  aq_nic_stop+0x1b6/0x1c0 [atlantic]
[   97.937644]  aq_suspend_common+0x88/0x90 [atlantic]
[   97.937648]  aq_pm_suspend_poweroff+0xe/0x20 [atlantic]
[   97.937653]  pci_pm_suspend+0x7e/0x1a0
[   97.937655]  ? pci_pm_suspend_noirq+0x2b0/0x2b0
[   97.937657]  dpm_run_callback+0x54/0x190
[   97.937660]  __device_suspend+0x14c/0x4d0
[   97.937661]  async_suspend+0x23/0x70
[   97.937663]  async_run_entry_fn+0x33/0x120
[   97.937664]  process_one_work+0x21f/0x3f0
[   97.937666]  worker_thread+0x4a/0x3c0
[   97.937668]  ? process_one_work+0x3f0/0x3f0
[   97.937669]  kthread+0xf0/0x120
[   97.937671]  ? kthread_complete_and_exit+0x20/0x20
[   97.937672]  ret_from_fork+0x22/0x30
[   97.937676]  </TASK>

v2. fixed "warning: variable 'aq_vec' set but not used"

v3. simplified a for loop

Fixes: 97bde5c ("net: ethernet: aquantia: Support for NIC-specific code")
	Signed-off-by: Chia-Lin Kao (AceLan) <[email protected]>
	Acked-by: Sudarsana Reddy Kalluru <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
	Signed-off-by: Jakub Kicinski <[email protected]>
(cherry picked from commit 2ba5e47)
	Signed-off-by: Anmol Jain <[email protected]>
	Resolve merge conflicts
        Kept upstream logic and adjusted local error accordingly

Update aq_nic.c

Update aq_nic.c
jainanmol84 added a commit that referenced this pull request Jul 15, 2025
jira VULN-71367
cve CVE-2022-50066
commit-author Chia-Lin Kao (AceLan) <[email protected]>
commit 2ba5e47
upstream-diff There was a merge conflict due to the CentOS 7 tree being old but it was minor.

The final update statement of the for loop exceeds the array range, the dereference of self->aq_vec[i] is not checked and then leads to the index out of range error.
Also fixed this kind of coding style in other for loop.

[   97.937604] UBSAN: array-index-out-of-bounds in drivers/net/ethernet/aquantia/atlantic/aq_nic.c:1404:48
[   97.937607] index 8 is out of range for type 'aq_vec_s *[8]'
[   97.937608] CPU: 38 PID: 3767 Comm: kworker/u256:18 Not tainted 5.19.0+ #2
[   97.937610] Hardware name: Dell Inc. Precision 7865 Tower/, BIOS 1.0.0 06/12/2022
[   97.937611] Workqueue: events_unbound async_run_entry_fn
[   97.937616] Call Trace:
[   97.937617]  <TASK>
[   97.937619]  dump_stack_lvl+0x49/0x63
[   97.937624]  dump_stack+0x10/0x16
[   97.937626]  ubsan_epilogue+0x9/0x3f
[   97.937627]  __ubsan_handle_out_of_bounds.cold+0x44/0x49
[   97.937629]  ? __scm_send+0x348/0x440
[   97.937632]  ? aq_vec_stop+0x72/0x80 [atlantic]
[   97.937639]  aq_nic_stop+0x1b6/0x1c0 [atlantic]
[   97.937644]  aq_suspend_common+0x88/0x90 [atlantic]
[   97.937648]  aq_pm_suspend_poweroff+0xe/0x20 [atlantic]
[   97.937653]  pci_pm_suspend+0x7e/0x1a0
[   97.937655]  ? pci_pm_suspend_noirq+0x2b0/0x2b0
[   97.937657]  dpm_run_callback+0x54/0x190
[   97.937660]  __device_suspend+0x14c/0x4d0
[   97.937661]  async_suspend+0x23/0x70
[   97.937663]  async_run_entry_fn+0x33/0x120
[   97.937664]  process_one_work+0x21f/0x3f0
[   97.937666]  worker_thread+0x4a/0x3c0
[   97.937668]  ? process_one_work+0x3f0/0x3f0
[   97.937669]  kthread+0xf0/0x120
[   97.937671]  ? kthread_complete_and_exit+0x20/0x20
[   97.937672]  ret_from_fork+0x22/0x30
[   97.937676]  </TASK>

v2. fixed "warning: variable 'aq_vec' set but not used"

v3. simplified a for loop

Fixes: 97bde5c ("net: ethernet: aquantia: Support for NIC-specific code")
	Signed-off-by: Chia-Lin Kao (AceLan) <[email protected]>
	Acked-by: Sudarsana Reddy Kalluru <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
	Signed-off-by: Jakub Kicinski <[email protected]>
(cherry picked from commit 2ba5e47)
	Signed-off-by: Anmol Jain <[email protected]>
	Resolve merge conflicts
        Kept upstream logic and adjusted local error accordingly

Update aq_nic.c

Update aq_nic.c
github-actions bot pushed a commit that referenced this pull request Jul 16, 2025
JIRA: https://issues.redhat.com/browse/RHEL-89169

commit a104042
Author: Remi Pommarel <[email protected]>
Date:   Mon Mar 24 17:28:20 2025 +0100

    wifi: mac80211: Update skb's control block key in ieee80211_tx_dequeue()
    
    The ieee80211 skb control block key (set when skb was queued) could have
    been removed before ieee80211_tx_dequeue() call. ieee80211_tx_dequeue()
    already called ieee80211_tx_h_select_key() to get the current key, but
    the latter do not update the key in skb control block in case it is
    NULL. Because some drivers actually use this key in their TX callbacks
    (e.g. ath1{1,2}k_mac_op_tx()) this could lead to the use after free
    below:
    
      BUG: KASAN: slab-use-after-free in ath11k_mac_op_tx+0x590/0x61c
      Read of size 4 at addr ffffff803083c248 by task kworker/u16:4/1440
    
      CPU: 3 UID: 0 PID: 1440 Comm: kworker/u16:4 Not tainted 6.13.0-ge128f627f404 #2
      Hardware name: HW (DT)
      Workqueue: bat_events batadv_send_outstanding_bcast_packet
      Call trace:
       show_stack+0x14/0x1c (C)
       dump_stack_lvl+0x58/0x74
       print_report+0x164/0x4c0
       kasan_report+0xac/0xe8
       __asan_report_load4_noabort+0x1c/0x24
       ath11k_mac_op_tx+0x590/0x61c
       ieee80211_handle_wake_tx_queue+0x12c/0x1c8
       ieee80211_queue_skb+0xdcc/0x1b4c
       ieee80211_tx+0x1ec/0x2bc
       ieee80211_xmit+0x224/0x324
       __ieee80211_subif_start_xmit+0x85c/0xcf8
       ieee80211_subif_start_xmit+0xc0/0xec4
       dev_hard_start_xmit+0xf4/0x28c
       __dev_queue_xmit+0x6ac/0x318c
       batadv_send_skb_packet+0x38c/0x4b0
       batadv_send_outstanding_bcast_packet+0x110/0x328
       process_one_work+0x578/0xc10
       worker_thread+0x4bc/0xc7c
       kthread+0x2f8/0x380
       ret_from_fork+0x10/0x20
    
      Allocated by task 1906:
       kasan_save_stack+0x28/0x4c
       kasan_save_track+0x1c/0x40
       kasan_save_alloc_info+0x3c/0x4c
       __kasan_kmalloc+0xac/0xb0
       __kmalloc_noprof+0x1b4/0x380
       ieee80211_key_alloc+0x3c/0xb64
       ieee80211_add_key+0x1b4/0x71c
       nl80211_new_key+0x2b4/0x5d8
       genl_family_rcv_msg_doit+0x198/0x240
      <...>
    
      Freed by task 1494:
       kasan_save_stack+0x28/0x4c
       kasan_save_track+0x1c/0x40
       kasan_save_free_info+0x48/0x94
       __kasan_slab_free+0x48/0x60
       kfree+0xc8/0x31c
       kfree_sensitive+0x70/0x80
       ieee80211_key_free_common+0x10c/0x174
       ieee80211_free_keys+0x188/0x46c
       ieee80211_stop_mesh+0x70/0x2cc
       ieee80211_leave_mesh+0x1c/0x60
       cfg80211_leave_mesh+0xe0/0x280
       cfg80211_leave+0x1e0/0x244
      <...>
    
    Reset SKB control block key before calling ieee80211_tx_h_select_key()
    to avoid that.
    
    Fixes: bb42f2d ("mac80211: Move reorder-sensitive TX handlers to after TXQ dequeue")
    Signed-off-by: Remi Pommarel <[email protected]>
    Link: https://patch.msgid.link/06aa507b853ca385ceded81c18b0a6dd0f081bc8.1742833382.git.repk@triplefau.lt
    Signed-off-by: Johannes Berg <[email protected]>

Signed-off-by: Jose Ignacio Tornos Martinez <[email protected]>
shreeya-patel98 added a commit that referenced this pull request Jul 16, 2025
jira VULN-40133
cve CVE-2024-50126
commit-author Dmitry Antipov <[email protected]>
commit b22db8b
upstream-diff The following commits cause merge conflicts since they are
    not present in the fips-9-compliant/5.14.0-284.30.1 :-
    a54fc09 ("net/sched: taprio: allow user input of per-tc max SDU")
    9dd6ad6 ("net/sched: refactor mqprio qopt reconstruction to a library function")

Fix possible use-after-free in 'taprio_dump()' by adding RCU
read-side critical section there. Never seen on x86 but
found on a KASAN-enabled arm64 system when investigating
https://syzkaller.appspot.com/bug?extid=b65e0af58423fc8a73aa:

[T15862] BUG: KASAN: slab-use-after-free in taprio_dump+0xa0c/0xbb0
[T15862] Read of size 4 at addr ffff0000d4bb88f8 by task repro/15862
[T15862]
[T15862] CPU: 0 UID: 0 PID: 15862 Comm: repro Not tainted 6.11.0-rc1-00293-gdefaf1a2113a-dirty #2
[T15862] Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-20240524-5.fc40 05/24/2024
[T15862] Call trace:
[T15862]  dump_backtrace+0x20c/0x220
[T15862]  show_stack+0x2c/0x40
[T15862]  dump_stack_lvl+0xf8/0x174
[T15862]  print_report+0x170/0x4d8
[T15862]  kasan_report+0xb8/0x1d4
[T15862]  __asan_report_load4_noabort+0x20/0x2c
[T15862]  taprio_dump+0xa0c/0xbb0
[T15862]  tc_fill_qdisc+0x540/0x1020
[T15862]  qdisc_notify.isra.0+0x330/0x3a0
[T15862]  tc_modify_qdisc+0x7b8/0x1838
[T15862]  rtnetlink_rcv_msg+0x3c8/0xc20
[T15862]  netlink_rcv_skb+0x1f8/0x3d4
[T15862]  rtnetlink_rcv+0x28/0x40
[T15862]  netlink_unicast+0x51c/0x790
[T15862]  netlink_sendmsg+0x79c/0xc20
[T15862]  __sock_sendmsg+0xe0/0x1a0
[T15862]  ____sys_sendmsg+0x6c0/0x840
[T15862]  ___sys_sendmsg+0x1ac/0x1f0
[T15862]  __sys_sendmsg+0x110/0x1d0
[T15862]  __arm64_sys_sendmsg+0x74/0xb0
[T15862]  invoke_syscall+0x88/0x2e0
[T15862]  el0_svc_common.constprop.0+0xe4/0x2a0
[T15862]  do_el0_svc+0x44/0x60
[T15862]  el0_svc+0x50/0x184
[T15862]  el0t_64_sync_handler+0x120/0x12c
[T15862]  el0t_64_sync+0x190/0x194
[T15862]
[T15862] Allocated by task 15857:
[T15862]  kasan_save_stack+0x3c/0x70
[T15862]  kasan_save_track+0x20/0x3c
[T15862]  kasan_save_alloc_info+0x40/0x60
[T15862]  __kasan_kmalloc+0xd4/0xe0
[T15862]  __kmalloc_cache_noprof+0x194/0x334
[T15862]  taprio_change+0x45c/0x2fe0
[T15862]  tc_modify_qdisc+0x6a8/0x1838
[T15862]  rtnetlink_rcv_msg+0x3c8/0xc20
[T15862]  netlink_rcv_skb+0x1f8/0x3d4
[T15862]  rtnetlink_rcv+0x28/0x40
[T15862]  netlink_unicast+0x51c/0x790
[T15862]  netlink_sendmsg+0x79c/0xc20
[T15862]  __sock_sendmsg+0xe0/0x1a0
[T15862]  ____sys_sendmsg+0x6c0/0x840
[T15862]  ___sys_sendmsg+0x1ac/0x1f0
[T15862]  __sys_sendmsg+0x110/0x1d0
[T15862]  __arm64_sys_sendmsg+0x74/0xb0
[T15862]  invoke_syscall+0x88/0x2e0
[T15862]  el0_svc_common.constprop.0+0xe4/0x2a0
[T15862]  do_el0_svc+0x44/0x60
[T15862]  el0_svc+0x50/0x184
[T15862]  el0t_64_sync_handler+0x120/0x12c
[T15862]  el0t_64_sync+0x190/0x194
[T15862]
[T15862] Freed by task 6192:
[T15862]  kasan_save_stack+0x3c/0x70
[T15862]  kasan_save_track+0x20/0x3c
[T15862]  kasan_save_free_info+0x4c/0x80
[T15862]  poison_slab_object+0x110/0x160
[T15862]  __kasan_slab_free+0x3c/0x74
[T15862]  kfree+0x134/0x3c0
[T15862]  taprio_free_sched_cb+0x18c/0x220
[T15862]  rcu_core+0x920/0x1b7c
[T15862]  rcu_core_si+0x10/0x1c
[T15862]  handle_softirqs+0x2e8/0xd64
[T15862]  __do_softirq+0x14/0x20

Fixes: 18cdd2f ("net/sched: taprio: taprio_dump and taprio_change are protected by rtnl_mutex")
	Acked-by: Vinicius Costa Gomes <[email protected]>
	Signed-off-by: Dmitry Antipov <[email protected]>
Link: https://patch.msgid.link/[email protected]
	Signed-off-by: Paolo Abeni <[email protected]>

(cherry picked from commit b22db8b)
	Signed-off-by: Shreeya Patel <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jul 17, 2025
JIRA: https://issues.redhat.com/browse/RHEL-80096

commit 3e64bb2
Author: Shradha Gupta <[email protected]>
Date:   Tue Mar 11 03:17:40 2025 -0700

    net: mana: cleanup mana struct after debugfs_remove()

    When on a MANA VM hibernation is triggered, as part of hibernate_snapshot(),
    mana_gd_suspend() and mana_gd_resume() are called. If during this
    mana_gd_resume(), a failure occurs with HWC creation, mana_port_debugfs
    pointer does not get reinitialized and ends up pointing to older,
    cleaned-up dentry.
    Further in the hibernation path, as part of power_down(), mana_gd_shutdown()
    is triggered. This call, unaware of the failures in resume, tries to cleanup
    the already cleaned up  mana_port_debugfs value and hits the following bug:

    [  191.359296] mana 7870:00:00.0: Shutdown was called
    [  191.359918] BUG: kernel NULL pointer dereference, address: 0000000000000098
    [  191.360584] #PF: supervisor write access in kernel mode
    [  191.361125] #PF: error_code(0x0002) - not-present page
    [  191.361727] PGD 1080ea067 P4D 0
    [  191.362172] Oops: Oops: 0002 [#1] SMP NOPTI
    [  191.362606] CPU: 11 UID: 0 PID: 1674 Comm: bash Not tainted 6.14.0-rc5+ #2
    [  191.363292] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/21/2024
    [  191.364124] RIP: 0010:down_write+0x19/0x50
    [  191.364537] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb e8 de cd ff ff 31 c0 ba 01 00 00 00 <f0> 48 0f b1 13 75 16 65 48 8b 05 88 24 4c 6a 48 89 43 08 48 8b 5d
    [  191.365867] RSP: 0000:ff45fbe0c1c037b8 EFLAGS: 00010246
    [  191.366350] RAX: 0000000000000000 RBX: 0000000000000098 RCX: ffffff8100000000
    [  191.366951] RDX: 0000000000000001 RSI: 0000000000000064 RDI: 0000000000000098
    [  191.367600] RBP: ff45fbe0c1c037c0 R08: 0000000000000000 R09: 0000000000000001
    [  191.368225] R10: ff45fbe0d2b01000 R11: 0000000000000008 R12: 0000000000000000
    [  191.368874] R13: 000000000000000b R14: ff43dc27509d67c0 R15: 0000000000000020
    [  191.369549] FS:  00007dbc5001e740(0000) GS:ff43dc663f380000(0000) knlGS:0000000000000000
    [  191.370213] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  191.370830] CR2: 0000000000000098 CR3: 0000000168e8e002 CR4: 0000000000b73ef0
    [  191.371557] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [  191.372192] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
    [  191.372906] Call Trace:
    [  191.373262]  <TASK>
    [  191.373621]  ? show_regs+0x64/0x70
    [  191.374040]  ? __die+0x24/0x70
    [  191.374468]  ? page_fault_oops+0x290/0x5b0
    [  191.374875]  ? do_user_addr_fault+0x448/0x800
    [  191.375357]  ? exc_page_fault+0x7a/0x160
    [  191.375971]  ? asm_exc_page_fault+0x27/0x30
    [  191.376416]  ? down_write+0x19/0x50
    [  191.376832]  ? down_write+0x12/0x50
    [  191.377232]  simple_recursive_removal+0x4a/0x2a0
    [  191.377679]  ? __pfx_remove_one+0x10/0x10
    [  191.378088]  debugfs_remove+0x44/0x70
    [  191.378530]  mana_detach+0x17c/0x4f0
    [  191.378950]  ? __flush_work+0x1e2/0x3b0
    [  191.379362]  ? __cond_resched+0x1a/0x50
    [  191.379787]  mana_remove+0xf2/0x1a0
    [  191.380193]  mana_gd_shutdown+0x3b/0x70
    [  191.380642]  pci_device_shutdown+0x3a/0x80
    [  191.381063]  device_shutdown+0x13e/0x230
    [  191.381480]  kernel_power_off+0x35/0x80
    [  191.381890]  hibernate+0x3c6/0x470
    [  191.382312]  state_store+0xcb/0xd0
    [  191.382734]  kobj_attr_store+0x12/0x30
    [  191.383211]  sysfs_kf_write+0x3e/0x50
    [  191.383640]  kernfs_fop_write_iter+0x140/0x1d0
    [  191.384106]  vfs_write+0x271/0x440
    [  191.384521]  ksys_write+0x72/0xf0
    [  191.384924]  __x64_sys_write+0x19/0x20
    [  191.385313]  x64_sys_call+0x2b0/0x20b0
    [  191.385736]  do_syscall_64+0x79/0x150
    [  191.386146]  ? __mod_memcg_lruvec_state+0xe7/0x240
    [  191.386676]  ? __lruvec_stat_mod_folio+0x79/0xb0
    [  191.387124]  ? __pfx_lru_add+0x10/0x10
    [  191.387515]  ? queued_spin_unlock+0x9/0x10
    [  191.387937]  ? do_anonymous_page+0x33c/0xa00
    [  191.388374]  ? __handle_mm_fault+0xcf3/0x1210
    [  191.388805]  ? __count_memcg_events+0xbe/0x180
    [  191.389235]  ? handle_mm_fault+0xae/0x300
    [  191.389588]  ? do_user_addr_fault+0x559/0x800
    [  191.390027]  ? irqentry_exit_to_user_mode+0x43/0x230
    [  191.390525]  ? irqentry_exit+0x1d/0x30
    [  191.390879]  ? exc_page_fault+0x86/0x160
    [  191.391235]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
    [  191.391745] RIP: 0033:0x7dbc4ff1c574
    [  191.392111] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d d5 ea 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
    [  191.393412] RSP: 002b:00007ffd95a23ab8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
    [  191.393990] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007dbc4ff1c574
    [  191.394594] RDX: 0000000000000005 RSI: 00005a6eeadb0ce0 RDI: 0000000000000001
    [  191.395215] RBP: 00007ffd95a23ae0 R08: 00007dbc50003b20 R09: 0000000000000000
    [  191.395805] R10: 0000000000000001 R11: 0000000000000202 R12: 0000000000000005
    [  191.396404] R13: 00005a6eeadb0ce0 R14: 00007dbc500045c0 R15: 00007dbc50001ee0
    [  191.396987]  </TASK>

    To fix this, we explicitly set such mana debugfs variables to NULL after
    debugfs_remove() is called.

    Fixes: 6607c17 ("net: mana: Enable debugfs files for MANA device")
    Cc: [email protected]
    Signed-off-by: Shradha Gupta <[email protected]>
    Reviewed-by: Haiyang Zhang <[email protected]>
    Reviewed-by: Michal Kubiak <[email protected]>
    Link: https://patch.msgid.link/1741688260-28922-1-git-send-email-shradhagupta@linux.microsoft.com
    Signed-off-by: Paolo Abeni <[email protected]>

Signed-off-by: Maxim Levitsky <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jul 17, 2025
JIRA: https://issues.redhat.com/browse/RHEL-89168

commit a104042
Author: Remi Pommarel <[email protected]>
Date:   Mon Mar 24 17:28:20 2025 +0100

    wifi: mac80211: Update skb's control block key in ieee80211_tx_dequeue()
    
    The ieee80211 skb control block key (set when skb was queued) could have
    been removed before ieee80211_tx_dequeue() call. ieee80211_tx_dequeue()
    already called ieee80211_tx_h_select_key() to get the current key, but
    the latter do not update the key in skb control block in case it is
    NULL. Because some drivers actually use this key in their TX callbacks
    (e.g. ath1{1,2}k_mac_op_tx()) this could lead to the use after free
    below:
    
      BUG: KASAN: slab-use-after-free in ath11k_mac_op_tx+0x590/0x61c
      Read of size 4 at addr ffffff803083c248 by task kworker/u16:4/1440
    
      CPU: 3 UID: 0 PID: 1440 Comm: kworker/u16:4 Not tainted 6.13.0-ge128f627f404 #2
      Hardware name: HW (DT)
      Workqueue: bat_events batadv_send_outstanding_bcast_packet
      Call trace:
       show_stack+0x14/0x1c (C)
       dump_stack_lvl+0x58/0x74
       print_report+0x164/0x4c0
       kasan_report+0xac/0xe8
       __asan_report_load4_noabort+0x1c/0x24
       ath11k_mac_op_tx+0x590/0x61c
       ieee80211_handle_wake_tx_queue+0x12c/0x1c8
       ieee80211_queue_skb+0xdcc/0x1b4c
       ieee80211_tx+0x1ec/0x2bc
       ieee80211_xmit+0x224/0x324
       __ieee80211_subif_start_xmit+0x85c/0xcf8
       ieee80211_subif_start_xmit+0xc0/0xec4
       dev_hard_start_xmit+0xf4/0x28c
       __dev_queue_xmit+0x6ac/0x318c
       batadv_send_skb_packet+0x38c/0x4b0
       batadv_send_outstanding_bcast_packet+0x110/0x328
       process_one_work+0x578/0xc10
       worker_thread+0x4bc/0xc7c
       kthread+0x2f8/0x380
       ret_from_fork+0x10/0x20
    
      Allocated by task 1906:
       kasan_save_stack+0x28/0x4c
       kasan_save_track+0x1c/0x40
       kasan_save_alloc_info+0x3c/0x4c
       __kasan_kmalloc+0xac/0xb0
       __kmalloc_noprof+0x1b4/0x380
       ieee80211_key_alloc+0x3c/0xb64
       ieee80211_add_key+0x1b4/0x71c
       nl80211_new_key+0x2b4/0x5d8
       genl_family_rcv_msg_doit+0x198/0x240
      <...>
    
      Freed by task 1494:
       kasan_save_stack+0x28/0x4c
       kasan_save_track+0x1c/0x40
       kasan_save_free_info+0x48/0x94
       __kasan_slab_free+0x48/0x60
       kfree+0xc8/0x31c
       kfree_sensitive+0x70/0x80
       ieee80211_key_free_common+0x10c/0x174
       ieee80211_free_keys+0x188/0x46c
       ieee80211_stop_mesh+0x70/0x2cc
       ieee80211_leave_mesh+0x1c/0x60
       cfg80211_leave_mesh+0xe0/0x280
       cfg80211_leave+0x1e0/0x244
      <...>
    
    Reset SKB control block key before calling ieee80211_tx_h_select_key()
    to avoid that.
    
    Fixes: bb42f2d ("mac80211: Move reorder-sensitive TX handlers to after TXQ dequeue")
    Signed-off-by: Remi Pommarel <[email protected]>
    Link: https://patch.msgid.link/06aa507b853ca385ceded81c18b0a6dd0f081bc8.1742833382.git.repk@triplefau.lt
    Signed-off-by: Johannes Berg <[email protected]>

Signed-off-by: Jose Ignacio Tornos Martinez <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jul 17, 2025
JIRA: https://issues.redhat.com/browse/RHEL-93612
CVE: CVE-2025-22026
Conflicts: several conflicts
	patching file fs/nfsd/nfsctl.c
	Hunk #1 FAILED at 2202.
	Due to missing 86ab08b "SUNRPC: replace program list with program
	array" -- a part of a large patch series
	Due to missing 73598a0 "nfsd: don't allocate the versions array."
	-- a part of the large patch series
	Hunk #2 FAILED at 2213.
	because there is no LOCALIO functionality but specifically
	fa49838 "nfsd: add LOCALIO support"
	checking file fs/nfsd/stats.h
	Hunk #1 FAILED at 10.
	Due to missing 7d12cce "fs: nfsd: use group allocation/free of
	per-cpu counters API"

commit 930b64c
Author: Jeff Layton <[email protected]>
Date: Thu, 6 Feb 2025 13:12:13 -0500

    Currently, nfsd_proc_stat_init() ignores the return value of
    svc_proc_register(). If the procfile creation fails, then the kernel
    will WARN when it tries to remove the entry later.

    Fix nfsd_proc_stat_init() to return the same type of pointer as
    svc_proc_register(), and fix up nfsd_net_init() to check that and fail
    the nfsd_net construction if it occurs.

    svc_proc_register() can fail if the dentry can't be allocated, or if an
    identical dentry already exists. The second case is pretty unlikely in
    the nfsd_net construction codepath, so if this happens, return -ENOMEM.

    Reported-by: [email protected]
    Closes: https://lore.kernel.org/linux-nfs/[email protected]/
    Cc: [email protected] # v6.9
    Signed-off-by: Jeff Layton <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>

Signed-off-by: Olga Kornievskaia <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jul 17, 2025
JIRA: https://issues.redhat.com/browse/RHEL-102648

When I run the NVME over TCP test in virtme-ng, I get the following
"suspicious RCU usage" warning in nvme_mpath_add_sysfs_link():

'''
[    5.024557][   T44] nvmet: Created nvm controller 1 for subsystem nqn.2025-06.org.nvmexpress.mptcp for NQN nqn.2014-08.org.nvmexpress:uuid:f7f6b5e0-ff97-4894-98ac-c85309e0bc77.
[    5.027401][  T183] nvme nvme0: creating 2 I/O queues.
[    5.029017][  T183] nvme nvme0: mapped 2/0/0 default/read/poll queues.
[    5.032587][  T183] nvme nvme0: new ctrl: NQN "nqn.2025-06.org.nvmexpress.mptcp", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:f7f6b5e0-ff97-4894-98ac-c85309e0bc77
[    5.042214][   T25]
[    5.042440][   T25] =============================
[    5.042579][   T25] WARNING: suspicious RCU usage
[    5.042705][   T25] 6.16.0-rc3+ #23 Not tainted
[    5.042812][   T25] -----------------------------
[    5.042934][   T25] drivers/nvme/host/multipath.c:1203 RCU-list traversed in non-reader section!!
[    5.043111][   T25]
[    5.043111][   T25] other info that might help us debug this:
[    5.043111][   T25]
[    5.043341][   T25]
[    5.043341][   T25] rcu_scheduler_active = 2, debug_locks = 1
[    5.043502][   T25] 3 locks held by kworker/u9:0/25:
[    5.043615][   T25]  #0: ffff888008730948 ((wq_completion)async){+.+.}-{0:0}, at: process_one_work+0x7ed/0x1350
[    5.043830][   T25]  #1: ffffc900001afd40 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0xcf3/0x1350
[    5.044084][   T25]  #2: ffff888013ee0020 (&head->srcu){.+.+}-{0:0}, at: nvme_mpath_add_sysfs_link.part.0+0xb4/0x3a0
[    5.044300][   T25]
[    5.044300][   T25] stack backtrace:
[    5.044439][   T25] CPU: 0 UID: 0 PID: 25 Comm: kworker/u9:0 Not tainted 6.16.0-rc3+ #23 PREEMPT(full)
[    5.044441][   T25] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    5.044442][   T25] Workqueue: async async_run_entry_fn
[    5.044445][   T25] Call Trace:
[    5.044446][   T25]  <TASK>
[    5.044449][   T25]  dump_stack_lvl+0x6f/0xb0
[    5.044453][   T25]  lockdep_rcu_suspicious.cold+0x4f/0xb1
[    5.044457][   T25]  nvme_mpath_add_sysfs_link.part.0+0x2fb/0x3a0
[    5.044459][   T25]  ? queue_work_on+0x90/0xf0
[    5.044461][   T25]  ? lockdep_hardirqs_on+0x78/0x110
[    5.044466][   T25]  nvme_mpath_set_live+0x1e9/0x4f0
[    5.044470][   T25]  nvme_mpath_add_disk+0x240/0x2f0
[    5.044472][   T25]  ? __pfx_nvme_mpath_add_disk+0x10/0x10
[    5.044475][   T25]  ? add_disk_fwnode+0x361/0x580
[    5.044480][   T25]  nvme_alloc_ns+0x81c/0x17c0
[    5.044483][   T25]  ? kasan_quarantine_put+0x104/0x240
[    5.044487][   T25]  ? __pfx_nvme_alloc_ns+0x10/0x10
[    5.044495][   T25]  ? __pfx_nvme_find_get_ns+0x10/0x10
[    5.044496][   T25]  ? rcu_read_lock_any_held+0x45/0xa0
[    5.044498][   T25]  ? validate_chain+0x232/0x4f0
[    5.044503][   T25]  nvme_scan_ns+0x4c8/0x810
[    5.044506][   T25]  ? __pfx_nvme_scan_ns+0x10/0x10
[    5.044508][   T25]  ? find_held_lock+0x2b/0x80
[    5.044512][   T25]  ? ktime_get+0x16d/0x220
[    5.044517][   T25]  ? kvm_clock_get_cycles+0x18/0x30
[    5.044520][   T25]  ? __pfx_nvme_scan_ns_async+0x10/0x10
[    5.044522][   T25]  async_run_entry_fn+0x97/0x560
[    5.044523][   T25]  ? rcu_is_watching+0x12/0xc0
[    5.044526][   T25]  process_one_work+0xd3c/0x1350
[    5.044532][   T25]  ? __pfx_process_one_work+0x10/0x10
[    5.044536][   T25]  ? assign_work+0x16c/0x240
[    5.044539][   T25]  worker_thread+0x4da/0xd50
[    5.044545][   T25]  ? __pfx_worker_thread+0x10/0x10
[    5.044546][   T25]  kthread+0x356/0x5c0
[    5.044548][   T25]  ? __pfx_kthread+0x10/0x10
[    5.044549][   T25]  ? ret_from_fork+0x1b/0x2e0
[    5.044552][   T25]  ? __lock_release.isra.0+0x5d/0x180
[    5.044553][   T25]  ? ret_from_fork+0x1b/0x2e0
[    5.044555][   T25]  ? rcu_is_watching+0x12/0xc0
[    5.044557][   T25]  ? __pfx_kthread+0x10/0x10
[    5.044559][   T25]  ret_from_fork+0x218/0x2e0
[    5.044561][   T25]  ? __pfx_kthread+0x10/0x10
[    5.044562][   T25]  ret_from_fork_asm+0x1a/0x30
[    5.044570][   T25]  </TASK>
'''

This patch uses sleepable RCU version of helper list_for_each_entry_srcu()
instead of list_for_each_entry_rcu() to fix it.

Fixes: 4dbd2b2 ("nvme-multipath: Add visibility for round-robin io-policy")
Signed-off-by: Geliang Tang <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Nilay Shroff <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
(cherry picked from commit d681107)
Signed-off-by: Chris Leech <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jul 18, 2025
JIRA: https://issues.redhat.com/browse/RHEL-102650

When I run the NVME over TCP test in virtme-ng, I get the following
"suspicious RCU usage" warning in nvme_mpath_add_sysfs_link():

'''
[    5.024557][   T44] nvmet: Created nvm controller 1 for subsystem nqn.2025-06.org.nvmexpress.mptcp for NQN nqn.2014-08.org.nvmexpress:uuid:f7f6b5e0-ff97-4894-98ac-c85309e0bc77.
[    5.027401][  T183] nvme nvme0: creating 2 I/O queues.
[    5.029017][  T183] nvme nvme0: mapped 2/0/0 default/read/poll queues.
[    5.032587][  T183] nvme nvme0: new ctrl: NQN "nqn.2025-06.org.nvmexpress.mptcp", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:f7f6b5e0-ff97-4894-98ac-c85309e0bc77
[    5.042214][   T25]
[    5.042440][   T25] =============================
[    5.042579][   T25] WARNING: suspicious RCU usage
[    5.042705][   T25] 6.16.0-rc3+ #23 Not tainted
[    5.042812][   T25] -----------------------------
[    5.042934][   T25] drivers/nvme/host/multipath.c:1203 RCU-list traversed in non-reader section!!
[    5.043111][   T25]
[    5.043111][   T25] other info that might help us debug this:
[    5.043111][   T25]
[    5.043341][   T25]
[    5.043341][   T25] rcu_scheduler_active = 2, debug_locks = 1
[    5.043502][   T25] 3 locks held by kworker/u9:0/25:
[    5.043615][   T25]  #0: ffff888008730948 ((wq_completion)async){+.+.}-{0:0}, at: process_one_work+0x7ed/0x1350
[    5.043830][   T25]  #1: ffffc900001afd40 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0xcf3/0x1350
[    5.044084][   T25]  #2: ffff888013ee0020 (&head->srcu){.+.+}-{0:0}, at: nvme_mpath_add_sysfs_link.part.0+0xb4/0x3a0
[    5.044300][   T25]
[    5.044300][   T25] stack backtrace:
[    5.044439][   T25] CPU: 0 UID: 0 PID: 25 Comm: kworker/u9:0 Not tainted 6.16.0-rc3+ #23 PREEMPT(full)
[    5.044441][   T25] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    5.044442][   T25] Workqueue: async async_run_entry_fn
[    5.044445][   T25] Call Trace:
[    5.044446][   T25]  <TASK>
[    5.044449][   T25]  dump_stack_lvl+0x6f/0xb0
[    5.044453][   T25]  lockdep_rcu_suspicious.cold+0x4f/0xb1
[    5.044457][   T25]  nvme_mpath_add_sysfs_link.part.0+0x2fb/0x3a0
[    5.044459][   T25]  ? queue_work_on+0x90/0xf0
[    5.044461][   T25]  ? lockdep_hardirqs_on+0x78/0x110
[    5.044466][   T25]  nvme_mpath_set_live+0x1e9/0x4f0
[    5.044470][   T25]  nvme_mpath_add_disk+0x240/0x2f0
[    5.044472][   T25]  ? __pfx_nvme_mpath_add_disk+0x10/0x10
[    5.044475][   T25]  ? add_disk_fwnode+0x361/0x580
[    5.044480][   T25]  nvme_alloc_ns+0x81c/0x17c0
[    5.044483][   T25]  ? kasan_quarantine_put+0x104/0x240
[    5.044487][   T25]  ? __pfx_nvme_alloc_ns+0x10/0x10
[    5.044495][   T25]  ? __pfx_nvme_find_get_ns+0x10/0x10
[    5.044496][   T25]  ? rcu_read_lock_any_held+0x45/0xa0
[    5.044498][   T25]  ? validate_chain+0x232/0x4f0
[    5.044503][   T25]  nvme_scan_ns+0x4c8/0x810
[    5.044506][   T25]  ? __pfx_nvme_scan_ns+0x10/0x10
[    5.044508][   T25]  ? find_held_lock+0x2b/0x80
[    5.044512][   T25]  ? ktime_get+0x16d/0x220
[    5.044517][   T25]  ? kvm_clock_get_cycles+0x18/0x30
[    5.044520][   T25]  ? __pfx_nvme_scan_ns_async+0x10/0x10
[    5.044522][   T25]  async_run_entry_fn+0x97/0x560
[    5.044523][   T25]  ? rcu_is_watching+0x12/0xc0
[    5.044526][   T25]  process_one_work+0xd3c/0x1350
[    5.044532][   T25]  ? __pfx_process_one_work+0x10/0x10
[    5.044536][   T25]  ? assign_work+0x16c/0x240
[    5.044539][   T25]  worker_thread+0x4da/0xd50
[    5.044545][   T25]  ? __pfx_worker_thread+0x10/0x10
[    5.044546][   T25]  kthread+0x356/0x5c0
[    5.044548][   T25]  ? __pfx_kthread+0x10/0x10
[    5.044549][   T25]  ? ret_from_fork+0x1b/0x2e0
[    5.044552][   T25]  ? __lock_release.isra.0+0x5d/0x180
[    5.044553][   T25]  ? ret_from_fork+0x1b/0x2e0
[    5.044555][   T25]  ? rcu_is_watching+0x12/0xc0
[    5.044557][   T25]  ? __pfx_kthread+0x10/0x10
[    5.044559][   T25]  ret_from_fork+0x218/0x2e0
[    5.044561][   T25]  ? __pfx_kthread+0x10/0x10
[    5.044562][   T25]  ret_from_fork_asm+0x1a/0x30
[    5.044570][   T25]  </TASK>
'''

This patch uses sleepable RCU version of helper list_for_each_entry_srcu()
instead of list_for_each_entry_rcu() to fix it.

Fixes: 4dbd2b2 ("nvme-multipath: Add visibility for round-robin io-policy")
Signed-off-by: Geliang Tang <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Nilay Shroff <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
(cherry picked from commit d681107)
Signed-off-by: Chris Leech <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jul 18, 2025
…void Priority Inversion in SRIOV

commit dc0297f upstream.

RLCG Register Access is a way for virtual functions to safely access GPU
registers in a virtualized environment., including TLB flushes and
register reads. When multiple threads or VFs try to access the same
registers simultaneously, it can lead to race conditions. By using the
RLCG interface, the driver can serialize access to the registers. This
means that only one thread can access the registers at a time,
preventing conflicts and ensuring that operations are performed
correctly. Additionally, when a low-priority task holds a mutex that a
high-priority task needs, ie., If a thread holding a spinlock tries to
acquire a mutex, it can lead to priority inversion. register access in
amdgpu_virt_rlcg_reg_rw especially in a fast code path is critical.

The call stack shows that the function amdgpu_virt_rlcg_reg_rw is being
called, which attempts to acquire the mutex. This function is invoked
from amdgpu_sriov_wreg, which in turn is called from
gmc_v11_0_flush_gpu_tlb.

The [ BUG: Invalid wait context ] indicates that a thread is trying to
acquire a mutex while it is in a context that does not allow it to sleep
(like holding a spinlock).

Fixes the below:

[  253.013423] =============================
[  253.013434] [ BUG: Invalid wait context ]
[  253.013446] 6.12.0-amdstaging-drm-next-lol-050225 #14 Tainted: G     U     OE
[  253.013464] -----------------------------
[  253.013475] kworker/0:1/10 is trying to lock:
[  253.013487] ffff9f30542e3cf8 (&adev->virt.rlcg_reg_lock){+.+.}-{3:3}, at: amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.013815] other info that might help us debug this:
[  253.013827] context-{4:4}
[  253.013835] 3 locks held by kworker/0:1/10:
[  253.013847]  #0: ffff9f3040050f58 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680
[  253.013877]  #1: ffffb789c008be40 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680
[  253.013905]  #2: ffff9f3054281838 (&adev->gmc.invalidate_lock){+.+.}-{2:2}, at: gmc_v11_0_flush_gpu_tlb+0x198/0x4f0 [amdgpu]
[  253.014154] stack backtrace:
[  253.014164] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Tainted: G     U     OE      6.12.0-amdstaging-drm-next-lol-050225 #14
[  253.014189] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[  253.014203] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/18/2024
[  253.014224] Workqueue: events work_for_cpu_fn
[  253.014241] Call Trace:
[  253.014250]  <TASK>
[  253.014260]  dump_stack_lvl+0x9b/0xf0
[  253.014275]  dump_stack+0x10/0x20
[  253.014287]  __lock_acquire+0xa47/0x2810
[  253.014303]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.014321]  lock_acquire+0xd1/0x300
[  253.014333]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.014562]  ? __lock_acquire+0xa6b/0x2810
[  253.014578]  __mutex_lock+0x85/0xe20
[  253.014591]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.014782]  ? sched_clock_noinstr+0x9/0x10
[  253.014795]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.014808]  ? local_clock_noinstr+0xe/0xc0
[  253.014822]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.015012]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.015029]  mutex_lock_nested+0x1b/0x30
[  253.015044]  ? mutex_lock_nested+0x1b/0x30
[  253.015057]  amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.015249]  amdgpu_sriov_wreg+0xc5/0xd0 [amdgpu]
[  253.015435]  gmc_v11_0_flush_gpu_tlb+0x44b/0x4f0 [amdgpu]
[  253.015667]  gfx_v11_0_hw_init+0x499/0x29c0 [amdgpu]
[  253.015901]  ? __pfx_smu_v13_0_update_pcie_parameters+0x10/0x10 [amdgpu]
[  253.016159]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.016173]  ? smu_hw_init+0x18d/0x300 [amdgpu]
[  253.016403]  amdgpu_device_init+0x29ad/0x36a0 [amdgpu]
[  253.016614]  amdgpu_driver_load_kms+0x1a/0xc0 [amdgpu]
[  253.017057]  amdgpu_pci_probe+0x1c2/0x660 [amdgpu]
[  253.017493]  local_pci_probe+0x4b/0xb0
[  253.017746]  work_for_cpu_fn+0x1a/0x30
[  253.017995]  process_one_work+0x21e/0x680
[  253.018248]  worker_thread+0x190/0x330
[  253.018500]  ? __pfx_worker_thread+0x10/0x10
[  253.018746]  kthread+0xe7/0x120
[  253.018988]  ? __pfx_kthread+0x10/0x10
[  253.019231]  ret_from_fork+0x3c/0x60
[  253.019468]  ? __pfx_kthread+0x10/0x10
[  253.019701]  ret_from_fork_asm+0x1a/0x30
[  253.019939]  </TASK>

v2: s/spin_trylock/spin_lock_irqsave to be safe (Christian).

Fixes: e864180 ("drm/amdgpu: Add lock around VF RLCG interface")
Cc: lin cao <[email protected]>
Cc: Jingwen Chen <[email protected]>
Cc: Victor Skvortsov <[email protected]>
Cc: Zhigang Luo <[email protected]>
Cc: Christian König <[email protected]>
Cc: Alex Deucher <[email protected]>
Signed-off-by: Srinivasan Shanmugam <[email protected]>
Suggested-by: Alex Deucher <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
[ Minor context change fixed. ]
Signed-off-by: Wenshan Lan <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jul 18, 2025
…-flight

commit ecf371f upstream.

Reject migration of SEV{-ES} state if either the source or destination VM
is actively creating a vCPU, i.e. if kvm_vm_ioctl_create_vcpu() is in the
section between incrementing created_vcpus and online_vcpus.  The bulk of
vCPU creation runs _outside_ of kvm->lock to allow creating multiple vCPUs
in parallel, and so sev_info.es_active can get toggled from false=>true in
the destination VM after (or during) svm_vcpu_create(), resulting in an
SEV{-ES} VM effectively having a non-SEV{-ES} vCPU.

The issue manifests most visibly as a crash when trying to free a vCPU's
NULL VMSA page in an SEV-ES VM, but any number of things can go wrong.

  BUG: unable to handle page fault for address: ffffebde00000000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: Oops: 0000 [#1] SMP KASAN NOPTI
  CPU: 227 UID: 0 PID: 64063 Comm: syz.5.60023 Tainted: G     U     O        6.15.0-smp-DEV #2 NONE
  Tainted: [U]=USER, [O]=OOT_MODULE
  Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 12.52.0-0 10/28/2024
  RIP: 0010:constant_test_bit arch/x86/include/asm/bitops.h:206 [inline]
  RIP: 0010:arch_test_bit arch/x86/include/asm/bitops.h:238 [inline]
  RIP: 0010:_test_bit include/asm-generic/bitops/instrumented-non-atomic.h:142 [inline]
  RIP: 0010:PageHead include/linux/page-flags.h:866 [inline]
  RIP: 0010:___free_pages+0x3e/0x120 mm/page_alloc.c:5067
  Code: <49> f7 06 40 00 00 00 75 05 45 31 ff eb 0c 66 90 4c 89 f0 4c 39 f0
  RSP: 0018:ffff8984551978d0 EFLAGS: 00010246
  RAX: 0000777f80000001 RBX: 0000000000000000 RCX: ffffffff918aeb98
  RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffebde00000000
  RBP: 0000000000000000 R08: ffffebde00000007 R09: 1ffffd7bc0000000
  R10: dffffc0000000000 R11: fffff97bc0000001 R12: dffffc0000000000
  R13: ffff8983e19751a8 R14: ffffebde00000000 R15: 1ffffd7bc0000000
  FS:  0000000000000000(0000) GS:ffff89ee661d3000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffffebde00000000 CR3: 000000793ceaa000 CR4: 0000000000350ef0
  DR0: 0000000000000000 DR1: 0000000000000b5f DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Call Trace:
   <TASK>
   sev_free_vcpu+0x413/0x630 arch/x86/kvm/svm/sev.c:3169
   svm_vcpu_free+0x13a/0x2a0 arch/x86/kvm/svm/svm.c:1515
   kvm_arch_vcpu_destroy+0x6a/0x1d0 arch/x86/kvm/x86.c:12396
   kvm_vcpu_destroy virt/kvm/kvm_main.c:470 [inline]
   kvm_destroy_vcpus+0xd1/0x300 virt/kvm/kvm_main.c:490
   kvm_arch_destroy_vm+0x636/0x820 arch/x86/kvm/x86.c:12895
   kvm_put_kvm+0xb8e/0xfb0 virt/kvm/kvm_main.c:1310
   kvm_vm_release+0x48/0x60 virt/kvm/kvm_main.c:1369
   __fput+0x3e4/0x9e0 fs/file_table.c:465
   task_work_run+0x1a9/0x220 kernel/task_work.c:227
   exit_task_work include/linux/task_work.h:40 [inline]
   do_exit+0x7f0/0x25b0 kernel/exit.c:953
   do_group_exit+0x203/0x2d0 kernel/exit.c:1102
   get_signal+0x1357/0x1480 kernel/signal.c:3034
   arch_do_signal_or_restart+0x40/0x690 arch/x86/kernel/signal.c:337
   exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
   exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
   __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
   syscall_exit_to_user_mode+0x67/0xb0 kernel/entry/common.c:218
   do_syscall_64+0x7c/0x150 arch/x86/entry/syscall_64.c:100
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
  RIP: 0033:0x7f87a898e969
   </TASK>
  Modules linked in: gq(O)
  gsmi: Log Shutdown Reason 0x03
  CR2: ffffebde00000000
  ---[ end trace 0000000000000000 ]---

Deliberately don't check for a NULL VMSA when freeing the vCPU, as crashing
the host is likely desirable due to the VMSA being consumed by hardware.
E.g. if KVM manages to allow VMRUN on the vCPU, hardware may read/write a
bogus VMSA page.  Accessing PFN 0 is "fine"-ish now that it's sequestered
away thanks to L1TF, but panicking in this scenario is preferable to
potentially running with corrupted state.

Reported-by: Alexander Potapenko <[email protected]>
Tested-by: Alexander Potapenko <[email protected]>
Fixes: 0b020f5 ("KVM: SEV: Add support for SEV-ES intra host migration")
Fixes: b566393 ("KVM: SEV: Add support for SEV intra host migration")
Cc: [email protected]
Cc: James Houghton <[email protected]>
Cc: Peter Gonda <[email protected]>
Reviewed-by: Liam Merwick <[email protected]>
Tested-by: Liam Merwick <[email protected]>
Reviewed-by: James Houghton <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
shreeya-patel98 added a commit that referenced this pull request Jul 18, 2025
jira VULN-8620
cve CVE-2024-50126
commit-author Dmitry Antipov <[email protected]>
commit b22db8b

Fix possible use-after-free in 'taprio_dump()' by adding RCU
read-side critical section there. Never seen on x86 but
found on a KASAN-enabled arm64 system when investigating
https://syzkaller.appspot.com/bug?extid=b65e0af58423fc8a73aa:

[T15862] BUG: KASAN: slab-use-after-free in taprio_dump+0xa0c/0xbb0
[T15862] Read of size 4 at addr ffff0000d4bb88f8 by task repro/15862
[T15862]
[T15862] CPU: 0 UID: 0 PID: 15862 Comm: repro Not tainted 6.11.0-rc1-00293-gdefaf1a2113a-dirty #2
[T15862] Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-20240524-5.fc40 05/24/2024
[T15862] Call trace:
[T15862]  dump_backtrace+0x20c/0x220
[T15862]  show_stack+0x2c/0x40
[T15862]  dump_stack_lvl+0xf8/0x174
[T15862]  print_report+0x170/0x4d8
[T15862]  kasan_report+0xb8/0x1d4
[T15862]  __asan_report_load4_noabort+0x20/0x2c
[T15862]  taprio_dump+0xa0c/0xbb0
[T15862]  tc_fill_qdisc+0x540/0x1020
[T15862]  qdisc_notify.isra.0+0x330/0x3a0
[T15862]  tc_modify_qdisc+0x7b8/0x1838
[T15862]  rtnetlink_rcv_msg+0x3c8/0xc20
[T15862]  netlink_rcv_skb+0x1f8/0x3d4
[T15862]  rtnetlink_rcv+0x28/0x40
[T15862]  netlink_unicast+0x51c/0x790
[T15862]  netlink_sendmsg+0x79c/0xc20
[T15862]  __sock_sendmsg+0xe0/0x1a0
[T15862]  ____sys_sendmsg+0x6c0/0x840
[T15862]  ___sys_sendmsg+0x1ac/0x1f0
[T15862]  __sys_sendmsg+0x110/0x1d0
[T15862]  __arm64_sys_sendmsg+0x74/0xb0
[T15862]  invoke_syscall+0x88/0x2e0
[T15862]  el0_svc_common.constprop.0+0xe4/0x2a0
[T15862]  do_el0_svc+0x44/0x60
[T15862]  el0_svc+0x50/0x184
[T15862]  el0t_64_sync_handler+0x120/0x12c
[T15862]  el0t_64_sync+0x190/0x194
[T15862]
[T15862] Allocated by task 15857:
[T15862]  kasan_save_stack+0x3c/0x70
[T15862]  kasan_save_track+0x20/0x3c
[T15862]  kasan_save_alloc_info+0x40/0x60
[T15862]  __kasan_kmalloc+0xd4/0xe0
[T15862]  __kmalloc_cache_noprof+0x194/0x334
[T15862]  taprio_change+0x45c/0x2fe0
[T15862]  tc_modify_qdisc+0x6a8/0x1838
[T15862]  rtnetlink_rcv_msg+0x3c8/0xc20
[T15862]  netlink_rcv_skb+0x1f8/0x3d4
[T15862]  rtnetlink_rcv+0x28/0x40
[T15862]  netlink_unicast+0x51c/0x790
[T15862]  netlink_sendmsg+0x79c/0xc20
[T15862]  __sock_sendmsg+0xe0/0x1a0
[T15862]  ____sys_sendmsg+0x6c0/0x840
[T15862]  ___sys_sendmsg+0x1ac/0x1f0
[T15862]  __sys_sendmsg+0x110/0x1d0
[T15862]  __arm64_sys_sendmsg+0x74/0xb0
[T15862]  invoke_syscall+0x88/0x2e0
[T15862]  el0_svc_common.constprop.0+0xe4/0x2a0
[T15862]  do_el0_svc+0x44/0x60
[T15862]  el0_svc+0x50/0x184
[T15862]  el0t_64_sync_handler+0x120/0x12c
[T15862]  el0t_64_sync+0x190/0x194
[T15862]
[T15862] Freed by task 6192:
[T15862]  kasan_save_stack+0x3c/0x70
[T15862]  kasan_save_track+0x20/0x3c
[T15862]  kasan_save_free_info+0x4c/0x80
[T15862]  poison_slab_object+0x110/0x160
[T15862]  __kasan_slab_free+0x3c/0x74
[T15862]  kfree+0x134/0x3c0
[T15862]  taprio_free_sched_cb+0x18c/0x220
[T15862]  rcu_core+0x920/0x1b7c
[T15862]  rcu_core_si+0x10/0x1c
[T15862]  handle_softirqs+0x2e8/0xd64
[T15862]  __do_softirq+0x14/0x20

Fixes: 18cdd2f ("net/sched: taprio: taprio_dump and taprio_change are protected by rtnl_mutex")
	Acked-by: Vinicius Costa Gomes <[email protected]>
	Signed-off-by: Dmitry Antipov <[email protected]>
Link: https://patch.msgid.link/[email protected]
	Signed-off-by: Paolo Abeni <[email protected]>

(cherry picked from commit b22db8b)
	Signed-off-by: Shreeya Patel <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jul 19, 2025
If the PHY driver uses another PHY internally (e.g. in case of eUSB2,
repeaters are represented as PHYs), then it would trigger the following
lockdep splat because all PHYs use a single static lockdep key and thus
lockdep can not identify whether there is a dependency or not and
reports a false positive.

Make PHY subsystem use dynamic lockdep keys, assigning each driver a
separate key. This way lockdep can correctly identify dependency graph
between mutexes.

 ============================================
 WARNING: possible recursive locking detected
 6.15.0-rc7-next-20250522-12896-g3932f283970c #3455 Not tainted
 --------------------------------------------
 kworker/u51:0/78 is trying to acquire lock:
 ffff0008116554f0 (&phy->mutex){+.+.}-{4:4}, at: phy_init+0x4c/0x12c

 but task is already holding lock:
 ffff000813c10cf0 (&phy->mutex){+.+.}-{4:4}, at: phy_init+0x4c/0x12c

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&phy->mutex);
   lock(&phy->mutex);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 4 locks held by kworker/u51:0/78:
  #0: ffff000800010948 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x18c/0x5ec
  #1: ffff80008036bdb0 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x1b4/0x5ec
  #2: ffff0008094ac8f8 (&dev->mutex){....}-{4:4}, at: __device_attach+0x38/0x188
  #3: ffff000813c10cf0 (&phy->mutex){+.+.}-{4:4}, at: phy_init+0x4c/0x12c

 stack backtrace:
 CPU: 0 UID: 0 PID: 78 Comm: kworker/u51:0 Not tainted 6.15.0-rc7-next-20250522-12896-g3932f283970c #3455 PREEMPT
 Hardware name: Qualcomm CRD, BIOS 6.0.240904.BOOT.MXF.2.4-00528.1-HAMOA-1 09/ 4/2024
 Workqueue: events_unbound deferred_probe_work_func
 Call trace:
  show_stack+0x18/0x24 (C)
  dump_stack_lvl+0x90/0xd0
  dump_stack+0x18/0x24
  print_deadlock_bug+0x258/0x348
  __lock_acquire+0x10fc/0x1f84
  lock_acquire+0x1c8/0x338
  __mutex_lock+0xb8/0x59c
  mutex_lock_nested+0x24/0x30
  phy_init+0x4c/0x12c
  snps_eusb2_hsphy_init+0x54/0x1a0
  phy_init+0xe0/0x12c
  dwc3_core_init+0x450/0x10b4
  dwc3_core_probe+0xce4/0x15fc
  dwc3_probe+0x64/0xb0
  platform_probe+0x68/0xc4
  really_probe+0xbc/0x298
  __driver_probe_device+0x78/0x12c
  driver_probe_device+0x3c/0x160
  __device_attach_driver+0xb8/0x138
  bus_for_each_drv+0x84/0xe0
  __device_attach+0x9c/0x188
  device_initial_probe+0x14/0x20
  bus_probe_device+0xac/0xb0
  deferred_probe_work_func+0x8c/0xc8
  process_one_work+0x208/0x5ec
  worker_thread+0x1c0/0x368
  kthread+0x14c/0x20c
  ret_from_fork+0x10/0x20

Fixes: 3584f63 ("phy: qcom: phy-qcom-snps-eusb2: Add support for eUSB2 repeater")
Fixes: e246355 ("phy: amlogic: Add Amlogic AXG PCIE PHY Driver")
Reviewed-by: Neil Armstrong <[email protected]>
Reviewed-by: Abel Vesa <[email protected]>
Reported-by: Johan Hovold <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]/
Reviewed-by: Johan Hovold <[email protected]>
Tested-by: Johan Hovold <[email protected]>
Signed-off-by: Dmitry Baryshkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jul 19, 2025
This reverts commit 7796c97.

This patch broke Dragonboard 845c (sdm845). I see:

    Unexpected kernel BRK exception at EL1
    Internal error: BRK handler: 00000000f20003e8 [#1]  SMP
    pc : qcom_swrm_set_channel_map+0x7c/0x80 [soundwire_qcom]
    lr : snd_soc_dai_set_channel_map+0x34/0x78
    Call trace:
     qcom_swrm_set_channel_map+0x7c/0x80 [soundwire_qcom] (P)
     sdm845_dai_init+0x18c/0x2e0 [snd_soc_sdm845]
     snd_soc_link_init+0x28/0x6c
     snd_soc_bind_card+0x5f4/0xb0c
     snd_soc_register_card+0x148/0x1a4
     devm_snd_soc_register_card+0x50/0xb0
     sdm845_snd_platform_probe+0x124/0x148 [snd_soc_sdm845]
     platform_probe+0x6c/0xd0
     really_probe+0xc0/0x2a4
     __driver_probe_device+0x7c/0x130
     driver_probe_device+0x40/0x118
     __device_attach_driver+0xc4/0x108
     bus_for_each_drv+0x8c/0xf0
     __device_attach+0xa4/0x198
     device_initial_probe+0x18/0x28
     bus_probe_device+0xb8/0xbc
     deferred_probe_work_func+0xac/0xfc
     process_one_work+0x244/0x658
     worker_thread+0x1b4/0x360
     kthread+0x148/0x228
     ret_from_fork+0x10/0x20
    Kernel panic - not syncing: BRK handler: Fatal exception

Dan has also reported following issues with the original patch
https://lore.kernel.org/all/[email protected]/

Bug #1:
The zeroeth element of ctrl->pconfig[] is supposed to be unused.  We
start counting at 1.  However this code sets ctrl->pconfig[0].ch_mask = 128.

Bug #2:
There are SLIM_MAX_TX_PORTS (16) elements in tx_ch[] array but only
QCOM_SDW_MAX_PORTS + 1 (15) in the ctrl->pconfig[] array so it corrupts
memory like Yongqin Liu pointed out.

Bug 3:
Like Jie Gan pointed out, it erases all the tx information with the rx
information.

Cc: [email protected] # v6.15+
Signed-off-by: Amit Pundir <[email protected]>
Acked-by: Srinivas Kandagatla <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jul 19, 2025
 into HEAD

KVM/riscv fixes for 6.16, take #2

- Disable vstimecmp before exiting to user-space
- Move HGEI[E|P] CSR access to IMSIC virtualization
shreeya-patel98 added a commit that referenced this pull request Jul 21, 2025
jira VULN-8620
cve CVE-2024-50126
commit-author Dmitry Antipov <[email protected]>
commit b22db8b

Fix possible use-after-free in 'taprio_dump()' by adding RCU
read-side critical section there. Never seen on x86 but
found on a KASAN-enabled arm64 system when investigating
https://syzkaller.appspot.com/bug?extid=b65e0af58423fc8a73aa:

[T15862] BUG: KASAN: slab-use-after-free in taprio_dump+0xa0c/0xbb0
[T15862] Read of size 4 at addr ffff0000d4bb88f8 by task repro/15862
[T15862]
[T15862] CPU: 0 UID: 0 PID: 15862 Comm: repro Not tainted 6.11.0-rc1-00293-gdefaf1a2113a-dirty #2
[T15862] Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-20240524-5.fc40 05/24/2024
[T15862] Call trace:
[T15862]  dump_backtrace+0x20c/0x220
[T15862]  show_stack+0x2c/0x40
[T15862]  dump_stack_lvl+0xf8/0x174
[T15862]  print_report+0x170/0x4d8
[T15862]  kasan_report+0xb8/0x1d4
[T15862]  __asan_report_load4_noabort+0x20/0x2c
[T15862]  taprio_dump+0xa0c/0xbb0
[T15862]  tc_fill_qdisc+0x540/0x1020
[T15862]  qdisc_notify.isra.0+0x330/0x3a0
[T15862]  tc_modify_qdisc+0x7b8/0x1838
[T15862]  rtnetlink_rcv_msg+0x3c8/0xc20
[T15862]  netlink_rcv_skb+0x1f8/0x3d4
[T15862]  rtnetlink_rcv+0x28/0x40
[T15862]  netlink_unicast+0x51c/0x790
[T15862]  netlink_sendmsg+0x79c/0xc20
[T15862]  __sock_sendmsg+0xe0/0x1a0
[T15862]  ____sys_sendmsg+0x6c0/0x840
[T15862]  ___sys_sendmsg+0x1ac/0x1f0
[T15862]  __sys_sendmsg+0x110/0x1d0
[T15862]  __arm64_sys_sendmsg+0x74/0xb0
[T15862]  invoke_syscall+0x88/0x2e0
[T15862]  el0_svc_common.constprop.0+0xe4/0x2a0
[T15862]  do_el0_svc+0x44/0x60
[T15862]  el0_svc+0x50/0x184
[T15862]  el0t_64_sync_handler+0x120/0x12c
[T15862]  el0t_64_sync+0x190/0x194
[T15862]
[T15862] Allocated by task 15857:
[T15862]  kasan_save_stack+0x3c/0x70
[T15862]  kasan_save_track+0x20/0x3c
[T15862]  kasan_save_alloc_info+0x40/0x60
[T15862]  __kasan_kmalloc+0xd4/0xe0
[T15862]  __kmalloc_cache_noprof+0x194/0x334
[T15862]  taprio_change+0x45c/0x2fe0
[T15862]  tc_modify_qdisc+0x6a8/0x1838
[T15862]  rtnetlink_rcv_msg+0x3c8/0xc20
[T15862]  netlink_rcv_skb+0x1f8/0x3d4
[T15862]  rtnetlink_rcv+0x28/0x40
[T15862]  netlink_unicast+0x51c/0x790
[T15862]  netlink_sendmsg+0x79c/0xc20
[T15862]  __sock_sendmsg+0xe0/0x1a0
[T15862]  ____sys_sendmsg+0x6c0/0x840
[T15862]  ___sys_sendmsg+0x1ac/0x1f0
[T15862]  __sys_sendmsg+0x110/0x1d0
[T15862]  __arm64_sys_sendmsg+0x74/0xb0
[T15862]  invoke_syscall+0x88/0x2e0
[T15862]  el0_svc_common.constprop.0+0xe4/0x2a0
[T15862]  do_el0_svc+0x44/0x60
[T15862]  el0_svc+0x50/0x184
[T15862]  el0t_64_sync_handler+0x120/0x12c
[T15862]  el0t_64_sync+0x190/0x194
[T15862]
[T15862] Freed by task 6192:
[T15862]  kasan_save_stack+0x3c/0x70
[T15862]  kasan_save_track+0x20/0x3c
[T15862]  kasan_save_free_info+0x4c/0x80
[T15862]  poison_slab_object+0x110/0x160
[T15862]  __kasan_slab_free+0x3c/0x74
[T15862]  kfree+0x134/0x3c0
[T15862]  taprio_free_sched_cb+0x18c/0x220
[T15862]  rcu_core+0x920/0x1b7c
[T15862]  rcu_core_si+0x10/0x1c
[T15862]  handle_softirqs+0x2e8/0xd64
[T15862]  __do_softirq+0x14/0x20

Fixes: 18cdd2f ("net/sched: taprio: taprio_dump and taprio_change are protected by rtnl_mutex")
	Acked-by: Vinicius Costa Gomes <[email protected]>
	Signed-off-by: Dmitry Antipov <[email protected]>
Link: https://patch.msgid.link/[email protected]
	Signed-off-by: Paolo Abeni <[email protected]>

(cherry picked from commit b22db8b)
	Signed-off-by: Shreeya Patel <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants