@PlaidCat PlaidCat commented Nov 6, 2025

General Process:

Checking Rebuild Commits for Potentially missing commits:

kernel-4.18.0-553.82.1.el8_10

[jmaple@devbox kernel-src-tree]$ cat ciq/ciq_backports/kernel-4.18.0-553.82.1.el8_10/rebuild.details.txt
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..kernel-mainline: 567757
Number of commits in rpm: 155
Number of commits matched with upstream: 145 (93.55%)
Number of commits in upstream but not in rpm: 567612
Number of commits NOT found in upstream: 10 (6.45%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.82.1.el8_10 for kernel-4.18.0-553.82.1.el8_10
Clean Cherry Picks: 74 (51.03%)
Empty Cherry Picks: 71 (48.97%)
_______________________________

__EMPTY COMMITS__________________________
3fcc2b887a1ba4c1f45319cd8c54daa263ecbc36 ext4: refactor ext4_da_map_blocks()
acf795dc161f3cf481db20f05db4250714e375e5 ext4: convert to exclusive lock while inserting delalloc extents
8e4e5cdf2fdeb99445a468b6b6436ad79b9ecb30 ext4: factor out a common helper to query extent map
0ea6560abb3bac1ffcfa4bf6b2c4d344fdc27b3c ext4: check the extent status again before inserting delalloc block
402e38e6b71f5739119ca3107f375e112d63c7c5 ext4: prevent stale extent cache entries caused by concurrent I/O writeback
f5456b5d67cf812fd31fe3e130ca216b2e0908e5 gfs2: Clean up revokes on normal withdraws
e320050eb75e914aa5e12de2a9ab830c9a2ce311 gfs2: No more gfs2_find_jhead caching
183eea2ee5ba968ca7c31f04a0f01fd3e5c1d014 cifs: reconnect only the connection and not smb session where possible
080dc5e5656c1cc1cdefb501b9b645a07519f763 cifs: take cifs_tcp_ses_lock for status checks
1913e1116a3174648cf2e6faedf29204f31cc438 cifs: fix hang on cifs_get_next_mid()
73f9bfbe3d818bb52266d5c9f3ba57d97842ffe7 cifs: maintain a state machine for tcp/smb/tcon sessions
bda487ac4bebf871255cc6f23e16f702cea0ca7c cifs: avoid race during socket reconnect between send and recv
3663c9045f51a7ad635a0785adef07c21b79b560 cifs: check reconnects for channels of active tcons too
a05885ce13bd5ec9602551e32dfb1a4f26bfa542 cifs: fix the connection state transitions with multichannel
88b024f556fcd5bf1288c6333016f576cfa5f539 cifs: protect all accesses to chan_* with chan_lock
8a409cda978e212661b8c032e1b08b3b0b0f9d36 cifs: remove unused variable ses_selected
c1604da708d345a1ca1cf6a5537d503b14aa4787 cifs: make status checks in version independent callers
47de760655f329ce4b3d3e6276557220956d8c38 cifs: update tcpStatus during negotiate and sess setup
ba978e83255a759a4a07257a46ca6396a8b81787 cifs: cifs_ses_mark_for_reconnect should also update reconnect bits
a81da65fbae6436e1e2f415532b8aacc3274d840 cifs: call cifs_reconnect when a connection is marked
52492ff5c583036306bc422a83e246c971af387a cifs: call helper functions for marking channels for reconnect
2a05137a0575b7d1006bdf4c1beeee9e391e22a0 cifs: mark sessions for reconnection in helper function
e3ee9fb22652f228225c352bd4fabec330cac5f0 smb3: fix incorrect session setup check for multiuser mounts
dca65818c80cf06e0f08ba2cf94060a5236e73c2 cifs: use a different reconnect helper for non-cifsd threads
fdf59eb548e51bce81382c39f1a5fd4cb9403b78 smb3: cleanup and clarify status of tree connections
687127c81ad32c8900a3fedbc7ed8f686ca95855 cifs: fix potential race with cifsd thread
fb39d30e227233498c8debe6a9fe3e7cf575c85f cifs: force new session setup and tcon for dfs
1a6a41d4cedd9b302e2200e6f0e3c44dbbe13689 cifs: do not use tcpStatus after negotiate completes
a96c94481f5993eac2271f9fb4d009b7dc076c24 cifs: fix incorrect use of list iterator after the loop
dd3cd8709ed5f4ae8998e0cd44c05bd26bc879e8 cifs: use new enum for ses_status
5752bf645f9dd7db600651f726eb04a97c9f597f cifs: avoid parallel session setups on same channel
cc391b694ff085f62f133e6b8f864d43a8e69dfd cifs: fix potential deadlock in direct reclaim
8da33fd11c05b7c64ef6456970f2fce61851806e cifs: avoid deadlocks while updating iface
af3a6d1018f02c6dc8388f1f3785a559c7ab5961 cifs: update cifs_ses::ip_addr after failover
50bd7d5a647bdf533575111c5335f49707c2ce2f cifs: fix race condition with delayed threads
d7d7a66aacd6fd8ca57baf08a7bac5421282f6f8 cifs: avoid use of global locks for high contention data
aea02fc40a7fa6ac2c16e3c3a6f1d0fd7e6faaba cifs: fix wrong unlock before return from cifs_tree_connect()
68ed14496b032b0c9ef21b38ee45c6c8f3a18ff1 cifs: remove unused server parameter from calc_smb_size()
e909d054bdea75ef1ec48c18c5936affdaecbb2c cifs: Fix xid leak in cifs_ses_add_channel()
23d9b9b757e8007204d8f71448ab55d5ef2ae8e5 cifs: avoid unnecessary iteration of tcp sessions
25cf01b7c9200d6ace5a59125d8166435dd9dea7 cifs: set correct status of tcon ipc when reconnecting
39a154fc2d172a3a5865e5a9fa2a2983eb7a99ac cifs: protect access of TCP_Server_Info::{dstaddr,hostname}
3c0070f54b3128de498c2dd9934a21f0dd867111 cifs: prevent data race in smb2_reconnect()
0e9bd27b2a635d54665fcc1d6398a5f6aeb6b0cb cifs: get rid of dns resolve worker
ea90708d3cf3d0d92c02afe445ad463fb3c6bf10 cifs: use the least loaded channel for sending requests
df57109bd50b9ed6911f3c2aa914189fe4c1fe2c cifs: use tcon allocation functions even for dummy tcon
e77978de4765229e09c8fabcf4f8419ff367317f cifs: update ip_addr for ses only for primary chan setup
1bcd548d935a33c6fc58331405eb1b82fd6150de cifs: prevent data race in cifs_reconnect_tcon()
05ce0448c3f36febd8db0ee0e9e16557f3ab5ee8 cifs: generate signkey for the channel that's reconnecting
bc962159e8e326af634a506508034a375bf2b858 cifs: avoid race conditions with parallel reconnects
c24bb1a87dc3f2d77d410eaac2c6a295961bf50e cifs: fix missing unload_nls() in smb2_reconnect()
6cc041e90c178955219dcee4030bd5423f800f10 cifs: avoid races in parallel reconnects in smb1
4f5d5b33fc400911d6e1f49095522b361d9cbe13 cifs: double lock in cifs_reconnect_tcon()
943fb67b090212f1d3789eb7796b1c9045c62fd6 cifs: missing lock when updating session status
5bff9f741af60b143a5ae73417a8ec47fd5ff2f4 cifs: protect session status check in smb2_reconnect()
326a8d04f147e2bf393f6f9cdb74126ee6900607 cifs: do all necessary checks for credits within or before locking
99f280700b4cc02d5f141b8d15f8e9fad0418f65 cifs: fix session state check in reconnect to avoid use-after-free issue
ff7d80a9f2711bf3d9fe1cfb70b3fd15c50584b7 cifs: fix session state transition to avoid use-after-free issue
c3326a61cdbf3ce1273d9198b6cbf90965d7e029 cifs: reconnect helper should set reconnect for the right channel
d9a6d78096056a3cb5c5f07a730ab92f2f9ac4e6 cifs: force interface update before a fresh session setup
0c51cc6f2cb0108e7d49805f6e089cd85caab279 cifs: handle cases where a channel is closed
a6d8fb54a515f0546ffdb7870102b1238917e567 cifs: distribute channels across interfaces based on speed
fa1d0508bdd4a68c5e40f85f635712af8c12f180 cifs: account for primary channel in the interface list
7257bcf3bdc785eabc4eef1f329a59815b032508 cifs: cifs_chan_is_iface_active should be called with chan_lock held
78e727e58e54efca4c23863fbd9e16e9d2d83f81 cifs: update iface_last_update on each query-and-update
24a9799aa8efecd0eb55a75e35f9d8e6400063aa smb: client: fix UAF in smb2_reconnect_server()
343d7fe6df9e247671440a932b6a73af4fa86d95 smb: client: fix use-after-free of signing key
c1846893991f3b4ec8a0cc12219ada153f0814d6 cifs: update dstaddr whenever channel iface is updated
711741f94ac3cf9f4e3aa73aa171e76d188c0819 smb: client: fix potential deadlock when reconnecting channels
66d590b828b1fd9fa337047ae58fe1c4c6f43609 cifs: deal with the channel loading lag while picking channels
9d5eff7821f6d70f7d1b4d8a60680fba4de868a7 cifs: reset iface weights when we cannot find a candidate

__CHANGES NOT IN UPSTREAM________________
Adding prod certs and changed cert date to 20210620
Adding Rocky secure boot certs
Fixing vmlinuz removal
Fixing UEFI CA path
Porting to 8.10, debranding and Rocky branding
Fixing pesign_key_name values
smb: client: fix missing timestamp updates after utime(2)
mm: hugetlb: conditionally disable tlb_remove_table_sync_one() in huge_pmd_unshare()
kernel: extend rh_waived to cope better with the CVE mitigations case
Add support to rh_waived cmdline boot parameter
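
The percentages in rebuild.details.txt follow directly from the raw counts. A minimal sketch that re-derives them, using the numbers copied from the output above (the variable and function names are illustrative and are not part of the rebuild tooling):

```python
# Counts copied from rebuild.details.txt above; all names here are illustrative.
upstream_range = 567757   # commits in v4.18~1..kernel-mainline
rpm_commits = 155         # commits listed in the rpm changelog
matched = 145             # rpm commits matched with upstream
not_upstream = 10         # rpm commits not found in upstream
clean_picks = 74          # cherry-picks that applied cleanly
empty_picks = 71          # cherry-picks recorded as empty commits


def pct(part, whole):
    """Format part/whole as a percentage with two decimals."""
    return f"{100 * part / whole:.2f}%"


# Every rpm commit is either matched with upstream or not.
assert matched + not_upstream == rpm_commits
# Every matched commit is either a clean or an empty cherry-pick.
assert clean_picks + empty_picks == matched

summary = {
    "matched": pct(matched, rpm_commits),
    "not_upstream": pct(not_upstream, rpm_commits),
    "clean": pct(clean_picks, matched),
    "empty": pct(empty_picks, matched),
    "upstream_only": upstream_range - matched,
}
print(summary)
```

Note that the clean/empty percentages are relative to the 145 matched commits, not to the 155 commits in the rpm.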

BUILD

[jmaple@devbox code]$ egrep -B 5 -A 5 "\[TIMER\]|^Starting Build" $(ls -t kbuild* | head -n1)
/mnt/code/kernel-src-tree-build
Running make mrproper...
  CLEAN   scripts/basic
  CLEAN   scripts/kconfig
[TIMER]{MRPROPER}: 5s
x86_64 architecture detected, copying config
'configs/kernel-x86_64.config' -> '.config'
Setting Local Version for build
CONFIG_LOCALVERSION="-rocky8_10_rebuild-48e11f31ca38"
Making olddefconfig
--
  HOSTLD  scripts/kconfig/conf
scripts/kconfig/conf  --olddefconfig Kconfig
#
# configuration written to .config
#
Starting Build
scripts/kconfig/conf  --syncconfig Kconfig
  SYSTBL  arch/x86/include/generated/asm/syscalls_32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_32_ia32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_64_x32.h
  SYSTBL  arch/x86/include/generated/asm/syscalls_64.h
--
  LD [M]  sound/usb/usx2y/snd-usb-usx2y.ko
  LD [M]  sound/virtio/virtio_snd.ko
  LD [M]  sound/x86/snd-hdmi-lpe-audio.ko
  LD [M]  sound/xen/snd_xen_front.ko
  LD [M]  virt/lib/irqbypass.ko
[TIMER]{BUILD}: 1439s
Making Modules
  INSTALL arch/x86/crypto/blowfish-x86_64.ko
  INSTALL arch/x86/crypto/camellia-aesni-avx-x86_64.ko
  INSTALL arch/x86/crypto/camellia-aesni-avx2.ko
  INSTALL arch/x86/crypto/camellia-x86_64.ko
--
  INSTALL sound/virtio/virtio_snd.ko
  INSTALL sound/x86/snd-hdmi-lpe-audio.ko
  INSTALL sound/xen/snd_xen_front.ko
  INSTALL virt/lib/irqbypass.ko
  DEPMOD  4.18.0-rocky8_10_rebuild-48e11f31ca38+
[TIMER]{MODULES}: 14s
Making Install
sh ./arch/x86/boot/install.sh 4.18.0-rocky8_10_rebuild-48e11f31ca38+ arch/x86/boot/bzImage \
        System.map "/boot"
[TIMER]{INSTALL}: 21s
Checking kABI
kABI check passed
Setting Default Kernel to /boot/vmlinuz-4.18.0-rocky8_10_rebuild-48e11f31ca38+ and Index to 2
Hopefully Grub2.0 took everything ... rebooting after time metrices
[TIMER]{MRPROPER}: 5s
[TIMER]{BUILD}: 1439s
[TIMER]{MODULES}: 14s
[TIMER]{INSTALL}: 21s
[TIMER]{TOTAL} 1485s
Rebooting in 10 seconds
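
The per-phase timers can be cross-checked against the reported total. A small sketch that parses the `[TIMER]` lines copied from the log above (the regular expression and function name are assumptions made for this example; the build script that prints these lines is not shown here):

```python
import re

# Timer lines copied from the build log above.
LOG = """\
[TIMER]{MRPROPER}: 5s
[TIMER]{BUILD}: 1439s
[TIMER]{MODULES}: 14s
[TIMER]{INSTALL}: 21s
[TIMER]{TOTAL} 1485s
"""

# The TOTAL line has no colon in the log, so the colon is optional here.
TIMER_RE = re.compile(r"^\[TIMER\]\{(\w+)\}:?\s*(\d+)s$", re.MULTILINE)


def parse_timers(text):
    """Map each timer name to its duration in seconds."""
    return {name: int(secs) for name, secs in TIMER_RE.findall(text)}


timers = parse_timers(LOG)
phase_sum = sum(secs for name, secs in timers.items() if name != "TOTAL")
print(phase_sum, timers["TOTAL"], timers["TOTAL"] - phase_sum)
```

The four phases add up to 1479s, 6s less than the reported total. The log does not say what the remaining time covers.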

KSelfTest

[jmaple@devbox code]$ ~/workspace/auto_kernel_history_rebuild/Rocky10/rocky10/code/get_kselftest_diff.sh
kselftest.4.18.0-rocky8_10_rebuild-9646b4b50868+.log
207
kselftest.4.18.0-rocky8_10_rebuild-baea35f64da5+.log
207
kselftest.4.18.0-rocky8_10_rebuild-99b4f48215a2+.log
207
kselftest.4.18.0-rocky8_10_rebuild-48e11f31ca38+.log
207
Before: kselftest.4.18.0-rocky8_10_rebuild-99b4f48215a2+.log
After: kselftest.4.18.0-rocky8_10_rebuild-48e11f31ca38+.log
Diff:
No differences found.
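
The contents of get_kselftest_diff.sh are not shown here. As a rough sketch of the kind of before/after comparison it reports, the following assumes TAP-style result lines such as `ok 1 selftests: timers: posix_timers` and compares the sets of passing tests. The regular expression, function names and sample log lines are all assumptions made for illustration:

```python
import re

# Assumed TAP-style result lines, e.g. "ok 1 selftests: timers: posix_timers".
RESULT_RE = re.compile(r"^(ok|not ok) \d+ (selftests: .+?)(?: #.*)?$")


def passed_tests(log_text):
    """Return the set of test names reported as 'ok' in a kselftest log."""
    passed = set()
    for line in log_text.splitlines():
        match = RESULT_RE.match(line.strip())
        if match and match.group(1) == "ok":
            passed.add(match.group(2))
    return passed


def diff_passed(before_text, after_text):
    """Return (tests no longer passing, tests newly passing)."""
    before, after = passed_tests(before_text), passed_tests(after_text)
    return sorted(before - after), sorted(after - before)


# Made-up sample logs, not taken from the runs above.
before_log = """\
ok 1 selftests: timers: posix_timers
not ok 2 selftests: timers: nanosleep # exit=1
ok 3 selftests: net: veth.sh
"""
after_log = """\
ok 1 selftests: timers: posix_timers
ok 2 selftests: timers: nanosleep
not ok 3 selftests: net: veth.sh # exit=1
"""
lost, gained = diff_passed(before_log, after_log)
print("no longer passing:", lost)
print("newly passing:", gained)
```

Comparing sets of names rather than only counts catches the case where the totals match (207 in each log above) but a different set of tests passes.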

jira LE-4669
cve CVE-2023-53226
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Polaris Pi <[email protected]>
commit 1195852

Make sure mwifiex_process_mgmt_packet(),
mwifiex_process_sta_rx_packet(), mwifiex_process_uap_rx_packet(),
mwifiex_uap_queue_bridged_pkt() and mwifiex_process_rx_packet() do not
access the skb->data buffer out of bounds.

Fixes: 2dbaf75 ("mwifiex: report received management frames to cfg80211")
	Signed-off-by: Polaris Pi <[email protected]>
	Reviewed-by: Matthew Wang <[email protected]>
	Reviewed-by: Brian Norris <[email protected]>
	Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
(cherry picked from commit 1195852)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-4669
cve CVE-2023-53226
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Polaris Pi <[email protected]>
commit 2785851

Add missed return in mwifiex_uap_queue_bridged_pkt() and
mwifiex_process_rx_packet().

Fixes: 1195852 ("wifi: mwifiex: Fix OOB and integer underflow when rx packets")
	Signed-off-by: Polaris Pi <[email protected]>
	Reported-by: Dmitry Antipov <[email protected]>
	Acked-by: Brian Norris <[email protected]>
	Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
(cherry picked from commit 2785851)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-4669
cve CVE-2023-53226
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Pin-yen Lin <[email protected]>
commit aef7a03

Only skip the code path trying to access the rfc1042 headers when the
buffer is too small, so the driver can still process packets without
rfc1042 headers.

Fixes: 1195852 ("wifi: mwifiex: Fix OOB and integer underflow when rx packets")
	Signed-off-by: Pin-yen Lin <[email protected]>
	Acked-by: Brian Norris <[email protected]>
	Reviewed-by: Matthew Wang <[email protected]>
	Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
(cherry picked from commit aef7a03)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-4669
cve CVE-2023-53257
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Johannes Berg <[email protected]>
commit 19e4a47

Before checking the action code, check that it even
exists in the frame.

	Reported-by: [email protected]
	Signed-off-by: Johannes Berg <[email protected]>
(cherry picked from commit 19e4a47)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Zhang Yi <[email protected]>
commit 3fcc2b8
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.82.1.el8_10/3fcc2b88.failed

Refactor and cleanup ext4_da_map_blocks(), reduce some unnecessary
parameters and branches, no logic changes.

	Signed-off-by: Zhang Yi <[email protected]>
	Reviewed-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
	Signed-off-by: Theodore Ts'o <[email protected]>
(cherry picked from commit 3fcc2b8)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/ext4/inode.c
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Zhang Yi <[email protected]>
commit acf795d
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.82.1.el8_10/acf795dc.failed

ext4_da_map_blocks() only holds i_data_sem in shared mode, together with
i_rwsem, when inserting delalloc extents, so it can race with another
querying path of ext4_map_blocks() that does not hold i_rwsem, e.g. the
buffered read path. Suppose we do a buffered read of a file containing
just a hole, with no cached extent tree, and it races with a delayed
buffered write to the same area or to a nearby area belonging to the
same hole. The new delalloc extent could then be overwritten by a hole
extent.

 pread()                           pwrite()
  filemap_read_folio()
   ext4_mpage_readpages()
    ext4_map_blocks()
     down_read(i_data_sem)
     ext4_ext_determine_hole()
     //find hole
     ext4_ext_put_gap_in_cache()
      ext4_es_find_extent_range()
      //no delalloc extent
                                    ext4_da_map_blocks()
                                     down_read(i_data_sem)
                                     ext4_insert_delayed_block()
                                     //insert delalloc extent
      ext4_es_insert_extent()
      //overwrite delalloc extent to hole

This race could lead to an inconsistent delalloc extent tree and an
incorrect reserved space counter. Fix this by holding i_data_sem in
exclusive mode when adding a new delalloc extent in
ext4_da_map_blocks().

	Cc: [email protected]
	Signed-off-by: Zhang Yi <[email protected]>
	Suggested-by: Jan Kara <[email protected]>
	Reviewed-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
	Signed-off-by: Theodore Ts'o <[email protected]>
(cherry picked from commit acf795d)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/ext4/inode.c
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Zhang Yi <[email protected]>
commit 8e4e5cd
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.82.1.el8_10/8e4e5cdf.failed

Factor out a new common helper, ext4_map_query_blocks(), from
ext4_da_map_blocks(). It queries and returns the extent map status on
the inode's extent path. No logic changes.

	Signed-off-by: Zhang Yi <[email protected]>
	Reviewed-by: Jan Kara <[email protected]>
	Reviewed-by: Ritesh Harjani (IBM) <[email protected]>
Link: https://patch.msgid.link/[email protected]
	Signed-off-by: Theodore Ts'o <[email protected]>
(cherry picked from commit 8e4e5cd)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/ext4/inode.c
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Zhang Yi <[email protected]>
commit 0ea6560
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.82.1.el8_10/0ea6560a.failed

ext4_da_map_blocks() looks up any extent entry in the extent status
tree (without i_data_sem) and then looks up any on-disk extent mapping
(with i_data_sem in read mode).

If it finds a hole in the extent status tree or if it couldn't find any
entry at all, it then takes the i_data_sem in write mode to add a da
entry into the extent status tree. This can actually race with page
mkwrite & fallocate path.

Note that this is ok between
1. ext4 buffered-write path v/s ext4_page_mkwrite(), because of the
   folio lock
2. ext4 buffered write path v/s ext4 fallocate because of the inode
   lock.

However, ext4_page_mkwrite() can race with the ext4 fallocate path:

ext4_page_mkwrite()             ext4_fallocate()
 block_page_mkwrite()
  ext4_da_map_blocks()
   //find hole in extent status tree
                                 ext4_alloc_file_blocks()
                                  ext4_map_blocks()
                                   //allocate block and unwritten extent
   ext4_insert_delayed_block()
    ext4_da_reserve_space()
     //reserve one more block
    ext4_es_insert_delayed_block()
     //drop unwritten extent and add delayed extent by mistake

Then the delalloc extent is wrong until writeback, the extra reserved
block can no longer be released, and it triggers the warning below:

 EXT4-fs (pmem2): Inode 13 (00000000bbbd4d23): i_reserved_data_blocks(1) not cleared!

Fix the problem by looking up extent status tree again while the
i_data_sem is held in write mode. If it still can't find any entry, then
we insert a new da entry into the extent status tree.

	Cc: [email protected]
	Signed-off-by: Zhang Yi <[email protected]>
	Reviewed-by: Jan Kara <[email protected]>
Link: https://patch.msgid.link/[email protected]
	Signed-off-by: Theodore Ts'o <[email protected]>
(cherry picked from commit 0ea6560)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/ext4/inode.c
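
The fix described for commit 0ea6560 is a check-again-under-the-lock pattern. The sketch below is a toy Python model of that pattern, not the kernel code: the class, the function names and the `before_lock` test hook are all hypothetical, and a plain mutex stands in for i_data_sem held in write mode.

```python
import threading


class ExtentStatusTree:
    """Toy stand-in for the extent status tree: block number -> state."""

    def __init__(self):
        self.entries = {}
        self.sem = threading.Lock()   # stands in for i_data_sem (write mode)
        self.reserved = 0             # stands in for i_reserved_data_blocks


def da_map_block(tree, blk, before_lock=None):
    # The first lookup happens without the lock, as in the description.
    if blk in tree.entries:
        return tree.entries[blk]
    if before_lock is not None:
        before_lock()                 # test hook: another path runs in the window
    with tree.sem:
        # Look up again under the lock; another path may have mapped the block.
        if blk in tree.entries:
            return tree.entries[blk]
        tree.reserved += 1            # reserve space only if still unmapped
        tree.entries[blk] = "delayed"
        return "delayed"


def fallocate_block(tree, blk):
    """Models the fallocate path allocating an unwritten extent."""
    with tree.sem:
        tree.entries[blk] = "unwritten"


tree = ExtentStatusTree()
# Force the fallocate path to run between the first lookup and the lock.
state = da_map_block(tree, 7, before_lock=lambda: fallocate_block(tree, 7))
print(state, tree.reserved)
```

With the second lookup in place, the unwritten extent survives and no extra block is reserved. Without it, the entry would be replaced by a delayed one and `reserved` would be incremented, which corresponds to the leaked reservation behind the warning quoted above.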
…ce()

jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Zhang Yi <[email protected]>
commit 53ce42a

When removing space, we should use EXT4_EX_NOCACHE because we don't
need to cache extents, and we should also use EXT4_EX_NOFAIL to prevent
metadata inconsistencies that may arise from memory allocation failures.
While ext4_ext_remove_space() already uses these two flags in most
places, they are missing in ext4_ext_search_right() and
read_extent_tree_block() calls. Unify the flags to ensure consistent
behavior throughout the extent removal process.

	Signed-off-by: Zhang Yi <[email protected]>
Link: https://patch.msgid.link/[email protected]
	Signed-off-by: Theodore Ts'o <[email protected]>
(cherry picked from commit 53ce42a)
	Signed-off-by: Jonathan Maple <[email protected]>
…teback

jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Zhang Yi <[email protected]>
commit 402e38e
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.82.1.el8_10/402e38e6.failed

Currently, in the I/O writeback path, ext4_map_blocks() may attempt to
cache additional unrelated extents in the extent status tree without
holding the inode's i_rwsem and the mapping's invalidate_lock. This can
lead to stale extent status entries remaining in certain scenarios,
potentially causing data corruption.

For example, when performing a collapse range in ext4_collapse_range(),
it clears the extent cache and dirty pages before removing blocks and
shifting extents. It also holds the i_data_sem during these two
operations. However, both ext4_ext_remove_space() and
ext4_ext_shift_extents() may briefly release the i_data_sem if journal
credits are insufficient (ext4_datasem_ensure_credits()). If another
writeback process writes dirty pages from other regions during this
interval, it may cache extents that are about to be modified. Unless
ext4_collapse_range() explicitly clears the extent cache again, these
cached entries can become stale and inconsistent with the actual
extents.

     0 a  n       b      c         m
     | |  |       |      |         |
    [www][wwwwww][wwwwwwww]...[wwwww][wwww]...
          |                           |
          N                           M

Assume that block a is dirty. The collapse range operation is removing
data from n to m and drops i_data_sem immediately after removing the
extent from b to c. At the same time, a concurrent writeback begins to
write back block a; it reloads the extent [n, b) into the extent status
tree since it does not hold the i_rwsem or the invalidate_lock. After
the collapse range operation, the stale extent [n, b) is left behind.
It maps logical block n to N, but the actual physical block of n
should be M.

Similarly, both ext4_insert_range() and ext4_truncate() have the same
problem. ext4_punch_hole() is not affected because it re-adds a hole
extent entry after removing space, since commit 9f11182 ("ext4: add a
hole extent entry in cache after punch").

In most cases, during dirty page writeback, the block mapping
information is likely to be found in the extent cache, making it less
necessary to search for physical extents. Consequently, loading
unrelated extent caches during writeback appears to be ineffective.
Therefore, fix this by adding EXT4_EX_NOCACHE in the writeback path to
prevent caching of unrelated extents, eliminating this potential source
of corruption.

	Signed-off-by: Zhang Yi <[email protected]>
Link: https://patch.msgid.link/[email protected]
	Signed-off-by: Theodore Ts'o <[email protected]>
(cherry picked from commit 402e38e)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/ext4/fast_commit.c
#	fs/ext4/inode.c
jira LE-4669
cve CVE-2025-39864
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Dmitry Antipov <[email protected]>
commit 26e8444

Following bss_free() quirk introduced in commit 776b358
("cfg80211: track hidden SSID networks properly"), adjust
cfg80211_update_known_bss() to free the last beacon frame
elements only if they're not shared via the corresponding
'hidden_beacon_bss' pointer.

	Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=30754ca335e6fb7e3092
Fixes: 3ab8227 ("cfg80211: refactor cfg80211_bss_update")
	Signed-off-by: Dmitry Antipov <[email protected]>
Link: https://patch.msgid.link/[email protected]
	Signed-off-by: Johannes Berg <[email protected]>
(cherry picked from commit 26e8444)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Andreas Gruenbacher <[email protected]>
commit 703a4af

Move gfs2_log_pointers_init to recovery.c: there is no need for inlining
this function.

	Signed-off-by: Andreas Gruenbacher <[email protected]>
(cherry picked from commit 703a4af)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Bob Peterson <[email protected]>
commit 9d9b160

We must not call gfs2_consist (which does a file system withdraw) from
the freeze glock's freeze_go_xmote_bh function because the withdraw
will try to use the freeze glock, thus causing a glock recursion error.

This patch changes freeze_go_xmote_bh to call function
gfs2_assert_withdraw_delayed instead of gfs2_consist to avoid recursion.

	Signed-off-by: Bob Peterson <[email protected]>
(cherry picked from commit 9d9b160)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Andreas Gruenbacher <[email protected]>
commit 8a43d21

Move the initialization of sdp->sd_log_sequence and
sdp->sd_log_flush_head inside gfs2_log_pointers_init().  Use
gfs2_replay_incr_blk().

Before this change, the log head lookup code in freeze_go_xmote_bh()
didn't update sdp->sd_log_flush_head.  This is now fixed, but the code
in freeze_go_xmote_bh() appears to be pretty useless in the first place:
on a frozen filesystem, the log head will not change.

	Signed-off-by: Andreas Gruenbacher <[email protected]>
(cherry picked from commit 8a43d21)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Andreas Gruenbacher <[email protected]>
commit 2ebb94a

In function clean_journal(), update @head to point at the log header
that indicates successful recovery:  this is where logging needs to
resume.

	Signed-off-by: Andreas Gruenbacher <[email protected]>
(cherry picked from commit 2ebb94a)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Andreas Gruenbacher <[email protected]>
commit b66f723

In gfs2_make_fs_rw(), make sure to call gfs2_consist() to report an
inconsistency and mark the filesystem as withdrawn when
gfs2_find_jhead() fails.

At the end of gfs2_make_fs_rw(), when we discover that the filesystem
has been withdrawn, make sure we report an error.  This also replaces
the gfs2_withdrawn() check after gfs2_find_jhead().

	Reported-by: Tetsuo Handa <[email protected]>
	Cc: [email protected]
	Signed-off-by: Andreas Gruenbacher <[email protected]>
(cherry picked from commit b66f723)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Andreas Gruenbacher <[email protected]>
commit 93bd5ed

Currently at mount time, the recovery code looks up the current log head
and, if necessary, replays the log and writes a recovery header to
indicate that the log is clean.  It does that for each log that may need
recovery.  We also know that our own log will always be checked as part
of that process.  Then, the mount code looks up the log head of our own
log again.

The double log head lookup can be costly, but more importantly, it is
unnecessary because we can trivially compute the position of the log
head after recovery; all we need to do for that is bump the position and
lh_sequence by one when writing a recovery header.

With that in mind, move the call to gfs2_log_pointers_init() into
gfs2_recover_func() and get rid of the double lookup in
gfs2_make_fs_rw().

	Signed-off-by: Andreas Gruenbacher <[email protected]>
(cherry picked from commit 93bd5ed)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Bob Peterson <[email protected]>
commit f5456b5
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.82.1.el8_10/f5456b5d.failed

Before this patch, the system ail lists were cleaned up if the logd
process withdrew, but on other withdraws, they were not cleaned up.
This included the cleaning up of the revokes as well.

This patch reorganizes things a bit so that all withdraws (not just logd)
clean up the ail lists, including any pending revokes.

	Signed-off-by: Bob Peterson <[email protected]>
	Signed-off-by: Andreas Gruenbacher <[email protected]>
(cherry picked from commit f5456b5)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/gfs2/log.h
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Andreas Gruenbacher <[email protected]>
commit e320050
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.82.1.el8_10/e320050e.failed

We are no longer calling gfs2_find_jhead() on the same log twice, so
there is no more reason for keeping the log contents cached across those
calls.  In addition, log head lookup and log header writing didn't go
through the same address space and so the caching wasn't even fully
working, anyway.

	Signed-off-by: Andreas Gruenbacher <[email protected]>
(cherry picked from commit e320050)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/gfs2/lops.h
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Vasily Averin <[email protected]>
commit 961c613

The veth netdevice defines its own rx queues and allocates an array
containing up to 4095 'struct veth_rq' elements of ~750 bytes each.
Such an allocation is quite large and should be accounted to memcg.

	Signed-off-by: Vasily Averin <[email protected]>
	Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 961c613)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Jakub Kicinski <[email protected]>
commit 1ce7d30

struct veth_rq is pretty large, 832B total without debug
options enabled. Since the commit under Fixes we try to pre-allocate
enough queues for every possible CPU. Miao Wang reports that
this may lead to order-5 allocations which will fail in production.

Let the allocation fall back to vmalloc() and try harder.
These are the same flags we pass to netdev queue allocation.

Reported-and-tested-by: Miao Wang <[email protected]>
Fixes: 9d3684c ("veth: create by default nr_possible_cpus queues")
Link: https://lore.kernel.org/all/[email protected]/
	Signed-off-by: Jakub Kicinski <[email protected]>
	Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
	Signed-off-by: Paolo Abeni <[email protected]>
(cherry picked from commit 1ce7d30)
	Signed-off-by: Jonathan Maple <[email protected]>
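
To see where the order-5 figure comes from, here is a back-of-the-envelope sketch. It assumes 4 KiB pages and the 832-byte struct size quoted in the commit message, and it ignores allocator internals. The function names are illustrative only.

```python
PAGE_SIZE = 4096      # assumption: 4 KiB pages
VETH_RQ_SIZE = 832    # bytes per struct veth_rq, as quoted in the commit message


def alloc_order(nbytes, page_size=PAGE_SIZE):
    """Smallest order such that 2**order contiguous pages hold nbytes."""
    pages = max(1, -(-nbytes // page_size))   # ceiling division
    return (pages - 1).bit_length()


def rq_array_order(n_queues):
    """Page order needed for a contiguous array of n_queues veth_rq structs."""
    return alloc_order(n_queues * VETH_RQ_SIZE)


for n in (16, 78, 79, 128):
    print(n, n * VETH_RQ_SIZE, rq_array_order(n))
```

Under these assumptions, 78 queues still fit in an order-4 allocation (64 KiB), while 79 or more need order 5 (128 KiB of physically contiguous memory), which is why a vmalloc() fallback helps on machines with many possible CPUs.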
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Steve French <[email protected]>
commit 65af8f0

Applications that create, extend and write to a file do not
expect to see an allocation size of 0.  When a file is extended,
set its allocation size to a plausible value until we have a
chance to query the server for it.  When the file is cached
this will prevent showing an impossible number of allocated
blocks (like 0).  This fixes e.g. xfstests 614 which does

    1) create a file and set its size to 64K
    2) mmap write 64K to the file
    3) stat -c %b for the file (to query the number of allocated blocks)

It was failing because we returned 0 blocks.  Even though we would
return the correct cached file size, we returned an impossible
allocation size.

	Signed-off-by: Steve French <[email protected]>
CC: <[email protected]>
	Reviewed-by: Aurelien Aptel <[email protected]>
(cherry picked from commit 65af8f0)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Shyam Prasad N <[email protected]>
commit 45a4546

For AES256 encryption (GCM and CCM), we need to adjust the size of a few
fields to 32 bytes instead of 16 to accommodate the larger keys.

Also, the L value supplied to the key generator needs to be changed from
128 to 256 when these algorithms are used.
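
A minimal sketch of the size selection described above, assuming that L is the derived key length in bits (so 16-byte keys correspond to L = 128). The enum and function names are hypothetical and only illustrate the mapping.

```c
#include <stddef.h>

/* Hypothetical cipher identifiers, for illustration only. */
enum cipher_sketch { AES128_CCM, AES128_GCM, AES256_CCM, AES256_GCM };

/* Key size in bytes: 32 for the AES256 variants, 16 otherwise. */
static inline size_t enc_key_size(enum cipher_sketch c)
{
	return (c == AES256_CCM || c == AES256_GCM) ? 32 : 16;
}

/* L value for the key generator, assumed to be the key length in bits. */
static inline unsigned int kdf_l_bits(enum cipher_sketch c)
{
	return (unsigned int)enc_key_size(c) * 8;
}
```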

Keeping the ioctl struct for dumping keys of the same size for now.
Will send out a different patch for that one.

	Signed-off-by: Shyam Prasad N <[email protected]>
	Reviewed-by: Ronnie Sahlberg <[email protected]>
CC: <[email protected]> # v5.10+
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit 45a4546)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Shyam Prasad N <[email protected]>
commit d1a931c

We needed a way to identify the channels under the smb session
which are in reconnect, so that the traffic to other channels
can continue. So I replaced the bool need_reconnect with
a bitmask identifying all the channels that need reconnection
(named chans_need_reconnect). When a channel needs reconnection,
the bit corresponding to the index of the server in ses->chans
is used to set this bitmask. Checking if no channels or all
the channels need reconnect then becomes very easy.

Also wrote some helper macros for checking and setting the bits.
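
The bitmask idea can be sketched in user-space C as follows. The struct and macro names here are illustrative stand-ins, not the helpers added by the patch, and the sketch assumes the channel count is smaller than the number of bits in an unsigned long.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-session state described above. */
struct ses_sketch {
	size_t chan_count;                  /* number of channels in use */
	unsigned long chans_need_reconnect; /* bit i set: channel i needs reconnect */
};

/* Illustrative helper names; not the macros from the patch. */
#define CHAN_SET_NEED_RECONNECT(ses, idx) \
	((ses)->chans_need_reconnect |= (1UL << (idx)))
#define CHAN_CLEAR_NEED_RECONNECT(ses, idx) \
	((ses)->chans_need_reconnect &= ~(1UL << (idx)))
#define CHAN_NEEDS_RECONNECT(ses, idx) \
	((((ses)->chans_need_reconnect) >> (idx)) & 1UL)

/* True when no channel is marked for reconnect. */
static inline bool no_chans_need_reconnect(const struct ses_sketch *ses)
{
	return ses->chans_need_reconnect == 0;
}

/* True when every channel in use is marked for reconnect. */
static inline bool all_chans_need_reconnect(const struct ses_sketch *ses)
{
	unsigned long all = (1UL << ses->chan_count) - 1;

	return ses->chans_need_reconnect == all;
}
```

With this layout, the "no channels" and "all channels" checks are each a single integer comparison, which is the simplification the message refers to.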

	Signed-off-by: Shyam Prasad N <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit d1a931c)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Shyam Prasad N <[email protected]>
commit f486ef8

We use the concept of "binding" when one of the secondary channels
is in the process of connecting/reconnecting to the server. Till this
binding process completes, and the channel is bound to an existing session,
we redirect traffic from other established channels on the binding channel,
effectively blocking all traffic till individual channels get reconnected.

With my last set of commits, we can get rid of this binding serialization.
We now have a bitmap of connection states for each channel. We will use
this bitmap instead for tracking channel status.

Having a bitmap also now enables us to keep the session alive, as long
as even a single channel underneath is alive.

Unfortunately, this also meant that we need to supply the tcp connection
info for the channel during all negotiate and session setup functions.
These changes have resulted in a slightly bigger code churn.
However, I expect perf and robustness improvements in the mchan scenario
after this change.

	Signed-off-by: Shyam Prasad N <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit f486ef8)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Shyam Prasad N <[email protected]>
commit 66eb0c6

Use ses->chans_need_reconnect bitmask to print the connection
status of each channel under an SMB session.

	Signed-off-by: Shyam Prasad N <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit 66eb0c6)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Shyam Prasad N <[email protected]>
commit 2e0fa29

chan_count keeps track of the total number of channels.
Since at least the primary channel will always be connected,
this value can never go below 1. Warn if that happens.

	Signed-off-by: Shyam Prasad N <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit 2e0fa29)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Shyam Prasad N <[email protected]>
commit 183eea2
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.82.1.el8_10/183eea2e.failed

With the new per-channel bitmask for reconnect, we have an option to
reconnect the tcp session associated with the channel without reconnecting
the smb session. i.e. if there are still channels to operate on, we can
continue to use the smb session and tcon.

However, there are cases where it makes sense to reconnect the smb session
even when there are active channels underneath. For example for
SMB session expiry.

With this patch, we'll have an option to do either, and use the correct
option for specific cases.

	Signed-off-by: Shyam Prasad N <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit 183eea2)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/cifs/connect.c
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Shyam Prasad N <[email protected]>
commit 080dc5e
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.82.1.el8_10/080dc5e5.failed

While checking/updating status for tcp ses, smb ses or tcon,
we take GlobalMid_Lock. This doesn't make any sense.
Replaced it with cifs_tcp_ses_lock.

Ideally, we should take a spin lock per struct.
But since tcp ses, smb ses and tcon objects won't add up to a lot,
I think there should not be too much contention.

Also, in a few other places, these are checked without locking.
Added locking for these.

	Signed-off-by: Shyam Prasad N <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit 080dc5e)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/cifs/connect.c
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Enzo Matsumiya <[email protected]>
commit 1913e11
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.82.1.el8_10/1913e111.failed

Mount will hang if using SMB1 and DFS.

This is because every call to get_next_mid() will, unconditionally,
mark tcpStatus to CifsNeedReconnect before even establishing the
initial connect, because "reconnect" variable was not initialized.

Initializing "reconnect" to false fixes this issue.

Fixes: 220c5bc25d87 ("cifs: take cifs_tcp_ses_lock for status checks")
	Signed-off-by: Enzo Matsumiya <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit 1913e11)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/cifs/smb1ops.c
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Winston Wen <[email protected]>
commit 99f2807
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.82.1.el8_10/99f28070.failed

Don't collect exiting session in smb2_reconnect_server(), because it
will be released soon.

Note that the exiting session will stay in server->smb_ses_list until
it completes cifs_free_ipc() and logoff() and then deletes itself
from the list.

	Signed-off-by: Winston Wen <[email protected]>
	Reviewed-by: Shyam Prasad N <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit 99f2807)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/cifs/smb2pdu.c
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Winston Wen <[email protected]>
commit 66be5c4

Check the session state and skip it if it's exiting.

	Signed-off-by: Winston Wen <[email protected]>
	Reviewed-by: Shyam Prasad N <[email protected]>
	Cc: [email protected]
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit 66be5c4)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Shyam Prasad N <[email protected]>
commit ac615db

We do not log the session id in crypt_setup when a matching
session is not found. Printing the session id helps debugging
here. This change does just that.

This change also demotes this log to FYI, since it is normal to
see it during a reconnect. The same is done for a similar log
in the case of signed connections.

The plan is to have a tracepoint for this event, so that we will
be able to see this event if need be. That will be done as
another change.

	Signed-off-by: Shyam Prasad N <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit ac615db)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Winston Wen <[email protected]>
commit ff7d80a
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.82.1.el8_10/ff7d80a9.failed

We currently switch the session state to SES_EXITING without holding
cifs_tcp_ses_lock, which may lead to a use-after-free.

Consider the following execution processes:

Thread 1:
__cifs_put_smb_ses()
    spin_lock(&cifs_tcp_ses_lock)
    if (--ses->ses_count > 0)
        spin_unlock(&cifs_tcp_ses_lock)
        return
    spin_unlock(&cifs_tcp_ses_lock)
        ---> **GAP**
    spin_lock(&ses->ses_lock)
    if (ses->ses_status == SES_GOOD)
        ses->ses_status = SES_EXITING
    spin_unlock(&ses->ses_lock)

Thread 2:
cifs_find_smb_ses()
    spin_lock(&cifs_tcp_ses_lock)
    list_for_each_entry(ses, ...)
        spin_lock(&ses->ses_lock)
        if (ses->ses_status == SES_EXITING)
            spin_unlock(&ses->ses_lock)
            continue
        ...
        spin_unlock(&ses->ses_lock)
    if (ret)
        cifs_smb_ses_inc_refcount(ret)
    spin_unlock(&cifs_tcp_ses_lock)

If thread 1 is preempted in the gap and thread 2 starts executing, thread 2
will get the session, and soon thread 1 will switch the session state to
SES_EXITING and start releasing it, even though thread 2 has increased the
session's refcount and still uses it.

So switch session state under cifs_tcp_ses_lock to eliminate this gap.

	Signed-off-by: Winston Wen <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit ff7d80a)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/cifs/connect.c
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Shyam Prasad N <[email protected]>
commit 05844bd

We store the last updated time for interface list while
parsing the interfaces. This change is to just print that
info in DebugData.

	Signed-off-by: Shyam Prasad N <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit 05844bd)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-4669
cve CVE-2023-52752
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Paulo Alcantara <[email protected]>
commit d328c09

Skip SMB sessions that are being torn down
(e.g. @ses->ses_status == SES_EXITING) in cifs_debug_data_proc_show()
to avoid use-after-free in @ses.

This fixes the following GPF when reading from /proc/fs/cifs/DebugData
while mounting and umounting:

  [ 816.251274] general protection fault, probably for non-canonical
  address 0x6b6b6b6b6b6b6d81: 0000 [#1] PREEMPT SMP NOPTI
  ...
  [  816.260138] Call Trace:
  [  816.260329]  <TASK>
  [  816.260499]  ? die_addr+0x36/0x90
  [  816.260762]  ? exc_general_protection+0x1b3/0x410
  [  816.261126]  ? asm_exc_general_protection+0x26/0x30
  [  816.261502]  ? cifs_debug_tcon+0xbd/0x240 [cifs]
  [  816.261878]  ? cifs_debug_tcon+0xab/0x240 [cifs]
  [  816.262249]  cifs_debug_data_proc_show+0x516/0xdb0 [cifs]
  [  816.262689]  ? seq_read_iter+0x379/0x470
  [  816.262995]  seq_read_iter+0x118/0x470
  [  816.263291]  proc_reg_read_iter+0x53/0x90
  [  816.263596]  ? srso_alias_return_thunk+0x5/0x7f
  [  816.263945]  vfs_read+0x201/0x350
  [  816.264211]  ksys_read+0x75/0x100
  [  816.264472]  do_syscall_64+0x3f/0x90
  [  816.264750]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
  [  816.265135] RIP: 0033:0x7fd5e669d381

	Cc: [email protected]
	Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit d328c09)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Shyam Prasad N <[email protected]>
commit c3326a6
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.82.1.el8_10/c3326a61.failed

We introduced a helper function to be used by non-cifsd threads to
mark the connection for reconnect. For multichannel, when only
a particular channel needs to be reconnected, this had a bug.

This change fixes that by marking that particular channel
for reconnect.

Fixes: dca6581 ("cifs: use a different reconnect helper for non-cifsd threads")
	Cc: [email protected]
	Signed-off-by: Shyam Prasad N <[email protected]>
	Reviewed-by: Paulo Alcantara (SUSE) <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit c3326a6)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/cifs/connect.c
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Shyam Prasad N <[email protected]>
commit 6e5e64c

If the mount command has specified multichannel as a mount option,
but multichannel is found to be unsupported by the server at the time
of mount, we set chan_max to 1, which means that the user needs to
remount the share if the server starts supporting multichannel.

This change removes this reset. What it means is that if the user
specified multichannel or max_channels during mount, and at this
time, multichannel is not supported, but the server starts supporting
it at a later point, the client will be capable of scaling out the
number of channels.

	Cc: [email protected]
	Signed-off-by: Shyam Prasad N <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit 6e5e64c)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Shyam Prasad N <[email protected]>
commit d9a6d78
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.82.1.el8_10/d9a6d780.failed

During a session reconnect, it is possible that the
server moved to another physical server (happens in case
of Azure files). So at this time, force a query of server
interfaces again (in case of multichannel session), such
that the secondary channels connect to the right
IP addresses (possibly updated now).

	Cc: [email protected]
	Signed-off-by: Shyam Prasad N <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit d9a6d78)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/cifs/connect.c
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Shyam Prasad N <[email protected]>
commit 0c51cc6
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.82.1.el8_10/0c51cc6f.failed

So far, SMB multichannel could only scale up, but not
scale down the number of channels. In this series of
patches, we now allow the client to deal with the case
of multichannel disabled on the server when the share
is mounted. With that change, we now need the ability
to scale down the channels.

This change allows the client to deal with cases of
missing channels more gracefully.

	Signed-off-by: Shyam Prasad N <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit 0c51cc6)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/cifs/connect.c
#	fs/cifs/sess.c
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Shyam Prasad N <[email protected]>
commit a6d8fb5
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.82.1.el8_10/a6d8fb54.failed

Today, if the server interfaces are RSS capable, we simply
choose the fastest interface to set up a channel. This is not
a scalable approach, and does not make much of an attempt to
distribute the connections.

This change does a weighted distribution of channels across
all the available server interfaces, where the weight is
a function of the advertised interface speed.

Also make sure that we don't mix rdma and non-rdma for channels.
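
One way to picture a speed-weighted distribution is the sketch below. It assumes each interface's weight is its advertised speed divided by the slowest nonzero speed, and that per-round counters reset once every interface has met its weight. The type and function names are hypothetical, and the rdma/non-rdma separation is not modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-interface state, for illustration only. */
struct iface_sketch {
	uint64_t speed;                /* advertised link speed, bits/s */
	unsigned int weight_fulfilled; /* channels placed here this round */
};

/* Weight relative to the slowest interface; never less than 1. */
static inline unsigned int iface_weight(const struct iface_sketch *iface,
					uint64_t min_speed)
{
	uint64_t w = min_speed ? iface->speed / min_speed : 1;

	return w ? (unsigned int)w : 1;
}

/* Return the index of the interface for the next channel, or -1 if n == 0. */
static inline int pick_iface(struct iface_sketch *ifaces, size_t n)
{
	uint64_t min_speed = 0;
	size_t i;
	int pass;

	/* Find the slowest nonzero advertised speed. */
	for (i = 0; i < n; i++)
		if (ifaces[i].speed && (!min_speed || ifaces[i].speed < min_speed))
			min_speed = ifaces[i].speed;

	for (pass = 0; pass < 2; pass++) {
		for (i = 0; i < n; i++) {
			if (ifaces[i].weight_fulfilled <
			    iface_weight(&ifaces[i], min_speed)) {
				ifaces[i].weight_fulfilled++;
				return (int)i;
			}
		}
		/* Every interface met its weight: start a new round. */
		for (i = 0; i < n; i++)
			ifaces[i].weight_fulfilled = 0;
	}
	return -1;
}
```

With a 2 Gbps and a 1 Gbps interface, six successive picks under these assumptions place four channels on the faster interface and two on the slower one.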

	Signed-off-by: Shyam Prasad N <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit a6d8fb5)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/cifs/sess.c
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Shyam Prasad N <[email protected]>
commit fa1d050
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.82.1.el8_10/fa1d0508.failed

The refcounting of server interfaces should account
for the primary channel too. Although this is not
strictly necessary, doing so will account for the primary
channel in DebugData.

	Cc: [email protected]
	Reviewed-by: Paulo Alcantara (SUSE) <[email protected]>
	Signed-off-by: Shyam Prasad N <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit fa1d050)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/cifs/sess.c
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Shyam Prasad N <[email protected]>
commit 7257bcf
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.82.1.el8_10/7257bcf3.failed

cifs_chan_is_iface_active checks the channels of a session to see
if the associated iface is active. This should always happen
with chan_lock held. However, these two callers of this function
were missing this locking.

This change makes sure the function calls are protected with
proper locking.

Fixes: b54034a ("cifs: during reconnect, update interface if necessary")
Fixes: fa1d050 ("cifs: account for primary channel in the interface list")
	Cc: [email protected]
	Signed-off-by: Shyam Prasad N <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit 7257bcf)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/cifs/connect.c
#	fs/cifs/smb2ops.c
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Shyam Prasad N <[email protected]>
commit 09eeb07

parse_server_interfaces should be in complete charge of maintaining
the iface_list linked list. Today, iface entries are removed
from the list only when the last refcount is dropped,
i.e. in release_iface. However, this can result in undercounting
of refcount if the server stops advertising interfaces (which
Azure SMB server does).

This change puts parse_server_interfaces in full charge of
maintaining the iface_list. So if an empty list is returned
by the server, the entries in the list will immediately be
removed. This way, a following call to the same function will
not find entries in the list.

Fixes: aa45dad ("cifs: change iface_list from array to sorted linked list")
	Cc: [email protected]
	Signed-off-by: Shyam Prasad N <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit 09eeb07)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Shyam Prasad N <[email protected]>
commit 78e727e
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.82.1.el8_10/78e727e5.failed

iface_last_update was an unused field when it was introduced.
Later, when we had periodic update of server interface list,
this field was used regularly to decide when to update next.

However, with the new logic of updating the interfaces, it
becomes crucial that this field be updated whenever
parse_server_interfaces runs successfully.

This change updates this field when the server does not
support querying interfaces, so that we do not query
the interfaces repeatedly. It also updates the field when
the function reaches the end.

Fixes: aa45dad ("cifs: change iface_list from array to sorted linked list")
	Signed-off-by: Shyam Prasad N <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit 78e727e)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/cifs/smb2ops.c
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Shyam Prasad N <[email protected]>
commit 6aac002

After the interface selection policy change to do a weighted
round robin, each iface maintains a weight_fulfilled. When the
weight_fulfilled reaches the total weight for the iface, we know
that the weights can be reset and ifaces can be allocated from
scratch again.

During channel allocation failures on a particular channel,
weight_fulfilled is not incremented. If a few interfaces are
inactive, we could end up in a situation where the active
interfaces are all allocated for the total_weight, and inactive
ones are all that remain. This can cause a situation where
no more channels can be allocated further.

This change fixes it by increasing weight_fulfilled, even when
channel allocation failure happens. This could mean that if
there are temporary failures in channel allocation, the iface
weights may not strictly be adhered to. But that's still okay.
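
The effect of counting failed attempts can be sketched as below. The `active` flag and all names are hypothetical; the sketch only assumes that an attempt is charged against the interface's weight whether or not the channel is actually created.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-interface round state, for illustration only. */
struct iface_round {
	unsigned int weight;           /* channels this iface should get per round */
	unsigned int weight_fulfilled; /* attempts counted so far this round */
	bool active;                   /* assumed: can a channel be opened on it? */
};

/* Try to place one channel. Returns the iface index, or -1 on failure. */
static inline int try_alloc_channel(struct iface_round *ifaces, size_t n)
{
	size_t i;
	int pass;

	for (pass = 0; pass < 2; pass++) {
		for (i = 0; i < n; i++) {
			if (ifaces[i].weight_fulfilled >= ifaces[i].weight)
				continue;
			/* Charge the attempt even if it fails, so the round can end. */
			ifaces[i].weight_fulfilled++;
			return ifaces[i].active ? (int)i : -1;
		}
		/* All weights are fulfilled: reset and allocate from scratch. */
		for (i = 0; i < n; i++)
			ifaces[i].weight_fulfilled = 0;
	}
	return -1;
}
```

If the failed attempt were not counted, the inactive interface would remain the only one with unfulfilled weight and every later call would fail on it; with the increment, the next call starts a new round and reaches the active interface again.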

Fixes: a6d8fb5 ("cifs: distribute channels across interfaces based on speed")
	Signed-off-by: Shyam Prasad N <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit 6aac002)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-4669
cve CVE-2024-35870
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Paulo Alcantara <[email protected]>
commit 24a9799
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.82.1.el8_10/24a9799a.failed

The UAF bug is due to smb2_reconnect_server() accessing a session that
is already being torn down by another thread that is executing
__cifs_put_smb_ses().  This can happen when (a) the client has a
connection to the server but no session or (b) another thread ends up
setting @ses->ses_status again to something different than
SES_EXITING.

To fix this, we need to make sure to unconditionally set
@ses->ses_status to SES_EXITING and prevent any other threads from
setting a new status while we're still tearing it down.

The following can be reproduced by adding some delay right after
the ipc is freed in __cifs_put_smb_ses(), which gives the
smb2_reconnect_server() worker a chance to run and then access
@ses->ipc:

kinit ...
mount.cifs //srv/share /mnt/1 -o sec=krb5,nohandlecache,echo_interval=10
[disconnect srv]
ls /mnt/1 &>/dev/null
sleep 30
kdestroy
[reconnect srv]
sleep 10
umount /mnt/1
...
CIFS: VFS: Verify user has a krb5 ticket and keyutils is installed
CIFS: VFS: \\srv Send error in SessSetup = -126
CIFS: VFS: Verify user has a krb5 ticket and keyutils is installed
CIFS: VFS: \\srv Send error in SessSetup = -126
general protection fault, probably for non-canonical address
0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP NOPTI
CPU: 3 PID: 50 Comm: kworker/3:1 Not tainted 6.9.0-rc2 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-1.fc39
04/01/2014
Workqueue: cifsiod smb2_reconnect_server [cifs]
RIP: 0010:__list_del_entry_valid_or_report+0x33/0xf0
Code: 4f 08 48 85 d2 74 42 48 85 c9 74 59 48 b8 00 01 00 00 00 00 ad
de 48 39 c2 74 61 48 b8 22 01 00 00 00 00 74 69 <48> 8b 01 48 39 f8 75
7b 48 8b 72 08 48 39 c6 0f 85 88 00 00 00 b8
RSP: 0018:ffffc900001bfd70 EFLAGS: 00010a83
RAX: dead000000000122 RBX: ffff88810da53838 RCX: 6b6b6b6b6b6b6b6b
RDX: 6b6b6b6b6b6b6b6b RSI: ffffffffc02f6878 RDI: ffff88810da53800
RBP: ffff88810da53800 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: ffff88810c064000
R13: 0000000000000001 R14: ffff88810c064000 R15: ffff8881039cc000
FS: 0000000000000000(0000) GS:ffff888157c00000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe3728b1000 CR3: 000000010caa4000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
 <TASK>
 ? die_addr+0x36/0x90
 ? exc_general_protection+0x1c1/0x3f0
 ? asm_exc_general_protection+0x26/0x30
 ? __list_del_entry_valid_or_report+0x33/0xf0
 __cifs_put_smb_ses+0x1ae/0x500 [cifs]
 smb2_reconnect_server+0x4ed/0x710 [cifs]
 process_one_work+0x205/0x6b0
 worker_thread+0x191/0x360
 ? __pfx_worker_thread+0x10/0x10
 kthread+0xe2/0x110
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x34/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30
 </TASK>

	Cc: [email protected]
	Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit 24a9799)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/cifs/connect.c
jira LE-4669
cve CVE-2024-53179
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Paulo Alcantara <[email protected]>
commit 343d7fe
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.82.1.el8_10/343d7fe6.failed

Customers have reported use-after-free in @ses->auth_key.response with
SMB2.1 + sign mounts which occurs due to the following race:

task A                         task B
cifs_mount()
 dfs_mount_share()
  get_session()
   cifs_mount_get_session()    cifs_send_recv()
    cifs_get_smb_ses()          compound_send_recv()
     cifs_setup_session()        smb2_setup_request()
      kfree_sensitive()           smb2_calc_signature()
                                   crypto_shash_setkey() *UAF*

Fix this by ensuring that we have a valid @ses->auth_key.response by
checking whether @ses->ses_status is SES_GOOD or SES_EXITING with
@ses->ses_lock held.  After commit 24a9799 ("smb: client: fix UAF
in smb2_reconnect_server()"), we made sure to call ->logoff() only
when @ses was known to be good (e.g. valid ->auth_key.response), so
it's safe to access the signing key when @ses->ses_status == SES_EXITING.

	Cc: [email protected]
	Reported-by: Jay Shin <[email protected]>
	Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit 343d7fe)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/cifs/smb2transport.c
jira LE-4669
cve CVE-2025-21725
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Paulo Alcantara <[email protected]>
commit be7a6a7

It isn't guaranteed that NETWORK_INTERFACE_INFO::LinkSpeed will always
be set by the server, so the client must handle any value and
prevent oopses like the one below:

Oops: divide error: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 1323 Comm: cat Not tainted 6.13.0-rc7 #2
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-3.fc41
04/01/2014
RIP: 0010:cifs_debug_data_proc_show+0xa45/0x1460 [cifs]
Code: 00 00 48 89 df e8 3b cd 1b c1 41 f6 44 24 2c 04 0f 84 50 01 00 00 48
89 ef e8 e7 d0 1b c1 49 8b 44 24 18 31 d2 49 8d 7c 24 28 <48> f7 74 24 18
48 89 c3 e8 6e cf 1b c1 41 8b 6c 24 28 49 8d 7c 24
RSP: 0018:ffffc90001817be0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88811230022c RCX: ffffffffc041bd99
RDX: 0000000000000000 RSI: 0000000000000567 RDI: ffff888112300228
RBP: ffff888112300218 R08: fffff52000302f5f R09: ffffed1022fa58ac
R10: ffff888117d2c566 R11: 00000000fffffffe R12: ffff888112300200
R13: 000000012a15343f R14: 0000000000000001 R15: ffff888113f2db58
FS: 00007fe27119e740(0000) GS:ffff888148600000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe2633c5000 CR3: 0000000124da0000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
 <TASK>
 ? __die_body.cold+0x19/0x27
 ? die+0x2e/0x50
 ? do_trap+0x159/0x1b0
 ? cifs_debug_data_proc_show+0xa45/0x1460 [cifs]
 ? do_error_trap+0x90/0x130
 ? cifs_debug_data_proc_show+0xa45/0x1460 [cifs]
 ? exc_divide_error+0x39/0x50
 ? cifs_debug_data_proc_show+0xa45/0x1460 [cifs]
 ? asm_exc_divide_error+0x1a/0x20
 ? cifs_debug_data_proc_show+0xa39/0x1460 [cifs]
 ? cifs_debug_data_proc_show+0xa45/0x1460 [cifs]
 ? seq_read_iter+0x42e/0x790
 seq_read_iter+0x19a/0x790
 proc_reg_read_iter+0xbe/0x110
 ? __pfx_proc_reg_read_iter+0x10/0x10
 vfs_read+0x469/0x570
 ? do_user_addr_fault+0x398/0x760
 ? __pfx_vfs_read+0x10/0x10
 ? find_held_lock+0x8a/0xa0
 ? __pfx_lock_release+0x10/0x10
 ksys_read+0xd3/0x170
 ? __pfx_ksys_read+0x10/0x10
 ? __rcu_read_unlock+0x50/0x270
 ? mark_held_locks+0x1a/0x90
 do_syscall_64+0xbb/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fe271288911
Code: 00 48 8b 15 01 25 10 00 f7 d8 64 89 02 b8 ff ff ff ff eb bd e8
20 ad 01 00 f3 0f 1e fa 80 3d b5 a7 10 00 00 74 13 31 c0 0f 05 <48> 3d
00 f0 ff ff 77 4f c3 66 0f 1f 44 00 00 55 48 89 e5 48 83 ec
RSP: 002b:00007ffe87c079d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 0000000000040000 RCX: 00007fe271288911
RDX: 0000000000040000 RSI: 00007fe2633c6000 RDI: 0000000000000003
RBP: 00007ffe87c07a00 R08: 0000000000000000 R09: 00007fe2713e6380
R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000040000
R13: 00007fe2633c6000 R14: 0000000000000003 R15: 0000000000000000
 </TASK>

Fix this by setting cifs_server_iface::speed to a sane value (1Gbps)
by default when link speed is unset.
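
A minimal sketch of the defaulting described above, so that a later division by the interface speed cannot divide by zero. The macro and function names are hypothetical; only the 1Gbps default comes from the message.

```c
#include <stdint.h>

/* 1 Gbps, the default named in the commit message. */
#define DEFAULT_IFACE_SPEED_BPS 1000000000ULL

/* Substitute the default when the server left the link speed unset (0). */
static inline uint64_t iface_speed_or_default(uint64_t link_speed)
{
	return link_speed ? link_speed : DEFAULT_IFACE_SPEED_BPS;
}

/*
 * Hypothetical consumer: ratio of one speed to another. The divisor is
 * passed through the default first, so it is never zero.
 */
static inline uint64_t speed_ratio(uint64_t speed, uint64_t min_speed)
{
	return iface_speed_or_default(speed) / iface_speed_or_default(min_speed);
}
```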

	Cc: Shyam Prasad N <[email protected]>
	Cc: Tom Talpey <[email protected]>
Fixes: a6d8fb5 ("cifs: distribute channels across interfaces based on speed")
	Reported-by: Frank Sorenson <[email protected]>
	Reported-by: Jay Shin <[email protected]>
	Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit be7a6a7)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Shyam Prasad N <[email protected]>
commit c184689
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.82.1.el8_10/c1846893.failed

When the server interface info changes (more common in clustered
servers like Azure Files), the per-channel iface gets updated.
However, this did not update the corresponding dstaddr. As a result
these channels will still connect (or try connecting) to older addresses.

Fixes: b54034a ("cifs: during reconnect, update interface if necessary")
	Cc: <[email protected]>
	Signed-off-by: Shyam Prasad N <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit c184689)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/cifs/sess.c
jira LE-4669
cve CVE-2025-38244
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Paulo Alcantara <[email protected]>
commit 711741f
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.82.1.el8_10/711741f9.failed

Fix cifs_signal_cifsd_for_reconnect() to take the correct lock order
and prevent the following deadlock from happening:

======================================================
WARNING: possible circular locking dependency detected
6.16.0-rc3-build2+ #1301 Tainted: G S      W
------------------------------------------------------
cifsd/6055 is trying to acquire lock:
ffff88810ad56038 (&tcp_ses->srv_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x134/0x200

but task is already holding lock:
ffff888119c64330 (&ret_buf->chan_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0xcf/0x200

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&ret_buf->chan_lock){+.+.}-{3:3}:
       validate_chain+0x1cf/0x270
       __lock_acquire+0x60e/0x780
       lock_acquire.part.0+0xb4/0x1f0
       _raw_spin_lock+0x2f/0x40
       cifs_setup_session+0x81/0x4b0
       cifs_get_smb_ses+0x771/0x900
       cifs_mount_get_session+0x7e/0x170
       cifs_mount+0x92/0x2d0
       cifs_smb3_do_mount+0x161/0x460
       smb3_get_tree+0x55/0x90
       vfs_get_tree+0x46/0x180
       do_new_mount+0x1b0/0x2e0
       path_mount+0x6ee/0x740
       do_mount+0x98/0xe0
       __do_sys_mount+0x148/0x180
       do_syscall_64+0xa4/0x260
       entry_SYSCALL_64_after_hwframe+0x76/0x7e

-> #1 (&ret_buf->ses_lock){+.+.}-{3:3}:
       validate_chain+0x1cf/0x270
       __lock_acquire+0x60e/0x780
       lock_acquire.part.0+0xb4/0x1f0
       _raw_spin_lock+0x2f/0x40
       cifs_match_super+0x101/0x320
       sget+0xab/0x270
       cifs_smb3_do_mount+0x1e0/0x460
       smb3_get_tree+0x55/0x90
       vfs_get_tree+0x46/0x180
       do_new_mount+0x1b0/0x2e0
       path_mount+0x6ee/0x740
       do_mount+0x98/0xe0
       __do_sys_mount+0x148/0x180
       do_syscall_64+0xa4/0x260
       entry_SYSCALL_64_after_hwframe+0x76/0x7e

-> #0 (&tcp_ses->srv_lock){+.+.}-{3:3}:
       check_noncircular+0x95/0xc0
       check_prev_add+0x115/0x2f0
       validate_chain+0x1cf/0x270
       __lock_acquire+0x60e/0x780
       lock_acquire.part.0+0xb4/0x1f0
       _raw_spin_lock+0x2f/0x40
       cifs_signal_cifsd_for_reconnect+0x134/0x200
       __cifs_reconnect+0x8f/0x500
       cifs_handle_standard+0x112/0x280
       cifs_demultiplex_thread+0x64d/0xbc0
       kthread+0x2f7/0x310
       ret_from_fork+0x2a/0x230
       ret_from_fork_asm+0x1a/0x30

other info that might help us debug this:

Chain exists of:
  &tcp_ses->srv_lock --> &ret_buf->ses_lock --> &ret_buf->chan_lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&ret_buf->chan_lock);
                               lock(&ret_buf->ses_lock);
                               lock(&ret_buf->chan_lock);
  lock(&tcp_ses->srv_lock);

 *** DEADLOCK ***

3 locks held by cifsd/6055:
 #0: ffffffff857de398 (&cifs_tcp_ses_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x7b/0x200
 #1: ffff888119c64060 (&ret_buf->ses_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x9c/0x200
 #2: ffff888119c64330 (&ret_buf->chan_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0xcf/0x200

	Cc: [email protected]
	Reported-by: David Howells <[email protected]>
Fixes: d7d7a66 ("cifs: avoid use of global locks for high contention data")
	Reviewed-by: David Howells <[email protected]>
	Tested-by: David Howells <[email protected]>
	Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]>
	Signed-off-by: David Howells <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit 711741f)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/cifs/cifsglob.h
#	fs/cifs/connect.c
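
For readers following the lockdep report in 711741f above: the recorded chain is srv_lock --> ses_lock --> chan_lock, while cifs_signal_cifsd_for_reconnect() asked for srv_lock with chan_lock already held. The Python sketch below is a rough illustration of that ordering check only. `LockOrderChecker` and its method names are hypothetical; it is neither lockdep nor the kernel fix, and it models just the "held, then acquired" edges named in the report.

```python
from collections import defaultdict


class LockOrderChecker:
    """Toy order checker: records 'held -> acquired' edges, flags inversions."""

    def __init__(self):
        # edges[a] holds every lock that was acquired while a was held
        self.edges = defaultdict(set)

    def _reachable(self, src, dst):
        """Depth-first search: is dst reachable from src via recorded edges?"""
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges[node])
        return False

    def acquire_sequence(self, locks):
        """Record a nested acquisition order; return False if it closes a cycle."""
        ok = True
        for held, new in zip(locks, locks[1:]):
            # If 'held' is already reachable from 'new', the edge held -> new
            # would complete a circular dependency.
            if self._reachable(new, held):
                ok = False
            self.edges[held].add(new)
        return ok


checker = LockOrderChecker()
# Orders taken from the dependency chain in the report.
mount_ok = checker.acquire_sequence(["srv_lock", "ses_lock"])
setup_ok = checker.acquire_sequence(["ses_lock", "chan_lock"])
# Order seen in cifs_signal_cifsd_for_reconnect() before the fix.
reconnect_ok = checker.acquire_sequence(["ses_lock", "chan_lock", "srv_lock"])
```

Here `mount_ok` and `setup_ok` are true and `reconnect_ok` is false, which mirrors the "possible circular locking dependency" warning. How the actual patch reorders the locks is in the upstream commit, not in this sketch.
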
jira LE-4669
cve CVE-2024-35999
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Steve French <[email protected]>
commit 8094a60

Coverity spotted a place where we should have been holding the
channel lock when accessing the ses channel index.

Addresses-Coverity: 1582039 ("Data race condition (MISSING_LOCK)")
	Cc: [email protected]
	Reviewed-by: Shyam Prasad N <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit 8094a60)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Shyam Prasad N <[email protected]>
commit 66d590b
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.82.1.el8_10/66d590b8.failed

Our current approach to selecting a channel for sending requests is:
1. iterate over all channels to find the min and max queue depth
2. if min and max differ, pick the channel with the min depth
3. if min and max are the same, round robin, as all channels are equally loaded

The problem with this approach is that there's a lag between selecting
a channel and sending the request (that increases the queue depth on the channel).
While these numbers will eventually catch up, there could be a skew in the
channel usage, depending on the application's I/O parallelism and the server's
speed of handling requests.

With sufficient parallelism, this lag can artificially increase the queue depth,
thereby impacting the performance negatively.

This change modifies step 1 above to start the iteration from the last
selected channel, which reduces the skew in channel usage even in the presence
of this lag.

Fixes: ea90708 ("cifs: use the least loaded channel for sending requests")
	Cc: <[email protected]>
	Signed-off-by: Shyam Prasad N <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit 66d590b)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/cifs/transport.c
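
The three selection steps described in 66d590b, with the scan rotated to begin at a moving start position, can be sketched roughly as below. This is an illustrative Python model, not the code in fs/cifs/transport.c: `pick_channel`, `in_flight`, and `start` are hypothetical names, and treating `start` as a caller-maintained rotating position is an assumption about how "the last selected channel" is tracked.

```python
def pick_channel(in_flight, start):
    """Return the index of the channel to send on.

    in_flight: list of per-channel queue depths (hypothetical model).
    start:     position where the scan begins (assumed to advance per call).
    """
    n = len(in_flight)
    best = start % n
    lo = hi = in_flight[best]

    # Step 1: scan every channel once, beginning just after the start slot.
    for step in range(1, n):
        cur = (start + step) % n
        depth = in_flight[cur]
        if depth < lo:           # strict '<' keeps the first minimum seen
            lo, best = depth, cur
        if depth > hi:
            hi = depth

    # Step 3: all channels equally loaded, so fall back to round robin.
    if lo == hi:
        return start % n
    # Step 2: otherwise use the least loaded channel found in the rotated scan.
    return best
```

With depths `[3, 1, 1]`, a scan starting at 0 returns channel 1 and a scan starting at 2 returns channel 2, so tied least-loaded channels take turns instead of the lowest index always winning. That is the skew reduction the commit message describes.
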
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Shyam Prasad N <[email protected]>
commit 9d5eff7
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.82.1.el8_10/9d5eff78.failed

We now do a weighted selection of server interfaces when allocating
new channels. The weights are decided based on the speed advertised.
The fulfilled weight for an interface is a counter that is used to
track the interface selection. It should be reset back to zero once
all interfaces have fulfilled their weight.

In cifs_chan_update_iface, this reset logic was missing. As a result
when the server interface list changes, the client may not be able
to find a new candidate for other channels after all interfaces have
been fulfilled.

Fixes: a6d8fb5 ("cifs: distribute channels across interfaces based on speed")
	Cc: <[email protected]>
	Signed-off-by: Shyam Prasad N <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit 9d5eff7)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	fs/cifs/sess.c
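
The reset that 9d5eff7 says was missing from cifs_chan_update_iface can be pictured with the small Python model below. It is a sketch under stated assumptions, not the kernel code: `Iface`, `pick_iface`, and the rule "weight = speed // slowest speed" are hypothetical stand-ins for however the client derives weights from the advertised speed, and all speeds are assumed to be non-zero.

```python
from dataclasses import dataclass


@dataclass
class Iface:
    name: str
    speed: int                 # advertised speed in bits per second (assumed > 0)
    weight_fulfilled: int = 0  # how many times this interface has been chosen


def pick_iface(ifaces):
    """Pick the next interface by weight; reset counters once all are fulfilled."""
    if not ifaces:
        raise ValueError("no server interfaces available")
    min_speed = min(i.speed for i in ifaces)

    for _ in range(2):  # second pass runs only after a reset
        for iface in ifaces:
            weight = max(1, iface.speed // min_speed)  # assumed weighting rule
            if iface.weight_fulfilled < weight:
                iface.weight_fulfilled += 1
                return iface
        # Every interface has met its weight: reset so selection can continue.
        for iface in ifaces:
            iface.weight_fulfilled = 0

    raise RuntimeError("unreachable for a non-empty interface list")


ifaces = [Iface("fast", 2_000_000_000), Iface("slow", 1_000_000_000)]
order = [pick_iface(ifaces).name for _ in range(4)]
```

In this model `order` comes out as `fast, fast, slow, fast`: the fourth pick succeeds only because the counters are reset. Without the reset loop, the fourth call would find no candidate, which matches the symptom described in the commit message.
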
jira LE-4669
Rebuild_History Non-Buildable kernel-4.18.0-553.82.1.el8_10
commit-author Shyam Prasad N <[email protected]>
commit 29954d5

My last change in this area accounted for the primary channel
in the interface ref count. However, it did not drop this ref
count on deallocation of the primary channel, i.e. during umount.

Fix this leak by dropping the ref count for the primary
channel while freeing up the session.

Fixes: fa1d050 ("cifs: account for primary channel in the interface list")
	Cc: [email protected]
	Reported-by: Paulo Alcantara <[email protected]>
	Signed-off-by: Shyam Prasad N <[email protected]>
	Signed-off-by: Steve French <[email protected]>
(cherry picked from commit 29954d5)
	Signed-off-by: Jonathan Maple <[email protected]>
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..kernel-mainline: 567757
Number of commits in rpm: 155
Number of commits matched with upstream: 145 (93.55%)
Number of commits in upstream but not in rpm: 567612
Number of commits NOT found in upstream: 10 (6.45%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.82.1.el8_10 for kernel-4.18.0-553.82.1.el8_10
Clean Cherry Picks: 74 (51.03%)
Empty Cherry Picks: 71 (48.97%)
_______________________________

Full Details Located here:
ciq/ciq_backports/kernel-4.18.0-553.82.1.el8_10/rebuild.details.txt

Includes:
* git commit header above
* Empty Commits with upstream SHA
* RPM ChangeLog Entries that could not be matched

Individual empty-commit failures are stored in the same directory.
The git message for each empty commit includes the path to its failed cherry-pick.
File names are the first 8 characters of the upstream SHA.
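
As a sanity check on the rebuild summary above, the percentages can be recomputed from the raw counts. The short Python snippet below is only a cross-check. The denominators are inferred from the numbers themselves (matched and not-found are out of the 155 rpm commits, clean and empty are out of the 145 matched commits) and are not taken from the rebuild tool's source.

```python
def pct(part: int, whole: int) -> float:
    """Percentage rounded to two decimals, as printed in the summary."""
    return round(100 * part / whole, 2)


upstream_range = 567757   # commits in v4.18~1..kernel-mainline
rpm_commits = 155         # commits listed in the rpm changelog
matched = 145             # rpm commits matched with upstream
not_found = 10            # rpm commits not found upstream
clean, empty = 74, 71     # cherry-pick outcomes for the matched commits

summary = {
    "matched_pct": pct(matched, rpm_commits),
    "not_found_pct": pct(not_found, rpm_commits),
    "clean_pct": pct(clean, matched),
    "empty_pct": pct(empty, matched),
    "upstream_not_in_rpm": upstream_range - matched,
}
```

The resulting values (93.55, 6.45, 51.03, 48.97, and 567612) agree with the figures in rebuild.details.txt.
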
@PlaidCat PlaidCat requested a review from a team November 6, 2025 22:05
@PlaidCat PlaidCat self-assigned this Nov 6, 2025

@thefossguy-ciq thefossguy-ciq left a comment

🚤

@jdieter jdieter left a comment

🚢
