Commit b3b9dc5

x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault()
jira LE-1907
cve CVE-2024-26906
Rebuild_History Non-Buildable kernel-4.18.0-553.8.1.el8_10
commit-author Hou Tao <[email protected]>
commit 32019c6
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.8.1.el8_10/32019c65.failed

When trying to use copy_from_kernel_nofault() to read vsyscall page
through a bpf program, the following oops was reported:

  BUG: unable to handle page fault for address: ffffffffff600000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 3231067 P4D 3231067 PUD 3233067 PMD 3235067 PTE 0
  Oops: 0000 [#1] PREEMPT SMP PTI
  CPU: 1 PID: 20390 Comm: test_progs ...... 6.7.0+ #58
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ......
  RIP: 0010:copy_from_kernel_nofault+0x6f/0x110
  ......
  Call Trace:
   <TASK>
   ? copy_from_kernel_nofault+0x6f/0x110
   bpf_probe_read_kernel+0x1d/0x50
   bpf_prog_2061065e56845f08_do_probe_read+0x51/0x8d
   trace_call_bpf+0xc5/0x1c0
   perf_call_bpf_enter.isra.0+0x69/0xb0
   perf_syscall_enter+0x13e/0x200
   syscall_trace_enter+0x188/0x1c0
   do_syscall_64+0xb5/0xe0
   entry_SYSCALL_64_after_hwframe+0x6e/0x76
   </TASK>
   ......
   ---[ end trace 0000000000000000 ]---

The oops is triggered when:

1) A bpf program uses bpf_probe_read_kernel() to read from the vsyscall
   page and invokes copy_from_kernel_nofault() which in turn calls
   __get_user_asm().

2) Because the vsyscall page address is not readable from kernel space,
   a page fault exception is triggered accordingly.

3) handle_page_fault() considers the vsyscall page address as a user
   space address instead of a kernel space address. This results in the
   fix-up setup by bpf not being applied and a page_fault_oops() is invoked
   due to SMAP.

Considering handle_page_fault() has already considered the vsyscall page
address as a userspace address, fix the problem by disallowing vsyscall
page read for copy_from_kernel_nofault().

Originally-by: Thomas Gleixner <[email protected]>
Reported-by: [email protected]
Closes: https://lore.kernel.org/bpf/CAG48ez06TZft=ATH1qh2c5mpS5BT8UakwNkzi6nvK5_djC-4Nw@mail.gmail.com
Reported-by: xingwei lee <[email protected]>
Closes: https://lore.kernel.org/bpf/CABOYnLynjBoFZOf3Z4BhaZkc5hx_kHfsjiW+UWLoB=w33LvScw@mail.gmail.com
Signed-off-by: Hou Tao <[email protected]>
Reviewed-by: Sohil Mehta <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
(cherry picked from commit 32019c6)
Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	arch/x86/mm/maccess.c
1 parent a5b59c0 commit b3b9dc5
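
The mismatch described in step 3, and the effect of the fix, can be sketched as a small user-space model. This is an illustration only, not kernel code: every `model_*` name is hypothetical, the constants assume x86-64 with 4-level paging, and `model_fault_in_kernel_space()` is an assumed simplification of how the fault handler classifies addresses. Only the vsyscall address itself (`ffffffffff600000`) comes from the oops above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for x86-64 with 4-level paging; not taken from this commit. */
#define MODEL_PAGE_SIZE     0x1000ULL
#define MODEL_PAGE_MASK     (~(MODEL_PAGE_SIZE - 1))
#define MODEL_TASK_SIZE_MAX ((1ULL << 47) - MODEL_PAGE_SIZE)
/* Faulting address reported in the oops. */
#define MODEL_VSYSCALL_ADDR 0xffffffffff600000ULL

/* True when vaddr lies inside the single vsyscall page. */
static bool model_is_vsyscall(uint64_t vaddr)
{
	return (vaddr & MODEL_PAGE_MASK) == MODEL_VSYSCALL_ADDR;
}

/* Canonical: bits 63..(virt_bits - 1) are all zeros or all ones. */
static bool model_is_canonical(uint64_t vaddr, unsigned int virt_bits)
{
	assert(virt_bits >= 1 && virt_bits <= 64);
	uint64_t top = vaddr >> (virt_bits - 1);
	return top == 0 || top == (UINT64_MAX >> (virt_bits - 1));
}

/* Assumed model of step 3: the vsyscall page is classified as user space. */
static bool model_fault_in_kernel_space(uint64_t vaddr)
{
	if (model_is_vsyscall(vaddr))
		return false;
	return vaddr >= MODEL_TASK_SIZE_MAX;
}

/* The upstream-side check from the diff with the vsyscall test left out. */
static bool model_allowed_before_fix(uint64_t vaddr, unsigned int virt_bits)
{
	if (vaddr < MODEL_TASK_SIZE_MAX + MODEL_PAGE_SIZE)
		return false;
	if (!virt_bits)		/* early boot: virt_bits not yet initialized */
		return true;
	return model_is_canonical(vaddr, virt_bits);
}

/* Same check with the vsyscall page rejected, mirroring the fix. */
static bool model_allowed_after_fix(uint64_t vaddr, unsigned int virt_bits)
{
	if (vaddr < MODEL_TASK_SIZE_MAX + MODEL_PAGE_SIZE)
		return false;
	if (model_is_vsyscall(vaddr))
		return false;
	if (!virt_bits)
		return true;
	return model_is_canonical(vaddr, virt_bits);
}
```

In this model the vsyscall address passes the pre-fix check while the fault classifier treats it as a user address, which is the disagreement the commit message describes. After the fix both sides agree that the page is off limits for a kernel-space read.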

1 file changed: 145 additions & 0 deletions
@@ -0,0 +1,145 @@
+x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault()
+
+jira LE-1907
+cve CVE-2024-26906
+Rebuild_History Non-Buildable kernel-4.18.0-553.8.1.el8_10
+commit-author Hou Tao <[email protected]>
+commit 32019c659ecfe1d92e3bf9fcdfbb11a7c70acd58
+Empty-Commit: Cherry-Pick Conflicts during history rebuild.
+Will be included in final tarball splat. Ref for failed cherry-pick at:
+ciq/ciq_backports/kernel-4.18.0-553.8.1.el8_10/32019c65.failed
+
+When trying to use copy_from_kernel_nofault() to read vsyscall page
+through a bpf program, the following oops was reported:
+
+  BUG: unable to handle page fault for address: ffffffffff600000
+  #PF: supervisor read access in kernel mode
+  #PF: error_code(0x0000) - not-present page
+  PGD 3231067 P4D 3231067 PUD 3233067 PMD 3235067 PTE 0
+  Oops: 0000 [#1] PREEMPT SMP PTI
+  CPU: 1 PID: 20390 Comm: test_progs ...... 6.7.0+ #58
+  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ......
+  RIP: 0010:copy_from_kernel_nofault+0x6f/0x110
+  ......
+  Call Trace:
+   <TASK>
+   ? copy_from_kernel_nofault+0x6f/0x110
+   bpf_probe_read_kernel+0x1d/0x50
+   bpf_prog_2061065e56845f08_do_probe_read+0x51/0x8d
+   trace_call_bpf+0xc5/0x1c0
+   perf_call_bpf_enter.isra.0+0x69/0xb0
+   perf_syscall_enter+0x13e/0x200
+   syscall_trace_enter+0x188/0x1c0
+   do_syscall_64+0xb5/0xe0
+   entry_SYSCALL_64_after_hwframe+0x6e/0x76
+   </TASK>
+   ......
+   ---[ end trace 0000000000000000 ]---
+
+The oops is triggered when:
+
+1) A bpf program uses bpf_probe_read_kernel() to read from the vsyscall
+   page and invokes copy_from_kernel_nofault() which in turn calls
+   __get_user_asm().
+
+2) Because the vsyscall page address is not readable from kernel space,
+   a page fault exception is triggered accordingly.
+
+3) handle_page_fault() considers the vsyscall page address as a user
+   space address instead of a kernel space address. This results in the
+   fix-up setup by bpf not being applied and a page_fault_oops() is invoked
+   due to SMAP.
+
+Considering handle_page_fault() has already considered the vsyscall page
+address as a userspace address, fix the problem by disallowing vsyscall
+page read for copy_from_kernel_nofault().
+
+Originally-by: Thomas Gleixner <[email protected]>
+Reported-by: [email protected]
+Closes: https://lore.kernel.org/bpf/CAG48ez06TZft=ATH1qh2c5mpS5BT8UakwNkzi6nvK5_djC-4Nw@mail.gmail.com
+Reported-by: xingwei lee <[email protected]>
+Closes: https://lore.kernel.org/bpf/CABOYnLynjBoFZOf3Z4BhaZkc5hx_kHfsjiW+UWLoB=w33LvScw@mail.gmail.com
+Signed-off-by: Hou Tao <[email protected]>
+Reviewed-by: Sohil Mehta <[email protected]>
+Acked-by: Thomas Gleixner <[email protected]>
+Link: https://lore.kernel.org/r/[email protected]
+Signed-off-by: Alexei Starovoitov <[email protected]>
+(cherry picked from commit 32019c659ecfe1d92e3bf9fcdfbb11a7c70acd58)
+Signed-off-by: Jonathan Maple <[email protected]>
+
+# Conflicts:
+#	arch/x86/mm/maccess.c
+diff --cc arch/x86/mm/maccess.c
+index e3b7882e4a9a,42115ac079cf..000000000000
+--- a/arch/x86/mm/maccess.c
++++ b/arch/x86/mm/maccess.c
+@@@ -3,36 -3,41 +3,61 @@@
+  #include <linux/uaccess.h>
+  #include <linux/kernel.h>
+  
++ #include <asm/vsyscall.h>
++ 
+  #ifdef CONFIG_X86_64
+ -bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
+ +static __always_inline bool invalid_probe_range(u64 vaddr)
+  {
+ -	unsigned long vaddr = (unsigned long)unsafe_src;
+ -
+  	/*
+ -	 * Do not allow userspace addresses. This disallows
+ -	 * normal userspace and the userspace guard page:
+ +	 * Range covering the highest possible canonical userspace address
+ +	 * as well as non-canonical address range. For the canonical range
+ +	 * we also need to include the userspace guard page.
+  	 */
+++<<<<<<< HEAD
+ +	return vaddr < TASK_SIZE_MAX + PAGE_SIZE ||
+ +	       !__is_canonical_address(vaddr, boot_cpu_data.x86_virt_bits);
+++=======
++ 	if (vaddr < TASK_SIZE_MAX + PAGE_SIZE)
++ 		return false;
++ 
++ 	/*
++ 	 * Reading from the vsyscall page may cause an unhandled fault in
++ 	 * certain cases. Though it is at an address above TASK_SIZE_MAX, it is
++ 	 * usually considered as a user space address.
++ 	 */
++ 	if (is_vsyscall_vaddr(vaddr))
++ 		return false;
++ 
++ 	/*
++ 	 * Allow everything during early boot before 'x86_virt_bits'
++ 	 * is initialized. Needed for instruction decoding in early
++ 	 * exception handlers.
++ 	 */
++ 	if (!boot_cpu_data.x86_virt_bits)
++ 		return true;
++ 
++ 	return __is_canonical_address(vaddr, boot_cpu_data.x86_virt_bits);
+++>>>>>>> 32019c659ecf (x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault())
+  }
+  #else
+ -bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
+ +static __always_inline bool invalid_probe_range(u64 vaddr)
+  {
+ -	return (unsigned long)unsafe_src >= TASK_SIZE_MAX;
+ +	return vaddr < TASK_SIZE_MAX;
+  }
+  #endif
+ +
+ +long probe_kernel_read_strict(void *dst, const void *src, size_t size)
+ +{
+ +	if (unlikely(invalid_probe_range((unsigned long)src)))
+ +		return -EFAULT;
+ +
+ +	return __probe_kernel_read(dst, src, size);
+ +}
+ +
+ +long strncpy_from_unsafe_strict(char *dst, const void *unsafe_addr, long count)
+ +{
+ +	if (unlikely(invalid_probe_range((unsigned long)unsafe_addr)))
+ +		return -EFAULT;
+ +
+ +	return __strncpy_from_unsafe(dst, unsafe_addr, count);
+ +}
+* Unmerged path arch/x86/mm/maccess.c
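
The conflict comes from the two sides using opposite senses: upstream's `copy_from_kernel_nofault_allowed()` returns true when a read is permitted, while the HEAD side's `invalid_probe_range()` returns true when the address must be rejected. A minimal user-space sketch of how the vsyscall test could fold into the HEAD-side predicate follows. It is one hypothetical reading, not the resolution that was shipped; the `sketch_*` names and `SKETCH_*` constants are invented for illustration and assume x86-64 with 4-level paging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for x86-64 with 4-level paging; hypothetical names. */
#define SKETCH_PAGE_SIZE     0x1000ULL
#define SKETCH_PAGE_MASK     (~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_TASK_SIZE_MAX ((1ULL << 47) - SKETCH_PAGE_SIZE)
#define SKETCH_VSYSCALL_ADDR 0xffffffffff600000ULL

/* True when vaddr lies inside the single vsyscall page. */
static bool sketch_is_vsyscall(uint64_t vaddr)
{
	return (vaddr & SKETCH_PAGE_MASK) == SKETCH_VSYSCALL_ADDR;
}

/* Canonical: bits 63..(virt_bits - 1) are all zeros or all ones. */
static bool sketch_is_canonical(uint64_t vaddr, unsigned int virt_bits)
{
	assert(virt_bits >= 1 && virt_bits <= 64);
	uint64_t top = vaddr >> (virt_bits - 1);
	return top == 0 || top == (UINT64_MAX >> (virt_bits - 1));
}

/*
 * Rejection predicate in the HEAD side's sense: true means "do not read".
 * The vsyscall test is added as one more reason to reject.
 */
static bool sketch_invalid_probe_range(uint64_t vaddr, unsigned int virt_bits)
{
	return vaddr < SKETCH_TASK_SIZE_MAX + SKETCH_PAGE_SIZE ||
	       sketch_is_vsyscall(vaddr) ||
	       !sketch_is_canonical(vaddr, virt_bits);
}
```

Callers shaped like `probe_kernel_read_strict()` in the diff would keep returning `-EFAULT` whenever the predicate is true, so no call site would need to change under this reading.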
