Skip to content

Conversation

@NewEraCracker
Copy link

@NewEraCracker NewEraCracker commented Nov 19, 2016

Fixes standlone compilation. Thanks fly to Grarak, eng.stk and Franco.

haokexin and others added 3 commits November 26, 2016 17:12
The arm64 kernel builds fine without the libgcc. Actually it should not
be used at all in the kernel. The following are the reasons indicated
by Russell King:

  Although libgcc is part of the compiler, libgcc is built with the
  expectation that it will be running in userland - it expects to link
  to a libc.  That's why you can't build libgcc without having the glibc
  headers around.

  [...]

  Meanwhile, having the kernel build the compiler support functions that
  it needs ensures that (a) we know what compiler support functions are
  being used, (b) we know the implementation of those support functions
  are sane for use in the kernel, (c) we can build them with appropriate
  compiler flags for best performance, and (d) we remove an unnecessary
  dependency on the build toolchain.

Signed-off-by: Kevin Hao <[email protected]>
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
Signed-off-by: engstk <[email protected]>
Signed-off-by: NewEraCracker <[email protected]>
@NewEraCracker
Copy link
Author

More fixes have been added. Tested and working.

@NewEraCracker
Copy link
Author

Allows easy building with either "android-ndk-r13b-linux-x86_64.zip" or "gcc-linaro-5.3.1-2016.05-x86_64_aarch64-linux-gnu.tar.xz" toolchains.

mstsirkin and others added 4 commits December 6, 2016 23:30
virtio wants to read bitwise types from userspace using get_user.  At the
moment this triggers sparse errors, since the value is passed through an
integer.

Fix that up using __force.

Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: Will Deacon <[email protected]>

Bug: 31432001
Change-Id: I1a941e8079cd603456d1ae034a9e708520b26b65
(cherry picked from commit 58fff51)
Signed-off-by: Sami Tolvanen <[email protected]>
Signed-off-by: engstk <[email protected]>
AArch64 toolchains suffer from the following bug:

$ cat blah.S
1:
	.inst	0x01020304
	.if ((. - 1b) != 4)
		.error	"blah"
	.endif
$ aarch64-linux-gnu-gcc -c blah.S
blah.S: Assembler messages:
blah.S:3: Error: non-constant expression in ".if" statement

which precludes the use of msr_s and co as part of alternatives.

We workaround this issue by not directly testing the labels
themselves, but by moving the current output pointer by a value
that should always be zero. If this value is not null, then
we will trigger a backward move, which is expclicitely forbidden.
This triggers the error we're after:

  AS      arch/arm64/kvm/hyp.o
arch/arm64/kvm/hyp.S: Assembler messages:
arch/arm64/kvm/hyp.S:1377: Error: attempt to move .org backwards
scripts/Makefile.build:294: recipe for target 'arch/arm64/kvm/hyp.o' failed
make[1]: *** [arch/arm64/kvm/hyp.o] Error 1
Makefile:946: recipe for target 'arch/arm64/kvm' failed

Not pretty, but at least works on the current toolchains.

Acked-by: Will Deacon <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
Signed-off-by: Park Ju Hyung <[email protected]>
Signed-off-by: engstk <[email protected]>
Change-Id: I710d6e534ac0a5a94f2a3cf38c987ddf74841446
Signed-off-by: Francisco Franco <[email protected]>
Change-Id: I1df5b2e03ec613fdd97ce52cf87a3d87e600ac04
Signed-off-by: Francisco Franco <[email protected]>
Villemoes and others added 5 commits December 16, 2016 17:21
strncpy_from_user.c only needs EXPORT_SYMBOL, so just include compiler.h
and export.h instead of the whole module.h machinery.

Signed-off-by: Rasmus Villemoes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: engstk <[email protected]>
* otherwise there is miscommunication between kernel and our DSP

Signed-off-by: Alex Naidis <[email protected]>
Signed-off-by: engstk <[email protected]>
Change-Id: Ie2e60f58fb809c0173adcfab838da19e17775117

Signed-off-by: engstk <[email protected]>
* extends 9979312490b0b6b3edaf3785543b08be2c7f04dd
* found by arter97

Change-Id: I838298628db13417f77a0532150cb4f47a74141d
Signed-off-by: Alex Naidis <[email protected]>
Signed-off-by: engstk <[email protected]>
Vikram reported that his ARM64 compiler managed to 'optimize' away the
preempt_count manipulations in code like:

	preempt_enable_no_resched();
	put_user();
	preempt_disable();

Irrespective of that fact that that is horrible code that should be
fixed for many reasons, it does highlight a deficiency in the generic
preempt_count manipulators. As it is never right to combine/elide
preempt_count manipulations like this.

Therefore sprinkle some volatile in the two generic accessors to
ensure the compiler is aware of the fact that the preempt_count is
observed outside of the regular program-order view and thus cannot be
optimized away like this.

x86; the only arch not using the generic code is not affected as we
do all this in asm in order to use the segment base per-cpu stuff.

Reported-by: Vikram Mulukutla <[email protected]>
Tested-by: Vikram Mulukutla <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Fixes: a787870 ("sched, arch: Create asm/preempt.h")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

Bug: 30147418
(cherry picked from commit 2e636d5e66c35dfcbaf617aa8fa963f6847478fe)
Signed-off-by: Patrick Tjin <[email protected]>
Change-Id: I88dec230b1fb14e4bf6e1497d088198feae1cf74
Signed-off-by: Francisco Franco <[email protected]>
@NewEraCracker NewEraCracker changed the title Fix standalone compiling Fix standalone compiling (OnePlus 3 Marshmallow) Jan 20, 2017
@NewEraCracker NewEraCracker deleted the oneplus3/6.0.1-bldfix branch March 19, 2017 20:20
NewEraCracker pushed a commit to NewEraCracker/android_kernel_oneplus_msm8996 that referenced this pull request Apr 20, 2017
commit 45caeaa5ac0b4b11784ac6f932c0ad4c6b67cda0 upstream.

As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.

We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:

 OnePlusOSS#8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 OnePlusOSS#9 [] tcp_rcv_established at ffffffff81580b64
OnePlusOSS#10 [] tcp_v4_do_rcv at ffffffff8158b54a
OnePlusOSS#11 [] tcp_v4_rcv at ffffffff8158cd02
OnePlusOSS#12 [] ip_local_deliver_finish at ffffffff815668f4
OnePlusOSS#13 [] ip_local_deliver at ffffffff81566bd9
OnePlusOSS#14 [] ip_rcv_finish at ffffffff8156656d
OnePlusOSS#15 [] ip_rcv at ffffffff81566f06
OnePlusOSS#16 [] __netif_receive_skb_core at ffffffff8152b3a2
OnePlusOSS#17 [] __netif_receive_skb at ffffffff8152b608
OnePlusOSS#18 [] netif_receive_skb at ffffffff8152b690
OnePlusOSS#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
OnePlusOSS#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
OnePlusOSS#21 [] net_rx_action at ffffffff8152bac2
OnePlusOSS#22 [] __do_softirq at ffffffff81084b4f
OnePlusOSS#23 [] call_softirq at ffffffff8164845c
OnePlusOSS#24 [] do_softirq at ffffffff81016fc5
OnePlusOSS#25 [] irq_exit at ffffffff81084ee5
OnePlusOSS#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here:

 224 static bool tcp_in_quickack_mode(struct sock *sk)↩
 225 {↩
 226 ▹       const struct inet_connection_sock *icsk = inet_csk(sk);↩
 227 ▹       const struct dst_entry *dst = __sk_dst_get(sk);↩
 228 ↩
 229 ▹       return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩
 230 ▹       ▹       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩
 231 }↩

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.

- All vmcores showed a postitive LockDroppedIcmps value, e.g:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this skip do_redirect() if usespace has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <[email protected]>
Cc: Hannes Sowa <[email protected]>
Signed-off-by: Jon Maxwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
NewEraCracker pushed a commit to NewEraCracker/android_kernel_oneplus_msm8996 that referenced this pull request Jul 13, 2017
commit 4dfce57db6354603641132fac3c887614e3ebe81 upstream.

There have been several reports over the years of NULL pointer
dereferences in xfs_trans_log_inode during xfs_fsr processes,
when the process is doing an fput and tearing down extents
on the temporary inode, something like:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
PID: 29439  TASK: ffff880550584fa0  CPU: 6   COMMAND: "xfs_fsr"
    [exception RIP: xfs_trans_log_inode+0x10]
 OnePlusOSS#9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs]
OnePlusOSS#10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs]
OnePlusOSS#11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs]
OnePlusOSS#12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs]
OnePlusOSS#13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs]
OnePlusOSS#14 [ffff8800a57bbe00] evict at ffffffff811e1b67
OnePlusOSS#15 [ffff8800a57bbe28] iput at ffffffff811e23a5
OnePlusOSS#16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8
OnePlusOSS#17 [ffff8800a57bbe88] dput at ffffffff811dd06c
OnePlusOSS#18 [ffff8800a57bbea8] __fput at ffffffff811c823b
OnePlusOSS#19 [ffff8800a57bbef0] ____fput at ffffffff811c846e
OnePlusOSS#20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27
OnePlusOSS#21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c
OnePlusOSS#22 [ffff8800a57bbf50] int_signal at ffffffff8161405d

As it turns out, this is because the i_itemp pointer, along
with the d_ops pointer, has been overwritten with zeros
when we tear down the extents during truncate.  When the in-core
inode fork on the temporary inode used by xfs_fsr was originally
set up during the extent swap, we mistakenly looked at di_nextents
to determine whether all extents fit inline, but this misses extents
generated by speculative preallocation; we should be using if_bytes
instead.

This mistake corrupts the in-memory inode, and code in
xfs_iext_remove_inline eventually gets bad inputs, causing
it to memmove and memset incorrect ranges; this became apparent
because the two values in ifp->if_u2.if_inline_ext[1] contained
what should have been in d_ops and i_itemp; they were memmoved due
to incorrect array indexing and then the original locations
were zeroed with memset, again due to an array overrun.

Fix this by properly using i_df.if_bytes to determine the number
of extents, not di_nextents.

Thanks to dchinner for looking at this with me and spotting the
root cause.

[nborisov: backported to 4.4]

Signed-off-by: Eric Sandeen <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
Signed-off-by: Nikolay Borisov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Martinusbe referenced this pull request in GZR-Kernels/kernel_oneplus_msm8996 Jul 24, 2017
commit 45caeaa5ac0b4b11784ac6f932c0ad4c6b67cda0 upstream.

As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.

We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:

 #8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 #9 [] tcp_rcv_established at ffffffff81580b64
#10 [] tcp_v4_do_rcv at ffffffff8158b54a
#11 [] tcp_v4_rcv at ffffffff8158cd02
#12 [] ip_local_deliver_finish at ffffffff815668f4
#13 [] ip_local_deliver at ffffffff81566bd9
#14 [] ip_rcv_finish at ffffffff8156656d
#15 [] ip_rcv at ffffffff81566f06
#16 [] __netif_receive_skb_core at ffffffff8152b3a2
#17 [] __netif_receive_skb at ffffffff8152b608
#18 [] netif_receive_skb at ffffffff8152b690
#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
#21 [] net_rx_action at ffffffff8152bac2
#22 [] __do_softirq at ffffffff81084b4f
#23 [] call_softirq at ffffffff8164845c
#24 [] do_softirq at ffffffff81016fc5
#25 [] irq_exit at ffffffff81084ee5
#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here:

 224 static bool tcp_in_quickack_mode(struct sock *sk)↩
 225 {↩
 226 ▹       const struct inet_connection_sock *icsk = inet_csk(sk);↩
 227 ▹       const struct dst_entry *dst = __sk_dst_get(sk);↩
 228 ↩
 229 ▹       return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩
 230 ▹       ▹       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩
 231 }↩

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.

- All vmcores showed a postitive LockDroppedIcmps value, e.g:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this skip do_redirect() if usespace has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <[email protected]>
Cc: Hannes Sowa <[email protected]>
Signed-off-by: Jon Maxwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

Signed-off-by: Martinusbe <[email protected]>
Martinusbe referenced this pull request in GZR-Kernels/kernel_oneplus_msm8996 Jul 24, 2017
commit 4dfce57db6354603641132fac3c887614e3ebe81 upstream.

There have been several reports over the years of NULL pointer
dereferences in xfs_trans_log_inode during xfs_fsr processes,
when the process is doing an fput and tearing down extents
on the temporary inode, something like:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
PID: 29439  TASK: ffff880550584fa0  CPU: 6   COMMAND: "xfs_fsr"
    [exception RIP: xfs_trans_log_inode+0x10]
 #9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs]
#10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs]
#11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs]
#12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs]
#13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs]
#14 [ffff8800a57bbe00] evict at ffffffff811e1b67
#15 [ffff8800a57bbe28] iput at ffffffff811e23a5
#16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8
#17 [ffff8800a57bbe88] dput at ffffffff811dd06c
#18 [ffff8800a57bbea8] __fput at ffffffff811c823b
#19 [ffff8800a57bbef0] ____fput at ffffffff811c846e
#20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27
#21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c
#22 [ffff8800a57bbf50] int_signal at ffffffff8161405d

As it turns out, this is because the i_itemp pointer, along
with the d_ops pointer, has been overwritten with zeros
when we tear down the extents during truncate.  When the in-core
inode fork on the temporary inode used by xfs_fsr was originally
set up during the extent swap, we mistakenly looked at di_nextents
to determine whether all extents fit inline, but this misses extents
generated by speculative preallocation; we should be using if_bytes
instead.

This mistake corrupts the in-memory inode, and code in
xfs_iext_remove_inline eventually gets bad inputs, causing
it to memmove and memset incorrect ranges; this became apparent
because the two values in ifp->if_u2.if_inline_ext[1] contained
what should have been in d_ops and i_itemp; they were memmoved due
to incorrect array indexing and then the original locations
were zeroed with memset, again due to an array overrun.

Fix this by properly using i_df.if_bytes to determine the number
of extents, not di_nextents.

Thanks to dchinner for looking at this with me and spotting the
root cause.

[nborisov: backported to 4.4]

Signed-off-by: Eric Sandeen <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
Signed-off-by: Nikolay Borisov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

Signed-off-by: Martinusbe <[email protected]>
nathanchance pushed a commit to android-linux-stable/op3 that referenced this pull request May 30, 2018
[ Upstream commit 2bbea6e117357d17842114c65e9a9cf2d13ae8a3 ]

when mounting an ISO filesystem sometimes (very rarely)
the system hangs because of a race condition between two tasks.

PID: 6766   TASK: ffff88007b2a6dd0  CPU: 0   COMMAND: "mount"
 #0 [ffff880078447ae0] __schedule at ffffffff8168d605
 #1 [ffff880078447b48] schedule_preempt_disabled at ffffffff8168ed49
 #2 [ffff880078447b58] __mutex_lock_slowpath at ffffffff8168c995
 OnePlusOSS#3 [ffff880078447bb8] mutex_lock at ffffffff8168bdef
 OnePlusOSS#4 [ffff880078447bd0] sr_block_ioctl at ffffffffa00b6818 [sr_mod]
 OnePlusOSS#5 [ffff880078447c10] blkdev_ioctl at ffffffff812fea50
 OnePlusOSS#6 [ffff880078447c70] ioctl_by_bdev at ffffffff8123a8b3
 OnePlusOSS#7 [ffff880078447c90] isofs_fill_super at ffffffffa04fb1e1 [isofs]
 OnePlusOSS#8 [ffff880078447da8] mount_bdev at ffffffff81202570
 OnePlusOSS#9 [ffff880078447e18] isofs_mount at ffffffffa04f9828 [isofs]
OnePlusOSS#10 [ffff880078447e28] mount_fs at ffffffff81202d09
OnePlusOSS#11 [ffff880078447e70] vfs_kern_mount at ffffffff8121ea8f
OnePlusOSS#12 [ffff880078447ea8] do_mount at ffffffff81220fee
OnePlusOSS#13 [ffff880078447f28] sys_mount at ffffffff812218d6
OnePlusOSS#14 [ffff880078447f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007fd9ea914e9a  RSP: 00007ffd5d9bf648  RFLAGS: 00010246
    RAX: 00000000000000a5  RBX: ffffffff81698c49  RCX: 0000000000000010
    RDX: 00007fd9ec2bc210  RSI: 00007fd9ec2bc290  RDI: 00007fd9ec2bcf30
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000010
    R10: 00000000c0ed0001  R11: 0000000000000206  R12: 00007fd9ec2bc040
    R13: 00007fd9eb6b2380  R14: 00007fd9ec2bc210  R15: 00007fd9ec2bcf30
    ORIG_RAX: 00000000000000a5  CS: 0033  SS: 002b

This task was trying to mount the cdrom.  It allocated and configured a
super_block struct and owned the write-lock for the super_block->s_umount
rwsem. While exclusively owning the s_umount lock, it called
sr_block_ioctl and waited to acquire the global sr_mutex lock.

PID: 6785   TASK: ffff880078720fb0  CPU: 0   COMMAND: "systemd-udevd"
 #0 [ffff880078417898] __schedule at ffffffff8168d605
 #1 [ffff880078417900] schedule at ffffffff8168dc59
 #2 [ffff880078417910] rwsem_down_read_failed at ffffffff8168f605
 OnePlusOSS#3 [ffff880078417980] call_rwsem_down_read_failed at ffffffff81328838
 OnePlusOSS#4 [ffff8800784179d0] down_read at ffffffff8168cde0
 OnePlusOSS#5 [ffff8800784179e8] get_super at ffffffff81201cc7
 OnePlusOSS#6 [ffff880078417a10] __invalidate_device at ffffffff8123a8de
 OnePlusOSS#7 [ffff880078417a40] flush_disk at ffffffff8123a94b
 OnePlusOSS#8 [ffff880078417a88] check_disk_change at ffffffff8123ab50
 OnePlusOSS#9 [ffff880078417ab0] cdrom_open at ffffffffa00a29e1 [cdrom]
OnePlusOSS#10 [ffff880078417b68] sr_block_open at ffffffffa00b6f9b [sr_mod]
OnePlusOSS#11 [ffff880078417b98] __blkdev_get at ffffffff8123ba86
OnePlusOSS#12 [ffff880078417bf0] blkdev_get at ffffffff8123bd65
OnePlusOSS#13 [ffff880078417c78] blkdev_open at ffffffff8123bf9b
OnePlusOSS#14 [ffff880078417c90] do_dentry_open at ffffffff811fc7f7
OnePlusOSS#15 [ffff880078417cd8] vfs_open at ffffffff811fc9cf
OnePlusOSS#16 [ffff880078417d00] do_last at ffffffff8120d53d
OnePlusOSS#17 [ffff880078417db0] path_openat at ffffffff8120e6b2
OnePlusOSS#18 [ffff880078417e48] do_filp_open at ffffffff8121082b
OnePlusOSS#19 [ffff880078417f18] do_sys_open at ffffffff811fdd33
OnePlusOSS#20 [ffff880078417f70] sys_open at ffffffff811fde4e
OnePlusOSS#21 [ffff880078417f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007f29438b0c20  RSP: 00007ffc76624b78  RFLAGS: 00010246
    RAX: 0000000000000002  RBX: ffffffff81698c49  RCX: 0000000000000000
    RDX: 00007f2944a5fa70  RSI: 00000000000a0800  RDI: 00007f2944a5fa70
    RBP: 00007f2944a5f540   R8: 0000000000000000   R9: 0000000000000020
    R10: 00007f2943614c40  R11: 0000000000000246  R12: ffffffff811fde4e
    R13: ffff880078417f78  R14: 000000000000000c  R15: 00007f2944a4b010
    ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b

This task tried to open the cdrom device, the sr_block_open function
acquired the global sr_mutex lock. The call to check_disk_change()
then saw an event flag indicating a possible media change and tried
to flush any cached data for the device.
As part of the flush, it tried to acquire the super_block->s_umount
lock associated with the cdrom device.
This was the same super_block as created and locked by the previous task.

The first task acquires the s_umount lock and then the sr_mutex_lock;
the second task acquires the sr_mutex_lock and then the s_umount lock.

This patch fixes the issue by moving check_disk_change() out of
cdrom_open() and let the caller take care of it.

Signed-off-by: Maurizio Lombardi <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
nathanchance pushed a commit to android-linux-stable/op3 that referenced this pull request Jan 26, 2019
[ Upstream commit ae460c115b7aa50c9a36cf78fced07b27962c9d0 ]

On our AT91SAM9260 board we use the same sdio bus for wifi and for the
sd card slot. This caused the atmel-mci to give the following splat on
the serial console:

  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 538 at drivers/mmc/host/atmel-mci.c:859 atmci_send_command+0x24/0x44
  Modules linked in:
  CPU: 0 PID: 538 Comm: mmcqd/0 Not tainted 4.14.76 OnePlusOSS#14
  Hardware name: Atmel AT91SAM9
  [<c000fccc>] (unwind_backtrace) from [<c000d3dc>] (show_stack+0x10/0x14)
  [<c000d3dc>] (show_stack) from [<c0017644>] (__warn+0xd8/0xf4)
  [<c0017644>] (__warn) from [<c0017704>] (warn_slowpath_null+0x1c/0x24)
  [<c0017704>] (warn_slowpath_null) from [<c033bb9c>] (atmci_send_command+0x24/0x44)
  [<c033bb9c>] (atmci_send_command) from [<c033e984>] (atmci_start_request+0x1f4/0x2dc)
  [<c033e984>] (atmci_start_request) from [<c033f3b4>] (atmci_request+0xf0/0x164)
  [<c033f3b4>] (atmci_request) from [<c0327108>] (mmc_start_request+0x280/0x2d0)
  [<c0327108>] (mmc_start_request) from [<c032800c>] (mmc_start_areq+0x230/0x330)
  [<c032800c>] (mmc_start_areq) from [<c03366f8>] (mmc_blk_issue_rw_rq+0xc4/0x310)
  [<c03366f8>] (mmc_blk_issue_rw_rq) from [<c03372c4>] (mmc_blk_issue_rq+0x118/0x5ac)
  [<c03372c4>] (mmc_blk_issue_rq) from [<c033781c>] (mmc_queue_thread+0xc4/0x118)
  [<c033781c>] (mmc_queue_thread) from [<c002daf8>] (kthread+0x100/0x118)
  [<c002daf8>] (kthread) from [<c000a580>] (ret_from_fork+0x14/0x34)
  ---[ end trace 594371ddfa284bd6 ]---

This is:
  WARN_ON(host->cmd);

This was fixed on our board by letting atmci_request_end determine what
state we are in. Instead of unconditionally setting it to STATE_IDLE on
STATE_END_REQUEST.

Signed-off-by: Jonas Danielsson <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
ppajda pushed a commit to ppajda/android_kernel_oneplus_msm8996 that referenced this pull request Sep 8, 2019
commit cf3591ef832915892f2499b7e54b51d4c578b28c upstream.

Revert the commit bd293d071ffe65e645b4d8104f9d8fe15ea13862. The proper
fix has been made available with commit d0a255e795ab ("loop: set
PF_MEMALLOC_NOIO for the worker thread").

Note that the fix offered by commit bd293d071ffe doesn't really prevent
the deadlock from occuring - if we look at the stacktrace reported by
Junxiao Bi, we see that it hangs in bit_wait_io and not on the mutex -
i.e. it has already successfully taken the mutex. Changing the mutex
from mutex_lock to mutex_trylock won't help with deadlocks that happen
afterwards.

PID: 474    TASK: ffff8813e11f4600  CPU: 10  COMMAND: "kswapd0"
   #0 [ffff8813dedfb938] __schedule at ffffffff8173f405
   OnePlusOSS#1 [ffff8813dedfb990] schedule at ffffffff8173fa27
   OnePlusOSS#2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec
   OnePlusOSS#3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186
   OnePlusOSS#4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f
   OnePlusOSS#5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8
   OnePlusOSS#6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81
   OnePlusOSS#7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio]
   OnePlusOSS#8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio]
   OnePlusOSS#9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio]
  OnePlusOSS#10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce
  OnePlusOSS#11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778
  OnePlusOSS#12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f
  OnePlusOSS#13 [ffff8813dedfbec0] kthread at ffffffff810a8428
  OnePlusOSS#14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242

Signed-off-by: Mikulas Patocka <[email protected]>
Cc: [email protected]
Fixes: bd293d071ffe ("dm bufio: fix deadlock with loop device")
Depends-on: d0a255e795ab ("loop: set PF_MEMALLOC_NOIO for the worker thread")
Signed-off-by: Mike Snitzer <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Change-Id: I6cd6b254cca8cb8e72c197d44b91790c1a799554
ppajda pushed a commit to ppajda/android_kernel_oneplus_msm8996 that referenced this pull request Jun 23, 2020
…nded

commit 331dcf421c34d227784d07943eb01e4023a42b0a upstream.

If the i2c device is already runtime suspended, if qup_i2c_suspend is
executed during suspend-to-idle or suspend-to-ram it will result in the
following splat:

WARNING: CPU: 3 PID: 1593 at drivers/clk/clk.c:476 clk_core_unprepare+0x80/0x90
Modules linked in:

CPU: 3 PID: 1593 Comm: bash Tainted: G        W       4.8.0-rc3 OnePlusOSS#14
Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
PC is at clk_core_unprepare+0x80/0x90
LR is at clk_unprepare+0x28/0x40
pc : [<ffff0000086eecf0>] lr : [<ffff0000086f0c58>] pstate: 60000145
Call trace:
 clk_core_unprepare+0x80/0x90
 qup_i2c_disable_clocks+0x2c/0x68
 qup_i2c_suspend+0x10/0x20
 platform_pm_suspend+0x24/0x68
 ...

This patch fixes the issue by executing qup_i2c_pm_suspend_runtime
conditionally in qup_i2c_suspend.

Signed-off-by: Sudeep Holla <[email protected]>
Reviewed-by: Andy Gross <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
Change-Id: Idf53f82075dedbb8a9da9d4004ee0cdd26a23377
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants