Suspend/Resume and S0ix implementation #57

ranj063 · 2018-07-25T05:07:47Z

This patchset implements the suspend/resume flow for PM in the SOF driver.
The main differences from the previous pull request are as follows:

- Fixed a bunch of memory leaks
- Stick to attaching the PM callbacks to the pci/acpi/spi device as originally intended, and allow runtime_pm for these devices
- Modify load_firmware() to indicate first boot

Still tested only on the UP Squared board with a single pipeline. Multi-pipeline testing is WIP.

ranj063 · 2018-07-25T05:17:28Z

@plbossart @lgirdwood This is the new series for suspend/resume implementation. Could you please help review?

ranj063 · 2018-07-26T05:31:08Z

@plbossart @lgirdwood Update for 07/25:

I have tested this PR on the UP Squared board with a two-pipeline topology containing a playback pipeline and a DMIC capture pipeline, and it worked without any issues.

lgirdwood

I've only got some minor formatting and indentation nitpicks that can be fixed later incrementally.

lgirdwood · 2018-07-26T11:26:22Z

sound/soc/sof/topology.c

newline can be removed here and maybe other places.

lgirdwood · 2018-07-26T11:27:22Z

sound/soc/sof/topology.c

Ditto, newline between the assignment and the check of ret.

lgirdwood · 2018-07-26T11:32:50Z

sound/soc/sof/topology.c

lgirdwood · 2018-07-26T11:34:54Z

sound/soc/sof/topology.c

I've noticed a lot of these where params are spread over three lines when they should fit in two. It is easier for me to read if they are on two lines (and closer to the conditional check).

plbossart

please fix compilation issues, thanks!

plbossart · 2018-07-26T17:14:27Z

sound/soc/sof/topology.c

the second patch does not compile, please make sure each patch can be compiled to avoid breaking git bisect
/data/pbossart/ktest/sof-dev/sound/soc/sof/topology.c: In function ‘snd_sof_free_topology’:
/data/pbossart/ktest/sof-dev/sound/soc/sof/topology.c:2130:48: error: ‘struct snd_sof_dev’ has no member named ‘route_list’; did you mean ‘pcm_list’?

plbossart · 2018-07-26T17:17:03Z

sound/soc/sof/topology.c

/data/pbossart/ktest/sof-dev/sound/soc/sof/topology.c: In function ‘sof_widget_ready’:
/data/pbossart/ktest/sof-dev/sound/soc/sof/topology.c:1421:5: warning: ‘ret’ may be used uninitialized in this function [-Wmaybe-uninitialized]
if (ret < 0 || reply.rhdr.error < 0) {
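A warning like the one above usually means `ret` is assigned on only some paths before it is checked. The following is a minimal sketch of the usual fix, not the actual `sof_widget_ready` code: `fake_ipc_send` and `widget_ready_sketch` are hypothetical names, and the error value is only an illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an IPC send; returns 0 on success. */
static int fake_ipc_send(int widget_id)
{
	return widget_id >= 0 ? 0 : -22; /* -22 mirrors -EINVAL */
}

/*
 * Illustrative only: 'ret' starts at 0, so the final check is well
 * defined even for widget types that never send an IPC.
 */
static int widget_ready_sketch(int widget_id, bool needs_ipc)
{
	int ret = 0;

	if (needs_ipc)
		ret = fake_ipc_send(widget_id);

	if (ret < 0)
		return ret;

	return 0;
}
```

Initializing at the declaration keeps the later `if (ret < 0 ...)` check valid no matter which branch ran.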

plbossart · 2018-07-26T17:21:08Z

sound/soc/sof/pm.c

/data/pbossart/ktest/sof-dev/sound/soc/sof/pm.c: In function ‘sof_suspend_streams’:
/data/pbossart/ktest/sof-dev/sound/soc/sof/pm.c:177:9: error: ‘struct snd_sof_pcm’ has no member named ‘restore_stream’
spcm->restore_stream[dir] = 1;
^~
/data/pbossart/ktest/sof-dev/sound/soc/sof/pm.c:194:9: error: ‘struct snd_sof_pcm’ has no member named ‘restore_stream’
spcm->restore_stream[dir] = 1;
^~

ranj063 · 2018-07-26T19:56:59Z

@plbossart let me look into the compilation issue. I might have to re-order the commits.

ranj063 · 2018-07-26T21:27:51Z

@plbossart @lgirdwood I have addressed both of your comments in the latest update

Add suspend/resume callbacks for APL. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>

…ions Create a route_list in snd_sof_dev that will be used to store the dapm routes while parsing topology. This list will be used for restoring the pipeline connections during suspend/resume. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
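The idea of saving routes at topology-parse time and replaying them on resume can be sketched as follows. This is a simplified, self-contained model: the kernel would normally use `struct list_head`, and every name here (`route_entry`, `dev_sketch`, `save_route`, `restore_routes`, `free_routes`) is hypothetical rather than taken from the patch.

```c
#include <stdlib.h>

/* Hypothetical, simplified route record: source -> sink widget names. */
struct route_entry {
	const char *source;
	const char *sink;
	struct route_entry *next;
};

struct dev_sketch {
	struct route_entry *route_list; /* head of the saved routes */
};

/* Save one route while "parsing topology"; 0 on success, -1 on OOM. */
static int save_route(struct dev_sketch *dev, const char *source,
		      const char *sink)
{
	struct route_entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->source = source;
	e->sink = sink;
	e->next = dev->route_list;
	dev->route_list = e;
	return 0;
}

/* On "resume", walk the list and replay each connection via 'send'
 * (may be NULL). Returns the number of routes visited.
 */
static int restore_routes(const struct dev_sketch *dev,
			  void (*send)(const char *, const char *))
{
	int count = 0;

	for (const struct route_entry *e = dev->route_list; e; e = e->next) {
		if (send)
			send(e->source, e->sink);
		count++;
	}
	return count;
}

/* Free every saved route, e.g. when the topology is unloaded. */
static void free_routes(struct dev_sketch *dev)
{
	while (dev->route_list) {
		struct route_entry *e = dev->route_list;

		dev->route_list = e->next;
		free(e);
	}
}
```

Freeing the list on topology unload matters here, since the PR description mentions fixing memory leaks.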

Save the ipc comp data so that the pipeline components can be restored during PM resume. This patch also handles deleting the DAPM routes before the pipeline widgets are unloaded and the associated data is freed. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>

Move the code to send ipc for initializing trace into a separate function that can also be called during suspend/resume. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>

This will be called during resume to send ipc for pipeline completion. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>

Decrement device usage counter for both the sof device and the pci/acpi/spi device. Without this change the runtime_usage count never reaches 0 and the device will not suspend. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
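The effect of an unbalanced usage counter can be shown with a toy model. In the kernel this bookkeeping is done by the runtime PM core through helpers such as `pm_runtime_get_sync()` and `pm_runtime_put()`. The sketch below is only an illustration with hypothetical names (`pm_dev_sketch`, `pm_get`, `pm_put`, `can_runtime_suspend`), assuming both devices must be idle before the hardware can suspend.

```c
#include <stdbool.h>

/* Toy model of a device with a runtime PM usage counter. */
struct pm_dev_sketch {
	int usage_count;
};

static void pm_get(struct pm_dev_sketch *d)
{
	d->usage_count++;
}

static void pm_put(struct pm_dev_sketch *d)
{
	if (d->usage_count > 0)
		d->usage_count--;
}

/* Assumption: suspend is possible only when both counters are zero. */
static bool can_runtime_suspend(const struct pm_dev_sketch *sof,
				const struct pm_dev_sketch *parent)
{
	return sof->usage_count == 0 && parent->usage_count == 0;
}
```

If only one of the two counters is decremented after use, `can_runtime_suspend()` stays false, which matches the symptom described in the commit message.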

Set the kcontrol cmd, which will be used to send the correct ipc command to restore the volume control value during resume. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>

Allow runtime_pm to be enabled after probe is finished for pci/acpi/spi devices. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>

Modify the signature of load_firmware to add a flag indicating whether this is the first time the DSP is being booted. Depending on the platform-specific implementation, this flag can be used to decide whether the firmware can be booted from memory in self-retention or must be downloaded again. Note that this patch does not change the implementation of any platform-specific load_firmware() method. It merely adds a flag that can be used to modify the implementation later if needed. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
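How a platform might later use such a flag can be sketched like this. The signature, the `fw_action` enum and both implementations are hypothetical, not the actual SOF ops. The first variant mirrors the behaviour described above, where the flag is accepted but ignored.

```c
#include <stdbool.h>

enum fw_action {
	FW_DOWNLOAD,            /* copy the firmware image to the DSP again */
	FW_BOOT_FROM_RETENTION, /* reuse the image kept in retained memory */
};

/* Hypothetical ops table with the extended load_firmware signature. */
struct dsp_ops_sketch {
	enum fw_action (*load_firmware)(bool first_boot);
};

/* Flag accepted but ignored: always download. */
static enum fw_action always_download(bool first_boot)
{
	(void)first_boot;
	return FW_DOWNLOAD;
}

/* A possible later refinement for platforms with self-retention. */
static enum fw_action use_retention(bool first_boot)
{
	return first_boot ? FW_DOWNLOAD : FW_BOOT_FROM_RETENTION;
}
```

Passing the flag through the common signature lets each platform opt in to the faster path without touching the callers.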

This patch adds the changes required to save the pcm hw_params that will be used to restart streams during system resume. It also implements the flow for the pcm resume trigger and handles the stop/pause_release triggers after resume. There are 3 possible situations when the system resumes from sleep:
1. If the stream was running at suspend, the hw_params are restored and the stream is started from the last known host dma position.
2. If the stream was paused at suspend and the user undoes the pause after resume, SNDRV_PCM_TRIGGER_RESUME does not get invoked for such streams. So these streams need to be marked for hw_params to be restored at resume and started from the paused host dma position.
3. If the stream was paused at suspend and the user stops playback after resume, the trigger callback method should return without any further action because the stream has not been set up after resume anyway.
Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
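The three situations can be read as a small decision table in the trigger callback. The sketch below is an interpretation, not the driver code. It borrows the `restore_stream` name from the compiler output earlier in this thread, but the enums and `on_trigger` are hypothetical, and it assumes the flag is set at suspend for both running and paused streams.

```c
#include <stdbool.h>

enum trig { TRIG_RESUME, TRIG_PAUSE_RELEASE, TRIG_STOP };

enum action {
	ACT_NONE,              /* return without touching the DSP */
	ACT_RESTORE_AND_START, /* restore hw_params, then start the stream */
	ACT_NORMAL,            /* ordinary trigger handling */
};

struct stream_sketch {
	bool restore_stream; /* assumed to be set at system suspend */
};

static enum action on_trigger(struct stream_sketch *s, enum trig t)
{
	bool marked = s->restore_stream;

	if (!marked)
		return ACT_NORMAL;

	s->restore_stream = false;

	switch (t) {
	case TRIG_RESUME:        /* case 1: was running at suspend */
	case TRIG_PAUSE_RELEASE: /* case 2: was paused, user un-pauses */
		return ACT_RESTORE_AND_START;
	case TRIG_STOP:          /* case 3: was paused, user stops */
		return ACT_NONE;
	}
	return ACT_NORMAL;
}
```

Clearing the mark on first use means a later pause/release cycle on the same stream follows the ordinary path.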

This is the initial implementation of the PM and runtime PM callbacks in the SOF driver. The suspend callback suspends all running pcm streams, sends the CTX_SAVE ipc, drops all pending ipcs, releases the trace dma and then powers off the DSP. The resume callback performs the following steps: load FW, run FW, re-initialize trace, restore the pipeline, restore the kcontrol values and finally send the ctx restore ipc to the DSP. The streams that were suspended are resumed by the ALSA resume trigger. If the streams are paused during system suspend, they are marked explicitly so they can be restored during PAUSE_RELEASE. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
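The ordering described in this commit message can be written down as two step tables. This is a sketch only: the step names paraphrase the message, and `run_seq` with its stub callbacks is hypothetical. It assumes a sequence aborts at the first failing step, which the message does not state.

```c
#include <stddef.h>

enum step {
	SUSPEND_STREAMS, SEND_CTX_SAVE, DROP_IPCS, RELEASE_TRACE_DMA,
	POWER_OFF_DSP,
	LOAD_FW, RUN_FW, INIT_TRACE, RESTORE_PIPELINE, RESTORE_KCONTROLS,
	SEND_CTX_RESTORE,
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Order taken from the commit message. */
static const enum step suspend_seq[] = {
	SUSPEND_STREAMS, SEND_CTX_SAVE, DROP_IPCS, RELEASE_TRACE_DMA,
	POWER_OFF_DSP,
};

static const enum step resume_seq[] = {
	LOAD_FW, RUN_FW, INIT_TRACE, RESTORE_PIPELINE, RESTORE_KCONTROLS,
	SEND_CTX_RESTORE,
};

/* Run steps in order; stop at the first failure (an assumption).
 * Returns the number of steps that completed successfully.
 */
static size_t run_seq(const enum step *seq, size_t n,
		      int (*do_step)(enum step))
{
	size_t done = 0;

	while (done < n && do_step(seq[done]) == 0)
		done++;
	return done;
}

/* Stub callbacks for demonstration. */
static int step_always_ok(enum step s)
{
	(void)s;
	return 0;
}

static int step_fail_run_fw(enum step s)
{
	return s == RUN_FW ? -1 : 0;
}
```

Note that the context restore IPC comes last on resume, after the pipeline and kcontrol values are back in place.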

Update the dai config with the pdm config information after parsing the pdm tokens. This will be needed to restore the dai config after suspend/resume. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>

Suspend/resume callbacks are currently set only for APL. This patch will prevent errors on other platforms for which the callbacks are not implemented as yet. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
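A common way to tolerate platforms that do not yet provide a callback is a NULL check before the call. The sketch below assumes that approach. `pm_ops_sketch`, `call_suspend` and `demo_suspend_busy` are hypothetical names, and treating a missing callback as success is an assumption about the intended behaviour.

```c
#include <stddef.h>

/* Hypothetical per-platform PM ops; either pointer may be NULL. */
struct pm_ops_sketch {
	int (*suspend)(void);
	int (*resume)(void);
};

/* Treat a missing callback as "nothing to do" instead of an error. */
static int call_suspend(const struct pm_ops_sketch *ops)
{
	if (!ops || !ops->suspend)
		return 0;
	return ops->suspend();
}

/* Demo callback that reports a failure (-16 mirrors -EBUSY). */
static int demo_suspend_busy(void)
{
	return -16;
}
```

With this guard, a platform that fills in only some of the ops still goes through the common PM path without dereferencing a NULL pointer.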

Previously, dai config was updated only for one DAI with the same name as the link name. But in the case of duplex pcm's, there will be two DAI's with the same name and both of them need to be updated. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
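The behavioural change amounts to not stopping at the first name match. The sketch below is a simplified illustration with hypothetical names (`dai_sketch`, `config_dais`) and uses a plain array instead of the driver's lists.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical DAI record; 'configured' stands in for the saved config. */
struct dai_sketch {
	const char *name;
	int configured;
};

/* Configure every DAI whose name matches the link name.
 * Returns how many DAIs were updated.
 */
static int config_dais(struct dai_sketch *dais, size_t n,
		       const char *link_name)
{
	int count = 0;

	for (size_t i = 0; i < n; i++) {
		if (strcmp(dais[i].name, link_name) != 0)
			continue;
		dais[i].configured = 1;
		count++; /* no break: a duplex link has two matching DAIs */
	}
	return count;
}
```

For a duplex link, the function returns 2 (one playback DAI and one capture DAI), where a loop with an early `break` would have updated only the first.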

Skip adding virtual routes to the route list. This will prevent sending the routes to the FW during system resume. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
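One way to picture the filter is sketched below. It assumes that a route is "virtual" when either endpoint has no corresponding widget on the firmware side. That definition, along with `route_sketch`, `route_is_virtual` and `count_saved_routes`, is an assumption made for illustration and is not confirmed by the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical route record with firmware-presence flags. */
struct route_sketch {
	const char *source;
	const char *sink;
	bool source_in_fw; /* assumed: source widget exists in the FW */
	bool sink_in_fw;   /* assumed: sink widget exists in the FW */
};

/* Assumed definition of a virtual route. */
static bool route_is_virtual(const struct route_sketch *r)
{
	return !r->source_in_fw || !r->sink_in_fw;
}

/* Count the routes that would be saved for replay on resume. */
static size_t count_saved_routes(const struct route_sketch *routes, size_t n)
{
	size_t saved = 0;

	for (size_t i = 0; i < n; i++) {
		if (route_is_virtual(&routes[i]))
			continue; /* never sent to the FW, so not saved */
		saved++;
	}
	return saved;
}
```

Filtering at save time keeps the resume path simple, since every route on the list can then be sent to the firmware unconditionally.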

ranj063 · 2018-08-08T18:21:25Z

@plbossart I have updated this pull request based on the findings from suspend/resume tests on Chrome. Could you please review the changes?

The main changes are:

- Configure all DAI's with the same name
- Skip saving virtual routes to the route list
- Modified pcm trigger to account for resuming active pcm streams in Chrome

naveen-manohar · 2018-08-09T05:51:23Z

@plbossart, @ranj063 I have verified D0ix functionality with the mentioned 15 patches (6b799ab).

lgirdwood · 2018-08-09T12:07:37Z

@plbossart I'll merge this in upstream v2 when merged.

plbossart · 2018-08-09T16:40:07Z

@lgirdwood I think we should do a partial merge. There is an awful lot of code related to topology handling that could go in the first upstream batch (and would work without suspend/resume). It may be a good thing to split this set in two: one as groundwork and the second as suspend/resume/runtime_pm proper.

plbossart · 2018-08-09T16:59:02Z

This patch set has become too big to review in detail, so I'll merge it and we'll have to deal with issues in follow-up patches.


ranj063 force-pushed the topic/suspend-new branch from c6264aa to b7f25d3 Compare July 26, 2018 06:35

lgirdwood approved these changes Jul 26, 2018


plbossart requested changes Jul 26, 2018


ranj063 force-pushed the topic/suspend-new branch from b7f25d3 to 30e05b1 Compare July 26, 2018 21:27

ranj063 added 15 commits August 8, 2018 10:04

ASoC: SOF: add suspend/resume callbacks for APL

2ebcbfd


ASoC: SOF: move ipc for initializing trace into a separate function

b032e5d


ASoC: SOF: make sof_complete_pipeline non static

97a8409


ASoC: SOF: set kcontrol cmd for pga widget

c4e0db0


ASoC: SOF: allow runtime_pm for pci/acpi/spi device

2fb5764


ASoC: SOF: update dai config after parsing pdm tokens

067df4e


ASoC: SOF: do not add virtual routes to route list

6b799ab


ranj063 force-pushed the topic/suspend-new branch from 30e05b1 to 6b799ab Compare August 8, 2018 18:17

plbossart merged commit 2de67aa into thesofproject:topic/sof-dev Aug 9, 2018

ranj063 deleted the topic/suspend-new branch March 22, 2019 17:08

RanderWang mentioned this pull request Aug 31, 2023

Add fw panic support for IPC4 #4369

Merged
