-
Notifications
You must be signed in to change notification settings - Fork 5.2k
drm/vc4: Hold pm_runtime for vc4. #4706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I just answered your mail, but let's keep the discussion here. Here's my mail: So we used to have it after the pm_runtime_resume_and_get in I guess we could move the call to reset to runtime_resume, and do the Would that work? |
I was just responding to your email :-) I'd need to look at what CEC does with the block. Looking at vc4_hdmi_reset, it sets VC4_HD_M_ENABLE, which is the "HDMI Peripheral Global Enable". Reset state is 0. So without that things will be going wrong. |
I guess, but how would that work if we get it on first bind, it doesn't work, we give it back and we take it back in a subsequent bind?
It's largely due to the HSM completely stalling the CPU when it's shut down though if we access the registers. If we revert this, we'd have to make the same initialisation over and over in detect, cec, alsa, etc. |
We need to be clear on the differences between 283x vs 2711. 2711 has no power domain gating defined, so the block is on and retains state. We reset at the start of day, and turn the HSM clock on and off to ensure we can access the domain, however the block retains the programmed state. 283x has power domains defined, so pm_runtime powers the block up and down, losing all the state. (It almost feels like the power-domain is closer to the reset controller on 2711, but it's not). I can't find a definitive statement as to whether CEC should work when video is not being sent, but I don't see why it shouldn't. None of the apps I'm aware of disable the video output on CEC TV power off, and the TV I've got seems to stop doing CEC things when in standby. I would expect that CEC will require the HSM clock on 283x to be active, as that is what drives the CEC block. (2711 uses the 27MHz clock for CEC, and that's always enabled). 283x we never change the HSM clock, so we won't hit a situation of trying to change an enabled clock. I'm thinking another variant flag |
I think that part of the issue is that the driver with the call to reset and the call to vc4_hdmi_cec_init has an expectation that the controller will be in a certain state, but shutting down the power domain on 283x breaks that expectation.
That's one way to fix it. The other would be to reset and do the register initialisation found in vc4_hdmi_cec_init in the runtime pm hook. This would keep the driver simpler and we wouldn't have to reflect on what the refcount is. |
But that means that CEC can not work on Pi0-3 if the video side of the pipeline is not active, as the power domain will be disabled. |
We should be ok in that case as well, since we call pm_runtime_get in CEC .adap_enable (https://github.com/raspberrypi/linux/blob/rpi-5.10.y/drivers/gpu/drm/vc4/vc4_hdmi.c#L2054), so if the device isn't powered it will power the power domain and call vc4_hdmi_runtime_resume |
Well I'm glad someone understands this stuff! :-) It's not just the HDMI_CEC_CNTRL_1 register in vc4_hdmi_cec_init, it wants the reset process from vc4_hdmi_reset too. |
@mripard Updated to reset and reinitialise from pm_runtime_resume hook. |
Groan,
The status doesn't become ACTIVE until resume completes, so all the calls in the resume function trigger the WARN. It looks like we can use |
It makes sense indeed |
It still complains with pm_runtime_status_suspended. Due to There is a comment against both those pm_runtime functions of
I don't think there can be a second pm_runtime call coming in at the same time as the resume or suspend calls are being executed, but I was just noting it. |
Comment made just now on linux-media
This goes above my knowledge of pm_runtime. As we know we're going to have the power domain on in pm_runtime_resume, split vc4_hdmi_[write|read] into a wrapper that checks the state, and a function that does the work? When called from runtime_resume, go direct to the function to do the work? |
I'm not sure what Sakari meant either, to be honest, I don't know runtime_pm well enough. Your solution would definitely work. We also have the option of removing that check if it turns out it doesn't work well enough. The original intent was to at least have some logging before the system would stall completely if we were to access a register while the HSM clock or power domain was still disabled. It's a nice-to-have feature, but it definitely shouldn't stand in the way of any new development or bug fix. |
He backtracked - https://patchwork.linuxtv.org/project/linux-media/patch/[email protected]/#133275 Still not quite sure why I get erroneous WARNs having converted to pm_runtime_suspended. Yet more logging needed. I fully understand the intent and why it's a good thing, so I'm slightly loathed to drop it unless we really can't make things work. |
I'm a numpty. There's an explicit call to vc4_hdmi_runtime_resume from vc4_hdmi_bind which is done before pm_runtime is initialised :-( |
Still not right - I was accidentally testing with composite enabled again from the dtoverlay line. We will be resetting and initialising the HDMI block more often after this, but it shouldn't be at a time that matters. |
It shouldn't be needed anymore I think |
Think I've found it - the pixel clock wasn't running for whatever reason when configured through the kernel clock driver. Add it to the firmware clock driver, and point |
Pi0-3 have power domains attached to the pm_runtime hooks for the HDMI block. Initialisation done in the reset called from bind is therefore lost if all users of the domain are suspended. The VEC shares the same lowest level clock/power gating as the HDMI block, so whilst that is enabled the block is never actually powered down, but if it isn't enabled then we lose the state. Reset and initialise the HDMI block from pm_resume. Signed-off-by: Dave Stevenson <[email protected]>
The clk-bcm2835 handling of the pixel clock does not function correctly when the HDMI power domain is disabled. The firmware supports it correctly, so add it to the firmware clock driver. Signed-off-by: Dave Stevenson <[email protected]>
The clk-bcm2835 handling of the pixel clock does not function correctly when the HDMI power domain is disabled. The firmware supports it correctly, and the firmware clock driver now supports it, so switch the vc4-hdmi driver to use the firmware clock driver. Signed-off-by: Dave Stevenson <[email protected]>
Reinstates the new handling. This reverts commit 46c99e3. Signed-off-by: Dave Stevenson <[email protected]>
Branch updated. |
I get HDMI output with and without the LGTM. |
HDMI hotplug overrides composite on Pi0-3. |
See: raspberrypi/linux#4755 kernel: DPI panel configuration See: raspberrypi/linux#4753 kernel: KMS 7" DSI panel and touchscreen fixes See: raspberrypi/linux#4750 kernel: drm/vc4: Hold pm_runtime for vc4 See: raspberrypi/linux#4706
See: raspberrypi/linux#4755 kernel: DPI panel configuration See: raspberrypi/linux#4753 kernel: KMS 7" DSI panel and touchscreen fixes See: raspberrypi/linux#4750 kernel: drm/vc4: Hold pm_runtime for vc4 See: raspberrypi/linux#4706
@6by9 this breaks hdmi output with kms on Pi3.
I've bisected rpi-update kernel and confirmed by reverting the 4 commits from head of rpi-5.10.y. It affects 5.10.y and 5.15.y trees. I was testing on 5.10.y. |
Without reverting, using |
I'm having trouble reproducing the success I had with the revert. |
How early a firmware? Before it supported shutting down DispmanX via the mailbox? |
I discovered why I couldn't reproduce the bisected regression. It seems to only occur with |
…sockopt This attempts to fix the following trace: ====================================================== WARNING: possible circular locking dependency detected 6.3.0-rc2-g68fcb3a7bf97 #4706 Not tainted ------------------------------------------------------ sco-tester/31 is trying to acquire lock: ffff8880025b8070 (&hdev->lock){+.+.}-{3:3}, at: sco_sock_getsockopt+0x1fc/0xa90 but task is already holding lock: ffff888001eeb130 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}, at: sco_sock_getsockopt+0x104/0xa90 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}: lock_sock_nested+0x32/0x80 sco_connect_cfm+0x118/0x4a0 hci_sync_conn_complete_evt+0x1e6/0x3d0 hci_event_packet+0x55c/0x7c0 hci_rx_work+0x34c/0xa00 process_one_work+0x575/0x910 worker_thread+0x89/0x6f0 kthread+0x14e/0x180 ret_from_fork+0x2b/0x50 -> #1 (hci_cb_list_lock){+.+.}-{3:3}: __mutex_lock+0x13b/0xcc0 hci_sync_conn_complete_evt+0x1ad/0x3d0 hci_event_packet+0x55c/0x7c0 hci_rx_work+0x34c/0xa00 process_one_work+0x575/0x910 worker_thread+0x89/0x6f0 kthread+0x14e/0x180 ret_from_fork+0x2b/0x50 -> #0 (&hdev->lock){+.+.}-{3:3}: __lock_acquire+0x18cc/0x3740 lock_acquire+0x151/0x3a0 __mutex_lock+0x13b/0xcc0 sco_sock_getsockopt+0x1fc/0xa90 __sys_getsockopt+0xe9/0x190 __x64_sys_getsockopt+0x5b/0x70 do_syscall_64+0x42/0x90 entry_SYSCALL_64_after_hwframe+0x70/0xda other info that might help us debug this: Chain exists of: &hdev->lock --> hci_cb_list_lock --> sk_lock-AF_BLUETOOTH-BTPROTO_SCO Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(sk_lock-AF_BLUETOOTH-BTPROTO_SCO); lock(hci_cb_list_lock); lock(sk_lock-AF_BLUETOOTH-BTPROTO_SCO); lock(&hdev->lock); *** DEADLOCK *** 1 lock held by sco-tester/31: #0: ffff888001eeb130 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}, at: sco_sock_getsockopt+0x104/0xa90 Fixes: 248733e ("Bluetooth: Allow querying of supported offload codecs over SCO socket") Signed-off-by: Luiz Augusto von Dentz <[email protected]>
…sockopt [ Upstream commit 975abc0 ] This attempts to fix the following trace: ====================================================== WARNING: possible circular locking dependency detected 6.3.0-rc2-g68fcb3a7bf97 #4706 Not tainted ------------------------------------------------------ sco-tester/31 is trying to acquire lock: ffff8880025b8070 (&hdev->lock){+.+.}-{3:3}, at: sco_sock_getsockopt+0x1fc/0xa90 but task is already holding lock: ffff888001eeb130 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}, at: sco_sock_getsockopt+0x104/0xa90 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}: lock_sock_nested+0x32/0x80 sco_connect_cfm+0x118/0x4a0 hci_sync_conn_complete_evt+0x1e6/0x3d0 hci_event_packet+0x55c/0x7c0 hci_rx_work+0x34c/0xa00 process_one_work+0x575/0x910 worker_thread+0x89/0x6f0 kthread+0x14e/0x180 ret_from_fork+0x2b/0x50 -> #1 (hci_cb_list_lock){+.+.}-{3:3}: __mutex_lock+0x13b/0xcc0 hci_sync_conn_complete_evt+0x1ad/0x3d0 hci_event_packet+0x55c/0x7c0 hci_rx_work+0x34c/0xa00 process_one_work+0x575/0x910 worker_thread+0x89/0x6f0 kthread+0x14e/0x180 ret_from_fork+0x2b/0x50 -> #0 (&hdev->lock){+.+.}-{3:3}: __lock_acquire+0x18cc/0x3740 lock_acquire+0x151/0x3a0 __mutex_lock+0x13b/0xcc0 sco_sock_getsockopt+0x1fc/0xa90 __sys_getsockopt+0xe9/0x190 __x64_sys_getsockopt+0x5b/0x70 do_syscall_64+0x42/0x90 entry_SYSCALL_64_after_hwframe+0x70/0xda other info that might help us debug this: Chain exists of: &hdev->lock --> hci_cb_list_lock --> sk_lock-AF_BLUETOOTH-BTPROTO_SCO Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(sk_lock-AF_BLUETOOTH-BTPROTO_SCO); lock(hci_cb_list_lock); lock(sk_lock-AF_BLUETOOTH-BTPROTO_SCO); lock(&hdev->lock); *** DEADLOCK *** 1 lock held by sco-tester/31: #0: ffff888001eeb130 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}, at: sco_sock_getsockopt+0x104/0xa90 Fixes: 248733e ("Bluetooth: Allow querying of supported offload codecs over SCO socket") Signed-off-by: Luiz Augusto von Dentz <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
…sockopt [ Upstream commit 975abc0 ] This attempts to fix the following trace: ====================================================== WARNING: possible circular locking dependency detected 6.3.0-rc2-g68fcb3a7bf97 #4706 Not tainted ------------------------------------------------------ sco-tester/31 is trying to acquire lock: ffff8880025b8070 (&hdev->lock){+.+.}-{3:3}, at: sco_sock_getsockopt+0x1fc/0xa90 but task is already holding lock: ffff888001eeb130 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}, at: sco_sock_getsockopt+0x104/0xa90 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}: lock_sock_nested+0x32/0x80 sco_connect_cfm+0x118/0x4a0 hci_sync_conn_complete_evt+0x1e6/0x3d0 hci_event_packet+0x55c/0x7c0 hci_rx_work+0x34c/0xa00 process_one_work+0x575/0x910 worker_thread+0x89/0x6f0 kthread+0x14e/0x180 ret_from_fork+0x2b/0x50 -> #1 (hci_cb_list_lock){+.+.}-{3:3}: __mutex_lock+0x13b/0xcc0 hci_sync_conn_complete_evt+0x1ad/0x3d0 hci_event_packet+0x55c/0x7c0 hci_rx_work+0x34c/0xa00 process_one_work+0x575/0x910 worker_thread+0x89/0x6f0 kthread+0x14e/0x180 ret_from_fork+0x2b/0x50 -> #0 (&hdev->lock){+.+.}-{3:3}: __lock_acquire+0x18cc/0x3740 lock_acquire+0x151/0x3a0 __mutex_lock+0x13b/0xcc0 sco_sock_getsockopt+0x1fc/0xa90 __sys_getsockopt+0xe9/0x190 __x64_sys_getsockopt+0x5b/0x70 do_syscall_64+0x42/0x90 entry_SYSCALL_64_after_hwframe+0x70/0xda other info that might help us debug this: Chain exists of: &hdev->lock --> hci_cb_list_lock --> sk_lock-AF_BLUETOOTH-BTPROTO_SCO Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(sk_lock-AF_BLUETOOTH-BTPROTO_SCO); lock(hci_cb_list_lock); lock(sk_lock-AF_BLUETOOTH-BTPROTO_SCO); lock(&hdev->lock); *** DEADLOCK *** 1 lock held by sco-tester/31: #0: ffff888001eeb130 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}, at: sco_sock_getsockopt+0x104/0xa90 Fixes: 248733e ("Bluetooth: Allow querying of supported offload codecs over SCO socket") Signed-off-by: Luiz Augusto von Dentz <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
BCM283[5|6|7] (aka vc4) version of the HDMI block is controlled
via the power-domain and pm_runtime. This is reset and initialised
in vc4_hdmi_reset (called from bind), but then the block is
powered down.
The VEC happens to request the same power gating, so having the
VEC enabled meant that HDMI stayed powered up, and all worked OK.
Disable the VEC, and the display pipeline dies.
Amend vc4_hdmi_reset to call pm_runtime_get, and add a new callback
to allow unbind to call pm_runtime_put appropriately.
Signed-off-by: Dave Stevenson [email protected]