drm/vc4: Hold pm_runtime for vc4. #4706

Merged: 4 commits, Dec 2, 2021


6by9
Contributor

@6by9 6by9 commented Nov 17, 2021

BCM283[5|6|7] (aka vc4) version of the HDMI block is controlled
via the power-domain and pm_runtime. This is reset and initialised
in vc4_hdmi_reset (called from bind), but then the block is
powered down.
The VEC happens to request the same power gating, so having the
VEC enabled meant that HDMI stayed powered up, and all worked OK.
Disable the VEC, and the display pipeline dies.

Amend vc4_hdmi_reset to call pm_runtime_get, and add a new callback
to allow unbind to call pm_runtime_put appropriately.

Signed-off-by: Dave Stevenson [email protected]
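
The failure described above can be sketched as a small userspace model. This is only an illustration, assuming a single usage-counted power gate shared by HDMI and the VEC; `power_gate`, `gate_get`, `gate_put` and the `hdmi_bind_*` helpers are hypothetical names, not the driver's functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one power gate shared by two users (HDMI and VEC). */
struct power_gate {
    int users;          /* usage count across all users of the gate */
    bool hdmi_enabled;  /* stands in for state programmed by the reset */
};

void gate_get(struct power_gate *g)
{
    g->users++;
}

void gate_put(struct power_gate *g)
{
    assert(g->users > 0);
    if (--g->users == 0)
        g->hdmi_enabled = false;  /* power-off loses the programmed state */
}

/* Bind before this PR: power up, reset and initialise, then power down. */
void hdmi_bind_old(struct power_gate *g)
{
    gate_get(g);
    g->hdmi_enabled = true;
    gate_put(g);
}

/* Bind as proposed here: keep the reference taken for the reset. */
void hdmi_bind_hold(struct power_gate *g)
{
    gate_get(g);
    g->hdmi_enabled = true;
}

/* Unbind as proposed here: drop the reference that bind kept. */
void hdmi_unbind_hold(struct power_gate *g)
{
    gate_put(g);
}

/* Reports whether the programmed state is still present after bind. */
bool state_survives_bind(bool vec_enabled, bool hold_reference)
{
    struct power_gate g = { 0, false };

    if (vec_enabled)
        gate_get(&g);  /* the VEC keeps the shared gate powered */
    if (hold_reference)
        hdmi_bind_hold(&g);
    else
        hdmi_bind_old(&g);
    return g.hdmi_enabled;
}
```

In this model the state survives the old bind only while the VEC holds the gate; holding the reference from bind removes that dependency.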

@6by9
Contributor Author

6by9 commented Nov 17, 2021

@mripard Thoughts please?
See my email from earlier today. This sorts the issue caused by #4689 (temporarily reverted).

@mripard
Contributor

mripard commented Nov 17, 2021

I just answered your mail, but let's keep the discussion here. Here's my mail:

So we used to have it after the pm_runtime_resume_and_get in
vc4_hdmi_encoder_pre_crtc_configure, and we had to move it back to bind
because it would reset whatever vc4_hdmi_cec_init would be doing at bind
time.

I guess we could move the call to reset to runtime_resume, and do the
register setup of vc4_hdmi_cec_init every time the adapter is enabled?

Would that work?

@6by9
Contributor Author

6by9 commented Nov 17, 2021

I was just responding to your email :-)

I'd need to look at what CEC does with the block.
I suspect not having the power-domain on will mean that CEC isn't working when the display is disabled. Should it resume_and_get the block on 2835 so that it retains the power domain? It's a shame power domains and clocks seem to have got quite so intertwined :-(

Looking at vc4_hdmi_reset, it sets VC4_HD_M_ENABLE, which is the "HDMI Peripheral Global Enable". Reset state is 0. So without that things will be going wrong.

@mripard
Contributor

mripard commented Nov 18, 2021

> I suspect not having the power-domain on will mean that CEC isn't working when the display is disabled. Should it resume_and_get the block on 2835 so that it retains the power domain?

I guess, but how would that work if we get it on first bind, it doesn't work, we give it back and we take it back in a subsequent bind?

> It's a shame power domains and clocks seem to have got quite so intertwined :-(

It's largely due to the HSM completely stalling the CPU when it's shut down though if we access the registers. If we revert this, we'd have to make the same initialisation over and over in detect, cec, alsa, etc.

@6by9
Contributor Author

6by9 commented Nov 19, 2021

We need to be clear on the differences between 283x vs 2711.

2711 has no power domain gating defined, so the block is on and retains state. We reset at the start of day, and turn the HSM clock on and off to ensure we can access the domain, however the block retains the programmed state.

283x has power domains defined, so pm_runtime powers the block up and down, losing all the state. (It almost feels like the power-domain is closer to the reset controller on 2711, but it's not).

I can't find a definitive statement as to whether CEC should work when video is not being sent, but I don't see why it shouldn't. None of the apps I'm aware of disable the video output on CEC TV power off, and the TV I've got seems to stop doing CEC things when in standby.

I would expect that CEC will require the HSM clock on 283x to be active, as that is what drives the CEC block. (2711 uses the 27MHz clock for CEC, and that's always enabled). 283x we never change the HSM clock, so we won't hit a situation of trying to change an enabled clock.

I'm thinking another variant flag cec_requires_pm.
If set then vc4_hdmi_cec_init can pm_runtime_resume_and_get, and vc4_hdmi_cec_exit can put it. AIUI the usage is reference counted, so all the other resumes/gets and puts will have no effect.
That does break if CONFIG_DRM_VC4_HDMI_CEC is not set, so perhaps name it retain_pm, and make the pm_runtime_put_sync at the end of vc4_hdmi_bind conditional on it not being set, and add a conditional pm_runtime_put in vc4_hdmi_unbind if it is set.
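
A rough sketch of the `retain_pm` idea, written as a userspace model rather than driver code. The flag name comes from the proposal above; `variant_model`, `hdmi_model`, `model_bind` and `model_unbind` are hypothetical, and the plain counter stands in for the runtime PM usage count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-SoC variant data carrying the proposed flag. */
struct variant_model {
    bool retain_pm;
};

struct hdmi_model {
    const struct variant_model *variant;
    int usage;  /* stands in for the runtime PM usage count */
};

void model_bind(struct hdmi_model *h)
{
    h->usage++;  /* models pm_runtime_resume_and_get() in bind */
    /* ... reset and initialise the block here ... */
    if (!h->variant->retain_pm)
        h->usage--;  /* models pm_runtime_put_sync() at the end of bind */
}

void model_unbind(struct hdmi_model *h)
{
    if (h->variant->retain_pm) {
        assert(h->usage > 0);
        h->usage--;  /* models the conditional pm_runtime_put() in unbind */
    }
}

/* Usage count held once bind has returned. */
int usage_after_bind(bool retain_pm)
{
    struct variant_model v = { retain_pm };
    struct hdmi_model h = { &v, 0 };

    model_bind(&h);
    return h.usage;
}

/* Usage count left once unbind has returned; should always be balanced. */
int usage_after_unbind(bool retain_pm)
{
    struct variant_model v = { retain_pm };
    struct hdmi_model h = { &v, 0 };

    model_bind(&h);
    model_unbind(&h);
    return h.usage;
}
```

The point of the model is only that the get and put stay balanced for both settings of the flag, while a variant with `retain_pm` set keeps one reference for as long as it is bound.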

@mripard
Contributor

mripard commented Nov 23, 2021

> We need to be clear on the differences between 283x vs 2711.
>
> 2711 has no power domain gating defined, so the block is on and retains state. We reset at the start of day, and turn the HSM clock on and off to ensure we can access the domain, however the block retains the programmed state.
>
> 283x has power domains defined, so pm_runtime powers the block up and down, losing all the state. (It almost feels like the power-domain is closer to the reset controller on 2711, but it's not).

I think that part of the issue is that the driver with the call to reset and the call to vc4_hdmi_cec_init has an expectation that the controller will be in a certain state, but shutting down the power domain on 283x breaks that expectation.

> I can't find a definitive statement as to whether CEC should work when video is not being sent, but I don't see why it shouldn't. None of the apps I'm aware of disable the video output on CEC TV power off, and the TV I've got seems to stop doing CEC things when in standby.
>
> I would expect that CEC will require the HSM clock on 283x to be active, as that is what drives the CEC block. (2711 uses the 27MHz clock for CEC, and that's always enabled). 283x we never change the HSM clock, so we won't hit a situation of trying to change an enabled clock.
>
> I'm thinking another variant flag cec_requires_pm. If set then vc4_hdmi_cec_init can pm_runtime_resume_and_get, and vc4_hdmi_cec_exit can put it. AIUI the usage is reference counted, so all the other resumes/gets and puts will have no effect. That does break if CONFIG_DRM_VC4_HDMI_CEC is not set, so perhaps name it retain_pm, and make the pm_runtime_put_sync at the end of vc4_hdmi_bind conditional on it not being set, and add a conditional pm_runtime_put in vc4_hdmi_unbind if it is set.

That's one way to fix it. The other would be to reset and do the register initialisation found in vc4_hdmi_cec_init in the runtime pm hook. This would keep the driver simpler and we wouldn't have to reflect on what the refcount is.

@6by9
Contributor Author

6by9 commented Nov 23, 2021

> That's one way to fix it. The other would be to reset and do the register initialisation found in vc4_hdmi_cec_init in the runtime pm hook. This would keep the driver simpler and we wouldn't have to reflect on what the refcount is.

But that means that CEC can not work on Pi0-3 if the video side of the pipeline is not active, as the power domain will be disabled.
In reading the CEC spec I don't see anything that says that CEC needs video to be active, but also nothing that says it should work if video is inactive. Gut feeling is that it should work in the absence of video, as it is electrically totally independent of the video link.

@mripard
Contributor

mripard commented Nov 23, 2021

> > That's one way to fix it. The other would be to reset and do the register initialisation found in vc4_hdmi_cec_init in the runtime pm hook. This would keep the driver simpler and we wouldn't have to reflect on what the refcount is.
>
> But that means that CEC can not work on Pi0-3 if the video side of the pipeline is not active, as the power domain will be disabled.

We should be ok in that case as well, since we call pm_runtime_get in CEC .adap_enable (https://github.com/raspberrypi/linux/blob/rpi-5.10.y/drivers/gpu/drm/vc4/vc4_hdmi.c#L2054), so if the device isn't powered it will power the power domain and call vc4_hdmi_runtime_resume

@6by9
Contributor Author

6by9 commented Nov 23, 2021

> We should be ok in that case as well, since we call pm_runtime_get in CEC .adap_enable (https://github.com/raspberrypi/linux/blob/rpi-5.10.y/drivers/gpu/drm/vc4/vc4_hdmi.c#L2054), so if the device isn't powered it will power the power domain and call vc4_hdmi_runtime_resume

Well I'm glad someone understands this stuff! :-)
You're right - cec adap_enable will request pm.

It's not just the HDMI_CEC_CNTRL_1 register in vc4_hdmi_cec_init, it wants the reset process from vc4_hdmi_reset too.
So Dom had moved the reset from vc4_hdmi_encoder_pre_crtc_configure to bind due to trampling CEC registers, but we're now wanting to move it and HDMI_CEC_CNTRL_1 to pm_runtime. We'll get the right place eventually!
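
The direction agreed above (reset and CEC register setup done from the runtime resume hook) can be sketched as a userspace model. It assumes the resume hook runs on every 0 to 1 transition of the usage count and that power-off clears the registers; all names here are hypothetical stand-ins, not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>

struct hdmi_model {
    int usage;            /* runtime PM usage count */
    bool global_enable;   /* stands in for the state set by the reset */
    bool cec_configured;  /* stands in for the HDMI_CEC_CNTRL_1 setup */
    int resume_count;     /* how many times the block was reinitialised */
};

/* Resume hook: redo the reset and the CEC register setup every time. */
void model_runtime_resume(struct hdmi_model *h)
{
    h->global_enable = true;
    h->cec_configured = true;
    h->resume_count++;
}

/* Suspend hook: powering the domain down loses the register state. */
void model_runtime_suspend(struct hdmi_model *h)
{
    h->global_enable = false;
    h->cec_configured = false;
}

void model_get(struct hdmi_model *h)
{
    if (h->usage++ == 0)
        model_runtime_resume(h);
}

void model_put(struct hdmi_model *h)
{
    assert(h->usage > 0);
    if (--h->usage == 0)
        model_runtime_suspend(h);
}

/* CEC enabled with no video, then video on and off, then CEC disabled. */
int cec_without_video_resume_count(void)
{
    struct hdmi_model h = { 0 };

    model_get(&h);                 /* CEC adapter enabled, no video yet */
    assert(h.global_enable && h.cec_configured);
    model_get(&h);                 /* video pipeline enabled */
    model_put(&h);                 /* video pipeline disabled */
    assert(h.cec_configured);      /* CEC still holds its reference */
    model_put(&h);                 /* CEC adapter disabled */
    assert(!h.global_enable);
    return h.resume_count;
}
```

In this model CEC keeps working without video because its own get triggers the resume hook, and the block is reinitialised once per power-up rather than once at bind.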

@6by9
Contributor Author

6by9 commented Nov 25, 2021

@mripard Updated to reset and reinitialise from pm_runtime_resume hook.
Not tested yet on all hardware, but doing so.

@6by9
Contributor Author

6by9 commented Nov 25, 2021

Groan, vc4_hdmi_read and vc4_hdmi_write both have

WARN_ON(!pm_runtime_active(&hdmi->pdev->dev));

The status doesn't become ACTIVE until resume completes, so all the calls in the resume function trigger the WARN.

It looks like we can use WARN_ON(pm_runtime_status_suspended(&hdmi->pdev->dev)); instead, so out of the states of RPM_ACTIVE, RPM_SUSPENDING, RPM_SUSPENDED, RPM_RESUMING, only RPM_SUSPENDED will trigger the WARN.

@mripard
Contributor

mripard commented Nov 26, 2021

It makes sense indeed

@6by9
Contributor Author

6by9 commented Nov 26, 2021

It still complains with pm_runtime_status_suspended.

Due to pm_runtime_active looking at disable_depth as well as the state, generally you get both pm_runtime_status_suspended and pm_runtime_active reporting the same logical state in vc4_hdmi_runtime_resume or vc4_hdmi_runtime_suspend.

There is a comment against both those pm_runtime functions of

> Note that the return value of this function can only be trusted if it is called under the runtime PM lock of dev or under conditions in which the runtime PM status of dev cannot change

I don't think there can be a second pm_runtime call coming in at the same time as the resume or suspend calls are being executed, but I was just noting it.
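
The two checks discussed above can be compared side by side in a small userspace model. The predicates follow my reading of the helpers in include/linux/pm_runtime.h (`pm_runtime_active` also returns true whenever `disable_depth` is non-zero); the `model_` names and the struct are hypothetical, and the enum values are not meant to match the kernel's.

```c
#include <stdbool.h>

/* Model of the four runtime PM states named earlier in the thread. */
enum model_rpm_status {
    MODEL_RPM_ACTIVE,
    MODEL_RPM_RESUMING,
    MODEL_RPM_SUSPENDED,
    MODEL_RPM_SUSPENDING,
};

struct model_dev_pm {
    enum model_rpm_status status;
    unsigned int disable_depth;  /* non-zero while runtime PM is disabled */
};

/* Modelled on pm_runtime_active(): status is ACTIVE, or runtime PM is disabled. */
bool model_pm_runtime_active(const struct model_dev_pm *pm)
{
    return pm->status == MODEL_RPM_ACTIVE || pm->disable_depth;
}

/* Modelled on pm_runtime_status_suspended(): looks at the status only. */
bool model_pm_runtime_status_suspended(const struct model_dev_pm *pm)
{
    return pm->status == MODEL_RPM_SUSPENDED;
}

/* Would WARN_ON(!pm_runtime_active(...)) fire in this state? */
bool warns_with_active_check(const struct model_dev_pm *pm)
{
    return !model_pm_runtime_active(pm);
}

/* Would WARN_ON(pm_runtime_status_suspended(...)) fire in this state? */
bool warns_with_suspended_check(const struct model_dev_pm *pm)
{
    return model_pm_runtime_status_suspended(pm);
}
```

Under these assumptions the original check fires while the status is RESUMING, and the replacement check fires if the register accessors run while the status is still SUSPENDED and runtime PM has not yet been enabled, which is a case the original check lets through.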

@6by9
Contributor Author

6by9 commented Nov 26, 2021

Comment made just now on linux-media
https://patchwork.linuxtv.org/project/linux-media/patch/[email protected]/#133261

>  @@ -200,6 +217,9 @@ static int __maybe_unused dw9714_vcm_suspend(struct device *dev)
>  	struct dw9714_device *dw9714_dev = sd_to_dw9714_vcm(sd);
>  	int ret, val;
>  
> +	if (pm_runtime_suspended(&client->dev))
> +		return 0;

This can't take place in a runtime PM suspend callback. You'll need to add
system suspend callback for this.

This goes above my knowledge of pm_runtime.
Guidance please @mripard.

As we know we're going to have the power domain on in pm_runtime_resume, split vc4_hdmi_[write|read] into a wrapper that checks the state, and a function that does the work? When called from runtime_resume, go direct to the function to do the work?
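
The wrapper split suggested above might look roughly like this, again as a userspace model. The `_raw` helpers, the warning counter and the register array are hypothetical; a counter stands in for WARN_ON so the behaviour can be observed.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NUM_REGS 4

struct hdmi_model {
    bool rpm_active;        /* what the state check would report */
    unsigned int warnings;  /* counts where WARN_ON would have fired */
    uint32_t regs[MODEL_NUM_REGS];
};

/* Raw accessors: do the work with no state check. */
void model_hdmi_write_raw(struct hdmi_model *h, unsigned int reg, uint32_t val)
{
    h->regs[reg % MODEL_NUM_REGS] = val;
}

uint32_t model_hdmi_read_raw(const struct hdmi_model *h, unsigned int reg)
{
    return h->regs[reg % MODEL_NUM_REGS];
}

/* Checked wrappers: note the bad state, then call the raw accessor. */
void model_hdmi_write(struct hdmi_model *h, unsigned int reg, uint32_t val)
{
    if (!h->rpm_active)
        h->warnings++;
    model_hdmi_write_raw(h, reg, val);
}

uint32_t model_hdmi_read(struct hdmi_model *h, unsigned int reg)
{
    if (!h->rpm_active)
        h->warnings++;
    return model_hdmi_read_raw(h, reg);
}

/* Resume path: power is known to be on, so go straight to the raw helpers. */
void model_runtime_resume(struct hdmi_model *h)
{
    model_hdmi_write_raw(h, 0, 1);  /* stands in for the reset sequence */
    h->rpm_active = true;           /* status only changes once resume is done */
}

unsigned int warnings_after_resume(void)
{
    struct hdmi_model h = { 0 };

    model_runtime_resume(&h);
    (void)model_hdmi_read(&h, 0);   /* checked access after resume is fine */
    return h.warnings;
}

unsigned int warnings_after_suspended_read(void)
{
    struct hdmi_model h = { 0 };

    (void)model_hdmi_read(&h, 0);   /* checked access while suspended */
    return h.warnings;
}
```

The split keeps the early-warning behaviour for every normal caller and exempts only the resume path, where the power domain is already known to be on.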

@mripard
Contributor

mripard commented Nov 26, 2021

I'm not sure what Sakari meant either, to be honest, I don't know runtime_pm well enough.

Your solution would definitely work. We also have the option of removing that check if it turns out it doesn't work well enough.

The original intent was to at least have some logging before the system would stall completely if we were to access a register while the HSM clock or power domain was still disabled. It's a nice-to-have feature, but it definitely shouldn't stand in the way of any new development or bug fix.

@6by9
Contributor Author

6by9 commented Nov 26, 2021

He backtracked - https://patchwork.linuxtv.org/project/linux-media/patch/[email protected]/#133275

Still not quite sure why I get erroneous WARNs having converted to pm_runtime_suspended. Yet more logging needed.

I fully understand the intent and why it's a good thing, so I'm slightly loath to drop it unless we really can't make things work.

@6by9
Contributor Author

6by9 commented Nov 26, 2021

I'm a numpty. There's an explicit call to vc4_hdmi_runtime_resume from vc4_hdmi_bind which is done before pm_runtime is initialised :-(

@6by9
Contributor Author

6by9 commented Nov 26, 2021

Still not right - I was accidentally testing with composite enabled again from the dtoverlay line.
Fumbling in the dark as to the correct sequence of pm_runtime calls to enable it "normally". Do we need to split things apart with pm_runtime_get_noresume and pm_runtime_set_active still?
Need to rebuild for Pi4 and test there too.

We will be resetting and initialising the HDMI block more often after this, but it shouldn't be at a time that matters.

@mripard
Contributor

mripard commented Nov 29, 2021

> Still not right - I was accidentally testing with composite enabled again from the dtoverlay line. Fumbling in the dark as to the correct sequence of pm_runtime calls to enable it "normally". Do we need to split things apart with pm_runtime_get_noresume and pm_runtime_set_active still?

It shouldn't be needed anymore I think

@6by9
Contributor Author

6by9 commented Dec 2, 2021

Think I've found it - the pixel clock wasn't running for whatever reason when configured through the kernel clock driver. Add it to the firmware clock driver, and point hdmi@7e902000 at it, and I still have an HDMI display.

6by9 added 4 commits December 2, 2021 14:13

Pi0-3 have power domains attached to the pm_runtime hooks
for the HDMI block. Initialisation done in the reset called
from bind is therefore lost if all users of the domain are
suspended.
The VEC shares the same lowest level clock/power gating as
the HDMI block, so whilst that is enabled the block is never
actually powered down, but if it isn't enabled then we lose
the state.

Reset and initialise the HDMI block from pm_resume.

Signed-off-by: Dave Stevenson <[email protected]>

The clk-bcm2835 handling of the pixel clock does not function
correctly when the HDMI power domain is disabled.

The firmware supports it correctly, so add it to the
firmware clock driver.

Signed-off-by: Dave Stevenson <[email protected]>

The clk-bcm2835 handling of the pixel clock does not function
correctly when the HDMI power domain is disabled.

The firmware supports it correctly, and the firmware clock
driver now supports it, so switch the vc4-hdmi driver to use
the firmware clock driver.

Signed-off-by: Dave Stevenson <[email protected]>

Reinstates the new handling.

This reverts commit 46c99e3.

Signed-off-by: Dave Stevenson <[email protected]>

@6by9
Contributor Author

6by9 commented Dec 2, 2021

Branch updated.
Works for me on a Pi3 (composite confirmed as disabled via modetest) and on a Pi4.

@pelwell
Contributor

pelwell commented Dec 2, 2021

I get HDMI output with and without the ,composite dtparam with this PR. Without it, the ,nocomposite dtparam kills the display and the login console.

LGTM.

@pelwell pelwell merged commit e717ba8 into raspberrypi:rpi-5.10.y Dec 2, 2021
@6by9
Contributor Author

6by9 commented Dec 2, 2021

> I get HDMI output with and without the ,composite dtparam with this PR.

HDMI hotplug overrides composite on Pi0-3.
They share a pixel valve, so they can't both be active at once, and composite is the fallback should HDMI not be connected as the connection state can't be checked.

popcornmix added a commit to raspberrypi/firmware that referenced this pull request Dec 3, 2021
See: raspberrypi/linux#4755

kernel: DPI panel configuration
See: raspberrypi/linux#4753

kernel: KMS 7" DSI panel and touchscreen fixes
See: raspberrypi/linux#4750

kernel: drm/vc4: Hold pm_runtime for vc4
See: raspberrypi/linux#4706
popcornmix added a commit to raspberrypi/rpi-firmware that referenced this pull request Dec 3, 2021
See: raspberrypi/linux#4755

kernel: DPI panel configuration
See: raspberrypi/linux#4753

kernel: KMS 7" DSI panel and touchscreen fixes
See: raspberrypi/linux#4750

kernel: drm/vc4: Hold pm_runtime for vc4
See: raspberrypi/linux#4706
@popcornmix
Collaborator

popcornmix commented Dec 22, 2021

@6by9 this breaks hdmi output with kms on Pi3.
@HiassofT reported in LE slack

> @popcornmix I'm also seeing flip_done with latest rpi-update/next kernel on RPiOS buster lite (with vc4-kms-v3d)
> root@raspberrypi:# dmesg | grep flip
> [ 24.150919] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] ERROR [CRTC:72:crtc-3] flip_done timed out
> root@raspberrypi:# uname -a
> Linux raspberrypi 5.15.10-v7+ #1503 SMP Fri Dec 17 15:35:13 GMT 2021 armv7l GNU/Linux
> the version before (ef60132cb3c4ebb42390bc786c1d3e9033514508) looks fine
> root@raspberrypi:# dmesg | grep flip
> root@raspberrypi:# uname -a
> Linux raspberrypi 5.15.8-v7+ #1500 SMP Thu Dec 16 14:19:58 GMT 2021 armv7l GNU/Linux

I've bisected rpi-update kernel and confirmed by reverting the 4 commits from head of rpi-5.10.y.
Note after the flip_done error, you sometimes get console appearing after ten seconds (but it's prone to disappear later).

It affects 5.10.y and 5.15.y trees. I was testing on 5.10.y.

@popcornmix
Collaborator

Without reverting, using dtoverlay=vc4-kms-v3d,composite also avoids the issue.

@popcornmix
Collaborator

I'm having trouble reproducing the success I had with the revert.
Certainly on 5.15, the revert of this PR is not fixing the issue.
Reverting firmware to an earlier version seems to be fixing it
(but I'm not sure it's 100% repeatable so will continue testing).

@6by9
Contributor Author

6by9 commented Jan 4, 2022

How early a firmware? Before it supported shutting down DispmanX via the mailbox?

@popcornmix
Collaborator

I discovered why I couldn't reproduce the bisected regression. It seems to only occur with force_turbo=1 enabled and using pi0-3.
That results in the flip_done error. Adding ",composite" to the kms overlay is a workaround, as is reverting the "change composite handling" part of the commit.

popcornmix pushed a commit that referenced this pull request Apr 17, 2023
…sockopt

This attempts to fix the following trace:

======================================================
WARNING: possible circular locking dependency detected
6.3.0-rc2-g68fcb3a7bf97 #4706 Not tainted
------------------------------------------------------
sco-tester/31 is trying to acquire lock:
ffff8880025b8070 (&hdev->lock){+.+.}-{3:3}, at:
sco_sock_getsockopt+0x1fc/0xa90

but task is already holding lock:
ffff888001eeb130 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}, at:
sco_sock_getsockopt+0x104/0xa90

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}:
       lock_sock_nested+0x32/0x80
       sco_connect_cfm+0x118/0x4a0
       hci_sync_conn_complete_evt+0x1e6/0x3d0
       hci_event_packet+0x55c/0x7c0
       hci_rx_work+0x34c/0xa00
       process_one_work+0x575/0x910
       worker_thread+0x89/0x6f0
       kthread+0x14e/0x180
       ret_from_fork+0x2b/0x50

-> #1 (hci_cb_list_lock){+.+.}-{3:3}:
       __mutex_lock+0x13b/0xcc0
       hci_sync_conn_complete_evt+0x1ad/0x3d0
       hci_event_packet+0x55c/0x7c0
       hci_rx_work+0x34c/0xa00
       process_one_work+0x575/0x910
       worker_thread+0x89/0x6f0
       kthread+0x14e/0x180
       ret_from_fork+0x2b/0x50

-> #0 (&hdev->lock){+.+.}-{3:3}:
       __lock_acquire+0x18cc/0x3740
       lock_acquire+0x151/0x3a0
       __mutex_lock+0x13b/0xcc0
       sco_sock_getsockopt+0x1fc/0xa90
       __sys_getsockopt+0xe9/0x190
       __x64_sys_getsockopt+0x5b/0x70
       do_syscall_64+0x42/0x90
       entry_SYSCALL_64_after_hwframe+0x70/0xda

other info that might help us debug this:

Chain exists of:
  &hdev->lock --> hci_cb_list_lock --> sk_lock-AF_BLUETOOTH-BTPROTO_SCO

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(sk_lock-AF_BLUETOOTH-BTPROTO_SCO);
                               lock(hci_cb_list_lock);
                               lock(sk_lock-AF_BLUETOOTH-BTPROTO_SCO);
  lock(&hdev->lock);

 *** DEADLOCK ***

1 lock held by sco-tester/31:
 #0: ffff888001eeb130 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0},
 at: sco_sock_getsockopt+0x104/0xa90

Fixes: 248733e ("Bluetooth: Allow querying of supported offload codecs over SCO socket")
Signed-off-by: Luiz Augusto von Dentz <[email protected]>