Fix a race condition in pthread call targets not waking up. #12244
// which in a race condition may have finished handling its event queue
// just after we added our event. (We could also notify it once right
How is this possible? It looks like all enqueue and dequeue operations are protected by the call_queue_lock, so the event must have been added either after the target thread finished handling its event queue and released the lock, or before the target thread finished handling its event queue, in which case the event would be handled because the enqueue would synchronize with the dequeue.
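A minimal sketch of the argument above, assuming the target thread sleeps on a condition variable tied to the same lock. The names event_queue, queue_push, queue_pop_blocking and run_producer_consumer are hypothetical and are not Emscripten's API. The consumer checks the queue while holding the lock, and pthread_cond_wait releases the lock and blocks as one atomic step. As a result, an event pushed just after the queue was drained is still seen. In this model, a wakeup can only be lost if some step runs outside the lock, for example a thread going idle without re-checking the queue under it.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

#define QUEUE_CAP 64

/* Hypothetical event queue; `lock` plays the role of call_queue_lock. */
typedef struct {
  pthread_mutex_t lock;
  pthread_cond_t nonempty;
  int items[QUEUE_CAP];
  size_t head, count;
} event_queue;

void queue_init(event_queue *q) {
  pthread_mutex_init(&q->lock, NULL);
  pthread_cond_init(&q->nonempty, NULL);
  q->head = q->count = 0;
}

/* Producer: enqueue and signal while holding the lock. Returns -1 if full. */
int queue_push(event_queue *q, int item) {
  pthread_mutex_lock(&q->lock);
  if (q->count == QUEUE_CAP) {
    pthread_mutex_unlock(&q->lock);
    return -1;
  }
  q->items[(q->head + q->count) % QUEUE_CAP] = item;
  q->count++;
  pthread_cond_signal(&q->nonempty);
  pthread_mutex_unlock(&q->lock);
  return 0;
}

/* Consumer: check the queue under the same lock before sleeping.
   pthread_cond_wait releases the lock and blocks atomically, so a push
   cannot slip in between the emptiness check and the sleep. */
int queue_pop_blocking(event_queue *q) {
  pthread_mutex_lock(&q->lock);
  while (q->count == 0)
    pthread_cond_wait(&q->nonempty, &q->lock);
  int item = q->items[q->head];
  q->head = (q->head + 1) % QUEUE_CAP;
  q->count--;
  pthread_mutex_unlock(&q->lock);
  return item;
}

typedef struct {
  event_queue *q;
  int n;
  long sum;
} consumer_arg;

static void *consumer(void *p) {
  consumer_arg *a = p;
  for (int i = 0; i < a->n; i++)
    a->sum += queue_pop_blocking(a->q);
  return NULL;
}

/* Push 1..n from the calling thread while a consumer thread pops them;
   returns the consumer's sum. */
long run_producer_consumer(int n) {
  event_queue q;
  queue_init(&q);
  consumer_arg a = {&q, n, 0};
  pthread_t t;
  int rc = pthread_create(&t, NULL, consumer, &a);
  assert(rc == 0);
  (void)rc;
  for (int i = 1; i <= n; i++)
    while (queue_push(&q, i) != 0)
      sched_yield(); /* queue full: let the consumer drain it */
  pthread_join(t, NULL);
  pthread_mutex_destroy(&q.lock);
  pthread_cond_destroy(&q.nonempty);
  return a.sum;
}
```

For example, run_producer_consumer(1000) pushes 1 through 1000 while a consumer thread pops them, and should return 500500. In a model like this, a hang would point at the lock or condition variable rather than at the queue logic.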
I'm not entirely sure, but without this patch we get clear deadlocks in which the main thread needs to be woken up, and this patch fixes them (see #12258).
It may be that there is something not entirely atomic about our mutexes, in which case I'm not sure what the best debugging approach is (maybe we need to debug the browser itself?).
> It may be that there is something not entirely atomic about our mutexes
Yikes! Perhaps we could demonstrate such an issue with our mutexes in a smaller, controlled experiment? A test of a dining philosophers solution could be a good simple stress test for deadlock. It would also be good to narrow down whether the lock misbehaves on the main thread or on non-main threads.
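The stress test suggested above could look roughly like this. It is a sketch in plain C with pthreads and C11 atomics, not an existing Emscripten test. The names dining_fork, philosopher and run_dining_philosophers are hypothetical. Each fork is a mutex, and each philosopher always locks the lower-numbered fork first (resource ordering), so with a correct mutex the test cannot deadlock. After locking a fork, a philosopher checks an atomic in_use flag; if the flag is already set, the code counts a mutual-exclusion violation. Philosopher 0 runs on the calling thread and the others on new threads. This is one way to start separating main-thread behavior from worker-thread behavior. Whether blocking on the browser's main thread is acceptable depends on the build, and this sketch does not check that.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define PHILOSOPHERS 5

/* One fork between each pair of neighbors; in_use detects overlapping owners. */
typedef struct {
  pthread_mutex_t lock;
  atomic_int in_use;
} dining_fork;

typedef struct {
  int id, meals, eaten;
  dining_fork *forks;
  atomic_int *violations;
} philosopher;

static void pick_up(dining_fork *f, atomic_int *violations) {
  pthread_mutex_lock(&f->lock);
  /* With a working mutex, nobody else can hold this fork right now. */
  if (atomic_exchange(&f->in_use, 1) != 0)
    atomic_fetch_add(violations, 1);
}

static void put_down(dining_fork *f) {
  atomic_store(&f->in_use, 0);
  pthread_mutex_unlock(&f->lock);
}

static void *dine(void *arg) {
  philosopher *p = arg;
  int left = p->id, right = (p->id + 1) % PHILOSOPHERS;
  /* Resource ordering: always take the lower-numbered fork first, so no
     cycle of waiting philosophers (and thus no deadlock) can form. */
  int first = left < right ? left : right;
  int second = left < right ? right : left;
  for (int i = 0; i < p->meals; i++) {
    pick_up(&p->forks[first], p->violations);
    pick_up(&p->forks[second], p->violations);
    p->eaten++;
    put_down(&p->forks[second]);
    put_down(&p->forks[first]);
  }
  return NULL;
}

/* Philosopher 0 runs on the calling thread, the others on new threads.
   Returns the number of mutual-exclusion violations; *total gets meals eaten. */
int run_dining_philosophers(int meals, long *total) {
  dining_fork forks[PHILOSOPHERS];
  philosopher ph[PHILOSOPHERS];
  pthread_t threads[PHILOSOPHERS];
  atomic_int violations;
  atomic_init(&violations, 0);
  for (int i = 0; i < PHILOSOPHERS; i++) {
    pthread_mutex_init(&forks[i].lock, NULL);
    atomic_init(&forks[i].in_use, 0);
    ph[i] = (philosopher){i, meals, 0, forks, &violations};
  }
  for (int i = 1; i < PHILOSOPHERS; i++) {
    int rc = pthread_create(&threads[i], NULL, dine, &ph[i]);
    assert(rc == 0);
    (void)rc;
  }
  dine(&ph[0]); /* the calling thread takes part as philosopher 0 */
  *total = ph[0].eaten;
  for (int i = 1; i < PHILOSOPHERS; i++) {
    pthread_join(threads[i], NULL);
    *total += ph[i].eaten;
  }
  for (int i = 0; i < PHILOSOPHERS; i++)
    pthread_mutex_destroy(&forks[i].lock);
  return atomic_load(&violations);
}
```

For example, run_dining_philosophers(10000, &total) should return 0 and leave total at 50000. Resource ordering rules out a genuine deadlock cycle, so a hang would point at the lock itself.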
#12258 has the smallest controlled experiment I have managed so far, but it still depends on allocation, proxying, and mutexes...
I have found the actual cause here. I will shortly open a refactoring PR and then a fix PR.
A thread may happen to have finished handling its events right before we add
another one. If it is then idle, it may never handle the new event. To avoid
that, if we wait on a call, we also notify the target thread to wake up.
To allow that, track the target thread of each proxied call, so we know who to
notify.
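The approach described above can be sketched in a simplified model. This is not Emscripten's proxying code: mailbox, proxied_call, proxy_call_sync and demo_proxy_sum_squares are hypothetical names, and the worker is an ordinary pthread rather than a browser thread. Each proxied_call records its target. While the caller waits for completion, it re-notifies that target before every timed wait, so an idle target cannot sleep through a queued call. In this model the enqueue already signals under the mailbox lock, so the extra notifications are redundant. They show where the recorded target is used; they do not reproduce the original race.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical proxied call; `target` records which thread it was sent to. */
typedef struct proxied_call {
  struct mailbox *target;
  struct proxied_call *next;
  int (*fn)(int);
  int arg, result;
  bool done;
  pthread_mutex_t done_lock;
  pthread_cond_t done_cond;
} proxied_call;

/* Hypothetical per-thread mailbox holding the calls proxied to that thread. */
typedef struct mailbox {
  pthread_mutex_t lock;
  pthread_cond_t wake;
  proxied_call *head;
  bool shutdown;
} mailbox;

/* Target thread: run queued calls until shut down with an empty queue. */
static void *mailbox_loop(void *p) {
  mailbox *m = p;
  pthread_mutex_lock(&m->lock);
  for (;;) {
    while (m->head == NULL && !m->shutdown)
      pthread_cond_wait(&m->wake, &m->lock);
    if (m->head == NULL) break; /* shut down and drained */
    proxied_call *c = m->head; /* LIFO order keeps the sketch short */
    m->head = c->next;
    pthread_mutex_unlock(&m->lock);
    c->result = c->fn(c->arg); /* published to the caller via done_lock */
    pthread_mutex_lock(&c->done_lock);
    c->done = true;
    pthread_cond_signal(&c->done_cond);
    pthread_mutex_unlock(&c->done_lock);
    pthread_mutex_lock(&m->lock);
  }
  pthread_mutex_unlock(&m->lock);
  return NULL;
}

/* Caller: run fn(arg) on `target` and block until it has finished. */
int proxy_call_sync(mailbox *target, int (*fn)(int), int arg) {
  proxied_call c = {.target = target, .fn = fn, .arg = arg, .done = false};
  pthread_mutex_init(&c.done_lock, NULL);
  pthread_cond_init(&c.done_cond, NULL);
  pthread_mutex_lock(&target->lock);
  c.next = target->head;
  target->head = &c;
  pthread_cond_signal(&target->wake);
  pthread_mutex_unlock(&target->lock);
  pthread_mutex_lock(&c.done_lock);
  while (!c.done) {
    /* Re-notify the recorded target before each wait, in case it went idle. */
    pthread_mutex_lock(&c.target->lock);
    pthread_cond_signal(&c.target->wake);
    pthread_mutex_unlock(&c.target->lock);
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_nsec += 10000000L; /* wait at most 10 ms before re-notifying */
    ts.tv_sec += ts.tv_nsec / 1000000000L;
    ts.tv_nsec %= 1000000000L;
    pthread_cond_timedwait(&c.done_cond, &c.done_lock, &ts);
  }
  pthread_mutex_unlock(&c.done_lock);
  pthread_mutex_destroy(&c.done_lock);
  pthread_cond_destroy(&c.done_cond);
  return c.result;
}

static int square(int x) { return x * x; }
/* Usage demo: proxy square(1..n) to one worker thread and sum the results. */
long demo_proxy_sum_squares(int n) {
  mailbox m = {.head = NULL, .shutdown = false};
  pthread_mutex_init(&m.lock, NULL);
  pthread_cond_init(&m.wake, NULL);
  pthread_t worker;
  int rc = pthread_create(&worker, NULL, mailbox_loop, &m);
  assert(rc == 0);
  (void)rc;
  long sum = 0;
  for (int i = 1; i <= n; i++)
    sum += proxy_call_sync(&m, square, i);
  pthread_mutex_lock(&m.lock);
  m.shutdown = true;
  pthread_cond_signal(&m.wake);
  pthread_mutex_unlock(&m.lock);
  pthread_join(worker, NULL);
  pthread_mutex_destroy(&m.lock);
  pthread_cond_destroy(&m.wake);
  return sum;
}
```

For example, demo_proxy_sum_squares(100) proxies square(1) through square(100) to a single worker thread and should return 338350. The caller holds its done_lock while briefly taking the target's mailbox lock. This cannot deadlock, because the target thread never holds both locks at once.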