Skip to content

Fix a race condition in pthread call targets not waking up. #12244

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions system/include/emscripten/threading.h
Original file line number Diff line number Diff line change
Expand Up @@ -152,6 +152,9 @@ typedef struct em_queued_call
// after it has been executed. If false, the caller is in control of the
// memory.
int calleeDelete;

// The thread on which to perform the call.
void* targetThread;
} em_queued_call;

void emscripten_sync_run_in_main_thread(em_queued_call *call);
Expand Down
17 changes: 16 additions & 1 deletion system/lib/pthread/library_pthread.c
Original file line number Diff line number Diff line change
Expand Up @@ -174,6 +174,7 @@ static em_queued_call* em_queued_call_malloc() {
call->operationDone = 0;
call->functionPtr = 0;
call->satelliteData = 0;
call->targetThread = 0;
}
return call;
}
Expand Down Expand Up @@ -411,7 +412,7 @@ static CallQueue* GetOrAllocateQueue(

EMSCRIPTEN_RESULT emscripten_wait_for_call_v(em_queued_call* call, double timeoutMSecs) {
int r;

int notifiedThread = 0;
int done = emscripten_atomic_load_u32(&call->operationDone);
if (!done) {
double now = emscripten_get_now();
Expand All @@ -420,6 +421,16 @@ EMSCRIPTEN_RESULT emscripten_wait_for_call_v(em_queued_call* call, double timeou
while (!done && now < waitEndTime) {
r = emscripten_futex_wait(&call->operationDone, 0, waitEndTime - now);
done = emscripten_atomic_load_u32(&call->operationDone);
if (!done && !notifiedThread) {
// If we are not done even after the wait, notify the target thread,
// which in a race condition may have finished handling its event queue
// just after we added our event. (We could also notify it once right
Comment on lines +426 to +427
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this possible? It looks like all enqueue and dequeue operations are protected by the call_queue_lock, so the event must have been added either after the target thread finished handling its event queue and released the lock or before the target thread finished handling its event queue, in which case the event would be handled because the enqueue would synchronize with the dequeue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure. But we get clear deadlocks without this patch, where the main thread needs to be woken up, which this patch fixes (see #12258).

It may be that there is something not entirely atomic about our mutexes, in which case I'm not sure what the best debugging approach is (maybe we need to debug the browser itself?).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be that there is something not entirely atomic about our mutexes

Yikes! Perhaps we could demonstrate such an issue with our mutexes in a smaller, controlled experiment? A test of a dining philosophers solution could be a good simple stress test for deadlock. It would also be good to narrow down whether the lock misbehaves on the main thread or on non-main threads.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#12258 has the smallest controlled experiment I can get so far. But it still depends on allocation, proxying, and mutexes...

// after adding the event to the queue, but that could lead to a lot of
// unnecessary postMessages, so just do it when we are sure it needs to
// be woken up.
_emscripten_notify_thread_queue(call->targetThread, emscripten_main_browser_thread_id());
notifiedThread = 1;
}
now = emscripten_get_now();
}
emscripten_set_current_thread_status(EM_THREAD_STATUS_RUNNING);
Expand Down Expand Up @@ -754,6 +765,7 @@ EMSCRIPTEN_KEEPALIVE double emscripten_run_in_main_runtime_thread_js(int index,
c->calleeDelete = 1-sync;
c->functionEnum = EM_PROXIED_JS_FUNCTION;
c->functionPtr = (void*)index;
c->targetThread = emscripten_main_browser_thread_id();
// We write out the JS doubles into args[], which must be of appropriate size - JS will assume that.
assert(sizeof(em_variant_val) == sizeof(double));
assert(num_args+1 <= EM_QUEUED_JS_CALL_MAX_ARGS);
Expand Down Expand Up @@ -781,6 +793,7 @@ void emscripten_async_run_in_main_runtime_thread_(EM_FUNC_SIGNATURE sig, void* f
return;
q->functionEnum = sig;
q->functionPtr = func_ptr;
q->targetThread = emscripten_main_browser_thread_id();

EM_FUNC_SIGNATURE argumentsType = sig & EM_FUNC_SIG_ARGUMENTS_TYPE_MASK;
va_list args;
Expand Down Expand Up @@ -818,6 +831,7 @@ em_queued_call* emscripten_async_waitable_run_in_main_runtime_thread_(
return NULL;
q->functionEnum = sig;
q->functionPtr = func_ptr;
q->targetThread = emscripten_main_browser_thread_id();

EM_FUNC_SIGNATURE argumentsType = sig & EM_FUNC_SIG_ARGUMENTS_TYPE_MASK;
va_list args;
Expand Down Expand Up @@ -864,6 +878,7 @@ int EMSCRIPTEN_KEEPALIVE _emscripten_call_on_thread(
q->functionEnum = sig;
q->functionPtr = func_ptr;
q->satelliteData = satellite;
q->targetThread = targetThread;

EM_FUNC_SIGNATURE argumentsType = sig & EM_FUNC_SIG_ARGUMENTS_TYPE_MASK;
va_list args;
Expand Down