
Mt deadlock issue #12309


Closed · wants to merge 2 commits

Conversation

AlexanderJ

Hi,

we have had massive problems with deadlocks in Emscripten due to proxy-to-main-thread calls.

For example: a worker thread holding a mutex tries to allocate/deallocate memory, and the main thread, which is executing additional functionality because other calls were queued on it as well, tries to obtain the same mutex. This happens a lot in our Qt-based application via signal/slot or invokeMethod calls.

To avoid this I introduced a new function emscripten_current_thread_process_proxied_queued_calls which replaces emscripten_current_thread_process_queued_calls and emscripten_sync_run_in_main_runtime_thread_ in situations where it would be dangerous to execute something else than proxied calls.

For us this worked very well.
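
As an illustration of where the replacement sits, here is a minimal, hypothetical sketch; the wait loop and the is_done flag are invented for illustration, and only the two emscripten_current_thread_process_* names come from this PR:

#include <emscripten/threading.h>

static volatile int is_done; // hypothetical completion flag

static void wait_without_running_application_code(void) {
	while (!is_done) {
		// Before this PR (may execute arbitrary queued application calls):
		//   emscripten_current_thread_process_queued_calls();
		// With this PR, only the proxied/internal calls are serviced:
		emscripten_current_thread_process_proxied_queued_calls();
	}
}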

Regarding a test case I'm unsure: I should be able to come up with something which reproduces the original issue, but first I seek feedback on whether the solution might have unwanted side effects I just don't see yet.

Kind regards,
Alex

Alexander Jaehrling added 2 commits September 23, 2020 08:20
Now only proxy calls are executed for mutex, sleep
and other dangerous situations.

@sbc100
Collaborator

sbc100 commented Sep 23, 2020

@kripken could this be related to recent deadlock issues you have been working on?

@AlexanderJ
Author

Regarding the failed test, I'm not really sure whether this might be a desired side effect of my modification. Someone with more knowledge in the area should have a look.

@kripken
Member

kripken commented Sep 24, 2020

@AlexanderJ

To avoid this I introduced a new function emscripten_current_thread_process_proxied_queued_calls which replaces emscripten_current_thread_process_queued_calls and emscripten_sync_run_in_main_runtime_thread_ in situations where it would be dangerous to execute something else than proxied calls.

Maybe I'm misunderstanding this, but I'm confused - what is "something else"?

About the name, the only thing emscripten_current_thread_process_queued_calls does is process proxied calls, unless I've misunderstood @juj's code. That is, the call queue only contains proxied calls. So the name emscripten_current_thread_process_proxied_queued_calls (which adds _proxied) doesn't indicate to me what is actually new.

Reading the code, one difference seems to be that the new version copies the call to a local, and sets the pointer to 0. Another difference is that it drops a bunch of special-casing of main thread handling that the old function does. Is one of those two the key difference? Or something else?

@sbc100

This change is orthogonal to the change in my PR #12318 .

However, if this PR just decreases the risk of a deadlock, then it's possible that that PR fixes the deadlock entirely; it would be good to test your code with that PR, @AlexanderJ.

@AlexanderJ
Author

AlexanderJ commented Sep 24, 2020

Our problems with deadlocks started when migrating from version 1.38.47 to 1.39.8. The deadlocks were easily reproducible with our code and occurred very often. With this fix we were no longer able to reproduce them.

The difference with emscripten_current_thread_process_proxied_queued_calls is that it only handles calls of type EM_FUNC_SIG_SPECIAL_INTERNAL (these are the calls in the PROXY class, e.g. those handling memory allocation); those are set to 0 to avoid executing them again from the main event loop. The other calls are ignored and will only be handled when invoked directly from the browser, but not from within e.g. futex_wait.
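
In sketch form, that looks roughly like the following; all structures and helper names here are invented to paraphrase the PR, and only EM_FUNC_SIG_SPECIAL_INTERNAL (from emscripten/threading.h) and the new function name are real:

#include <emscripten/threading.h>

typedef struct { int functionEnum; void (*run)(void); } QueuedCall; // hypothetical

static QueuedCall* call_queue[64]; // hypothetical queue storage
static int call_queue_length;

void emscripten_current_thread_process_proxied_queued_calls(void) {
	for (int i = 0; i < call_queue_length; ++i) {
		QueuedCall* c = call_queue[i];
		if (c && c->functionEnum == EM_FUNC_SIG_SPECIAL_INTERNAL) {
			QueuedCall local = *c; // copy the call to a local
			call_queue[i] = 0;     // zero the slot so the main event loop
			                       // does not execute the call a second time
			local.run();
		}
		// Calls without EM_FUNC_SIG_SPECIAL_INTERNAL stay queued and only
		// run later, from the browser's own event loop.
	}
}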

@kripken based on your explanation and the source code changes you made, I don't think that your fix is related to our problem, namely that a non-proxy class call deadlocks with a proxy class call. IMHO this would still be possible.

@kripken
Member

kripken commented Sep 24, 2020

Interesting, I wasn't aware of EM_FUNC_SIG_SPECIAL_INTERNAL. I don't see many docs for it, and reading the code I'm unsure why it makes sense to special-case those and not other things. So it's hard for me to reason about the correctness of this PR (handling fewer proxied things seems very dangerous, on the face of it).

That is that a non-proxy class call deadlocks with a proxy class call.

Do you have a testcase showing the issue? That could help greatly to understand what you are seeing.

I understand that this seems unrelated to #12318, but I'd suggest trying your application with that. You can do emsdk install tot to use it from tip of tree. It would be good to confirm whether it helps or not.
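
For reference, the usual emsdk steps are below; the activate step and the version check are standard emsdk/emcc usage, added here for completeness:

emsdk install tot
emsdk activate tot
emcc -v   # prints the active compiler version, useful to double-check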

@juj do you have time to look at this?

@AlexanderJ
Author

I think the main issue with the current implementation is that a mutex is expected to block execution. Due to the nature of the browser implementation this is not completely possible, as things like memory allocation need to be delegated to the main thread. However, the call queue contains all sorts of calls, even those which can lead to deadlocks. No one expects that, while waiting for a mutex, other functionality of the same application can be executed. This is what I fixed.

@kripken
Member

kripken commented Sep 26, 2020

I think the main issue with the current implementation is that a mutex is expected to block execution. Due to the nature of the browser implementation this is not completely possible, as things like memory allocation need to be delegated to the main thread.

That's not accurate for memory allocation. malloc and free do take a mutex, but no proxying happens, and the operations they do while holding the mutex do not depend on another thread, so they are guaranteed to run to completion and not deadlock, AFAIK.

Other things do need to be delegated to the main thread, like I/O, so if the main thread blocks on a worker which is doing I/O that can be dangerous, you are definitely right there.

#10155 has two examples of known current deadlocks. Does this PR fix them? If so that could help understand what's going on here.

(If not, then note that #12318 and #12240 have landed, both of which may be relevant, so you can test them on a tip of tree build, emsdk install tot.)

@AlexanderJ
Author

In the meantime I tested tot (btw, is there a way to ensure it's the proper one I used? Where best to look? I want to double check).
This version does not work; I still get the deadlocks as expected.

One case is e.g. that a new thread is created while holding a mutex, while the main thread is woken up by a Qt signal/slot connection from a background thread. Now the main thread, instead of just creating the new worker thread, executes a function which obtains the same mutex and deadlocks. With my fix only the thread creation is executed, but not the Qt signal/slot (which is a delegated method without the EM_FUNC_SIG_SPECIAL_INTERNAL flag).

@AlexanderJ
Author

AlexanderJ commented Oct 5, 2020

@kripken finally I put together a test case which deadlocks without my fix and works fine with it.

#include <emscripten.h>
#include <emscripten/html5.h>
#include <emscripten/threading.h>
#include <pthread.h>

static pthread_mutex_t mutex=PTHREAD_MUTEX_INITIALIZER;

void dangerous(int number) {
	pthread_mutex_lock(&mutex);
	emscripten_log(EM_LOG_CONSOLE | EM_LOG_C_STACK, "I'm the dangerous call %d", number);
	pthread_mutex_unlock(&mutex);
}

void *worker(void *) {
	for (int i=0; i<20; ++i) {
		pthread_mutex_lock(&mutex);
		pthread_t thread;
		// do something which needs to be proxied to main thread, while holding the mutex
		pthread_create(&thread, /*attr:*/nullptr, &worker, nullptr);
		emscripten_log(EM_LOG_CONSOLE | EM_LOG_C_STACK, "I'm the worker");
		pthread_mutex_unlock(&mutex);

		// delegate something else to the main thread, which is unrelated, but uses the same mutex
		emscripten_async_run_in_main_runtime_thread_(EM_FUNC_SIG_VI, (void*)&dangerous, i);
	}
	return nullptr;
}

EM_BOOL eventHandler(double, void*)
{
	return EM_TRUE;
}

int main(int, char**)
{
	pthread_t thread;
	pthread_create(&thread, /*attr:*/nullptr, &worker, nullptr);
	emscripten_request_animation_frame_loop(eventHandler, 0);
}
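
For anyone wanting to try this locally, a build command along these lines should work (standard pthreads settings; the file name and pool size are arbitrary, and the page must be served with the headers SharedArrayBuffer requires):

emcc deadlock.cpp -s USE_PTHREADS=1 -s PTHREAD_POOL_SIZE=8 -o deadlock.html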

@kripken
Member

kripken commented Oct 6, 2020

Thanks for the testcase @AlexanderJ , that's very helpful!

IIUC, what happens here is that the worker takes the lock. It then does a synchronous proxy operation to the main thread (here, for pthread_create). The main thread handles an async proxy request it got sent earlier, in which it wants the same lock. The worker has it, so it blocks. The worker in turn is blocking because it's doing a synchronous proxy operation.

This seems like an impossible problem to solve in general, as this is a true logical deadlock. However, I think this PR might help make it less likely, if what happens is slightly different: the worker does the synchronous proxy operation to the main thread (for pthread_create). When the main thread wants to handle events, it sees the async proxy request from earlier, and starts to handle that first, which then wants the lock and we deadlock. If, instead, the main thread handled synchronous proxy events first, skipping all async ones while doing so, this problem would not happen.

As I said this wouldn't be a full solution since the deadlock can still happen, if main starts to run the async event before it gets the sync one. But it may still be worth doing!

Specifically, it seems like the right behavior to me to always process a sync event before any async ones. An async event must be ordered wrt other async events, but there is no ordering with sync ones, and handling sync ones first would make things more responsive. In addition, it seems reasonable to assume that a sync proxy would be written to avoid things like taking a lock (because sync proxying is so costly already), which is the case in this example (and I guess in Qt in general, signal/slot connections are async?).

In other words, I'm proposing that emscripten_current_thread_process_queued_calls be modified to prioritize sync calls. I think that would be simpler than the current PR. What do you think @AlexanderJ @juj ? However, I admit I don't fully understand the EM_FUNC_SIG_SPECIAL_INTERNAL part of the current PR, so maybe I'm still missing something.
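
In sketch form, with invented queue structures (only the outer function name is real Emscripten API, and note juj's objection to this idea below):

typedef struct { int isSync; void (*run)(void); } QueuedCall; // hypothetical

static QueuedCall* call_queue[64]; // hypothetical queue storage
static int call_queue_length;

static void run_and_clear_call(int i) {
	QueuedCall* c = call_queue[i];
	call_queue[i] = 0; // clear the slot so the call cannot run twice
	c->run();
}

void emscripten_current_thread_process_queued_calls(void) {
	// First pass: synchronous proxied calls; their callers are blocked on them.
	for (int i = 0; i < call_queue_length; ++i)
		if (call_queue[i] && call_queue[i]->isSync)
			run_and_clear_call(i);
	// Second pass: the remaining asynchronous calls, in queue order.
	for (int i = 0; i < call_queue_length; ++i)
		if (call_queue[i])
			run_and_clear_call(i);
}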

@AlexanderJ
Author

AlexanderJ commented Oct 7, 2020

@kripken The trick is that EM_FUNC_SIG_SPECIAL_INTERNAL ensures that only internal proxied calls, e.g. thread creation, are handled. The other trick is that I use different functions depending on where I call the event loop: in all situations except the normal main-thread event loop, only the internal calls are handled. This means that waiting on a mutex cannot accidentally execute application functionality which uses another mutex, but only things like thread creation. Thus I think this PR avoids all such deadlocks.

I wonder if there are other synchronous calls without EM_FUNC_SIG_SPECIAL_INTERNAL which can cause issues though.

@AlexanderJ
Author

Qt offers synchronous and asynchronous calls to other threads. signal/slot connections are always asynchronous.

@juj
Collaborator

juj commented Oct 12, 2020

In other words, I'm proposing that emscripten_current_thread_process_queued_calls be modified to prioritize sync calls. I think that would be simpler than the current PR. What do you think @AlexanderJ @juj ?

This would not be correct. The semantic model of sync and async calls is that all calls will be observed in sequential order. The meaning of async is just a local optimization for the calling thread, where the caller can say at the time of the call "I do not care about the return value/result of the proxied call, so let me proceed as if the operation has already occurred, and skip waiting in this thread; I know it will be safe to proceed".

If sync operations were processed ahead of async ones queued earlier, it would break all kinds of proxied JS libraries, because the order of operations would come out wrong (e.g. proxied WebGL, where glGet*()s are sync, and most other commands async).
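
To make the ordering hazard concrete, here is a minimal illustration using standard GLES2 calls; the sync/async classification is as described above:

#include <GLES2/gl2.h>

static void draw_and_check(void) {
	glClearColor(1.0f, 0.0f, 0.0f, 1.0f); // async-proxied command
	glClear(GL_COLOR_BUFFER_BIT);         // async-proxied command
	GLenum err = glGetError();            // sync-proxied getter; if it could
	                                      // jump the queue it would observe
	                                      // state from before the clear
	(void)err;
}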

A proxied function should only be called async if the function call will not observe any shared state. That is, generally a call to a JS/DOM function, or a call to a C function that does not access shared state (since the call could happen out of order w.r.t. the calling thread).

This PR seems to implement a restriction that certain (kind of arbitrary) thread cancellation points stop processing some proxied queue events, but still process others. The processing is implemented via a peekahead into the proxy queue, latching on to the enum EM_FUNC_SIG_SPECIAL_INTERNAL. (The special/internal bits are more about code organization than multithreading semantics, which makes this a somewhat arbitrary division.)

This kind of queue peekahead will also break the semantic in-order model of execution of messages.

The PR does seemingly fix the issue, but only by stopping processing of the affected proxied operations to avoid the priority inversion.

The issue with the code posted in #12309 (comment) is twofold:

  1. on the line
emscripten_async_run_in_main_runtime_thread_(EM_FUNC_SIG_VI, (void*)&dangerous, i);

where the async proxying functionality is used improperly: if one performs an async proxied operation, one must be ready to accept that the operation will indeed occur asynchronously with respect to the calling thread. In this case, the calling function will have time to grab the mutex on the next loop iteration before dangerous has time to get going.

Then dangerous will execute on the main thread, and will halt because it cannot get the mutex, because the worker did exactly what it was asked to: it was told it would be safe to proceed asynchronously (and steal the mutex).

The main thread has now been set up to block on the worker; this alone is already a recipe for disaster. Main thread code should be designed not to need to hold ultimatum locks like this, because the main thread cannot block for long periods of time on the web.

Then we get the second part of this:

  2. The worker proceeds to issue a pthread_create(), which under the current Emscripten proxying model is a synchronously proxied function onto the main thread. So the worker pauses to wait for the main thread to finish spawning the thread. But the main thread is paused on the worker from before.

The root issue here is the use of the emscripten_async_run_in_main_runtime_thread_ proxying functionality, and the special behavior of pthread_create compared to native POSIX platforms. To fix the scenario, either

  • switch the proxying code to instead be synchronous, so that the worker cannot steal the needed mutex, or
  • issue a dummy no-op synchronous call from worker to main thread before taking the lock, to simulate a __sync_synchronize(); style of operation w.r.t. the proxying model.
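
A sketch of the second option, using only APIs already mentioned in this thread; noop is a hypothetical helper, and this is a mitigation under the stated assumptions rather than a general fix:

#include <emscripten/threading.h>
#include <pthread.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

static void noop(void) {} // does nothing; exists only to drain the queue

static void take_lock_safely(void) {
	// Synchronously proxy a no-op first: by the time it returns, the main
	// thread has processed every earlier queued call (including our async
	// ones), so it cannot be sitting blocked on the mutex we now take.
	emscripten_sync_run_in_main_runtime_thread_(EM_FUNC_SIG_V, (void*)&noop);
	pthread_mutex_lock(&mutex);
}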

Another way to fix this particular issue would be if we can improve pthread_create() to always be an asynchronously proxied function, although one will still be able to reproduce problems like this by substituting pthread_create() with any other API call that will need to synchronously proxy.

The rule of thumb is that every time you issue any proxied C or JS call (which includes any syscall, so file I/O), pthread_create, emscripten_webgl_create_context or emscripten_set_canvas_element_size, think of it as the main thread having just successfully acquired a lock that the worker will need to pause and wait on until the main thread finishes the call in question. If the worker is at the same time holding a resource that makes it impossible for the main thread to finish that call, you will have a deadlock.

I agree that this is quite a big gotcha compared to other platforms, but those are the rules that JS multithreading imposes.

The root question for you, then, is perhaps: what are the primitives in question that cause the deadlock? I wonder if pthread_create() was involved in the Qt codebase as well. If so, then the real investigation is whether we could safely improve pthread_create() to run with asynchronous proxying rather than synchronous. See in particular this code:

// Synchronously proxy the thread creation to main thread if possible. If we

What would happen if you commented lines 712-714 out to make pthread_create() operate asynchronously? Would that avoid the deadlock? (I don't 100% recall off the top of my head why pthread_create is not always asynchronous today, will need to read into the code to refresh my memory...)

@AlexanderJ
Author

@juj to me this sounds like the concept Qt for WebAssembly is based on is fundamentally flawed. The only solution for me would be that the Qt main thread runs on a worker and proxies all UI-related calls (to access the WebGL view they use) to the main thread. Is this assumption correct in your opinion?

@kripken
Member

kripken commented Oct 19, 2020

@juj

Thanks for the info!

Very important to know that about the ordering assumptions here. We should document that - it doesn't seem to be mentioned atm. I opened #12546 for that.

I don't 100% recall off the top of my head why pthread_create is not always asynchronous today

@RReverser mentioned recently that we could do that with a dedicated worker just for pthread_create. It could work, but it does overlap a little with PROXY_TO_PTHREAD. Speaking of that mode, @AlexanderJ , would it be possible to run Qt with it? That would be better for responsiveness, and also allows synchronous pthread_create.

However, even PROXY_TO_PTHREAD does not solve the deadlock here in general - it does for pthread_create, but not for another synchronous proxy command. Was pthread_create just an example there, or the actual blocker for you @AlexanderJ ?

@AlexanderJ
Author

@kripken last time I talked to the guys at Qt it seemed that PROXY_TO_PTHREAD is not an option for them. Qt generally uses the main thread for all drawing. For WebAssembly they use a WebGL view to render. I'm not sure whether PROXY_TO_PTHREAD would be a good solution, as every drawing call would then need to be proxied, which means a lot of thread switches and may lead to poor performance.

Thus a solution where another thread would take care of this would IMHO make sense. However, this would have a similar effect on the order of the event queue: as the queued jobs will be processed by different threads, they will no longer be executed in order, since the order in which other threads process the calls is non-deterministic.

I'm still not convinced, though, that a different execution order for things like thread creation or file I/O and whatever else is proxied to the main thread would hurt so much. @juj could you give an example where this might be an issue for these internal calls? Maybe we just need to handle a more specific set of proxied calls the way I did.

@RReverser
Collaborator

I'm not sure whether PROXY_TO_PTHREAD would be a good solution, as every drawing call would then need to be proxied, which means a lot of thread switches and may lead to poor performance.

Note that this was true in the past, but nowadays we have OffscreenCanvas, which allows drawing to the canvas / GPU directly from a Worker, and Emscripten already leverages it when possible.

Thus, using PROXY_TO_PTHREAD could still improve performance instead, as drawing would no longer be blocked by any user actions happening on the main thread.

@juj
Collaborator

juj commented Oct 20, 2020

@juj to me this sounds like the concept Qt for WebAssembly is based on is fundamentally flawed. The only solution for me would be that the Qt main thread runs on a worker and proxies all UI-related calls (to access the WebGL view they use) to the main thread. Is this assumption correct in your opinion?

Sorry, I do not know the Qt code well enough to say whether it is flawed or not. However, they do need to adhere to the web restrictions here with respect to functions that are proxied.

Qt generally uses the main thread for all drawing. For WebAssembly they use a WebGL view to render. I'm not sure whether PROXY_TO_PTHREAD would be a good solution, as every drawing call would then need to be proxied, which means a lot of thread switches and may lead to poor performance.

All WebGL draw commands are proxied asynchronously, so there is mostly increased latency. Resource creation, glGetError() and other getters are proxied synchronously.

I'm still not convinced, though, that a different execution order for things like thread creation or file I/O and whatever else is proxied to the main thread would hurt so much.

It would - changing the semantics of what async proxying means is unfortunately not open for debate, because existing code depends on the ordering guarantee. The guarantee also extends to user-facing code, so weakening it for other codebases could cause them to start randomly breaking.

If you want to implement a real asynchronous, out-of-order proxied message, you can do that manually by calling postMessage() yourself, or by creating a C-based message queue inside your application that has different queueing semantics. But the core proxying model must be in-order.
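
A minimal sketch of such an application-level queue (all names here are invented): workers push callbacks under a short-lived lock, and the main thread drains them from its own event loop, e.g. once per animation frame:

#include <pthread.h>

typedef void (*task_fn)(void*);
typedef struct { task_fn fn; void* arg; } Task;

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static Task q[256];
static int q_len;

void post_task(task_fn fn, void* arg) { // callable from any thread
	pthread_mutex_lock(&q_lock);
	if (q_len < 256)
		q[q_len++] = (Task){fn, arg};
	pthread_mutex_unlock(&q_lock);
}

void drain_tasks(void) { // call from the main thread's event loop
	Task local[256];
	pthread_mutex_lock(&q_lock);
	int n = q_len;
	for (int i = 0; i < n; ++i)
		local[i] = q[i];
	q_len = 0;
	pthread_mutex_unlock(&q_lock);
	// Run outside the lock, so a task that posts new tasks cannot deadlock.
	for (int i = 0; i < n; ++i)
		local[i].fn(local[i].arg);
}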

@juj could you give an example where this might be an issue for these internal calls?

Take a look at WebGL proxying for an example that will break. Also user code in other projects could break.

In general I agree that Qt probably should not ship with required PROXY_TO_PTHREAD mode, because it can depend a lot on the user project whether it can afford to run in a Worker or not.

I would recommend as a first line to look at switching pthread_create() to be async proxied to avoid this particular deadlock, and see if that is enough for Qt.

@msorvig
Contributor

msorvig commented Oct 22, 2020

Hello from Qt -

I've read through this thread now, and it looks like the semantics of emscripten_async_run_in_main_runtime_thread_ are different from what we assumed they were when making use of it in Qt code. I think we can find a workaround though, by implementing a two-step async call:

void (*intermediate)(void *) = [](void *) {
    emscripten_async_call([](void *) {
        printf("hello from the (idle) main thread\n");
    }, nullptr, 0);
};
emscripten_async_run_in_main_runtime_thread_(EM_FUNC_SIG_VI, (void *)intermediate, nullptr);

This should make sure that the callback is made at a point where the main thread is not servicing a sync call from a worker thread.

(Does Emscripten already have an API which provides this functionality?)

@msorvig
Contributor

msorvig commented Oct 22, 2020

Prospective patch for Qt: https://codereview.qt-project.org/c/qt/qtbase/+/318554

I have not been able to reproduce the deadlock myself, so please let us know whether this fixes the deadlock or not.

@kripken
Member

kripken commented Oct 22, 2020

@msorvig

I may be missing something (multithreading is hard...) but I think that could still deadlock as follows:

  1. Worker sends async proxy request.
  2. Worker takes the lock.
  3. The main thread starts to handle the async proxy request, which needs that lock, so it blocks.
  4. Worker does a synchronous proxy to the main thread, blocking while it has that lock.

Step 3 is still a problem even if there is an additional async step to get there. In other words, even if the main thread is completely free when it starts to handle the async event, if the worker is blocking at that time, we will deadlock.

I think it's close to working, though. Something like a reorderable async event could help here. But I think the reordering would need to have knowledge of the lock, like this:

void intermediate(void *) {
  emscripten_async_call([](void *) {
    // try_lock() stands in for a non-blocking attempt on the shared mutex,
    // e.g. pthread_mutex_trylock().
    if (try_lock()) {
      printf("hello from the (idle) main thread\n");
    } else {
      // The lock is used by another thread. Don't block; try again later.
      intermediate(nullptr);
    }
  }, nullptr, 0);
}
emscripten_async_run_in_main_runtime_thread_(EM_FUNC_SIG_VI, (void *)intermediate, nullptr);

Or even better, waitAsync (which waits on a lock and returns a callback with it taken) could be used to avoid the risk of multiple unlucky failures.

@kripken
Member

kripken commented Oct 22, 2020

Oh, actually I think I'm wrong, sorry about that... the main thread will process queued events while it blocks (because it blocks in a special way that allows that). So it would be able to handle any synchronous or asynchronous events in the queue. The extra async step takes the reorderable event out of the queue. So looks like it should work!

@msorvig
Contributor

msorvig commented Oct 22, 2020

Yes, I was going back and forth on this as well, and came to a similar conclusion that the "special" blocking in the main thread would make it work.

The Qt patch should take care of our one usage of emscripten_async_run_in_main_runtime_thread in Qt. @AlexanderJ will test to see if it is sufficient to solve the deadlocking issues he's seeing.

@kripken
Member

kripken commented Oct 22, 2020

Sounds good @msorvig

If we confirm this works well, I think it would make sense to add an API for this, to make it easy to use.

@juj
Collaborator

juj commented Oct 23, 2020

I think we can find a workaround though, by implementing a two-step async call:

This is a neat trick, and for calls known to be safe to proxy unordered, this can certainly help resolve such a priority inversion problem.

the main thread will process queued events while it blocks (because it blocks in a special way that allows that).

This will only be guaranteed to happen if using pthread mutexes - other types of locks implemented in custom codebases might not benefit. But it would certainly move the boundary of when such issues can arise.

If we confirm this works well, I think it would make sense to add an API for this, to make it easy to use.

I'd recommend making such an unordered_async proxied API call very explicit and elaborate - the opposite of easy - since for most user code, I'd expect that making a call unordered can have very odd consequences if the user does not understand what can happen to it.

@AlexanderJ
Author

Sorry for the long delay, but I was busy with other things. I have now tested the Qt fix and unfortunately it does not solve all our issues. We still have reproducible deadlocks when creating threads from a background thread. However, the remaining deadlocks no longer seem to be related to Qt. We hold a mutex during thread creation; I think it is related to that. Further investigation will be required, but any comments are appreciated.

@Denzeli

Denzeli commented Mar 3, 2021

Hi, grappling with the same issue here. As I see it, there's no way we can avoid deadlocking even with the proposed Qt patch. It just pushes the possibility of a deadlock to the next event loop iteration, because messages requesting sync operations might be posted from worker threads after the call to emscripten_async_call, queueing behind it again.

Base automatically changed from master to main March 8, 2021 23:49
@stale

stale bot commented Apr 16, 2022

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant.

@stale stale bot added the wontfix label Apr 16, 2022
@stale stale bot closed this Jul 31, 2022