Mt deadlock issue #12309
Conversation
Now only proxied calls are executed during mutex waits, sleeps and other dangerous situations.
Thank you for submitting a pull request! If this is your first PR, make sure to add yourself to AUTHORS. |
@kripken could this be related to recent deadlock issues you have been working on? |
Regarding the failed test, I'm not really sure whether this might be a desired side effect of my modification. Someone with more knowledge in the area should have a look. |
Maybe I'm misunderstanding this, but I'm confused - what is "something else"? About the name, the only thing [...]. Reading the code, one difference seems to be that the new version copies the queued call to a local. This change is orthogonal to the change in my PR #12318. However, if this PR just decreases the risk of a deadlock, then it's possible that that PR fixes the deadlock entirely; it would be good to test your code with that PR @AlexanderJ |
Our problems with deadlocks started when migrating from version 1.38.47 to 1.39.8. The deadlocks were easily reproducible with our code and occurred very often. With this fix we were no longer able to reproduce them. The difference of emscripten_current_thread_process_proxied_queued_calls is that it only handles calls of type EM_FUNC_SIG_SPECIAL_INTERNAL (these are the calls in the PROXY class, e.g. those handling memory allocation); those are set to 0 to avoid executing them again from the main event loop. The other calls are ignored and will only be handled when invoked directly from the browser, but not from within e.g. futex_wait. @kripken based on the explanation and the source code changes you did, I don't think that your fix is related to our problem, which is that a non-proxy-class call deadlocks with a proxy-class call. IMHO this would still be possible. |
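(A toy model of the filtering idea described above - this is not emscripten's real queue code; the queue type, the done flag and the function name are stand-ins for illustration only:)

#include <deque>

// Illustrative sketch. Each queued proxied call carries a signature tag;
// the internal "proxy class" calls are the ones tagged
// EM_FUNC_SIG_SPECIAL_INTERNAL in the real code.
enum Sig { SIG_SPECIAL_INTERNAL /* EM_FUNC_SIG_SPECIAL_INTERNAL */, SIG_OTHER };

struct QueuedCall {
  Sig sig;
  void (*func)(void*);
  void* arg;
  bool done;  // "set to 0" in the real queue once executed
};

std::deque<QueuedCall> proxy_queue;  // stand-in for the per-thread proxy queue

// While blocked (e.g. inside a futex wait), run only the internal proxied
// calls and mark them done so the main event loop skips them later; leave
// every other queued call untouched until the browser event loop gets to it.
void process_proxied_queued_calls_toy() {
  for (QueuedCall& c : proxy_queue) {
    if (!c.done && c.sig == SIG_SPECIAL_INTERNAL) {
      c.func(c.arg);
      c.done = true;
    }
  }
}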
Interesting, I wasn't aware of EM_FUNC_SIG_SPECIAL_INTERNAL.
Do you have a testcase showing the issue? That could help greatly to understand what you are seeing. I understand that this seems unrelated to #12318, but I'd suggest trying your application with that; you can test it on a tip-of-tree build. @juj do you have time to look at this? |
I think the main issue with the current implementation is that a mutex is expected to block execution. Due to the nature of the browser implementation this is not completely possible, as things like memory allocation need to be delegated to the main thread. However, the call queue contains all sorts of calls, even ones which can lead to deadlocks. No one expects that, while a thread waits for a mutex, other functionality of the same application gets executed on that thread. This is what I fixed. |
That's not accurate for memory allocation, which does not need to be delegated to the main thread. Other things do need to be delegated, like I/O, so if the main thread blocks on a worker which is doing I/O, that can be dangerous - you are definitely right there. #10155 has two examples of known current deadlocks. Does this PR fix them? If so, that could help understand what's going on here. (If not, then note that #12318 and #12240 have landed, both of which may be relevant, so you can test them on a tip-of-tree build.) |
In the meantime I tested tot (btw. is there a way to ensure it's the proper one I used? Where best to look? I want to double check). One case is e.g. that a new thread is created while holding a mutex, while the main thread is woken up by a Qt signal/slot connection from a background thread. Now the main thread, instead of just creating the new worker thread, executes a function which obtains the same mutex and deadlocks. With my fix only the thread creation is executed, but not the Qt signal/slot call (which is a delegated method without the EM_FUNC_SIG_SPECIAL_INTERNAL flag). |
@kripken finally I put together a test case which deadlocks without my fix and works fine with it. |
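(The test case itself is not shown above; reconstructed from the descriptions in the following comments, its shape is roughly the following - dangerous, worker and the loop structure are illustrative, and the original code may differ. Built with -pthread and a pthread pool, e.g. -s PTHREAD_POOL_SIZE=4:)

#include <cstdio>
#include <pthread.h>
#include <emscripten/threading.h>

pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

void* noop(void*) { return nullptr; }

// Queued onto the main thread asynchronously; takes the shared mutex.
void dangerous(int i) {
  pthread_mutex_lock(&mtx);  // the main thread can block here...
  printf("main thread ran dangerous(%d)\n", i);
  pthread_mutex_unlock(&mtx);
}

void* worker(void*) {
  for (int i = 0; ; i++) {
    // Queue dangerous(i) for the main thread, then immediately retake the mutex.
    emscripten_async_run_in_main_runtime_thread_(EM_FUNC_SIG_VI, (void*)&dangerous, i);
    pthread_mutex_lock(&mtx);
    // pthread_create() proxies synchronously to the main thread. If the main
    // thread is already blocked inside dangerous() on mtx, neither side can
    // make progress: a deadlock.
    pthread_t t;
    pthread_create(&t, nullptr, noop, nullptr);
    pthread_join(t, nullptr);
    pthread_mutex_unlock(&mtx);
  }
  return nullptr;
}

int main() {
  pthread_t t;
  pthread_create(&t, nullptr, worker, nullptr);
  pthread_join(t, nullptr);  // the main thread services proxied calls while blocked here
}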
Thanks for the testcase @AlexanderJ, that's very helpful! IIUC, what happens here is that the worker takes the lock. It then does a synchronous proxy operation to the main thread (here, for pthread_create) - but meanwhile the main thread has begun to run the queued async event, which takes the lock, so neither thread can proceed. This seems like an impossible problem to solve in general, as this is a true logical deadlock. However, I think this PR might help make it less likely, if what happens is slightly different: the worker does the synchronous proxy operation to the main thread (for pthread_create) before the main thread has started to run the async event, and the main thread handles the sync event first. As I said this wouldn't be a full solution since the deadlock can still happen, if main starts to run the async event before it gets the sync one. But it may still be worth doing! Specifically, it seems like the right behavior to me to always process a sync event before any async ones. An async event must be ordered wrt other async events, but there is no ordering with sync ones, and handling sync ones first would make things more responsive. In addition, it seems reasonable to assume that a sync proxy would be written to avoid things like taking a lock (because sync proxying is so costly already), which is the case in this example (and I guess in Qt in general, signal/slot connections are async?). In other words, I'm proposing that sync proxied events be processed before queued async ones. |
@kripken The trick is the EM_FUNC_SIG_SPECIAL_INTERNAL flag. I wonder if there are other synchronous calls without it. |
Qt offers synchronous and asynchronous calls to other threads. Signal/slot connections are always asynchronous. |
This would not be correct. The semantic model of sync and async calls is that all the calls will be observed in sequential order. The meaning of async is just a local optimization for the calling thread, where the calling thread can say at the time of the call that "I do not care about the return value/result of the proxied call, so let me proceed as if the operation has already occurred, and skip waiting in this thread; I know it will be safe to proceed". If async operations were processed before sync ones, it would break all kinds of proxied JS libraries when the order of operations got wrong (e.g. proxied WebGL, where glGet*()s are sync, and most other commands async).

A proxied function should only be called async if the function call will not observe any shared state. That is, generally a call to a JS/DOM function, or a call to a C function that does not access shared state (since that could happen out of order w.r.t. the calling thread).

This PR seems to implement a restriction that certain (kind of arbitrary) thread cancellation points stop processing some proxied queue events, but will still process others. The processing is implemented via a peekahead into the proxy queue, latching on to the enum EM_FUNC_SIG_SPECIAL_INTERNAL. (The special/internal bits are more about code organization than multithreading semantics, which makes this a somewhat arbitrary division.) This kind of queue peekahead will also break the semantic in-order model of execution of messages. The PR does seemingly fix the issue, but only by stopping processing of the affected proxied operations to avoid the priority inversion.

The issue with the code posted in #12309 (comment) is twofold:
emscripten_async_run_in_main_runtime_thread_(EM_FUNC_SIG_VI, (void*)&dangerous, i);

Here the async proxying functionality is used improperly: if one performs an async proxied operation, one must be ready to accept that the operation will indeed occur asynchronously with respect to the calling thread. In this case, the calling function will have time to grab the mutex on the next loop iteration before dangerous() gets to run. Then dangerous() blocks on the mutex on the main thread. The main thread has now been set up to block on the worker; this alone is already a recipe for disaster. Main thread code should be designed not to need to hold ultimatum locks like this, because the main thread cannot block for long periods of time on the web. Then we get the second part of this: |
The root issue here is the use of the synchronously proxied pthread_create() while holding the mutex that the main thread is about to block on. |
Another way to fix this particular issue would be if we can improve pthread_create() to always be an asynchronously proxied function, although one would still be able to reproduce problems like this by substituting pthread_create() with any other API call that needs to proxy synchronously. The rule of thumb is that every time you issue any proxied C or JS call (which includes any syscall, so file I/O), pthread_create, emscripten_webgl_create_context or emscripten_set_canvas_element_size, think of it as the main thread having just successfully acquired a lock that the worker will need to pause and wait on until the main thread finishes the call. If the worker is at the same time holding a resource that makes it impossible for the main thread to finish the requested call, you will have a deadlock. I agree that this is quite a big gotcha compared to other platforms, but those are the rules that JS multithreading imposes. The root question for you, then, is what the primitives are that cause the deadlock - I wonder if pthread_create() was involved in the Qt codebase as well? If so, then the real investigation is whether we could safely improve pthread_create() to run with asynchronous proxying rather than synchronous. See in particular this code: emscripten/src/library_pthread.js, line 709 (at commit 59421d2).
What would happen if you commented lines 712-714 out to make pthread_create() operate asynchronously? Would that avoid the deadlock? (I don't 100% recall off the top of my head why pthread_create is not always asynchronous today; I will need to read the code to refresh my memory...) |
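(A sketch of the rule of thumb above - mtx and helper are illustrative names; the point is simply not to hold a lock across any call that may proxy synchronously to the main thread:)

#include <pthread.h>

pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
void* helper(void*) { return nullptr; }

void risky() {
  pthread_mutex_lock(&mtx);
  pthread_t t;
  // pthread_create() may proxy synchronously to the main thread; if the main
  // thread is (or becomes) blocked on mtx, this call never completes.
  pthread_create(&t, nullptr, helper, nullptr);
  pthread_mutex_unlock(&mtx);
}

void safer() {
  // Do the possibly-proxied call outside the critical section...
  pthread_t t;
  pthread_create(&t, nullptr, helper, nullptr);
  // ...and take the lock only for the shared-state work itself.
  pthread_mutex_lock(&mtx);
  /* touch shared state */
  pthread_mutex_unlock(&mtx);
}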
@juj for me this sounds like the concept Qt for WebAssembly is based on is fundamentally flawed. The only solution for me would be that the Qt main thread runs on a worker and proxies all UI-related calls (to access the WebGL view they use) to the main thread. Is this assumption correct in your opinion? |
Thanks for the info! Very important to know that about the ordering assumptions here. We should document that - it doesn't seem to be mentioned atm. I opened #12546 for that.
@RReverser mentioned recently that we could do that with a dedicated worker just for thread creation. However, even [...] |
@kripken last time I talked to the guys at Qt it seemed that [...]. Thus a solution where another thread would take care of this would IMHO make sense to me. However, this would IMHO have a similar effect on the order of the event queue: as the queued jobs will be processed by different threads, they will no longer be executed in order, since the order in which the threads process the calls is non-deterministic. I'm still not convinced though that a different execution order for things like thread creation or file I/O and whatever else is proxied to the main thread would hurt so much. @juj could you give an example where this might be an issue for these internal calls? Maybe we just need to handle a more specific set of proxied calls the way I did. |
Note that this was true in the past, but nowadays we have OffscreenCanvas support, so rendering no longer has to happen on the main thread. Thus, using PROXY_TO_PTHREAD could still improve performance, as drawing would no longer be blocked by any user actions happening on the main thread. |
Sorry, I do not know the Qt code well enough to say whether it is flawed or not. However, they do need to adhere to the web restrictions here with respect to functions that are proxied.
All WebGL draw commands are proxied asynchronously, so there is mostly increased latency. Resource creation, glGetError() and other getters are proxied synchronously.
It would - changing the semantics of what async proxying means is unfortunately not open for debate, because existing code depends on the ordering guarantee, and the guarantee also extends to user-facing code, so weakening it for other codebases could cause them to start randomly breaking. If you want to implement a real asynchronous and out-of-order proxied message, you can do that manually by calling [...] |
Take a look at WebGL proxying for an example that would break. Also user code in other projects could break. In general I agree that Qt probably should not ship with a required PROXY_TO_PTHREAD mode, because it can depend a lot on the user project whether it can afford to run in a Worker or not. I would recommend, as a first step, looking at switching pthread_create() to be async proxied to avoid this particular deadlock, and seeing if that is enough for Qt. |
Hello from Qt - I've read through this thread now, and it looks like the semantics of emscripten_async_run_in_main_runtime_thread() are what bites us here: the queued callback can run while the main thread is blocked servicing a sync call. Our prospective patch adds an extra asynchronous hop so that the callback is instead invoked from the main event loop.
This should make sure that the callback is made at a point where the main thread is not servicing a sync call from a worker thread. (Does Emscripten already have an API which provides this functionality?) |
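(A minimal sketch of that two-step pattern, assuming the extra hop is emscripten_async_call - the function names are illustrative and this is not the actual Qt patch:)

#include <cstdio>
#include <emscripten.h>
#include <emscripten/threading.h>

// Runs from the main thread's regular event loop.
void real_callback(void*) { printf("callback on an idle main thread\n"); }

// Arrives on the main thread via the proxy queue, possibly while the main
// thread is blocked servicing a worker's sync call; instead of doing the
// work here, defer it to the event loop with a zero-delay async call.
void intermediate(void* arg) { emscripten_async_call(real_callback, arg, 0); }

// Called from a worker thread.
void post_to_main() {
  emscripten_async_run_in_main_runtime_thread_(EM_FUNC_SIG_VI, (void*)&intermediate, 0);
}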
Prospective patch for Qt: https://codereview.qt-project.org/c/qt/qtbase/+/318554 . I have not been able to reproduce the deadlock myself, so please let us know whether this fixes it or not. |
I may be missing something (multithreading is hard...) but I think that could still deadlock as follows:
1. The worker takes the lock.
2. The worker makes a synchronous proxied call to the main thread and blocks waiting for it.
3. The main thread starts to handle the deferred async event, tries to take the lock, and blocks - so it can never service the worker's sync call. |
Step 3 is still a problem even if there is an additional async step to get there. In other words, even if the main thread is completely free when it starts to handle the async event, if the worker is blocking at that time, we will deadlock. I think it's close to working, though. Something like a reorderable async event could help here. But I think the reordering would need to have knowledge of the lock, like this:

void intermediate(void*) {
  emscripten_async_call([](void*) {
    // try_lock()/unlock() stand for non-blocking operations on the contended mutex.
    if (try_lock()) {
      printf("hello from the (idle) main thread\n");
      unlock();
    } else {
      // The lock is used by another thread. Don't block; try again later.
      intermediate(nullptr);
    }
  }, nullptr, 0);
}

emscripten_async_run_in_main_runtime_thread_(EM_FUNC_SIG_VI, (void*)intermediate, nullptr);

Or even better, [...] |
Oh, actually I think I'm wrong, sorry about that... the main thread will process queued events while it blocks (because it blocks in a special way that allows that). So it would be able to handle any synchronous or asynchronous events in the queue. The extra async step takes the reorderable event out of the queue. So looks like it should work! |
Yes, I was going back and forth on this as well, and came to a similar conclusion that the "special" blocking in the main thread would make it work. The Qt patch should take care of our one usage of emscripten_async_run_in_main_runtime_thread in Qt. @AlexanderJ will test to see if it is sufficient to solve the deadlocking issues he's seeing. |
Sounds good @msorvig If we confirm this works well, I think it would make sense to add an API for this, to make it easy to use. |
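(For illustration, such an API could be a thin wrapper over the trampoline above. emscripten_async_run_in_main_thread_unordered is hypothetical - it does not exist in emscripten - and, as the next comment argues, dropping the ordering guarantee would need to be very explicit:)

#include <emscripten.h>
#include <emscripten/threading.h>

// Hypothetical API sketch; NOT an existing emscripten function.
struct UnorderedCall { void (*func)(void*); void* arg; };

// Hops from the ordered proxy queue onto the main thread's event loop, so
// the call becomes unordered relative to other proxied messages.
static void unordered_trampoline(void* p) {
  emscripten_async_call([](void* p) {
    UnorderedCall* c = static_cast<UnorderedCall*>(p);
    c->func(c->arg);
    delete c;
  }, p, 0);
}

void emscripten_async_run_in_main_thread_unordered(void (*func)(void*), void* arg) {
  // wasm32 assumption: a pointer fits in the EM_FUNC_SIG_VI int argument,
  // mirroring the (void*)intermediate pattern used earlier in this thread.
  emscripten_async_run_in_main_runtime_thread_(
      EM_FUNC_SIG_VI, (void*)&unordered_trampoline, new UnorderedCall{func, arg});
}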
This is a neat trick, and for calls known to be safely unorderable, this can certainly help resolve such a priority inversion problem.
This will only be guaranteed to work when using pthread mutexes - other types of locks implemented in custom codebases might not benefit. But it would certainly move the boundary of when such issues can arise.
I'd recommend making such an unordered async proxied API call very explicit and elaborate - the opposite of easy - since for most user code, I'd expect that making a call unordered can have very odd consequences if the user does not understand what can happen to it. |
Sorry for the long delay, but I was busy with other things. I tested the Qt fix now and unfortunately it does not solve all our issues. We still have reproducible deadlocks when creating threads from a background thread. However, the remaining deadlocks do not seem to be related to Qt anymore. We hold a mutex during thread creation; I think it is related to that. Further investigation will be required, but any comments are appreciated. |
Hi, grappling with the same issue here. As I see it, there's no way we can avoid deadlocking even with the proposed Qt patch. It just pushes the possibility of a deadlock to the next event loop iteration, because messages requesting sync operations might be posted from worker threads after the call to emscripten_async_call, queueing behind it again. |
This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant. |
Hi,
we have had massive problems with deadlocks in emscripten due to proxy-to-main-thread calls.
For example: a worker thread holding a mutex tries to allocate/deallocate memory, while the main thread, because other calls were queued on it as well, is executing additional functionality that tries to obtain the same mutex. This happens a lot in our Qt-based application via signal/slot or invokeMethod calls.
To avoid this I introduced a new function, emscripten_current_thread_process_proxied_queued_calls, which replaces emscripten_current_thread_process_queued_calls and emscripten_sync_run_in_main_runtime_thread_ in situations where it would be dangerous to execute anything other than proxied calls.
For us this worked very well.
Regarding a test case, I'm unsure; I should be able to come up with something which reproduces the original issue, but I first seek feedback on whether the solution might have unwanted side effects I just don't see yet.
Kind regards,
Alex