Avoid static allocations in pthreads code #12055
Merged
Conversation
Yeah, I think it would be great to move this stuff into the native code at some point.
This was referenced on Aug 26, 2020 · Merged
sbc100 approved these changes on Aug 27, 2020
kripken added a commit that referenced this pull request on Sep 22, 2020
kripken added a commit that referenced this pull request on Sep 24, 2020
The deadlock was introduced in #12055, which was released in 2.0.2. However, the issue runs deeper: that change only made an existing deadlock more frequent.

Background: our pthreads implementation has some extra complexity because of how the main thread works on the web. Specifically, we can't block there, so we can't do Atomics.wait there. Instead we do, in effect, a busy-wait, and to make that as painless as possible we special-case the main thread. There is a memory address, the "main thread futex address", which indicates the futex the main thread is waiting on, or 0 if none. When a thread runs futex_wake, it first checks whether the main thread is waiting on that futex by checking that address, and if it is, we notify it before any workers. In effect this gives the main thread priority. And by using that address, the main thread can busy-wait on it, simply polling to see when a pthread sets it to 0, which indicates the wait can stop.

This broke in #12055 because it erroneously ended up setting the main thread futex address only on the main thread. We did not get that address to workers; instead, they got a JS undefined. So they would not wake up the main thread.

It is actually a lot more complicated than that, because futex_wait can recurse: while it does the busy-wait loop, it runs the event loop so that it does not starve pthreads and cause deadlocks; that is, it checks for incoming proxied calls and runs them. To run the event loop we must take a mutex. With the right timing we can hit the slow paths in both the original mutex (leading to a call to futex_wait instead of a quick atomic operation) and the proxying mutex, and then we get this recursion.

The old code did not handle this correctly, in two ways. First, it did not clear the main thread futex address when futex_wait exited due to a timeout, so it looked like we were still waiting. A later futex_wake on that futex would unset the main thread futex address and count that as one of the waiters it woke up, but nothing was actually woken there, so if it had been told to wake just one waiter it ended up waking no one. Second, futex_wake did not check whether it was running on the main thread. The main thread should never wake itself up, but it did, which added to the confusion (I'm not sure this caused deadlocks by itself, though).

A possible third issue is that when recursing in this way, the main thread can in effect be waiting on two futexes, A and then B. If a worker wants to wake up A while we are in the recursed call that set B, it does not see A in the main thread futex address: there is only one such address, and B has trampled it. The old code would actually just stop waiting in this case, since it looped while the address equaled the one it had set. After the nested wait wrote B, the outer loop would see either 0 (if the wait for B ended cleanly) or B (if it timed out and, due to the bug mentioned earlier, B was erroneously left as the main thread futex address). Neither is A, so it would stop. That was not correct, though; it's possible we should still be waiting for A.

To fix all this, this PR does two things:

1. Allocate the main thread futex address in C, so that it is properly accessible in workers.
2. Change how futex_wait works: it sets the main thread futex address and unsets it before exiting. It also unsets it before doing a nested call, so that nested waits cannot interfere with each other.
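To make the wake side concrete, here is a minimal, self-contained sketch of the scheme described above. It is not emscripten's actual implementation: the names main_thread_futex and futex_wake_sketch are illustrative, and the stubbed is_main_thread stands in for a real runtime query such as emscripten_is_main_runtime_thread(). It shows the three properties the fix relies on: the published address lives in C so workers can see it, the main thread never wakes itself, and a wake is only counted when the address was actually cleared.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* The futex address the main thread is busy-waiting on, or 0 if none.
 * Keeping this in C (a plain global in the data segment) is what makes it
 * visible to workers instead of a JS `undefined`. */
static _Atomic uintptr_t main_thread_futex;

/* Stub: the real code would ask the runtime, e.g. via
 * emscripten_is_main_runtime_thread(). */
static bool is_main_thread(void) { return false; }

/* Wake up to `count` waiters on `addr`, giving the main thread priority. */
static int futex_wake_sketch(_Atomic uint32_t *addr, int count) {
  int woken = 0;
  if (!is_main_thread() && count > 0) {
    uintptr_t expected = (uintptr_t)addr;
    /* If the main thread has published this futex's address, clear it to
     * tell its busy-wait loop to stop. Only count the wake if we actually
     * cleared the address; otherwise a wake of a single waiter could be
     * spent on nobody, one of the bugs described above. */
    if (atomic_compare_exchange_strong(&main_thread_futex, &expected,
                                       (uintptr_t)0)) {
      woken++;
    }
  }
  /* ...then wake up to (count - woken) blocked workers, which the JS
   * implementation does via Atomics.notify (omitted here)... */
  return woken;
}
```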
This also adds a lot of assertions in that code, which would have caught the previous bug, and which I verified are exercised by our test suite, so we have decent coverage.

The one tricky thing here is handling the case from the paragraph above, where we wait for futexes A and B in a nested manner. When nested, we wait until B is done and then return to the loop for A. How can we tell whether A is done or not? The pthread waking it may already have had its chance to wake it. The solution taken here is to read the futex's memory value, exactly as when starting futex_wait: futex_wait checks whether an address contains a given value, and only then does it wait. I believe it is valid to check the value again later as well, since the result is the same as if the main thread had been busy with other work and only checked the value afterwards, after the other thread did its part. See the comment in the source for more details.

This also fixes a test that handled a race condition incorrectly, and adds a disabled test that was very useful for manual debugging here. That test basically hammers on all the proxying and locking mechanisms at once. It's not a small test, but I couldn't find a way to simplify it, and it's only for manual debugging.
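And here is the matching wait side, again as a hedged sketch rather than emscripten's real code (main_thread_futex, run_proxied_calls and main_thread_futex_wait_sketch are illustrative names). The point is the structure the PR describes: publish the address, unpublish before nesting into the event loop, and rely on re-reading the futex's memory value, not on the published address, to decide whether the outer wait for A is finished.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Published address of the futex the main thread is waiting on, or 0. */
static _Atomic uintptr_t main_thread_futex;

/* Stub for the event loop: runs queued proxied calls, and may itself nest
 * another main-thread wait on a different futex B. */
static void run_proxied_calls(void) { /* ... */ }

/* Busy-wait on futex A from the main thread. */
static void main_thread_futex_wait_sketch(_Atomic uint32_t *addr,
                                          uint32_t expected) {
  /* Only wait while the value still matches. This same check doubles as the
   * "is A done?" test after returning from a nested wait. */
  while (atomic_load(addr) == expected) {
    /* Publish A as the futex the main thread is waiting on. */
    atomic_store(&main_thread_futex, (uintptr_t)addr);
    /* ...poll main_thread_futex for a while; a worker's futex_wake clears it
     * to end the wait (polling loop omitted)... */
    /* Unpublish before running the event loop: a nested wait for B would
     * otherwise trample our published address for A. */
    atomic_store(&main_thread_futex, (uintptr_t)0);
    run_proxied_calls();
    /* On return, the published address tells us nothing about A (the nested
     * wait owned it), so the loop condition re-reads A's value instead. */
  }
}
```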
Just replaces some static allocations with mallocs that are never freed. Ideally these could become actual static allocations in C in a future refactoring. Relative to #12052, which refactors the pthreads code first.
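As a rough illustration of the "static allocations in C" direction mentioned above (hypothetical names, not an actual emscripten API), a future refactoring could keep such a block in the Wasm data segment and expose its address to the JS glue, removing the need for a never-freed malloc at startup:

```c
#include <stdint.h>

/* Lives in the data segment: no runtime allocation, and every thread
 * (worker) sees the same address. The name is hypothetical. */
static uint32_t main_thread_futex_storage;

/* Hypothetical accessor the JS pthreads glue could call instead of
 * malloc'ing a never-freed block during startup. */
uint32_t *get_main_thread_futex_address(void) {
  return &main_thread_futex_storage;
}
```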