runtime: failures in TestCtrlHandler with "could not read stdout: EOF" on windows-arm64 #49458
Comments
Marking as release-blocker since this appears to be a very recent regression (CC @aclements @mknyszek). |
CC @bufflig |
Yea, this looks new and bad. I'll take it. |
@mknyszek - I wonder if you have thoughts on this? It's reproducible beginning with the enabling of the new pacer. It is fairly uncommon, but if you e.g. add a runtime.Gosched() after signal.Notify in the test program (which, of course, should change nothing), it happens reliably, but only on arm64. Anyway, even when it happens, the signal walks through our handlers and we end up thinking we delivered the signal (sigsend in sigqueue.go). The sig.state is sigReceiving both when it works and when the signal doesn't reach the channel in the test program, and notewakeup ends up in semawakeup. So everything happens as expected, but we never get anything on the channel (or the m does not get woken up). This happens with a low probability if the program is untouched, and a much higher probability if I yield after calling signal.Notify. If I disable the new pacer, I cannot get it to fail at all. Could it be that the m does not get woken up in time? The process sits idle for a while and then it gets forcefully killed, which is also a little mysterious, but it does not seem to crash. It's possible taskkill eventually terminates it. Then the actual test fails as it gets EOF on the pipe. |
Hm... The only thing the new pacer changes is when GC's happen, basically (in theory). Does |
You're spot on. That seems to be the problem. What happens is, I'm pretty sure, this:
This happens when a GC is needed between the arrival of the signal and its processing by the little test program, and it would not happen on amd64, as preemptM is implemented there. For unknown reasons, sometimes a GC happens early and there's no need for a GC between the arrival of the signal and the normal termination of the program; sometimes a GC is needed in the "bad" situation. I see three solutions:
I suggest 2 for 1.18, 3 for 1.19. Any input is welcome. |
Implementing preemptM shouldn't be hard. But I agree we probably don't do it in the freeze. |
Somewhat related, I have this CL for preemption on windows/arm, which I apparently failed to get in again and which probably needs a non-trivial rebase at this point. |
That is for 32-bit ARM though? I think the addition of arm64 conflicts with it, but as I just looked at the code, I can confirm that it would be a simple rebase. I could try to extend it to arm64. Still, I feel it would be best done in 1.19, but I could change my mind if you think it's better to do it now. |
Correct, that is for 32-bit ARM, but the code is probably quite similar for 64-bit ARM. I'm not advocating for implementing preemption for 1.19, since that's something that really benefits from some soak time. That said, I'm somewhat skeptical of things that really depend on asynchronous preemption for correctness (versus latency). |
Yea, the whole SetConsoleCtrlHandler thing is somewhat sketchy, as it needs to completely hang its thread to avoid immediately terminating the program. To handle e.g. ctrl-C, nothing like that is needed, but if we choose to handle these "termination" signals, we will get into trouble. I don't know if it's even a valuable functionality, but what do I know :) |
Would it be possible to just block the thread, instead of doing an infinite sleep? Or, drop the P before going to sleep? |
This seems to make the test pass reliably, without depending on async preemption. But I'm not sure how safe it is. What if the handler runs when we are in some interesting state (e.g. no P, non-Go thread, or some critical section of the runtime)? Can it happen? |
@cherrymui The |
Thanks @mknyszek. Does it have a P? If so, why? If not, why do we need to preempt it? |
I made some printouts in the handler and in some other parts of the code, and it certainly appears to run on the same OS thread as some other stuff. The OS thread ID of the m it's signalling is sometimes the same as the thread executing the handler. Unless I messed it up in some way, of course. The thread is also often (but not always) associated with m0. I'll try the block call, if my theories are correct, that should do it. |
Ah, I see. It's because we treat it like a Windows callback (via |
So, that's my bad, in effect |
Okay, thanks! Yeah, if it is called via cgocallback, it would look like a regular Go function, so blocking the function probably will work. |
If I were to make a guess, when we switched to I can dig through the history to confirm this is the case. |
It appears that Cherry's suggestion fixes it, at least I can no longer reproduce. And it fits all the symptoms I've seen. The preemption on amd64 makes it work anyway, right? |
Yeah, async preemption probably makes it work on AMD64. It fails occasionally (with the original code) if I set GODEBUG=asyncpreemptoff=1. |
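As a rough illustration of the reproduction mentioned above, the following sketch builds a `go test` invocation with GODEBUG=asyncpreemptoff=1 in its environment. The helper name `withAsyncPreemptOff` and the `-count 100` repetition are assumptions, not something taken from the thread, and the command is only constructed and printed here, not executed.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// withAsyncPreemptOff is a hypothetical helper. It returns a copy of env
// with GODEBUG=asyncpreemptoff=1 appended. When a key is duplicated, os/exec
// uses the last value, so this replaces any GODEBUG setting already in env.
func withAsyncPreemptOff(env []string) []string {
	out := make([]string, 0, len(env)+1)
	out = append(out, env...)
	return append(out, "GODEBUG=asyncpreemptoff=1")
}

func main() {
	// Assumed invocation: run the runtime package's TestCtrlHandler
	// repeatedly, since the failure is described as occasional.
	cmd := exec.Command("go", "test", "-run", "TestCtrlHandler", "-count", "100", "runtime")
	cmd.Env = withAsyncPreemptOff(os.Environ())

	// Only show what would be run; call cmd.Run() to actually execute it.
	fmt.Println(cmd.Args)
	fmt.Println(cmd.Env[len(cmd.Env)-1])
}
```

Because the helper overrides any GODEBUG value already present, other GODEBUG options would have to be merged into the same comma-separated value.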
Cool, thanks! I can make the CL, if that's fine with you (I need the practice :)). |
Sure, no problem :) |
Change https://golang.org/cl/364556 mentions this issue: |
greplogs --dashboard -md -l -e 'FAIL: TestCtrlHandler'
2021-11-08T17:46:34-2e210b4/windows-arm64-10
2021-11-07T04:56:11-85493d5/windows-arm64-10
2021-11-05T22:53:55-3b7e376/windows-arm64-10
2021-11-05T22:30:17-b07c41d/windows-arm64-10
2021-11-05T21:34:10-bb53fd7/windows-arm64-10
2021-11-04T21:50:21-bfd74fd/windows-arm64-10
2021-11-04T21:40:51-2c32f29/windows-arm64-10
2021-11-04T20:01:10-9b2dd1f/windows-arm64-10