x/tools/internal/lsp/regtest: various flakes due to go.mod contention #37318
@ridersofrohan: Can you take a look at these failures? The error messages at the top seem to be caused by
Thanks, this looks like a flake. Let me investigate first.
Ok, I think this is a test simply timing out due to an extremely slow initial workspace load (we delete the temp modfiles when the test exits, so I think the -modfile error log is a red herring). Locally, the entire regtest suite runs in ~3s, but I can produce a similar error message by setting the test timeout to 1s. As a temporary fix, I'm going to increase the timeout to 60s to see if this still flakes. Longer term, I need to understand why this test runs more than an order of magnitude slower on the builders.
Change https://golang.org/cl/220280 mentions this issue:
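For illustration only, here is a minimal sketch of the kind of per-test deadline being discussed. `awaitInitialWorkspaceLoad` is a hypothetical stand-in, not the actual regtest API; the point is that bounding the slow step explicitly produces a clear error instead of the whole test binary being killed by the `go test -timeout` deadline.

```go
package regtest_test

import (
	"context"
	"testing"
	"time"
)

// awaitInitialWorkspaceLoad is a hypothetical stand-in for the work the
// regtest environment does before a test can proceed.
func awaitInitialWorkspaceLoad(ctx context.Context) error {
	select {
	case <-time.After(2 * time.Second): // pretend the load takes 2s
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func TestSlowWorkspaceLoad(t *testing.T) {
	// Bound the operation explicitly so a slow initial workspace load fails
	// with context.DeadlineExceeded rather than a generic binary timeout.
	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
	defer cancel()

	if err := awaitInitialWorkspaceLoad(ctx); err != nil {
		t.Fatalf("initial workspace load did not complete: %v", err)
	}
}
```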
In golang.org/issues/37318, it appears that the regtests are occasionally timing out on the builders. I'm not sure why they're running so slowly, but as a temporary measure let's increase the test timeout to hopefully eliminate flakes.

Updates golang/go#37318
Change-Id: Id9c854ea518c9dc3618ea2b521fe6133e71af8b2
Reviewed-on: https://go-review.googlesource.com/c/tools/+/220280
Run-TryBot: Robert Findley <[email protected]>
Reviewed-by: Rohan Challa <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Here is a similar failure on
Hmm, my intuition was that the timeout was not specific to this test, but the fact that it is the same test and that it timed out at 60s suggests that this is a real failure. I haven't yet been able to reproduce, but I will skip this test until it is understood.
Change https://golang.org/cl/220360 mentions this issue:
This test is flaking on the Trybots. Skip it until this is understood.

Updates golang/go#37318
Change-Id: Ie4c1db47797b88d5eb201a73c4ddfb5481f362ea
Reviewed-on: https://go-review.googlesource.com/c/tools/+/220360
Run-TryBot: Robert Findley <[email protected]>
Reviewed-by: Rohan Challa <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Looks like
Looks like the same or a similar root cause.
Root cause is still unknown, but I agree it's probably the same. It's actually good to see this test failing as well. I've been trying to catch this failure in https://golang.org/cl/220362, but have not yet succeeded. I'm going to remove all my t.Parallel() calls in hopes that this just goes away. It's concerning that we don't understand this, but pragmatically this level of parallelism is unrealistic and I'd prefer not to have to skip all of these tests. If this recurs after the parallelism is removed, I'll skip the tests.
Change https://golang.org/cl/221379 mentions this issue:
Originally I decided to use t.Parallel() in hopes of uncovering new bugs. That may have worked... but manifested as rare flakes that are difficult to diagnose (golang.org/issues/37318). Since this level of parallelism is extremely unlikely in normal gopls workloads, I'm going to remove the t.Parallel() calls in hopes of eliminating this flakiness. I'd rather be able to continue running these tests.

Also, don't run in the 'Shared' execution mode by default: normal gopls execution is either as a sidecar (the Singleton execution mode), or as a daemon (the Forwarded execution mode).

Un-skip the TestGoToStdlibDefinition test, as hopefully it will no longer flake.

Updates golang/go#37318.
Change-Id: Id73ee3c8702ab4ab1d039baa038fbce879e38df8
Reviewed-on: https://go-review.googlesource.com/c/tools/+/221379
Run-TryBot: Robert Findley <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Heschi Kreinick <[email protected]>
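As a rough illustration of what this CL removes (not the actual regtest harness; the test name and subtest body are invented), a t.Parallel() call inside each subtest lets all execution modes run concurrently, which is where they could contend for shared state such as a reused temp go.mod path:

```go
package regtest_test

import "testing"

func TestEditorSessions(t *testing.T) {
	modes := []string{"Singleton", "Forwarded"}
	for _, mode := range modes {
		mode := mode // capture the range variable for the closure
		t.Run(mode, func(t *testing.T) {
			// With this call present, all subtests run concurrently;
			// deleting it, as the CL above does, serializes them again.
			t.Parallel()

			_ = mode // ... exercise an editor session in this mode ...
		})
	}
}
```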
Possibly the same failure mode in
@bcmills, where was this test failure? Was this on a builder, or did you produce it locally? I ask both because I haven't been able to reproduce locally, and to check if the build had golang.org/cl/221379.
Sorry, thought I had pasted a link. That one is from https://build.golang.org/log/e78c48b9bfdaee4e6529403cb7d074969d334387. A lot of these seem to be occurring on the
I wonder if you could reproduce it locally by randomly sending
Change https://golang.org/cl/226957 mentions this issue:
Wrap the regtest test servers to capture jsonrpc2 logs, so that they can be printed on test failure. There was a little bit of complication to implement this in the 'Shared' execution mode, and since we're not really using this mode I just deleted it.

Updates golang/go#36897
Updates golang/go#37318
Change-Id: Ic0107c3f317850ae3beb760fc94ae474e647cb78
Reviewed-on: https://go-review.googlesource.com/c/tools/+/226957
Run-TryBot: Robert Findley <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Ian Cottrell <[email protected]>
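A hedged sketch of the log-capture idea, using only the standard library rather than the actual regtest wrapper: buffer the connection's log output and dump it via t.Cleanup only when the test fails, so passing runs stay quiet. The logBuffer and captureLogs names are made up for this example.

```go
package regtest_test

import (
	"bytes"
	"sync"
	"testing"
)

// logBuffer accumulates protocol log lines written by the server connection.
type logBuffer struct {
	mu  sync.Mutex
	buf bytes.Buffer
}

func (l *logBuffer) Write(p []byte) (int, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.buf.Write(p)
}

// captureLogs returns a writer whose contents are printed only if the test
// ends up failing.
func captureLogs(t *testing.T) *logBuffer {
	l := &logBuffer{}
	t.Cleanup(func() {
		if t.Failed() {
			l.mu.Lock()
			defer l.mu.Unlock()
			t.Logf("captured jsonrpc2 logs:\n%s", l.buf.String())
		}
	})
	return l
}
```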
This may also affect internal/lsp/cmd tests. CC @ianthehat
Edit: this is possibly related but unfortunately not the root cause. See below.
Oh no, as soon as I wrote this I realized that the write synchronization had been removed at the CL I was debugging, so this can't be the cause :( But I think the part about connection errors not being surfaced is still important, and fixing jsonrpc2 shutdown is likely critical to solving these flakes, if they still exist.
Using fetchlogs/greplogs, I categorized flakes for the last month into two groups:
The problem appears to be that we re-use the temp modfile path, and not all actions we perform on it are atomic. @stamblerre is looking into a couple of possible fixes: (1) locking, or (2) using a different temp modfile location for each invocation.
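For illustration, a minimal sketch of approach (2): give each go command invocation its own temporary modfile instead of a shared path. This is not the code from the CL below; the function name is invented, and a real version would likely also need to provide the companion .sum file whose path the go command derives from the -modfile flag.

```go
package regtest_test

import (
	"os"
	"os/exec"
	"path/filepath"
)

// runGoListWithFreshModfile copies the workspace go.mod into a brand-new
// temporary file and runs the go command against that copy via -modfile,
// so concurrent invocations never read and write the same path.
func runGoListWithFreshModfile(workdir string) ([]byte, error) {
	orig, err := os.ReadFile(filepath.Join(workdir, "go.mod"))
	if err != nil {
		return nil, err
	}

	// -modfile requires the file name to end in ".mod"; CreateTemp replaces
	// the "*" in the pattern with a random string, keeping that suffix.
	tmp, err := os.CreateTemp("", "gopls-*.mod")
	if err != nil {
		return nil, err
	}
	defer os.Remove(tmp.Name())

	if _, err := tmp.Write(orig); err != nil {
		tmp.Close()
		return nil, err
	}
	if err := tmp.Close(); err != nil {
		return nil, err
	}

	cmd := exec.Command("go", "list", "-modfile="+tmp.Name(), "./...")
	cmd.Dir = workdir
	return cmd.CombinedOutput()
}
```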
Change https://golang.org/cl/237517 mentions this issue:
Refactor internal/lsp/cache to use a new temporary go.mod file for each go command invocation. This cleans up the abstraction in the source package, as we no longer are aware of temporary go.mod files. This will also fix the raciness of reusing the same temporary go.mod file for each invocation.

Updates golang/go#37318.
Fixes golang/go#39504.
Change-Id: I90bc17a678b5df222ab598c8f7dbf6c6fdd393f6
Reviewed-on: https://go-review.googlesource.com/c/tools/+/237517
Run-TryBot: Rebecca Stambler <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Heschi Kreinick <[email protected]>
@findleyr: Now that the load concurrency errors should be gone, I'll try to take a look at the TestInconsistentVendoring flakes (since I wrote the test). Do you have any logs for those? |
I can find some, but those were pretty rare and may have also been related to mod concurrency. In the meantime, while grepping for logs, I discovered the more concerning #39696, which may be related to your fix. |
Here's an example of a TestInconsistentVendoring flake from today: |
I just looked around in the build dashboard, and I agree that #39696 does look more concerning. I'm currently trying to reproduce it locally, but I also improved the logging in CL 238737, so hopefully that will help. I also noticed a few more test failures on the dashboard (1, 2, 3), which all include the same error message:
It appears once in a
Ran into another instance of this on CL 239117: https://storage.googleapis.com/go-build-log/206ee872/linux-amd64_efb99276.log. Edit: The error turned out to be unrelated. |
I haven't seen one of these in ages. Close? |
https://build.golang.org/log/fe27a791913be11bc439f9612e1ba92f52673e54
CC @stamblerre @jayconrod @matloob