sync: Pool appears to interact poorly with concurrent GC #11834
Comments
@jacobsa, can you paste your gctrace output?
Here is a snippet with …

And here with …
The latter looks much more 'orderly', with apparently one freelist item being thrown out each time. The former is more variable. I suspect you could reproduce this with a much smaller test case by creating a program that does …

For my notes: this is GoogleCloudPlatform/gcsfuse@04f3cfb and jacobsa/fuse@3755e07, running the benchmark like …
Thanks! Could you explain a bit about when your "allocMessage" logs are printed and what the count is? Also, roughly how large are the objects being allocated to the pool? And do you know roughly how much non-pool allocation is happening? (Given that your live heap size isn't changing at all, I assume there's basically no non-pool allocation, but wanted to confirm this.)
Note that emptying the pools is actually one of the very first things GC does when triggered, since it wants to collect those objects in that cycle. Given the heap goal overshoot in the gctrace and the high allocation rate, I suspect this is equivalent to #11677. I should be able to tell better once I understand more about allocMessage.
Thanks. The … There are other sources of allocations in the program.

Regarding the timing of clearing the pools: it appears the code here stops the world before clearing, and my understanding was that it happened last in the GC process. But I am probably wrong; I haven't looked closely, and the only design doc I could find seems to be out of date.

I will go read #11677, thanks for the pointer.
Okay, the confusion is that …
The program with concurrent GC is slower, because GC runs all the time. The second GC starts 10ms after the first one, while the wall-clock time of the first is 9.17ms. Because concurrent GC constantly lags behind, it also runs more frequently: every ~6ms vs ~28ms for non-concurrent GC. This explains the higher pool miss rate. What makes the situation worse in this case is that most of the overallocated memory is live at the end of GC, because it is added to the pool, and the pool is cleared only at the beginning of GC.
Regarding the pprof question above, --inuse_xxx shows data still live at the end of the last GC. --alloc_xxx shows the total number of things ever allocated, as of the last GC. Go knows the total allocated even since the last GC, but we report "as of the last GC" so that the two numbers are comparable.
@dvyukov: Thanks, that mostly makes sense. The one thing I can't tell: is "Goal of the second GC is set basically to the current heap size" a bug, an unfortunate side effect of my program in particular, or what?

@rsc: Thanks for the background. I believe the reason I was mystified by this was that the allocation profile didn't show some very large allocations that only happened a few times. (Which makes sense; it's sampled. Although I do sometimes see such effects even when I've set …)
Looks like a general issue in the garbage collector.
I believe Pool itself is as good as it can get with concurrent GC. (They are somewhat at odds, and I think the implementation does the best it can, clearing the pools at the start of the GC. That means that the objects used during that concurrent GC will not become available for collection until the next one, but so be it.)

I believe the problems you are seeing are #11677 - the allocations are running too far ahead of the GC. As such, I'm going to close this bug. Please follow #11677, and if you're still seeing problems after that is fixed, feel free to reopen this one (if you can, I don't know what the GitHub permission model is) or open a new one.
@aclements: I assume you're talking about the recently committed ones. They appear to make the problem significantly worse for me (in particular, 510fd13 does). Here are the throughputs reported by my benchmark at a selection of commits. It's slightly noisy so I wouldn't worry about single-digit changes, but it looks like the change at 510fd13 is significant:
(The numbers above are for an old version of the program, detailed below and mostly matching the one in my previous posts, where I was still using ….)

Here is a sample of …
For my records, I am running the following with GoogleCloudPlatform/gcsfuse@d4bbf69:
Hmm. Well, the overallocation you were seeing is gone, but the trigger is really close to the goal in most of the cycles, which is a little strange. Roughly how many goroutines does your test have, and how many do you expect are runnable at a given moment?
For each request the program receives from fuse in the kernel, it starts a goroutine. The goroutine replies to the kernel and then goes away. This particular benchmark consists of a ton of serial requests from the kernel, so I expect at most a small number of runnable goroutines at a time, but high churn in creating and destroying goroutines.
Background: I have a program that has a steady-state heap size of <100 MiB, but does a large rate of allocation, all of which is freed quickly. One of the main things I allocate is buffers that can be reused, so I intermediate the allocations through a sync.Pool. It may be relevant to my theory below that I call Get on only a single goroutine, but call Put on multiple goroutines, and the Put goroutines are all distinct from the Get one.

I measured the rate at which Get "misses" in the freelist and falls through to the New function, and was surprised by how high it was. Here are some observations I made while debugging:

- When I set GODEBUG=gcstoptheworld=2 to disable concurrent garbage collection, the number of freelist misses is 25% as high as before, and my program is 50% faster end-to-end.
- When I set GOGC=off to disable garbage collection altogether, the number of freelist misses is 1% as high as before, and my program is 60% faster end-to-end.

Digging into the implementation, it appears that sync.Pool grows without bound until the stop-the-world phase of GC, at which point it abandons all of the freelist content. I think there may be a race here that makes this poorly tuned for concurrent GC, at least with my load pattern:

- The Get goroutine calls Get, and finds the freelist empty. So it goes to allocate.
- Meanwhile, the Put goroutines call Put, contributing items back to the freelist.

From staring at gctrace=1 and printf debugging output, it appears my program may be repeatedly tickling this issue.

I wonder if it would be better to have sync.Pool grow its freelist to a high-water-mark size, then decay exponentially in size with each GC round?