Description
Background: I have a program with a steady-state heap size under 100 MiB, but a high rate of allocation, almost all of which is freed quickly. One of the main things I allocate is buffers that can be reused, so I intermediate those allocations through a `sync.Pool`. It may be relevant to my theory below that I call `Get` on only a single goroutine, but call `Put` on multiple goroutines, and the `Put` goroutines are all distinct from the `Get` one.
I measured the rate at which `Get` "misses" in the freelist and falls through to the `New` function, and was surprised by how high it was. Here are some observations I made while debugging:
- If I set `GODEBUG=gcstoptheworld=2` to disable concurrent garbage collection, the number of freelist misses is 25% as high as before, and my program is 50% faster end-to-end.
- If I set `GOGC=off` to disable garbage collection altogether, the number of freelist misses is 1% as high as before, and my program is 60% faster end-to-end.
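For reference, the two experiments correspond to runs along these lines (the binary name is a placeholder):

```shell
GODEBUG=gcstoptheworld=2 ./myprog   # GC runs fully stop-the-world
GOGC=off ./myprog                   # GC disabled entirely
```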
Digging into the implementation, it appears that `sync.Pool` grows without bound until the stop-the-world phase of GC, at which point it abandons all of its freelist contents. I think there may be a race here that makes this poorly tuned for concurrent GC, at least with my load pattern:
1. Goroutine A calls `Get` and finds the freelist empty, so it goes to allocate.
2. The allocation triggers a GC run. The GC begins happening concurrently.
3. Goroutine A allocates several more items and hands them off to goroutines B1, B2, ...
4. Goroutines B1, B2, ... call `Put`, contributing items back to the freelist.
5. GC reaches the stop-the-world phase and empties the freelist, throwing away the items just added.
6. Go to (1).
From staring at `gctrace=1` output and printf debugging, it appears my program may be repeatedly tickling this issue.
I wonder if it would be better to have `sync.Pool` grow its freelist to a high-water-mark size, then decay exponentially in size with each GC round?
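This is not the decay scheme itself, but as a user-level workaround under the same assumptions, a bounded freelist built on a buffered channel keeps items alive across GC cycles, since the channel (unlike `sync.Pool`) is an ordinary strongly referenced container (types and capacities here are hypothetical):

```go
package main

import "fmt"

// boundedPool retains at most cap(free) buffers across GC cycles,
// unlike sync.Pool, which drops its contents at each collection.
type boundedPool struct {
	free chan []byte
}

func newBoundedPool(size int) *boundedPool {
	return &boundedPool{free: make(chan []byte, size)}
}

// Get returns a pooled buffer, or allocates on a miss.
func (p *boundedPool) Get() []byte {
	select {
	case b := <-p.free:
		return b
	default:
		return make([]byte, 0, 4096) // miss: allocate fresh
	}
}

// Put returns a buffer to the pool, dropping it if the pool is full.
func (p *boundedPool) Put(b []byte) {
	select {
	case p.free <- b[:0]:
	default: // freelist full: let GC reclaim b
	}
}

func main() {
	p := newBoundedPool(8)
	b := p.Get()
	p.Put(b)
	fmt.Println(len(p.free)) // 1: the buffer was retained
}
```

The trade-off is that the cap is fixed rather than adapting to load, which is exactly what a high-water mark with per-GC decay would improve on.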
go version devel +1421bc1 Wed Jul 22 09:18:33 2015 +0000 linux/amd64