fix(dispatch): reduce locking contention #4552
Conversation
d60b6fe → f7685e1
The thing I like about `mtx.Lock()` / `defer mtx.Unlock()` is that it makes the management of the lock very clear and explicit. Your changes make sense to me. However, I wonder if, for improved maintainability, it would be worth extracting the logic that needs to hold locks into smaller functions where we can still use `defer`. For example, instead of:

```go
func (d *Dispatcher) doMaintenance() {
	d.mtx.RLock()
	empty := make(map[*Route]map[model.Fingerprint]*aggrGroup)
	for route, groups := range d.aggrGroupsPerRoute {
		for fp, ag := range groups {
			if ag.empty() {
				if empty[route] == nil {
					empty[route] = make(map[model.Fingerprint]*aggrGroup)
				}
				empty[route][fp] = ag
			}
		}
	}
	d.mtx.RUnlock()
}
```

it could become:

```go
func (d *Dispatcher) makeEmptyAggrMap() map[*Route]map[model.Fingerprint]*aggrGroup {
	d.mtx.RLock()
	defer d.mtx.RUnlock()

	empty := make(map[*Route]map[model.Fingerprint]*aggrGroup)
	for route, groups := range d.aggrGroupsPerRoute {
		for fp, ag := range groups {
			if ag.empty() {
				if empty[route] == nil {
					empty[route] = make(map[model.Fingerprint]*aggrGroup)
				}
				empty[route][fp] = ag
			}
		}
	}
	return empty
}

func (d *Dispatcher) doMaintenance() {
	// Collect the empty groups under the read lock, then act on them.
	empty := d.makeEmptyAggrMap()
	// ...
}
```

This approach could be applied to all the places where lock holding is involved, essentially making the code easier to maintain and reason about. wdyt?
f7685e1 → bf4e9c5
I reimplemented the change using custom nested structures. I will update the PR description and benchmarks tomorrow.
bf4e9c5 → 00dd27c
Looks 👌🏾 Just a few comments on wrapping test scenarios.
Would be nice to make metrics an interface so that we can also test whether metrics are being set correctly, but I would say that is for a different MR.
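For illustration only, a minimal sketch of what such an interface could look like; the names here are made up and not part of this PR:

```go
// dispatcherMetrics is a hypothetical seam in front of the Prometheus counters,
// so tests can swap in a recording fake instead of scraping the registry.
type dispatcherMetrics interface {
	IncAggrGroups()
	DecAggrGroups()
	IncGroupLimitReached()
}

// fakeMetrics records increments so a test can assert on them directly.
type fakeMetrics struct {
	aggrGroups        int
	limitReachedTotal int
}

func (m *fakeMetrics) IncAggrGroups()        { m.aggrGroups++ }
func (m *fakeMetrics) DecAggrGroups()        { m.aggrGroups-- }
func (m *fakeMetrics) IncGroupLimitReached() { m.limitReachedTotal++ }
```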
```go
// Test regular situation where we wait for group_wait to send out alerts.
```
This test function covers several scenarios. Would it be better to wrap each scenario in a t.Run() block? That would allow a single failing test to be run repeatedly for easier debugging.
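As a rough illustration (the test and scenario names below are invented, not taken from the PR), the wrapping could look like:

```go
func TestAggrGroupScenarios(t *testing.T) {
	// Each scenario gets its own subtest, so a single failure can be re-run with
	// go test -run 'TestAggrGroupScenarios/<scenario>'.
	t.Run("group_wait delays the first flush", func(t *testing.T) {
		// ... build the aggrGroup, insert a fresh alert, assert the flush timing ...
	})
	t.Run("alert older than group_interval is flushed without the extra wait", func(t *testing.T) {
		// ...
	})
	t.Run("resolved alerts are removed after the next batch", func(t *testing.T) {
		// ...
	})
}
```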
I think we can do this in a separate MR.
The `group_test.go` was split from `dispatch_test.go` without modifications. In `dispatch_test.go` there is `TestGroupsWithLimits`, which checks the `alertmanager_dispatcher_aggregation_group_limit_reached_total` metric.
Note: limits are currently not used in Alertmanager; it's a feature imported by Grafana Mimir.
```go
// Add an alert that started more than group_interval in the past. We expect ...
```
Same idea here
```go
// Resolve an alert, and it should be removed after the next batch was sent.
```
and here
```go
// Resolve all remaining alerts, they should be removed after the next batch was sent.
```
And this last scenario
- split dispatch.go into multiple files
- update copyright

Signed-off-by: Siavash Safi <[email protected]>
00dd27c → 46b464c
Add a custom nested map implementation with locks at the root and branches. This allows read and write operations to not always block each other.

Signed-off-by: Siavash Safi <[email protected]>
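Conceptually, the nested structure keeps one lock at the root map and one per branch, so work on one route's groups does not serialize with work on another's. A simplified sketch of the idea (the lookup helper and its name are illustrative, not the PR's exact code; `Route`, `aggrGroup`, and `model.Fingerprint` are the existing dispatch package types):

```go
// Sketch: a two-level map with an RWMutex at the root and one per branch.
// The root lock is only held long enough to find a branch; per-fingerprint
// work then contends only on that branch's lock.
type routeGroupsSketch struct {
	mu       sync.RWMutex
	branches map[*Route]*fingerprintGroupsSketch
}

type fingerprintGroupsSketch struct {
	mu     sync.RWMutex
	groups map[model.Fingerprint]*aggrGroup
}

// get looks up the aggregation group for (route, fp) without ever holding
// the root lock while touching a branch.
func (rg *routeGroupsSketch) get(route *Route, fp model.Fingerprint) (*aggrGroup, bool) {
	rg.mu.RLock()
	branch, ok := rg.branches[route]
	rg.mu.RUnlock()
	if !ok {
		return nil, false
	}
	branch.mu.RLock()
	defer branch.mu.RUnlock()
	ag, ok := branch.groups[fp]
	return ag, ok
}
```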
46b464c → 2a27e78
Reduce the amount of time spent holding locks in the dispatcher by using the new data structures:

- doMaintenance() snapshots empty groups and deletes them afterwards
- Groups() snapshots routes and queries groups per route on demand
- processAlert() only locks the group it is processing an alert for

None of the above 3 methods hold any locks directly.

This results in -68% maintenance overhead or +12991% alert processing rate improvement:

```
goos: darwin
goarch: arm64
pkg: github.com/prometheus/alertmanager/dispatch
cpu: Apple M3 Pro
                                                  │ bench-dispatch-main.txt │ bench-dispatch-nested-map-locks.txt │
                                                  │         sec/op          │        sec/op         vs base       │
Dispatch_100k_AggregationGroups_10k_Empty-12             1.242µ ±  1%           1.193µ ±   5%     -3.91% (p=0.037 n=10)
Dispatch_100k_AggregationGroups_20k_Empty-12             1.329µ ±  2%           1.188µ ±   3%    -10.65% (p=0.000 n=10)
Dispatch_100k_AggregationGroups_30k_Empty-12             1.437µ ±  3%           1.288µ ±   7%    -10.37% (p=0.001 n=10)
Dispatch_100k_AggregationGroups_40k_Empty-12             1.695µ ±  8%           1.236µ ±   6%    -27.06% (p=0.000 n=10)
Dispatch_100k_AggregationGroups_50k_Empty-12             2.185µ ± 11%           1.286µ ±  11%    -41.17% (p=0.000 n=10)
Dispatch_20k_AggregationGroups_Groups_Impact-12     189007.708µ ±  6%           1.143µ ±  18%   -100.00% (p=0.000 n=10)
Dispatch_50k_AggregationGroups_Groups_Impact-12         602.17m ± 69%           28.09m ± 605%    -95.33% (p=0.000 n=10)
Dispatch_100k_AggregationGroups_Groups_Impact-12          1.543 ± 29%            1.272 ±  96%          ~ (p=0.247 n=10)
geomean                                                  187.7µ                 24.22µ           -87.10%

                                                  │ bench-dispatch-main.txt │ bench-dispatch-nested-map-locks.txt │
                                                  │       alerts/sec        │      alerts/sec       vs base       │
Dispatch_100k_AggregationGroups_10k_Empty-12             1.616M ±  1%             1.767M ±   3%          +9.36% (p=0.000 n=10)
Dispatch_100k_AggregationGroups_20k_Empty-12             1.412M ±  2%             1.715M ±   1%         +21.45% (p=0.000 n=10)
Dispatch_100k_AggregationGroups_30k_Empty-12             1.235M ±  4%             1.637M ±   4%         +32.58% (p=0.002 n=10)
Dispatch_100k_AggregationGroups_40k_Empty-12             950.9k ± 14%            1614.0k ±   8%         +69.73% (p=0.000 n=10)
Dispatch_100k_AggregationGroups_50k_Empty-12             693.5k ± 18%            1370.7k ±  17%         +97.65% (p=0.000 n=10)
Dispatch_20k_AggregationGroups_Groups_Impact-12           5.586 ± 14%         2027690.000 ±   5%  +36299398.75% (p=0.000 n=10)
Dispatch_50k_AggregationGroups_Groups_Impact-12           3.277 ± 40%          742918.500 ± 167%  +22667131.12% (p=0.000 n=10)
Dispatch_100k_AggregationGroups_Groups_Impact-12          1.424 ± 4255939%     252756.500 ±  73%  +17749654.21% (p=0.000 n=10)
geomean                                                  9.134k                    1.196M           +12991.42%

                                                  │ bench-dispatch-main.txt │ bench-dispatch-nested-map-locks.txt │
                                                  │ maintenance_overhead_%  │ maintenance_overhead_%   vs base    │
Dispatch_100k_AggregationGroups_10k_Empty-12             17.185 ±  7%             5.905 ±  ?     -65.64% (p=0.002 n=10)
Dispatch_100k_AggregationGroups_20k_Empty-12              36.50 ± 11%             13.66 ± 26%    -62.58% (p=0.000 n=10)
Dispatch_100k_AggregationGroups_30k_Empty-12              55.44 ± 13%             17.45 ± 26%    -68.52% (p=0.002 n=10)
Dispatch_100k_AggregationGroups_40k_Empty-12             125.65 ± 27%             27.03 ± 34%    -78.49% (p=0.000 n=10)
Dispatch_100k_AggregationGroups_50k_Empty-12             172.40 ± 36%             64.64 ± 60%    -62.51% (p=0.000 n=10)
geomean                                                   59.62                   18.97          -68.17%
```

Signed-off-by: Siavash Safi <[email protected]>
2a27e78 → 4520036
It took me a while to review, mostly because of how much was moved. At this point we should leave it as it is, since I have already gone through the review. In the future I would suggest moving code in a separate PR, because I spent a lot of time just reviewing lines that moved to a different file.
```go
// notifyFunc is a function that performs notification for the alert
// with the given fingerprint. It aborts on context cancelation.
// Returns false iff notifying failed.
```
```diff
-// Returns false iff notifying failed.
+// Returns false if notifying failed.
```
```go
// Each branch of the map can hold its own R(W)Lock.
type routeGroups struct {
	mu          sync.RWMutex
	routeGroups map[*Route]*fingerprintGroups
```
This seems like a weird name conflict with the struct name. The comment is also slightly misleading. This struct isn't the map, but a structure that holds the map. Should this be called something like fingerprints because that's what it is pointing to?
```go
type routeGroups struct {
	mu          sync.RWMutex
	routeGroups map[*Route]*fingerprintGroups
	groupsNum   *atomic.Int64
```
This seems like it would be better named in the reverse - numGroups
```go
// fingerprintGroups is a map of fingerprints to aggregation groups.
```
This comment is also technically wrong
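One possible rewording (just a suggestion, since the struct holds the map rather than being the map):

```go
// fingerprintGroups holds a map of fingerprints to aggregation groups,
// guarded by its own lock.
```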
```go
type fingerprintGroups struct {
	mu         sync.RWMutex
	aggrGroups map[model.Fingerprint]*aggrGroup
	groupsNum  *atomic.Int64
```
I think this would also be better renamed to `numGroups`.
```go
		fg.aggrGroups = make(map[model.Fingerprint]*aggrGroup)
	}
	// Check if we've reached the rate limit before creating a new group.
	if fg.LimitReached() {
```
Would this not be a good time to return an error?
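For illustration, a hedged sketch of what returning an error at this point might look like; the method shape, the sentinel error, and the assumption that LimitReached() does not itself take fg.mu are all mine, not the PR's:

```go
// errGroupLimitReached is a hypothetical sentinel callers could check with errors.Is.
var errGroupLimitReached = errors.New("aggregation group limit reached")

// getOrCreate sketches a creation path that surfaces the limit to the caller
// instead of silently skipping the new group.
func (fg *fingerprintGroups) getOrCreate(fp model.Fingerprint, create func() *aggrGroup) (*aggrGroup, error) {
	fg.mu.Lock()
	defer fg.mu.Unlock()

	if ag, ok := fg.aggrGroups[fp]; ok {
		return ag, nil
	}
	// Assumes LimitReached only reads the counter and takes no lock itself.
	if fg.LimitReached() {
		return nil, fmt.Errorf("%w: fingerprint %v", errGroupLimitReached, fp)
	}
	ag := create()
	fg.aggrGroups[fp] = ag
	fg.groupsNum.Add(1)
	return ag, nil
}
```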
```go
// Second pass: remove collected groups
```
The trouble with not holding the lock this whole time is that between defining what to remove above and actually removing it below, the state could have changed: we give up the read lock, then try to get a write lock, and who knows what happens in between. Is this maintenance function the one that takes a significant amount of time?
If so, we could range over it to find empty groups, and when we find one, instead of putting it in a slice, we could decide to do the expensive operation of taking a write lock, reading again to make sure it's still empty, and then deleting/removing it. That would keep write locks down to only when we have something to do, even if we pay that price twice in the less likely scenario that we have cleanup to do.
> Is this maintenance function the one that takes a significant amount of time?

Yes, if you check the benchmarks there is a huge performance penalty from locking the dispatcher every 30s.

> Who knows what happens in between. Is this maintenance function the one that takes a significant amount of time?

I think the code can be improved, but we don't want to keep the lock at the root while looping over branches. When trying to remove a group, we can double-check whether it's still empty or not.
I think for correctness we have to re-check for emptiness while holding the write lock.
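A minimal sketch of that double-checked removal, assuming the per-branch fingerprintGroups type shown above; the method name is illustrative and the stop-then-delete ordering is an assumption about what removal entails:

```go
// removeIfEmpty deletes the group for fp only if it is still empty once the
// write lock is held, closing the window between the read-only scan and the delete.
func (fg *fingerprintGroups) removeIfEmpty(fp model.Fingerprint) bool {
	fg.mu.Lock()
	defer fg.mu.Unlock()

	ag, ok := fg.aggrGroups[fp]
	if !ok || !ag.empty() {
		// The group was re-populated (or already removed) after the first pass;
		// leave it alone.
		return false
	}
	ag.stop()
	delete(fg.aggrGroups, fp)
	fg.groupsNum.Add(-1)
	return true
}
```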
There are 3 atomic commits which can be reviewed separately, if that helps.

Looks like there's a merge conflict.

So #4607 turned out to be the major bottleneck in dispatch.
Reduce the amount of time spent holding locks in the dispatcher by using the new data structures:

- doMaintenance() snapshots empty groups and deletes them afterwards
- Groups() snapshots routes and queries groups per route on demand
- processAlert() only locks the group it is processing an alert for

None of the above 3 methods hold any locks directly. This results in -68% maintenance overhead or +12991% alert processing rate improvement (see the benchmarks above).

Other changes:

feat(dispatch): add nested map implementation
Add a custom nested map implementation with locks at the root and branches. This allows read and write operations to not always block each other.

chore(dispatch): split dispatch into multiple files

Signed-off-by: Siavash Safi [email protected]