Conversation

@siavashs siavashs commented Sep 8, 2025

Reduce the amount of time spent holding locks in the dispatcher by using
the new data structures:

  • doMaintenance() snapshots empty groups and deletes them afterwards
  • Groups() snapshots routes and queries groups per route on demand
  • processAlert() only locks the group it is processing an alert for

None of the above 3 methods hold any locks directly.

This results in a 68% reduction in maintenance overhead and a +12991% improvement in alert processing rate (geomeans):

```
goos: darwin
goarch: arm64
pkg: github.com/prometheus/alertmanager/dispatch
cpu: Apple M3 Pro
                                                 │ bench-dispatch-main.txt │  bench-dispatch-nested-map-locks.txt   │
                                                 │         sec/op          │    sec/op      vs base                 │
Dispatch_100k_AggregationGroups_10k_Empty-12                  1.242µ ±  1%   1.193µ ±   5%    -3.91% (p=0.037 n=10)
Dispatch_100k_AggregationGroups_20k_Empty-12                  1.329µ ±  2%   1.188µ ±   3%   -10.65% (p=0.000 n=10)
Dispatch_100k_AggregationGroups_30k_Empty-12                  1.437µ ±  3%   1.288µ ±   7%   -10.37% (p=0.001 n=10)
Dispatch_100k_AggregationGroups_40k_Empty-12                  1.695µ ±  8%   1.236µ ±   6%   -27.06% (p=0.000 n=10)
Dispatch_100k_AggregationGroups_50k_Empty-12                  2.185µ ± 11%   1.286µ ±  11%   -41.17% (p=0.000 n=10)
Dispatch_20k_AggregationGroups_Groups_Impact-12          189007.708µ ±  6%   1.143µ ±  18%  -100.00% (p=0.000 n=10)
Dispatch_50k_AggregationGroups_Groups_Impact-12              602.17m ± 69%   28.09m ± 605%   -95.33% (p=0.000 n=10)
Dispatch_100k_AggregationGroups_Groups_Impact-12               1.543 ± 29%    1.272 ±  96%         ~ (p=0.247 n=10)
geomean                                                       187.7µ         24.22µ          -87.10%

                                                 │ bench-dispatch-main.txt │       bench-dispatch-nested-map-locks.txt        │
                                                 │       alerts/sec        │     alerts/sec      vs base                      │
Dispatch_100k_AggregationGroups_10k_Empty-12             1.616M ±       1%        1.767M ±   3%         +9.36% (p=0.000 n=10)
Dispatch_100k_AggregationGroups_20k_Empty-12             1.412M ±       2%        1.715M ±   1%        +21.45% (p=0.000 n=10)
Dispatch_100k_AggregationGroups_30k_Empty-12             1.235M ±       4%        1.637M ±   4%        +32.58% (p=0.002 n=10)
Dispatch_100k_AggregationGroups_40k_Empty-12             950.9k ±      14%       1614.0k ±   8%        +69.73% (p=0.000 n=10)
Dispatch_100k_AggregationGroups_50k_Empty-12             693.5k ±      18%       1370.7k ±  17%        +97.65% (p=0.000 n=10)
Dispatch_20k_AggregationGroups_Groups_Impact-12           5.586 ±      14%   2027690.000 ±   5%  +36299398.75% (p=0.000 n=10)
Dispatch_50k_AggregationGroups_Groups_Impact-12           3.277 ±      40%    742918.500 ± 167%  +22667131.12% (p=0.000 n=10)
Dispatch_100k_AggregationGroups_Groups_Impact-12          1.424 ± 4255939%    252756.500 ±  73%  +17749654.21% (p=0.000 n=10)
geomean                                                  9.134k                   1.196M            +12991.42%

                                             │ bench-dispatch-main.txt │      bench-dispatch-nested-map-locks.txt       │
                                             │ maintenance_overhead_%  │ maintenance_overhead_%  vs base                │
Dispatch_100k_AggregationGroups_10k_Empty-12              17.185 ±  7%              5.905 ±   ?  -65.64% (p=0.002 n=10)
Dispatch_100k_AggregationGroups_20k_Empty-12               36.50 ± 11%              13.66 ± 26%  -62.58% (p=0.000 n=10)
Dispatch_100k_AggregationGroups_30k_Empty-12               55.44 ± 13%              17.45 ± 26%  -68.52% (p=0.002 n=10)
Dispatch_100k_AggregationGroups_40k_Empty-12              125.65 ± 27%              27.03 ± 34%  -78.49% (p=0.000 n=10)
Dispatch_100k_AggregationGroups_50k_Empty-12              172.40 ± 36%              64.64 ± 60%  -62.51% (p=0.000 n=10)
geomean                                                    59.62                    18.97        -68.17%
```
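
For illustration, a minimal sketch of the Groups() snapshot idea from the bullets above, assuming the routeGroups type quoted later in the review; snapshotRoutes is a hypothetical helper name, not part of the diff:

```
// Sketch: take the root read lock only long enough to copy the route
// pointers, then query each route's branch under that branch's own
// lock, so no lock is held across the whole traversal.
func (rg *routeGroups) snapshotRoutes() []*Route {
	rg.mu.RLock()
	defer rg.mu.RUnlock()

	routes := make([]*Route, 0, len(rg.routeGroups))
	for r := range rg.routeGroups {
		routes = append(routes, r)
	}
	return routes
}
```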

Other changes:

feat(dispatch): add nested map implementation

Add a custom nested map implementation with locks at the root and branches.
This allows read and write operations to avoid always blocking each other.
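
For context, a minimal sketch of that shape. The struct fields match the excerpts quoted in the review below; the branch helper and the go.uber.org/atomic import are assumptions, not the actual diff:

```
package dispatch

import (
	"sync"

	"github.com/prometheus/common/model"
	"go.uber.org/atomic"
)

// routeGroups is the root of the nested map. Its lock only guards the
// route map itself; each branch guards its own fingerprint map, so
// operations on different routes rarely block each other.
type routeGroups struct {
	mu          sync.RWMutex
	routeGroups map[*Route]*fingerprintGroups
	groupsNum   *atomic.Int64
}

// fingerprintGroups is one branch: the aggregation groups of a single route.
type fingerprintGroups struct {
	mu         sync.RWMutex
	aggrGroups map[model.Fingerprint]*aggrGroup
	groupsNum  *atomic.Int64
}

// branch returns the per-route branch, creating it if missing. The fast
// path takes only the root read lock; the slow path re-checks under the
// write lock before inserting (classic double-checked locking).
func (rg *routeGroups) branch(route *Route) *fingerprintGroups {
	rg.mu.RLock()
	fg, ok := rg.routeGroups[route]
	rg.mu.RUnlock()
	if ok {
		return fg
	}

	rg.mu.Lock()
	defer rg.mu.Unlock()
	if fg, ok = rg.routeGroups[route]; !ok {
		fg = &fingerprintGroups{
			aggrGroups: make(map[model.Fingerprint]*aggrGroup),
			groupsNum:  rg.groupsNum, // shared atomic counter, readable without locks
		}
		rg.routeGroups[route] = fg
	}
	return fg
}
```

With this shape, processAlert() can resolve its branch first and then lock only that branch (and ultimately only the one aggregation group) while inserting an alert.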

chore(dispatch): split dispatch into multiple files

  • split dispatch.go into multiple files
  • update copyright

Signed-off-by: Siavash Safi [email protected]

@siavashs siavashs force-pushed the fix-dispatch-locking branch from d60b6fe to f7685e1 on September 9, 2025 08:12
@siavashs siavashs changed the title from "fix(dispatch): improve locking contention" to "fix(dispatch): reduce locking contention" on Sep 9, 2025
OGKevin commented Sep 9, 2025

The thing I like about

```
mtx.Lock()
defer mtx.Unlock()
```

is that it makes the management of the lock very clear and explicit.

Your changes make sense to me. However, I wonder if, for improved maintainability, it would be worth extracting the logic that needs to hold locks into smaller functions where we can still use defer for unlocking.

For example, instead of:

```
func (d *Dispatcher) doMaintenance() {
    d.mtx.RLock()
    empty := make(map[*Route]map[model.Fingerprint]*aggrGroup)
    for route, groups := range d.aggrGroupsPerRoute {
        for fp, ag := range groups {
            if ag.empty() {
                if empty[route] == nil {
                    empty[route] = make(map[model.Fingerprint]*aggrGroup)
                }
                empty[route][fp] = ag
            }
        }
    }
    d.mtx.RUnlock()
    // ... then delete the collected groups ...
}
```

it could become:

```
func (d *Dispatcher) makeEmptyAggrMap() map[*Route]map[model.Fingerprint]*aggrGroup {
    d.mtx.RLock()
    defer d.mtx.RUnlock()

    empty := make(map[*Route]map[model.Fingerprint]*aggrGroup)
    for route, groups := range d.aggrGroupsPerRoute {
        for fp, ag := range groups {
            if ag.empty() {
                if empty[route] == nil {
                    empty[route] = make(map[model.Fingerprint]*aggrGroup)
                }
                empty[route][fp] = ag
            }
        }
    }

    return empty
}

func (d *Dispatcher) doMaintenance() {
    empty := d.makeEmptyAggrMap()
    // ... then delete the collected groups ...
}
```

This approach could be applied to all the places where lock holding is involved, essentially making the code easier to maintain and reason about.

wdyt?

@siavashs siavashs force-pushed the fix-dispatch-locking branch from f7685e1 to bf4e9c5 on September 9, 2025 17:07
siavashs commented Sep 9, 2025

I reimplemented the change using custom nested structures.
This allows the dispatcher to hold r/w locks at different branches of the data structure, so it is less blocking for parallel operations.
It also addresses your comment @OGKevin since we don't hold any locks in dispatcher methods.

I will update the PR description and benchmarks tomorrow.

@siavashs siavashs force-pushed the fix-dispatch-locking branch from bf4e9c5 to 00dd27c on September 9, 2025 17:30
@OGKevin OGKevin left a comment

Looks 👌🏾. Just a few comments on wrapping test scenarios.

Would be nice to make metrics an interface so that we can also test if metrics are being set correctly, but I would say that is for a different MR.

```
return as
}

// Test regular situation where we wait for group_wait to send out alerts.
```
Contributor

This test function covers several scenarios. Would it be better to wrap each scenario in a t.Run() block? That would allow a single failing test to be run repeatedly for easier debugging.
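
For example, a hedged sketch of such wrapping (the test and scenario names are hypothetical, not from the actual file):

```
package dispatch

import "testing"

// Hypothetical wrapping of the existing scenarios into subtests.
func TestAggrGroupScenarios(t *testing.T) {
	t.Run("wait_for_group_wait", func(t *testing.T) {
		// ... first scenario: wait for group_wait before sending alerts ...
	})
	t.Run("alert_older_than_group_interval", func(t *testing.T) {
		// ... second scenario: flush without waiting for group_wait ...
	})
	// A single failing scenario can then be re-run in isolation, e.g.:
	//   go test -run 'TestAggrGroupScenarios/wait_for_group_wait' -count=100
}
```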

Contributor Author

I think we can do this in a separate MR.

The group_test.go was split from dispatch_test.go without modifications.
In dispatch_test.go there is TestGroupsWithLimits which checks the alertmanager_dispatcher_aggregation_group_limit_reached_total metric.

Note: Limits are currently not used in Alertmanager; it's a feature imported by Grafana Mimir.


```
ag.stop()

// Add an alert that started more than group_interval in the past. We expect
```
Contributor

Same idea here

```
}
}

// Resolve an alert, and it should be removed after the next batch was sent.
```
Contributor

and here

```
}
}

// Resolve all remaining alerts, they should be removed after the next batch was sent.
```
Contributor

And this last scenario

- split dispatch.go into multiple files
- update copyright

Signed-off-by: Siavash Safi <[email protected]>
@siavashs siavashs force-pushed the fix-dispatch-locking branch from 00dd27c to 46b464c on September 10, 2025 09:25
Add a custom nested map implementation with locks at the root and branches.
This allows read and write operations to avoid always blocking each other.

Signed-off-by: Siavash Safi <[email protected]>
@siavashs siavashs force-pushed the fix-dispatch-locking branch from 46b464c to 2a27e78 on September 10, 2025 09:56
@siavashs siavashs marked this pull request as ready for review September 10, 2025 09:59
@siavashs siavashs force-pushed the fix-dispatch-locking branch from 2a27e78 to 4520036 on September 10, 2025 10:05
@sysadmind sysadmind left a comment

It took me a while to review, mostly because of how much code was moved. At this point we should leave it as it is, since I've already gone through the review, but in the future I would suggest moving code in a separate PR; I spent a lot of time reviewing lines that had simply moved to a different file.


```
// notifyFunc is a function that performs notification for the alert
// with the given fingerprint. It aborts on context cancelation.
// Returns false iff notifying failed.
```
Contributor

Suggested change:

```diff
-// Returns false iff notifying failed.
+// Returns false if notifying failed.
```

```
// Each branch of the map can hold its own R(W)Lock.
type routeGroups struct {
	mu          sync.RWMutex
	routeGroups map[*Route]*fingerprintGroups
```
Contributor

This seems like a weird name conflict with the struct name. The comment is also slightly misleading. This struct isn't the map, but a structure that holds the map. Should this be called something like fingerprints because that's what it is pointing to?

```
type routeGroups struct {
	mu          sync.RWMutex
	routeGroups map[*Route]*fingerprintGroups
	groupsNum   *atomic.Int64
```
Contributor

This seems like it would be better named in reverse: numGroups.

```
}
}

// fingerprintGroups is a map of fingerprints to aggregation groups.
```
Contributor

This comment is also technically wrong.

```
type fingerprintGroups struct {
	mu         sync.RWMutex
	aggrGroups map[model.Fingerprint]*aggrGroup
	groupsNum  *atomic.Int64
```
Contributor

I think this would also be better renamed to numGroups.

```
	fg.aggrGroups = make(map[model.Fingerprint]*aggrGroup)
}
// Check if we've reached the rate limit before creating a new group.
if fg.LimitReached() {
```
Contributor

Would this not be a good time to return an error?

```
func (ag AlertGroups) Less(i, j int) bool {
	if ag[i].Labels.Equal(ag[j].Labels) {
		return ag[i].Receiver < ag[j].Receiver

// Second pass: remove collected groups
```
Contributor

The trouble with not holding the lock this whole time is that between defining what to remove above and actually removing it below, the state could have changed, because we give up the read lock and then try to get a write lock. Who knows what happens in between. Is this maintenance function the one that takes the significant amount of time?

If so, we could try to range over it to find empty groups, and when we do, instead of putting them in a slice, we could decide to do the expensive operation of taking a write lock, re-reading to make sure each is still empty, then deleting it. That would keep write locks down to only when we have something to do, even if we pay that price twice in the less likely scenario that we have cleanup to do.

@siavashs siavashs Sep 13, 2025

> Is this maintenance function the one that takes the significant amount of time?

Yes, if you check the benchmarks there is a huge performance penalty from locking the dispatcher every 30s.

> Who knows what happens in between. Is this maintenance function the one that takes the significant amount of time?

I think the code can be improved, but we don't want to keep the lock at the root while looping over branches.
When trying to remove a group, we can double-check whether it's still empty.

Contributor

I think for correctness we have to re-check for emptiness inside an RWLock.
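
A minimal sketch of that re-check, using the fingerprintGroups type quoted above; removeIfEmpty is a hypothetical helper, and the Dec() call assumes a go.uber.org/atomic-style counter:

```
// removeIfEmpty re-verifies emptiness under the branch's write lock
// before deleting, so a group that received an alert between the
// read-locked scan and this call is left alone.
func (fg *fingerprintGroups) removeIfEmpty(fp model.Fingerprint) bool {
	fg.mu.Lock()
	defer fg.mu.Unlock()

	ag, ok := fg.aggrGroups[fp]
	if !ok || !ag.empty() {
		return false // state changed since the snapshot; keep the group
	}
	ag.stop()
	delete(fg.aggrGroups, fp)
	fg.groupsNum.Dec()
	return true
}
```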

@siavashs (Contributor Author)

> It took me a while to review, mostly because of how much code was moved. At this point we should leave it as it is, since I've already gone through the review, but in the future I would suggest moving code in a separate PR; I spent a lot of time reviewing lines that had simply moved to a different file.

There are 3 atomic commits which can be reviewed separately, if that helps.

@sysadmind (Contributor)

Looks like there's a merge conflict.

@siavashs (Contributor Author)

So #4607 turned out to be the major bottleneck in dispatch.
But I will simplify this PR and update it since there are still minor improvements we can add.

@siavashs siavashs marked this pull request as draft October 19, 2025 15:56