feat: Add a scoring plugin to distribute new groups evenly #357
Conversation
Force-pushed from bc08bee to 5ee44e1.
Pull Request Overview
This PR adds a new scoring plugin called "no-hit-lru-scorer" that helps distribute new groups evenly across pods by tracking which pods have least recently handled cache-miss requests. The plugin uses an LRU cache to give higher scores to pods that haven't recently served new groups, helping to balance cache growth.
Key changes:
- Implements NoHitLRU scorer that tracks pod usage for cache-miss requests
- Adds comprehensive test coverage for the new plugin functionality
- Integrates the plugin into the registration system with sample configuration
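The mechanics described above can be sketched roughly as follows. This is a minimal, self-contained sketch of the idea only: the type and method names here are illustrative and not the plugin's actual API, which builds on `lru.Cache` and the scheduler's `types` package.

```go
package main

import "fmt"

// noHitLRU is an illustrative stand-in for the plugin's LRU state:
// pod names in usage order, most recently used at the end.
type noHitLRU struct {
	order []string
}

// touch records that pod just handled a cold (cache-miss) request.
func (s *noHitLRU) touch(pod string) {
	for i, p := range s.order {
		if p == pod {
			s.order = append(s.order[:i], s.order[i+1:]...)
			break
		}
	}
	s.order = append(s.order, pod)
}

// score maps each pod to a value in (0, 1]: pods that never handled a
// cold request get 1.0, and scores fall off toward 0 for pods that
// handled one more recently.
func (s *noHitLRU) score(pods []string) map[string]float64 {
	rank := make(map[string]int, len(s.order))
	for i, p := range s.order {
		rank[p] = i + 1 // 1 = least recently used
	}
	n := len(s.order) + 1
	out := make(map[string]float64, len(pods))
	for _, p := range pods {
		r, seen := rank[p]
		if !seen {
			out[p] = 1.0 // never served a new group
			continue
		}
		out[p] = 1.0 - float64(r)/float64(n)
	}
	return out
}

func main() {
	s := &noHitLRU{}
	s.touch("pod-a")
	s.touch("pod-b")
	// pod-c is unseen and scores highest; pod-b was touched last and scores lowest.
	fmt.Println(s.score([]string{"pod-a", "pod-b", "pod-c"}))
}
```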
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pkg/plugins/scorer/no_hit_lru.go | Core implementation of the NoHitLRU scorer plugin |
| pkg/plugins/scorer/no_hit_lru_test.go | Comprehensive test suite covering functionality and edge cases |
| pkg/plugins/register.go | Registers the new plugin in the plugin system |
| deploy/config/sim-epp-no-hit-lru.yaml | Sample configuration demonstrating plugin usage |
Force-pushed from 1a56bda to 1e04f9d.
Overall looks good - a few comments.
pkg/plugins/scorer/no_hit_lru.go (outdated)

```go
// - LRU ordering is with respect to when a pod last received a cold request.
// - Least recently used (or never used) pods get highest score (1.0)
// - Most recently used pods get lowest score (approaching 0.0)
func (s *NoHitLRU) Score(ctx context.Context, cycleState *types.CycleState, request *types.LLMRequest, pods []types.Pod) map[types.Pod]float64 {
```
I think it might help readability if the main logical blocks in this call were separated into different functions.
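As a hedged illustration of what such a decomposition might look like, here is a sketch using stand-in types rather than the plugin's real signatures; `isCold`, `lruScores`, and `uniform` are hypothetical helper names, not the actual implementation:

```go
package main

import "fmt"

type pod string

// scorer tracks pods in LRU order: least recently used first.
type scorer struct{ order []pod }

// isCold reports whether the request missed the prefix cache.
func isCold(prefixHit bool) bool { return !prefixHit }

// lruScores maps each pod to (0, 1]: unseen pods score 1.0, and the
// most recently used pod scores lowest.
func (s *scorer) lruScores(pods []pod) map[pod]float64 {
	rank := map[pod]int{}
	for i, p := range s.order {
		rank[p] = i + 1 // 1 = least recently used
	}
	out := map[pod]float64{}
	for _, p := range pods {
		if r, ok := rank[p]; ok {
			out[p] = 1 - float64(r)/float64(len(s.order)+1)
		} else {
			out[p] = 1 // never handled a cold request
		}
	}
	return out
}

// uniform gives every pod the same neutral score for warm requests.
func uniform(pods []pod) map[pod]float64 {
	out := map[pod]float64{}
	for _, p := range pods {
		out[p] = 0
	}
	return out
}

// Score is the entry point: each logical block is one named helper,
// so the control flow reads as a short sequence of steps.
func (s *scorer) Score(pods []pod, prefixHit bool) map[pod]float64 {
	if !isCold(prefixHit) {
		return uniform(pods)
	}
	return s.lruScores(pods)
}

func main() {
	s := &scorer{order: []pod{"a"}}
	fmt.Println(s.Score([]pod{"a", "b"}, false)) // → map[a:0.5 b:1]
}
```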
```go
// Read prefix cache state to determine if this is a cold request.
// This is treated as an optimization - if the state isn't available, we assume a cold request.
prefixState, err := types.ReadCycleStateKey[*prefix.SchedulingContextState](cycleState, plugins.StateKey(s.prefixPluginName))
```
Not a blocker but would be great: do you think you can extend and support the same optimization for the `precise-prefix-cache-scorer` too? I personally am eager to use this scorer with the latter.
It is also possible that this optimization can be achieved with unequal weighting (e.g., ratio of 2:1 between prefix-cache scorers and this one) - but could also be fragile in some cases.
Great idea. Maybe I can add this in a follow up PR? I'm a bit worried about mixing concerns.
Sounds good.
pkg/plugins/scorer/no_hit_lru.go (outdated)

```go
type NoHitLRU struct {
	typedName plugins.TypedName
	lruCache  *lru.Cache[string, struct{}] // pod name -> dummy value (we only care about order)
	mutex     *sync.RWMutex
```
The `lru.Cache` package is thread-safe; the `Add` and `Keys` calls are protected by an internal lock. The use of this mutex is therefore purely for snapshot semantics: if `Keys()` is called, the mutex makes sure no other request will progress before the scoring. Might be good to test the effect of this locking and profile the overhead as well.
Only with unit tests, but I did run a benchmark and added a test with multiple threads. The results support removing the mutex, imo. Being stale by 1ms on rare occasions shouldn't significantly break the intent behind this -- cache balancing. Here are the benchmark results:
https://gist.github.com/usize/b936f67655029c05ba701218c1bfe5da
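The kind of comparison being discussed could be sketched like this: a toy structure with an internal lock standing in for `lru.Cache`, measured with and without a redundant outer lock. This is illustrative only, not the benchmark from the gist, and the numbers it prints are machine-dependent.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// store is a stand-in for an already thread-safe cache: every method
// takes its own internal lock, like lru.Cache does.
type store struct {
	mu   sync.Mutex
	keys map[string]struct{}
}

func (s *store) add(k string) {
	s.mu.Lock()
	s.keys[k] = struct{}{}
	s.mu.Unlock()
}

// snapshot copies out the current key set under the internal lock.
func (s *store) snapshot() []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := make([]string, 0, len(s.keys))
	for k := range s.keys {
		out = append(out, k)
	}
	return out
}

func main() {
	s := &store{keys: map[string]struct{}{}}
	var outer sync.RWMutex // the extra, possibly redundant lock

	// Parallel add+snapshot relying only on the internal lock.
	without := testing.Benchmark(func(b *testing.B) {
		b.RunParallel(func(pb *testing.PB) {
			for pb.Next() {
				s.add("pod")
				_ = s.snapshot()
			}
		})
	})
	// The same workload serialized by an outer lock for snapshot semantics.
	with := testing.Benchmark(func(b *testing.B) {
		b.RunParallel(func(pb *testing.PB) {
			for pb.Next() {
				outer.Lock()
				s.add("pod")
				_ = s.snapshot()
				outer.Unlock()
			}
		})
	})
	fmt.Println("without outer lock:", without.NsPerOp(), "ns/op")
	fmt.Println("with outer lock:   ", with.NsPerOp(), "ns/op")
}
```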
@usize looks good overall, thank you for the contribution! I think we're just missing updates to the architecture doc, which contains info about plugins and their configuration - then good to go.
TYSM, I'll follow up with the "phase 2" that the task described, and also file an issue and make a patch for the improvement mentioned here: #357 (comment) 🙏🏻
- `lruSize` (optional): The maximum number of pods to track in the LRU cache. Defaults to 1024.

**Note:** This scorer is designed to work alongside a prefix cache scorer (such as `prefix-cache-scorer` or `precise-prefix-cache-scorer`). If no prefix cache state is available, all requests are treated as cold.
I think following this with a YAML example of what a configuration looks like would be helpful here. Additionally, it should state that when integrating with a prefix-cache scorer, the latter should be defined first in the scheduling profile.
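Something along these lines, as a sketch only -- the field names follow the shape of the repo's other sample configs and may not match the actual schema exactly:

```yaml
# Sketch only: verify against deploy/config/sim-epp-no-hit-lru.yaml.
plugins:
  - type: prefix-cache-scorer    # must be defined first: produces the prefix state
  - type: no-hit-lru-scorer      # reads the prefix state to detect cold requests
    parameters:
      lruSize: 1024              # optional, defaults to 1024
schedulingProfiles:
  - name: default
    plugins:
      - pluginRef: prefix-cache-scorer
        weight: 2
      - pluginRef: no-hit-lru-scorer
        weight: 1
```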
Great work @usize, thank you! /lgtm
@usize can you verify the commits? See this for more info: https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification
When a request results in no cache hit, it opens a new group. New groups are likely to disproportionately grow the cache where they land. To avoid uneven cache growth, a new scorer prefers to send new groups to pods that have least recently handled a request from a new group. Signed-off-by: usize <[email protected]>
If no prefix cache state is available, treat all requests as cold. This will still allow for some benefit in terms of balancing. Signed-off-by: usize <[email protected]>
Instead of inspecting the state of the LRU cache directly, it's possible to verify behavior via the usual scoring API. Signed-off-by: usize <[email protected]>
The LRU Cache's size should be included in its error message. Signed-off-by: usize <[email protected]>
The plugin uses an LRU cache to assign higher scores to pods that have less recently handled a request matching no existing prefixes (and thus opening a new group). This should help even out cache growth, as new groups are likely to add more new blocks to the cache than existing groups.
For simplicity, this implementation relies on state produced by the prefix-cache-scorer plugin to track whether a request hits or misses the cache. Because of that, prefix-cache-scorer is required for this to work properly in production.
Fixes: #346