
Conversation

Contributor

@usize usize commented Sep 22, 2025

The plugin uses an LRU cache to assign higher scores to pods that have least recently handled a request matching no existing prefixes (i.e., a new group). This should help even out cache growth, since new groups are likely to add more new blocks to the cache than existing groups.

For simplicity, this implementation relies on state produced by the prefix-cache-scorer plugin in order to track whether a request hits or misses the cache. Because of that, prefix-cache-scorer is required for this to work properly in production.
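
For illustration, a minimal sketch of the scoring idea, assuming the hashicorp/golang-lru v2 package (names and the exact normalization are illustrative, not the plugin's actual code):

```go
package main

import (
	"fmt"

	lru "github.com/hashicorp/golang-lru/v2"
)

// scorePods ranks pods by how recently they handled a cache-miss request.
// Keys() returns entries from least to most recently used, so earlier
// positions (and pods never seen at all) score higher.
func scorePods(cache *lru.Cache[string, struct{}], pods []string) map[string]float64 {
	keys := cache.Keys()
	rank := make(map[string]int, len(keys))
	for i, k := range keys {
		rank[k] = i
	}
	scores := make(map[string]float64, len(pods))
	for _, pod := range pods {
		if i, ok := rank[pod]; ok {
			// Most recently used pods approach 0.0.
			scores[pod] = 1.0 - float64(i+1)/float64(len(keys)+1)
		} else {
			scores[pod] = 1.0 // never handled a cache miss: highest score
		}
	}
	return scores
}

func main() {
	cache, _ := lru.New[string, struct{}](1024)
	cache.Add("pod-a", struct{}{}) // pod-a handled a cache miss first...
	cache.Add("pod-b", struct{}{}) // ...then pod-b, so pod-b is most recent
	fmt.Println(scorePods(cache, []string{"pod-a", "pod-b", "pod-c"}))
}
```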

Fixes: #346

@usize usize force-pushed the lru-scorer branch 3 times, most recently from bc08bee to 5ee44e1 Compare September 23, 2025 15:44
@vMaroon vMaroon requested a review from Copilot September 26, 2025 22:38

@Copilot Copilot AI left a comment


Pull Request Overview

This PR adds a new scoring plugin called "no-hit-lru-scorer" that helps distribute new groups evenly across pods by tracking which pods have least recently handled cache-miss requests. The plugin uses an LRU cache to give higher scores to pods that haven't recently served new groups, helping to balance cache growth.

Key changes:

  • Implements NoHitLRU scorer that tracks pod usage for cache-miss requests
  • Adds comprehensive test coverage for the new plugin functionality
  • Integrates the plugin into the registration system with sample configuration

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

  • pkg/plugins/scorer/no_hit_lru.go: Core implementation of the NoHitLRU scorer plugin
  • pkg/plugins/scorer/no_hit_lru_test.go: Comprehensive test suite covering functionality and edge cases
  • pkg/plugins/register.go: Registers the new plugin in the plugin system
  • deploy/config/sim-epp-no-hit-lru.yaml: Sample configuration demonstrating plugin usage


@usize usize force-pushed the lru-scorer branch 2 times, most recently from 1a56bda to 1e04f9d Compare September 26, 2025 23:10
@usize usize requested a review from vMaroon September 26, 2025 23:15
Member

@vMaroon vMaroon left a comment


Overall looks good - a few comments.

// - LRU ordering is with respect to when a pod last received a cold request.
// - Least recently used (or never used) pods get highest score (1.0)
// - Most recently used pods get lowest score (approaching 0.0)
func (s *NoHitLRU) Score(ctx context.Context, cycleState *types.CycleState, request *types.LLMRequest, pods []types.Pod) map[types.Pod]float64 {
Member


I think it might help readability if the main logical blocks in this call are separated to different functions.
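
For example, something along these lines (the helper names are hypothetical, not from the PR):

```go
// Hypothetical decomposition of Score into its main logical blocks;
// the signature mirrors the excerpt above, the helpers are illustrative.
func (s *NoHitLRU) Score(ctx context.Context, cycleState *types.CycleState, request *types.LLMRequest, pods []types.Pod) map[types.Pod]float64 {
	if !s.isColdRequest(cycleState) {
		return s.neutralScores(pods) // cache hit: nothing to rebalance
	}
	return s.scoreByRecency(pods) // cache miss: favor least recently used pods
}
```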


// Read prefix cache state to determine if this is a cold request
// This is treated as an optimization - if the state isn't available, we assume cold request
prefixState, err := types.ReadCycleStateKey[*prefix.SchedulingContextState](cycleState, plugins.StateKey(s.prefixPluginName))
Member


Not a blocker but would be great: do you think you can extend and support the same optimization for the precise-prefix-cache-scorer too? I personally am eager to use this scorer with the latter.

It is also possible that this optimization can be achieved with unequal weighting (e.g., a 2:1 ratio between the prefix-cache scorers and this one), but that could also be fragile in some cases.

Contributor Author


Great idea. Maybe I can add this in a follow-up PR? I'm a bit worried about mixing concerns.

Member


Sounds good.

type NoHitLRU struct {
typedName plugins.TypedName
lruCache *lru.Cache[string, struct{}] // pod name -> dummy value (we only care about order)
mutex *sync.RWMutex
Member


The lru.Cache package is thread-safe; the Add and Keys calls are protected by an internal lock. The use of this mutex is therefore purely for snapshot semantics: if Keys() is called, the mutex makes sure no other request will progress before the scoring. It might be good to test the effect of this locking and profile the overhead as well.

Contributor Author


Only with unit tests, but I did do a benchmark and added a test with multiple threads. The results support removing the mutex, imo. Being stale by 1ms on rare occasions shouldn't significantly break the intent behind this -- cache balancing. Here are the benchmark results:

https://gist.github.com/usize/b936f67655029c05ba701218c1bfe5da
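
For reference, a minimal sketch of the kind of parallel benchmark involved, assuming the hashicorp/golang-lru v2 package (illustrative, not the gist's actual code):

```go
package scorer

import (
	"sync"
	"testing"

	lru "github.com/hashicorp/golang-lru/v2"
)

// Compares Keys() on the already thread-safe LRU with and without an
// external RWMutex providing snapshot semantics.
func BenchmarkKeysNoMutex(b *testing.B) {
	cache, _ := lru.New[string, struct{}](1024)
	cache.Add("pod-a", struct{}{})
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			_ = cache.Keys()
		}
	})
}

func BenchmarkKeysWithMutex(b *testing.B) {
	cache, _ := lru.New[string, struct{}](1024)
	cache.Add("pod-a", struct{}{})
	var mu sync.RWMutex
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			mu.RLock()
			_ = cache.Keys()
			mu.RUnlock()
		}
	})
}
```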

@usize usize requested a review from vMaroon October 2, 2025 18:24
Member

vMaroon commented Oct 6, 2025

@usize looks good overall, thank you for the contribution! I think we're just missing updates to the architecture doc, which contains info about plugins and their configuration - then good to go.

Contributor Author

usize commented Oct 6, 2025

@vMaroon

TYSM, I'll follow up with the "phase 2" that the task described, and also file an issue and make a patch for the improvement mentioned here: #357 (comment) 🙏🏻

- `lruSize` (optional): The maximum number of pods to track in the LRU cache. Defaults to 1024.

**Note:** This scorer is designed to work alongside a prefix cache scorer (such as `prefix-cache-scorer` or
`precise-prefix-cache-scorer`). If no prefix cache state is available, all requests are treated as cold.
Member


I think following with a YAML example of what a configuration looks like would be helpful here. Additionally, it should state that when integrating with a prefix-cache scorer, the latter should be defined first in the scheduling profile.
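
For example, something along these lines (plugin and field names here are an illustrative assumption, not verified against the repo's sample config):

```yaml
# Hypothetical EPP configuration sketch: the prefix-cache scorer is listed
# before no-hit-lru-scorer so its cycle state is available during scoring.
plugins:
  - type: prefix-cache-scorer
  - type: no-hit-lru-scorer
    parameters:
      lruSize: 1024
schedulingProfiles:
  - name: default
    plugins:
      - pluginRef: prefix-cache-scorer
        weight: 1
      - pluginRef: no-hit-lru-scorer
        weight: 1
```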

Member

vMaroon commented Oct 8, 2025

Great work @usize, thank you!

/lgtm
/approve

@github-actions github-actions bot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 8, 2025
Member

vMaroon commented Oct 8, 2025

usize and others added 9 commits October 8, 2025 10:23
When a request results in no cache hit, it opens a new group.
New groups are likely to disproportionately grow the cache where they land.
To avoid uneven cache growth, a new scorer prefers to send new groups to pods
that have least recently handled a request from a new group.

Signed-off-by: usize <[email protected]>
If no prefix cache state is available, treat all requests as cold.
This will still allow for some benefit in terms of balancing.

Signed-off-by: usize <[email protected]>
Instead of inspecting the state of the LRU cache directly, it's
possible to verify behavior via the usual scoring API.

Signed-off-by: usize <[email protected]>
The LRU Cache's size should be included in its error message.

Signed-off-by: usize <[email protected]>
usize and others added 8 commits October 8, 2025 10:23
Signed-off-by: Kellen Swain <[email protected]>
Signed-off-by: usize <[email protected]>
* bump llm-d-kv-cache-manager version (v0.3.2-rc1)

Signed-off-by: Maroon Ayoub <[email protected]>

* bump kvc mgr version

Signed-off-by: Maroon Ayoub <[email protected]>

---------

Signed-off-by: Maroon Ayoub <[email protected]>
Signed-off-by: Maroon Ayoub <[email protected]>
Signed-off-by: usize <[email protected]>
…llm-d#368)

Bumps the go-dependencies group with 1 update: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo).


Updates `github.com/onsi/ginkgo/v2` from 2.25.3 to 2.26.0
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.25.3...v2.26.0)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.26.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
@vMaroon vMaroon enabled auto-merge (squash) October 8, 2025 22:45
@vMaroon vMaroon merged commit 12aa104 into llm-d:main Oct 8, 2025
6 checks passed
@usize usize deleted the lru-scorer branch October 8, 2025 23:03



Development

Successfully merging this pull request may close these issues.

[feature] LRU Pod Scheduling
