@shmuelk (Collaborator) commented Jul 1, 2025

This PR:

  • Migrates the llm-d-inference-scheduler to the new GIE text-based configuration capabilities.
  • Removes the code that created the configuration from environment variables.
  • Provides sample configurations based on the old development configurations.
  • Documents how to configure the system using the new text-based configuration capabilities.

This PR completes issue #201

@shmuelk mentioned this pull request Jul 1, 2025
@nirrozenbaum added the do-not-merge/hold label Jul 1, 2025
@shmuelk changed the title from "[WIP] Migrate the llm-d-inference-scheduler's configuration to the new text based configuration" to "Migrate the llm-d-inference-scheduler's configuration to the new text based configuration" Jul 2, 2025
@shmuelk removed the do-not-merge/hold label Jul 2, 2025
hashBlockSize: 5
maxPrefixBlocksToMatch: 256
lruCapacityPerServer: 31250
- name: decodeFilter

ditto, remove the name?

Collaborator Author

Done

apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: EndpointPickerConfig
plugins:
- name: prefixScorer

remove the name and rely on the default value just like the other plugins at the end of this config?

Collaborator Author

Done

mountPath: /etc/epp
volumes:
- name: epp-config
configMap:

How about we place common canned configurations in the container image just so we don't have to create a configmap?

Collaborator Author

As these are for development scripts, I think it's more important to make it easy to use other configurations as the developer desires.

kind: EndpointPickerConfig
plugins:
- type: single-profile
- name: decodeFilter

ditto, this looks redundant

Collaborator Author

Done

The fields in a plugin entry are:
- *name* which is optional, provides a name by which the plugin instance can be referenced. If this field is
omitted, the plugin's type wil be used as its name.<br>

Suggested change
- omitted, the plugin's type wil be used as its name.<br>
+ omitted, the plugin's type will be used as its name.<br>

Collaborator Author

Fixed
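
The defaulting rule in the documentation snippet above (an omitted name falls back to the plugin's type) can be illustrated with a small Python sketch. It is not the scheduler's actual loader: the function name `resolve_plugin_names`, the dict-based entry shape, and the rejection of duplicate names are assumptions made purely for illustration.

```python
def resolve_plugin_names(plugins):
    """Map each plugin entry to its effective name.

    Assumption: an entry is a dict with a required "type" key and an
    optional "name" key, mirroring the fields described in the docs.
    """
    resolved = {}
    for entry in plugins:
        # An omitted (or empty) name defaults to the plugin's type.
        name = entry.get("name") or entry["type"]
        # Assumption: two instances may not share an effective name.
        if name in resolved:
            raise ValueError(f"duplicate plugin name: {name}")
        resolved[name] = entry
    return resolved


plugins = [
    {"type": "prefix-cache", "parameters": {"hashBlockSize": 5}},
    {"type": "single-profile"},
    # A second instance of the same type needs an explicit name.
    {"name": "secondPrefixScorer", "type": "prefix-cache"},
]
resolved = resolve_plugin_names(plugins)
```

Under these assumptions, an explicit name is only needed when the same plugin type is instantiated more than once, which matches the reviewers' point that setting it should be the exception.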

Comment on lines 142 to 143
- name: prefixScorer
  type: prefix-cache

Should we converge on one naming convention? I like Pascal case (e.g., PrefixScorer).

Also, I think the cases where a name needs to be set will be limited, so I recommend not setting it unless required, to avoid giving the impression that it is mandatory (which looks strange and redundant at first glance).


Created a discussion issue in GIE: kubernetes-sigs/gateway-api-inference-extension#1086

Collaborator Author

While there should be a naming convention for plugin types, I don't think it applies as much to plugin names, when they are used. In particular, I made the plugin names (which have since been removed) camel case to show that they differ from the plugin types and the defaulted names.

decode = "decode"
prefill = "prefill"
defaultDecodeProfile = "decode"
defaultPßrefillProfile = "prefill"

typo

Collaborator Author

Fixed.

shmuelk added 2 commits July 3, 2025 10:03
Signed-off-by: Shmuel Kallner <[email protected]>
Signed-off-by: Shmuel Kallner <[email protected]>
specified in-line as a parameter. The configuration defines the set of plugins to be instantiated along with their parameters. Each plugin is also given a name, enabling the same plugin type to be instantiated
multiple times, if needed. Also defined is a set of SchedulingProfiles, which determine the set of
plugins to be used when scheduling a request. The set of plugins instantiated must also include a
Profile Handler, which determines which SchedulingProfiles will be used for a particular request.
@nirrozenbaum (Collaborator) commented Jul 3, 2025

++ and how their results will be processed.

Collaborator Author

Fixed
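
The relationships described in the snippet under review (named plugin instances, SchedulingProfiles that select from them, and a mandatory Profile Handler) can be sketched as a consistency check in Python. This is an illustration only, not the GIE loader: the keys `schedulingProfiles` and `pluginRef`, the function name `check_config`, and the set of handler types passed in are assumptions that do not appear in this thread.

```python
def check_config(config, handler_types):
    """Return a list of problems found in a parsed configuration dict.

    Assumptions: profiles live under "schedulingProfiles" and refer to
    plugins through a "pluginRef" key; handler_types is supplied by the
    caller because this thread does not list the real handler types.
    """
    plugins = config.get("plugins", [])
    # Effective name of each instance: explicit name, else its type.
    names = {p.get("name") or p["type"] for p in plugins}
    problems = []
    # Every plugin a profile references must have been instantiated.
    for profile in config.get("schedulingProfiles", []):
        for ref in profile.get("plugins", []):
            if ref["pluginRef"] not in names:
                problems.append(
                    f"profile {profile['name']!r} references "
                    f"unknown plugin {ref['pluginRef']!r}"
                )
    # At least one instantiated plugin must be a profile handler.
    if not any(p["type"] in handler_types for p in plugins):
        problems.append("no profile handler plugin is instantiated")
    return problems


config = {
    "plugins": [{"type": "prefix-cache"}, {"type": "single-profile"}],
    "schedulingProfiles": [
        {
            "name": "default",
            "plugins": [
                {"pluginRef": "prefix-cache"},
                {"pluginRef": "decodeFilter"},  # never instantiated above
            ],
        }
    ],
}
# Assumption: "single-profile" is treated as a profile handler type here.
problems = check_config(config, handler_types={"single-profile"})
```

In this sketch the dangling `decodeFilter` reference is reported, while the profile handler requirement is satisfied by the `single-profile` entry.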

Comment on lines +118 to +119
field is omitted, the plugin's type will be used as its name.<br>
- *type* specifies the type of the plugin to be instantiated.<br>
Collaborator

do we need <br>?

Collaborator Author

It doesn't format correctly without all of the <br> in there.

Using unordered lists or blank lines makes the sections much longer and harder to read.

Collaborator

Have you tried a double space at the end of the line?
I think this should give you the same behavior, just without HTML tags.

Collaborator Author

I don't like trailing spaces, which are unseen and easily get lost

@nirrozenbaum (Collaborator) previously approved these changes Jul 3, 2025

Overall, the PR looks very good.
I left a few nits, non-blocking.

Verifying: was this tested e2e?

/lgtm
/approve

@shmuelk (Collaborator, Author) commented Jul 3, 2025

Which end to end tests?

I have used make env-dev-kind to run the entire setup with P/D enabled and with P/D not enabled and the system seemed to do the right thing...

Signed-off-by: Shmuel Kallner <[email protected]>
Signed-off-by: Shmuel Kallner <[email protected]>
@shmuelk merged commit 31aff9d into llm-d:main Jul 3, 2025
2 checks passed
@shmuelk deleted the no-code-config branch July 3, 2025 13:07
pierDipi pushed a commit to pierDipi/llm-d-inference-scheduler that referenced this pull request Oct 14, 2025
Signed-off-by: konflux-internal-p02 <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
Co-authored-by: konflux-internal-p02[bot] <170854209+konflux-internal-p02[bot]@users.noreply.github.com>