[serve.llm] Prefix-aware scheduler [2/N] Configure PrefixAwareReplicaScheduler as default scheduler in LLMServer #52725
Signed-off-by: Gene Su <[email protected]>
…xt directly Signed-off-by: Gene Su <[email protected]>
Signed-off-by: Justin Ji <[email protected]>
…eploymentConfig currently not working Signed-off-by: Justin Ji <[email protected]>
python/ray/llm/_internal/serve/deployments/routers/prefix_tree_deployment.py
python/ray/serve/_private/replica_scheduler/llm_pow_2_scheduler.py
python/ray/serve/_private/replica_scheduler/old_prefix_aware_scheduler.py
python/ray/serve/_private/replica_scheduler/prefix_aware_scheduler.py
python/ray/serve/_private/replica_scheduler/prefix_aware_scheduler.py
python/ray/llm/_internal/serve/deployments/routers/prefix_tree_deployment.py
This was also left over from our discussion. For v0 we need an interface plus example code like this (it doesn't have to work with the YAML build pattern):

from ray import serve
from ray.serve.llm import LLMConfig, LLMServer, LLMRouter
from ray.serve.router import PrefixTreeDeployment
from ray.serve.replica_scheduler import PrefixAwareReplicaScheduler

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="qwen-0.5b",
        model_source="Qwen/Qwen2.5-0.5B-Instruct",
    ),
    deployment_config=dict(
        autoscaling_config=dict(
            min_replicas=1, max_replicas=2,
        )
    ),
    accelerator_type="A10G",
)

tree_deployment = PrefixTreeDeployment.bind()
# TODO: Somehow make tree_deployment appear when you do `serve.get_deployment_handle("xyz")`.

# Deploy the application
deployment = LLMServer.as_deployment(llm_config.get_serve_options(name_prefix="vLLM:")).bind(llm_config)
deployment = deployment.options(replica_scheduler_class=PrefixAwareReplicaScheduler)
llm_app = LLMRouter.as_deployment().bind(llm_deployments=[deployment], tree_deployment=tree_deployment)
serve.run(llm_app, blocking=True)
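For readers new to the idea, here is a toy sketch of what prefix-aware replica selection means: send a request to the replica that has already seen the longest matching prompt prefix, and otherwise fall back to the least-loaded replica. Nothing here is taken from the PR. The function names, the `min_match_ratio` threshold, the least-loaded fallback, and the plain dictionaries are all assumptions for illustration. The PR keeps prefixes in a prefix tree (see `prefix_tree.py`), whereas this sketch scans linearly.

```python
import os
from typing import Dict, List, Optional


def common_prefix_len(a: str, b: str) -> int:
    """Length of the shared leading substring of a and b."""
    return len(os.path.commonprefix([a, b]))


def choose_replica(
    prompt: str,
    seen_prompts: Dict[str, List[str]],  # hypothetical: replica id -> prompts it served
    queue_len: Dict[str, int],           # hypothetical: replica id -> pending requests
    min_match_ratio: float = 0.5,        # assumed threshold, not from the PR
) -> str:
    """Pick the replica with the best prefix match, else the least loaded one."""
    best_replica: Optional[str] = None
    best_len = 0
    for replica, prompts in seen_prompts.items():
        for old in prompts:
            n = common_prefix_len(prompt, old)
            if n > best_len:
                best_replica, best_len = replica, n

    # Use the prefix match only when it covers enough of the prompt.
    if best_replica is not None and prompt and best_len / len(prompt) >= min_match_ratio:
        return best_replica

    # Fallback (an assumption): the replica with the shortest queue.
    return min(queue_len, key=queue_len.get)
```

A request that shares most of its prompt with something a replica has already served goes to that replica even if it is busier. An unrelated prompt goes wherever the queue is shortest.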
…the deployment config Signed-off-by: Gene Su <[email protected]>
Signed-off-by: Seiji Eicher <[email protected]>
…ca-scheduler-benchmarks Signed-off-by: Seiji Eicher <[email protected]>
… in LLM Signed-off-by: Seiji Eicher <[email protected]>
Benchmark scripts moved to https://github.com/anyscale/serve-llm-replica-scheduler-benchmarks
Changing from the default prefix-aware request router looks something like this:
Just one major comment: don't make this request router the default. The rest can merge as is, and we can come back to it in later iterations.
python/ray/llm/_internal/serve/request_router/prefix_aware/prefix_tree.py
python/ray/llm/_internal/serve/request_router/prefix_aware/prefix_tree.py
            if count == min_count
        ]

    def start_eviction_loop(
This should run in a background thread instead. Eviction should not keep the event loop busy.
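One way to read this suggestion is the minimal sketch below, which runs periodic eviction on a daemon thread so the asyncio event loop never waits on it. This is not the PR's implementation. `TinyPrefixCache`, `evict_lru`, `stop_eviction_loop`, and the `interval_s` parameter are hypothetical names. Only the name `start_eviction_loop` comes from the diff, and its real signature is not shown there.

```python
import threading
import time
from collections import OrderedDict
from typing import Optional


class TinyPrefixCache:
    """Hypothetical stand-in for the prefix tree; not the PR's class."""

    def __init__(self, max_entries: int) -> None:
        self._max_entries = max_entries
        self._entries: "OrderedDict[str, float]" = OrderedDict()
        self._lock = threading.Lock()  # guards _entries across threads
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def insert(self, prefix: str) -> None:
        with self._lock:
            self._entries[prefix] = time.monotonic()
            self._entries.move_to_end(prefix)  # mark as most recently used

    def size(self) -> int:
        with self._lock:
            return len(self._entries)

    def evict_lru(self) -> int:
        """Drop least recently used entries until the cache is within its cap."""
        evicted = 0
        with self._lock:
            while len(self._entries) > self._max_entries:
                self._entries.popitem(last=False)
                evicted += 1
        return evicted

    def start_eviction_loop(self, interval_s: float) -> None:
        """Run eviction on a daemon thread rather than on the event loop."""
        if self._thread is not None:
            return  # already running
        self._stop.clear()

        def _run() -> None:
            # Event.wait returns True once stop is requested, which ends the loop.
            while not self._stop.wait(interval_s):
                self.evict_lru()

        self._thread = threading.Thread(
            target=_run, name="prefix-eviction", daemon=True
        )
        self._thread.start()

    def stop_eviction_loop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
            self._thread = None
```

The lock is the cost of this design: request-path inserts and the eviction thread touch the same structure, so both go through `self._lock`, and eviction passes should stay short.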




WIP