
Conversation

@ywang96 ywang96 commented Sep 18, 2025

Purpose

For some reason we have two flags to specify the backend/endpoint type, and this can cause confusion since we're going to deprecate the benchmark scripts. This PR cleans them up.

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Roger Wang <[email protected]>
@mergify mergify bot added the performance Performance-related issues label Sep 18, 2025
@gemini-code-assist gemini-code-assist bot left a comment
Code Review

This pull request aims to clean up redundant CLI flags by deprecating --backend in favor of --endpoint-type. The changes are in the right direction, but I've found a critical issue that will cause the script to crash, and another issue related to the deprecation logic that could lead to silent misconfigurations. Please see my detailed comments for suggestions on how to fix these.

ywang96 commented Sep 18, 2025

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request aims to clean up CLI flags in vllm bench serve by deprecating --backend in favor of --endpoint-type. The changes are in the right direction, replacing usages of the old flag and adding deprecation warnings. However, I've found a critical bug in the deprecation logic that causes the user-provided --endpoint-type to be incorrectly ignored. My review includes a comment with a detailed explanation and a code suggestion to fix this issue, ensuring that flag precedence is handled correctly while maintaining backward compatibility for the default behavior.

Signed-off-by: Roger Wang <[email protected]>
ywang96 commented Sep 18, 2025

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request does a good job of cleaning up the command-line flags for vllm bench serve by deprecating --backend in favor of --endpoint-type. The changes are consistent and improve clarity.

I have one suggestion to make the deprecation handling more robust. Currently, if both flags are provided, --backend unconditionally overrides --endpoint-type, which could be surprising. I've proposed a change to raise an error in case of conflicting values to avoid ambiguity and prevent users from running benchmarks against an unintended endpoint.
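The precedence rule the review asks for can be sketched as below. This is a hypothetical illustration, not the PR's actual code (and note the PR later flipped which flag is kept): the deprecated flag wins only when the two flags don't conflict, and conflicting values raise an error instead of silently overriding.

```python
import argparse
import warnings

def resolve_endpoint_type(args: argparse.Namespace) -> str:
    """Resolve a deprecated alias flag against its replacement (sketch)."""
    if args.backend is not None:
        warnings.warn(
            "'--backend' is deprecated; use '--endpoint-type' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Raise on conflicting values instead of silently overriding,
        # so users never benchmark an unintended endpoint.
        if args.endpoint_type is not None and args.endpoint_type != args.backend:
            raise ValueError(
                f"Conflicting values: --backend={args.backend!r} vs "
                f"--endpoint-type={args.endpoint_type!r}"
            )
        return args.backend
    # Fall back to a default when neither flag was given ("openai" is a
    # stand-in default for this sketch).
    return args.endpoint_type if args.endpoint_type is not None else "openai"


parser = argparse.ArgumentParser()
parser.add_argument("--backend", default=None)
parser.add_argument("--endpoint-type", dest="endpoint_type", default=None)
print(resolve_endpoint_type(parser.parse_args(["--backend", "vllm"])))  # vllm
```

With this shape, `--backend x --endpoint-type x` still works (same value, no ambiguity), while mismatched values fail fast.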

@ywang96 ywang96 added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 18, 2025
@ywang96 ywang96 enabled auto-merge (squash) September 18, 2025 05:56
@ywang96 ywang96 requested a review from hmellor as a code owner September 18, 2025 06:27
@mergify mergify bot added the documentation Improvements or additions to documentation label Sep 18, 2025
ywang96 commented Sep 18, 2025

On second thought, it seems that the --backend flag is used more often throughout vLLM, so I'll keep that one instead.

Signed-off-by: Roger Wang <[email protected]>
  current_dt = datetime.now().strftime("%Y%m%d-%H%M%S")
  result_json["date"] = current_dt
- result_json["endpoint_type"] = args.endpoint_type
+ result_json["endpoint_type"] = args.backend
@ywang96 ywang96 Sep 18, 2025
The "endpoint_type" key here is not modified on purpose.

Collaborator
Maybe you want to dump it to "backend" too. I guess this result is only used by @huydhn?

Member Author
Yeah, I'm not sure if this will break PyTorch's dashboard, but let me also dump it to "backend" too.
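The compatibility move discussed here — keep the legacy JSON key so existing dashboard consumers don't break, and write the new key alongside it — can be sketched like this (hypothetical stand-in values; not the PR's actual code):

```python
import json
from datetime import datetime

# Sketch: emit both the legacy "endpoint_type" key (for existing consumers
# such as the PyTorch benchmark dashboard) and the new "backend" key.
backend = "openai-chat"  # stand-in for args.backend

result_json = {}
result_json["date"] = datetime.now().strftime("%Y%m%d-%H%M%S")
result_json["endpoint_type"] = backend  # legacy key, kept for compatibility
result_json["backend"] = backend        # new key
print(json.dumps(result_json))
```

Writing both keys lets downstream readers migrate to "backend" on their own schedule before the legacy key is dropped.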

ywang96 commented Sep 18, 2025

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request cleans up the command-line flags for vllm bench serve by deprecating --endpoint-type in favor of --backend. The changes are applied across documentation, tests, and the benchmark implementation. My review identifies a critical issue in the deprecation logic that could cause user-provided --endpoint-type values to be ignored. I've also suggested a refactoring to improve code clarity by renaming a variable to align with its new purpose.

type=str,
default=None,
choices=list(ASYNC_REQUEST_FUNCS.keys()),
help="'--endpoint-type' is deprecated and will be removed in v0.11.0. "
@yeqcharlotte yeqcharlotte Sep 18, 2025
nit: you could throw a warning with a customized action.
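The "customized action" idea can be sketched as an argparse `Action` subclass that warns only when the deprecated flag is actually passed (hypothetical sketch, not the code the PR ended up with):

```python
import argparse
import warnings

class DeprecatedStoreAction(argparse.Action):
    """Store action that emits a DeprecationWarning on use of the flag."""

    def __call__(self, parser, namespace, values, option_string=None):
        warnings.warn(
            f"{option_string} is deprecated and will be removed in a "
            "future release; use '--backend' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        setattr(namespace, self.dest, values)


parser = argparse.ArgumentParser()
parser.add_argument("--backend", default="openai")
parser.add_argument(
    "--endpoint-type",
    dest="endpoint_type",
    default=None,
    action=DeprecatedStoreAction,
)
args = parser.parse_args(["--endpoint-type", "openai-chat"])
print(args.endpoint_type)  # openai-chat
```

The advantage over putting the notice in `help=` is that users who never touch the deprecated flag see nothing. (Python 3.13 later added a built-in `deprecated=True` parameter to `add_argument` for this exact pattern.)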

@ywang96 ywang96 Sep 18, 2025
See if you're okay with the current version

@yeqcharlotte yeqcharlotte left a comment
Other than the warning, LGTM.

@yeqcharlotte yeqcharlotte self-assigned this Sep 18, 2025
Signed-off-by: Roger Wang <[email protected]>
@ywang96 ywang96 merged commit 21da733 into vllm-project:main Sep 18, 2025
41 checks passed
debroy-rh pushed a commit to debroy-rh/vllm that referenced this pull request Sep 19, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
sducouedic pushed a commit to sducouedic/vllm that referenced this pull request Oct 16, 2025
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025

Labels

documentation Improvements or additions to documentation performance Performance-related issues ready ONLY add when PR is ready to merge/full CI is needed


3 participants