
Conversation

@yeqcharlotte (Collaborator) commented Jul 22, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting a before/after results comparison, or e2e results.
  • (Optional) Any necessary documentation updates, such as updating supported_models.md and examples for a new model.

Purpose

We are planning a few more changes to the benchmarking scripts. Before doing that, migrate the existing references to benchmark_*.py in documentation, CI, and examples so we don't have to add them in two places. FIX #21206.

Test Plan

sh .buildkite/scripts/run-benchmarks.sh

Rely on CI for the rest.
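The migration replaces each standalone-script invocation with the corresponding vllm bench subcommand. As a rough sketch of the mapping (the helper function below is hypothetical, written only to illustrate the rename; it is not part of the codebase):

```shell
# Hypothetical helper: map a legacy benchmark script name to the
# "vllm bench" subcommand that is assumed to replace it in this PR.
legacy_to_cli() {
  case "$1" in
    benchmark_serving.py)    echo "vllm bench serve" ;;
    benchmark_latency.py)    echo "vllm bench latency" ;;
    benchmark_throughput.py) echo "vllm bench throughput" ;;
    *)                       echo "unknown" ;;
  esac
}

# Example: the old "python3 benchmarks/benchmark_serving.py ..." call
# becomes "vllm bench serve ..." with the same flags.
legacy_to_cli benchmark_serving.py   # prints: vllm bench serve
```

The same flags carry over unchanged; only the entry point differs.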

Test Result

It ran.

(Optional) Documentation Update

@yeqcharlotte yeqcharlotte requested a review from hmellor as a code owner July 22, 2025 07:23
@mergify mergify bot added the documentation, ci/build, performance, and tpu labels Jul 22, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@gemini-code-assist gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request refactors the benchmark invocation by replacing direct script calls (python benchmarks/benchmark_*.py) with the new vllm bench command-line interface. The changes are applied consistently across CI scripts, documentation, and examples. Additionally, the old benchmark scripts are now marked as deprecated, which is a good practice for a smooth transition. The changes improve maintainability by centralizing the benchmark entry points. I've reviewed the changes and found no high or critical issues.

@ywang96 ywang96 self-assigned this Jul 22, 2025

mergify bot commented Jul 23, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @yeqcharlotte.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Collaborator


I would even recommend just nuking this file, or leaving some sort of tombstone; maybe too aggressive 😓

Collaborator Author


We can do that as a next step. Wait for 1 week to verify I don't break CI/CD in any weird ways lol.

Collaborator


Yeah, one more PR is good.

@hmellor hmellor enabled auto-merge (squash) July 24, 2025 11:42
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Jul 24, 2025
  prefix_len=$(( INPUT_LEN * MIN_CACHE_HIT_PCT / 100 ))
  adjusted_input_len=$(( INPUT_LEN - prefix_len ))
- python3 benchmarks/benchmark_serving.py \
+ vllm3 bench serve \
Collaborator


Typo: this should be vllm, not vllm3.

Comment on lines +597 to +600
@deprecated(
"benchmark_serving.py is deprecated and will be removed in a future "
"version. Please use 'vllm bench serve' instead.",
)
Member


@yeqcharlotte you are going to break all of our benchmark scripts 😆

I didn't realize that vllm serve used different code from benchmark_serving.py. One downside of this change is that it will require installing vllm in order to run benchmarks, IIUC.

Collaborator Author

@yeqcharlotte yeqcharlotte Jul 24, 2025


How did you use it? This should just throw a warning at the beginning instead of crashing everything instantly.

For most single-host benchmarks, before running benchmark_serving.py you would probably have already run vllm serve, which requires installing vllm anyway?

If you are benchmarking against remote, then … this client script itself is probably not good enough anyway 😅

Potentially we can split the packaging later so the bench scripts can be installed separately via pip install vllm[bench], which won't install all the dependencies of vllm serve?
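The "warn at the beginning instead of crashing" behavior can be sketched with a minimal decorator. This is a hypothetical stand-in for the @deprecated helper shown in the diff, written here only to illustrate the intended semantics; vLLM's actual implementation may differ:

```python
import warnings
from functools import wraps


def deprecated(message):
    """Minimal sketch: emit a DeprecationWarning when the wrapped
    entry point is called, then run it normally (no crash)."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return fn(*args, **kwargs)
        return wrapper
    return decorator


@deprecated(
    "benchmark_serving.py is deprecated and will be removed in a future "
    "version. Please use 'vllm bench serve' instead."
)
def main():
    # The legacy script body would go here; it still runs after the warning.
    return "ok"
```

Existing automation that shells out to the legacy scripts keeps working; it just sees a DeprecationWarning on stderr.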

Member


I'm personally okay with this change: right now it's a bit weird that we have two versions of the benchmark code, and I want to get rid of one as soon as possible to avoid confusion and maintenance overhead. I think Charlotte made some good points on why we should move them into vllm.

Collaborator


It's very difficult to guard the standalone scripts to prevent them from importing vllm. Also, I was able to install and run the bench commands on a CPU node as well.

Member


@tlrmchlsmth given the discussion, are you ok if we merge this?

Member


It's very difficult to guard the standalone scripts to prevent them from importing vllm. Also, I was able to install and run the bench commands on a CPU node as well.

This is true. Even though benchmark_serving.py tries to support this case, it's broken 😆

If you are benchmarking against remote, then … this client script itself is probably not good enough anyway 😅

@yeqcharlotte the use case I'm thinking of is in-cluster benchmarking from a container that doesn't have the CUDA runtime or GPUs. Do you see any problems with using it in that case?

@hmellor hmellor disabled auto-merge July 24, 2025 16:07
Signed-off-by: Ye (Charlotte) Qi <[email protected]>
Collaborator

@houseroad houseroad left a comment


Looks good. I like this move: a consolidated code path.

@houseroad houseroad enabled auto-merge (squash) July 26, 2025 04:39
@vllm-bot vllm-bot merged commit e7c4f9e into vllm-project:main Jul 26, 2025
45 of 47 checks passed
liuyumoye pushed a commit to liuyumoye/vllm that referenced this pull request Jul 31, 2025
HsChen-sys pushed a commit to HsChen-sys/vllm that referenced this pull request Aug 1, 2025
x22x22 pushed a commit to x22x22/vllm that referenced this pull request Aug 5, 2025
Pradyun92 pushed a commit to Pradyun92/vllm that referenced this pull request Aug 6, 2025
npanpaliya pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Aug 6, 2025
jinzhen-lin pushed a commit to jinzhen-lin/vllm that referenced this pull request Aug 9, 2025
paulpak58 pushed a commit to paulpak58/vllm that referenced this pull request Aug 13, 2025
diegocastanibm pushed a commit to diegocastanibm/vllm that referenced this pull request Aug 15, 2025
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025

Labels

  • ci/build
  • documentation: Improvements or additions to documentation
  • performance: Performance-related issues
  • ready: ONLY add when PR is ready to merge/full CI is needed
  • tpu: Related to Google TPUs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Consolidate benchmark_serving.py and serve.py to avoid code duplication and usage confusions

7 participants