Collaborator

@Potabk Potabk commented Feb 17, 2025

What this PR does / why we need it?

This PR adds benchmark scripts for NPU so that developers can easily run performance tests on their own machines with one line of code.

Does this PR introduce any user-facing change?

How was this patch tested?

@Potabk Potabk changed the title [Misc]dd benchmark scripts [Misc]Add benchmark scripts Feb 17, 2025
@Potabk Potabk changed the title [Misc]Add benchmark scripts [Misc][WIP]Add benchmark scripts Feb 17, 2025
@Potabk Potabk force-pushed the benchmarks branch 2 times, most recently from 759243e to 1493751 Compare February 26, 2025 07:40
@Potabk Potabk changed the title [Misc][WIP]Add benchmark scripts [Doc]Add benchmark scripts Feb 26, 2025
Yikun previously requested changes Feb 27, 2025
Collaborator

@Yikun Yikun left a comment

I have only reviewed tutorials.md and benchmark_latency.py so far.

The open question is whether we should copy the vLLM benchmark scripts here or just use them directly.

INFO 02-19 17:37:35 metrics.py:453] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 1.9 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
```

## Performance Benchmark
Collaborator

This should go in the developer guide.

@@ -0,0 +1,193 @@
# SPDX-License-Identifier: Apache-2.0
Collaborator

#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
# This file is a part of the vllm-ascend project.
# Adapted from vllm-project/vllm/benchmarks/backend_request_func.py
# Copyright 2023 The vLLM team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

**kwargs,
)

ASYNC_REQUEST_FUNCS = {
Collaborator

Please add a note describing the differences from vLLM.

@@ -0,0 +1,152 @@
# SPDX-License-Identifier: Apache-2.0
Collaborator

same

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Mar 21, 2025
Collaborator Author

Potabk commented Mar 21, 2025

Please note that we copied the benchmark code from the vLLM benchmarks. This is only a stopgap measure. In the future, we will push the community to complete the benchmark CLI, and then we will run benchmarks like this:

vllm bench <options> --parameters ...

For details, see *13993.

### Introduction
This document outlines the benchmarking process for vllm-ascend, designed to evaluate its performance under various workloads. The primary goal is to help developers assess whether their pull requests improve or degrade vllm-ascend's performance. To maintain consistency with the vLLM community, we reuse the community's [benchmark](https://github.com/vllm-project/vllm/tree/main/benchmarks) scripts.
### Overview
**Benchmarking Coverage**: We measure latency, throughput, and fixed-QPS serving on the Atlas800I A2 (see [quick_start](./quick_start.md) for the list of supported devices), with different models (coming soon).
Collaborator

The link to `./quick_start.md` returns a 404.

Collaborator Author

Fixed in 98cfc5d.
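For readers unfamiliar with the three measurements named in the Overview hunk above (latency, throughput, fixed-QPS serving), here is a minimal, standard-library-only sketch of how such numbers are commonly derived. It is not taken from the scripts in this PR: the function names `summarize_latencies`, `throughput`, and `fixed_qps_send_times` are hypothetical, and modelling fixed-QPS traffic as Poisson arrivals is an assumption, not something this thread confirms.

```python
import math
import random
import statistics


def summarize_latencies(latencies_s):
    """Return mean, median (p50) and p99 latency, in seconds."""
    ordered = sorted(latencies_s)

    def percentile(p):
        # Nearest-rank percentile: the smallest sample such that at
        # least p percent of all samples are at or below it.
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]

    return {
        "mean": statistics.fmean(ordered),
        "p50": percentile(50),
        "p99": percentile(99),
    }


def throughput(total_tokens, elapsed_s):
    """Tokens processed per second over the whole run."""
    return total_tokens / elapsed_s


def fixed_qps_send_times(num_requests, qps, seed=0):
    """Send-time offsets (seconds) for a target request rate.

    ASSUMPTION: arrivals follow a Poisson process, so the gaps between
    requests are exponentially distributed with mean 1/qps.
    """
    rng = random.Random(seed)
    offset = 0.0
    offsets = []
    for _ in range(num_requests):
        offsets.append(offset)
        offset += rng.expovariate(qps)
    return offsets


if __name__ == "__main__":
    print(summarize_latencies([0.1, 0.2, 0.3, 0.4]))
    print(throughput(total_tokens=1000, elapsed_s=4.0))
    print(fixed_qps_send_times(num_requests=5, qps=2.0))
```

A load generator would sleep until each offset before issuing the next request, record the per-request latency, and pass the collected values to `summarize_latencies`.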

@@ -0,0 +1,54 @@
### Introduction
Collaborator

Use `#` here instead of `###`.

Collaborator Author

Fixed in 98cfc5d.

Potabk added 7 commits March 21, 2025 15:46
Signed-off-by: wangli <[email protected]>
@wangxiyuan wangxiyuan merged commit 9a175ca into vllm-project:main Mar 21, 2025
5 checks passed
@Potabk Potabk deleted the benchmarks branch April 1, 2025 08:55
ttanzhiqiang pushed a commit to ttanzhiqiang/vllm-ascend that referenced this pull request Apr 27, 2025
### What this PR does / why we need it?
This PR adds benchmark scripts for NPU so that developers can easily run
performance tests on their own machines with one line of code.


---------

Signed-off-by: wangli <[email protected]>
Skywalker-EP pushed a commit to Skywalker-EP/vllm-ascend that referenced this pull request Jul 24, 2025
running time reduction forward_before and forward_end
offline893 pushed a commit to offline893/vllm-ascend that referenced this pull request Sep 9, 2025
running time reduction forward_before and forward_end