
Conversation

Member

@hmellor hmellor commented Feb 19, 2025

Follow-up to #11967, which removes kv_cache and attn_metadata from all model definitions.

Summary of changes:

  • Remove kv_cache and attn_metadata from the Attention.forward() args
  • Use the forward context instead of the attn_metadata arg in MambaMixer and MambaMixer2 (see the sketch after this list)
  • Remove kv_caches, kv_cache and attn_metadata from the forward() args of all model modules
  • Remove kv_caches and attn_metadata from the new model docs
  • Leave the kv_caches arg (but avoid using it) in all child classes of ModelRunnerBase.execute_model() to avoid further complication
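
For reference, a minimal sketch of what the new access pattern looks like from a model layer's perspective (the layer name and body here are hypothetical, not code from this PR):

```python
import torch
from torch import nn

from vllm.forward_context import get_forward_context


class ToyDecoderLayer(nn.Module):
    """Illustrative layer only; real vLLM models follow the same pattern."""

    def forward(self, positions: torch.Tensor,
                hidden_states: torch.Tensor) -> torch.Tensor:
        # Previously the runner threaded `kv_caches` and `attn_metadata`
        # through every forward() call. After this PR the runner wraps the
        # model call in a forward context, so a layer that needs the metadata
        # reads it from there instead of taking it as an argument.
        attn_metadata = get_forward_context().attn_metadata
        # ... attention would be computed here; each Attention layer's cache
        # now lives on the layer itself (e.g. self.attn.kv_cache), not in a
        # `kv_caches` argument.
        return hidden_states
```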


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the documentation (Improvements or additions to documentation) label Feb 19, 2025
@hmellor hmellor added the ready (ONLY add when PR is ready to merge/full CI is needed) label Feb 19, 2025
@hmellor hmellor marked this pull request as ready for review February 19, 2025 13:35
@DarkLight1337
Member

Can you fix the failure in helm chart CI?

Signed-off-by: Harry Mellor <[email protected]>
Signed-off-by: Harry Mellor <[email protected]>
@hmellor
Member Author

hmellor commented Feb 19, 2025

Can you fix the failure in helm chart CI?

This failure is likely due to the CPU runner not being updated correctly. If I can fix the CPU model runner in buildkite then this workflow should also pass.

@hmellor
Member Author

hmellor commented Feb 21, 2025

The above commit should stop errors coming from Attention.forward.

I'm not sure what to do about CustomModelForCausalLM.forward. I could add some code that inspects the function signature in all the model runners and passes None for kv_caches and attn_metadata if needed?
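
A rough sketch of that signature-inspection idea (purely illustrative; the helper name is made up, and this is not necessarily what the PR ends up doing):

```python
import inspect
from typing import Any


def legacy_kwargs_for(model: Any) -> dict[str, Any]:
    """Return {'kv_caches': None, 'attn_metadata': None} only for the names
    that the model's forward() still declares, so out-of-tree models that
    haven't migrated yet keep working."""
    params = inspect.signature(model.forward).parameters
    return {name: None
            for name in ("kv_caches", "attn_metadata")
            if name in params}


# Hypothetical use inside a model runner:
#   hidden_states = model(input_ids, positions, **legacy_kwargs_for(model))
```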

@youkaichao
Member

@youkaichao will this break out of tree models?

I think it's fine to ask out-of-tree models to update to new vLLM versions. We cannot let out-of-tree models slow down development. In fact, we could have removed this code several months ago, and I think we have waited long enough.

Member

@youkaichao youkaichao left a comment


agree on the direction, thanks for cleaning up the code!

This reverts commit 5d84b99.

Signed-off-by: Harry Mellor <[email protected]>
@hmellor
Member Author

hmellor commented Feb 22, 2025

Reverting the deprecation of kv_cache and attn_metadata from Attention.forward because it was only a half measure that does not fully solve the problem of breaking out-of-tree models.

This way, it is unambiguous that we do not expect these arguments in Model.forward or Attention.forward.

Collaborator

@heheda12345 heheda12345 left a comment


I think it won't be difficult for out-of-tree models to remove these kwargs. One thing that is already broken is models with KV cache sharing (multiple attention layers using the same KV cache). It exists in open-source models but not in the vLLM repo. We need to design the interface for it once we get the first PR for a KV-cache-sharing model.

I think we need to tell users how to debug KV-cache-related problems. These two variables are accessed frequently when debugging models.

) -> torch.Tensor:
# NOTE: please avoid accessing `kv_cache` and `attn_metadata` arguments
# directly, use `self.kv_cache` and
# `get_forward_context().attn_metadata` instead.
Collaborator


Not necessary in this PR, but can we update these notes to help people print kv_cache & attn_metadata when debugging KV-cache-related problems?
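
For anyone debugging after this change, a possible pattern along those lines (illustrative only; the helper is not part of vLLM, and it assumes it is called while a forward pass is in flight):

```python
from vllm.forward_context import get_forward_context


def dump_attn_state(attn_layer) -> None:
    """Print the current attention metadata and the per-layer KV cache.

    `attn_layer` is assumed to be a vllm.attention.Attention instance; its
    cache tensors live on `attn_layer.kv_cache` (one entry per virtual
    engine) now that the `kv_cache` argument is gone.
    """
    ctx = get_forward_context()  # only valid inside a model forward pass
    print("attn_metadata:", ctx.attn_metadata)
    print("kv_cache shapes:",
          [getattr(kv, "shape", None) for kv in attn_layer.kv_cache])
```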

@simon-mo simon-mo merged commit cdc1fa1 into vllm-project:main Feb 25, 2025
52 of 54 checks passed
@hmellor hmellor deleted the remove-unused-attn-args branch February 25, 2025 09:20
wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request Feb 25, 2025
### What this PR does / why we need it?
The arg list of `Attention.forward()` is changed by
vllm-project/vllm#13555.
The unused args `kv_caches` and `attn_metadata` are removed.

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with existing test.

Signed-off-by: MengqingCao <[email protected]>
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
ttanzhiqiang pushed a commit to ttanzhiqiang/vllm-ascend that referenced this pull request Apr 27, 2025
shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025

Labels

documentation (Improvements or additions to documentation), ready (ONLY add when PR is ready to merge/full CI is needed), speculative-decoding, v1


6 participants