Remove unused kwargs from model definitions #13555
Conversation
Can you fix the failure in the helm chart CI?
This failure is likely due to the CPU runner not being updated correctly. If I can fix the CPU model runner in Buildkite, then this workflow should also pass.
The above commit should stop the errors. I'm not sure what to do about out-of-tree models, though.
I think it's fine to ask out-of-tree models to update to new vLLM versions. We cannot let out-of-tree models slow down development. In fact, we could have removed this code several months ago, and I think we have waited long enough.
agree on the direction, thanks for cleaning up the code!
This reverts commit 5d84b99. Signed-off-by: Harry Mellor <[email protected]>
Reverting the deprecation of these arguments. This way, it is unambiguous that we do not expect these arguments in `forward()`.
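For context, the reverted deprecation approach would have looked something like the sketch below: keep the old kwargs, warn if a caller still passes them, and ignore them. This is a hypothetical reconstruction, not the actual reverted code; the PR instead drops the arguments from the signature entirely, so stale call sites fail immediately.

```python
# Hypothetical deprecation shim (the approach that was reverted). The class
# and placeholder computation are illustrative; only the kwarg handling is
# the point.
import warnings
from typing import Any, Optional

import torch
from torch import nn


class ShimmedAttention(nn.Module):
    def forward(
        self,
        query: torch.Tensor,
        key: torch.Tensor,
        value: torch.Tensor,
        kv_cache: Optional[torch.Tensor] = None,
        attn_metadata: Optional[Any] = None,
    ) -> torch.Tensor:
        if kv_cache is not None or attn_metadata is not None:
            warnings.warn(
                "kv_cache and attn_metadata are unused; remove them from "
                "your call sites.",
                DeprecationWarning,
                stacklevel=2,
            )
        return query  # placeholder: real attention computation goes here
```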
I think it won't be difficult for out-of-tree models to remove these kwargs. One thing that is already broken is models with kv cache sharing (multiple attention layers using the same kv cache). It exists in open-source models but not in the vLLM repo. We need to design the interface for it once we get the first PR for a kv cache sharing model.
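To illustrate what "kv cache sharing" means here — a toy sketch only, not a vLLM interface (which, per the comment above, does not exist yet):

```python
# Toy illustration of kv cache sharing: two attention layers backed by the
# same cache storage. Shapes and names are made up for the example.
import torch
from torch import nn


class CachedLayer(nn.Module):
    def __init__(self, shared_kv_cache: torch.Tensor) -> None:
        super().__init__()
        self.kv_cache = shared_kv_cache  # reference to shared storage, not a copy


shared = torch.zeros(2, 16, 8, 64)  # (k/v, num_blocks, num_heads, head_dim)
layer_a = CachedLayer(shared)
layer_b = CachedLayer(shared)
assert layer_a.kv_cache is layer_b.kv_cache  # both layers see the same cache
```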
I think we need to tell users how to debug kv cache related problems. These two variables are accessed frequently when debugging models.
```python
) -> torch.Tensor:
    # NOTE: please avoid accessing `kv_cache` and `attn_metadata` arguments
    # directly, use `self.kv_cache` and
    # `get_forward_context().attn_metadata` instead.
```
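Spelled out, the pattern the note recommends looks roughly like this (the wrapper module is illustrative and hypothetical; `get_forward_context` is the helper named in the note above):

```python
import torch
from torch import nn

from vllm.forward_context import get_forward_context


class ExampleModule(nn.Module):  # illustrative wrapper, not a real vLLM class
    def forward(self, q: torch.Tensor, k: torch.Tensor,
                v: torch.Tensor) -> torch.Tensor:
        # Metadata for the current step is populated by the model runner and
        # read from the forward context, not passed through forward() args.
        attn_metadata = get_forward_context().attn_metadata
        # A layer's cache lives on the module itself (self.kv_cache), so it
        # no longer needs to be threaded through every forward() call.
        return q  # placeholder: real attention computation goes here
```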
Not necessary in this PR, but can we update these notes to help people print the kv cache and attn_metadata to debug kv-cache related problems?
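As a sketch of what such a debugging note could point people to (a hypothetical helper, assuming a layer keeps its cache in `self.kv_cache` as the note above says):

```python
from vllm.forward_context import get_forward_context


def dump_attention_state(layer) -> None:
    """Hypothetical debug helper: call from inside a model's forward().

    Prints the current step's attention metadata (from the forward context)
    and the given layer's kv cache, if it has one.
    """
    print("attn_metadata:", get_forward_context().attn_metadata)
    kv_cache = getattr(layer, "kv_cache", None)
    if kv_cache is not None:
        print("kv_cache:", kv_cache)
```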
### What this PR does / why we need it?
The arg list of `Attention.forward()` is changed by vllm-project/vllm#13555. The unused args `kv_caches` and `attn_metadata` are removed.

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with existing tests.

Signed-off-by: MengqingCao <[email protected]>
Follow-up for #11967, which removes `kv_cache` and `attn_metadata` from all model definitions.

Summary of changes:
- Removed `kv_cache` and `attn_metadata` from the `Attention.forward()` args
- Removed the `attn_metadata` arg in `MambaMixer` and `MambaMixer2`
- Removed `kv_caches`, `kv_cache` and `attn_metadata` from the `forward()` args of all model modules
- Removed `kv_caches` and `attn_metadata` from the new model docs
- Kept the `kv_caches` arg (but try not to use it) in all child classes of `ModelRunnerBase.execute_model()` to avoid further complication
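As a rough illustration of what the change means for a model module's signature (the class and layer names are hypothetical; only the argument plumbing mirrors the summary above):

```python
import torch
from torch import nn


class ToyDecoderLayer(nn.Module):
    """Illustrative layer showing the forward() shape after this PR."""

    def __init__(self, hidden_size: int = 64) -> None:
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)

    # Before: forward(self, positions, hidden_states, kv_cache, attn_metadata)
    # After: the unused kwargs are gone. Attention layers read their cache
    # from self.kv_cache and metadata from get_forward_context() instead.
    def forward(self, positions: torch.Tensor,
                hidden_states: torch.Tensor) -> torch.Tensor:
        return self.proj(hidden_states)
```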