[V1][Core] Fix memory issue with logits & sampling #14508
Conversation
Signed-off-by: Roger Wang <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Roger Wang <[email protected]>
@varun-sundar-rabindranath @jeejeelee Please help take a look at why this is breaking the LoRA tests on V1 - thank you very much! 🙏
Co-authored-by: Varun Sundar Rabindranath <[email protected]> Signed-off-by: Roger Wang <[email protected]>
Signed-off-by: Roger Wang <[email protected]>
Signed-off-by: Roger Wang <[email protected]>
Signed-off-by: Roger Wang <[email protected]>
# NOTE: In V1, the memory buffer for logits (max_num_reqs x vocab_size)
# is captured but cannot be released from PyTorch due to a known bug,
Could you please elaborate on this?
See the discussion here https://vllm-dev.slack.com/archives/C087WBWC5AQ/p1741398800083509?thread_ts=1741386694.452939&cid=C087WBWC5AQ - TL;DR is that `empty_cache` cannot be called when we turn on sleep mode.
Hmm... Why do we need `empty_cache`?
The difference here is that we never warmed up the sampler (in both V0 and V1), so the memory fragmentation issue was always there, just not as pronounced in V0 (since the default batch size is 256).
Now we're adding the sampler warmup in V1, but when we call `sleep()`, the memory buffer for logits can't be cleared from the PyTorch caching allocator (the bug mentioned in this comment), so the memory usage will be a lot higher.
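For readers following along, here is a minimal sketch of what the sampler warmup effectively does; this is not the actual vLLM implementation, and the function name and parameters are purely illustrative:

```python
import torch

def warmup_sampler_memory(max_num_reqs: int, vocab_size: int,
                          device: str = "cuda",
                          dtype: torch.dtype = torch.float32) -> None:
    # Allocate the worst-case logits buffer (max_num_reqs x vocab_size) once,
    # so the PyTorch caching allocator reserves a block big enough for it
    # up front rather than during serving.
    dummy_logits = torch.zeros(max_num_reqs, vocab_size,
                               device=device, dtype=dtype)
    # Stand-in for the real sampler forward pass; what matters here is that
    # the full-sized buffer is actually touched so the allocation is captured.
    probs = torch.softmax(dummy_logits, dim=-1)
    _ = torch.argmax(probs, dim=-1)
    # Intentionally no torch.cuda.empty_cache() afterwards: keeping the block
    # cached is the point, so later real batches reuse it instead of
    # fragmenting memory.
```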
@ywang96 Thanks for the explanation. Just want to double check: we don't want to call `empty_cache` anyway, because we intentionally reserve the `(max_num_reqs x vocab_size)`-sized tensor in the PyTorch allocator, right?
That is correct, though I do think there should be a better & cleaner fix for this to work with sleep mode in the long term. We should probably free the memory when `sleep` is called, then warm up the sampler again within `wakeup`, but this is currently blocked since we can't free the memory anyway.
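As a hedged illustration of the reserve-vs-free behavior being discussed here (plain PyTorch, not vLLM code; the shapes are made up):

```python
import torch

max_num_reqs, vocab_size = 256, 32_000   # illustrative sizes only

logits = torch.zeros(max_num_reqs, vocab_size, device="cuda")
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())

del logits
# The tensor is gone and memory_allocated() drops, but memory_reserved()
# stays the same: the caching allocator keeps the block for reuse.
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())

# Only empty_cache() returns the block to the driver - which, per the
# discussion above, is exactly what cannot be done safely with sleep mode on.
torch.cuda.empty_cache()
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())
```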
Hmm... How is the logits tensor different from other intermediate activation tensors?
I don't understand why this specific tensor becomes a problem.
Because `dummy_run` doesn't include/activate the sampler tensors; this is why we made `dummy_sampler_run` in the first place.
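A brief sketch of that distinction (names and shapes are illustrative, not the actual runner code): the model-forward dummy run only materializes hidden states, so the much wider logits buffer never gets allocated unless the sampler is warmed up separately.

```python
import torch

def dummy_model_run(max_num_tokens: int, hidden_size: int) -> torch.Tensor:
    # Model-forward warmup: the largest activation it produces is roughly
    # (max_num_tokens, hidden_size).
    return torch.zeros(max_num_tokens, hidden_size, device="cuda")

def dummy_sampler_run(max_num_reqs: int, vocab_size: int) -> torch.Tensor:
    # Sampler warmup: the logits buffer is (max_num_reqs, vocab_size), and
    # vocab_size can exceed 100k for modern tokenizers - a shape the
    # model-forward dummy run never creates, hence the separate warmup pass.
    return torch.zeros(max_num_reqs, vocab_size, device="cuda")
```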
Signed-off-by: Roger Wang <[email protected]> Co-authored-by: Varun Sundar Rabindranath <[email protected]> Signed-off-by: Louis Ulmer <[email protected]>
Signed-off-by: Roger Wang <[email protected]> Co-authored-by: Varun Sundar Rabindranath <[email protected]>
Signed-off-by: Roger Wang <[email protected]> Co-authored-by: Varun Sundar Rabindranath <[email protected]> Signed-off-by: Mu Huai <[email protected]>
Reopened from the reverted #13776.
Co-authored by @varun-sundar-rabindranath for the LoRA dummy-run fix.