
Conversation

@zou3519 (Collaborator) commented Oct 15, 2025

In PyTorch 2.9, torch.compile has a bug where the graph partition is not taken into account during caching. Because vLLM's Mode.VLLM_COMPILE is the only mode that uses Inductor graph partition, and VLLM_COMPILE implies there is a PostGradPassManager, we put the list of operators to partition the graph on into the PostGradPassManager's uuid (which then gets incorporated into Inductor's FX graph cache key). Remove this hack once torch.compile fixes the underlying bug.
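
For readers skimming the diff, here is a minimal standalone sketch of the idea, not the actual vLLM implementation: the class name, the splitting_ops attribute, and the uuid() method mirror the description above, while everything else (the hashing scheme, the passes field) is illustrative.

import hashlib
import json

class PostGradPassManager:
    """Sketch only; the real class lives in vLLM's compilation code."""

    def __init__(self, splitting_ops=None):
        # The operators Inductor is asked to partition the graph on;
        # [] when graph partition is disabled or no ops were requested.
        self.splitting_ops = list(splitting_ops or [])
        self.passes = []

    def uuid(self):
        state = {"passes": [str(p) for p in self.passes]}
        # HACK: fold the partition ops into the uuid so they end up in
        # Inductor's FX graph cache key. Remove once torch.compile takes
        # graph partition into account in its own cache key.
        state["splitting_ops"] = self.splitting_ops
        return hashlib.sha256(
            json.dumps(state, sort_keys=True).encode()
        ).hexdigest()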

@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request introduces a workaround for a torch.compile caching bug related to graph partitioning by including the list of splitting operators in the PostGradPassManager's UUID. However, I've found a critical issue where the list of splitting operators is not correctly assigned, which means the workaround is currently ineffective. My review includes a specific code suggestion to fix this.

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


@ProExpertProg (Collaborator) left a comment

I'm confused; do we want to treat no-inductor-partition and inductor-partition-with-empty-splitting-ops differently or not?

# Remove this hack whenever torch.compile fixes it.
self.splitting_ops = None
if config.compilation_config.use_inductor_graph_partition:
    if config.compilation_config.splitting_ops is None:
Collaborator left a comment

Comment that we want empty splitting ops with inductor partition to behave differently than any splitting ops without inductor partition?

state["passes"].append(self.fix_functionalization.uuid())

# See [HACK: Bug with Inductor graph partition and torch.compile cache]
if self.splitting_ops is not None:
Collaborator left a comment

Nvm, in both cases we will end up with an empty list

@zou3519 force-pushed the fix_partition_cache branch from 19ca497 to e12bbdd on October 15, 2025, 23:58
@zou3519 (Collaborator, Author) commented Oct 16, 2025

@ProExpertProg I updated the code to be clearer, if that helps. We want to add "the operators that we ask Inductor to split on" as part of the cache key. If inductor_graph_partition is False, that is no operators; if inductor_graph_partition is True, it is whatever compilation_config.splitting_ops is.
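
A small self-contained sketch of that resolution (the function and argument names here are mine, not the PR's, and the op strings are placeholders):

def resolve_partition_ops(use_inductor_graph_partition, configured_splitting_ops):
    # "The operators we ask Inductor to split on": no partition means no
    # operators; partition enabled means whatever the compilation config says.
    if not use_inductor_graph_partition:
        return []
    return list(configured_splitting_ops or [])

assert resolve_partition_ops(False, ["placeholder.op"]) == []
assert resolve_partition_ops(True, None) == []
assert resolve_partition_ops(True, ["placeholder.op"]) == ["placeholder.op"]

This also answers the earlier question: "no partition" and "partition with empty splitting ops" both contribute an empty list to the cache key.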

@ProExpertProg (Collaborator) left a comment

@zou3519 could you update the test in test_toy_llama that currently disables FX cache to work around this? Then I can add this PR to the inductor partition CI PR
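
As context for why the test can leave caching on, here is a standalone illustration (it reuses the hashing idea from the sketch above; the op name is a placeholder and this is not the real test code): two different partition configurations now hash to different pass-manager uuids, so their Inductor FX graph cache entries cannot collide.

import hashlib
import json

def pass_manager_uuid(splitting_ops):
    # Same idea as the earlier sketch: the partition ops are part of the
    # hashed state, so they reach the FX graph cache key.
    state = {"passes": [], "splitting_ops": splitting_ops}
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()

# Different partition configurations -> different uuids -> different cache
# keys, so the test no longer needs to disable the FX cache to stay correct.
assert pass_manager_uuid([]) != pass_manager_uuid(["placeholder.attention_op"])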

In PyTorch 2.9, torch.compile has a bug where the graph
partition is not taken into account during caching.
Because vLLM's Mode.VLLM_COMPILE is the only mode that uses
Inductor graph partition, and VLLM_COMPILE implies there
is a PostGradPassManager, we put the list of operators to graph
partition into the PostGradPassManager's uuid (which
then gets incorporated into Inductor's FX graph cache key).
Remove this hack whenever torch.compile fixes it.

Signed-off-by: Richard Zou <[email protected]>
@zou3519 force-pushed the fix_partition_cache branch from e12bbdd to ad717d4 on October 16, 2025, 00:25
@zou3519 (Collaborator, Author) commented Oct 16, 2025

> could you update the test in test_toy_llama that currently disables FX cache to work around this? Then I can add this PR to the inductor partition CI PR

Yup, updated

@mergify bot added the llama (Related to Llama models) label Oct 16, 2025
@ProExpertProg added the ready (ONLY add when PR is ready to merge/full CI is needed) label Oct 16, 2025
@ProExpertProg (Collaborator) left a comment

Thx

ProExpertProg added a commit to neuralmagic/vllm that referenced this pull request Oct 16, 2025
commit ad717d4
Author: Richard Zou <[email protected]>
Date:   Wed Oct 15 16:29:49 2025 -0700

    [BugFix] Work around graph partition x torch.compile cache issue

    Signed-off-by: Richard Zou <[email protected]>

Signed-off-by: ProExpertProg <[email protected]>
@vllm-bot merged commit 9b6504c into vllm-project:main Oct 16, 2025
45 of 48 checks passed
mandy-li pushed a commit to mandy-li/vllm that referenced this pull request Oct 16, 2025
albertoperdomo2 pushed a commit to albertoperdomo2/vllm that referenced this pull request Oct 16, 2025

Labels

llama (Related to Llama models), ready (ONLY add when PR is ready to merge/full CI is needed)
