[BugFix] Work around graph partition x torch.compile cache issue #26956
Conversation
Code Review
This pull request introduces a workaround for a torch.compile caching bug related to graph partitioning by including the list of splitting operators in the PostGradPassManager's UUID. However, I've found a critical issue where the list of splitting operators is not correctly assigned, which means the workaround is currently ineffective. My review includes a specific code suggestion to fix this.
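For orientation, the workaround has roughly the following shape. This is a minimal sketch with simplified, hypothetical names (the real PostGradPassManager is built from vLLM's config and manages many passes), not the PR's actual code:

import hashlib
import pickle

class PostGradPassManager:
    """Simplified stand-in for vLLM's post-grad pass manager (hypothetical)."""

    def __init__(self, passes, splitting_ops):
        self.passes = passes
        # HACK: remember which ops Inductor is asked to partition the graph on,
        # so they become part of this manager's uuid. Inductor incorporates the
        # custom-pass uuid into its FX graph cache key, which works around the
        # partition-unaware caching bug in PyTorch 2.9.
        self.splitting_ops = list(splitting_ops)

    def uuid(self) -> bytes:
        state = {"passes": [p.uuid() for p in self.passes]}
        # Without this, two runs that differ only in graph-partition ops would
        # hash identically and could reuse a stale Inductor cache entry.
        state["splitting_ops"] = self.splitting_ops
        return hashlib.sha256(pickle.dumps(state)).digest()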
💡 Codex Review
Here are some automated review suggestions for this pull request.
Force-pushed from 75d4e46 to 19ca497
I'm confused; do we want to treat no-inductor-partition and inductor-partition-with-empty-splitting-ops differently or not?
vllm/compilation/pass_manager.py (outdated):

# Remove this hack whenever torch.compile fixes it.
self.splitting_ops = None
if config.compilation_config.use_inductor_graph_partition:
    if config.compilation_config.splitting_ops is None:
Comment that we want empty splitting ops with inductor partition to behave differently than any splitting ops without inductor partition?
vllm/compilation/pass_manager.py (outdated):

state["passes"].append(self.fix_functionalization.uuid())

# See [HACK: Bug with Inductor graph partition and torch.compile cache]
if self.splitting_ops is not None:
Nvm, in both cases we will end up with an empty list
Force-pushed from 19ca497 to e12bbdd
@ProExpertProg I updated the code to be clearer, if that helps. We want to add "the operators that we ask inductor to split" as part of the cache key. If inductor_graph_partition is False, that is no operators; if inductor_graph_partition is True, it is whatever compilation_config.splitting_ops is.
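In other words, the contribution to the cache key could be computed along these lines (a sketch of the semantics described above; the function name is illustrative):

def ops_inductor_will_split(compilation_config) -> list[str]:
    # Inductor graph partition disabled: we ask Inductor to split on nothing.
    if not compilation_config.use_inductor_graph_partition:
        return []
    # Partition enabled: split on exactly the configured ops (possibly none).
    return list(compilation_config.splitting_ops or [])

Note that partition-disabled and partition-enabled-with-empty-splitting-ops both contribute an empty list, matching the observation above that the two cases end up hashing the same.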
@zou3519 could you update the test in test_toy_llama that currently disables FX cache to work around this? Then I can add this PR to the inductor partition CI PR
In PyTorch 2.9, torch.compile has a bug where the graph partition is not taken into account during caching. Because vLLM's Mode.VLLM_COMPILE is the only mode that uses Inductor graph partition, and VLLM_COMPILE implies there is a PostGradPassManager, we put the list of operators to graph partition into the PostGradPassManager's uuid (which then gets incorporated into Inductor's FX graph cache key). Remove this hack whenever torch.compile fixes it.

Signed-off-by: Richard Zou <[email protected]>
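A quick sanity check that the uuid now distinguishes these cases, reusing the hypothetical PostGradPassManager sketch from earlier (the op name below is purely illustrative):

# Two setups that differ only in the ops Inductor partitions on should now
# produce different pass-manager uuids, and therefore different Inductor
# FX graph cache keys.
pm_off = PostGradPassManager(passes=[], splitting_ops=[])
pm_on = PostGradPassManager(passes=[], splitting_ops=["mylib.attention"])
assert pm_off.uuid() != pm_on.uuid()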
Force-pushed from e12bbdd to ad717d4
Yup, updated.
Thx
commit ad717d4
Author: Richard Zou <[email protected]>
Date: Wed Oct 15 16:29:49 2025 -0700

    [BugFix] Work around graph partition x torch.compile cache issue

    In PyTorch 2.9, torch.compile has a bug where the graph partition is not taken into account during caching. Because vLLM's Mode.VLLM_COMPILE is the only mode that uses Inductor graph partition, and VLLM_COMPILE implies there is a PostGradPassManager, we put the list of operators to graph partition into the PostGradPassManager's uuid (which then gets incorporated into Inductor's FX graph cache key). Remove this hack whenever torch.compile fixes it.

    Signed-off-by: Richard Zou <[email protected]>
    Signed-off-by: ProExpertProg <[email protected]>