Add support for torch.compile dynamic shapes #30560
Conversation
Thank you for working on this! Just left a minor comment but it looks good!
(Note that afaik Transformers does not enforce line length violations; see line 6 in 0ae789e: `ignore = ["C901", "E501", "E741", "F402", "F823" ]`.)
Could you add a test for `dynamic=True`?
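A minimal sketch of what such a test could look like; the checkpoint name, tolerances, and test structure here are illustrative assumptions, not taken from this PR:

```python
# Hypothetical test sketch: checkpoint, tolerances, and structure are
# assumptions for illustration, not the actual test added in this PR.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def test_sdpa_compile_dynamic():
    name = "hf-internal-testing/tiny-random-MistralForCausalLM"  # assumed tiny checkpoint
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name, attn_implementation="sdpa").eval()
    compiled_model = torch.compile(model, dynamic=True)

    # Vary the sequence length so the compiled graph must handle dynamic
    # shapes rather than specializing on a single input size.
    for prompt in ["Hi", "Hi there, this is a deliberately longer prompt."]:
        inputs = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            eager_logits = model(**inputs).logits
            compiled_logits = compiled_model(**inputs).logits
        torch.testing.assert_close(eager_logits, compiled_logits, rtol=1e-4, atol=1e-4)
```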
LGTM @fxmarty feel free to merge when CIs are green if it's fine with you!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@fxmarty @ArthurZucker I didn't touch Llava. The current "Tensor-likes are not close" error shouldn't have anything to do with this PR. It should be ready to go from my end.

No worries, rebasing on main should most probably fix this!
Merging as this is quite important. Let's keep an eye on the slow tests that will be triggered!
This PR adds support for compiling models with dynamic shapes (`dynamic=True`) to almost all models with SDPA attention implementations that currently do not support them. #30442 added support for Llama, Gemma, OLMo, & Cohere.

The only model not modified is DBRX, which needs the changes from both #30070 and #30442 to add support for SDPA's Flash Attention kernel and for dynamic shapes, as I believe it suffers from the same training memory issues detailed in #30010.
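For context, a minimal sketch of the usage this enables; the checkpoint name here is an illustrative assumption, and any SDPA-backed model is meant to behave the same way:

```python
# Minimal usage sketch (checkpoint is an assumption for illustration).
# dynamic=True asks the compiler to trace symbolic sequence lengths instead
# of specializing and recompiling for every new input shape.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "hf-internal-testing/tiny-random-LlamaForCausalLM"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, attn_implementation="sdpa")
compiled_model = torch.compile(model, dynamic=True)

# Inputs of different lengths reuse the same compiled graph.
for prompt in ["short", "a noticeably longer input sequence"]:
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = compiled_model(**inputs).logits
```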
As mentioned in #30442, moving the `is_causal` dispatch logic from inline to an `if` statement is required to support both `fullgraph=True` and `dynamic=True` (see the sketch below). I kept the `q_len > 1` comments but could remove them if we want to match Llama, which doesn't have them.

cc @ArthurZucker and @fxmarty
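For reference, a sketch of the inline-to-`if` dispatch change described above, modeled on the Llama-style SDPA attention code; the function wrapper and variable names are illustrative, as the real change lives inside each model's SDPA attention `forward`:

```python
import torch
import torch.nn.functional as F

def sdpa_attention(query_states, key_states, value_states, causal_mask, q_len):
    # Previously the boolean was computed inline in the SDPA call:
    #   is_causal=causal_mask is None and q_len > 1
    # which breaks compilation with dynamic=True. Dispatching via an explicit
    # statement yields a plain Python bool and supports both fullgraph=True
    # and dynamic=True.
    is_causal = True if causal_mask is None and q_len > 1 else False
    return F.scaled_dot_product_attention(
        query_states,
        key_states,
        value_states,
        attn_mask=causal_mask,
        is_causal=is_causal,
    )
```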