Pass inductor config for static cuda launcher to workers #153382
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/153382
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.
✅ You can merge normally! (2 unrelated failures) As of commit 7124256 with merge base dc47295:
FLAKY - The following job failed but was likely due to flakiness present on trunk.
BROKEN TRUNK - The following job failed, but the failure was already present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
```python
# We can release this memory in the compile subprocesses:
linecache.clearcache()
return kernel, elapsed_ns // 1000

import torch
```
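(Background on the snippet above, not part of the PR: linecache caches entire source files in memory for traceback rendering, which a long-lived compile worker no longer needs once compilation is done. A minimal illustration:)

```python
import linecache

# Reading any line populates linecache's in-memory copy of the whole file.
first_line = linecache.getline(__file__, 1)

# Clearing the cache releases that memory, e.g. in a compile subprocess.
linecache.clearcache()
```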
@masnesral, is it okay to import torch here? I figure load_kernel, precompile, etc. will all import torch anyway, so there should be no difference.
Yeah, we already import torch in the subproc: https://fburl.com/code/0ncpgxr5. In the future it might be nice not to import all of torch, though, because it really slows down worker startup. So, nit: it's fine if we change to just import config more narrowly, right? IIUC that still imports all of torch, so it's effectively the same. But Jason was previously playing with a hack where he'd mock out torch so that it's possible to import torch.foo.bar without importing all of torch. Maybe we'll try something like that again in the future.
Ohh I wasn't aware that there was a difference (I had assumed that importing foo.bar.baz was equivalent to first importing foo then foo.bar then foo.bar.baz), but yes, I can change!
Oh yeah. I think it is the same. I meant that in the future I can imagine us doing some shenanigans with torch imports in the compile subprocess specifically, and mocking foo such that foo.bar.baz imports a mocked foo and the real foo.bar.baz.
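(For illustration only: a minimal sketch of that mocking idea, with hypothetical package names. This is not code from the PR.)

```python
import importlib
import sys
import types

def import_submodule_without_parent(parent_name, parent_dir, submodule):
    # Register a lightweight stub for the parent package so the import
    # system skips the parent's heavy __init__.py. `parent_dir` is the
    # on-disk directory containing the package's modules.
    stub = types.ModuleType(parent_name)
    stub.__path__ = [parent_dir]  # lets submodules still be located on disk
    sys.modules[parent_name] = stub
    return importlib.import_module(submodule)

# Hypothetical usage: load foo.bar.baz without running foo/__init__.py.
# baz = import_submodule_without_parent("foo", "/path/to/foo", "foo.bar.baz")
```

This only helps if the submodule itself doesn't import the rest of the parent package at module scope.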
masnesral left a comment
One nit about making the import slightly more friendly to potential future changes?
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):
Async compile workers generally don't respect inductor configs that change mid-execution, because the workers warm up early. StaticCudaLauncher is especially susceptible to this because it affects Triton compilation without being part of the inductor meta. So we pass it in via extra configs on each worker run.
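A minimal sketch of the idea, assuming the knob lives at torch._inductor.config.use_static_cuda_launcher; the function names (snapshot_extra_config, worker_compile) are illustrative, not the PR's actual API:

```python
import torch._inductor.config as inductor_config

def snapshot_extra_config():
    # Read the parent process's current value at submission time, so a
    # mid-run config change is observed even though the worker pool
    # warmed up (and cached its view of the config) earlier.
    return {"use_static_cuda_launcher": inductor_config.use_static_cuda_launcher}

def worker_compile(kernel_source, extra_config):
    # Runs inside the async compile subprocess: apply the parent's
    # config before kicking off Triton compilation, since this knob is
    # not part of the inductor meta that Triton already receives.
    for name, value in extra_config.items():
        setattr(inductor_config, name, value)
    ...  # proceed with compilation under the updated config
```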
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov