Conversation

@jamesjwu (Contributor) commented May 12, 2025

Stack from ghstack (oldest at bottom):

Async compile workers generally don't respect inductor configs that change in the middle of execution, because the workers warm up early. StaticCudaLauncher is especially susceptible to this because it affects triton compilation without being part of the inductor meta. So we pass it in via extra configs on each worker run.
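For illustration, here is a minimal sketch of the general pattern (not the PR's actual code): capture the relevant inductor config overrides in the parent process and re-apply them inside each worker run. The `run_in_worker` wrapper and the `extra_config` dict are hypothetical names; `torch._inductor.config.patch` is the existing config-patching helper.

```python
# Minimal sketch, not the PR's implementation: re-apply parent-process
# inductor config overrides inside a compile worker before running the task.
from torch._inductor import config


def run_in_worker(compile_task, extra_config):
    # `extra_config` is a plain dict of inductor config overrides captured in
    # the parent process at submit time, so changes made after worker warm-up
    # (e.g. enabling StaticCudaLauncher) are still visible in the subprocess.
    with config.patch(extra_config):
        return compile_task()
```

A caller would then submit the task together with the current config values instead of relying on whatever config state the worker captured when it warmed up.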

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

[ghstack-poisoned]
pytorch-bot bot commented May 12, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/153382

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ You can merge normally! (2 Unrelated Failures)

As of commit 7124256 with merge base dc47295:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jamesjwu added a commit that referenced this pull request May 12, 2025
[ghstack-poisoned]
jamesjwu added a commit that referenced this pull request May 12, 2025
@jamesjwu jamesjwu requested a review from masnesral May 12, 2025 15:57
@jamesjwu jamesjwu marked this pull request as ready for review May 12, 2025 15:58
Review comment on the following code:

# We can release this memory in the compile subprocesses:
linecache.clearcache()
return kernel, elapsed_ns // 1000
import torch
Contributor Author:

@masnesral , is it okay to import torch here? I figure load_kernel, precompile etc. will all import torch anyway so there should be no difference.

Contributor:

Yeah, we already import torch in the subproc: https://fburl.com/code/0ncpgxr5. In the future, it might be nice to not import all of torch though because it really slows down the worker startup. So nit: It's fine if we change to just import config more narrowly, right? IIUC that still imports all of torch so it's effectively the same. But Jason was previously playing with some hack where he'd mock out torch such that it's possible to import torch.foo.bar without importing all of torch. Maybe we'd try something like that again in the future.
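As a quick illustration of that point (an editorial sketch, not part of the thread): importing the config module "narrowly" still executes torch/__init__.py, so all of torch ends up loaded either way.

```python
# Sketch: a "narrow" import of the inductor config still loads all of torch,
# because importing a submodule first imports its parent packages.
import sys

import torch._inductor.config as inductor_config  # noqa: F401

assert "torch" in sys.modules  # torch/__init__.py has already run
```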

Contributor Author:

Ohh I wasn't aware that there was a difference (I had assumed that importing foo.bar.baz was equivalent to first importing foo then foo.bar then foo.bar.baz), but yes, I can change!

Contributor:

Oh yeah. I think it is the same. I meant that in the future I can imagine us doing some shenanigans with torch imports in the compile subprocess specifically, and mocking foo such that foo.bar.baz imports a mocked foo and the real foo.bar.baz.
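For reference, a hedged sketch of the kind of trick being described here, demonstrated with a throwaway package instead of torch (all names below are made up, and whether this works for real torch submodules depends on what they themselves import):

```python
# Illustration only: put a lightweight stub for the parent package into
# sys.modules so a submodule can be imported without running the parent's
# expensive __init__.py. Uses a temporary package, not torch.
import importlib
import os
import sys
import tempfile
import types

pkg_root = tempfile.mkdtemp()
pkg_dir = os.path.join(pkg_root, "bigpkg")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("raise RuntimeError('expensive __init__ ran')\n")
with open(os.path.join(pkg_dir, "leaf.py"), "w") as f:
    f.write("VALUE = 42\n")

# The stub only needs __path__ so the import machinery can find submodules.
stub = types.ModuleType("bigpkg")
stub.__path__ = [pkg_dir]
sys.modules["bigpkg"] = stub

leaf = importlib.import_module("bigpkg.leaf")
print(leaf.VALUE)  # 42 -- and bigpkg/__init__.py never executed
```

In a compile subprocess the stub would stand in for torch itself; the catch is that real torch submodules import torch internals, which is why this stays in hack territory.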

@jamesjwu jamesjwu requested review from aorenste, jansel and oulgen May 12, 2025 16:25
@jamesjwu jamesjwu added the ciflow/trunk, topic: not user facing, and ciflow/pull labels May 12, 2025
@masnesral masnesral (Contributor) left a comment

One nit about making the import slightly more friendly to potential future change?

[ghstack-poisoned]
jamesjwu added a commit that referenced this pull request May 12, 2025
@jamesjwu (Contributor Author) commented:
@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@github-actions github-actions bot deleted the gh/jamesjwu/151/head branch June 18, 2025 02:19