
Conversation

@jamesjwu jamesjwu commented Jun 5, 2025

Stack from ghstack (oldest at bottom):

On this line, we see that the bw_compiler that dynamo uses for AOTAutograd automatically disables the backward runnable:
https://github.com/pytorch/pytorch/blob/05dd638ee98b36254c84095894c36fd0e7d95544/torch/_dynamo/backends/common.py#L76

reason="do not trace generated backwards pass",

This disables dynamo inside the bw_compiler, but it also disables the runnable that the compiler returns.

On an AOTAutogradCache hit, however, we never call the bw_compiler! So we don't disable dynamo properly. This only matters in certain cases of CPU tensor backwards, where the backward runs in Python and dynamo unnecessarily tries to trace through the inductor-generated code. It also only matters if the backward is invoked outside of dynamo itself (say, after a graph break in eager mode), since dynamo already disables the forward function properly.
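Roughly, the wrapping that gets skipped looks like this (a simplified sketch of what `_wrapped_bw_compiler` in `torch/_dynamo/backends/common.py` does; the factory name below is made up for illustration):

```python
import torch

def make_disabled_bw_compiler(bw_compiler):
    # Sketch of what dynamo's backward-compiler wrapper does on a cache
    # miss: run the real backward compiler, then wrap the runnable it
    # returns in torch._dynamo.disable so dynamo never tries to trace
    # the inductor-generated backward code.
    def wrapped_bw_compiler(gm, example_inputs):
        compiled_bw = bw_compiler(gm, example_inputs)
        return torch._dynamo.disable(
            compiled_bw, reason="do not trace generated backwards pass"
        )

    return wrapped_bw_compiler
```

Because a cache hit skips this wrapping entirely, the backward gets traced again and we see logs like: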

```
I0605 09:58:40.135000 3981970 torch/_dynamo/eval_frame.py:517] TorchDynamo attempted to trace the following frames: [
I0605 09:58:40.135000 3981970 torch/_dynamo/eval_frame.py:517]   * fn /home/jjwu/test.py:9
I0605 09:58:40.135000 3981970 torch/_dynamo/eval_frame.py:517]   * cast /data/users/jjwu/a/pytorch-env/lib/python3.10/typing.py:1737
I0605 09:58:40.135000 3981970 torch/_dynamo/eval_frame.py:517]   * call /tmp/torchinductor_jjwu/rq/crq327nhoyjzog5n3qlchauucdrunrtutwmmoh7ipoe2ngnson5s.py:35
I0605 09:58:40.135000 3981970 torch/_dynamo/eval_frame.py:517]   * fn /home/jjwu/test.py:9
I0605 09:58:40.135000 3981970 torch/_dynamo/eval_frame.py:517]   * cast /data/users/jjwu/a/pytorch-env/lib/python3.10/typing.py:1737
I0605 09:58:40.135000 3981970 torch/_dynamo/eval_frame.py:517]   * call /tmp/torchinductor_jjwu/rq/crq327nhoyjzog5n3qlchauucdrunrtutwmmoh7ipoe2ngnson5s.py:35
I0605 09:58:40.135000 3981970 torch/_dynamo/eval_frame.py:517] ]
```

This PR fixes the issue and adds a unit test showing that, with or without a cache hit, the frames dynamo traces are identical.

Fixes #154536

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames

@jamesjwu jamesjwu requested a review from bdhirsh as a code owner June 5, 2025 19:23

pytorch-bot bot commented Jun 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/155251

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures)

As of commit eacd200 with merge base d2a2bfc:

BROKEN TRUNK - The following jobs failed but were also present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@jamesjwu jamesjwu added the `topic: not user facing` (topic category) label Jun 5, 2025
@jamesjwu jamesjwu requested review from anijain2305 and oulgen June 5, 2025 19:27
@jamesjwu jamesjwu changed the title from "Fix dynamo tracing into AOTAutogradCache results" to "Fix dynamo tracing into AOTAutogradCache results in cpu tensors" Jun 5, 2025
```
def _is_backward(self) -> bool:
    return True

def post_compile(
```
@jamesjwu (Contributor, Author) commented:

I wanted to put this in GenericCompiledBackward, but the post_compile it references via super() needs to resolve to FxGraphCacheLoadable's post_compile, so I need to copy it twice for Bundled vs. non-Bundled. Multiple inheritance is confusing 😅

```
compiled_bw = super().post_compile(result, fx_config)
# This is done by _wrapped_bw_compiler in torch/_dynamo/backends/common.py
# But since on cache hit we do not call the bw_compiler, we need to reapply the disable
return torch._dynamo.disable(compiled_bw, reason="do not trace generated backwards pass")  # type: ignore[return-value]
```
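For readers less familiar with the subtlety, here is a toy illustration of why the placement of that `super().post_compile` call matters (hypothetical class names, not the real cache classes): which `post_compile` a shared base reaches via `super()` depends on each concrete subclass's MRO, so hoisting the disable into a common base is less straightforward than it looks.

```python
class Loadable:
    def post_compile(self, result):
        return f"loaded({result})"

class GenericBackward:
    def post_compile(self, result):
        # super() resolves along the MRO of the *concrete* subclass, so
        # whether this lands on Loadable.post_compile (or something else)
        # depends on how each subclass orders its bases.
        compiled = super().post_compile(result)
        return f"disabled({compiled})"

class BundledBackward(GenericBackward, Loadable):
    pass

print(BundledBackward().post_compile("bw"))  # -> disabled(loaded(bw))
```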
A reviewer (Contributor) commented:

nit: one convention in the codebase I've found helpful when two different areas of the codebase need to be kept in sync is something like:

```
<file 1>
# Note [Wrapping bw_compiler in disable]
# <comment explaining why we wrap in disable>

<file 2>
# See Note [Wrapping bw_compiler in disable]
```

That way, if e.g. someone wants to tweak the disable behavior in the future, they can easily grep for the note name and find any other code locations that need to be kept in sync (random example where we do it: https://github.com/pytorch/pytorch/blob/main/c10/core/DispatchKey.h#L133C16-L133C49).
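Applied to this PR, it could look something like the following (a hypothetical sketch, not the actual diff; `wrap_compiled_backward` is an illustrative stand-in for the two call sites):

```python
import torch

# Note [Wrapping bw_compiler in disable]
# The backward runnable must be wrapped in torch._dynamo.disable so that
# dynamo never traces the inductor-generated backward code, even when the
# backward is invoked from eager (e.g. after a graph break). Any path that
# produces a backward runnable without calling the bw_compiler (such as an
# AOTAutogradCache hit) has to re-apply this wrapping.

def wrap_compiled_backward(compiled_bw):
    # See Note [Wrapping bw_compiler in disable]
    return torch._dynamo.disable(
        compiled_bw, reason="do not trace generated backwards pass"
    )
```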

@bdhirsh (Contributor) left a comment:

nice!

jamesjwu added a commit that referenced this pull request Jun 5, 2025
ghstack-source-id: 767701a
Pull Request resolved: #155251
@jamesjwu jamesjwu added the `ciflow/trunk` (Trigger trunk jobs on your pull request) label Jun 6, 2025
jamesjwu added a commit that referenced this pull request Jun 6, 2025
ghstack-source-id: 57ec57c
Pull Request resolved: #155251

jamesjwu commented Jun 6, 2025

Rebased, let's see if I can get clean tests


jamesjwu commented Jun 9, 2025

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@github-actions github-actions bot deleted the gh/jamesjwu/162/head branch July 9, 2025 02:21