
Conversation

Contributor

@zhxchen17 zhxchen17 commented Sep 4, 2025

This PR consists of two parts/commits, to make the overall AOT precompile change easier to review.

  1. The first commit adds a new option, VLLM_USE_AOT_COMPILE, which makes the torch.compile wrapper always use torch.compile().aot_compile(inputs) on the first run and reuse the AOT-compiled function in subsequent runs.
  2. The second commit provides a basic implementation of vLLM's custom compiler backend plugged into AOT compilation serialization. We are not introducing new compilation behavior here: we simply store the dynamo graph and example inputs, and on load we rerun the vLLM backend, expecting the compilation to already be cached. In the future we plan to increase the serialization coverage so that we can always store backend artifacts as part of the package.

Overall the change should be orthogonal to the other changes we are making for dynamo, the AOT dispatcher, and inductor, because

  • the dynamo <-> vllm backend surface is stable
  • the aot dispatcher <-> inductor surface is internal to this PR, i.e. it can be treated as a black box that does not affect the work in this PR.

To avoid interfering with the existing workflow, we create a new cache directory, torch_aot_compile, and store AOT-compiled artifacts there. Eventually torch_aot_compile should contain all compiled artifacts, but right now both torch_aot_compile and torch_compile_cache must be present to avoid recompilation. We plan to gradually migrate the contents into torch_aot_compile in the long term.

Purpose

Add AOT compilation workflow for torch.compile without changing the existing caching behavior.

Mechanically how it works:

  1. We hook this into the supports_torch_compile decorator layer, so that it intercepts calls to the model's forward function directly.
  2. Check whether an AOT-compiled function is already in memory; if so, use it.
  3. If there's no AOT-compiled function in memory, compute a cache key from the vLLM config plus the model's forward name and try to load an AOT-compiled function into memory. (This is a different key from the one in torch_compile_cache, which has access to the traced source files; those are not present in the AOT workflow.)
  4. If no AOT-compiled function is available at all, just kick off AOT compilation and save the result to disk for future use.

Essentially, steps 2 and 3 are the warm start paths, and step 4 is the cold start path.
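As a rough illustration of the lookup order above (hypothetical helper names and hashing, not vLLM's actual code; the real implementation lives behind the supports_torch_compile decorator and torch.compiler's save/load APIs, with pickle here only as a stand-in):

```python
import hashlib
import os
import pickle
import tempfile

_in_memory_cache = {}  # hypothetical in-process cache of loaded artifacts

def aot_cache_key(vllm_config_repr: str, forward_name: str) -> str:
    # Step 3: the key is derived from the vLLM config plus the forward's
    # name; unlike torch_compile_cache, no traced source files are
    # available in the AOT workflow.
    return hashlib.sha256(
        f"{vllm_config_repr}:{forward_name}".encode()).hexdigest()

def get_compiled_forward(vllm_config_repr, forward_name, compile_fn,
                         cache_root):
    key = aot_cache_key(vllm_config_repr, forward_name)
    if key in _in_memory_cache:        # step 2: warm start, in memory
        return _in_memory_cache[key]
    path = os.path.join(cache_root, key, "model")
    if os.path.exists(path):           # step 3: warm start, on disk
        with open(path, "rb") as f:
            artifact = pickle.load(f)  # stand-in for load_compiled_function
    else:                              # step 4: cold start
        artifact = compile_fn()        # stand-in for aot_compile(inputs)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            pickle.dump(artifact, f)   # persist for future runs
    _in_memory_cache[key] = artifact
    return artifact

# Demo: the first call compiles; later calls hit the caches.
root = tempfile.mkdtemp()
calls = []
make = lambda: calls.append(1) or "fake-compiled-forward"
a = get_compiled_forward("cfg", "forward", make, root)
_in_memory_cache.clear()               # simulate a fresh process
b = get_compiled_forward("cfg", "forward", make, root)
print(a, b, len(calls))  # fake-compiled-forward fake-compiled-forward 1
```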

Test Plan

tests/test_aot_compile.py

Test Result

==================================================================================== test session starts =====================================================================================
platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0
rootdir: /data/users/zhxchen17/vllm
configfile: pyproject.toml
plugins: xdoctest-1.1.0, hypothesis-5.35.1, xdist-3.3.1, subtests-0.13.1, rerunfailures-14.0, flakefinder-1.1.0, cpp-2.3.0, anyio-4.9.0
collected 3 items                                                                                                                                                                            

tests/compile/test_aot_compile.py ...                                                                                                                                                  [100%]

====================================================================================== warnings summary ======================================================================================
tests/compile/test_aot_compile.py: 12 warnings
  /data/users/zhxchen17/pytorch/torch/fx/_graph_pickler.py:124: DeprecationWarning: Pickle, copy, and deepcopy support will be removed from itertools in Python 3.14.
    pickler.dump(obj)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================== 3 passed, 12 warnings in 54.35s ===============================================================================


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces Ahead-Of-Time (AOT) compilation for torch.compile, aiming to improve performance by caching compiled artifacts. The changes include new environment variables to control AOT compilation, a custom serializable compiled function class, and updates to the compilation decorators and wrappers. My review has identified a critical issue with cache isolation that could lead to corruption, a high-severity issue with the cache key hashing strategy that could cause stale cache hits, and a high-severity maintainability concern due to heavy reliance on internal PyTorch APIs. Addressing these points will improve the robustness and long-term stability of this new feature.

Comment on lines 403 to 472
```python
class VllmCompiledFunction(SerializableCallable):

    def __init__(self, graph_module, example_inputs, vllm_config, prefix,
                 optimized_call):
        assert isinstance(graph_module, torch.fx.GraphModule)
        self.graph_module = graph_module
        self.example_inputs = example_inputs
        self.vllm_config = vllm_config
        self.prefix = prefix
        self.optimized_call = optimized_call

    def __call__(self, *args, **kwargs):
        return self.optimized_call(*args, **kwargs)

    @classmethod
    def serialize_compile_artifacts(
            cls, compiled_fn: "VllmCompiledFunction") -> bytes:
        import sympy
        from torch._subclasses import FakeTensorMode
        from torch.fx._graph_pickler import GraphPickler, Options
        state = compiled_fn.__dict__.copy()
        state.pop("optimized_call")
        for node in state["graph_module"].graph.nodes:
            node.meta.pop("source_fn_stack", None)
            node.meta.pop("nn_module_stack", None)

        graph_reducer_override = GraphPickler.reducer_override

        def _graph_reducer_override(self, obj):
            if (inspect.isclass(obj) and issubclass(obj, sympy.Function)
                    and hasattr(obj, "_torch_unpickler")):
                return obj._torch_unpickler, (obj._torch_handler_name, )
            if isinstance(obj, FakeTensorMode):
                return type(None), ()
            return graph_reducer_override(self, obj)

        with patch.object(GraphPickler, 'reducer_override',
                          _graph_reducer_override):
            state["graph_module"] = GraphPickler.dumps(
                state["graph_module"], Options(ops_filter=None))
            state["example_inputs"] = GraphPickler.dumps(
                state["example_inputs"])
        return pickle.dumps(state)

    @classmethod
    def deserialize_compile_artifacts(cls,
                                      data: bytes) -> "VllmCompiledFunction":
        from torch._guards import TracingContext, tracing
        from torch._subclasses import FakeTensorMode
        from torch.fx._graph_pickler import GraphPickler
        from torch.fx.experimental.symbolic_shapes import ShapeEnv

        state = pickle.loads(data)
        fake_mode = FakeTensorMode(shape_env=ShapeEnv())
        state["graph_module"] = GraphPickler.loads(state["graph_module"],
                                                   fake_mode)
        state["example_inputs"] = GraphPickler.loads(state["example_inputs"],
                                                     fake_mode)
        vllm_backend = VllmBackend(state["vllm_config"], state["prefix"])
        with tracing(TracingContext(fake_mode)):
            optimized_call = vllm_backend(state["graph_module"],
                                          state["example_inputs"])

        return cls(
            state["graph_module"],
            state["example_inputs"],
            state["vllm_config"],
            state["prefix"],
            optimized_call,
        )
```
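The reducer_override hook above substitutes pickle-friendly replacements for objects that otherwise cannot be serialized (e.g. a FakeTensorMode is reduced to None on load). The same stdlib mechanism, in a minimal self-contained sketch with a made-up stand-in class:

```python
import io
import pickle

class UnpicklableSession:
    """Stands in for an object like FakeTensorMode that cannot be pickled."""
    def __reduce__(self):
        raise TypeError("not picklable")

class SkippingPickler(pickle.Pickler):
    # reducer_override (Python >= 3.8) lets a pickler substitute its own
    # reduce tuple for selected objects before normal dispatch runs.
    def reducer_override(self, obj):
        if isinstance(obj, UnpicklableSession):
            # Same idea as mapping FakeTensorMode -> None in the PR:
            # on load, the object is reconstructed as None.
            return type(None), ()
        return NotImplemented  # fall back to normal pickling

state = {"graph": [1, 2, 3], "session": UnpicklableSession()}
buf = io.BytesIO()
SkippingPickler(buf).dump(state)
restored = pickle.loads(buf.getvalue())
print(restored)  # {'graph': [1, 2, 3], 'session': None}
```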
Contributor

Severity: high

The implementation of VllmCompiledFunction relies heavily on internal PyTorch APIs (e.g., torch._dynamo.aot_compile, torch.fx._graph_pickler, torch._subclasses). While this might be necessary for this feature, it creates a significant maintenance burden. These APIs are not guaranteed to be stable and can change without notice in future PyTorch releases, which could break this functionality. It would be good to add comments explaining why each internal API is used and potentially explore ways to reduce this dependency if possible in the future.

Contributor Author

Will be happy to add some comments to show what these APIs are. IMO the function names should be self-evident, and I do consider these parts to be relatively stable in torch.

@zhxchen17 zhxchen17 force-pushed the zhxchen17/precompile/2 branch 2 times, most recently from a4662c8 to 2124e79 Compare September 4, 2025 20:06
@vadiklyutiy
Collaborator

@zhxchen17
Do I understand correctly that the goal is to avoid/deprecate the use of internal (and possibly unstable) torch interfaces that we currently rely on for compilation?

Contributor

@ilmarkov ilmarkov left a comment


Thank you for your work! Generally the PR looks good to me. I left some minor comments.

```python
aot_compilation_path = os.path.join(cache_dir, "model")
try:
    with open(aot_compilation_path, "rb") as f:
        aot_compiled_fn = torch.compiler.load_compiled_function(f)
```
Contributor

Is there internal verification of the CUDA version and hardware when torch.compile loads?

Contributor Author

@zhxchen17 zhxchen17 Sep 5, 2025


We haven't implemented it yet, since this feature is relatively new (which is why it starts as opt-in). Here is the list of guards we plan to implement for loading:

  • torch version
  • python version
  • cuda version
  • hardware
  • traced source files

Let me know if there's anything I'm missing, thanks.
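A guard check of that shape could look roughly like the following stdlib-only sketch (all names here are hypothetical; in a real implementation the torch version, CUDA version, and device name would be read from torch and recorded alongside the artifact):

```python
import platform

def collect_env_guards() -> dict:
    """Hypothetical guard snapshot saved next to an AOT artifact.

    A real implementation would also record torch.__version__,
    torch.version.cuda, and torch.cuda.get_device_name()."""
    return {
        "python": platform.python_version(),
        "machine": platform.machine(),
    }

def guards_match(saved: dict) -> bool:
    # Refuse to load the artifact if any recorded value has drifted.
    current = collect_env_guards()
    return all(current.get(k) == v for k, v in saved.items())

saved = collect_env_guards()             # written at aot_compile time
print(guards_match(saved))               # True in the same environment
print(guards_match({"python": "0.0"}))   # False: stale artifact
```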

Contributor Author

Hardware check landed on torch side pytorch/pytorch#162438

Contributor Author

@zhxchen17 Do I understand correctly that the goal is to avoid/deprecate the use of internal (and possibly unstable) torch interfaces that we currently rely on for compilation?

@vadiklyutiy Yes, that's part of our goal. Overall we are changing the usage of torch.compile from JIT mode to AOT mode, and the major benefit is reduced warm start time for torch.compile() (since we also skip dynamo on the second run). A side effect of this work should be a clearer and more stable boundary between the torch compiler and vLLM's custom backend.

@zhxchen17 zhxchen17 force-pushed the zhxchen17/precompile/2 branch 5 times, most recently from c35b8ae to c396bb7 Compare September 10, 2025 20:13
Collaborator

@zou3519 zou3519 left a comment


Looks good to me overall. I need a bit of time to think about how this fits into the CompilerManager and CompilerInterface abstractions. My initial gut reaction is that this is something completely separate.

Collaborator

@zou3519 zou3519 left a comment


Okay, I think I'm fine with the structure (this PR adds something different from CompilerManager/CompilerInterface), so mostly minor comments. We can always figure out the right abstraction for this (if it needs an abstraction) later.

Mostly some questions/comments about code reuse, the cache directory structure, and what exactly aot_compile returns

@zhxchen17 zhxchen17 requested a review from zou3519 September 11, 2025 17:06
@zhxchen17 zhxchen17 force-pushed the zhxchen17/precompile/2 branch from c396bb7 to beccd65 Compare September 11, 2025 21:22
@zhxchen17 zhxchen17 force-pushed the zhxchen17/precompile/2 branch from beccd65 to 73b971e Compare September 15, 2025 19:47
Contributor Author

Benchmark result (regarding cold start vs warm start):

Test environment: NVIDIA 8xB200 node, PyTorch 2.10 main branch + CUDA 12.9
Test script: https://gist.github.com/zhxchen17/75ad6c2576794607ee2cd2ff6e421b9e

nvidia/Llama-3.3-70B-Instruct-FP8 (TP=2)

Cold start (VLLM_USE_AOT_COMPILE=1): [monitor.py:34] torch.compile takes 62.86 s in total
Cold start (VLLM_USE_AOT_COMPILE=0): [monitor.py:32] torch.compile takes 69.35 s in total
Warm start (VLLM_USE_AOT_COMPILE=1): [monitor.py:34] torch.compile takes 10.68 s in total
Warm start (VLLM_USE_AOT_COMPILE=0): [monitor.py:34] torch.compile takes 17.61 s in total

Qwen/Qwen3-32B

Cold start (VLLM_USE_AOT_COMPILE=1): [monitor.py:34] torch.compile takes 37.62 s in total
Cold start (VLLM_USE_AOT_COMPILE=0): [monitor.py:32] torch.compile takes 37.64 s in total
Warm start (VLLM_USE_AOT_COMPILE=1): [monitor.py:34] torch.compile takes 8.09 s in total
Warm start (VLLM_USE_AOT_COMPILE=0): [monitor.py:34] torch.compile takes 10.86 s in total

deepseek-ai/DeepSeek-V3.1 (TP=8)

Cold start (VLLM_USE_AOT_COMPILE=1): [monitor.py:34] torch.compile takes 73.92 s in total
Cold start (VLLM_USE_AOT_COMPILE=0): [monitor.py:32] torch.compile takes 67.44 s in total
Warm start (VLLM_USE_AOT_COMPILE=1): [monitor.py:34] torch.compile takes 7.38 s in total
Warm start (VLLM_USE_AOT_COMPILE=0): [monitor.py:34] torch.compile takes 11.54 s in total

openai/gpt-oss-120b (TP=2)

Cold start (VLLM_USE_AOT_COMPILE=1): [monitor.py:34] torch.compile takes 33.51 s in total
Cold start (VLLM_USE_AOT_COMPILE=0): [monitor.py:32] torch.compile takes 37.27 s in total
Warm start (VLLM_USE_AOT_COMPILE=1): [monitor.py:34] torch.compile takes 3.53 s in total
Warm start (VLLM_USE_AOT_COMPILE=0): [monitor.py:34] torch.compile takes 5.73 s in total

zai-org/GLM-4.5-Air (TP=2)

Cold start (VLLM_USE_AOT_COMPILE=1): [monitor.py:34] torch.compile takes 41.60 s in total
Cold start (VLLM_USE_AOT_COMPILE=0): [monitor.py:32] torch.compile takes 45.74 s in total
Warm start (VLLM_USE_AOT_COMPILE=1): [monitor.py:34] torch.compile takes 4.69 s in total
Warm start (VLLM_USE_AOT_COMPILE=0): [monitor.py:34] torch.compile takes 7.49 s in total

Contributor Author

Updates:

  • Enabled VLLM_USE_AOT_COMPILE=1 for torch>=2.10
  • Added benchmarks comparing cold start and warm start.
  • Rebased on main

@zhxchen17 zhxchen17 force-pushed the zhxchen17/precompile/2 branch from 6d27095 to 955e518 Compare October 6, 2025 15:45
Contributor Author

rebased

Collaborator

@ProExpertProg ProExpertProg left a comment


LGTM overall!


```python
m.setenv("VLLM_USE_AOT_COMPILE", "1")
torch._dynamo.reset()
with use_vllm_config(vllm_config), torch.compiler.set_stance(
```
Collaborator

why duplicate use_vllm_config?

Contributor Author

nice catch. I think it's by accident and I will remove this.

```python
torch._dynamo.reset()
with use_vllm_config(vllm_config), torch.compiler.set_stance(
        "fail_on_recompile"):
    actual = CompiledMod(vllm_config=vllm_config)(*args)
```
Collaborator

Why doesn't this fail - where does the compiled code come from? Does the previous run that raised a recompile error create it? Or does it come from the cache?

Contributor Author

I think the name of the API torch.compiler.set_stance('fail_on_recompile') is the source of confusion here. Basically torch.compile() now has two modes: JIT and AOT. torch.compiler.set_stance('fail_on_recompile') means torch.compile() will fail when we recompile in JIT mode.

Here, by setting VLLM_USE_AOT_COMPILE=1, we're testing that torch.compile() JIT mode is not triggered. We're not testing the loading behavior in this unit test yet (we'll test the loading part in the following tests). In other words, we are just testing that we're using the correct AOT compile API from torch.

I think it would be possible to address this by naming the API something like set_stance("fail_on_new_cache_entry") or better, but the behavior here is just about JIT vs AOT.



```python
class VllmSerializableFunction(SerializableCallable):
    """
```
Collaborator

Can we add some more comments on what this does, not just why it's needed? Devs will navigate here from the use site and should be informed that this is mostly a wrapper around graph_module, so they can skip through it if they're not interested in the serialization.

```python
state["example_inputs"] = GraphPickler.loads(state["example_inputs"], fake_mode)
vllm_backend = VllmBackend(get_current_vllm_config(), state["prefix"])

def optimized_call(*example_inputs):
```
Collaborator

The control flow here seems a bit complex, could we add a comment or two?

@zhxchen17 zhxchen17 force-pushed the zhxchen17/precompile/2 branch from 955e518 to de876f4 Compare October 10, 2025 16:05
Contributor Author

Added comments and rebased.

@zou3519 zou3519 added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 10, 2025
@zhxchen17 zhxchen17 force-pushed the zhxchen17/precompile/2 branch from de876f4 to 7935089 Compare October 10, 2025 16:35
@zou3519 zou3519 enabled auto-merge (squash) October 10, 2025 16:43
auto-merge was automatically disabled October 10, 2025 18:49

Head branch was pushed to by a user without write access

@zhxchen17 zhxchen17 force-pushed the zhxchen17/precompile/2 branch from 31bca0d to df40fe6 Compare October 10, 2025 18:49
@zhxchen17 zhxchen17 force-pushed the zhxchen17/precompile/2 branch from df40fe6 to aba7a85 Compare October 10, 2025 19:16
@zou3519 zou3519 merged commit eef921f into vllm-project:main Oct 10, 2025
46 checks passed
Dhruvilbhatt pushed a commit to Dhruvilbhatt/vllm that referenced this pull request Oct 14, 2025
```python
def use_aot_compile() -> bool:
    from vllm.utils import is_torch_equal_or_newer

    default_value = "1" if is_torch_equal_or_newer("2.10.0.dev") else "0"
```
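For illustration, the version gate can be mimicked with a stdlib-only stand-in for is_torch_equal_or_newer (the parsing here is a rough hypothetical; the real helper lives in vllm.utils):

```python
def is_version_equal_or_newer(version: str, target: str) -> bool:
    # Compare dotted release components numerically, stopping at suffixes
    # such as ".dev" (rough stand-in for vllm.utils.is_torch_equal_or_newer).
    def key(v):
        parts = []
        for p in v.split("."):
            digits = "".join(ch for ch in p if ch.isdigit())
            if not digits:
                break
            parts.append(int(digits))
        return parts
    return key(version) >= key(target)

print(is_version_equal_or_newer("2.10.0", "2.10.0.dev"))  # True
print(is_version_equal_or_newer("2.9.1", "2.10.0.dev"))   # False
```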
Contributor

@zou3519 From this line, it makes sense that the logic here runs on PyTorch CI when we test against the PyTorch main branch. So, there are a couple of failures there https://github.com/pytorch/pytorch/actions/runs/18522236183/job/52791051622 blocking the vLLM commit pin update.

Collaborator

cc @zhxchen17 these tests are failing on PyTorch main, can you take a look please?

Contributor Author

Sure I can take a look @zou3519 @huydhn

Contributor Author

pytorch/pytorch#165702 should partially fix the test failures on main branch. I will do a full test on test_basic_correctness and report back.

Contributor

@huydhn huydhn Oct 21, 2025

If you need to test your fix on the PyTorch side, you could bump the pinned vLLM commit we have at https://github.com/pytorch/pytorch/blob/main/.github/ci_commit_pins/vllm.txt to a recent one from your PR, then add ciflow/vllm to run the tests on vLLM x PyTorch main.

Contributor Author

Thanks. I found 3 issues on the vLLM side and am working on fixing them:
#27285
#27288
#27350

Once they are merged (each should be a minor change of around 1-2 lines), I will bump the vLLM pin to the latest and send a PyTorch PR with the label ciflow/vllm.

Contributor Author

All fixes have landed for now. Trying to update the pin with pytorch/pytorch#166494.

bbartels pushed a commit to bbartels/vllm that referenced this pull request Oct 16, 2025
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025

Labels

ci/build ready ONLY add when PR is ready to merge/full CI is needed


6 participants