
Conversation

tianleiwu
Contributor

@tianleiwu tianleiwu commented Oct 3, 2025

Users with RTX 5090 GPUs are experiencing runtime errors when using onnxruntime-gpu:

```
[ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Slice node.
Name:'Slice_34' Status Message: CUDA error cudaErrorNoKernelImageForDevice:
no kernel image is available for execution on the device
```

This occurs because the RTX 5090 uses CUDA compute capability 12.0 (SM 12.0), while onnxruntime-gpu 1.23 was built with `90a-virtual`. The `90a` architecture is a specialized, non-forward-compatible variant of the Hopper architecture, which makes it incompatible with future GPU generations such as Blackwell.

This change reverts `90a-virtual` back to `90-virtual`, as used in 1.22, which restores compatibility with Blackwell GPUs.

FPA_INTB_GEMM remains disabled by default; it needs some extra work to be compatible with the `90-virtual` (no `90a-real`) configuration.
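The forward-compatibility rule above can be sketched as a toy model (illustrative Python only, not ONNX Runtime or CUDA toolkit code; the arch lists are hypothetical simplifications of what a wheel embeds):

```python
def kernel_image_available(embedded_archs, device_cc):
    """Model of CUDA kernel-image selection (simplified, illustrative).

    embedded_archs: list of ("real" | "virtual", arch) pairs, e.g.
        [("virtual", "90a")]; "virtual" means PTX, "real" means SASS.
    device_cc: device compute capability as a string, e.g. "120" for SM 12.0.

    Simplified rules: arch-specific ('a') images run only on that exact
    architecture; a virtual (PTX) image for compute_NN can be JIT-compiled
    on any device with capability >= NN; real (SASS) images here require
    an exact match.
    """
    for kind, arch in embedded_archs:
        if arch.endswith("a"):
            if device_cc == arch:
                return True  # 'a' variants: exact architecture only
        elif kind == "virtual" and int(device_cc.rstrip("a")) >= int(arch):
            return True      # PTX JIT forward compatibility
        elif kind == "real" and device_cc == arch:
            return True      # exact SASS match
    return False

# A build carrying only 90a-virtual has no usable image for SM 12.0:
print(kernel_image_available([("virtual", "90a")], "120"))  # False
# With 90-virtual, the PTX can be JIT-compiled on Blackwell:
print(kernel_image_available([("virtual", "90")], "120"))   # True
```

This is why the error above is `cudaErrorNoKernelImageForDevice`: neither a matching SASS image nor JIT-compilable PTX exists in the `90a-virtual` build for an SM 12.0 device.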

Related:
#26002
#26226
#26181

@hariharans29
Member

Just curious - What is the binary size impact for this ?

@tianleiwu tianleiwu marked this pull request as draft October 3, 2025 19:52
@tianleiwu
Contributor Author

Just curious - What is the binary size impact for this ?

The binary will be smaller (for example, the Linux wheel shrinks by about 4 MB).

@tianleiwu tianleiwu marked this pull request as ready for review October 5, 2025 20:18
@tianleiwu tianleiwu closed this Oct 7, 2025
@tianleiwu tianleiwu reopened this Oct 7, 2025
@tianleiwu tianleiwu merged commit 11b23ad into main Oct 7, 2025
154 of 170 checks passed
@tianleiwu tianleiwu deleted the tlwu/90_virtual branch October 7, 2025 22:17
@JulienMaille
Contributor

Does this affect both the CUDA backend and the DML backend, or only the former?

@snnn
Member

snnn commented Oct 9, 2025

Sorry, it did not make it into the 1.23.1 patch release.

apsonawane pushed a commit that referenced this pull request Oct 17, 2025
apsonawane pushed a commit that referenced this pull request Oct 20, 2025
apsonawane added a commit that referenced this pull request Oct 21, 2025
Adds the following commits to the release-1.23.2 branch for ORT 1.23.2:

- [TensorRT] Fix DDS output bug during engine update
  - PR: #26272
  - commit id: 00e85dd
- Fix shape inference failure with in-memory external data
  - PR: #26263
  - commit id: d955476
- [CUDA] replace 90a-virtual by 90-virtual for forward compatible 
  - PR: #26230
  - commit id: b58911f
- [QNN-EP] Fix logic flow bug
  - PR: #26148
  - commit id: b282379
- Internal Dupe of #25255 - [MLAS] Optimize MlasConv using thread partition opt
  - PR: #26103
  - commit id: 7362518
- Update qMoE spec to support block quantization
  - PR: #25641
  - commit id: 7a8ffa8
- [VitisAI] add new api to VitisAI to save graph as a string
  - PR: #25602
  - commit id: 3361d72
- [Build] Lock torch, onnxscript and onnx-ir versions to latest
  - PR: #26315
  - commit id: ea69c4d

---------

Co-authored-by: Hariharan Seshadri <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Edward Chen <[email protected]>
Co-authored-by: Yateng Hong <[email protected]>
Co-authored-by: Changming Sun <[email protected]>
Co-authored-by: Dmitri Smirnov <[email protected]>
Co-authored-by: Tianlei Wu <[email protected]>
Co-authored-by: quic-calvnguy <[email protected]>
Co-authored-by: quic_calvnguy <quic_calvnguy@quic_inc.com>
Co-authored-by: yifei410 <[email protected]>
Co-authored-by: yifei <[email protected]>
@apsonawane
Contributor

Cherry-picked for 1.23.2. Removing the release tag and adding the cherry-pick tag.

@apsonawane apsonawane added cherry-picked Cherry-picked for a cherrypicks branch and removed release:1.23.2 labels Oct 21, 2025