
Conversation

tianleiwu
Contributor

@tianleiwu tianleiwu commented Oct 3, 2025

Users with RTX 5090 GPUs are experiencing runtime errors when using onnxruntime-gpu:

```
[ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Slice node.
Name:'Slice_34' Status Message: CUDA error cudaErrorNoKernelImageForDevice:
no kernel image is available for execution on the device
```

This occurs because the RTX 5090 uses CUDA compute capability 12.0 (SM 12.0), while onnxruntime-gpu 1.23 was built with `90a-virtual`. The `90a` architecture is a specialized, non-forward-compatible variant of the Hopper architecture, which makes it incompatible with future GPU generations such as Blackwell.

This change reverts `90a-virtual` back to `90-virtual`, as used in 1.22, which restores compatibility with Blackwell GPUs.

FPA_INTB_GEMM remains disabled by default; it needs some extra work to be compatible with the `90-virtual` (no `90a-real`) configuration.
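The forward-compatibility rule above can be sketched as a toy model (illustrative Python only, not ONNX Runtime or CUDA toolkit code; the arch lists are hypothetical simplifications of what a wheel embeds):

```python
def kernel_image_available(embedded_archs, device_cc):
    """Model of CUDA kernel-image selection (simplified, illustrative).

    embedded_archs: list of ("real" | "virtual", arch) pairs, e.g.
        [("virtual", "90a")]; "virtual" means PTX, "real" means SASS.
    device_cc: device compute capability as a string, e.g. "120" for SM 12.0.

    Simplified rules: arch-specific ('a') images run only on that exact
    architecture; a virtual (PTX) image for compute_NN can be JIT-compiled
    on any device with capability >= NN; real (SASS) images here require
    an exact match.
    """
    for kind, arch in embedded_archs:
        if arch.endswith("a"):
            if device_cc == arch:
                return True  # 'a' variants: exact architecture only
        elif kind == "virtual" and int(device_cc.rstrip("a")) >= int(arch):
            return True      # PTX JIT forward compatibility
        elif kind == "real" and device_cc == arch:
            return True      # exact SASS match
    return False

# A build carrying only 90a-virtual has no usable image for SM 12.0:
print(kernel_image_available([("virtual", "90a")], "120"))  # False
# With 90-virtual, the PTX can be JIT-compiled on Blackwell:
print(kernel_image_available([("virtual", "90")], "120"))   # True
```

This is why the error above is `cudaErrorNoKernelImageForDevice`: neither a matching SASS image nor JIT-compilable PTX exists in the `90a-virtual` build for an SM 12.0 device.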

Related:
#26002
#26226
#26181

@hariharans29
Member

Just curious - What is the binary size impact for this ?

@tianleiwu tianleiwu marked this pull request as draft October 3, 2025 19:52
@tianleiwu
Contributor Author

Just curious - What is the binary size impact for this ?

The binary will be smaller (for example, the Linux wheel shrinks by about 4 MB).

@tianleiwu tianleiwu marked this pull request as ready for review October 5, 2025 20:18
@tianleiwu tianleiwu closed this Oct 7, 2025
@tianleiwu tianleiwu reopened this Oct 7, 2025
@tianleiwu tianleiwu merged commit 11b23ad into main Oct 7, 2025
154 of 170 checks passed
@tianleiwu tianleiwu deleted the tlwu/90_virtual branch October 7, 2025 22:17
@JulienMaille
Contributor

Does this affect both the CUDA backend and the DML backend, or only the former?

@snnn
Member

snnn commented Oct 9, 2025

Sorry, it did not make it into the 1.23.1 patch release.

apsonawane pushed a commit that referenced this pull request Oct 17, 2025
apsonawane pushed a commit that referenced this pull request Oct 20, 2025
apsonawane added a commit that referenced this pull request Oct 21, 2025
Adds the following commits to the release-1.23.2 branch for ORT 1.23.2:

- [TensorRT] Fix DDS output bug during engine update
  - PR: #26272
  - commit id: 00e85dd
- Fix shape inference failure with in-memory external data
  - PR: #26263
  - commit id: d955476
- [CUDA] replace 90a-virtual by 90-virtual for forward compatible 
  - PR: #26230
  - commit id: b58911f
- [QNN-EP] Fix logic flow bug
  - PR: #26148
  - commit id: b282379
- Internal Dupe of #25255 - [MLAS] Optimize MlasConv using thread partition opt
  - PR: #26103
  - commit id: 7362518
- Update qMoE spec to support block quantization
  - PR: #25641
  - commit id: 7a8ffa8
- [VitisAI] add new api to VitisAI to save graph as a string
  - PR: #25602
  - commit id: 3361d72
- [Build] Lock torch, onnxscript and onnx-ir versions to latest
  - PR: #26315
  - commit id: ea69c4d

---------

Co-authored-by: Hariharan Seshadri <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Edward Chen <[email protected]>
Co-authored-by: Yateng Hong <[email protected]>
Co-authored-by: Changming Sun <[email protected]>
Co-authored-by: Dmitri Smirnov <[email protected]>
Co-authored-by: Tianlei Wu <[email protected]>
Co-authored-by: quic-calvnguy <[email protected]>
Co-authored-by: quic_calvnguy <quic_calvnguy@quic_inc.com>
Co-authored-by: yifei410 <[email protected]>
Co-authored-by: yifei <[email protected]>
@apsonawane
Contributor

Cherry-picked for 1.23.2. Removing the release tag and adding the cherry-pick tag.

@apsonawane apsonawane added cherry-picked Cherry-picked for a cherrypicks branch and removed release:1.23.2 labels Oct 21, 2025