Describe the issue
Inference fails on the NVIDIA DGX Spark. The failure does not reproduce on other platforms (e.g. A6000, A100, B100, AGX Thor), so it appears to be specific to this particular new platform.
To reproduce
- Set up the environment/container for the build
- Generate a simple Add model (attachments: add.onnx, generate_add_model.py; a sketch of the generator script is shown after this step)
pip install onnx==1.18.0
python3 generate_add_model.py
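The attached generate_add_model.py is not inlined above; as a minimal sketch (assuming a single Add node over two float inputs named "A" and "B" — the actual attachment may differ in names, shapes, and opset), it could look like this:

```python
# Hypothetical sketch of generate_add_model.py: build a graph with a single
# Add node and save it as add.onnx. Names, shapes, and opset are assumptions.
import onnx
from onnx import helper, TensorProto

a = helper.make_tensor_value_info("A", TensorProto.FLOAT, [4])
b = helper.make_tensor_value_info("B", TensorProto.FLOAT, [4])
c = helper.make_tensor_value_info("C", TensorProto.FLOAT, [4])

add_node = helper.make_node("Add", inputs=["A", "B"], outputs=["C"])
graph = helper.make_graph([add_node], "add_graph", [a, b], [c])
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 17)])

onnx.checker.check_model(model)
onnx.save(model, "add.onnx")
```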
- Clone the repository at branch rel-1.23.0
git clone -b rel-1.23.0 --recursive https://github.com/microsoft/onnxruntime.git onnxruntime
- Build command (used in Triton onnxruntime_backend)
./build.sh --config Release --skip_submodule_sync --parallel --build_shared_lib --compile_no_warning_as_error --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES='80-real;86-real;90-real;100-real;110-real;120' --cmake_extra_defines CMAKE_POLICY_VERSION_MINIMUM=3.5 --update --build --use_cuda --cuda_home "/usr/local/cuda" --cudnn_home "/usr" --allow_running_as_root
- Compile and run infer_add.cpp (a sketch of such a program is included after the error output below)
g++ infer_add.cpp -I<ORT_INCLUDE_DIR> -L<ORT_LIB_DIR> -lonnxruntime -o infer_add -std=c++17 -DUSE_CUDA
LD_LIBRARY_PATH=<ORT_LIB_DIR>:$LD_LIBRARY_PATH ./infer_add
You should see the following error:
2025-09-19 05:33:20.001722449 [E:onnxruntime:, sequential_executor.cc:572 ExecuteKernel] Non-zero status code returned while running Add node. Name:'' Status Message: CUDA error cudaErrorSymbolNotFound:named symbol not found
ONNX Runtime error: 1: Non-zero status code returned while running Add node. Name:'' Status Message: CUDA error cudaErrorSymbolNotFound:named symbol not found
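The attached infer_add.cpp is also not inlined; the sketch below shows a minimal program of that shape, assuming the input/output names from the model sketch above ("A", "B", "C") and registering the CUDA execution provider when built with -DUSE_CUDA (the actual attachment may differ):

```cpp
// Hypothetical sketch of infer_add.cpp: run add.onnx through the CUDA EP.
// Tensor names and shapes are assumptions matching the model sketch above.
#include <onnxruntime_cxx_api.h>
#include <iostream>
#include <vector>

int main() {
  try {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "infer_add");
    Ort::SessionOptions opts;
#ifdef USE_CUDA
    OrtCUDAProviderOptions cuda_options{};           // device 0, default options
    opts.AppendExecutionProvider_CUDA(cuda_options); // register the CUDA EP
#endif
    Ort::Session session(env, "add.onnx", opts);

    std::vector<float> a{1.f, 2.f, 3.f, 4.f};
    std::vector<float> b{10.f, 20.f, 30.f, 40.f};
    std::vector<int64_t> shape{4};

    Ort::MemoryInfo mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    std::vector<Ort::Value> inputs;
    inputs.push_back(Ort::Value::CreateTensor<float>(mem, a.data(), a.size(),
                                                     shape.data(), shape.size()));
    inputs.push_back(Ort::Value::CreateTensor<float>(mem, b.data(), b.size(),
                                                     shape.data(), shape.size()));

    const char* input_names[] = {"A", "B"};
    const char* output_names[] = {"C"};
    auto outputs = session.Run(Ort::RunOptions{nullptr}, input_names, inputs.data(),
                               inputs.size(), output_names, 1);

    const float* c = outputs[0].GetTensorData<float>();
    for (size_t i = 0; i < a.size(); ++i) std::cout << c[i] << " ";
    std::cout << std::endl;
  } catch (const Ort::Exception& e) {
    std::cerr << "ONNX Runtime error: " << e.GetOrtErrorCode() << ": " << e.what() << std::endl;
    return 1;
  }
  return 0;
}
```

With the single Add node, the kernel executes on the GPU, which is where the cudaErrorSymbolNotFound error above is raised.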
Urgency
Urgent. It is blocking our 25.08 and 25.09 releases.
Platform
Linux
OS Version
24.04
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.23.0
ONNX Runtime API
C++
Architecture
ARM64
Execution Provider
CUDA
Execution Provider Library Version
13.0
Labels: ep:CUDA (issues related to the CUDA execution provider)