Describe the issue
Hi everyone.
I have a model which consists of 4 ONNX files.
I use TensorrtExecutionProvider. The CUDA, ONNX Runtime, and TensorRT versions are the same.
What could be the reason for such a difference in memory utilization? Is there any way to reduce memory utilization on the H100?
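For context, here is a minimal sketch of the kind of setup being described. The model file names and the workspace cap value are assumptions, not taken from the original report; the TensorRT EP options shown (`trt_max_workspace_size`, `trt_engine_cache_enable`, `trt_engine_cache_path`) are documented provider options that commonly affect device memory use:

```python
# Minimal sketch, assuming one InferenceSession per ONNX file.
# File names and the workspace cap below are placeholder assumptions.
import onnxruntime as ort

trt_options = {
    "device_id": 0,
    # Cap the TensorRT builder workspace; lowering this can reduce
    # peak device memory at the cost of potentially slower kernels.
    "trt_max_workspace_size": 2 * 1024**3,
    # Cache built engines so repeated runs skip the memory-hungry build step.
    "trt_engine_cache_enable": True,
    "trt_engine_cache_path": "./trt_cache",
}

providers = [
    ("TensorrtExecutionProvider", trt_options),
    "CUDAExecutionProvider",  # fallback for nodes TensorRT cannot handle
]

# The model is split into 4 ONNX files, so create one session per file.
sessions = [
    ort.InferenceSession(f"model_part_{i}.onnx", providers=providers)
    for i in range(4)
]
```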
To reproduce
Urgency
No response
Platform
Linux
OS Version
RHEL8
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.18.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
TensorRT
Execution Provider Library Version
CUDA 11.8 / CUDA 12.4
Model File
No response
Is this a quantized model?
No