[Performance] Onnx session utilizes more GPU and CPU ram on Nvidia H100 than on Nvidia A100 #24543

Open
razor182 opened this issue Apr 25, 2025 · 0 comments
Labels
performance issues related to performance regressions

@razor182

Describe the issue

Hi everyone.

I have a model which consists of 4 onnx files.

  1. When I load it on an Nvidia A100, RAM utilization is: CPU - 4.3 GB, GPU - 800 MB.
  2. When I load it on an Nvidia H100, RAM utilization is: CPU - 6.7 GB, GPU - 1854 MB.

I use the TensorrtExecutionProvider. The CUDA, onnxruntime, and TensorRT versions are the same on both machines.

What is the reason for such a difference? Is there any way to reduce memory utilization on the H100? A sketch of the provider options I am considering is below.
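
For reference, this is a hedged sketch of the settings I'm aware of that can cap memory with this provider. The option names follow the onnxruntime TensorRT EP documentation; the workspace cap, cache path, and file name are illustrative placeholders:

```python
import onnxruntime as ort

providers = [
    ("TensorrtExecutionProvider", {
        "device_id": 0,
        # Cap the scratch memory TensorRT may use while building engines.
        "trt_max_workspace_size": 1 << 30,  # 1 GB, illustrative value
        # Cache built engines on disk so they are not rebuilt (and
        # re-allocated) on every process start.
        "trt_engine_cache_enable": True,
        "trt_engine_cache_path": "./trt_cache",
    }),
    "CUDAExecutionProvider",  # fallback for nodes TensorRT cannot handle
]

so = ort.SessionOptions()
so.enable_cpu_mem_arena = False  # trades some speed for lower host RAM

session = ort.InferenceSession("part1.onnx", so, providers=providers)
```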

To reproduce
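
A minimal sketch of the setup described above, assuming one session per file; the file names are placeholders for the four onnx files, and psutil is used only to read the process RSS:

```python
import os

import onnxruntime as ort
import psutil

# Placeholder names for the four onnx files that make up the model.
MODEL_FILES = ["part1.onnx", "part2.onnx", "part3.onnx", "part4.onnx"]

# One session per file, all on the TensorRT execution provider.
sessions = [
    ort.InferenceSession(path, providers=["TensorrtExecutionProvider"])
    for path in MODEL_FILES
]

# Host RAM used by this process once all sessions are created.
rss_gb = psutil.Process(os.getpid()).memory_info().rss / 1024**3
print(f"CPU RAM after session creation: {rss_gb:.1f} GB")
```

Run the same script on the A100 and the H100 host and compare the printed RSS with the GPU memory shown by nvidia-smi for the process.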

Urgency

No response

Platform

Linux

OS Version

RHEL8

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.18.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

TensorRT

Execution Provider Library Version

CUDA 11.8 / CUDA 12.4

Model File

No response

Is this a quantized model?

No

razor182 added the performance label on Apr 25, 2025