Conversation

@ajrasane (Contributor)
What does this PR do?

Type of change:
Example update

Overview:

  • Support ONNX export for FP8 and INT8 precisions
  • Added utility functions to check whether a model is FP8- or INT8-quantized (to be used in ONNXExporter)
  • Fixed a bug in the evaluation API for large batch sizes
  • Added a function that replaces zeros in the quantization scales with the smallest positive FP16 value
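The zero-scale replacement in the last bullet can be sketched as follows. The function name `clamp_zero_scales_fp16` and the choice of the smallest positive normal FP16 value are illustrative assumptions, not the PR's actual implementation:

```python
import numpy as np

def clamp_zero_scales_fp16(scales: np.ndarray) -> np.ndarray:
    """Replace zero entries in a quantization scale tensor with the
    smallest positive normal FP16 value (~6.1e-5), so dequantization
    never divides by zero after the scales are cast to FP16."""
    fp16_tiny = np.finfo(np.float16).tiny  # smallest positive normal fp16
    out = scales.astype(np.float16)
    out[out == 0] = fp16_tiny
    return out
```

Replacing exact zeros this way avoids divide-by-zero (and the resulting NaNs/Infs) when a consumer dequantizes with `x / scale` in FP16.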

Usage

python torch_quant_to_onnx.py \
    --quantize_mode <fp8|int8> \
    --onnx_save_path <onnx_path>

Testing

Validated the accuracy and latency of the INT8 and FP8 models:

Metric               INT8         FP8
Top-1 Accuracy       84.584%      85.062%
Top-5 Accuracy       97.3%        97.534%
Inference Latency    8.4825 ms    8.15096 ms

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: Yes
  • Did you update Changelog?: No

@ajrasane ajrasane requested review from a team as code owners November 21, 2025 00:12
Signed-off-by: ajrasane <[email protected]>
Review thread on the diff excerpt:

        return False


    def is_int8_quantized(model: nn.Module) -> bool:
Collaborator:

do we still consider mixed precision here?

Contributor (Author):

This PR targets models quantized with only FP8 or INT8. For mixed precision, I will update the corresponding function to use this utility.
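For illustration, a presence-based check like the one under review could look like the sketch below. `FakeQuantizer` and the `num_bits`-attribute test are assumptions made to keep the example runnable, not ModelOpt's actual code. Note that a check like this also returns True for mixed-precision models, since it only looks for at least one INT8 quantizer:

```python
import torch.nn as nn

class FakeQuantizer(nn.Module):
    """Stand-in for a quantizer submodule carrying a num_bits attribute.
    (Hypothetical; a real quantized model would contain its library's
    quantizer modules instead.)"""
    def __init__(self, num_bits: int):
        super().__init__()
        self.num_bits = num_bits

def is_int8_quantized(model: nn.Module) -> bool:
    """Return True if any submodule looks like an 8-bit integer quantizer.
    This is satisfied by mixed-precision models too, because it only
    requires the presence of at least one INT8 quantizer."""
    return any(getattr(m, "num_bits", None) == 8 for m in model.modules())
```

A mixed-precision-aware variant would instead collect the precisions of all quantizers and compare the full set, rather than stopping at the first match.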

@codecov

codecov bot commented Nov 21, 2025

Codecov Report

❌ Patch coverage is 13.04348% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.39%. Comparing base (1aaa77d) to head (6d35c45).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
modelopt/onnx/quantization/qdq_utils.py 8.33% 11 Missing ⚠️
modelopt/torch/_deploy/utils/torch_onnx.py 18.18% 9 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #594      +/-   ##
==========================================
- Coverage   74.45%   74.39%   -0.07%     
==========================================
  Files         182      182              
  Lines       18250    18273      +23     
==========================================
+ Hits        13588    13594       +6     
- Misses       4662     4679      +17     

☔ View full report in Codecov by Sentry.