Conversation

@ajrasane (Contributor)
What does this PR do?

Type of change:
Example update

Overview:

  • Support ONNX export for FP8 and INT8 precisions
  • Added utility functions to check whether a model is FP8- or INT8-quantized (to be used in ONNXExporter)
  • Fixed a bug in the evaluation API for large batch sizes
  • Added a function that replaces zeros in the quantization scales with the smallest positive FP16 value
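The zero-scale replacement in the last bullet can be sketched as follows. The function name `clamp_zero_scales_fp16` and the choice of the smallest positive normal FP16 value are illustrative assumptions, not the PR's actual implementation:

```python
import numpy as np

def clamp_zero_scales_fp16(scales: np.ndarray) -> np.ndarray:
    """Replace zero entries in a quantization scale tensor with the
    smallest positive normal FP16 value (~6.1e-5), so dequantization
    never divides by zero after the scales are cast to FP16."""
    fp16_tiny = np.finfo(np.float16).tiny  # smallest positive normal fp16
    out = scales.astype(np.float16)
    out[out == 0] = fp16_tiny
    return out
```

Replacing exact zeros this way avoids divide-by-zero (and the resulting NaNs/Infs) when a consumer dequantizes with `x / scale` in FP16.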

Usage

python torch_quant_to_onnx.py \
    --quantize_mode <fp8|int8> \
    --onnx_save_path <onnx_path>

Testing

Validated the accuracy and latency of the INT8 and FP8 models:

Metric               INT8         FP8
Top-1 Accuracy       84.584%      85.062%
Top-5 Accuracy       97.3%        97.534%
Inference Latency    8.4825 ms    8.15096 ms

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: Yes
  • Did you update Changelog?: No

@ajrasane ajrasane requested review from a team as code owners November 21, 2025 00:12
Signed-off-by: ajrasane <[email protected]>
Review thread on the diff excerpt:

        return False


    def is_int8_quantized(model: nn.Module) -> bool:
Collaborator:

do we still consider mixed precision here?

Contributor (Author):

This PR targets models quantized with only FP8 or INT8. For mixed precision, I will update the corresponding function to use this utility.
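For illustration, a presence-based check like the one under review could look like the sketch below. `FakeQuantizer` and the `num_bits`-attribute test are assumptions made to keep the example runnable, not ModelOpt's actual code. Note that a check like this also returns True for mixed-precision models, since it only looks for at least one INT8 quantizer:

```python
import torch.nn as nn

class FakeQuantizer(nn.Module):
    """Stand-in for a quantizer submodule carrying a num_bits attribute.
    (Hypothetical; a real quantized model would contain its library's
    quantizer modules instead.)"""
    def __init__(self, num_bits: int):
        super().__init__()
        self.num_bits = num_bits

def is_int8_quantized(model: nn.Module) -> bool:
    """Return True if any submodule looks like an 8-bit integer quantizer.
    This is satisfied by mixed-precision models too, because it only
    requires the presence of at least one INT8 quantizer."""
    return any(getattr(m, "num_bits", None) == 8 for m in model.modules())
```

A mixed-precision-aware variant would instead collect the precisions of all quantizers and compare the full set, rather than stopping at the first match.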

@codecov

codecov bot commented Nov 21, 2025

Codecov Report

❌ Patch coverage is 13.04348% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.39%. Comparing base (1aaa77d) to head (6d35c45).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
modelopt/onnx/quantization/qdq_utils.py 8.33% 11 Missing ⚠️
modelopt/torch/_deploy/utils/torch_onnx.py 18.18% 9 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #594      +/-   ##
==========================================
- Coverage   74.45%   74.39%   -0.07%     
==========================================
  Files         182      182              
  Lines       18250    18273      +23     
==========================================
+ Hits        13588    13594       +6     
- Misses       4662     4679      +17     

☔ View full report in Codecov by Sentry.