[Float8] add non-decomposed version of quantize/dequantize ops for fp8 #2961
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2961
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 85614a4 with merge base 18dbe87.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
jerryzh168
left a comment
Seems OK to me; wondering if @vkuzo has additional thoughts. Not sure if there is a better alternative here to support preserving the ops for CPU.
@vkuzo Could you help review this PR?
Is this urgent? Vasiliy is not available at the moment and will be back next week.
Thanks for letting me know. Not urgent; we can wait until he is back next week.
@vkuzo Could you help review this PR?
vkuzo
left a comment
We should keep CUDA and CPU logic consistent; device is supposed to be orthogonal to quantization workflows.
I'd recommend a flag named around something like AOTI or pt2e (cc @jerryzh168 for the right name) to control whether the quant/dequant ops get decomposed or not.
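For illustration, a flag-based design along these lines might look like the minimal sketch below. The flag name, its location, and the function body are hypothetical, not existing torchao API:

```python
import torch

# Hypothetical global switch (name and location are assumptions); an
# export/PT2E entry point, not end users, would be expected to set it.
FP8_USE_NON_DECOMPOSED_QDQ = False

def quantize_affine_float8(x: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Illustrative routing between the two implementations.
    if FP8_USE_NON_DECOMPOSED_QDQ:
        # Would dispatch to a single registered op that torch.export
        # preserves as one graph node (see the registration sketch below).
        raise NotImplementedError("non-decomposed path sketched further down")
    # Decomposed path: plain aten math that compilers trace through and fuse.
    finfo = torch.finfo(torch.float8_e4m3fn)
    return (x / scale).clamp(finfo.min, finfo.max).to(torch.float8_e4m3fn)
```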
Hi @vkuzo Thanks for your suggestion. May I know what kind of flag you were talking about? A global flag, or an argument passed to quantization APIs?
Hi @vkuzo @jerryzh168 Could you share a little more about the design? Thanks.
I think the decision to decompose or not should be static. If we want consistent behavior for the same op across CUDA and CPU, it might be better to have separate ops, I feel.
Hi @jerryzh168 Do you think it would be better to have a non-decomposed and a decomposed version of the op rather than a CPU and a CUDA version? We did a similar thing here: https://github.com/pytorch/pytorch/blob/df4ebddbe0fa2306fb8acd09b20265046d968c10/torch/ao/quantization/fx/_decomposed.py#L1206
Yeah, just a different op seems to be the only alternative here.
Created quantize_affine_float8_non_decomposed and dequantize_affine_float8_non_decomposed separately for the non-decomposed path.
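As a rough sketch of how such ops can be registered so that they survive export as single nodes, using torch.library.custom_op. The `mylib` namespace, the simplified signatures, and the float8_e4m3fn dtype choice are illustrative assumptions, not the exact code in this PR:

```python
import torch
from torch.library import custom_op

# Sketch only: the real PR registers these in torchao's own op namespace;
# "mylib" and the simplified signatures here are illustrative.
@custom_op("mylib::quantize_affine_float8_non_decomposed", mutates_args=())
def quantize_affine_float8_non_decomposed(
    x: torch.Tensor, scale: torch.Tensor
) -> torch.Tensor:
    # Same math as the decomposed path, wrapped in one opaque op.
    finfo = torch.finfo(torch.float8_e4m3fn)
    return (x / scale).clamp(finfo.min, finfo.max).to(torch.float8_e4m3fn)

@quantize_affine_float8_non_decomposed.register_fake
def _(x, scale):
    # Shape/dtype propagation so torch.export can trace without real kernels.
    return torch.empty_like(x, dtype=torch.float8_e4m3fn)

@custom_op("mylib::dequantize_affine_float8_non_decomposed", mutates_args=())
def dequantize_affine_float8_non_decomposed(
    x: torch.Tensor, scale: torch.Tensor
) -> torch.Tensor:
    return x.to(torch.float32) * scale

@dequantize_affine_float8_non_decomposed.register_fake
def _(x, scale):
    return torch.empty_like(x, dtype=torch.float32)
```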
LGTM. cc @vkuzo, can you take a look again?
jerryzh168
left a comment
looks good to me
Fixes #2896
What we want to do is enable FP8 quantization in PyTorch. Similar to INT8 quantization, this requires inserting quantize and dequantize operations into the computational graph. In order to reuse the INT8 pattern-matching logic, we need to register FP8 quant and dequant ops.
To address this, we attempted to register the ops in #2379, but that PR was reverted in #2672 because it caused a performance regression on H100 GPUs. Besides, there is no need to register q/dq ops on CUDA.
For these reasons, I register the quant/dequant ops specifically for CPU.
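Continuing the hypothetical `mylib` registration sketched earlier in the thread, the effect can be seen with torch.export: each non-decomposed op stays a single node in the graph, which is what an INT8-style pattern matcher needs to anchor on.

```python
import torch

class QDQ(torch.nn.Module):
    def forward(self, x, scale):
        # Each call remains one node in the exported graph instead of
        # being traced into div/clamp/to primitives.
        q = torch.ops.mylib.quantize_affine_float8_non_decomposed(x, scale)
        return torch.ops.mylib.dequantize_affine_float8_non_decomposed(q, scale)

ep = torch.export.export(QDQ(), (torch.randn(8, 8), torch.tensor(0.05)))
print(ep.graph)  # the q/dq ops show up as single call_function nodes
```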