Update on "[ET-VK] Introduce generic export pass for fusing Q/DQ nodes"
## Context
When a model is quantized with the PT2E quantization flow, quantize/dequantize nodes are inserted into the graph. However, these nodes must be fused with operators such as `aten.linear.default` to produce nodes corresponding to quantized operators (e.g. `weight_int8pack_mm`); otherwise the quantized operator implementations are never called at runtime.
Currently, op fusion is done by the `fuse_dequant_linear.py` pass. However, that pass handles only one specific fusion pattern, which generates a `weight_int8pack_mm` operator. As ET-VK adds support for more quantized operators via the PT2E quantization flow, it needs a more generic fusion pass that can handle a variety of fusion patterns.
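To illustrate what "generic" means here, the following is a toy sketch in plain Python, not the ExecuTorch implementation. The `Node` class, the `FUSION_PATTERNS` registry, `register_fusion`, and `fuse_quantized_ops` are all hypothetical names invented for this example, and the graph is a simple list rather than a `torch.fx` graph. It only shows the general idea: each fusion pattern is registered against an anchor op, so supporting a new quantized operator means adding a pattern rather than writing a new pass.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a graph node: a name, an op, and input names."""
    name: str
    op: str
    inputs: List[str] = field(default_factory=list)


# Hypothetical registry: anchor op -> function that tries to fuse that node.
FusionFn = Callable[[Node, Dict[str, Node]], Optional[Node]]
FUSION_PATTERNS: Dict[str, FusionFn] = {}


def register_fusion(anchor_op: str) -> Callable[[FusionFn], FusionFn]:
    def wrap(fn: FusionFn) -> FusionFn:
        FUSION_PATTERNS[anchor_op] = fn
        return fn
    return wrap


@register_fusion("aten.linear.default")
def fuse_dequant_linear(node: Node, producers: Dict[str, Node]) -> Optional[Node]:
    # Pattern: linear(x, dequantize(w_q, scales)) -> weight_int8pack_mm(x, w_q, scales)
    if len(node.inputs) < 2:
        return None
    weight = producers.get(node.inputs[1])
    if weight is None or weight.op != "dequantize":
        return None  # Pattern does not match; leave the node unchanged.
    return Node(node.name, "weight_int8pack_mm", [node.inputs[0], *weight.inputs])


def fuse_quantized_ops(graph: List[Node]) -> List[Node]:
    """Apply every registered pattern once, then drop nodes that became unused."""
    producers = {n.name: n for n in graph}
    fused = []
    for n in graph:
        fn = FUSION_PATTERNS.get(n.op)
        replacement = fn(n, producers) if fn is not None else None
        fused.append(replacement if replacement is not None else n)
    # Keep placeholders, the final (output) node, and anything still consumed.
    used = {name for n in fused for name in n.inputs}
    output_name = fused[-1].name
    return [
        n for n in fused
        if n.op == "placeholder" or n.name == output_name or n.name in used
    ]


graph = [
    Node("x", "placeholder"),
    Node("w_q", "placeholder"),
    Node("scales", "placeholder"),
    Node("w_dq", "dequantize", ["w_q", "scales"]),
    Node("out", "aten.linear.default", ["x", "w_dq"]),
]
result = fuse_quantized_ops(graph)
```

In this sketch the `dequantize` node disappears because nothing consumes it after fusion, and the linear node is rewritten to take the int8 weight and scales directly. A real pass would also need to handle details omitted here, such as bias inputs and checking that the dequantized weight has no other users.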
## Changes
* Introduce the `FuseQuantizedOpsTransform()` pass. I elected to introduce a new pass under the `backends/vulkan/_passes` directory, rather than modify the existing pass, because I anticipate that the majority of the fusion patterns will be specific to ET-VK.
* Remove the existing `FuseDequantLinearPass()`.
* Switch to using the `FuseQuantizedOpsTransform` pass instead of the old `FuseDequantLinear` pass.
* Add the `test_vulkan_passes` Python test to test export passes.
* Make some small refactors to the `test_vulkan_delegate` Python test to improve code organization.
Differential Revision: [D73794042](https://our.internmc.facebook.com/intern/diff/D73794042/)
[ghstack-poisoned]