Pull Request resolved: #11479
# Context
Currently, the CPU implementation of the quantization operators (`quantize_per_token`, `quantize_per_tensor`, and `quantize_per_channel`) does not support half (fp16) input scalar types. To align with the PyTorch implementation, which accepts fp16 and bf16 inputs, this diff enables half input dtype support for the quantization operators. We will be comparing this implementation against the Vulkan operators.
# Changes
As defined in the ExecuTorch [scalar_type_util.h](https://github.com/pytorch/executorch/blob/053686242c1687f0d51b3bb8befd14b047d7b025/runtime/core/exec_aten/util/scalar_type_util.h#L190) header, support can be enabled simply by changing which preprocessor macro the kernels expand to `ET_FORALL_FLOATH_TYPES`, which covers Half (fp16), Float (fp32), and Double (fp64).
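
As an illustrative sketch only (the names `FORALL_FLOAT_TYPES`, `FORALL_FLOATH_TYPES`, `DType`, and the toy `quantize_per_tensor` below are local simplifications, not the actual ExecuTorch definitions), the idea is that these X-macros stamp out one dispatch case per dtype in their list, so widening the list from a float-only macro to one that also includes Half is what makes fp16 inputs accepted:

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Placeholder fp16 type, standing in for the ExecuTorch Half type.
struct Half {
  float value;  // a real half stores 16 bits; float storage keeps the sketch simple
  operator float() const { return value; }
};

// Simplified stand-ins for the X-macros in scalar_type_util.h.
#define FORALL_FLOAT_TYPES(_) \
  _(float, Float)             \
  _(double, Double)

// Adding Half to the list is, in essence, what switching a kernel's
// dispatch to ET_FORALL_FLOATH_TYPES accomplishes.
#define FORALL_FLOATH_TYPES(_) \
  _(Half, Half)                \
  _(float, Float)              \
  _(double, Double)

enum class DType { Half, Float, Double };

// Toy per-tensor quantize kernel: one switch case is stamped out per
// supported input dtype by expanding the X-macro.
void quantize_per_tensor(const void* in, DType in_dtype, size_t n,
                         double scale, int64_t zero_point, int8_t* out) {
  switch (in_dtype) {
#define CASE(ctype, name)                                            \
  case DType::name: {                                                \
    const ctype* data = static_cast<const ctype*>(in);               \
    for (size_t i = 0; i < n; ++i) {                                 \
      int64_t q = std::lround(static_cast<double>(data[i]) / scale)  \
                  + zero_point;                                      \
      q = q < -128 ? -128 : (q > 127 ? 127 : q);                     \
      out[i] = static_cast<int8_t>(q);                               \
    }                                                                \
    break;                                                           \
  }
    FORALL_FLOATH_TYPES(CASE)  // swap in FORALL_FLOAT_TYPES and Half is no longer handled
#undef CASE
  }
}

int main() {
  Half in[3] = {{0.5f}, {1.0f}, {-0.25f}};
  int8_t out[3];
  quantize_per_tensor(in, DType::Half, 3, /*scale=*/0.05, /*zero_point=*/0, out);
  std::printf("%d %d %d\n", out[0], out[1], out[2]);  // prints: 10 20 -5
}
```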
I have also added more comprehensive testing of the input dtypes, including double coverage, which did not exist before. Instead of only confirming that all output dtypes are supported, the tests now also check that all input dtypes are supported.
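
A minimal sketch of that testing idea, assuming GoogleTest and a locally defined reference quantizer (`quantize_one` is hypothetical and stands in for calling the real ExecuTorch kernel; Half is omitted here because it needs the ExecuTorch header): the same correctness check runs once per input dtype, including double.

```cpp
#include <gtest/gtest.h>
#include <algorithm>
#include <cmath>
#include <cstdint>

// Local reference quantizer used only for illustration; the real tests
// exercise the ExecuTorch quantize kernels under test.
template <typename T>
int8_t quantize_one(T value, double scale, int64_t zero_point) {
  int64_t q = std::lround(static_cast<double>(value) / scale) + zero_point;
  q = std::min<int64_t>(127, std::max<int64_t>(-128, q));
  return static_cast<int8_t>(q);
}

template <typename T>
class QuantizeInputDtypeTest : public ::testing::Test {};

// Half would be added to this list as well via the ExecuTorch Half type.
using FloatingInputTypes = ::testing::Types<float, double>;
TYPED_TEST_SUITE(QuantizeInputDtypeTest, FloatingInputTypes);

TYPED_TEST(QuantizeInputDtypeTest, QuantizesAcrossInputDtypes) {
  const double scale = 0.1;
  const int64_t zero_point = 5;
  EXPECT_EQ(quantize_one<TypeParam>(TypeParam(1.0), scale, zero_point), 15);
  EXPECT_EQ(quantize_one<TypeParam>(TypeParam(-0.5), scale, zero_point), 0);
}
```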
ghstack-source-id: 290376481
@exported-using-ghexport
Differential Revision: [D76053764](https://our.internmc.facebook.com/intern/diff/D76053764/)