When exporting the MobileBert model with XNNPACK quantization (via the `--quantize` flag), the Android app encounters a fatal assertion error during inference. The error occurs only with the quantized model; the non-quantized FP32 model works normally. Adjusting `eps` values (e.g., `2**-8` or `2**-6`) in the quantization configuration did not resolve the issue.
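For context on the `eps` experiments: in min/max-style observers, the scale is typically derived from the observed activation range and floored by `eps`, so raising `eps` only changes scales that would otherwise be smaller than that floor. A minimal sketch (my own illustration, not the actual quantizer code; `derive_scale` is a hypothetical helper):

```python
# Hypothetical sketch of how an `eps` floor interacts with a derived
# quantization scale (illustrative only, not an ExecuTorch API).

def derive_scale(obs_min, obs_max, eps, q_levels=255):
    """Int8 scale from an observed activation range, floored by eps."""
    raw = (obs_max - obs_min) / q_levels
    return max(raw, eps)

# A near-constant activation gives a tiny raw scale, so the floor wins:
print(derive_scale(-0.001, 0.001, eps=2**-8))  # 0.00390625 (= 2**-8)
print(derive_scale(-0.001, 0.001, eps=2**-6))  # 0.015625   (= 2**-6)
```

This also suggests why changing `eps` alone may not help: if the problematic scales are already above the floor, `eps` never enters the computation.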
I believe this assertion arises with quantized add when the ratio of the input scale to the output scale is too large, specifically when `in_scale / out_scale > 256`. I believe the check exists to prevent clipping caused by badly skewed quantization values, and I have seen the quantizer produce such scales occasionally.

Do you mind trying to increase the number of samples used for calibration here:
`executorch/examples/xnnpack/quantization/utils.py` (line 26)
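The constraint described above can be sketched as follows (my paraphrase of the reported XNNPACK behavior, not its actual source; the bounds and helper names are assumptions based on this thread):

```python
# Sketch of the described constraint: quantized add rejects
# input/output scale ratios outside roughly [2**-10, 2**8).

def add_scale_ratio_ok(in_scale, out_scale):
    ratio = in_scale / out_scale
    return 2**-10 <= ratio < 2**8

def requantize(q, in_scale, out_scale, q_min=-128, q_max=127):
    """Move an int8 value from the input scale to the output scale."""
    q_out = round(q * in_scale / out_scale)
    return max(q_min, min(q_max, q_out))  # saturate to int8

print(add_scale_ratio_ok(0.5, 0.5 / 300.0))  # False: ratio 300 > 256
# With such a ratio, even the smallest nonzero input saturates int8,
# which is presumably the clipping the assertion guards against:
print(requantize(1, 0.5, 0.5 / 300.0))       # 127 (true value is 300)
```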
Description
After exporting the unquantized MobileBERT model using the command:
```
python -m examples.xnnpack.aot_compiler \
  --model_name="mobilebert" \
  --delegate
```
the deployment and invocation in the Android app work correctly.
Steps to Reproduce
```
python -m examples.xnnpack.aot_compiler \
  --model_name="mobilebert" \
  --quantize \
  --delegate
```
Observed Behavior
- Inference aborts with a fatal XNNPACK assertion (the failing check requires `scale >= 2^-10`).
- Adjusting `eps` values in the quantization configuration (e.g., `2**-8`, `2**-6`) did not resolve the error.

Expected Behavior
Quantized model should execute without triggering XNNPACK assertion errors, similar to the FP32 model.
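Following the calibration suggestion in the comment above, here is a minimal sketch (my own illustration, not the ExecuTorch quantizer) of why more calibration samples can help: min/max calibration derives the scale from the observed range, so too few samples can under-estimate the range, producing a tiny output scale and hence a large `in_scale / out_scale` ratio at a quantized add.

```python
import random

# Hypothetical min/max calibration sketch: the derived scale grows
# with the observed range, and a subset of samples can only observe
# a range at most as wide as the full calibration set.

def minmax_scale(samples, eps=2**-12, q_levels=255):
    lo, hi = min(samples), max(samples)
    return max((hi - lo) / q_levels, eps)

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(10_000)]

few = minmax_scale(data[:4])   # only 4 calibration samples
many = minmax_scale(data)      # full calibration set
# More samples observe a wider range, hence a larger (more
# representative) scale, which shrinks the in/out scale ratio:
assert many >= few
```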
Environment
cc @digantdesai @mcr229 @cbilgin @mergennachin @cccclai @helunwencser @jackzhxng