Quantized MobileBert Model Fails with XNNPACK Assertion Error on Android Deployment #8994


Open
qxuan512 opened this issue Mar 6, 2025 · 1 comment
Labels
module: xnnpack Issues related to xnnpack delegation and the code under backends/xnnpack/

Comments


qxuan512 commented Mar 6, 2025

Description

When exporting the MobileBert model with XNNPACK quantization (via the --quantize flag), the Android app hits a fatal assertion error during inference:

executorch/backends/xnnpack/third-party/XNNPACK/src/microparams-init.c:1001: 
size_t xnn_init_qs8_add_minmax_scalar_params(...): 
assertion "abs_a_output_scale >= 0x1.0p-10f" failed

The error occurs only with the quantized model; the non-quantized FP32 model works normally. Adjusting the eps value in the quantization configuration (e.g., to 2**-8 or 2**-6) did not resolve the issue.

After exporting the unquantized MobileBERT model using the command:

python -m examples.xnnpack.aot_compiler \
    --model_name="mobilebert" \
    --delegate

the deployment and invocation in the Android app work correctly.

Steps to Reproduce

  1. Export Quantized Model:
python -m examples.xnnpack.aot_compiler \
    --model_name="mobilebert" \
    --quantize \
    --delegate
  2. Deploy to Android:
    // Model loading
    Module mModule = Module.load(assetFilePath(context, "mobilebert_xnnpack_q8.pte"));
    Map<String, Integer> vocab = BertUtils.loadVocab(context);
    FullTokenizer tokenizer = new FullTokenizer(vocab, true);

    // Inference; `inputs` holds the tensors built from the tokenizer output
    EValue[] outputs = mModule.forward(new EValue[]{
        EValue.from(inputs[0]),  // input_ids
        EValue.from(inputs[1])   // attention_mask
    });
    Tensor outputTensor = outputs[0].toTensor();
  3. Run inference with sample inputs.

Observed Behavior

  • The quantized model triggers an XNNPACK assertion failure indicating a violation of the minimum scale requirement (scale >= 2^-10).
  • The error persists after modifying the eps value in the quantization configuration (see the sketch after this list):
    extra_args: dict[str, Any] = {"eps": 2**-12}  # Original
    extra_args: dict[str, Any] = {"eps": 2**-8}   # Attempted fix 1
    extra_args: dict[str, Any] = {"eps": 2**-6}   # Attempted fix 2

Expected Behavior
The quantized model should execute without triggering XNNPACK assertion errors, just as the FP32 model does.

Environment

  • ExecuTorch Version: 0.5.0a0+1bc0699
  • Android NDK Version: r26b
  • Devices: OnePlus 13 and SA8255P

cc @digantdesai @mcr229 @cbilgin @mergennachin @cccclai @helunwencser @jackzhxng

mergennachin added the labels module: xnnpack (Issues related to xnnpack delegation and the code under backends/xnnpack/) and module: llm (Issues related to LLM examples and apps, and to the extensions/llm/ code) on Mar 12, 2025
github-project-automation bot moved this to To triage in ExecuTorch Core on Mar 12, 2025
mergennachin removed the module: llm label on Mar 12, 2025

mcr229 (Contributor) commented Mar 12, 2025

I believe this assertion arises with quantized add when the ratio of the input scale to the output scale is too extreme, specifically when in_scale / out_scale > 256. I believe the check exists to prevent clipping caused by badly mismatched quantization scales, and I have occasionally seen the quantizer produce such scales.
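
As a concrete illustration, here is a hypothetical Python mirror of the failing check. Only the 2**-10 bound is taken from the assertion message above; the function name and structure are mine, not the actual XNNPACK source.

    # Hypothetical mirror of the assertion in microparams-init.c; only the
    # 2**-10 bound comes from the error message in this issue.
    def check_qs8_add_input_scale(input_scale: float, output_scale: float) -> None:
        a_output_scale = input_scale / output_scale
        if not abs(a_output_scale) >= 2**-10:
            raise AssertionError(
                f"abs_a_output_scale {a_output_scale:.3g} < 2**-10: input and "
                "output scales are too far apart for quantized add"
            )

    # Example: a tiny, eps-dominated input scale against a much larger
    # output scale trips the bound.
    check_qs8_add_input_scale(input_scale=2**-12, output_scale=4.0)  # raises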

Do you mind trying to increase the number of samples used for calibration here:
executorch/examples/xnnpack/quantization/utils.py (line 26)
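
For reference, a minimal sketch of what a larger calibration pass looks like in the PT2E flow. The exact capture API and import paths vary across torch/ExecuTorch versions, and model, example_inputs, and calibration_batches are placeholders, not names from this repo.

    import torch
    from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
    from torch.ao.quantization.quantizer.xnnpack_quantizer import (
        XNNPACKQuantizer,
        get_symmetric_quantization_config,
    )

    quantizer = XNNPACKQuantizer()
    quantizer.set_global(get_symmetric_quantization_config())

    # Capture the model for quantization (the capture helper differs across
    # releases; adjust to the API your version uses).
    graph = torch.export.export_for_training(model, example_inputs).module()
    prepared = prepare_pt2e(graph, quantizer)

    # Run many representative batches through the prepared model so the
    # observers see realistic activation ranges before scales are frozen;
    # too few samples can leave degenerate scales like the ones behind
    # this assertion.
    for batch in calibration_batches:  # placeholder iterable of input tuples
        prepared(*batch)

    quantized = convert_pt2e(prepared)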
