Quantized MobileBert Model Fails with XNNPACK Assertion Error on Android Deployment #8994


Open
qxuan512 opened this issue Mar 6, 2025 · 1 comment
Labels
module: xnnpack Issues related to xnnpack delegation and the code under backends/xnnpack/

Comments


qxuan512 commented Mar 6, 2025

Description

When exporting the MobileBert model with XNNPACK quantization (via the --quantize flag), the Android app hits a fatal assertion error during inference:

executorch/backends/xnnpack/third-party/XNNPACK/src/microparams-init.c:1001: 
size_t xnn_init_qs8_add_minmax_scalar_params(...): 
assertion "abs_a_output_scale >= 0x1.0p-10f" failed

The error occurs only with the quantized model; the non-quantized FP32 model works normally. Adjusting the eps value in the quantization configuration (e.g., to 2**-8 or 2**-6) did not resolve the issue.

After exporting the unquantized MobileBERT model using the command:

python -m examples.xnnpack.aot_compiler \
    --model_name="mobilebert" \
    --delegate

the deployment and invocation in the Android app work correctly.

Steps to Reproduce

  1. Export Quantized Model:
python -m examples.xnnpack.aot_compiler \
    --model_name="mobilebert" \
    --quantize \
    --delegate
  2. Deploy to Android:
    // Model loading
    Module mModule = Module.load(assetFilePath(context, "mobilebert_xnnpack_q8.pte"));
    Map<String, Integer> vocab = BertUtils.loadVocab(context);
    FullTokenizer tokenizer = new FullTokenizer(vocab, true);

    // Inference; `inputs` holds the tensors built from the tokenizer output
    EValue[] outputs = mModule.forward(new EValue[]{
        EValue.from(inputs[0]),  // input_ids
        EValue.from(inputs[1])   // attention_mask
    });
    Tensor outputTensor = outputs[0].toTensor();
  3. Run inference with sample inputs.

Observed Behavior

  • The quantized model triggers an XNNPACK assertion failure indicating a violation of the minimum scale requirement (scale >= 2^-10).
  • The error persists after modifying the eps value in the quantization configuration (see the sketch after this list):
    extra_args: dict[str, Any] = {"eps": 2**-12}  # Original
    extra_args: dict[str, Any] = {"eps": 2**-8}   # Attempted fix 1
    extra_args: dict[str, Any] = {"eps": 2**-6}   # Attempted fix 2

Expected Behavior
The quantized model should execute without triggering XNNPACK assertion errors, just as the FP32 model does.

Environment

  • ExecuTorch Version: 0.5.0a0+1bc0699
  • Android NDK Version: r26b
  • Devices: OnePlus 13 and SA8255P

cc @digantdesai @mcr229 @cbilgin @mergennachin @cccclai @helunwencser @jackzhxng

mergennachin added the labels module: xnnpack (Issues related to xnnpack delegation and the code under backends/xnnpack/) and module: llm (Issues related to LLM examples and apps, and to the extensions/llm/ code) on Mar 12, 2025
github-project-automation bot moved this to To triage in ExecuTorch Core on Mar 12, 2025
mergennachin removed the module: llm label on Mar 12, 2025

mcr229 (Contributor) commented Mar 12, 2025

I believe this assertion arises with quantized add when the ratio of the input scale to the output scale is too extreme, specifically when in_scale / out_scale > 256. I believe the check exists to prevent clipping caused by badly mismatched quantization scales, and I have occasionally seen the quantizer produce such scales.
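
As a concrete illustration, here is a hypothetical Python mirror of the failing check. Only the 2**-10 bound is taken from the assertion message above; the function name and structure are mine, not the actual XNNPACK source.

    # Hypothetical mirror of the assertion in microparams-init.c; only the
    # 2**-10 bound comes from the error message in this issue.
    def check_qs8_add_input_scale(input_scale: float, output_scale: float) -> None:
        a_output_scale = input_scale / output_scale
        if not abs(a_output_scale) >= 2**-10:
            raise AssertionError(
                f"abs_a_output_scale {a_output_scale:.3g} < 2**-10: input and "
                "output scales are too far apart for quantized add"
            )

    # Example: a tiny, eps-dominated input scale against a much larger
    # output scale trips the bound.
    check_qs8_add_input_scale(input_scale=2**-12, output_scale=4.0)  # raises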

Do you mind trying to increase the number of samples used for calibration here:
executorch/examples/xnnpack/quantization/utils.py (line 26)
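
For reference, a minimal sketch of what a larger calibration pass looks like in the PT2E flow. The exact capture API and import paths vary across torch/ExecuTorch versions, and model, example_inputs, and calibration_batches are placeholders, not names from this repo.

    import torch
    from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
    from torch.ao.quantization.quantizer.xnnpack_quantizer import (
        XNNPACKQuantizer,
        get_symmetric_quantization_config,
    )

    quantizer = XNNPACKQuantizer()
    quantizer.set_global(get_symmetric_quantization_config())

    # Capture the model for quantization (the capture helper differs across
    # releases; adjust to the API your version uses).
    graph = torch.export.export_for_training(model, example_inputs).module()
    prepared = prepare_pt2e(graph, quantizer)

    # Run many representative batches through the prepared model so the
    # observers see realistic activation ranges before scales are frozen;
    # too few samples can leave degenerate scales like the ones behind
    # this assertion.
    for batch in calibration_batches:  # placeholder iterable of input tuples
        prepared(*batch)

    quantized = convert_pt2e(prepared)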
