Add NPU (Ascend) backend support for INT4 weight-only quantization workflow (#3172)
* Add NPU (Ascend) backend support for INT4 weight-only quantization workflow
* use torch.ops.npu prefix and drop redundant torch_npu import
* Modify test file and update comments
* add: merge NPU (Ascend) backend logic into Int4PlainInt32Tensor subclass
* ruff format cleanup, replace error types, add torch version check
* add torch_npu version assertion and show downstream testing result
* add downstream testing result
* unify NPU and XPU test cases into a single class
* move CI display to quantization README and update test file
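Taken together, these commits route torchao's existing int4 weight-only path through the Ascend backend via the Int4PlainInt32Tensor subclass. Below is a minimal sketch of the resulting user-facing workflow; it assumes torchao's public `quantize_`/`Int4WeightOnlyConfig` API and a `torch_npu` runtime that registers the `"npu"` device, and is not code from this PR's diff:

```python
# Minimal sketch of the INT4 weight-only workflow on Ascend NPU (assumed API).
import torch
import torch_npu  # assumption: registers the "npu" device with PyTorch
from torchao.quantization import Int4WeightOnlyConfig, quantize_

# The PR adds torch/torch_npu version assertions; this mirrors that idea.
# torch.npu is provided by torch_npu, analogous to torch.cuda.
assert torch.npu.is_available(), "Ascend NPU runtime not available"

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to(torch.bfloat16).to("npu")

# Replace Linear weights in place with int4 weight-only quantized tensors
# (backed on NPU by the Int4PlainInt32Tensor subclass named in the commits).
quantize_(model, Int4WeightOnlyConfig(group_size=128))

x = torch.randn(1, 1024, dtype=torch.bfloat16, device="npu")
out = model(x)
```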
From the quantization README diff (the note reformatted into a list):

```diff
- Note: The quantization error incurred by applying int4 quantization to your model can be fairly significant, so using external techniques like GPTQ may be necessary to obtain a usable model.
+ Note:
+ - The quantization error incurred by applying int4 quantization to your model can be fairly significant, so using external techniques like GPTQ may be necessary to obtain a usable model.
```
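One rough way to gauge the int4 quantization error this note warns about is to compare float and quantized outputs on the same input. The sketch below is illustrative, not code from the PR; the `"npu"` device and `torch_npu` import are assumptions about the Ascend runtime this PR targets (CUDA/XPU would work the same way):

```python
# Compare a float linear layer against its int4 weight-only copy.
import copy
import torch
import torch_npu  # assumption: registers the "npu" device
from torchao.quantization import Int4WeightOnlyConfig, quantize_

float_model = torch.nn.Linear(1024, 1024).to(torch.bfloat16).to("npu")
int4_model = copy.deepcopy(float_model)
quantize_(int4_model, Int4WeightOnlyConfig(group_size=128))

x = torch.randn(8, 1024, dtype=torch.bfloat16, device="npu")
ref, q = float_model(x), int4_model(x)
rel_err = (ref - q).abs().mean() / ref.abs().mean()
print(f"mean relative error: {rel_err.item():.4f}")  # often non-trivial at int4
```

If the error is unacceptable for your model, calibration-based methods such as GPTQ, as the note suggests, can recover much of the lost accuracy.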