Hi,
Thanks for your great work! I can use model_quant.py to quantize large models successfully in both 'realquant' and 'pseudoquant' modes. However, when I ran inference through the transformers library (using the code from the README example), I got a ValueError: "ValueError: Only 'mxfp4' is supported for forward_dtype for now.". This confused me, since it seems the NVFP4 kernels are already supported. If I remove the raise-ValueError logic in the transformers library and patch fp_quant.py as below, it works fine. So is NVFP4 inference actually supported, or does the issue lie in the transformers library?
```python
def adapt_fp_quant_config(config: FPQuantConfig):
    if config.forward_dtype == "mxfp4":
        forward_dtype = FPQuantDtype.MXFP4
    else:
        # Original code raised here; replaced with a warning so NVFP4 passes through:
        # raise ValueError(f"Unsupported forward dtype: {config.forward_dtype}")
        print(f"Warning: Unsupported forward dtype: {config.forward_dtype}, using NVFP4 instead.")
        forward_dtype = FPQuantDtype.NVFP4

    if config.backward_dtype == "bf16":
        backward_dtype = FPQuantDtype.BF16
    else:
        raise ValueError(f"Unsupported backward dtype: {config.backward_dtype}")

    return FPQuantLinearConfig(
        forward_dtype=forward_dtype,
        forward_method=config.forward_method,
        backward_dtype=backward_dtype,
        store_master_weights=config.store_master_weights,
        hadamard_group_size=config.hadamard_group_size,
        pseudoquantization=config.pseudoquantization,
        modules_to_not_convert=config.modules_to_not_convert,
    )
```
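For what it's worth, if NVFP4 really is supported by the kernels, a cleaner fix than treating every unknown dtype as NVFP4 might be an explicit mapping, so that genuinely unsupported dtypes still raise. This is just a sketch reusing the same names as the patch above (FPQuantConfig, FPQuantDtype); I haven't verified it against the library internals:

```python
# Explicit string -> dtype mapping; anything not listed still raises.
_FORWARD_DTYPES = {
    "mxfp4": FPQuantDtype.MXFP4,
    "nvfp4": FPQuantDtype.NVFP4,  # only valid if the NVFP4 kernels are truly supported
}

def resolve_forward_dtype(config: FPQuantConfig) -> FPQuantDtype:
    try:
        return _FORWARD_DTYPES[config.forward_dtype]
    except KeyError:
        raise ValueError(f"Unsupported forward dtype: {config.forward_dtype}")
```

With this, adapt_fp_quant_config could just call resolve_forward_dtype(config) instead of the if/else, and adding a new dtype would be a one-line change to the mapping.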