Is NVFP4 inference supported? #10

@ZhuYe0729

Description

Hi,

Thanks for your great work! I can use model_quant.py to 'realquant' or 'pseudoquant' large models successfully. However, when I used the transformers library for inference (following the README example), I got errors such as "ValueError: Only 'mxfp4' is supported for forward_dtype for now.". This confused me, since the NVFP4 kernels seem to be supported already. If I remove the raise-ValueError logic in the transformers library and change fp_quant.py as below, everything works well. So is NVFP4 inference actually supported, or does the issue lie in the transformers library?

def adapt_fp_quant_config(config: FPQuantConfig):
    if config.forward_dtype == "mxfp4":
        forward_dtype = FPQuantDtype.MXFP4
    elif config.forward_dtype == "nvfp4":
        # Previously: raise ValueError(f"Unsupported forward dtype: {config.forward_dtype}")
        forward_dtype = FPQuantDtype.NVFP4
    else:
        raise ValueError(f"Unsupported forward dtype: {config.forward_dtype}")

    if config.backward_dtype == "bf16":
        backward_dtype = FPQuantDtype.BF16
    else:
        raise ValueError(f"Unsupported backward dtype: {config.backward_dtype}")

    return FPQuantLinearConfig(
        forward_dtype=forward_dtype,
        forward_method=config.forward_method,
        backward_dtype=backward_dtype,
        store_master_weights=config.store_master_weights,
        hadamard_group_size=config.hadamard_group_size,
        pseudoquantization=config.pseudoquantization,
        modules_to_not_convert=config.modules_to_not_convert,
    )
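For reference, the dtype-mapping change can be exercised in isolation. The sketch below uses minimal stand-ins for `FPQuantDtype` and the config object (the real classes live in the `fp_quant` and `transformers` packages, so this only illustrates the mapping logic, not the actual library APIs):

```python
from dataclasses import dataclass
from enum import Enum

# Minimal stand-ins for the real fp_quant / transformers classes,
# used only to test the forward-dtype mapping in isolation.
class FPQuantDtype(Enum):
    MXFP4 = "mxfp4"
    NVFP4 = "nvfp4"
    BF16 = "bf16"

@dataclass
class FPQuantConfig:
    forward_dtype: str = "nvfp4"
    backward_dtype: str = "bf16"

def map_forward_dtype(forward_dtype: str) -> FPQuantDtype:
    # Same mapping as the patched adapt_fp_quant_config above:
    # accept both "mxfp4" and "nvfp4", and still raise on anything else.
    if forward_dtype == "mxfp4":
        return FPQuantDtype.MXFP4
    if forward_dtype == "nvfp4":
        return FPQuantDtype.NVFP4
    raise ValueError(f"Unsupported forward dtype: {forward_dtype}")

print(map_forward_dtype(FPQuantConfig().forward_dtype))  # FPQuantDtype.NVFP4
```

With the explicit `elif`, an unrecognized dtype (e.g. "fp8") still raises instead of silently falling through to NVFP4.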
