Hi,
Thanks for your great work! I can use model_quant.py to quantize large models successfully in both 'realquant' and 'pseudoquant' modes. However, when I ran inference through the transformers library (using the code from the README example), I got a ValueError: "ValueError: Only 'mxfp4' is supported for forward_dtype for now.". This confused me, since it seems the NVFP4 kernels are already supported. If I remove the raise-ValueError logic in the transformers library and patch fp_quant.py as below, it works fine. So is NVFP4 inference actually supported, or does the issue lie in the transformers library?
```python
def adapt_fp_quant_config(config: FPQuantConfig):
    if config.forward_dtype == "mxfp4":
        forward_dtype = FPQuantDtype.MXFP4
    else:
        # Original code raised here; replaced with a warning so NVFP4 passes through:
        # raise ValueError(f"Unsupported forward dtype: {config.forward_dtype}")
        print(f"Warning: Unsupported forward dtype: {config.forward_dtype}, using NVFP4 instead.")
        forward_dtype = FPQuantDtype.NVFP4

    if config.backward_dtype == "bf16":
        backward_dtype = FPQuantDtype.BF16
    else:
        raise ValueError(f"Unsupported backward dtype: {config.backward_dtype}")

    return FPQuantLinearConfig(
        forward_dtype=forward_dtype,
        forward_method=config.forward_method,
        backward_dtype=backward_dtype,
        store_master_weights=config.store_master_weights,
        hadamard_group_size=config.hadamard_group_size,
        pseudoquantization=config.pseudoquantization,
        modules_to_not_convert=config.modules_to_not_convert,
    )
```
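For what it's worth, if NVFP4 really is supported by the kernels, a cleaner fix than treating every unknown dtype as NVFP4 might be an explicit mapping, so that genuinely unsupported dtypes still raise. This is just a sketch reusing the same names as the patch above (FPQuantConfig, FPQuantDtype); I haven't verified it against the library internals:

```python
# Explicit string -> dtype mapping; anything not listed still raises.
_FORWARD_DTYPES = {
    "mxfp4": FPQuantDtype.MXFP4,
    "nvfp4": FPQuantDtype.NVFP4,  # only valid if the NVFP4 kernels are truly supported
}

def resolve_forward_dtype(config: FPQuantConfig) -> FPQuantDtype:
    try:
        return _FORWARD_DTYPES[config.forward_dtype]
    except KeyError:
        raise ValueError(f"Unsupported forward dtype: {config.forward_dtype}")
```

With this, adapt_fp_quant_config could just call resolve_forward_dtype(config) instead of the if/else, and adding a new dtype would be a one-line change to the mapping.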