Great paper, and thanks for open-sourcing the code.
A couple questions:
- Is the benchmarking code from Section 4 of the paper available (GEMM, `FastFP16toInt8`)?
- In the per-group `W4A8` kernel, why is an additional channel-wise scale factor needed in `FusedDequantQuant`? That is, the `Int4` weights are dequantized to `FP16` using group-wise scale factors, then quantized to `Int8` using an additional channel-wise scale, and finally fed to the `Int8` GEMM. In contrast, in the channel-wise `W4A8` kernel, the `Int4` weights are converted directly to `Int8` and then fed to the `Int8` GEMM. (Both paths are sketched below.)
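To make sure I'm reading the pipeline correctly, here is a minimal NumPy sketch of my understanding of the two paths. The shapes and the names `K`, `N`, `G`, `s_group`, `s_chan` are my own illustrative assumptions, and this simulates the arithmetic in floating point; it is not the repo's actual `FusedDequantQuant` CUDA kernel.

```python
import numpy as np

# Illustrative sizes (assumptions, not the repo's defaults):
# K = in-features, N = out-features, G = quantization group size.
K, N, G = 128, 64, 32

rng = np.random.default_rng(0)
w_int4 = rng.integers(-8, 8, size=(K, N)).astype(np.int8)      # unpacked Int4 weight values
s_group = rng.random((K // G, N)).astype(np.float16) + 0.5     # group-wise FP16 scales

# Per-group path: Int4 -> FP16 via group scales, then -> Int8 via an
# additional per-output-channel scale, then Int8 GEMM.
w_fp16 = w_int4.astype(np.float16) * np.repeat(s_group, G, axis=0)
s_chan = np.abs(w_fp16).max(axis=0).astype(np.float32) / 127.0  # channel-wise requant scale
w_int8 = np.clip(np.round(w_fp16.astype(np.float32) / s_chan), -127, 127).astype(np.int8)

# Channel-wise path: the Int4 values are reinterpreted as Int8 directly,
# since a single per-channel scale already covers the whole column.
w_int8_cw = w_int4.astype(np.int8)

# Int8 GEMM accumulating in Int32, with the channel scale applied to the
# output (a real kernel would also fold in the activation scale).
x_int8 = rng.integers(-127, 128, size=(4, K)).astype(np.int8)
acc = x_int8.astype(np.int32) @ w_int8.astype(np.int32)
y = acc.astype(np.float32) * s_chan
```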