Great paper, and thanks for open-sourcing the code.
A couple questions:
- Is the benchmarking code from Section 4 of the paper available (GEMM, `FastFP16toInt8`)?
- In the per-group `W4A8` kernel, why is an additional channel-wise scale factor needed in `FusedDequantQuant`? That is, the `Int4` weights are dequantized to `FP16` using group-wise scale factors, then quantized to `Int8` using an additional channel-wise scale, and finally fed to the `Int8` GEMM. In contrast, in the channel-wise `W4A8` kernel, the `Int4` weights are converted directly to `Int8` and then fed to the `Int8` GEMM. (Both paths are sketched below.)
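To make sure I'm reading the pipeline correctly, here is a minimal NumPy sketch of my understanding of the two paths. The shapes and the names `K`, `N`, `G`, `s_group`, `s_chan` are my own illustrative assumptions, and this simulates the arithmetic in floating point; it is not the repo's actual `FusedDequantQuant` CUDA kernel.

```python
import numpy as np

# Illustrative sizes (assumptions, not the repo's defaults):
# K = in-features, N = out-features, G = quantization group size.
K, N, G = 128, 64, 32

rng = np.random.default_rng(0)
w_int4 = rng.integers(-8, 8, size=(K, N)).astype(np.int8)      # unpacked Int4 weight values
s_group = rng.random((K // G, N)).astype(np.float16) + 0.5     # group-wise FP16 scales

# Per-group path: Int4 -> FP16 via group scales, then -> Int8 via an
# additional per-output-channel scale, then Int8 GEMM.
w_fp16 = w_int4.astype(np.float16) * np.repeat(s_group, G, axis=0)
s_chan = np.abs(w_fp16).max(axis=0).astype(np.float32) / 127.0  # channel-wise requant scale
w_int8 = np.clip(np.round(w_fp16.astype(np.float32) / s_chan), -127, 127).astype(np.int8)

# Channel-wise path: the Int4 values are reinterpreted as Int8 directly,
# since a single per-channel scale already covers the whole column.
w_int8_cw = w_int4.astype(np.int8)

# Int8 GEMM accumulating in Int32, with the channel scale applied to the
# output (a real kernel would also fold in the activation scale).
x_int8 = rng.integers(-127, 128, size=(4, K)).astype(np.int8)
acc = x_int8.astype(np.int32) @ w_int8.astype(np.int32)
y = acc.astype(np.float32) * s_chan
```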