Closed
Labels
feature request, good first issue, quantization
Description
🚀 The feature, motivation and pitch
#16850 enabled Marlin for the MoE layers in fp8.py. We should likewise enable FP8 Marlin MoE for compressed-tensors models, so that users can run them on older hardware.
In practice, this means taking the changes made to the MoE method in fp8.py (https://github.com/vllm-project/vllm/pull/16850/files#diff-5511bfcc9c53f7d96517ad43e4087f6777bef21302da983f42cafae40a866644) and applying them to CompressedTensorsW8A8Fp8MoEMethod.
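For illustration only, the backend choice this request implies could look roughly like the sketch below. It is not vLLM's actual code: `select_fp8_moe_backend` and both constants are hypothetical names, and the thresholds are assumptions (native FP8 kernels from compute capability 8.9, Marlin from 8.0). The real changes are the ones in #16850.

```python
# Hypothetical sketch; these names are NOT part of vLLM's API.
# Assumption: native FP8 kernels need compute capability 8.9 or newer.
NATIVE_FP8_MIN_CAPABILITY = 89
# Assumption: the Marlin kernels need compute capability 8.0 or newer.
MARLIN_MIN_CAPABILITY = 80


def select_fp8_moe_backend(capability: int, force_marlin: bool = False) -> str:
    """Pick a backend for an FP8 MoE layer.

    `capability` is the compute capability encoded as major * 10 + minor,
    for example 80 for 8.0 and 89 for 8.9.
    """
    if capability < MARLIN_MIN_CAPABILITY:
        raise ValueError(
            f"compute capability {capability} is below the assumed Marlin minimum"
        )
    # Fall back to Marlin when native FP8 is unavailable, or when forced
    # (for example, to test the fallback path on newer hardware).
    if force_marlin or capability < NATIVE_FP8_MIN_CAPABILITY:
        return "marlin"
    return "native"
```

With these assumptions, a device reporting 8.0 would take the Marlin path, while 8.9 and newer would keep the native FP8 path unless Marlin is forced.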
Alternatives
No response
Additional context
No response