Motivation.
We are making incremental changes to enable Blackwell support in vLLM. This issue tracks all planned and in-progress items.
Planned or In Progress Features
The following items are either planned or currently in progress to enable vLLM support on Blackwell.
- **Enable NVFP4 Support** (illustrative serving sketch after this list)
  - (NVIDIA) Add functional support for NVFP4 kernels for linear layers
  - (NVIDIA) Add functional support for NVFP4 MoE kernels
  - (NVIDIA) Add model integration for nvidia/*-FP4 models
  - Finetune GEMM configurations for Blackwell
  - (NVIDIA) Optimize MoE for latency
  - (NVIDIA) Optimize MoE for throughput (FI: PR !1113)
  - (NVIDIA) MoE all-reduce fusion (FI: PR !1108)
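For context on how users would consume this work, here is a minimal sketch, assuming the NVFP4 kernels and nvidia/*-FP4 model integration above have landed. The checkpoint name is hypothetical and the `modelopt_fp4` quantization key is an assumption based on vLLM's ModelOpt integration, not the tracked implementation itself.

```python
# Minimal sketch: serving a hypothetical nvidia/*-FP4 checkpoint.
# Assumptions: the model name below and the "modelopt_fp4" quantization
# key (based on vLLM's ModelOpt integration); check the final docs.
from vllm import LLM, SamplingParams

llm = LLM(
    model="nvidia/Llama-3.1-8B-Instruct-FP4",  # hypothetical NVFP4 checkpoint
    quantization="modelopt_fp4",               # assumed quantization key
)
out = llm.generate(["Hello, Blackwell!"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```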
- **Optimize communication overlap ops** (toy overlap sketch after this list)
  - (NVIDIA) Enable NCCL's symmetric memory ([core] add nccl symmetric memory for all reduce, #24532)
  - (NVIDIA) Add support for GEMM + communication overlap
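The overlap idea itself is simple; below is a toy PyTorch sketch (not vLLM's implementation) that assumes an initialized NCCL process group: an asynchronous all-reduce is launched on one partial result while the next GEMM runs, so communication hides behind compute.

```python
# Toy GEMM + communication overlap (not vLLM's kernels). The async
# all-reduce proceeds on NCCL's stream while the second GEMM runs on
# the compute stream. Assumes dist.init_process_group("nccl") was done.
import torch
import torch.distributed as dist

def overlapped_step(x: torch.Tensor, w0: torch.Tensor, w1: torch.Tensor) -> torch.Tensor:
    y0 = x @ w0                                # first partial GEMM
    work = dist.all_reduce(y0, async_op=True)  # reduce y0 in the background
    y1 = x @ w1                                # overlapped second GEMM
    work.wait()                                # sync before consuming y0
    return y0 + y1
```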
- **Blackwell Attention Kernels** (backend-selection sketch after this list)
  - (NVIDIA) Integrate Cutlass MLA kernels ([NVIDIA] Add Cutlass MLA backend, #17625)
  - (NVIDIA) Integrate vLLM v1-compatible Blackwell prefill and decode GQA kernels (FI: PR !1051)
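Backend selection in vLLM goes through the `VLLM_ATTENTION_BACKEND` environment variable; the sketch below uses an assumed name for the Cutlass MLA backend from #17625, so treat the exact string as unconfirmed.

```python
# Hedged sketch: choosing an attention backend via vLLM's environment
# switch. "CUTLASS_MLA" is an assumed name for the backend added in
# #17625; check the PR for the exact string.
import os
os.environ["VLLM_ATTENTION_BACKEND"] = "CUTLASS_MLA"  # assumed backend name

from vllm import LLM
llm = LLM(model="deepseek-ai/DeepSeek-V2-Lite")  # an MLA model, for illustration
```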
- **FP8 Blockscale GEMM and MoE** (blockwise-quantization sketch after this list)
  - (NVIDIA) FP8 blockscale GEMM
  - (NVIDIA) FP8 blockscale GEMM optimizations (Sm100 blockwise fp8 swap ab, #18564)
  - (NVIDIA) FP8 blockscale MoE
  - (NVIDIA) Latency and throughput optimizations
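For intuition, "blockscale" FP8 means each weight block (for example a DeepSeek-style 128x128 tile) carries its own scale so the narrow e4m3 range is used well. The toy sketch below shows only the numerics, not the CUTLASS kernels tracked above.

```python
# Toy block-scaled FP8 numerics (not the tracked CUTLASS kernels).
# Assumes both weight dimensions are divisible by the block size.
import torch

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for e4m3

def quantize_blockwise(w: torch.Tensor, block: int = 128):
    n, k = w.shape
    wb = w.reshape(n // block, block, k // block, block)
    amax = wb.abs().amax(dim=(1, 3), keepdim=True)  # per-block absmax
    scale = (amax / FP8_MAX).clamp(min=1e-12)       # one scale per block
    q = (wb / scale).to(torch.float8_e4m3fn)        # quantized blocks
    return q.reshape(n, k), scale.squeeze(1).squeeze(2)
```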
- **MTP support** (Multi-Token Prediction; hedged sketch after this item)
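MTP here is DeepSeek-style Multi-Token Prediction, which vLLM surfaces through its speculative decoding configuration. The sketch below is heavily hedged: the `"deepseek_mtp"` method string and the `speculative_config` shape are assumptions, so consult the speculative decoding docs for the exact interface.

```python
# Heavily hedged sketch: serving a model's MTP head as a speculative
# decoding method. The "deepseek_mtp" method string and config shape
# are assumptions; check vLLM's speculative decoding docs.
from vllm import LLM

llm = LLM(
    model="deepseek-ai/DeepSeek-V3",  # a model that ships MTP weights
    speculative_config={"method": "deepseek_mtp", "num_speculative_tokens": 1},
)
```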
Feedback Period.
No response
CC List.
Any Other Things.
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.