🚀 The feature, motivation and pitch
We now have basic support for batch-invariant inference, based on https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/ (see the Batch-invariant Inference docs page).
But there is still some work to be done, so this issue tracks the remaining tasks.
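To make the goal concrete, here is a minimal determinism check. It assumes the `VLLM_BATCH_INVARIANT=1` environment variable from #25603 enables the mode (the flag name and model are assumptions; adjust to your setup). With batch-invariant kernels, a prompt's greedy output must be bit-identical whether it runs alone or inside a larger batch:

```python
# Minimal batch-invariance check: the same prompt, decoded greedily,
# should produce identical text at batch size 1 and batch size 32.
import os

# Assumed flag from #25603; must be set before vLLM is imported.
os.environ["VLLM_BATCH_INVARIANT"] = "1"

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.0, max_tokens=64)

prompt = "Explain batch invariance in one sentence."
filler = ["Tell me a fact about the ocean."] * 31  # unrelated batch-mates

solo = llm.generate([prompt], params)[0].outputs[0].text
batched = llm.generate([prompt] + filler, params)[0].outputs[0].text

assert solo == batched, "output changed with batch size"
print("batch-invariant: outputs match")
```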
TODOs:
- Basic framework: Kernel-override Determinism [1/n] #25603 @bwasti
- FlashInfer support: [unrevert] Add batch invariant kernel override for FlashInfer backend [2/n] #26373 @bwasti
- Deepseek-v3: Deepseek-v3 Batch Invariant on 8xH100 #26609 @bwasti
- DeepGEMM on Blackwell: [Feature] Batch Invariant: Support DeepGEMM and Blackwell #27127 @yewentao256
- Batch Invariant for R1 TP 8 on Blackwell: [Feature] #27229 @yewentao256
- torch.compile & CUDA Graph support: [Feature] Batch invariant torch.compile #27660 @PaulZhang12
- Usability & documentation: Batch invariance doc #27839 @bwasti
- An RL example: https://github.com/bwasti/spirl @bwasti
- Add batch invariant tests to CI: [CI] Add batch invariant test to ci #27842 @yewentao256
- TRITON_MLA support: https://github.com/vllm-project/vllm/pull/29125/files @yewentao256
- FLASHINFER_MLA support: 🙋 help needed; context: [Feature Request] trtllm_batch_decode_with_kv_cache_mla Batch Invariant support (flashinfer-ai/flashinfer#2107)
- Accelerate batch-invariant Triton kernels @bwasti @yewentao256
Nice to have:
- NVFP4 support
- Cutlass support
- AMD testing/support
- Speculative decoding support (this might be hard)
- vLLM Support for Generic Model Definitions: [RFC] #28326 @bwasti
- (Out of scope) DP support: [Feature] Batch Invariant Feature in DP+EP #30321
Currently, the performance of batch-invariant mode is still not great, so let's optimize it together if you have bandwidth! A rough way to measure the overhead is sketched below.
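As a starting point, here is a minimal throughput probe (the model name is a placeholder, and `VLLM_BATCH_INVARIANT` is the same assumed flag as above):

```python
# Rough throughput probe: run once with VLLM_BATCH_INVARIANT=1 and once
# without it, then compare tokens/s. Model choice is a placeholder.
import time

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.0, max_tokens=128)
prompts = ["Summarize the plot of Hamlet."] * 64

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated / elapsed:.1f} generated tok/s")
```

Run it as `VLLM_BATCH_INVARIANT=1 python bench.py` and again without the variable; the gap between the two numbers is what we want to shrink.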