🚀 The feature, motivation and pitch
We now have basic support for batch-invariant inference, based on https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/ (see the Batch-invariant Inference docs page).
But there is still some work to be done, so this issue tracks the remaining tasks.
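To make the goal concrete, here is a minimal determinism check. It assumes the `VLLM_BATCH_INVARIANT=1` environment variable from #25603 enables the mode (the flag name and model are assumptions; adjust to your setup). With batch-invariant kernels, a prompt's greedy output must be bit-identical whether it runs alone or inside a larger batch:

```python
# Minimal batch-invariance check: the same prompt, decoded greedily,
# should produce identical text at batch size 1 and batch size 32.
import os

# Assumed flag from #25603; must be set before vLLM is imported.
os.environ["VLLM_BATCH_INVARIANT"] = "1"

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.0, max_tokens=64)

prompt = "Explain batch invariance in one sentence."
filler = ["Tell me a fact about the ocean."] * 31  # unrelated batch-mates

solo = llm.generate([prompt], params)[0].outputs[0].text
batched = llm.generate([prompt] + filler, params)[0].outputs[0].text

assert solo == batched, "output changed with batch size"
print("batch-invariant: outputs match")
```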
TODOs:
- Basic framework: Kernel-override Determinism [1/n] #25603 @bwasti
- FlashInfer support: [unrevert] Add batch invariant kernel override for FlashInfer backend [2/n] #26373 @bwasti
- Deepseek-v3: Deepseek-v3 Batch Invariant on 8xH100 #26609 @bwasti
- DeepGEMM on Blackwell: [Feature] Batch Invariant: Support DeepGEMM and Blackwell #27127 @yewentao256
- Batch Invariant for R1 TP 8 on Blackwell: [Feature] #27229 @yewentao256
- torch.compile & CUDA Graph support: [Feature] Batch invariant torch.compile #27660 @PaulZhang12
- Usability & documentation: Batch invariance doc #27839 @bwasti
- An RL example: https://github.com/bwasti/spirl @bwasti
- Add batch invariant tests to CI: [CI] Add batch invariant test to ci #27842 @yewentao256
- TRITON_MLA support: https://github.com/vllm-project/vllm/pull/29125/files @yewentao256
- FLASHINFER_MLA support: 🙋 help needed; context: [Feature Request] trtllm_batch_decode_with_kv_cache_mla Batch Invariant support (flashinfer-ai/flashinfer#2107)
- Accelerate batch-invariant Triton kernels @bwasti @yewentao256
Nice to have:
- NVFP4 support
- Cutlass support
- AMD testing/support
- Speculative decoding support (this might be hard)
- vLLM Support for Generic Model Definitions: [RFC] #28326 @bwasti
- (Out of scope) DP support: [Feature] Batch Invariant Feature in DP+EP #30321
Currently, the performance of batch-invariant mode is still not great, so let's optimize it together if you have bandwidth! A rough way to measure the overhead is sketched below.
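As a starting point, here is a minimal throughput probe (the model name is a placeholder, and `VLLM_BATCH_INVARIANT` is the same assumed flag as above):

```python
# Rough throughput probe: run once with VLLM_BATCH_INVARIANT=1 and once
# without it, then compare tokens/s. Model choice is a placeholder.
import time

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.0, max_tokens=128)
prompts = ["Summarize the plot of Hamlet."] * 64

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated / elapsed:.1f} generated tok/s")
```

Run it as `VLLM_BATCH_INVARIANT=1 python bench.py` and again without the variable; the gap between the two numbers is what we want to shrink.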