Merged

30 commits
7cb3b19
first impl of common MLAAttentionLayer - needs review
therealnaveenkamal Sep 17, 2025
055e3ee
major fixes2
therealnaveenkamal Sep 17, 2025
89ac015
mla wrapper abstraction and impl use_direct_call
therealnaveenkamal Sep 19, 2025
577917f
added unified_mla funcs and few fixes
therealnaveenkamal Sep 20, 2025
40a3c02
final fix
therealnaveenkamal Sep 24, 2025
832d316
fix precommit
therealnaveenkamal Sep 24, 2025
1bcb134
fix kv_c_normed
therealnaveenkamal Sep 24, 2025
b824ffa
implemented attn_backend for MLAAttention
therealnaveenkamal Sep 25, 2025
3876417
quick fix of kv_b_proj
therealnaveenkamal Sep 25, 2025
0873006
included MLA layers wherever Attention layers were collected, impleme…
therealnaveenkamal Sep 25, 2025
5ca30e8
precommit fixes
therealnaveenkamal Sep 25, 2025
9989959
replaced todo
therealnaveenkamal Sep 26, 2025
52e749f
rebased and made few changes
therealnaveenkamal Oct 2, 2025
6f1463d
lint fix
therealnaveenkamal Oct 2, 2025
349de26
mypy fix
therealnaveenkamal Oct 2, 2025
bd5812a
Merge branch 'main' into mla_attn
ProExpertProg Oct 2, 2025
4bc9e86
using MLAAttentionSpec in gpu_model_runner
therealnaveenkamal Oct 3, 2025
f574bb4
Merge branch 'vllm-project:main' into mla_attn
therealnaveenkamal Oct 3, 2025
494577d
Merge branch 'main' into mla_attn
therealnaveenkamal Oct 3, 2025
8216e1c
fix AttentionLayerBase
therealnaveenkamal Oct 7, 2025
97784fb
Merge pre-format main (17edd8a) into mla_attn as baseline
therealnaveenkamal Oct 7, 2025
c563dd0
Apply ruff/format fixes on files changed since 17edd8a
therealnaveenkamal Oct 7, 2025
a53c70e
Merge post-format + conflict resolution
therealnaveenkamal Oct 7, 2025
9068354
Merge branch 'main' into mla_attn
therealnaveenkamal Oct 7, 2025
8202371
pre-commit fixes
therealnaveenkamal Oct 7, 2025
e955784
fixed attentionlayerbase issue
therealnaveenkamal Oct 8, 2025
2422830
final fix
therealnaveenkamal Oct 8, 2025
dca6734
Merge branch 'main' into mla_attn
therealnaveenkamal Oct 8, 2025
2ddf547
Merge branch 'main' into mla_attn
ProExpertProg Oct 8, 2025
b52ac89
Remove unnecessary blank line in layer.py
ProExpertProg Oct 8, 2025
26 changes: 26 additions & 0 deletions vllm/attention/backends/abstract.py
@@ -6,6 +6,7 @@

import torch

from vllm.model_executor.layers.linear import ColumnParallelLinear
from vllm.model_executor.layers.quantization.utils.quant_utils import QuantKey


@@ -184,6 +185,31 @@ def fused_output_quant_supported(self, quant_key: QuantKey):


class MLAAttentionImpl(AttentionImpl[T], Generic[T]):
    @abstractmethod
    def __init__(
        self,
        num_heads: int,
        head_size: int,
        scale: float,
        num_kv_heads: int,
        alibi_slopes: Optional[list[float]],
        sliding_window: Optional[int],
        kv_cache_dtype: str,
        logits_soft_cap: Optional[float],
        attn_type: str,
        kv_sharing_target_layer_name: Optional[str],
        # MLA Specific Arguments
        q_lora_rank: Optional[int],
        kv_lora_rank: int,
        qk_nope_head_dim: int,
        qk_rope_head_dim: int,
        qk_head_dim: int,
        v_head_dim: int,
        kv_b_proj: ColumnParallelLinear,
        indexer: Optional[object] = None,
    ) -> None:
        raise NotImplementedError

    @abstractmethod
    def forward(
        self,
        ...
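For context, a concrete MLA backend is expected to subclass the new `MLAAttentionImpl` interface and implement both methods. The sketch below is illustrative only: `ExampleMLAImpl` is a hypothetical class, and because the abstract `forward` signature is collapsed in the diff above, its parameters here are placeholders rather than the interface this PR defines.

```python
# Hypothetical sketch (not part of this PR): how a backend might subclass the
# MLAAttentionImpl interface shown above. The class name, stored attributes,
# and the forward() parameters are assumptions.
from typing import Any, Optional

import torch

from vllm.attention.backends.abstract import MLAAttentionImpl
from vllm.model_executor.layers.linear import ColumnParallelLinear


class ExampleMLAImpl(MLAAttentionImpl):
    def __init__(
        self,
        num_heads: int,
        head_size: int,
        scale: float,
        num_kv_heads: int,
        alibi_slopes: Optional[list[float]],
        sliding_window: Optional[int],
        kv_cache_dtype: str,
        logits_soft_cap: Optional[float],
        attn_type: str,
        kv_sharing_target_layer_name: Optional[str],
        # MLA-specific arguments, mirroring the abstract __init__ above.
        q_lora_rank: Optional[int],
        kv_lora_rank: int,
        qk_nope_head_dim: int,
        qk_rope_head_dim: int,
        qk_head_dim: int,
        v_head_dim: int,
        kv_b_proj: ColumnParallelLinear,
        indexer: Optional[object] = None,
    ) -> None:
        # Keep only what a minimal implementation would need: the softmax
        # scale, the head geometry, and the shared kv_b_proj used to
        # up-project the compressed KV latent into per-head keys and values.
        self.num_heads = num_heads
        self.scale = scale
        self.kv_lora_rank = kv_lora_rank
        self.qk_nope_head_dim = qk_nope_head_dim
        self.qk_rope_head_dim = qk_rope_head_dim
        self.v_head_dim = v_head_dim
        self.kv_b_proj = kv_b_proj

    def forward(self, *args: Any, **kwargs: Any) -> torch.Tensor:
        # Placeholder: a real backend would take queries, the compressed KV
        # latent, and the KV cache from its actual forward() arguments, apply
        # kv_b_proj, and run its attention kernel here.
        raise NotImplementedError
```

The point of the shared interface, as suggested by the commit history ("first impl of common MLAAttentionLayer", "mla wrapper abstraction"), is that the model-facing MLA layer can construct any such backend uniformly, passing the MLA-specific pieces such as `kv_b_proj` and the LoRA ranks down to the implementation.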