Overview
Mistral-7B introduces a new kind of attention, Sliding Window Attention (SWA). Mistral otherwise follows the Llama architecture, but it relies on SWA to support longer sequences (>4k tokens). It would be worthwhile to introduce SWA in MLC, since more models are likely to adopt SWA in the future.
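As a rough illustration of the mechanism (not MLC's implementation), the sketch below builds a sliding-window causal mask in NumPy: each position attends only to itself and the previous `window_size - 1` positions, instead of the full causal prefix.

```python
# Illustrative sketch of a sliding-window causal mask (not MLC code).
# Position i may attend to positions j with i - window_size < j <= i,
# rather than all j <= i as in full causal attention.
import numpy as np

def sliding_window_causal_mask(seq_len: int, window_size: int) -> np.ndarray:
    """Boolean mask; entry [i, j] is True if position i may attend to j."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window_size)

# Example: with seq_len=6 and window_size=3, token 5 only sees tokens 3..5.
print(sliding_window_causal_mask(6, 3).astype(int))
```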
Action Items
- Overwrite KV cache logic: [KV Cache] Overwrite Cache - SW Attention relax#297 (see the rolling-cache sketch after this list)
- SWA Causal Mask
- Implement SWA
- Adjust llm_chat.cc to support SWA
- Check performance for >4k context windows
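A minimal sketch of the cache-overwrite idea behind the first item, using a hypothetical `RollingKVCache` class (names and layout are illustrative, not the relax#297 API): once the cache holds `window_size` entries, each new key/value pair overwrites the oldest slot, so memory stays bounded regardless of sequence length.

```python
# Hypothetical sketch of a rolling (overwriting) KV cache for SWA.
# Names are illustrative; this is not the relax#297 implementation.
import numpy as np

class RollingKVCache:
    def __init__(self, window_size: int, num_heads: int, head_dim: int):
        self.window_size = window_size
        self.keys = np.zeros((window_size, num_heads, head_dim), dtype=np.float32)
        self.values = np.zeros_like(self.keys)
        self.next_pos = 0  # absolute position of the next token

    def append(self, k: np.ndarray, v: np.ndarray) -> None:
        slot = self.next_pos % self.window_size  # overwrite the oldest entry
        self.keys[slot] = k
        self.values[slot] = v
        self.next_pos += 1

    def view(self):
        """Return the cached keys/values in chronological (oldest-first) order."""
        n = min(self.next_pos, self.window_size)
        start = self.next_pos % self.window_size if n == self.window_size else 0
        order = [(start + t) % self.window_size for t in range(n)]
        return self.keys[order], self.values[order]
```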
Links to Related Issues and PRs