
[Tracking] Sliding Window Attention (Mistral AI) #1003

@davidpissarra

Description

Overview

Mistral-7B introduces a new kind of attention, Sliding Window Attention (SWA). Mistral largely follows the Llama architecture, but to support sequences longer than 4k tokens it relies on SWA. It would be worthwhile to introduce SWA in MLC, since more models are likely to adopt SWA in the future.
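
For context, here is a minimal NumPy sketch of the SWA masking rule (illustrative only, not MLC code; the function names are made up for this example): with window size W (4096 for Mistral-7B), query position `i` attends only to key positions `j` with `i - W < j <= i`.

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Query position i may attend to key position j iff i - window < j <= i."""
    i = np.arange(seq_len)[:, None]  # query positions (column vector)
    j = np.arange(seq_len)[None, :]  # key positions (row vector)
    return (j <= i) & (j > i - window)

def swa_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, window: int) -> np.ndarray:
    """Naive single-head attention with a sliding-window causal mask."""
    seq_len, head_dim = q.shape
    scores = (q @ k.T) / np.sqrt(head_dim)
    scores = np.where(sliding_window_mask(seq_len, window), scores, -np.inf)
    # Softmax over the (masked) key dimension; each row has at least one
    # finite entry because j == i is always allowed.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Example: 8 tokens with a window of 4 -- token 7 cannot see tokens 0-3.
rng = np.random.default_rng(0)
q = k = v = rng.standard_normal((8, 16))
out = swa_attention(q, k, v, window=4)
print(out.shape)  # (8, 16)
```

In the actual model, SWA is paired with a rolling-buffer KV cache of size W, so memory stays bounded by the window size rather than the full sequence length; the dense mask above is only for illustration.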

Action Items

Links to Related Issues and PRs
