Overview
Mistral-7B introduces a new kind of attention, Sliding Window Attention (SWA). Mistral otherwise follows the Llama architecture, but it relies on SWA to support longer sequences (>4k tokens). It would be worthwhile to introduce SWA in MLC, since more models are likely to adopt SWA in the future.
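As a rough illustration of the mechanism (not MLC's implementation), the sketch below builds a sliding-window causal mask in NumPy: each position attends only to itself and the previous `window_size - 1` positions, instead of the full causal prefix.

```python
# Illustrative sketch of a sliding-window causal mask (not MLC code).
# Position i may attend to positions j with i - window_size < j <= i,
# rather than all j <= i as in full causal attention.
import numpy as np

def sliding_window_causal_mask(seq_len: int, window_size: int) -> np.ndarray:
    """Boolean mask; entry [i, j] is True if position i may attend to j."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window_size)

# Example: with seq_len=6 and window_size=3, token 5 only sees tokens 3..5.
print(sliding_window_causal_mask(6, 3).astype(int))
```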
Action Items
- Overwrite KV cache logic: [KV Cache] Overwrite Cache - SW Attention relax#297 (see the rolling-cache sketch after this list)
- SWA Causal Mask
- Implement SWA
- Adjust llm_chat.cc to support SWA
- Check performance for >4k context windows
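A minimal sketch of the cache-overwrite idea behind the first item, using a hypothetical `RollingKVCache` class (names and layout are illustrative, not the relax#297 API): once the cache holds `window_size` entries, each new key/value pair overwrites the oldest slot, so memory stays bounded regardless of sequence length.

```python
# Hypothetical sketch of a rolling (overwriting) KV cache for SWA.
# Names are illustrative; this is not the relax#297 implementation.
import numpy as np

class RollingKVCache:
    def __init__(self, window_size: int, num_heads: int, head_dim: int):
        self.window_size = window_size
        self.keys = np.zeros((window_size, num_heads, head_dim), dtype=np.float32)
        self.values = np.zeros_like(self.keys)
        self.next_pos = 0  # absolute position of the next token

    def append(self, k: np.ndarray, v: np.ndarray) -> None:
        slot = self.next_pos % self.window_size  # overwrite the oldest entry
        self.keys[slot] = k
        self.values[slot] = v
        self.next_pos += 1

    def view(self):
        """Return the cached keys/values in chronological (oldest-first) order."""
        n = min(self.next_pos, self.window_size)
        start = self.next_pos % self.window_size if n == self.window_size else 0
        order = [(start + t) % self.window_size for t in range(n)]
        return self.keys[order], self.values[order]
```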
Links to Related Issues and PRs