@CharlieFRuan commented Nov 8, 2023

This PR adds support for Mistral. The implementation follows the Mistral paper, specifically including sliding window attention (SWA), the rolling buffer cache, and chunking, as discussed in Section 2 of the paper.

This PR is largely analogous to the changes in llm_chat.cc in mlc-llm's PR mlc-ai/mlc-llm#1087.

Unlike the approach in #202, this implementation takes advantage of SWA, so there is no longer a maximum window size, which is one of the main benefits of Mistral.
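For illustration only, here is a minimal TypeScript sketch of the idea behind the rolling buffer cache and chunked prefill. This is not the code in this PR; the helper names (`cacheSlot`, `attentionSpan`, `prefillChunks`) are hypothetical, and the 4096/1024 sizes simply match one of the configurations tested below.

```ts
// Illustrative sketch (not the actual web-llm implementation).

const slidingWindowSize = 4096; // SWA window size
const prefillChunkSize = 1024;  // prefill chunk size

// Rolling buffer cache: token at absolute position p is stored at slot
// p % slidingWindowSize, so the KV cache never grows beyond the window.
function cacheSlot(position: number): number {
  return position % slidingWindowSize;
}

// Each new token attends to at most the last slidingWindowSize tokens,
// no matter how long the prompt is.
function attentionSpan(position: number): number {
  return Math.min(position + 1, slidingWindowSize);
}

// Chunked prefill: feed the prompt in fixed-size chunks so peak memory
// stays bounded; the output is unchanged, only speed/memory differ.
function* prefillChunks(promptTokens: number[]): Generator<number[]> {
  for (let i = 0; i < promptTokens.length; i += prefillChunkSize) {
    yield promptTokens.slice(i, i + prefillChunkSize);
  }
}

// Example: a 10000-token prompt is prefilled in 10 chunks of <=1024 tokens,
// while the cache only ever holds the most recent 4096 tokens.
const prompt = Array.from({ length: 10000 }, (_, i) => i);
let processed = 0;
for (const chunk of prefillChunks(prompt)) {
  processed += chunk.length; // model.forward(chunk) would run here
}
console.log(processed, cacheSlot(9999), attentionSpan(9999));
// -> 10000, 1807, 4096
```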

Tested:

  • Works well with:
    • 4096 sliding window size, with 1024 chunk size
    • Note that a smaller chunk size does not affect the generated output; it only reduces memory requirements at the cost of slower prefill
    • 2048 sliding window size with 2048 chunk size
  • Tested with prompts several times longer than the sliding window size / chunk size

Unrelatedly, this PR also makes the Wizard models reuse the Llama model libraries, given the dynamic vocab size support (updated just now due to shuffle support).

cc @tqchen

@tqchen merged commit a9efc67 into mlc-ai:main on Nov 8, 2023
@tqchen commented Nov 8, 2023

Great, @CharlieFRuan. Let us make a new npm release and update the demo.

@CharlieFRuan deleted the pr-1106-shuffle branch on November 9, 2023