[Core generation] Adds support for static KV cache
#27931
Merged
Commits
121 commits
17b8b38
initial commit
ArthurZucker 80ef815
lol
ArthurZucker 2639b5d
nits
ArthurZucker 9f2e1e4
nits nits nits nits nits
ArthurZucker 271260c
Merge branch 'main' of github.com:huggingface/transformers into stati…
ArthurZucker 5be65ff
Merge branch 'main' of github.com:huggingface/transformers into stati…
ArthurZucker c6b6d35
some nits and some testing
ArthurZucker 90224dd
nits
ArthurZucker 24ffbfb
Wrong implementation but creates good masks in general and is pretty …
ArthurZucker cd95e98
what seems to work for now
ArthurZucker 7cd3655
nites
ArthurZucker eeebc66
re-init cache
ArthurZucker 5819a85
make it automatic
ArthurZucker 216dd8f
nits and nits
ArthurZucker a48ae88
more nits
ArthurZucker aeefa26
nits
ArthurZucker e05f8da
nits
ArthurZucker 07f5cdc
more nits
ArthurZucker f769b0e
nits
ArthurZucker bb6a160
fastest working cache for now
ArthurZucker dd1e42c
also include the attention mask
ArthurZucker a3b0003
updates
ArthurZucker dacd0ff
current state
ArthurZucker 021f674
working code
ArthurZucker 98af852
dummy mask for now
ArthurZucker 8594670
Merge branch 'main' of github.com:huggingface/transformers into stati…
ArthurZucker 60af293
Merge branch 'static-cache' of github.com:huggingface/transformers in…
ArthurZucker 05166fe
Merge branch 'main' of github.com:huggingface/transformers into stati…
ArthurZucker 9c1a3b4
a better design
ArthurZucker d5395af
some fix
ArthurZucker a20a183
make outputs match
ArthurZucker bce7653
fastest yet
ArthurZucker 0e59f70
remove chunck qkv
ArthurZucker e573000
cleanup
ArthurZucker fce7e46
some test
ArthurZucker 24ef3cf
goat changes
ArthurZucker 344309f
nits
ArthurZucker 42e5a38
dynamic was not working anymore
ArthurZucker 6637755
cache reverts
ArthurZucker 6ec92df
small nits
ArthurZucker d784927
sdpa
ArthurZucker 0332d3f
Merge branch 'static-cache' of github.com:huggingface/transformers in…
ArthurZucker 4e40703
make sure sdpa passed
ArthurZucker 770c5e6
nit
ArthurZucker 7bd1fca
cleqnups
ArthurZucker 25fd440
cleanup
ArthurZucker 4c3220f
nits
ArthurZucker d51acfa
Merge branch 'main' of github.com:huggingface/transformers into stati…
ArthurZucker 2b2e0c2
pass sdpa
ArthurZucker 4b93379
make sure dynamic is BC
ArthurZucker ab07e80
update check on the attn weight
ArthurZucker 77ccdce
Merge branch 'static-cache' of https://github.com/huggingface/transfo…
ArthurZucker ad6832a
faster?
ArthurZucker 1cb6a16
add `_reset_cache`
ArthurZucker d044263
Merge branch 'static-cache' of github.com:huggingface/transformers in…
ArthurZucker c838352
nit
ArthurZucker e80b6a1
Merge branch 'static-cache' of https://github.com/huggingface/transfo…
ArthurZucker 8308809
nit
ArthurZucker 0132a2c
Merge branch 'static-cache' of github.com:huggingface/transformers in…
ArthurZucker 87b3064
merges
ArthurZucker 4d88605
Styling
ArthurZucker 011931e
nites
ArthurZucker e838f57
revert some BC breaking changes
ArthurZucker c23815a
make all tests pass
ArthurZucker c985064
torch long not float for attention mask
ArthurZucker 6a954d5
try to remove the guard
ArthurZucker 45760d6
BC
ArthurZucker 64f5455
even more cleanup
ArthurZucker f103454
fix `past_key_value.get_usable_length(kv_seq_len, self.layer_idx)`
ArthurZucker c7b5d2c
pushh a fast version
ArthurZucker 538ccf0
what actually works
ArthurZucker ce42624
no contigious()
ArthurZucker 33832d2
push for eager as well
ArthurZucker 8a53f53
simplest and best way to do it yet
ArthurZucker f560fe5
merge
ArthurZucker 5f90ed4
style
ArthurZucker e5c731e
Merge branch 'main' of github.com:huggingface/transformers into stati…
ArthurZucker b6c9180
dix dtype
ArthurZucker 8de700f
fix dtype issues
ArthurZucker e92b1a0
nits
ArthurZucker d9f7f16
nit
ArthurZucker d98f277
support export to torchscript
ArthurZucker 65217de
Credit helpers
ArthurZucker a219236
nits
ArthurZucker 7a6b57d
handle SDPA edge cases
ArthurZucker 2822423
handle sdpa quircks
ArthurZucker 70df80e
revert performance break
ArthurZucker b4fbf3f
Apply suggestions from code review
ArthurZucker 70d5ded
fix merges
ArthurZucker ec22fb1
revert removing ```
ArthurZucker 9968b0e
add another test
ArthurZucker dc885ca
update test
ArthurZucker 0c2a66f
Merge branch 'static-cache' of https://github.com/huggingface/transfo…
ArthurZucker e087adc
use a model that is not protected
ArthurZucker c0cf294
only test generation
ArthurZucker da720c8
update the cache utils to define the position_ids in the cache class
ArthurZucker 8f4c49d
fix static cache
ArthurZucker c22d564
add subtest to llama tests
ArthurZucker 89929b9
update testing suite
ArthurZucker d4b24ee
nuke whatever we can
ArthurZucker d7e400e
smthing wrong with cache
ArthurZucker 9d9eec3
nit
ArthurZucker 4eb8a9e
latest changes
ArthurZucker dad35d6
Merge branch 'main' of https://github.com/huggingface/transformers in…
ArthurZucker 6f516a0
don't use einsum
ArthurZucker f25ac8e
nit
ArthurZucker 17f0350
remove one unused var
ArthurZucker b91efbb
update test value
ArthurZucker 256c324
let style be happy
ArthurZucker 327b77a
make sure cache tests are slow
ArthurZucker 8509e91
slow was removed add it back to test cach utils
ArthurZucker 60aa86d
fix flash_attention_2
ArthurZucker 7de4ace
very small nit
ArthurZucker 453df24
revert test change
ArthurZucker 0a1f8d2
make mistral the default copied from
ArthurZucker 040b2f1
fix copies
ArthurZucker 1763ec7
nits
ArthurZucker c4242c8
finishup
ArthurZucker af097af
fixup
ArthurZucker 5bbde6f
Merge branch 'main' of https://github.com/huggingface/transformers in…
ArthurZucker 7f8ca33
skip tests
ArthurZucker