Feature request
Add a cache_limit argument to generate that limits the size of the cache (past_key_values).
Motivation
In some contexts one might want to generate long sequences, and when doing so the system can easily run out of memory. Capping the cache at a maximum size would give users more control and let them tweak other parameters, such as batch size or number of beams, to generate faster and get the most out of their hardware.
Your contribution
I implemented it in GPT2 (PyTorch & TF, PR is ready), but I guess this could be implemented more broadly in generate so that every model could benefit from it.
It might relate to #17574.
I'm waiting for your opinion on this; I can probably add it to generate. A rough sketch of the idea is below.
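
A minimal sketch of what the trimming could look like, assuming the proposed cache_limit argument and a hypothetical helper name (trim_past_key_values is not an existing transformers function), and assuming the legacy cache layout where each layer's key/value tensors have shape (batch, num_heads, seq_len, head_dim):

```python
from typing import Optional, Tuple
import torch


def trim_past_key_values(
    past_key_values: Tuple[Tuple[torch.Tensor, ...], ...],
    cache_limit: Optional[int],
) -> Tuple[Tuple[torch.Tensor, ...], ...]:
    """Keep only the most recent `cache_limit` positions of each layer's cache.

    Assumes key/value tensors of shape (batch, num_heads, seq_len, head_dim),
    as returned by GPT2-style models. If cache_limit is None, the cache is
    returned unchanged.
    """
    if cache_limit is None:
        return past_key_values
    return tuple(
        # Slice along the sequence dimension, dropping the oldest positions.
        tuple(t[:, :, -cache_limit:, :] for t in layer)
        for layer in past_key_values
    )
```

In the generation loop, this would be applied to past_key_values after each forward pass, so that memory usage stays bounded regardless of how long the generated sequence gets.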