Cache size limit for generation #20767

@Natooz

Description

Feature request

Add a cache_limit argument for generate, limiting the size of the cache (past_key_values).
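A minimal sketch of what such truncation could look like, assuming the GPT-2 cache layout of `(batch, num_heads, seq_len, head_dim)` per key/value tensor; the helper name `trim_past_key_values` is hypothetical and not part of the PR:

```python
import torch

def trim_past_key_values(past_key_values, cache_limit):
    """Hypothetical helper: keep only the last `cache_limit` positions
    of each layer's cached keys and values.

    `past_key_values` is a tuple of per-layer (key, value) pairs,
    each of shape (batch, num_heads, seq_len, head_dim).
    """
    if cache_limit is None:
        return past_key_values
    return tuple(
        # Slice the sequence dimension; if seq_len <= cache_limit,
        # the slice is a no-op and the full cache is kept.
        tuple(t[:, :, -cache_limit:, :] for t in layer)
        for layer in past_key_values
    )

# Toy cache: 2 layers, batch 1, 2 heads, 10 cached positions, head_dim 4.
past = tuple(
    (torch.zeros(1, 2, 10, 4), torch.zeros(1, 2, 10, 4))
    for _ in range(2)
)
trimmed = trim_past_key_values(past, 6)
```

In the generation loop this would be applied to `past_key_values` after each forward pass, so memory use stays bounded regardless of sequence length.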

Motivation

In some contexts one might want to generate long sequences. When doing so, the system can easily run out of memory. Capping the cache at a maximum size would give users more control, letting them tweak other parameters, such as batch size or number of beams, to generate faster and get the most out of their hardware.

Your contribution

I have implemented it in GPT2 (PyTorch & TF, PR is ready), but I guess this could be implemented more broadly in generate so that every model could benefit from it.
It might relate to #17574.
Waiting for your opinion on this, I can probably add it to generate.
