
Conversation

@gante
Contributor

@gante gante commented Apr 9, 2025

What does this PR do?

WIP, see this comment


Uses deprecate_kwarg to rename max_batch_size to batch_size in all compilable caches. max_batch_size is a misleading argument name: it implies that batch sizes smaller than max_batch_size can also use the cache, which is not the case.
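For illustration, a minimal re-implementation of a deprecate_kwarg-style decorator (a sketch of the pattern only; the actual transformers helper lives in transformers.utils.deprecation and takes extra options such as a removal version):

```python
import functools
import warnings


def deprecate_kwarg(old_name, new_name):
    """Sketch: forward a deprecated keyword argument to its new name,
    emitting a FutureWarning when the old name is used."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if old_name in kwargs:
                warnings.warn(
                    f"`{old_name}` is deprecated, use `{new_name}` instead.",
                    FutureWarning,
                )
                kwargs[new_name] = kwargs.pop(old_name)
            return func(*args, **kwargs)
        return wrapper
    return decorator


class StaticCache:  # toy stand-in for a compilable cache class
    @deprecate_kwarg("max_batch_size", new_name="batch_size")
    def __init__(self, batch_size=None):
        self.batch_size = batch_size


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cache = StaticCache(max_batch_size=4)  # old name still accepted, but warns
```

Callers passing the old keyword keep working during the deprecation window while being nudged toward the new name.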


Note that this deprecation was started before, but we messed it up along the way:

  1. Deprecation process started: Cache: use batch_size instead of max_batch_size #32657
  2. Deprecation message got changed to the opposite of the goal: Offloaded cache: fix generate #34921
  3. User-contributed PR that respected the (modified) deprecation message: Remove deprecated batch_size parameter #37007

@github-actions github-actions bot marked this pull request as draft April 9, 2025 10:04
@github-actions
Contributor

github-actions bot commented Apr 9, 2025

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@gante gante marked this pull request as ready for review April 9, 2025 10:04
@gante gante requested review from zucchini-nlp and removed request for Rocketknight1 April 9, 2025 10:06
Member

@zucchini-nlp zucchini-nlp left a comment


Oh wow, I even forgot we were deprecating the other way. Thanks for digging into it!

Since we're talking about batch sizes, I remember this issue (#35444) where a user wanted to contribute an actual max_batch_size (especially for encoder-decoder model cases). Similar to the sequence length, unused batch entries would be all zeros. I think the feature is nice to have, but I can also see it breaking users who manipulate cache._key_cache directly. WDYT, is it worth supporting?
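A hypothetical sketch of that idea (class and method names are mine, not the transformers API): allocate the cache for max_batch_size, let smaller inputs write only the leading batch rows, and leave the unused rows as zeros, mirroring how unused sequence positions stay zero in a static cache:

```python
import numpy as np


class PaddedBatchCache:
    """Sketch: a static KV cache preallocated for max_batch_size that
    accepts inputs with any batch size up to that maximum."""

    def __init__(self, max_batch_size, num_heads, max_seq_len, head_dim):
        shape = (max_batch_size, num_heads, max_seq_len, head_dim)
        self.key_cache = np.zeros(shape, dtype=np.float32)
        self.value_cache = np.zeros(shape, dtype=np.float32)

    def update(self, key_states, value_states, cache_position):
        # key_states: (batch, num_heads, new_tokens, head_dim); batch may
        # be smaller than max_batch_size -- write only the leading rows.
        bsz = key_states.shape[0]
        self.key_cache[:bsz, :, cache_position, :] = key_states
        self.value_cache[:bsz, :, cache_position, :] = value_states
        # Return only the rows belonging to the current batch; trailing
        # rows remain zero-padded.
        return self.key_cache[:bsz], self.value_cache[:bsz]


cache = PaddedBatchCache(max_batch_size=4, num_heads=2, max_seq_len=8, head_dim=3)
k = np.ones((2, 2, 1, 3), dtype=np.float32)  # input batch of 2 < max of 4
keys, values = cache.update(k, k, cache_position=np.array([0]))
```

The maintenance concern above is visible even in this sketch: any user code that indexes the underlying buffers directly would now see padded zero rows it did not ask for.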

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@gante
Contributor Author

gante commented Apr 9, 2025

@zucchini-nlp I see, the export use case makes sense: build the cache with a batch size larger than the input batch size (export once, reuse with any batch size).

It's feasible, the questions are a) code complexity; b) throughput. I'm going to give it a go 🤞

@gante gante removed the request for review from ArthurZucker April 9, 2025 10:40
@zucchini-nlp
Member

Yeah, I am also concerned that it will add too much complexity. Thanks, it will be cool if it's doable with minimal maintenance cost :)

@gante
Contributor Author

gante commented Apr 9, 2025

It's actually quite clean and doesn't seem to have throughput disadvantages 👀 closing this PR in favor of expanding capabilities

