
Conversation

@sayakpaul
Member

@sayakpaul sayakpaul commented Jun 9, 2024

What does this PR do?

Fixes #8443.

@bghira for vis.

@sayakpaul sayakpaul requested review from Wauplin and yiyixuxu June 9, 2024 08:39

@Wauplin
Collaborator

Wauplin commented Jun 10, 2024

TBH I'm not a fan of increasing the max_shard_size limit. The official recommendation is to have shards of 5GB, and it would be great to keep this limit consistent across our libraries. To me, this is not per se a breaking change, though: all models released before #7830 are still backward compatible with previous versions of diffusers. I don't have many solutions to offer, though, other than suggesting that users upgrade diffusers to the latest version.
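For reference, a minimal sketch of the argument under discussion (the checkpoint ID and the "2GB" value are illustrative assumptions, not from this thread):

```python
from diffusers import UNet2DConditionModel

# max_shard_size was added to save_pretrained in #7830.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

# Any state dict larger than the limit is split into several
# diffusion_pytorch_model-*.safetensors shards plus a JSON index;
# diffusers versions without sharding support cannot load that layout.
unet.save_pretrained("sd15-unet", max_shard_size="2GB")
```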


@bghira
Contributor

bghira commented Jun 10, 2024

Just as not every version of diffusers will work with these new sharded weights, there are also external diffusers forks containing research code that was never accepted into this repo, possibly as old as diffusers 0.13. And despite the Hub recommendation, shard sizes have been inconsistent for diffusers this whole time. It's a breaking change, and what's the gain?

@yiyixuxu
Collaborator

I would not call this breaking, but the user experience is not nice. If I were a fine-tuner, I would not want my checkpoints to be sharded, knowing that many users would not be able to use them.

Can we make sure we only shard checkpoints when the user explicitly chooses to? I.e., I would not mind adding an additional argument if we decide to keep the default consistent across the libraries.
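A rough sketch of the opt-in behavior suggested above; note that the save_checkpoint helper and its shard flag are hypothetical illustrations, not part of the diffusers API (only max_shard_size exists):

```python
from diffusers import ModelMixin

# Hypothetical helper: shard only when the caller explicitly opts in.
def save_checkpoint(model: ModelMixin, save_directory: str, shard: bool = False):
    if shard:
        # Explicit opt-in: Hub-recommended 5GB shards.
        model.save_pretrained(save_directory, max_shard_size="5GB")
    else:
        # Default: a limit so large that typical checkpoints stay in a
        # single file, which older diffusers versions can still load.
        model.save_pretrained(save_directory, max_shard_size="1000GB")
```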

@sayakpaul
Member Author

@Wauplin WDYT?

In that case, we will just have to add another codepath doing what we did before sharding support was added.

@Wauplin
Collaborator

Wauplin commented Jun 11, 2024

I'm not sure I understand the suggestion. There is a max_shard_size argument that determines whether checkpoints are sharded. Adding a new parameter next to it that also determines whether a checkpoint is sharded would not make sense, in my opinion.

Maybe what you can do is:

  1. make a release with "10GB" as the default value when saving a model
  2. in ~3 months (?), lower this default value to "5GB". This way, many users will have time to upgrade to a version of diffusers that is able to load sharded models.

WDYT?
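To make the point about max_shard_size concrete, a simplified sketch of how the threshold decides the split (not the actual diffusers implementation):

```python
import torch

# Greedily pack tensors into shards, starting a new shard whenever the
# running size would exceed the limit. One resulting shard means the
# checkpoint is saved unsharded, so a large limit disables sharding.
def split_state_dict(state_dict, max_shard_bytes):
    shards, current, current_size = [], {}, 0
    for name, tensor in state_dict.items():
        size = tensor.numel() * tensor.element_size()
        if current and current_size + size > max_shard_bytes:
            shards.append(current)
            current, current_size = {}, 0
        current[name] = tensor
        current_size += size
    if current:
        shards.append(current)
    return shards

# A ~4MB state dict with a 10GB limit stays in a single shard.
sd = {"w": torch.zeros(1024, 1024), "b": torch.zeros(1024)}
assert len(split_state_dict(sd, 10 * 1024**3)) == 1
```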

@pcuenca
Member

pcuenca commented Jun 11, 2024

Could we perhaps not default to saving sharded weights (maybe until some time in the future), but distribute "official" releases that already require newer versions of diffusers in sharded form, and document how users can do it themselves? Can sharding be disabled with a special value of max_shard_size, or is a large size the only way to go?

@Wauplin
Collaborator

Wauplin commented Jun 11, 2024

Let's say that a large size is the easiest way to go (I'm fine even with 50GB if it's temporary, for a few versions). This way, we would only have to change one default value in the future to enable sharding again. If we do that, it's better to explain the reasoning in the docstring rather than writing "defaults to 50GB", which wouldn't make sense to a user reading it.
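Under that approach, usage would look roughly like this (a sketch assuming the raised default; the checkpoint ID is illustrative):

```python
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

# With a very large default max_shard_size, this ~3.4GB fp32 UNet is
# written as a single diffusion_pytorch_model.safetensors file that
# older diffusers versions can still load.
unet.save_pretrained("sd15-unet")

# Sharding stays available as an explicit opt-in with a smaller limit.
unet.save_pretrained("sd15-unet-sharded", max_shard_size="2GB")
```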

@sayakpaul
Member Author

So, we are in agreement that increasing the default size is the easiest option at the moment. When doing so, we can definitely include a note about our plan to decrease the size in ~3 months. Is there anything else we want to log?

Collaborator

@Wauplin Wauplin left a comment


Ok with it 👍

@sayakpaul
Member Author

@Wauplin @pcuenca @yiyixuxu WDYT about the documentation changes?

Collaborator

@yiyixuxu yiyixuxu left a comment


thanks!

@sayakpaul
Member Author

I think the failing test is unrelated, so I will merge this once the rest of the CI is done.

@sayakpaul sayakpaul merged commit d38f69e into main Jun 12, 2024
@sayakpaul sayakpaul deleted the change-max-shard-size branch June 12, 2024 12:49
sayakpaul added a commit that referenced this pull request Dec 23, 2024
* change max_shard_size to 10GB

* add notes to the documentation

* Update src/diffusers/models/modeling_utils.py

Co-authored-by: Lucain <[email protected]>

* change to abs limit


