
Conversation

@sayakpaul
Member

@sayakpaul sayakpaul commented Jun 9, 2024

What does this PR do?

Fixes #8443.

@bghira for vis.

@sayakpaul sayakpaul requested review from Wauplin and yiyixuxu June 9, 2024 08:39

@Wauplin
Collaborator

Wauplin commented Jun 10, 2024

TBH I'm not a fan of increasing the max_shard_size limit. The official recommendation is to have shards of 5GB, and it would be great to keep this limit consistent across our libraries. To me, this is not per se a breaking change, though: all models released before #7830 are still backward compatible with previous versions of diffusers. I don't have many solutions to offer, though, other than suggesting that users upgrade diffusers to the latest version.
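For reference, a minimal sketch of the argument under discussion (the checkpoint ID and the "2GB" value are illustrative assumptions, not from this thread):

```python
from diffusers import UNet2DConditionModel

# max_shard_size was added to save_pretrained in #7830.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

# Any state dict larger than the limit is split into several
# diffusion_pytorch_model-*.safetensors shards plus a JSON index;
# diffusers versions without sharding support cannot load that layout.
unet.save_pretrained("sd15-unet", max_shard_size="2GB")
```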


@bghira
Contributor

bghira commented Jun 10, 2024

Just as not every version of diffusers will work with these new sharded weights, there are also external diffusers forks containing research code that was never accepted into this repo, possibly as old as diffusers 0.13. And despite the Hub recommendation, shard sizes have been inconsistent for diffusers this whole time. It's a breaking change, and what's the gain?

@yiyixuxu
Collaborator

I would not call this breaking, but the user experience is not nice. If I were a fine-tuner, I would not want my checkpoints to be sharded, knowing that many users would not be able to use them.

Can we make sure we only shard checkpoints when the user explicitly chooses to? I.e., I would not mind adding an additional argument if we decide to keep the default consistent across the libraries.
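A rough sketch of the opt-in behavior suggested above; note that the save_checkpoint helper and its shard flag are hypothetical illustrations, not part of the diffusers API (only max_shard_size exists):

```python
from diffusers import ModelMixin

# Hypothetical helper: shard only when the caller explicitly opts in.
def save_checkpoint(model: ModelMixin, save_directory: str, shard: bool = False):
    if shard:
        # Explicit opt-in: Hub-recommended 5GB shards.
        model.save_pretrained(save_directory, max_shard_size="5GB")
    else:
        # Default: a limit so large that typical checkpoints stay in a
        # single file, which older diffusers versions can still load.
        model.save_pretrained(save_directory, max_shard_size="1000GB")
```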

@sayakpaul
Member Author

@Wauplin WDYT?

In that case, we will just have to add another codepath doing what we did before sharding support was added.

@Wauplin
Collaborator

Wauplin commented Jun 11, 2024

I'm not sure I understand the suggestion. There is a max_shard_size argument that determines whether checkpoints are sharded. Adding a new parameter next to it that also determines whether a checkpoint is sharded would not make sense, in my opinion.

Maybe what you can do is:

  1. make a release with "10GB" as the default value when saving a model
  2. in ~3 months (?), lower this default value to "5GB". This way, many users will have time to upgrade to a version of diffusers that is able to load sharded models.

WDYT?
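To make the point about max_shard_size concrete, a simplified sketch of how the threshold decides the split (not the actual diffusers implementation):

```python
import torch

# Greedily pack tensors into shards, starting a new shard whenever the
# running size would exceed the limit. One resulting shard means the
# checkpoint is saved unsharded, so a large limit disables sharding.
def split_state_dict(state_dict, max_shard_bytes):
    shards, current, current_size = [], {}, 0
    for name, tensor in state_dict.items():
        size = tensor.numel() * tensor.element_size()
        if current and current_size + size > max_shard_bytes:
            shards.append(current)
            current, current_size = {}, 0
        current[name] = tensor
        current_size += size
    if current:
        shards.append(current)
    return shards

# A ~4MB state dict with a 10GB limit stays in a single shard.
sd = {"w": torch.zeros(1024, 1024), "b": torch.zeros(1024)}
assert len(split_state_dict(sd, 10 * 1024**3)) == 1
```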

@pcuenca
Member

pcuenca commented Jun 11, 2024

Could we perhaps not default to saving sharded weights (maybe until some time in the future), but distribute "official" releases that already require newer versions of diffusers in sharded form, and document how users can do it themselves? Can sharding be disabled with a special value of max_shard_size, or is a large size the only way to go?

@Wauplin
Collaborator

Wauplin commented Jun 11, 2024

Let's say that a large size is the easiest way to go (I'm fine even with 50GB if it's temporary, for a few versions). This way, we would only have to change one default value in the future to enable sharding again. If we do that, it's better to explain the reasoning in the docstring rather than writing "defaults to 50GB", which wouldn't make sense to a user reading it.
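Under that approach, usage would look roughly like this (a sketch assuming the raised default; the checkpoint ID is illustrative):

```python
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

# With a very large default max_shard_size, this ~3.4GB fp32 UNet is
# written as a single diffusion_pytorch_model.safetensors file that
# older diffusers versions can still load.
unet.save_pretrained("sd15-unet")

# Sharding stays available as an explicit opt-in with a smaller limit.
unet.save_pretrained("sd15-unet-sharded", max_shard_size="2GB")
```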

@sayakpaul
Member Author

So, we are in agreement that increasing the default size is the easiest option at the moment. When doing so, we can definitely include a note about our plan to decrease the size in ~3 months. Is there anything else we want to log?

Collaborator

@Wauplin Wauplin left a comment


Ok with it 👍

@sayakpaul
Member Author

@Wauplin @pcuenca @yiyixuxu WDYT about the documentation changes?

Collaborator

@yiyixuxu yiyixuxu left a comment


thanks!

@sayakpaul
Member Author

I think the failing test is unrelated, so I will merge this once the rest of the CI is done.

@sayakpaul sayakpaul merged commit d38f69e into main Jun 12, 2024
@sayakpaul sayakpaul deleted the change-max-shard-size branch June 12, 2024 12:49
sayakpaul added a commit that referenced this pull request Dec 23, 2024
* change max_shard_size to 10GB

* add notes to the documentation

* Update src/diffusers/models/modeling_utils.py

Co-authored-by: Lucain <[email protected]>

* change to abs limit


