change max_shard_size to 10GB #8445
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
TBH I'm not a fan of increasing the
Hmm, I see
Just as not every version of diffusers will work for these new sharded weights, sometimes you have external diffusers forks that contain research code that wasn't accepted into this repo, and they may be as old as Diffusers 0.13. Despite the hub recommendation, it's been inconsistent for diffusers this whole time. It's a breaking change, and what's the gain?
I would not call this breaking, but I think the user experience is not nice. If I were a fine-tuner, I would not want my checkpoints to be sharded, knowing that many users would not be able to use them. Can we make sure we only shard the checkpoints when the user explicitly chooses to? I.e., I would not mind adding an additional argument if we decide to keep the default consistent across the libraries.
@Wauplin WDYT? In that case we will just have to add another codepath doing what we were doing before we added the sharding support.
I'm not sure I understand the suggestion. There is a Maybe what you can do is:
WDYT?
Can we maybe not default to saving sharded weights (maybe until some time into the future), but distribute "official" releases that require newer versions of diffusers already sharded, and document how users can do it? Can sharding be disabled with a special default of
Let's say that a large size is the easiest way to go (I'm fine even with 50GB if it's temporary for a few versions). This way we would only have to change one default value in the future to enable sharding again. If we do so, it is better to explain it in the docstring instead of just writing "defaults to 50GB", which wouldn't make sense to a user reading it.
So, we are in agreement that increasing the default size is the easiest option at the moment. When doing so, we can definitely include a note about our plan around decreasing the size in ~3 months. Is there anything else we wanna log?
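To make the trade-off in this thread concrete, here is a minimal sketch of why raising the default limit effectively turns sharding off for most checkpoints. `parse_size` and `plan_shards` are hypothetical helpers written for illustration only; they are not the diffusers or huggingface_hub implementation, and the greedy packing and decimal units are assumptions.

```python
from typing import Dict, List

# Decimal units are an assumption made for this sketch.
UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size(size: str) -> int:
    """Convert a string such as "10GB" into a byte count."""
    size = size.strip().upper()
    for suffix, factor in UNITS.items():
        if size.endswith(suffix):
            return int(float(size[: -len(suffix)]) * factor)
    return int(size)  # plain number of bytes


def plan_shards(tensor_bytes: Dict[str, int], max_shard_size: str) -> List[List[str]]:
    """Greedily pack tensors into shards that stay under the limit.

    A tensor larger than the limit still gets a shard of its own.
    """
    limit = parse_size(max_shard_size)
    shards: List[List[str]] = [[]]
    current = 0
    for name, nbytes in tensor_bytes.items():
        # Start a new shard only if the current one is non-empty and would overflow.
        if shards[-1] and current + nbytes > limit:
            shards.append([])
            current = 0
        shards[-1].append(name)
        current += nbytes
    return shards


# Hypothetical 12GB checkpoint made of four 3GB tensors.
weights = {"a": 3 * 10**9, "b": 3 * 10**9, "c": 3 * 10**9, "d": 3 * 10**9}
for limit in ("5GB", "10GB", "50GB"):
    print(limit, len(plan_shards(weights, limit)))
# prints: 5GB 4, then 10GB 2, then 50GB 1
```

With a large enough limit the whole checkpoint lands in a single file, which is why changing one default value is enough to switch sharding back on later.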
Wauplin
left a comment
Ok with it 👍
Co-authored-by: Lucain <[email protected]>
yiyixuxu
left a comment
thanks!
I think the failing test is unrelated. So, will merge this when the rest of the CI is done.
* change max_shard_size to 10GB
* add notes to the documentation
* Update src/diffusers/models/modeling_utils.py
  Co-authored-by: Lucain <[email protected]>
* change to abs limit
---------
Co-authored-by: Lucain <[email protected]>
What does this PR do?
Fixes #8443.
@bghira for vis.