-
Notifications
You must be signed in to change notification settings - Fork 6.6k
[Stable Diffusion Inpaint] Fix incorrect noise addition #328
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Fix a bug of incorrect timestep noise addition to latents. Introduce addition which improves results when using large number of iterations. It may also improve results for small number of iterations, doesn't make things worse.
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. |
avoid blending latents improperly. should be either 1.0 or 0.0 and nothing in between.
|
Test appears to hardcode inpainting results?? |
|
But even with those modifications, still no meat 😄 |
change strength default, strength > 0 means img2img for inpainting region, should be expressly desired by the user. results will be very poor if the user is not aiming for img2img - prompt conflict etc...
src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py
Outdated
Show resolved
Hide resolved
he may be doing something incorrect or perhaps unnecessary with strength and init_latents there. Your particular problem I suspect is a logic problem, you have incorrect mask processing somewhere. It would be peculiar for the whites to survive the noising process. See: this means that anywhere the mask is 0.0 the original image information is destroyed at that value. having a mask with values other than 0.0 or 1.0 is probably undefined behavior, I think. Blending latents seems like it would be a bad idea. And there would be opportunity there for original image to bleed through via blending with a bad mask. |
Please try to reproduce with the images examples in PR #261, you'll see this problem is still there and goes away with the modifications from @nagolinc (it does not mean that it's really better, we are still missing something imho).
Yes, that's what we want, but the problem probably comes from the start with the |
|
If after 50 steps of noise to the initial latents some meaningful information from the original survives, it may have implications for noise scheduling, mainly that it needs to be improved for all applications and not just inpainting. Initial t-step is usually seeded with noise, and if it has such a large unintended effect on the output that could be a problem. alternatively, there may be a bug in scheduler.add_noise(). |
|
@anton-l @patil-suraj could you take a look here? :-) |
|
I made a web frontend using Vue to test the inpainting of Stable Diffusion with diffusers. |
|
IMO for in-painting we should more or less just follow wants been shown here: https://github.com/CompVis/stable-diffusion/blob/main/scripts/inpaint.py @patil-suraj @anton-l - is that more or less how the official script looks like? |
patil-suraj
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the pr @jackloomen !
The changes look good. Could you also update the tests here accordingly ?
https://github.com/huggingface/diffusers/blob/main/tests/test_pipelines.py#L592
| mask = np.array(mask).astype(np.float32) / 255.0 | ||
| mask = np.tile(mask, (4, 1, 1)) | ||
| mask = mask[None].transpose(0, 1, 2, 3) # what does this step do? | ||
| mask[np.where(mask != 0.0 )] = 1.0 # make sure mask is properly valid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we maybe the just convert the mask to bool like this
mask = torch.from_numpy(mask).bool()
maks = (~mask).long()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the stable diffusion repo, it looks like this:
mask[mask < 0.5] = 0
mask[mask >= 0.5] = 1mask change
|
@patil-suraj I won't be able to compute expected_slice for some days, traveling. edit: i will see if the github test details show expected_slice and use that. |
|
Hey @jackloomen, Sorry for being so late here with our reply. We've played around with the implementation of this PR and it seems that it's not strictly better for all examples. Would it be ok for you to instead add your pipeline script to the community folder: https://github.com/huggingface/diffusers/tree/main/examples/community ? :-) |
👀 |
Fix a bug of incorrect timestep noise addition to latents.
Introduce addition which improves results when using large number of iterations.
It may also improve results for small number of iterations, doesn't make things worse.
edit: also fix the mask
edit2:
change strength default, strength < 1.0 means img2img for inpainting region, should be expressly desired by the user.
results will be very poor if the user is not aiming for img2img - prompt conflict etc...