Conversation

@ghost ghost commented Sep 1, 2022

Fix a bug where noise was added to the latents at an incorrect timestep.
Introduce an addition that improves results when using a large number of iterations.
It may also improve results for a small number of iterations, and it doesn't make things worse.

edit: also fix the mask

edit2:
change the strength default: strength < 1.0 means img2img for the inpainting region, which should be expressly desired by the user.
Results will be very poor if the user is not aiming for img2img (prompt conflict, etc.).
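
To illustrate why the strength default matters, here is a minimal sketch of how img2img-style pipelines commonly map strength to the number of denoising steps actually run. The function name `denoising_steps_to_run` is hypothetical and the mapping is a simplified assumption, not the code of this PR.

```python
def denoising_steps_to_run(num_inference_steps: int, strength: float) -> int:
    """Hypothetical helper: how many denoising steps run for a given strength.

    Assumption: strength scales the number of steps linearly, so strength=1.0
    starts from (almost) pure noise and lower values keep more of the init image.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0.0, 1.0]")
    return min(int(num_inference_steps * strength), num_inference_steps)


# strength = 1.0 runs every step; strength = 0.5 skips the noisiest half,
# so the inpainted region stays tied to the original image (img2img behavior).
full = denoising_steps_to_run(50, 1.0)
half = denoising_steps_to_run(50, 0.5)
```

Under this assumption, any strength below 1.0 leaves part of the original image in the inpainting region, which is why it reads as an opt-in setting.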

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

avoid blending latents improperly. should be either 1.0 or 0.0 and nothing in between.
@ghost
Author

ghost commented Sep 1, 2022

Test appears to hardcode inpainting results??

@ghost ghost mentioned this pull request Sep 1, 2022
2 tasks
@leszekhanusz
Contributor

Maybe you could also incorporate the modifications made here by @nagolinc so that the white part of the mask is not used at the beginning.
It has been discussed in PR #261

@leszekhanusz
Contributor

But even with those modifications, still no meat 😄

change strength default, strength > 0 means img2img for inpainting region, should be expressly desired by the user.
results will be very poor if the user is not aiming for img2img - prompt conflict etc...
@ghost
Author

ghost commented Sep 2, 2022

Maybe you could also incorporate the modifications made here by @nagolinc so that the white part of the mask is not used at the beginning. It has been discussed in PR #261

He may be doing something incorrect, or perhaps unnecessary, with strength and init_latents there.

Your particular problem I suspect is a logic problem, you have incorrect mask processing somewhere. It would be peculiar for the whites to survive the noising process.

See:
latents = init_latents_proper * mask + latents * (1-mask)

this means that anywhere the mask is 0.0 the original image information is destroyed at that value.

Having a mask with values other than 0.0 or 1.0 is probably undefined behavior, I think. Blending latents seems like a bad idea, and a bad mask would give the original image an opportunity to bleed through via the blending.
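
A small NumPy sketch of the blending line quoted above, under assumed toy shapes and random stand-in latents (none of these arrays come from the pipeline). It compares a strictly binary mask with a soft one:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy latents with an assumed shape of (batch, channels, height, width).
init_latents_proper = rng.standard_normal((1, 4, 8, 8)).astype(np.float32)  # noised original
latents = rng.standard_normal((1, 4, 8, 8)).astype(np.float32)  # current denoising state

# Binary mask: 1.0 keeps the original latents, 0.0 keeps the generated ones.
mask = np.zeros((1, 4, 8, 8), dtype=np.float32)
mask[..., :4] = 1.0

blended = init_latents_proper * mask + latents * (1 - mask)

# A soft mask (values strictly between 0 and 1) mixes both sources everywhere,
# so part of the original leaks into the region that should be repainted.
soft_mask = np.full_like(mask, 0.3)
leaky = init_latents_proper * soft_mask + latents * (1 - soft_mask)
```

With the binary mask each position comes from exactly one source; with the soft mask every position carries 30% of the original latents.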

@leszekhanusz
Contributor

Your particular problem I suspect is a logic problem, you have incorrect mask processing somewhere. It would be peculiar for the whites to survive the noising process.

Please try to reproduce with the images examples in PR #261, you'll see this problem is still there and goes away with the modifications from @nagolinc (it does not mean that it's really better, we are still missing something imho).

See: latents = init_latents_proper * mask + latents * (1-mask)
this means that anywhere the mask is 0.0 the original image information is destroyed at that value.

Yes, that's what we want, but the problem probably comes from the start: init_latents contains the original image information, and that information is carried through each iteration in the latents * (1-mask) part.
That's why @nagolinc added the line init_latents_noised=init_latents*(mask)+rand_latents*(1-mask) at the beginning, so that even in the initial latents there is absolutely no original image information in the parts that we want to inpaint.
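
As a rough illustration of that line, the sketch below builds the starting latents from toy arrays. The shapes and the constant stand-in for the encoded image are assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the encoded original image (constant so leaks are easy to spot).
init_latents = np.ones((1, 4, 8, 8), dtype=np.float32)
rand_latents = rng.standard_normal(init_latents.shape).astype(np.float32)

# Same convention as the blending line: mask == 1.0 means "keep the original".
mask = np.zeros_like(init_latents)
mask[..., :4] = 1.0

# Start from the original where it is kept, and from pure noise where we inpaint.
init_latents_noised = init_latents * mask + rand_latents * (1 - mask)
```

The region to inpaint then starts from `rand_latents` only, so nothing from the original image is present there at step zero.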

@ghost
Author

ghost commented Sep 2, 2022

If after 50 steps of noise to the initial latents some meaningful information from the original survives, it may have implications for noise scheduling, mainly that it needs to be improved for all applications and not just inpainting.

Initial t-step is usually seeded with noise, and if it has such a large unintended effect on the output that could be a problem.

Alternatively, there may be a bug in scheduler.add_noise().
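
One way to check how much of the original can survive noising is to compute the signal and noise scales of the standard DDPM forward process, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The sketch below does this in plain NumPy. The "scaled linear" schedule and the beta values are assumptions (values commonly used with Stable Diffusion v1), and this is not a reproduction of scheduler.add_noise().

```python
import numpy as np


def signal_and_noise_scales(num_train_timesteps=1000, beta_start=0.00085, beta_end=0.012):
    """Return sqrt(alpha_bar_t) and sqrt(1 - alpha_bar_t) for every timestep.

    Assumed schedule: betas are linear in sqrt-space ("scaled linear").
    """
    betas = np.linspace(beta_start**0.5, beta_end**0.5, num_train_timesteps) ** 2
    alphas_cumprod = np.cumprod(1.0 - betas)
    return np.sqrt(alphas_cumprod), np.sqrt(1.0 - alphas_cumprod)


signal_scale, noise_scale = signal_and_noise_scales()
print(f"signal scale at the last timestep: {signal_scale[-1]:.4f}")
print(f"noise scale at the last timestep:  {noise_scale[-1]:.4f}")
```

Under these assumed values the signal scale at the last timestep is small but not exactly zero, which would be consistent with a faint trace of the original surviving even full noising.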

@patrickvonplaten
Contributor

@anton-l @patil-suraj could you take a look here? :-)

@patrickvonplaten patrickvonplaten changed the title fix incorrect noise addition [Stable Diffusion Inpaint] Fix incorrect noise addition Sep 2, 2022
leszekhanusz added a commit to leszekhanusz/diffusers that referenced this pull request Sep 2, 2022
@leszekhanusz
Contributor

I made a web frontend using Vue to test Stable Diffusion inpainting with diffusers.
It could help to test the inpainting pipelines faster.
It is available here and looks like this for now
It is still a work in progress, but you can already generate images and draw an inpainting mask to generate edits. Please tell me what you think.

@patrickvonplaten
Contributor

IMO for in-painting we should more or less just follow what's been shown here: https://github.com/CompVis/stable-diffusion/blob/main/scripts/inpaint.py

@patil-suraj @anton-l - is that more or less what the official script looks like?

Contributor

@patil-suraj patil-suraj left a comment

Thanks a lot for the pr @jackloomen !

The changes look good. Could you also update the tests here accordingly ?

https://github.com/huggingface/diffusers/blob/main/tests/test_pipelines.py#L592

mask = np.array(mask).astype(np.float32) / 255.0
mask = np.tile(mask, (4, 1, 1))
mask = mask[None].transpose(0, 1, 2, 3) # what does this step do?
mask[np.where(mask != 0.0 )] = 1.0 # make sure mask is properly valid
Contributor

could we maybe just convert the mask to bool like this

mask = torch.from_numpy(mask).bool()
mask = (~mask).long()

Contributor

In the stable diffusion repo, it looks like this:

mask[mask < 0.5] = 0
mask[mask >= 0.5] = 1
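
Put together with the preprocessing quoted earlier in this review, the thresholding could look like the sketch below. The helper name `binarize_mask` is hypothetical, and the input is assumed to be a single-channel uint8 array of shape (H, W):

```python
import numpy as np


def binarize_mask(mask_u8, threshold=0.5):
    """Hypothetical helper: turn a grayscale uint8 mask into a strict 0/1 mask.

    Returns a float32 array of shape (1, 4, H, W), one copy per latent channel.
    """
    mask = np.asarray(mask_u8).astype(np.float32) / 255.0
    # Threshold so that no value between 0.0 and 1.0 survives.
    mask = np.where(mask < threshold, 0.0, 1.0).astype(np.float32)
    mask = np.tile(mask, (4, 1, 1))  # (H, W) -> (4, H, W)
    return mask[None]  # add the batch dimension


example = binarize_mask(np.array([[0, 100, 128, 255]], dtype=np.uint8))
```

Thresholding at 0.5 treats gray pixels symmetrically, whereas the `mask != 0.0` version above maps every non-black pixel to 1.0.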

@ghost
Author

ghost commented Sep 6, 2022

@patil-suraj I won't be able to compute expected_slice for some days, as I'm traveling.
I changed the mask in accordance with the stable diffusion repo.

edit: I will see if the GitHub test details show expected_slice and use that.

@patrickvonplaten
Contributor

Hey @jackloomen,

Sorry for being so late here with our reply.
Stability AI will soon release a new inpainting checkpoint, and we will then adopt CompVis/Stable-Diffusion's way of doing in-painting for the main "Inpainting" class here. The current version is more of an experimental hack than an official implementation, so we think it makes the most sense not to change it too much.

We've played around with the implementation of this PR and it seems that it's not strictly better for all examples. Would it be ok for you to instead add your pipeline script to the community folder: https://github.com/huggingface/diffusers/tree/main/examples/community ? :-)

@nagolinc
Contributor

Stability AI will soon release a new inpainting checkpoint

👀

This pull request was closed.

5 participants