-
Notifications
You must be signed in to change notification settings - Fork 419
vae tiling improvements: encoding support and adaptative overlap #484
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
9edc59f
to
af0491b
Compare
6f49af0
to
6f96903
Compare
Nice! It would also be interesting to see larger tile sizes. Depending on VAE, larger tiles would still fit into the memory budget and would speed things up. |
5495c93
to
e9a8d17
Compare
e9a8d17
to
d103632
Compare
I am not familiar with AI programming, and this is a machine translation, so please ignore me if I say something strange. The However, the brightness issue (issue #588) seems to remain. |
I'm sorry, I don't quite understand what you mean.
You mean this PR doesn't fix it for you? If so this is strange, I can't reproduce the issue with this PR...
As you can see the image made with the master branch is much brighter than the one with this PR. |
Umm.. Sometimes the bottom of the generated image is grayed out; this did not happen when I initialized
I think I have misled you, but there is no doubt that this PR is very good, as you have demonstrated in your examples. |
Hmm that's really weird. I don't see how this could happen.
The result with tiled VAE is always going to be slightly lower quality than the full VAE, because it's lacking information about the whole image. (That's especially true with SD1.x/SD2.x models, because the VAE is quirky ). When using TAE (or with SDXL/SD3 or FLUX), the results are almost completely identical with the PR, while on master, there is the "overexposure" issue on the tiled TAE.
(there is a lot of difference here, mostly because of the kl-f8 vae doesn't handle tiling very well)
(these are almost identical however)
(no tiling) |
I can't vouch for the new code being 100% correct, not knowing the code base, but the approach looks reasonable (but does it have to match how the upscaler model was trained, and if so, does it match? I do not know enough for that). But I can definitely vouch for the old code being incorrect. It assumes that tile overlap is equal in x and y direction, and that's unlikely assuming "somewhat arbitrary" total sizes and a fixed tile size. And I can definitely confirm that #666 is fixed by this change - all five test cases pass (up from only the base case), and the more complex case I discovered this with appears to be fixed as well. Now the difference pictures from pre- and post-upscale pictures (after scaling both back to the same size) look like an edge detector without any noticeable artifacts/biases - precisely what's expected. |
d1fcf84
to
fb5f5b7
Compare
@stduhpf , would you please rebase this onto master? There are a few conflicts due to the VAE changes for Wan support. |
fb5f5b7
to
c0c0097
Compare
I haven't tested it, I know it builds fine, but I'm not sure if it actually works (I can't test it right now) |
c0c0097
to
e78d630
Compare
No problem, I'll test it here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've only noticed a single, very familiar issue 😀
Encode is also broken, fixing it |
e78d630
to
d7b671c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is some unrelated whitespace changes, but apart from that it looks good to me. Working fine with image to image in SDXL, SD3 and Chroma, and image edit with Kontext.
Thank you for your contribution. Please change the environment variables to command-line arguments. Once this change is completed, I will merge the PR. |
@leejet , a quick question: the VAE tiling doesn't really change the loaded models, and its usefulness depends on the chosen image resolution (we even have a workaround on Koboldcpp to enable or disable it at inference time). Could we move it, together with the new tiling size parameters, to the sd_img_gen_params_t struct (possibly a new sd_tiling_params_t struct)? (I can do this change, either separately or on top of this PR) |
This is what I expect. |
@wbruna do you want to do these changes yourself or should I go ahead and start working on it? |
I won't be able to until Friday, so feel free to do it. From what I looked, the hardest part will be choosing how to implement the tile size parameters: that env var syntax kind of does too much 😃 |
101dcbb
to
0a2b4e4
Compare
* refactor tile number calculation * support non-square tiles * add env var to change tile overlap * add safeguards and better error messages for SD_TILE_OVERLAP * add safeguards and include overlapping factor for SD_TILE_SIZE * avoid rounding issues when specifying SD_TILE_SIZE as a factor * lower SD_TILE_OVERLAP limit * zero-init empty output buffer
0a2b4e4
to
987ced8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, and working fine on a few tests. The only issue I notice was a crash on an invalid tile size (more detailed comments below).
stable-diffusion.h
Outdated
int tile_size_x; | ||
int tile_size_y; | ||
float target_overlap; | ||
bool relative; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this boolean could be dropped, and we apply the same logic as the command-line: if rel_size_
is bigger than 0.0, it overrides the tile_size_
?
stable-diffusion.cpp
Outdated
int64_t t0 = ggml_time_ms(); | ||
if (!use_tiny_autoencoder) { | ||
float tile_overlap = vae_tiling_params.target_overlap; | ||
int tile_size_x = vae_tiling_params.tile_size_x; | ||
int tile_size_y = vae_tiling_params.tile_size_y; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A relative size of 0.0 applies a 4x4 tile (doesn't look good, but works); but a zeroed-out tile size crashes. Maybe an invalid value could just mean the default?
int tile_size_x = vae_tiling_params.tile_size_x < 4 ? 32 : vae_tiling_params.tile_size_x;
int tile_size_y = vae_tiling_params.tile_size_y < 4 ? 32 : vae_tiling_params.tile_size_y;
Then we could also simplify the default values for the param struct, by using zeroed-out fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue with the tile_overlap above.
Thanks for the review! I probably won't be able to work on these changes before tomorrow though. |
* avoid crash with invalid tile sizes, use 0 for default * refactor default tile size, limit overlap factor * remove explicit parameter for relative tile size * limit encoding tile to latent size
SD_TILE_SIZE
environment variable. (tile size for vae encoding will be about 1.3x bigger, for the same memory requirement)With these changes, vae encoding can now be tiled, and a small (unreported?) issue with the upscaler (or tae) over-brightening some parts of the image is now fixed. This could also be a step toward tiled diffusion support.
Fixes #588, #564 and #666