
Conversation

@GreggHelt2 (Contributor)

This PR adds support for ControlNet (and multiple ControlNets) within the nodes backend and nodes UI.
It adds:

  • image processing nodes for each of the ControlNet v1.1 annotators
  • a ControlNet node (see the sketch after this list) with options to specify:
    • ControlNet model
    • preprocessed image
    • weight: how much influence the ControlNet has on the generated image
    • start and end of the range of diffusion steps over which the ControlNet is applied (specified as fractions of total steps)
  • a "control" input port on the TextToLatents node

Usage:

Single ControlNet using a preprocessed image:

[Screenshot from 2023-05-12 10-17-42]

Multiple ControlNets using image preprocessors:

[Screenshot from 2023-05-12 10-28-18]

One limitation of the current implementation is that there must be a Collect node between the ControlNet node(s) control output and the TextToLatents control input. Directly connecting a ControlNet node to a TextToLatents node will result in an error. This is because I haven't figured out how to set up a polymorphic input port that can take either a single ControlField item or a list of ControlField items. I'm pretty sure it can be done, but everything I've tried so far results in errors. I will reach out on Discord for help on this, but I don't think it's a big enough issue to block the PR.

@psychedelicious (Contributor)

It works!

I've just done some playing around and found one issue - the SD 2.1 768x768 model causes an error:

```
Traceback (most recent call last):
  File "/home/bat/Documents/Code/InvokeAI/invokeai/app/services/processor.py", line 70, in __process
    outputs = invocation.invoke(
  File "/home/bat/Documents/Code/InvokeAI/invokeai/app/invocations/latent.py", line 326, in invoke
    result_latents, result_attention_map_saver = model.latents_from_embeddings(
  File "/home/bat/Documents/Code/InvokeAI/invokeai/backend/stable_diffusion/diffusers_pipeline.py", line 545, in latents_from_embeddings
    result: PipelineIntermediateState = infer_latents_from_embeddings(
  File "/home/bat/Documents/Code/InvokeAI/invokeai/backend/stable_diffusion/diffusers_pipeline.py", line 212, in __call__
    for result in self.generator_method(*args, **kwargs):
  File "/home/bat/Documents/Code/InvokeAI/invokeai/backend/stable_diffusion/diffusers_pipeline.py", line 600, in generate_latents_from_embeddings
    step_output = self.step(
  File "/home/bat/invokeai/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/bat/Documents/Code/InvokeAI/invokeai/backend/stable_diffusion/diffusers_pipeline.py", line 682, in step
    down_samples, mid_sample = control_datum.model(
  File "/home/bat/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bat/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/controlnet.py", line 526, in forward
    sample, res_samples = downsample_block(
  File "/home/bat/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bat/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py", line 867, in forward
    hidden_states = attn(
  File "/home/bat/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bat/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/transformer_2d.py", line 265, in forward
    hidden_states = block(
  File "/home/bat/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bat/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/attention.py", line 331, in forward
    attn_output = self.attn2(
  File "/home/bat/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bat/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 267, in forward
    return self.processor(
  File "/home/bat/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 733, in __call__
    key = attn.to_k(encoder_hidden_states)
  File "/home/bat/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bat/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (154x1024 and 768x320)
```

Tried with Canny and OpenPose models.

@GreggHelt2 (Contributor, Author)

@psychedelicious

> I've just done some playing around and found one issue - the SD 2.1 768x768 model causes an error:

Looking at your node graph, what should be happening is that the OpenPose preprocessor node resizes the image to 512x512 pixels before it is fed to the OpenPose algorithm, and then redundantly resizes the resulting image again to 512x512 pixels before it is sent to the ControlNet node, which doesn't do any resizing. However, the OpenPose node is also sending width=512 and height=512 params to the Noise node. So possibly the 768x768 SD2.1 model doesn't like the 64x64 latent being passed in from Noise? Or it could be something in the ControlNet code in latent.py -- I'll do some testing...

@hipsterusername (Member) commented May 16, 2023

I think it is likely because he's not using 2.1 ControlNet models, but is using SD 2.1.
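For context: SD 1.x models produce 768-dimensional text embeddings while SD 2.x produces 1024-dimensional ones, which matches the (154x1024 and 768x320) shape mismatch in the traceback above. A defensive check along these lines could surface the mismatch early; this is just a sketch (not something this PR adds) and it assumes diffusers-style model configs:

```python
def check_controlnet_compatibility(unet, controlnet) -> None:
    """Raise a clear error when a ControlNet expects a different
    text-embedding width than the base model's UNet produces
    (e.g. an SD 1.5 ControlNet paired with an SD 2.1 base model)."""
    unet_dim = unet.config.cross_attention_dim          # 768 for SD 1.x, 1024 for SD 2.x
    control_dim = controlnet.config.cross_attention_dim
    if unet_dim != control_dim:
        raise ValueError(
            f"ControlNet expects {control_dim}-dim text embeddings, but the base "
            f"model produces {unet_dim}-dim embeddings; use a ControlNet trained "
            f"for this base model."
        )
```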

@psychedelicious (Contributor)

@GreggHelt2 I'll have to get back to you on whether this breaks when I use the correct 2.1 ControlNet models.

However, this is unexpected: why is the ControlNet preprocessor resizing my image? The control image is not 512x512, and neither are the results.

Also, what are the two integer parameters on the preprocessor used for? If we are expressing some image size or resolution, wouldn't we want to include both width and height?

@GreggHelt2 (Contributor, Author)

> @GreggHelt2 I'll have to get back to you on whether this breaks when I use the correct 2.1 ControlNet models.
>
> However, this is unexpected: why is the ControlNet preprocessor resizing my image? The control image is not 512x512, and neither are the results.
>
> Also, what are the two integer parameters on the preprocessor used for? If we are expressing some image size or resolution, wouldn't we want to include both width and height?

The image preprocessors are definitely a mixed bag. Some have detect_resolution and image_resolution parameters that I've exposed. The idea is that, depending on the image, for some processors it makes sense to internally resize up/down before doing the core image analysis, then resize back down/up for output. The convention with these preprocessors is to assume uniform height/width scaling: the single resolution specified is applied to min(height, width), and the size of the other dimension is calculated by uniform scaling (see the sketch below).
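To illustrate the convention, here's a minimal sketch of the uniform-scaling rule these annotators follow (controlnet_aux's own resizing also snaps both sides to multiples of 64, approximated here; the function name is mine):

```python
import cv2  # opencv-python
import numpy as np


def resize_to_resolution(image: np.ndarray, resolution: int) -> np.ndarray:
    """Uniformly scale `image` so that min(height, width) ~= `resolution`,
    rounding both sides to multiples of 64 as the annotators expect."""
    h, w = image.shape[:2]
    k = resolution / min(h, w)                  # uniform scale factor
    new_h = int(np.round(h * k / 64.0)) * 64    # snap to multiple of 64
    new_w = int(np.round(w * k / 64.0)) * 64
    interpolation = cv2.INTER_LANCZOS4 if k > 1 else cv2.INTER_AREA
    return cv2.resize(image, (new_w, new_h), interpolation=interpolation)
```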

See Mikubill/sd-webui-controlnet#924 for discussion of the resizing complexity that led to a "pixel_perfect" auto-resizing option being added to the auto1111 ControlNet extension. I have not yet implemented that option, but it is on my TODO list (probably in another PR after this one gets merged).

@hipsterusername (Member) left a review:

Tested - Aside from minor nits I've called out previously (follow-on fixes) this thing is ready for main!

@hipsterusername (Member)

@GreggHelt2 - Seeing a number of conflicts in here - May be due to other PRs being merged in?

@psychedelicious (Contributor)

Just realized, this is going to need some care while rebasing due to the images refactor PR. I'm happy to have a go at that tomorrow since I'm familiar with the changes, and the resolution may not be trivial.

@blessedcoolant (Collaborator)

> Tested - Aside from minor nits I've called out previously (follow-on fixes) this thing is ready for main!

+1 .. I ran it quite a bit today. The ControlNet part itself is pretty solid. There's some UI/UX stuff that might need refining, but that can happen in a future PR. Also, the Model Management PR needs to be merged in so it can handle the ControlNet models from disk rather than loading from HF.

@blessedcoolant (Collaborator)

Whoever merges this, maybe do a squash merge, because there's a large number of commits and tracing changes back through them might get harder. Unless @GreggHelt2 wants to retain the commit history.

@GreggHelt2 (Contributor, Author)

Update: recent modifications to the PR include pinning to the newly released controlnet_aux v0.0.4, reinstating the Zoe depth preprocessor node, and adding a MediaPipe face preprocessor node. Also, thanks to a great session with @psychedelicious, we got polymorphic input ports on nodes working. So now the control input port on TextToLatents can take either a single ControlField input or a list of ControlFields, and a single ControlNet node can connect directly to TextToLatents without going through a Collect node, like:
[Screenshot from 2023-05-23 18-17-47]
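Under the hood, the polymorphic port boils down to a Union-typed pydantic field on the invocation. A rough sketch, using stand-in classes for InvokeAI's actual ones (the exact field name and description in the merged code may differ):

```python
from typing import Optional, Union

from pydantic import BaseModel, Field


# Stand-ins: BaseInvocation lives in InvokeAI's nodes backend, and
# ControlField is sketched earlier in this thread.
class ControlField(BaseModel):
    ...


class BaseInvocation(BaseModel):
    ...


class TextToLatentsInvocation(BaseInvocation):
    # Accepts either one ControlField or a list of them, so a single
    # ControlNet node can connect directly without a Collect node.
    control: Optional[Union[ControlField, list[ControlField]]] = Field(
        default=None, description="The control(s) to apply while denoising"
    )
```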

@GreggHelt2 (Contributor, Author) commented May 24, 2023

If you've manually installed controlnet_aux v0.0.4 to test this PR, you may want to check which version of the timm package is installed. It needs to be <= 0.6.13 in order for the Zoe processors to work (see issue isl-org/ZoeDepth#26). Currently in the InvokeAI pyproject.toml I'm pinning the timm version to 0.6.13, so I think a pip install -e . on this branch will fix it too.
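A quick way to check the installed version before running the Zoe nodes (just a small sketch using the packaging library; not part of the PR):

```python
from packaging import version

import timm

if version.parse(timm.__version__) > version.parse("0.6.13"):
    # Zoe depth preprocessors are known to break on newer timm releases
    # (see isl-org/ZoeDepth#26); pin with: pip install "timm==0.6.13"
    print(f"timm {timm.__version__} is too new for the Zoe processors")
```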

@GreggHelt2 (Contributor, Author) commented May 24, 2023

> @GreggHelt2 - Seeing a number of conflicts in here - May be due to other PRs being merged in?

I'm not worried about the conflicts. Most of the currently reported conflicts are in autogenerated code, I think from doing a yarn api:web. So the only "real" conflict is in latent.py, which makes sense.

@GreggHelt2 (Contributor, Author)

> Whoever merges this, maybe do a squash merge, because there's a large number of commits and tracing changes back through them might get harder. Unless @GreggHelt2 wants to retain the commit history.

I do like retaining commit history; "git bisect" with more precise history has saved me more than once. Maybe see how messy a final rebase is before deciding? I've been rebasing pretty much every time another PR is merged to main, just to make sure this PR isn't straying too far away. Though I haven't rebased since the image refactoring PR (mentioned by @psychedelicious above) was merged yesterday/last night.

@hipsterusername (Member)

@GreggHelt2 - I think mediapipe may need to be added to pyproject.toml after your latest commits.

Were you going to address conflicts or were you waiting for @psychedelicious to do that?

@GreggHelt2 (Contributor, Author)

> @GreggHelt2 - I think mediapipe may need to be added to pyproject.toml after your latest commits.

Thanks for catching the issue with mediapipe. For now, as you suggested, I added the requirement to pyproject.toml. I'll put in a PR to the controlnet_aux repo to add mediapipe to its requirements instead, so eventually we should be able to remove it from pyproject.toml again.

> Were you going to address conflicts or were you waiting for @psychedelicious to do that?

I'll deal with those conflicts today.

@hipsterusername (Member)

Sounds great - pushing the big ol' merge button once they are! :)

GreggHelt2 and others added 26 commits May 26, 2023 14:26

  • …Txt2Img in backend/generator. Although backend/generator will likely disappear by v3.x, right now they are very useful for testing core ControlNet and MultiControlNet functionality while the node codebase is rapidly evolving.
  • MidasDepth, ZoeDepth, MLSD, NormalBae, Pidi, LineartAnime, ContentShuffle
  • Removed pil_output options; ControlNet preprocessors should always output as PIL. Removed diagnostics and other general cleanup.
  • … node, stripped controlnet stuff from image processing/analysis nodes.
  • …data struct. Also redid how multiple controlnets are handled.
  • each ControlNet, and which step to end using each controlnet (specified as fraction of total steps)
  • …gnostic printing. Also fixed error when there is no controlnet input.
  • …urned off pre-processor params that were added post v0.0.3. Also changed defaults for shuffle.
  • …extToLatents.invoke(), and make upcoming integration with LatentsToLatents easier.
  • Also hacked in ability to specify HF subfolder when loading ControlNet models from string.
  • …ntrolnet_aux package adds mediapipe to its requirements.

@GreggHelt2 force-pushed the feat/controlnet-nodes branch from 3a23956 to a4b0140 on May 26, 2023 at 23:55.
@GreggHelt2 (Contributor, Author)

> Sounds great - pushing the big ol' merge button once they are! :)

All working now and rebased to main!

@hipsterusername merged commit 9a79636 into main on May 27, 2023.
@hipsterusername deleted the feat/controlnet-nodes branch on May 27, 2023 at 01:44.