Remove image generation node dependencies on generate.py #2902
Conversation
So I'm thinking about attention maps, which need to be bubbled back from the innards of the diffusion step loop all the way up to the UI. They can readily be passed around as a PIL image or a dict of PIL images - I assume that will be easy enough to incorporate? |
|
Yes, at this point we can easily add new fields to InvokeAIGeneratorOutput
to return to the caller. I see that there is already an
`attention_maps_callback` in the Generator class, and it is generating a
list of PIL images. Is it correct that the last image in the list is the
final generated image and that all the rest are some sort of representation
of the mapping from token to image area? I haven't explored this at all.
How many of these maps are there?
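Purely as an illustration of what adding such a field to InvokeAIGeneratorOutput could look like (the class layout, field names, and types here are assumptions, not the actual implementation):
```
from dataclasses import dataclass
from typing import Dict, Optional

from PIL import Image


@dataclass
class InvokeAIGeneratorOutput:
    # Illustrative subset of fields; the real class carries more information.
    image: Image.Image
    seed: int
    model_hash: str
    # Hypothetical new field: map from token to attention map, as discussed above.
    attention_maps: Optional[Dict[str, Image.Image]] = None
```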
|
keturn
left a comment
I've given this a read-through; now for giving feedback. I'm trying to sort out which parts are inherent to the proposed design and which are temporary considerations that come from doing things one layer at a time -- and this is the Generate layer, not the Generator layer (wow, names are confusing).
I'm not opposed to ever using the factory pattern, but I am leery of it, so I'd like to make sure it's justified here.
We've got
```
manager = ModelManager('/data/lstein/invokeai-main/configs/models.yaml')
factory = InvokeAIGeneratorFactory(manager)
txt2img = factory.make_generator(Txt2Img)
generator = txt2img.generate(prompt='banana sushi', steps=12, scheduler='k_euler_a')
output = generate()
```
So my question would be: what keeps that from being more like this?
```
manager = ModelManager('/data/lstein/invokeai-main/configs/models.yaml')
model = manager.current_model  # or somesuch
output = txt2img(model, prompt='banana sushi', steps=12, scheduler='k_euler_a')
```
|
I'll do a full review later, but commenting on the PR summary here. It would be nice to have this sort of pattern instead:
```
model_manager = ModelManager('/data/lstein/invokeai-main/configs/models.yaml')
model = model_manager.get('stable-diffusion-2.0')  # Whatever the standard string identifiers are
txt2img = Txt2Img(model)
outputs = txt2img.generate(prompt='banana sushi', steps=12, scheduler='k_euler_a', iterations=5)
```
When I had originally suggested "factory", I was looking for something to create and manage the model objects that are used to generate. Factory might not even be the right pattern, but it's what came to mind at the time when I was looking at code. Anything that would help us utilize Diffusers more directly while still respecting model management (caching, GPU memory, etc.) would be great as well. That would allow us to more rapidly integrate newly-developed functionality in the space. |
|
Do you plan to do anything with the object returned by the first step? Wondering why that's two steps instead of passing the model along with the rest of the parameters. |
Yeah, some of the awkwardness of the current code is that I didn't want to break the legacy `Generate`-based WebUI and CLI while this refactor is in progress.

With respect to the factory pattern, it certainly isn't required, but there is an argument to be made for retaining it. What isn't shown in the code example I put up is that there is a shared generation configuration parameter block that can be created and passed to the factory at the time it is initialized; a fuller example would look like the first pattern in the sketch below. This fits the style of having a common set of generation parameters that are usually static and shared among all the generators created by the factory. They can be changed (e.g. by the WebUI), after which any new generators created get the changed configuration.

The alternative is simply to keep the generation_parameters data object around and pass it directly to each of the InvokeAIGenerator initializers, as in the second pattern in the sketch below. Is this preferable? |
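To make the two alternatives concrete, here is a rough sketch of the contrast being described. Only `InvokeAIGeneratorBasicParams`, `Txt2Img`, and the factory class name come from this thread; the constructor signatures, parameter values, and the `manager` variable are assumptions for illustration:
```
# Rough sketch only; signatures are assumed, not taken from the PR.
params = InvokeAIGeneratorBasicParams(steps=30, scheduler='k_lms', cfg_scale=7.5)

# Pattern 1: the factory holds the shared params and hands them to every
# generator it creates.
factory = InvokeAIGeneratorFactory(manager, params)
txt2img = factory.make_generator(Txt2Img)
outputs = txt2img.generate(prompt='banana sushi')

# Pattern 2: no factory; keep the params object around and pass it directly
# to each InvokeAIGenerator initializer.
txt2img = Txt2Img(manager, params)
outputs = txt2img.generate(prompt='banana sushi')
```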
From my perspective, yes, definitely: all of the information I need to understand what txt2img is going to contain is right there on that line of code, and there are no state or side effects hidden from me as the reader of the code. |
See previous comment. If we're dropping the factory pattern, then I'd like to retain the InvokeAIGeneratorBasicParams so that the basic generation parameters are kept together in a coherent data structure. Could you also advise on whether Txt2Img needs to be an object? With respect to making it easier to use Diffusers directly, I need to ask you and @keturn for advice on how this will be used. The objects currently returned by the model manager are `StableDiffusionGeneratorPipeline` instances. |
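For reference, a rough sketch of what such a parameter block might look like; the field names follow the usage example later in this thread, while the dataclass form and the default values are assumptions:
```
from dataclasses import dataclass


@dataclass
class InvokeAIGeneratorBasicParams:
    # Field names taken from the usage example elsewhere in this thread;
    # defaults here are illustrative only.
    model_name: str = 'stable-diffusion-1.5'
    steps: int = 30
    scheduler: str = 'k_lms'
    cfg_scale: float = 7.5
    width: int = 512
    height: int = 512
```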
Ah my comment was less about how we call the method and more about getting the model separately from the model manager. (Though the node is already an object that contains all the parameters, has defaults, ranges, validation, etc... so doing things directly in there would make sense).
Mostly what I'd like to be able to achieve is the ease with which I was able to write this node: https://gist.github.com/Kyle0654/57c337f7c005662b98a53f4e1ed7a960 There's a lot in there I'm aware isn't correct for how we do things. In particular, the |
Would it be preferable to reduce everything to one class method call as in:
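The snippet that originally followed did not survive the page capture; a hypothetical sketch of the kind of single class-method call being asked about might look like:
```
# Hypothetical illustration only -- not the snippet that originally appeared here.
outputs = Txt2Img.generate(
    model=manager.get_model('stable-diffusion-1.5'),
    prompt='banana sushi',
    steps=30,
    scheduler='k_lms',
)
```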
[edit] However, lots of things get done to the pipeline in the lower layers of the code, so this pseudocode would not work. |
Personally I prefer instance methods over class methods, but that's a preference, not an "ought". |
|
So I'm hearing a need for the model manager to handle the GPU/CPU caching of arbitrary models. Internally it would handle the details of loading models and moving them between GPU and CPU. See any problems with this? |
I'm super wary of statements like this, because there's a reason why it took three months to do the diffusers integration instead of three days: there are still nearly a thousand lines of code in our `StableDiffusionGeneratorPipeline` subclass. You should not plan on using stock 🧨diffusers pipelines unless you're planning to do so without any InvokeAI features: no masks with a non-inpainting model, no mask-free use of an inpainting model, no cross-attention controls, thresholding, symmetry, or attention maps, and no adjusting of slice size based on available capacity.
These aren't bad ideas, and may well be things we end up doing, but I'd call that beyond the scope of this phase. |
If you never need an instance of |
Some of this might be okay for nodes that don't have exposed UI? And then if we want to expose UI we can always more fully integrate features? Is there some other way to help contributors more rapidly integrate new features? |
As long as we have something that works for the current pipeline, that's okay for now. When we add support for some other type of diffusers pipeline, we can do the work to see what we need to do to support whatever additional inputs and parameters it needs. After we do a couple of those, hopefully we're coming up with ways to factor out composable pieces so we're not tying ourselves in knots. I'm confident we'll figure out how to improve on this over time. |
|
Hey folks, great discussion and I'm really grateful for the advice and guidance. I'm up against a two-week vacation that starts on Sunday, and I'm not sure how much time I'll have to work on this phase of the refactor. Can I get confirmation that this part of the refactor is not going to block other critical parts of the nodes integration, such as porting the WebUI to nodes? Realistically, before I fly off Sunday morning I think I'll be able to (1) replace the factory paradigm with simpler direct calls to the Txt2Img, Img2Img and Inpaint classes; and (2) wire nodes up to the postprocessing classes so they are not calling into Generate. After that, further work will drop way off. However, if this is a blocker, then I'm happy to relinquish the task to anyone who would like to volunteer. |
Factory pattern is now removed. Typical usage of the InvokeAIGenerator is now:
```
from invokeai.backend.generator import (
InvokeAIGeneratorBasicParams,
Txt2Img,
Img2Img,
Inpaint,
)
params = InvokeAIGeneratorBasicParams(
model_name = 'stable-diffusion-1.5',
steps = 30,
scheduler = 'k_lms',
cfg_scale = 8.0,
height = 640,
width = 640
)
print('=== TXT2IMG TEST ===')
# `manager` is assumed to be a ModelManager instance initialized from models.yaml,
# as in the earlier examples in this thread.
txt2img = Txt2Img(manager, params)
outputs = txt2img.generate(prompt='banana sushi', iterations=2)
for output in outputs:
    print(f'image={output.image}, seed={output.seed}, model={output.params.model_name}, hash={output.model_hash}, steps={output.params.steps}')
```
The `params` argument is optional, so if you wish to accept default
parameters and selectively override them, just do this:
```
outputs = Txt2Img(manager).generate(prompt='banana sushi',
steps=50,
scheduler='k_heun',
model_name='stable-diffusion-2.1'
)
```
|
Kyle0654
left a comment
I think this is a great start at a larger refactor. We can keep working on it on top of these changes. As long as these work correctly, I'd approve.
|
I'm ready to declare victory on this phase of the refactor and turn it back over to @Kyle0654. I got the safety checker working again, and actually improved it a bit by unloading it from the GPU when not in use, thereby recovering about 0.6 GB of VRAM. The face restoration and upscaling routines have been moved into a new restoration service, and a bit of redundant code has been ripped out. I've fully tested the legacy WebUI and CLI to make sure they continue to work during the transition, and am happy to see that there are no longer any calls into `Generate` from the `app` tree. Please consider this ready for a review. After this, I plan to do documentation and a bit more refactoring - in particular I would like to refactor the two facial reconstruction routines and ESRGAN upscaling. I will then rework the legacy CLI so that it no longer uses `Generate`. |
Remove node dependencies on generate.py
This is a draft PR in which I am replacing `generate.py` with a cleaner, more structured interface to the underlying image generation routines. The basic code pattern to generate an image using the new API is sketched below.
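The code block that originally appeared here did not survive the page capture. The sketch below follows the usage examples elsewhere in this conversation; the `ModelManager` import is omitted, the path and model name are illustrative, and the exact constructor signatures may differ:
```
from invokeai.backend.generator import InvokeAIGeneratorBasicParams, Txt2Img

manager = ModelManager('configs/models.yaml')        # illustrative path
model = manager.get_model('stable-diffusion-1.5')    # dict described under "model management" below
params = InvokeAIGeneratorBasicParams(steps=30, scheduler='k_lms', cfg_scale=7.5)
txt2img = Txt2Img(model, params)
for output in txt2img.generate(prompt='banana sushi', iterations=2):
    print(output.image, output.seed)
```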
model management

The `ModelManager` handles model selection and initialization. Its `get_model()` method will return a `dict` with the following keys: `model`, `model_name`, `hash`, `width`, and `height`, where `model` is the actual `StableDiffusionGeneratorPipeline`. If `get_model()` is called without a model name, it will return whatever is defined as the default in `models.yaml`, or the first entry if no default is designated.
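A minimal sketch of consuming the returned dict, assuming `manager` is an initialized `ModelManager` and the model name is illustrative:
```
model_info = manager.get_model('stable-diffusion-1.5')
pipeline = model_info['model']      # the StableDiffusionGeneratorPipeline itself
print(model_info['model_name'], model_info['hash'],
      model_info['width'], model_info['height'])
```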
InvokeAIGenerator

The abstract base class `InvokeAIGenerator` is subclassed into `Txt2Img`, `Img2Img`, `Inpaint` and `Embiggen`. The constructor for these classes takes the model dict returned by `model_manager.get_model()` and optionally an `InvokeAIGeneratorBasicParams` object, which encapsulates all the parameters in common among `Txt2Img`, `Img2Img`, etc. If you don't provide the basic params, a reasonable set of defaults will be chosen. Any of these parameters can be overridden at `generate()` time.

These classes are defined in `invokeai.backend.generator`, but they are also exported by `invokeai.backend` as shown in the example below. Note that we were able to override the basic params in the call to `generate()`.

The `generate()` method returns an iterator over a series of `InvokeAIGeneratorOutput` objects. These objects contain the PIL image, the seed, the model name and hash, and attributes for all the parameters used to generate the object (you can also get these as a dict). The `iterations` argument controls how many objects will be returned, defaulting to 1. Pass `None` to get an infinite iterator.

Given the proposed use of `compel` to generate a templated series of prompts, I thought the API would benefit from a style that lets you loop over the output results indefinitely. I did consider returning a single `InvokeAIGeneratorOutput` object in the event that `iterations=1`, but I think it's dangerous for a method to return different types of result under different circumstances.

Changing the model is as easy as this:
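The snippet that originally followed did not survive the capture. Both of the lines below are consistent with the API described in this thread: override `model_name` at `generate()` time, or build a new generator around a different model obtained from the manager:
```
# Override the model for a single generate() call:
outputs = txt2img.generate(prompt='banana sushi', model_name='stable-diffusion-2.1')

# Or construct a generator around a different model obtained from the manager:
txt2img_sd2 = Txt2Img(manager.get_model('stable-diffusion-2.1'), params)
```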
Node and legacy support
With respect to Nodes, I have written `model_manager_initializer` and `restoration_services` modules that return `model_manager` and `restoration` services respectively. The latter is used by the face reconstruction and upscaling nodes. There is no longer any reference to `Generate` in the `app` tree.

I have confirmed that `txt2img` and `img2img` work in the nodes client. I have not tested `embiggen` or `inpaint` yet. pytests are passing, with some warnings that I don't think are related to what I did.

The legacy WebUI and CLI are still working off `Generate` (which has not yet been removed from the source tree) and are fully functional.

I've finished all the tasks on my TODO list:
- remove the node dependencies on `generate`
- rework the `reconstruct.py` and `upscale.py` nodes to call directly into the postprocessing modules rather than going through `Generate`