
Conversation

@lstein
Collaborator

@lstein lstein commented Mar 9, 2023

Remove node dependencies on generate.py

This is a draft PR in which I am replacing generate.py with a cleaner, more structured interface to the underlying image generation routines. The basic code pattern to generate an image using the new API is this:

from invokeai.backend import ModelManager, Txt2Img, Img2Img

manager = ModelManager('/data/lstein/invokeai-main/configs/models.yaml')
model = manager.get_model('stable-diffusion-1.5')
txt2img = Txt2Img(model)
outputs = txt2img.generate(prompt='banana sushi', steps=12, scheduler='k_euler_a', iterations=5)

# generate() returns an iterator
for next_output in outputs:
    print(next_output.image, next_output.seed)

outputs = Img2Img(model).generate(prompt='strawberry sushi', init_img='./banana_sushi.png')
output = next(outputs)
output.image.save('strawberries.png')

Model management

The ModelManager handles model selection and initialization. Its get_model() method returns a dict with the following keys: model, model_name, hash, width, and height, where model is the actual StableDiffusionGeneratorPipeline. If get_model() is called without a model name, it will return whatever is defined as the default in models.yaml, or the first entry if no default is designated.
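
For illustration, a minimal sketch of consuming the returned dict, using only the keys listed above (the path and model name are the ones from the example earlier):

from invokeai.backend import ModelManager

manager = ModelManager('/data/lstein/invokeai-main/configs/models.yaml')
model_info = manager.get_model('stable-diffusion-1.5')

pipeline = model_info['model']                    # the StableDiffusionGeneratorPipeline itself
print(model_info['model_name'])                   # 'stable-diffusion-1.5'
print(model_info['hash'])                         # hash of the model weights
print(model_info['width'], model_info['height'])  # native resolution of the model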

InvokeAIGenerator

The abstract base class InvokeAIGenerator is subclassed into Txt2Img, Img2Img, Inpaint and Embiggen. The constructor for these classes takes the model dict returned by model_manager.get_model() and optionally an InvokeAIGeneratorBasicParams object, which encapsulates all the parameters common to Txt2Img, Img2Img, etc. If you don't provide the basic params, a reasonable set of defaults will be chosen. Any of these parameters can be overridden at generate() time.

These classes are defined in invokeai.backend.generator, but they are also exported by invokeai.backend as shown in the example below.

from invokeai.backend import InvokeAIGeneratorBasicParams, Img2Img
params = InvokeAIGeneratorBasicParams(
    perlin = 0.15,
    steps = 30,
    scheduler = 'k_lms',
)
img2img = Img2Img(model, params)
outputs = img2img.generate(scheduler='k_heun')

Note that we were able to override the basic params in the call to generate().

The generate() method returns an iterator over a series of InvokeAIGeneratorOutput objects. These objects contain the PIL image, the seed, the model name and hash, and attributes for all the parameters used to generate the object (you can also get these as a dict). The iterations argument controls how many objects will be returned, defaulting to 1. Pass None to get an infinite iterator.

Given the proposed use of compel to generate a templated series of prompts, I thought the API would benefit from a style that lets you loop over the output results indefinitely. I did consider returning a single InvokeAIGeneratorOutput object in the event that iterations=1, but I think it's dangerous for a method to return different types of result under different circumstances.
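
A minimal sketch of that looping style, assuming the output attributes described in this PR (image, seed, params, model_hash) and capping the unbounded iterator with itertools.islice:

import itertools
from invokeai.backend import ModelManager, Txt2Img

manager = ModelManager('/data/lstein/invokeai-main/configs/models.yaml')
model = manager.get_model('stable-diffusion-1.5')
txt2img = Txt2Img(model)

# iterations=None yields an unbounded stream of outputs; islice stops after four
outputs = txt2img.generate(prompt='banana sushi', iterations=None)
for output in itertools.islice(outputs, 4):
    print(output.seed, output.params.steps, output.model_hash)
    output.image.save(f'banana_sushi_{output.seed}.png')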

Changing the model is as easy as this:

model = manager.get_model('inkspot-2.0')
txt2img = Txt2Img(model)

Node and legacy support

With respect to Nodes, I have written model_manager_initializer and restoration_services modules that return model_manager and restoration services respectively. The latter is used by the face reconstruction and upscaling nodes. There is no longer any reference to Generate in the app tree.

I have confirmed that txt2img and img2img work in the nodes client. I have not tested embiggen or inpaint yet. pytests are passing, with some warnings that I don't think are related to what I did.

The legacy WebUI and CLI are still working off Generate (which has not yet been removed from the source tree) and fully functional.

I've finished all the tasks on my TODO list:

  • Update the pytests, which are failing due to dangling references to generate
  • Rewrite the reconstruct.py and upscale.py nodes to call directly into the postprocessing modules rather than going through Generate

@lstein lstein marked this pull request as draft March 9, 2023 06:46
@damian0815
Contributor

These objects contain the PIL image, the seed, the model name and hash, and attributes for all the parameters used to generate the object (you can also get these as a dict).

So I'm thinking about attention maps, which need to be bubbled back from the innards of the diffusion step loop all the way up to the UI. They can readily be passed around as a PIL image or a dict of PIL images - i assume that will be easy enough to incorporate?

@lstein
Collaborator Author

lstein commented Mar 9, 2023 via email

Contributor

@keturn keturn left a comment


I've given this a read-through; now for giving feedback. I'm trying to sort out which parts are inherent to the proposed design and which are temporary considerations that come from doing things one layer at a time -- and this is the Generate layer, not the Generator layer (wow, names are confusing).

I'm not opposed to ever using the factory pattern, but I am leery of it, so I'd like to make sure it's justified here.

We've got

manager = ModelManager('/data/lstein/invokeai-main/configs/models.yaml')
factory = InvokeAIGeneratorFactory(manager)

txt2img = factory.make_generator(Txt2Img)
generator = txt2img.generate(prompt='banana sushi', steps=12, scheduler='k_euler_a')
output = generate()

So my question would be: What keeps that from being more like this?

manager = ModelManager('/data/lstein/invokeai-main/configs/models.yaml')
model = manager.current_model  # or somesuch
output = txt2img(model, prompt='banana sushi', steps=12, scheduler='k_euler_a')

@Kyle0654
Contributor

Kyle0654 commented Mar 9, 2023

I'll do a full review later, but commenting on the PR summary here. It would be nice to have this sort of pattern instead:

model_manager = ModelManager('/data/lstein/invokeai-main/configs/models.yaml')
model = model_manager.get('stable-diffusion-2.0') # Whatever the standard string identifiers are
txt2img = Txt2Img(model)
outputs = txt2img.generate(prompt='banana sushi', steps=12, scheduler='k_euler_a', iterations=5)

When I had originally suggested "factory", I was looking for something to create and manage the model objects that are used to generate. Factory might not even be the right pattern, but it's what came to mind at the time when I was looking at code.

Anything that would help us utilize Diffusers more directly while still respecting model management (caching, GPU memory, etc.) would be great as well. That would allow us to more rapidly integrate newly-developed functionality in the space.

@Kyle0654
Contributor

Kyle0654 commented Mar 9, 2023

With respect to Nodes, I have updated the generate_initializer service and the generate invocations so that they call into the new InvokeAIGeneratorFactory.

generate_initializer should probably just not exist once we're done refactoring. It was something temporary to stick all the "initialization" code in to get Generator initialized. We'd ideally just have different well-contained and constructed services (with abstract base classes) that get created and added to the service collection.

I have confirmed that txt2img works in the nodes client. However I'm a bit confused about how to use the node CLI's img2img, as it seems to take an image from memory rather than a path to an image, but then the load_image doesn't work the way I think it should.

load_image is set up to load an image from the image service, which we assume already exists (or that you've uploaded it through that service). It might make sense to replicate the upload functionality in a CLI command (that can download an image or load it from disk and add it to the image service). Without that, you can pipe txt2img to img2img to test things out, with a show_image between to show the original result.

@keturn
Contributor

keturn commented Mar 9, 2023

txt2img = Txt2Img(model)
outputs = txt2img.generate(prompt='banana sushi', steps=12, scheduler='k_euler_a', iterations=5)

Do you do plan to do anything with txt2img besides call that single method on it?

Wondering why that's two steps instead of passing the model along with the rest of the parameters.

@lstein
Collaborator Author

lstein commented Mar 9, 2023

I've given this a read-through; now for giving feedback. I'm trying to sort out which parts are inherent to the proposed design and which are temporary considerations that come from doing things one layer at a time -- and this is the Generate layer, not the Generator layer (wow, names are confusing).

I'm not opposed to ever using the factory pattern, but I am leery of it, so I'd like to make sure it's justified here.

We've got

manager = ModelManager('/data/lstein/invokeai-main/configs/models.yaml')
factory = InvokeAIGeneratorFactory(manager)

txt2img = factory.make_generator(Txt2Img)
generator = txt2img.generate(prompt='banana sushi', steps=12, scheduler='k_euler_a')
output = generate()

So my question would be: What keeps that from being more like this?

manager = ModelManager('/data/lstein/invokeai-main/configs/models.yaml')
model = manager.current_model  # or somesuch
output = txt2img(model, prompt='banana sushi', steps=12, scheduler='k_euler_a')

Yeah, some of the awkwardness of the current code is that I didn't want to break base.py up into separate refactored files at this time because this would create problems for generate.py. I want to keep things working as much as possible during the transition. There will definitely be more refactoring to do.

With respect to the factory pattern, it certainly isn't required but there is an argument to be made for retaining it. What isn't shown in the code example I put up is that there is a shared generation configuration parameter block that can be created and passed to the factory at the time it is initialized. A fuller example would look like this:

manager = ModelManager('/data/lstein/invokeai-main/configs/models.yaml')
generation_parameters =  InvokeAIGeneratorBasicParams(
        model_name = 'stable-diffusion-1.5',
        steps = 30,
        scheduler = 'k_lms',
        cfg_scale = 8.0,
        height = 640,
        width = 640
        )

factory = InvokeAIGeneratorFactory(manager, params=generation_parameters)
txt2img = factory.make_generator(Txt2Img)
img2img = factory.make_generator(Img2Img)

factory.params.update(height=512, width=512)
inpaint = factory.make_generator(Inpaint)

This fits the style of having a common set of generation parameters that are usually static and shared among all the generators created by the factory. They can be changed (e.g. by the WebUI), after which any newly created generators get the changed configuration.

The alternative is simply to keep the generation_parameters data object around and pass it directly to each of the InvokeAIGenerator initializers:

txt2img = Txt2Img(manager, generation_parameters)

Is this preferable?

@damian0815
Contributor

The alternative is simply to keep the generation_parameters data object around and pass it directly to each of the InvokeAIGenerator initializers:
txt2img = Txt2Img(manager, generation_parameters)
Is this preferable?

From my perspective, yes, definitely: all of the information i need to understand what txt2img is going to contain is right there on that line of code, there's no state or side effects that have been hidden from me as the reader of the code.

@lstein
Collaborator Author

lstein commented Mar 9, 2023

I'll do a full review later, but commenting on the PR summary here. It would be nice to have this sort of pattern instead:

model_manager = ModelManager('/data/lstein/invokeai-main/configs/models.yaml')
model = model_manager.get('stable-diffusion-2.0') # Whatever the standard string identifiers are
txt2img = Txt2Img(model)
outputs = txt2img.generate(prompt='banana sushi', steps=12, scheduler='k_euler_a', iterations=5)

When I had originally suggested "factory", I was looking for something to create and manage the model objects that are used to generate. Factory might not even be the right pattern, but it's what came to mind at the time when I was looking at code.

Anything that would help us utilize Diffusers more directly while still respecting model management (caching, GPU memory, etc.) would be great as well. That would allow us to more rapidly integrate newly-developed functionality in the space.

See previous comment. If we're dropping the factory pattern, then I'd like to retain the InvokeAIGeneratorBasicParams so that the basic generation parameters are kept together in a coherent data structure. Could you also advise on whether Txt2Img needs to be an object? generate() could be a class method:

from invokeai.backend.generator import Txt2Img
#... get model
#.. build parameters
outputs = Txt2Img.generate(model, basic_params, prompt="banana sushi")

With respect to making it easier to use Diffusers directly, I need to ask you and @keturn for advice on how this will be used. The objects currently returned by the model manager are StableDiffusionGeneratorPipeline objects, but this could be changed to any other DiffusionPipeline class fairly easily by having an optional parameter that selects the type. My question is whether these alternative pipelines are stanzas in the models.yaml file, or are we configuring them on the fly?

@Kyle0654
Contributor

Kyle0654 commented Mar 9, 2023

See previous comment. If we're dropping the factory pattern, then I'd like to retain the InvokeAIGeneratorBasicParams so that the basic generation parameters are kept together in a coherent data structure. Could you also advise on whether Txt2Img needs to be an object? generate() could be a class method:

Ah my comment was less about how we call the method and more about getting the model separately from the model manager. (Though the node is already an object that contains all the parameters, has defaults, ranges, validation, etc... so doing things directly in there would make sense).

With respect to making it easier to use Diffusers directly, I need to ask you and @keturn for advice on how this will be used. The objects currently returned by the model manager are StableDiffusionGeneratorPipeline objects, but this could be changed to any other DiffusionPipeline class fairly easily by having an optional parameter that selects the type. My question is whether these alternative pipelines are stanzas in the models.yaml file, or are we configuring them on the fly?

Mostly what I'd like to be able to achieve is the ease with which I was able to write this node:

https://gist.github.com/Kyle0654/57c337f7c005662b98a53f4e1ed7a960

There's a lot in there I'm aware isn't correct for how we do things. In particular, the from_pretrained seems to be what loads the model. If we had a way of creating these from our own model manager models, though, then generation code could potentially look like this (though I'm aware a lot of stuff is happening under the hood here that is probably not correct - e.g. it extracts a depth map by getting a depth model itself on demand, which probably isn't the behavior we want).

@lstein
Collaborator Author

lstein commented Mar 9, 2023

The alternative is simply to keep the generation_parameters data object around and pass it directly to each of the InvokeAIGenerator initializers:
txt2img = Txt2Img(manager, generation_parameters)
Is this preferable?

From my perspective, yes, definitely: all of the information i need to understand what txt2img is going to contain is right there on that line of code, there's no state or side effects that have been hidden from me as the reader of the code.

Would it be preferable to reduce everything to one class method call as in:

model = manager.get_model('stable-diffusion-1.5')
params = InvokeAIGeneratorBasicParams(steps=50)
outputs = Txt2Img.generate(model, params=params, prompt="banana sushi")

params could be an optional keyword and default to the default initializer of InvokeAIGeneratorBasicParams, allowing for a one-line invocation:

outputs = Txt2Img.generate(manager.get_model('stable-diffusion-1.5'), steps=50, prompt="banana sushi")

[edit]
@Kyle0654 This last pattern removes the dependency on the model manager. In theory you could do this:

my_pipeline = NewDiffusionPipelineOfSomeSort.from_pretrained(repo_id='foo/bar')
output = Txt2Img.generate(my_pipeline,...)

However, lots of things get done to the pipeline in the lower layers of the code, so this pseudocode would not work.

@damian0815
Contributor

Would it be preferable to reduce everything to one class method call as in:

model = manager.get_model('stable-diffusion-1.5')
params = InvokeAIGeneratorBasicParams(steps=50)
outputs = Txt2Img.generate(model, params=params, prompt="banana sushi")

personally i prefer instance methods over class methods, but that's a preference not an "ought".

@lstein
Collaborator Author

lstein commented Mar 9, 2023

So I'm hearing a need for the model manager to handle the GPU/CPU caching of arbitrary diffusers pipelines. I think this can be done with a method that looks something like this:

pipeline = manager.get_pipeline('stabilityai/stable-diffusion-2-depth')

Internally it would handle the from_pretrained() call, using the configured cache directory, precision and device type, and then use the existing caching system to manage offloading. It would use the repo_id to track the object in lieu of the "model name" in order to avoid having to configure a stanza in models.yaml.
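
A rough sketch of what get_pipeline() could look like internally. The cache, device, dtype and cache_dir attributes below are assumptions for illustration, not existing InvokeAI code; only the diffusers calls (from_pretrained, .to) are real:

from diffusers import DiffusionPipeline

# Sketch of a ModelManager method
def get_pipeline(self, repo_id: str, pipeline_class=DiffusionPipeline):
    if repo_id in self._pipeline_cache:          # assumed in-memory cache keyed by repo_id
        return self._pipeline_cache[repo_id]
    pipeline = pipeline_class.from_pretrained(
        repo_id,
        cache_dir=self.cache_dir,                # configured download cache (assumed attribute)
        torch_dtype=self.dtype,                  # configured precision, e.g. torch.float16 (assumed)
    )
    pipeline.to(self.device)                     # configured device (assumed attribute)
    self._pipeline_cache[repo_id] = pipeline     # offloading left to the existing caching system
    return pipeline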

See any problems with this?

@keturn
Contributor

keturn commented Mar 10, 2023

So I'm hearing a need for the model manager to handle the GPU/CPU caching of arbitrary diffusers pipelines.

I'm super wary of statements like this because there's a reason why it took three months to do the diffusers integration instead of three days: there are still nearly a thousand lines of code in our diffusers_pipeline module — "bespoke" as hipsterusername would say.

You should not plan on using stock 🧨diffusers pipelines unless you're planning to do so without any InvokeAI features, like using masks with a non-inpainting model, or not using masks with an inpainting model, or cross-attention controls or thresholding or symmetry or attention maps or adjusting slice size based on available capacity.

Internally it would handle the from_pretrained() call, using the configured cache directory, precision and device type, and then use the existing caching system to manage offloading.

These aren't bad ideas, and may well be things we end up doing, but I'd call that beyond the scope of this phase.

@keturn
Contributor

keturn commented Mar 10, 2023

Would it be preferable to reduce everything to one class method call as in:

outputs = Txt2Img.generate(model, params=params, prompt="banana sushi")

personally i prefer instance methods over class methods, but that's a preference not an "ought".

If you never need an instance of Txt2Img(), then this can simply be a txt2img function. I doubt we need Txt2Img to be a class at all.

@Kyle0654
Contributor

You should not plan on using stock 🧨diffusers pipelines unless you're planning to do so without any InvokeAI features, like using masks with a non-inpainting model, or not using masks with an inpainting model, or cross-attention controls or thresholding or symmetry or attention maps or adjusting slice size based on available capacity.

Some of this might be okay for nodes that don't have exposed UI? And then if we want to expose UI we can always more fully integrate features?

Is there some other way to help contributors more rapidly integrate new features?

@keturn
Contributor

keturn commented Mar 10, 2023

Is there some other way to help contributors more rapidly integrate new features?

As long as we have an InvokeAIGeneratorBasicParams with parameters for a dozen orthogonal features -- and we do, because we have a Universal Canvas that tells artists they can do all these things to any part of the canvas at any time, instead of each operation being its own independent workflow -- there's no shortcut for doing integration work.

When we add support for some other type of diffusers pipeline, we can do the work to see what we need to do to support whatever additional inputs and parameters it needs. After we do a couple of those, hopefully we're coming up with ways to factor out composable pieces so we're not tying ourselves in knots.

I'm confident we'll figure out how to improve on this over time. But DiffusionPipelines as they're built today are not the answer to all our integration needs. Each is written with a very narrow usage in mind (usually just enough to look cool on its own in a notebook) and they're just not extensible enough.

@lstein
Collaborator Author

lstein commented Mar 10, 2023

Hey folks, great discussion and I'm really grateful for the advice and guidance. I'm up against a two week vacation that is starting on Sunday, and I'm not sure how much time I'll have to work on this phase of the refactor. Can I get confirmation that this part of the refactor is not going to block other critical parts of the nodes integration, such as porting the WebUI to nodes?

Realistically, before I fly off Sunday morning I think I'll be able to (1) replace the factory paradigm with simpler direct calls to the Txt2Img, Img2Img and Inpaint classes; and (2) wire nodes up to the postprocessing classes so they are not calling into Generate. After that, further work will drop way off.

However if this is a blocker, then I'm happy to relinquish the task to anyone who would like to volunteer.

lstein added 2 commits March 10, 2023 19:33
Factory pattern is now removed. Typical usage of the InvokeAIGenerator is now:

```
from invokeai.backend.generator import (
    InvokeAIGeneratorBasicParams,
    Txt2Img,
    Img2Img,
    Inpaint,
)
    params = InvokeAIGeneratorBasicParams(
        model_name = 'stable-diffusion-1.5',
        steps = 30,
        scheduler = 'k_lms',
        cfg_scale = 8.0,
        height = 640,
        width = 640
        )
    print ('=== TXT2IMG TEST ===')
    txt2img = Txt2Img(manager, params)
    outputs = txt2img.generate(prompt='banana sushi', iterations=2)

    for output in outputs:
        print(f'image={output.image}, seed={output.seed}, model={output.params.model_name}, hash={output.model_hash}, steps={output.params.steps}')
```

The `params` argument is optional, so if you wish to accept default
parameters and selectively override them, just do this:

```
    outputs = Txt2Img(manager).generate(prompt='banana sushi',
                                        steps=50,
                                        scheduler='k_heun',
                                        model_name='stable-diffusion-2.1'
                                        )
```
@lstein
Collaborator Author

lstein commented Mar 11, 2023

Factory pattern is now removed. Typical usage of the InvokeAIGenerator is now:

```
from invokeai.backend.generator import (
    InvokeAIGeneratorBasicParams,
    Txt2Img,
    Img2Img,
    Inpaint,
)
    params = InvokeAIGeneratorBasicParams(
        model_name = 'stable-diffusion-1.5',
        steps = 30,
        scheduler = 'k_lms',
        cfg_scale = 8.0,
        height = 640,
        width = 640
        )
    model = manager.get_model('stable-diffusion-1.5')
    print ('=== TXT2IMG TEST ===')
    txt2img = Txt2Img(model, params)
    outputs = txt2img.generate(prompt='banana sushi', iterations=2)

    for output in outputs:
        print(f'image={output.image}, seed={output.seed}, model={output.params.model_name}, hash={output.model_hash}, steps={output.params.steps}')
```

The params argument is optional, so if you wish to accept default parameters and selectively override them, just do this:

```
    outputs = Txt2Img(model).generate(prompt='banana sushi',
                                        steps=50,
                                        scheduler='k_heun',
                                        model_name='stable-diffusion-2.1'
                                        )
```

Contributor

@Kyle0654 Kyle0654 left a comment


I think this is a great start at a larger refactor. We can keep working on it on top of these changes. As long as these work correctly, I'd approve.

@lstein
Collaborator Author

lstein commented Mar 11, 2023

I'm ready to declare victory on this phase of the refactor and turn it back over to @Kyle0654 . I got the safety checker working again, and actually improved it a bit by unloading it from GPU when not in use, thereby recovering about 0.6 GB of VRAM. The face restoration and upscaling routines have been moved into a new restoration service, and a bit of redundant code has been ripped out. I've fully tested the legacy WebUI and CLI to make sure they continue to work during the transition, and am happy to see that there are no longer any calls into the Generate class from within Nodes (or in fact anywhere else except for the Web and CLI front ends).

Please consider this ready for a review.

After this, I plan to do documentation and a bit more refactoring - in particular I would like to refactor the two facial reconstruction routines and esrgan upscaling.

I will then rework the legacy CLI so that it no longer uses Generate. I think that cli_app.py with the pipe syntax is great, but making it mimic the "dream" syntax that the legacy CLI uses is not the way to go with it. I think it needs some convenience functions such as autocomplete support, and I'm happy to work on that if it seems reasonable.

@lstein lstein marked this pull request as ready for review March 11, 2023 22:29
@lstein lstein enabled auto-merge March 12, 2023 02:31
@lstein lstein merged commit 1aaad93 into main Mar 12, 2023
@lstein lstein deleted the refactor/nodes-on-generator branch March 12, 2023 02:48