
Commit c22d529

Add node-based invocation system (#1650)
This PR adds the core of the node-based invocation system first discussed in https://github.com/invoke-ai/InvokeAI/discussions/597 and implements it through a basic CLI and API. This supersedes #1047, which was too far behind to rebase.

## Architecture

### Invocations

The core of the new system is **invocations**, found in `/ldm/invoke/app/invocations`. These represent individual nodes of execution, each with inputs and outputs. Core invocations are already implemented (`txt2img`, `img2img`, `upscale`, `face_restore`), as well as a debug invocation (`show_image`). To implement a new invocation, all that is required is to add a new implementation in this folder (a markdown document describes the specifics, though it is slightly out of date).

### Sessions

Invocations and the links between them are maintained in a **session**. Sessions can be queued for invocation (either the next ready node, or all nodes). Some notes:

* Sessions may be added to at any time (including after invocation), but may not be modified.
* Links are always added together with a node, and always point from existing nodes to the new node. Links can be relative "history" links, e.g. `-1` to link from a previously executed node, and can link either specific outputs or, by using `*`, opportunistically link all outputs that match by name and type.
* There are no iteration/looping constructs. Most needs for these could be met by either duplicating nodes or cloning sessions. This is open for discussion, but it is a difficult problem to solve in a way that doesn't make the code even more complex and confusing (especially regarding node IDs and history).

### Services

These make up the core of the invocation system, found in `/ldm/invoke/app/services`. One of the key design philosophies here is that most components should be replaceable when possible. For example, if someone wants to use cloud storage for their images, they should be able to replace the image storage service easily.
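To make the "replaceable service" idea concrete, here is a minimal sketch. The `ImageStorage` interface, its `save`/`get` methods, and both implementations are hypothetical illustrations, not the actual classes in `/ldm/invoke/app/services`. Code written against the abstract interface works unchanged with an in-memory or a folder-backed store; a cloud-backed class would slot in the same way.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Dict


class ImageStorage(ABC):
    """Hypothetical storage interface: callers depend only on these methods."""

    @abstractmethod
    def save(self, name: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, name: str) -> bytes: ...


class MemoryImageStorage(ImageStorage):
    """Keeps image bytes in a dict; handy for tests."""

    def __init__(self) -> None:
        self._images: Dict[str, bytes] = {}

    def save(self, name: str, data: bytes) -> None:
        self._images[name] = data

    def get(self, name: str) -> bytes:
        return self._images[name]


class FolderImageStorage(ImageStorage):
    """Writes each image to a file under a root folder."""

    def __init__(self, root: str) -> None:
        self._root = root
        os.makedirs(root, exist_ok=True)

    def save(self, name: str, data: bytes) -> None:
        with open(os.path.join(self._root, name), "wb") as f:
            f.write(data)

    def get(self, name: str) -> bytes:
        with open(os.path.join(self._root, name), "rb") as f:
            return f.read()


def duplicate_image(storage: ImageStorage, src: str, dst: str) -> None:
    """Example consumer: works with any ImageStorage implementation."""
    storage.save(dst, storage.get(src))


if __name__ == "__main__":
    for store in (MemoryImageStorage(), FolderImageStorage(tempfile.mkdtemp())):
        store.save("a.png", b"fake image bytes")
        duplicate_image(store, "a.png", "b.png")
        print(type(store).__name__, store.get("b.png"))
```

Because `duplicate_image` only sees the abstract type, swapping storage backends is a construction-time decision rather than a code change in every caller.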
The services are broken down as follows (several are intentionally implemented with an initial simple/naïve approach):

* Invoker: Responsible for creating and executing **sessions** and managing the services used to do so.
* Session Manager: Manages session history. An on-disk implementation is provided, which stores sessions as JSON files on disk and caches recently used sessions for quick access.
* Image Storage: Stores images of multiple types. An on-disk implementation is provided, which stores images on disk and retains recently used images in an in-memory cache.
* Invocation Queue: Used to queue invocations for execution. An in-memory implementation is provided.
* Events: An event system, primarily used with Socket.io to support future web UI integration.

## Apps

Apps are available through the `/scripts/invoke-new.py` script (to be integrated/renamed).

### CLI

```
python scripts/invoke-new.py
```

Implements a simple CLI. The CLI creates a single session and automatically links all inputs to the previous node's output. Commands are generated automatically from all invocations, with command options generated from invocation inputs. Help is available for the CLI and for each command, and is very verbose. The CLI also supports command piping for single-line entry of multiple commands. Example:

```
> txt2img --prompt "a cat eating sushi" --steps 20 --seed 1234 | upscale | show_image
```

### API

```
python scripts/invoke-new.py --api --host 0.0.0.0
```

Implements an API using FastAPI, with Socket.io support for signaling. API documentation is available at `http://localhost:9090/docs` or `http://localhost:9090/redoc`. This includes the OpenAPI schema for all available invocations, session interaction APIs, and image APIs.

Socket.io signals are per-session and can be subscribed to by session ID. These aren't currently auto-documented, though the code for event emission is centralized in `/ldm/invoke/app/services/events.py`.
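Per-session signaling can be pictured as a publish/subscribe registry keyed by session ID. The sketch below is a plain-Python stand-in, assuming hypothetical names (`SessionEventBus`, `subscribe`, `emit`, and the event names in the demo); the actual implementation routes events through Socket.io rather than in-process callbacks.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

# Hypothetical handler signature: (event_name, payload) -> None
Handler = Callable[[str, Dict[str, Any]], None]


class SessionEventBus:
    """Delivers each event only to subscribers of the matching session ID."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, session_id: str, handler: Handler) -> None:
        self._subscribers[session_id].append(handler)

    def unsubscribe(self, session_id: str, handler: Handler) -> None:
        handlers = self._subscribers.get(session_id, [])
        if handler in handlers:
            handlers.remove(handler)

    def emit(self, session_id: str, event_name: str, payload: Dict[str, Any]) -> None:
        # Iterate over a copy so a handler may unsubscribe while being called.
        for handler in list(self._subscribers.get(session_id, [])):
            handler(event_name, payload)


if __name__ == "__main__":
    bus = SessionEventBus()
    received = []
    bus.subscribe("session-a", lambda name, payload: received.append((name, payload)))
    bus.emit("session-a", "invocation_complete", {"node_id": "1"})
    bus.emit("session-b", "invocation_complete", {"node_id": "7"})  # nobody listening
    print(received)  # [('invocation_complete', {'node_id': '1'})]
```

Keying subscriptions by session ID means a client watching one session never sees traffic from another, which is the property the per-session Socket.io signals provide.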
A very simple test HTML page and script are available at `http://localhost:9090/static/test.html`. This demonstrates creating a session from a graph, invoking it, and receiving signals from Socket.io.

## What's left?

* There are a number of features not currently covered by invocations. I kept the set of invocations small during core development in order to simplify refactoring as I went. Now that the invocation code has stabilized, I'd love some help filling those out!
* There's no image metadata generated. It would be fairly straightforward (and would make good sense) to serialize either a session and node reference into an image, or the entire node into the image. There are a lot of questions to answer around source images, linked images, etc., though. This history is all stored in the session as well, and with complex sessions, the metadata in an image may lose its value. This needs some further discussion.
* We need a list of features (both current and future) that would be difficult to implement without looping constructs, so we can have a good conversation around it. I'm really hoping we can avoid needing looping/iteration in graph execution, since it would necessitate separating the execution of a graph into its own concept/system and would further complicate the system.
* The API likely needs further filling out to support the UI. I think using the new API for the current UI is possible, and potentially interesting, since it could work like the new/demo CLI in a "single operation at a time" workflow. I don't know how compatible that will be with our UI goals, though. It would be nice to support only a single API.
* Deeper separation of systems. I intentionally tried not to touch Generate or other systems too much, but a lot could be gained by breaking those apart. Even splitting Args into two pieces (command-line arguments and the parser for the current CLI) would make it easier to maintain. This is probably for the future, though.
2 parents 49ffb64 + cd98d88 commit c22d529


42 files changed (+4502, -0 lines)

.coveragerc

Lines changed: 6 additions & 0 deletions

```ini
[run]
omit='.env/*'
source='.'

[report]
show_missing = true
```

.gitignore

Lines changed: 1 addition & 0 deletions

```diff
@@ -68,6 +68,7 @@ htmlcov/
 .cache
 nosetests.xml
 coverage.xml
+cov.xml
 *.cover
 *.py,cover
 .hypothesis/
```

.pytest.ini

Lines changed: 5 additions & 0 deletions

```ini
[pytest]
DJANGO_SETTINGS_MODULE = webtas.settings
; python_files = tests.py test_*.py *_tests.py

addopts = --cov=. --cov-config=.coveragerc --cov-report xml:cov.xml
```

docs/contributing/ARCHITECTURE.md

Lines changed: 93 additions & 0 deletions

# Invoke.AI Architecture

```mermaid
flowchart TB

  subgraph apps[Applications]
    webui[WebUI]
    cli[CLI]

    subgraph webapi[Web API]
      api[HTTP API]
      sio[Socket.IO]
    end

  end

  subgraph invoke[Invoke]
    direction LR
    invoker
    services
    sessions
    invocations
  end

  subgraph core[AI Core]
    Generate
  end

  webui --> webapi
  webapi --> invoke
  cli --> invoke

  invoker --> services & sessions
  invocations --> services
  sessions --> invocations

  services --> core

  %% Styles
  classDef sg fill:#5028C8,font-weight:bold,stroke-width:2,color:#fff,stroke:#14141A
  classDef default stroke-width:2px,stroke:#F6B314,color:#fff,fill:#14141A

  class apps,webapi,invoke,core sg

```

## Applications

Applications are built on top of the Invoke framework. They should construct an `invoker` and then interact through it. They should avoid interacting directly with core code in order to support a variety of configurations.

### Web UI

The Web UI is built on top of an HTTP API built with [FastAPI](https://fastapi.tiangolo.com/) and [Socket.IO](https://socket.io/). The frontend code is found in `/frontend`, and the backend code is found in `/ldm/invoke/app/api_app.py` and `/ldm/invoke/app/api/`. The code is further organized as follows:

| Component | Description |
| --- | --- |
| `api_app.py` | Sets up the API app, annotates the OpenAPI spec with additional data, and runs the API |
| `dependencies` | Creates all invoker services and the invoker, and provides them to the API |
| `events` | An eventing system that could in the future be adapted to support horizontal scale-out |
| `sockets` | The Socket.IO interface; handles listening to and emitting session events (events are defined in the events service module) |
| `routers` | API definitions for different areas of API functionality |

### CLI

The CLI is built automatically from invocation metadata, and also supports invocation piping and auto-linking. Code is available in `/ldm/invoke/app/cli_app.py`.
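Deriving command options from invocation inputs and splitting piped input can be sketched with the standard library alone. This assumes inputs are described by a plain dataclass (the real invocations are pydantic models), and `UpscaleArgs`, `build_parser`, and `split_pipeline` are hypothetical names used only for illustration.

```python
import argparse
import shlex
from dataclasses import MISSING, dataclass, field, fields
from typing import List


@dataclass
class UpscaleArgs:
    """Upscales an image."""

    # Hypothetical stand-in for an invocation's inputs; metadata carries help text.
    level: int = field(default=2, metadata={"help": "The upscale level"})
    strength: float = field(default=0.75, metadata={"help": "The strength"})


def build_parser(command: str, cls: type) -> argparse.ArgumentParser:
    """Create one --option per dataclass field, using its type, default, and help."""
    parser = argparse.ArgumentParser(prog=command, description=cls.__doc__)
    for f in fields(cls):
        parser.add_argument(
            f"--{f.name}",
            type=f.type,
            default=None if f.default is MISSING else f.default,
            help=f.metadata.get("help", ""),
        )
    return parser


def split_pipeline(line: str) -> List[List[str]]:
    """Split 'a --x 1 | b | c' into one token list per command.

    shlex keeps quoted arguments such as "a cat eating sushi" in one token.
    """
    commands: List[List[str]] = [[]]
    for token in shlex.split(line):
        if token == "|":
            commands.append([])
        else:
            commands[-1].append(token)
    return [cmd for cmd in commands if cmd]


if __name__ == "__main__":
    line = 'txt2img --prompt "a cat eating sushi" --steps 20 | upscale --level 4 | show_image'
    parsers = {"upscale": build_parser("upscale", UpscaleArgs)}
    for tokens in split_pipeline(line):
        name, args = tokens[0], tokens[1:]
        if name in parsers:
            print(name, parsers[name].parse_args(args))
        else:
            print(name, args, "(no parser in this sketch)")
```

Keeping help text and defaults on the field definitions means the option list and `--help` output stay in sync with the invocation's inputs automatically.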

## Invoke

The Invoke framework provides the interface to the underlying AI systems and is built with flexibility and extensibility in mind. There are four major concepts: invoker, sessions, invocations, and services.

### Invoker

The invoker (`/ldm/invoke/app/services/invoker.py`) is the primary interface through which applications interact with the framework. Its primary purpose is to create, manage, and invoke sessions. It also maintains two sets of services:

- **invocation services**, which are used by invocations to interact with core functionality.
- **invoker services**, which are used by the invoker to manage sessions and manage the invocation queue.

### Sessions

Invocations and links between them form a graph, which is maintained in a session. Sessions can be queued for invocation, which will execute their graph (either the next ready invocation, or all invocations). Sessions also maintain execution history for the graph (including storage of any outputs). An invocation may be added to a session at any time, and it is also possible to add an entire graph at once, as well as to automatically link new invocations to previous invocations. Invocations cannot be deleted or modified once added.

The session graph does not support looping. This is left as an application problem to prevent additional complexity in the graph.
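An append-only session with relative "history" links might look like the sketch below. `Session`, `Link`, and the `(source, output_field, input_field)` link tuples are hypothetical, not the classes in `/ldm/invoke/app/services`; the sketch also assumes that `-1` resolves to the most recently added node.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple, Union

# A link source is either a concrete node ID or a negative history index (-1, -2, ...).
NodeRef = Union[str, int]


@dataclass(frozen=True)
class Link:
    from_node: str
    from_field: str
    to_node: str
    to_field: str


@dataclass
class Session:
    """Hypothetical append-only session: nodes and links can be added, never changed."""

    nodes: Dict[str, dict] = field(default_factory=dict)
    links: List[Link] = field(default_factory=list)
    history: List[str] = field(default_factory=list)  # node IDs in insertion order

    def resolve(self, ref: NodeRef) -> str:
        """Turn a relative reference such as -1 into a concrete node ID."""
        if isinstance(ref, int):
            if ref >= 0:
                raise ValueError("relative references must be negative")
            return self.history[ref]  # -1 -> most recently added node
        if ref not in self.nodes:
            raise KeyError(f"unknown node {ref!r}")
        return ref

    def add_node(self, node_id: str, node: dict,
                 links: Sequence[Tuple[NodeRef, str, str]] = ()) -> None:
        """Add a node together with links from existing nodes into it."""
        if node_id in self.nodes:
            raise ValueError(f"node {node_id!r} already exists; nodes cannot be modified")
        # Resolve every link first, so a bad link leaves the session untouched.
        new_links = [Link(self.resolve(src), out_field, node_id, in_field)
                     for src, out_field, in_field in links]
        self.nodes[node_id] = node
        self.links.extend(new_links)
        self.history.append(node_id)


if __name__ == "__main__":
    s = Session()
    s.add_node("1", {"type": "txt2img", "prompt": "a cat eating sushi"})
    s.add_node("2", {"type": "upscale"}, links=[(-1, "image", "image")])
    print(s.links)
```

Because links only ever point from existing nodes to the node being added, the graph cannot form a cycle, which is one way to see why the no-looping rule keeps execution simple.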

### Invocations

Invocations represent individual units of execution, with inputs and outputs. All invocations are located in `/ldm/invoke/app/invocations`, and are all automatically discovered and made available in the applications. These are the primary way to expose new functionality in Invoke.AI, and the [implementation guide](INVOCATIONS.md) explains how to add new invocations.
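One way automatic discovery can work is to walk the subclasses of the base class, keyed by each class's `type` value, once the invocation modules have been imported (for example by iterating over the package with `pkgutil.iter_modules`). This minimal sketch uses plain classes in place of the pydantic models, and `discover_invocations` is a hypothetical helper; it also enforces that each command name is unique.

```python
from typing import Dict


class BaseInvocation:
    """Stand-in for the real base class; concrete subclasses set a unique `type`."""


class TextToImageInvocation(BaseInvocation):
    type = "txt2img"


class UpscaleInvocation(BaseInvocation):
    type = "upscale"


def discover_invocations(base: type = BaseInvocation) -> Dict[str, type]:
    """Map each command name to its class by walking subclasses recursively."""
    registry: Dict[str, type] = {}
    pending = list(base.__subclasses__())
    while pending:
        cls = pending.pop()
        pending.extend(cls.__subclasses__())
        name = getattr(cls, "type", None)
        if name is None:  # abstract intermediate class without a command name
            continue
        if name in registry and registry[name] is not cls:
            raise ValueError(f"duplicate invocation type {name!r}")
        registry[name] = cls
    return registry


if __name__ == "__main__":
    print(sorted(discover_invocations()))
```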

### Services

Services give invocations access to AI Core functionality and other necessary functionality (e.g. image storage). These are available in `/ldm/invoke/app/services`. As a general rule, new services should provide an interface as an abstract base class, and may provide a lightweight local implementation by default in their module. The goal for all services should be to enable the usage of different implementations (e.g. using cloud storage for image storage), but services should not load any module dependencies unless that implementation has been used (i.e. don't import anything that won't be used, especially if it's expensive to import).
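The "don't import what you don't use" guideline can be followed by deferring each implementation's import until it is selected. A sketch, assuming a hypothetical `QueueServiceBase` interface, a hypothetical `create_queue` factory, and a made-up `redis_queue_plugin` module that stands in for an expensive optional dependency:

```python
import queue
import sys
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional, Type


class QueueServiceBase(ABC):
    """Hypothetical interface that every queue implementation provides."""

    @abstractmethod
    def put(self, item: str) -> None: ...

    @abstractmethod
    def get(self) -> Optional[str]: ...


class InMemoryQueue(QueueServiceBase):
    """Lightweight default; needs nothing beyond the standard library."""

    def __init__(self) -> None:
        self._q: "queue.Queue[str]" = queue.Queue()

    def put(self, item: str) -> None:
        self._q.put(item)

    def get(self) -> Optional[str]:
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None


def _load_memory() -> Type[QueueServiceBase]:
    return InMemoryQueue


def _load_redis() -> Type[QueueServiceBase]:
    # Hypothetical heavy dependency: imported only if this backend is chosen.
    from redis_queue_plugin import RedisQueue
    return RedisQueue


_LOADERS: Dict[str, Callable[[], Type[QueueServiceBase]]] = {
    "memory": _load_memory,
    "redis": _load_redis,
}


def create_queue(kind: str = "memory", **kwargs) -> QueueServiceBase:
    """Resolve the implementation lazily, then construct it."""
    cls = _LOADERS[kind]()
    return cls(**kwargs)


if __name__ == "__main__":
    q = create_queue("memory")
    q.put("node-1")
    print(q.get(), "redis_queue_plugin" in sys.modules)  # prints: node-1 False
```

Selecting the default backend never touches the optional module, so installations without it start normally and pay no import cost.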

## AI Core

The AI Core is represented by the rest of the code base (i.e. the code outside of `/ldm/invoke/app/`).

docs/contributing/INVOCATIONS.md

Lines changed: 105 additions & 0 deletions

# Invocations

Invocations represent a single operation, its inputs, and its outputs. These operations and their outputs can be chained together to generate and modify images.

## Creating a new invocation

To create a new invocation, either find the appropriate module file in `/ldm/invoke/app/invocations` to add your invocation to, or create a new one in that folder. All invocations in that folder will be discovered and made available to the CLI and API automatically. Invocations make use of [typing](https://docs.python.org/3/library/typing.html) and [pydantic](https://pydantic-docs.helpmanual.io/) for validation and integration into the CLI and API.

An invocation looks like this:

```py
class UpscaleInvocation(BaseInvocation):
    """Upscales an image."""
    type: Literal['upscale'] = 'upscale'

    # Inputs
    image: Union[ImageField,None] = Field(description="The input image")
    strength: float = Field(default=0.75, gt=0, le=1, description="The strength")
    level: Literal[2,4] = Field(default=2, description = "The upscale level")

    def invoke(self, context: InvocationContext) -> ImageOutput:
        image = context.services.images.get(self.image.image_type, self.image.image_name)
        results = context.services.generate.upscale_and_reconstruct(
            image_list = [[image, 0]],
            upscale = (self.level, self.strength),
            strength = 0.0, # GFPGAN strength
            save_original = False,
            image_callback = None,
        )

        # Results are image and seed, unwrap for now
        # TODO: can this return multiple results?
        image_type = ImageType.RESULT
        image_name = context.services.images.create_name(context.graph_execution_state_id, self.id)
        context.services.images.save(image_type, image_name, results[0][0])
        return ImageOutput(
            image = ImageField(image_type = image_type, image_name = image_name)
        )
```

Each portion is important to implement correctly.

### Class definition and type

```py
class UpscaleInvocation(BaseInvocation):
    """Upscales an image."""
    type: Literal['upscale'] = 'upscale'
```

All invocations must derive from `BaseInvocation`. They should have a docstring that declares what they do in a single, short line. They should also have a `type` with a type hint that's `Literal["command_name"]`, where `command_name` is what the user will type on the CLI or use in the API to create this invocation. The `command_name` must be unique. The `type` must be assigned to the value of the literal in the type hint.

### Inputs

```py
# Inputs
image: Union[ImageField,None] = Field(description="The input image")
strength: float = Field(default=0.75, gt=0, le=1, description="The strength")
level: Literal[2,4] = Field(default=2, description="The upscale level")
```

Inputs consist of three parts: a name, a type hint, and a `Field` with default, description, and validation information. For example:

| Part | Value | Description |
| ---- | ----- | ----------- |
| Name | `strength` | This field is referred to as `strength` |
| Type Hint | `float` | This field must be of type `float` |
| Field | `Field(default=0.75, gt=0, le=1, description="The strength")` | The default value is `0.75`, the value must be in the range (0,1], and help text will show "The strength" for this field. |

Notice that `image` has type `Union[ImageField,None]`. The `Union` allows this field to be parsed with `None` as a value, which enables linking to previous invocations. All fields should either provide a default value or allow `None` as a value, so that they can be overwritten with a linked output from another invocation.

The special type `ImageField` is also used here. All images are passed as `ImageField`, which protects them from pydantic validation errors (since images only ever come from links).

Finally, note that for all linking, the `type` of the linked fields must match. If the `name` also matches, then the field can be **automatically linked** to a previous invocation by name and type.
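The matching rule can be made concrete with a small helper. This sketch assumes fields are described by plain dataclass annotations rather than pydantic models, and `auto_link` is a hypothetical name. It also assumes `Optional[X]` is treated as `X` when comparing types, mirroring this guide, where the input `image` is `Union[ImageField,None]` while the output `image` is `ImageField`.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple, Union, get_args, get_origin


@dataclass
class ImageField:
    image_type: str
    image_name: str


@dataclass
class ImageOutput:
    image: ImageField  # outputs always carry a value


@dataclass
class UpscaleInputs:
    image: Optional[ImageField] = None  # None until a link supplies the image
    strength: float = 0.75
    level: int = 2


def _strip_optional(tp: object) -> object:
    """Treat Optional[X] (Union[X, None]) as X when comparing types."""
    if get_origin(tp) is Union:
        args = [a for a in get_args(tp) if a is not type(None)]
        if len(args) == 1:
            return args[0]
    return tp


def auto_link(output_cls: type, input_cls: type) -> List[Tuple[str, str]]:
    """Return (output_field, input_field) pairs whose names and types both match."""
    output_types = {f.name: _strip_optional(f.type) for f in fields(output_cls)}
    return [
        (f.name, f.name)
        for f in fields(input_cls)
        if f.name in output_types and output_types[f.name] == _strip_optional(f.type)
    ]


if __name__ == "__main__":
    print(auto_link(ImageOutput, UpscaleInputs))
```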

### Invoke Function

```py
def invoke(self, context: InvocationContext) -> ImageOutput:
    image = context.services.images.get(self.image.image_type, self.image.image_name)
    results = context.services.generate.upscale_and_reconstruct(
        image_list = [[image, 0]],
        upscale = (self.level, self.strength),
        strength = 0.0, # GFPGAN strength
        save_original = False,
        image_callback = None,
    )

    # Results are image and seed, unwrap for now
    image_type = ImageType.RESULT
    image_name = context.services.images.create_name(context.graph_execution_state_id, self.id)
    context.services.images.save(image_type, image_name, results[0][0])
    return ImageOutput(
        image = ImageField(image_type = image_type, image_name = image_name)
    )
```

The `invoke` function is the last portion of an invocation. It is provided an `InvocationContext`, which contains services to perform work as well as the ID of the current session execution (`graph_execution_state_id` in the example above) for use as needed. It should return an instance of an output class that derives from `BaseInvocationOutput`.

Before being called, the invocation will have all of its fields set from defaults, inputs, and finally links (overriding in that order).
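The override order can be pictured as successive dictionary merges in which later sources win. A minimal sketch; `resolve_fields` and the example values are hypothetical:

```python
from typing import Any, Dict


def resolve_fields(defaults: Dict[str, Any],
                   inputs: Dict[str, Any],
                   links: Dict[str, Any]) -> Dict[str, Any]:
    """Later sources override earlier ones: defaults < inputs < links."""
    resolved: Dict[str, Any] = {}
    for source in (defaults, inputs, links):
        resolved.update(source)
    return resolved


if __name__ == "__main__":
    values = resolve_fields(
        defaults={"image": None, "strength": 0.75, "level": 2},
        inputs={"level": 4},  # e.g. given explicitly on the CLI
        links={"image": {"image_type": "result", "image_name": "1.png"}},  # from a previous node
    )
    print(values)
```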

Assume that this invocation may be running simultaneously with other invocations, may be running on another machine, or may be running in other interesting scenarios. If you need additional functionality, provide it as a service in the `InvocationServices` class, and make sure it can be overridden.

### Outputs

```py
class ImageOutput(BaseInvocationOutput):
    """Base class for invocations that output an image"""
    type: Literal['image'] = 'image'

    image: ImageField = Field(default=None, description="The output image")
```

Output classes look like an invocation class without the invoke method. Prefer to use an existing output class if available, and prefer to name inputs the same as outputs when possible, to promote automatic invocation linking.

ldm/generate.py

Lines changed: 6 additions & 0 deletions

```diff
@@ -1030,6 +1030,8 @@ def upscale_and_reconstruct(
         image_callback=None,
         prefix=None,
     ):
+
+        results = []
         for r in image_list:
             image, seed = r
             try:
@@ -1083,6 +1085,10 @@ def upscale_and_reconstruct(
             else:
                 r[0] = image
 
+            results.append([image, seed])
+
+        return results
+
     def apply_textmask(
         self, image_path: str, prompt: str, callback, threshold: float = 0.5
     ):
```

ldm/invoke/app/api/dependencies.py

Lines changed: 80 additions & 0 deletions

```python
# Copyright (c) 2022 Kyle Schouviller (https://github.com/kyle0654)

from argparse import Namespace
import os

from ..services.processor import DefaultInvocationProcessor

from ..services.graph import GraphExecutionState
from ..services.sqlite import SqliteItemStorage

from ...globals import Globals

from ..services.image_storage import DiskImageStorage
from ..services.invocation_queue import MemoryInvocationQueue
from ..services.invocation_services import InvocationServices
from ..services.invoker import Invoker
from ..services.generate_initializer import get_generate
from .events import FastAPIEventService


# TODO: is there a better way to achieve this?
def check_internet()->bool:
    '''
    Return true if the internet is reachable.
    It does this by pinging huggingface.co.
    '''
    import urllib.request
    host = 'http://huggingface.co'
    try:
        urllib.request.urlopen(host,timeout=1)
        return True
    except:
        return False


class ApiDependencies:
    """Contains and initializes all dependencies for the API"""
    invoker: Invoker = None

    @staticmethod
    def initialize(
        args,
        config,
        event_handler_id: int
    ):
        Globals.try_patchmatch = args.patchmatch
        Globals.always_use_cpu = args.always_use_cpu
        Globals.internet_available = args.internet_available and check_internet()
        Globals.disable_xformers = not args.xformers
        Globals.ckpt_convert = args.ckpt_convert

        # TODO: Use a logger
        print(f'>> Internet connectivity is {Globals.internet_available}')

        generate = get_generate(args, config)

        events = FastAPIEventService(event_handler_id)

        output_folder = os.path.abspath(os.path.join(os.path.dirname(__file__), '../../../../outputs'))

        images = DiskImageStorage(output_folder)

        # TODO: build a file/path manager?
        db_location = os.path.join(output_folder, 'invokeai.db')

        services = InvocationServices(
            generate = generate,
            events = events,
            images = images,
            queue = MemoryInvocationQueue(),
            graph_execution_manager = SqliteItemStorage[GraphExecutionState](filename = db_location, table_name = 'graph_executions'),
            processor = DefaultInvocationProcessor()
        )

        ApiDependencies.invoker = Invoker(services)

    @staticmethod
    def shutdown():
        if ApiDependencies.invoker:
            ApiDependencies.invoker.stop()
```

ldm/invoke/app/api/events.py

Lines changed: 54 additions & 0 deletions

```python
# Copyright (c) 2022 Kyle Schouviller (https://github.com/kyle0654)

import asyncio
from queue import Empty, Queue
from typing import Any
from fastapi_events.dispatcher import dispatch
from ..services.events import EventServiceBase
import threading

class FastAPIEventService(EventServiceBase):
    event_handler_id: int
    __queue: Queue
    __stop_event: threading.Event

    def __init__(self, event_handler_id: int) -> None:
        self.event_handler_id = event_handler_id
        self.__queue = Queue()
        self.__stop_event = threading.Event()
        asyncio.create_task(self.__dispatch_from_queue(stop_event = self.__stop_event))

        super().__init__()


    def stop(self, *args, **kwargs):
        self.__stop_event.set()
        self.__queue.put(None)


    def dispatch(self, event_name: str, payload: Any) -> None:
        self.__queue.put(dict(
            event_name = event_name,
            payload = payload
        ))


    async def __dispatch_from_queue(self, stop_event: threading.Event):
        """Get events on from the queue and dispatch them, from the correct thread"""
        while not stop_event.is_set():
            try:
                event = self.__queue.get(block = False)
                if not event: # Probably stopping
                    continue

                dispatch(
                    event.get('event_name'),
                    payload = event.get('payload'),
                    middleware_id = self.event_handler_id)

            except Empty:
                await asyncio.sleep(0.001)
                pass

            except asyncio.CancelledError as e:
                raise e # Raise a proper error
```
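`FastAPIEventService` lets worker threads call `dispatch` while a coroutine on the event loop drains a thread-safe `queue.Queue` by polling. The self-contained sketch below reproduces that polling pattern without FastAPI; `ThreadToLoopBridge` and its `deliver` callback are hypothetical stand-ins for the service and for `fastapi_events`' `dispatch`. Unlike the original, the drain coroutine is started explicitly here, since `asyncio.create_task` requires a running event loop. `loop.call_soon_threadsafe` with an `asyncio.Queue` is a common alternative that avoids polling altogether.

```python
import asyncio
import threading
from queue import Empty, Queue
from typing import Any, Callable, Dict, List, Optional, Tuple


class ThreadToLoopBridge:
    """Accepts events from any thread and delivers them from the asyncio loop."""

    def __init__(self, deliver: Callable[[str, Any], None]) -> None:
        self._deliver = deliver  # hypothetical stand-in for fastapi_events' dispatch
        self._queue: "Queue[Optional[Dict[str, Any]]]" = Queue()
        self._stop = threading.Event()

    def dispatch(self, event_name: str, payload: Any) -> None:
        # queue.Queue is thread-safe, so worker threads may call this directly.
        self._queue.put({"event_name": event_name, "payload": payload})

    def stop(self) -> None:
        self._stop.set()
        self._queue.put(None)  # sentinel, mirroring the original stop()

    async def run(self) -> None:
        # Poll without blocking the loop; yield briefly whenever the queue is empty.
        while not self._stop.is_set():
            try:
                event = self._queue.get(block=False)
            except Empty:
                await asyncio.sleep(0.001)
                continue
            if event is None:  # sentinel from stop()
                continue
            self._deliver(event["event_name"], event["payload"])


async def main() -> List[Tuple[str, Any]]:
    received: List[Tuple[str, Any]] = []
    bridge = ThreadToLoopBridge(lambda name, payload: received.append((name, payload)))
    drain = asyncio.create_task(bridge.run())

    def worker() -> None:
        for step in range(3):
            bridge.dispatch("progress", {"step": step})

    await asyncio.to_thread(worker)  # producer runs in a separate thread
    await asyncio.sleep(0.05)        # give the drain loop time to deliver
    bridge.stop()
    await drain
    return received


if __name__ == "__main__":
    print(asyncio.run(main()))
```

As in the original, any events still queued when `stop()` is called are dropped, so callers that need every event delivered should drain before stopping.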
