
Dockerfile and automatic generation of images via GitHub Actions #1469


Open
oxc wants to merge 2 commits into master from the docker branch

Conversation


@oxc oxc commented Sep 9, 2023

TL;DR

Building on the great work of ZacharyACoon, I'm providing a Dockerfile that is a bit less complex, more aligned with common best practices, and makes builds more cacheable.
Additionally, this PR contains a GitHub Actions workflow that builds the image in five different PyTorch flavours (cu118, cu121, rocm5.6, rocm5.7-nightly, cpu), and pushes them to the GitHub Container Registry (works out of the box) and optionally to Docker Hub (requires a username + token configured in action secrets).

You can see the action in action (no pun intended) in my fork: https://github.com/oxc/ComfyUI/actions/workflows/docker.yml


Dockerfile

The first commit adds the Dockerfile. Since you mentioned that you are not too familiar with those, and are unsure about maintaining it, I will go through the commands in the file and give some additional explanations. Hopefully that makes the decision easier.

The Dockerfile is used by Docker to build the Docker image. It is basically the recipe used to cook the final image. The image itself can then be uploaded to a remote registry, and from there everybody can pull it locally and simply run it in a container.
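A rough sketch of that lifecycle with plain docker commands (the image name and tag here are only placeholders):

# build the image from the Dockerfile in the current directory
docker build -t ghcr.io/example/comfyui:latest .
# push it to a remote registry
docker push ghcr.io/example/comfyui:latest
# anyone can then pull the image and run it in a container
docker pull ghcr.io/example/comfyui:latest
docker run --rm -p 8188:8188 ghcr.io/example/comfyui:latest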

The docker-compose.yml file can make it easier to run the docker image using docker compose, but is not strictly a requirement.

Expand here to see the commented Dockerfile
# syntax=docker/dockerfile:1.4

Specifying this syntax directive allows us to use certain features like RUN --mount=type=cache.

ARG BASE_IMAGE="python:3.11-slim-bookworm"

ARG at the top of the file defines build-args that can be set when building the docker image and used in FROM statements.

FROM ${BASE_IMAGE}

FROM specifies the base image that we are building our image on. This Dockerfile uses a python image based on Debian by default, selecting a python version that is supported by PyTorch.
I personally have not often seen the base image made configurable as a build-arg, but it does not really hurt, so I decided to keep it. It might be useful for people who are building their own images and just want a different base image that provides a few more installed tools, for example.
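For example, to override the default at build time (the tag here is just an illustration):

docker build --build-arg BASE_IMAGE="python:3.11-bookworm" -t comfyui:custom-base .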

ARG PYTORCH_INSTALL_ARGS=""
ARG EXTRA_ARGS=""
ARG USERNAME="comfyui"
ARG USER_UID=1000
ARG USER_GID=${USER_UID}

ARGs after a FROM declare that this build stage uses the mentioned args. These args also include default values, which can be overridden on the command line.

RUN \
	--mount=target=/var/lib/apt/lists,type=cache,sharing=locked \
	--mount=target=/var/cache/apt,type=cache,sharing=locked \
	set -eux; \
		apt-get update; \
		apt-get install -y --no-install-recommends \
			git \
			git-lfs

RUN set -eux; \
	groupadd --gid ${USER_GID} ${USERNAME}; \
	useradd --uid ${USER_UID} --gid ${USER_GID} -m ${USERNAME}

RUN executes the given shell command. These commands install git in the container, and create the user/group which will be running comfyui.

# run instructions as user
USER ${USER_UID}:${USER_GID}

USER defines the user/group that the following commands (and later the container itself) run as.

WORKDIR /app

WORKDIR defines that all following commands will run in the context of the given directory in the container. You can think of it a bit like cd.

ENV PIP_CACHE_DIR="/cache/pip"
ENV VIRTUAL_ENV=/app/venv
ENV TRANSFORMERS_CACHE="/app/.cache/transformers"

ENV defines environment variables that we will be using later in the file.

# create cache directory
RUN mkdir -p ${TRANSFORMERS_CACHE}

# create virtual environment to manage packages
RUN python -m venv ${VIRTUAL_ENV}

Create the transformers cache directory that we configured via ENV above.
Run python to create a virtual environment with the venv module.

# run python from venv
ENV PATH="${VIRTUAL_ENV}/bin:${PATH}"

Set the PATH so that pip and python will be run in the context of our venv.

RUN --mount=type=cache,target=/cache/,uid=${USER_UID},gid=${USER_GID} \
	pip install torch torchvision torchaudio ${PYTORCH_INSTALL_ARGS}

Now install the PyTorch dependencies, passing the PYTORCH_INSTALL_ARGS so that users can select which PyTorch index/platform they want to use. This will be passed by the GitHub Action.
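For a local build, selecting a flavour could look roughly like this (the tags are illustrative, the index URLs are the usual PyTorch wheel indexes):

# CUDA 12.1 flavour
docker build --build-arg PYTORCH_INSTALL_ARGS="--index-url https://download.pytorch.org/whl/cu121" -t comfyui:cu121 .
# CPU-only flavour
docker build --build-arg PYTORCH_INSTALL_ARGS="--index-url https://download.pytorch.org/whl/cpu" -t comfyui:cpu .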

# copy requirements files first so packages can be cached separately
COPY --chown=${USER_UID}:${USER_GID} requirements.txt .
RUN --mount=type=cache,target=/cache/,uid=${USER_UID},gid=${USER_GID} \
	pip install -r requirements.txt

Now we COPY the requirements.txt to the container, and run pip install on it.
We are using RUN --mount=type=cache to mount a cache directory for our pip cache that is managed by the docker runtime and shared between builds.

Why are we only copying the requirements file? Here you need to know that Docker tries to cache all commands when building an image, but as soon as one command fails to use the cached version, all further commands have to be re-run as well. For this reason, it helps to do the more stable stuff earlier in the image, and the more frequently changing things later.
Since the program code of ComfyUI will change with every commit, copying it here would invalidate all caches for requirements etc. We need the requirements file however, so we copy only that. As long as it does not change, the result of pip install can be cached. One can always run the build command with --no-cache to force a re-run.
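A quick way to see the effect (the tag is arbitrary):

docker build -t comfyui:test .               # first build: every layer is built
touch main.py                                # change only application code
docker build -t comfyui:test .               # torch/requirements layers come from cache
docker build --no-cache -t comfyui:test .    # force a full rebuild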

You can read more about the cache here: https://docs.docker.com/build/cache/

COPY --chown=${USER_UID}:${USER_GID} . .

Now, as a last step, we copy the whole program code into the container image.

# default environment variables
ENV COMFYUI_ADDRESS=0.0.0.0
ENV COMFYUI_PORT=8188
ENV COMFYUI_EXTRA_BUILD_ARGS="${EXTRA_ARGS}"
ENV COMFYUI_EXTRA_ARGS=""
# default start command
CMD python -u main.py --listen ${COMFYUI_ADDRESS} --port ${COMFYUI_PORT} ${COMFYUI_EXTRA_BUILD_ARGS} ${COMFYUI_EXTRA_ARGS}

Finally, we define some more environment variables that can be overridden, and define the CMD that will be run when the container is launched.
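At run time these can be overridden like any other environment variable, for example (the values are only illustrative):

docker run --rm -p 8188:8188 \
  -e COMFYUI_EXTRA_ARGS="--lowvram" \
  ghcr.io/example/comfyui:latest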

GitHub Action

The GitHub action spawns several jobs in a matrix, one for each supported PyTorch flavor (cu118, cu121, rocm5.6, rocm5.7-nightly, cpu).

For each flavor, it builds the container image, and pushes it to the GitHub Container Registry. This works out of the box with the information provided by GitHub Actions.
If a DOCKERHUB_USERNAME and a DOCKERHUB_TOKEN secret are configured on the GitHub repository, the action will also log in and push to Docker Hub. Here is the image the action on my fork created: https://hub.docker.com/r/obeliks/comfyui
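For reference, the secrets can be set in the repository settings or, for example, with the GitHub CLI (values are placeholders):

gh secret set DOCKERHUB_USERNAME --body "your-dockerhub-username"
gh secret set DOCKERHUB_TOKEN --body "your-dockerhub-access-token"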

The workflow uses several actions provided by Docker, one for generating image metadata from the GitHub context, one for logging in, and one for building and pushing. Ultimately, the workflow has been pieced together from the following examples (and adjusted accordingly):

I have opted not to add extra complexity to persist the pip cache in the GitHub Actions cache. In most cases, the persisted layer cache will suffice. Take a look at this run that uses cached layers, compared to this one that creates most of them from scratch. Downloading those large layers that contain the PyTorch wheels still takes some time, but is much faster than doing the full build.

Note that it still makes sense to have those pip caches in a cache mount, because it is useful for local building, and should prevent the cache from ending up in the image without having to erase it explicitly.

@oxc oxc requested a review from comfyanonymous as a code owner September 9, 2023 21:16

oxc commented Sep 18, 2023

AFAIK there is no auto-detection for CPU and currently the --cpu option has to be set explicitly.

You're right, if I try to start the cpu image, I get

AssertionError: Torch not compiled with CUDA enabled

I guess --cpu should be set by default in the cpu image.


oxc commented Sep 18, 2023

Alternatively, the documentation could mention that you have to run the cpu version with --cpu in COMFYUI_EXTRA_ARGS. That seems a bit silly, but is running ComfyUI on CPU only a thing anyway?


oxc commented Sep 18, 2023

The cpu image now passes the --cpu flag as arg.

I've introduced a separate COMFYUI_EXTRA_BUILD_ARGS env that gets initialized from the EXTRA_ARGS build arg, which gets set to --cpu in the GitHub Action matrix.
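A local build of the CPU variant would then look roughly like this (the tag is illustrative):

docker build \
  --build-arg PYTORCH_INSTALL_ARGS="--index-url https://download.pytorch.org/whl/cpu" \
  --build-arg EXTRA_ARGS="--cpu" \
  -t comfyui:cpu .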

@oxc oxc force-pushed the docker branch 3 times, most recently from 7e5777d to 006a9d8 Compare September 18, 2023 20:40

oxc commented Sep 18, 2023

For completeness' sake, I've added all image variants and the information about the EXTRA_ARGS=--cpu build arg to the README.

@oxc oxc force-pushed the docker branch 2 times, most recently from fad7f91 to 19422a4 Compare September 21, 2023 22:37

oxc commented Sep 21, 2023

I've slightly modified the Dockerfile to install git and git-lfs, so that ComfyUI Manager works out of the box. I've updated the collapsed documentation in the original post.


oxc commented Oct 13, 2023

Updates:

I've updated the requirements files to the latest versions (and created a separate PR for it).

I've changed the base image to 3.11-slim-bookworm to make sure that Python 3.11 is used (not 3.12).


oxc commented Oct 20, 2023

Update: I've removed the requirements-*.txt files because I also wanted to build a cu118 version, and since xformers is gone (for now), it just seems easier and more flexible not to have to commit everything to a file.

It also might make this PR more acceptable to be merged.

I've updated the description in the first post accordingly.

@julien-blanchon

What is blocking this PR?


julien-blanchon commented Nov 6, 2023

Hey, I'm using your Docker image and sometimes with custom nodes I get this error

comfyui-comfyui-1  |     import cv2
comfyui-comfyui-1  | ImportError: libGL.so.1: cannot open shared object file: No such file or directory

It seems that apt-get update && apt-get install libgl1 in the Dockerfile solves this issue with minimal package overhead.
Maybe you should include this directly in the Dockerfile.


oxc commented Nov 6, 2023

Hey, I'm using your Docker image and sometimes with custom nodes I get this error

Thanks for reaching out. Can you tell me which custom node this is? I'll try to figure out which packages are used by commonly used custom nodes...

@julien-blanchon

I don't know exactly which one but this occurs with https://github.com/ltdrdata/ComfyUI-Impact-Pack with mmdet_skip = True


oxc commented Nov 6, 2023

It looks like this might be solved by installing opencv-python-headless (with pip) instead of the GUI version. Could you try this?
I'm wondering whether and how/where comfyui manager installs dependencies, and if that survives container restarts.

@julien-blanchon

I will not be able to try before tomorrow, I will let you know


jvnte commented Dec 18, 2023

When will this be merged? :-)


oxc commented Feb 16, 2024

You can now mount a folder to /app/custom_venv and the image will copy its venv there and launch from there. This will make installed custom_node dependencies persist across restarts.
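For example, with a bind mount (the paths and image name are just an illustration):

mkdir -p ./custom_venv
docker run --rm -p 8188:8188 \
  -v "$PWD/custom_venv:/app/custom_venv" \
  ghcr.io/example/comfyui:latest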

@oxc oxc force-pushed the docker branch 2 times, most recently from 31fb6d8 to de3c90e Compare March 10, 2024 19:10
@polarathene

@ravensorb not likely, see #1469 (comment)


rizlas commented Jan 13, 2025

@oxc I think it would be better to migrate this PR to a dedicated repo. Many GitHub organisations separate the software itself from the dockerization. What do you think?


polarathene commented Jan 13, 2025

@rizlas the entire point of this PR is to have an image that's officially endorsed by ComfyUI maintainers. If they want a separate repo for this they'll request that, but in my experience many projects keep this support in the same project repo.

To have an unofficial image at @oxc's own repo defeats the intent of the PR. There's a variety of unofficial images already published, but we want ComfyUI to either endorse one, or better yet accept one to maintain and publish officially, as that would be more trustworthy for users along with centralizing contributions/maintenance on the image.


rizlas commented Jan 13, 2025

Yeah, but it's been one year since this PR was opened. It is one of many PRs aiming to address Docker support. I think that they will not merge any Docker-related PR without also having a full-time maintainer of it on board.

So IMHO it's better to have a single reference point, even if unofficial (but in the end de facto official), than a years-long PR.

My2c


oxc commented Jan 13, 2025

Feel free to use the images published to my fork of the repo (without any guarantees). However, as @polarathene has perfectly explained, that is not the intent of this PR.

In any case, any such decisions would require input from one of the maintainers, otherwise it's just guesswork.


polarathene commented Jan 13, 2025

Yeah, but it's been one year since this PR was opened. It is one of many PRs aiming to address Docker support. I think that they will not merge any Docker-related PR without also having a full-time maintainer of it on board.

This PR is the only remaining active one AFAIK. Its author, @oxc, is very responsive to feedback and keeps the PR relevant in the hope it'll eventually get considered. In the meantime, interested users will engage in review / discussion here and provide feedback from their own usage, which should help towards instilling more confidence to accept this PR.

Last I recall in past PRs for Docker, at the time I think the only maintainer was the original author of ComfyUI and they were very much against accepting a PR for Docker, since they were not confident in reviewing contributions or maintaining it.

The best thing you can do right now is help demonstrate community demand for this with a 👍 reaction to the PR description at the top (we only have 8 atm), and wait until a maintainer chimes in. I know that it's a long wait, but the maintainers will prioritize what they're comfortable with, so until one of them wants an official Docker image the only other option is sufficient demand from the community to justify it (just the 👍 reactions, not +1 comments, which do no good).

Alternatively if someone engages with the project long enough with contributions to gain maintainer trust and become an official maintainer themselves, that might be "faster", but is a lot of work 😅

You may see a decision to use a separate repo in the org and assign someone like @oxc as a maintainer for that. They'd be granted sufficient permissions to manage that repo, and don't need to have the ability to manage any secrets. That repo could enforce PRs (protected main branch) and still require approval to merge from the core maintainers if that extra precaution were necessary.

It depends what they're comfortable with, but for now we need to demonstrate that there is demand for it and that what the PR proposes can be trusted (which community engagement is assisting with).

Most importantly, try not to ping the maintainers unless there is good reason to. They'll be swamped in notifications and other tasks. Wanting a PR looked at is fair, but if they're bothered by it too frequently it'll get the opposite effect you want.


So IMHO it's better to have a single reference point, even if unofficial (but in the end de facto official), than a years-long PR.

I understand what you're saying but until a project maintainer adds to the project README / docs a section of endorsed images to direct users, you already have to go find some unofficial published image and place trust that the publisher is not malicious (or review and build manually).

What you're proposing still requires discovery/awareness of the third-party maintained image.

with:
  registry: ghcr.io
  username: ${{ github.repository_owner }}
  password: ${{ secrets.GITHUB_TOKEN }}


A token doesn't need to be created manually if you provide the permissions as part of the GitHub Action; see also: https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/controlling-permissions-for-github_token


This is the token provided by GitHub Actions; it's not created manually.


OK, but the write:packages scope is missing in the permissions for the workflow, right?


Good god, you are fast. You are not looking for a job by any chance? 🤣

@codingjoe

@huchenlei excuse the ping. 🫣 A lot of good effort has been put into this patch, but it needs some input to progress. I'd love to eventually see an official image, as do many others. Thanks 🍀

@polarathene

A lot of good effort has been put into this patch, but it needs some input to progress

I am chipping away at a review with extensive feedback. It's gotten quite verbose however with enough change suggestions that it may be easier for me to take that and create yet another PR attempt 😂

I'm also curious about alternative tooling vs pip, but I won't touch on that with the review. I assume some users may expect to use pip based on following some guides/instructions (similar to why the project itself still relies on requirements.txt?)


Question - Has the ComfyUI Manager integration changed?

While I work away at the review, as I haven't used ComfyUI manager myself yet... can anybody chime in with the current situation since this PR was created?

There's now an official CLI package for ComfyUI that installs ComfyUI and can install the manager, along with an official registry for manager packages to install vs the less trustworthy approach prior (IIRC the official registry was in response to some exploit/malware event from a bad actor abusing users trust of the manager?).

I'm curious if that changes anything. I haven't yet explored the CLI install method, but am curious how dependent that is upon a full git clone, which IIRC was decided for ComfyUI Manager to handle updates or something? (still working through past review discussions)

I am aware that ComfyUI Manager has some options at least for just leveraging custom_nodes with support to restore the packages. I don't know if there's any other popular customizations outside of ComfyUI Manager that should be expected to support? (if you're using the image @oxc publishes or similar, please share any other use-cases)


Concern - Custom Venv with third-party package additions vs Python/PyTorch upgrades?

One other concern that my review also raises is with the venv sync. When a user upgrades to a new image release, that may contain a change in Python version or PyTorch / Cuda. The rsync operation during container startup will copy that over to a custom venv, but will that risk breaking that custom venv?

Apart from size bloat with no longer used PyTorch packages (which appear to be around 1.5GB in size for the CPU variant), anything installed by the user via ComfyUI Manager or similar to that custom venv would presumably be at risk of breakage?

I am not much of a Python user or dev, but if I'm not mistaken, the venv has packages stored in Python version specific directories, so upgrading to newer Python presumably makes those packages no longer used by the venv? (that or the venv configures for the old Python version that'll no longer be available on the system? Although the rsync would correct that AFAIK, site-packages dir aside)
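A quick illustration of why that matters, assuming the venv lives at /app/venv (paths and versions are only examples):

cat /app/venv/pyvenv.cfg      # records the interpreter home/version the venv was created with
ls /app/venv/lib/             # e.g. python3.11/ -- site-packages live below this
# after a base image bump to e.g. Python 3.12, anything under python3.11/site-packages
# is no longer on the new interpreter's module path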

We could consider having the Python runtime installed/pinned separately in a way that it'd be persisted as part of the users volume, rather than relying on a base image to provide it. That won't really help with the PyTorch change however.

If anyone is familiar with the situation with ComfyUI Manager (or anything else a user might setup afterwards), is that keeping track of deps/packages somewhere? Surely it's something those would do on a host to upgrade Python or PyTorch with an existing install on a host instead of a container...? Is there a modified list like a package.json or cargo.toml equivalent that pip or uv could reinstall/sync?

Image design when mutability of dependencies is desired

Or is it more realistic to just disregard Python/PyTorch upgrades, by keeping that as an initial install at container run time when a volume is first mounted and empty?

In that sense the image would be rather lightweight itself, only packaging the ComfyUI repo + convenience scripts. Afterwards image upgrades would get a newer version of ComfyUI, but you'd otherwise have everything else managed externally at your volume mount 🤔

Perhaps a bit slower for the first time run of the container vs time downloading a much larger image?

  • For the user, storage wise that's a bit of a win and image upgrades are much lighter/faster, without the anticipated breakage from rsync when switching between releases (if troubleshooting/rolling back). Even to switch between CPU/GPU variants, you'd need separate volumes presumably? (can't just rely on the image alone due to the venv state sync, should you use that)
  • For maintainers, CI cache usage would be considerably lighter too. The image registry would only have the single variant (the user could just use an ENV to handle the cpu/cuda/rocm at runtime instead).
  • Especially with the CLI and potential for permutations to support, it may make more sense to defer the bulky PyTorch deps to the users system.
  • That said, without bundling this weight into the image itself, the container would only be practical with an explicit volume mounted. Perhaps some users subscribed here are interested in a container without an explicit volume (easily resolved by extending the image with VOLUME instruction, but I'm not a fan of that being part of a default image). That may not be enough for some, should they be regularly deploying a container to a fresh host (I've heard of some services that run "serverless" by pulling an image and starting a container)... Container startup would be a problem there, but nothing a custom image couldn't resolve 🤷‍♂️


@kayakyakr kayakyakr left a comment


I used your build on the fork to successfully install and run ComfyUI in Docker. It went pretty smoothly, all things considered.

I have been linking volumes to subfolders within the app folder individually. I think the most difficult thing is figuring out which folders are necessary. I wonder if providing a volume for /app as a whole would be detrimental to the deploy.

e: And since I just published this after starting it days ago, I'll edit with some agreements to the previous post -

Or is it more realistic to just disregard Python/PyTorch upgrades, by keeping that as an initial install at container run time when a volume is first mounted and empty?

In that sense the image would be rather lightweight itself, only packaging the ComfyUI repo + convenience scripts. Afterwards image upgrades would get a newer version of ComfyUI, but you'd otherwise have everything else managed externally at your volume mount 🤔

I like this idea as it creates a more flexible base image. The one thing I would caution: the installation of pytorch can be fraught with complications, so the startup install script would need to be good. I have been running a number of torch/cuda containers, and the only ones that have worked easily have been those pre-built with cuda 12.4 and torch.

For maintainers, CI cache usage would be considerably lighter too. The image registry would only have the single variant (the user could just use an ENV to handle the cpu/cuda/rocm at runtime instead).

Possibly worth creating a base image that does runtime install, and a few common version combos? 1 ROCm, 2 CUDA, and then the base image with just CPU?

That said, without bundling this weight into the image itself, the container would only be practical with an explicit volume mounted.

I have seen images with required volumes and I think that's a fair requirement. This one actually needs a number of required volumes as, at the minimum, you need a volume for models if you want it to do anything of value in the tool. That was the reason for my question above: would it be possible to just add a volume for /app and get all the benefits?


Thank you for the compose example.

# For CPU only
docker pull ghcr.io/comfyanonymous/comfyui:latest-cpu


Would be nice to have a short breakdown of recommended volumes, any key environment variables, or just a quick-start docker run command here.


you have it in the compose file

@polarathene

I have been linking volumes to subfolders within the app folder individually. I think that's the most difficult thing to figure out which folders are necessary. I wonder if providing a volume for /app on the whole would be detrimental to the deploy.

My WIP review addresses this. ComfyUI has a --base-directory arg now which lets you shift all those state dirs to another location which greatly simplifies this concern. There are some custom nodes however that apparently aren't compatible and need an update to correctly resolve that path.

There's also a YAML extra paths file (see the comfyui repo) that can be used to further customize dir organization if needed, and I think that has an arg as well.


I like this idea as it creates a more flexible base image. The one thing I would caution: the installation of pytorch can be fraught with complications, so the startup install script would need to be good.

I'm still not that experienced with building PyTorch containers, any additional context on complications would be good.

I know that you want to install PyTorch upfront to what is compatible with your GPU, otherwise using the CPU wheel. Since the packages bundle everything there though, I don't know about any other concerns beyond getting the index right and installing that prior to the actual project deps.

uv seems to have improved tooling on that part too which is good as that was an area it struggled vs pip.

Unrelated would be additional system packages, I've seen some images bring in stuff like ffmpeg for video models support, and there's also that opencv python package for the cv2 module already discussed earlier in this PR.

One image I saw was installing the persisted venv with the Python version and PyTorch variant in the venv dir name, so that it wouldn't overwrite a pre-existing one should the image have bumped those up in its scripts.


I have been running a number of torch/cuda containers, and the only ones that have worked easily have been those pre-built with cuda 12.4 and torch.

That's something I've also looked into, the only concern here with CUDA 12.4 I think was it was the last one supported by an older Nvidia driver series (550?), not sure if that's still the case, but it's probably the main limitation to keep in mind with compatibility there. I have a link in my review that helps track that information 👍


Possibly worth creating a base image that does runtime install, and a few common version combos? 1 ROCm, 2 CUDA, and then the base image with just CPU?

Yeah, that's basically just shifting some RUN instructions to an entrypoint script. I'll give it some thought after my review on the current approach is done :)


That was why my question above: would it be possible to just add a volume for /app and get all the benefits?

One image I saw clones the repo at container startup too, it is basically a Dockerized installer to a volume. Their scripts for it though was a bit of a mess that didn't look fun to maintain at all.

If you'd like to beat me to it, you should be able to just run a Fedora or Ubuntu container for example, install git and Python / pip and then use pip to install the official Comfy CLI which can install both ComfyUI and ComfyUI Manager. Use the --base-directory arg to point to your mounted volume and then the only remaining thing would be your venv (which you could probably create at the same volume at a different directory).

I don't know how common / comfortable users are with rootless containers, but if you run those (quite easy with Podman) then your ownership woes with the volume mount are taken care of for you regardless of what UID you have on the host. The rootful container switch to a non-root user can be supported too, it's just more work and has some potential for bugs (not that I'd really want to trust random custom nodes using a root user in a rootful container, it should be secure for the most part these days but I'd still feel iffy about that risk).
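A rootless Podman run looks essentially the same as the Docker equivalent (image name and paths are placeholders; the :Z suffix only matters on SELinux hosts, and --userns=keep-id maps your host UID into the container):

podman run --rm -p 8188:8188 --userns=keep-id \
  -v "$PWD/models:/app/models:Z" \
  ghcr.io/example/comfyui:latest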


Regarding your review feedback on docs, AFAIK once you use --base-directory there should only be two folders you need to persist, and that could be done in the same volume if you like. There may be cache or other content written in a home dir within the container, but that should be less critical to persist.

ENV wise I don't think the image itself needs much there for a user to think about. Possibly PUID/PGID if allowing for custom non-root user switching at run time instead. AMD GPUs have extra ENV sometimes to support ROCm with PyTorch, but that's nothing specific to the container itself.

@ravensorb

@polarathene - I took that kind of approach in the container that I recently built. You can take a look here if interested:

https://github.com/ravensorb/ComfyUI-docker

@polarathene

@ravensorb thanks for the reference. That's a bit tidier than some I've seen elsewhere, but for an official upstream PR here that'd be far too much to get approval.

The community can always extend the official image, but without getting that first step it'll be spread out with various third-party attempts 😓


I'd love more context on the additional apt deps you've installed, if you know what they were needed to support.

I do have some feedback though from a quick glance.

  • Your FROM base images are not in sync, yet you're building software to copy over (ffmpeg) with dynamic linking involved? That's risky and prone to breakage. Perhaps a base image with a package repo that has prebuilt ffmpeg with the nvidia support is possible?
  • You should ideally avoid VOLUME where possible, it's far better to be explicit there. See my earlier reply on this topic in this PR for more context.
  • Do you really need the passwordless sudo workaround? That kind of defeats the whole purpose of non-root user for security. Are you relying on this for ownership at the host volume path to match your host user? There are better ways to support that (rootless containers in particular). When I complete my review, you'll see another proper way to support this with rootful containers.
  • Your Dockerfile ends with CMD pointing to an entrypoint script. That should be ENTRYPOINT; no need for bash, since the entrypoint script has a shebang for the interpreter to use, so it just needs to be executable.

username: ${{ github.repository_owner }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Build and push
uses: docker/build-push-action@v5


Suggested change
uses: docker/build-push-action@v5
uses: docker/build-push-action@v6

@RedsAnalysis

Hey, I have gone through the whole PR and it has been very helpful. I have created a fork of ComfyUI to add Docker support on the official page, but currently it only supports Nvidia GPUs; since I don't have access to an AMD card I can't really test my code for AMD. Please take a look at my fork and let me know what you think and any problems that my fork has or might face. Thank you
https://github.com/RedsAnalysis/ComfyUI

@kayakyakr

Found a slight issue with using this as an image. I think that it's related to the pip cache not being a volume? I installed a few modules, but they broke after a power failure. I don't know if I had ever restarted the container after bringing it up initially. I had to exec into the container and manually reinstall the missing dependencies.

Honestly haven't done much debugging of it, so can't say for certain, but it might be something that must be a volume to work properly rather than an optional volume. Someone with more knowledge of this system may be able to tell me where I've gone wrong. Didn't think about this PR at the time, so didn't keep the logs as I brought the server back up.


sequencerr commented Mar 16, 2025

@kayakyakr the cache is not included in the image, because the purpose of the cache is to speed up subsequent installs, and you aren't expected to install anything into a completed build.

You'd better share your environment (which packages) and the error.

oxc added 2 commits March 17, 2025 21:01
Based on ComfyUI PR comfyanonymous#530

Co-Authored-By: ZacharyACoon
Add a GitHub Action that builds a docker image on every push and tag,
and publishes it to GitHub container repository, and -- if a docker
username is defined in the repository secrets -- to the Docker Hub.

@polarathene polarathene left a comment


I got side-tracked while writing up this rather lengthy review.

I didn't quite finish it as I wanted to do some extra investigation on my end and summarize all of this for you. To avoid losing all this information thus far I'm submitting the review in its current state, as I think it's only persisted so long through the browser cache 😓

I don't know if I'll have time until perhaps later in the month to respond to much engagement (if any) on the review. Once I've cleared away some existing work and can refresh over this review again I'll be happy to continue, or spin up a separate PR with my feedback applied.

Hope it's somewhat helpful should anyone read through it 😎

ARG PYTORCH_INSTALL_ARGS=""
ARG EXTRA_ARGS=""
ARG USERNAME=comfyui
ARG USER_UID=1000


The default used by official DB images (redis, postgres, mongo, mysql, etc.) is 999 for both UID and GID.

I understand the intention here is for rootful containers to default to a more convenient ownership and/or UID of processes that will run in the container so that they write to storage with that ownership...

However, should the user install some third-party code, or some other vulnerability be exploited to enable a container escape, this would be similar to escaping as root (assuming the convenience is on user facing systems rather than servers, where the environment is often configured to allow running docker CLI commands without requiring credentials).

As such I'd discourage this default choice. You also shouldn't need to depend on it for building an image.


The official Docker DB images tend to have an entrypoint script with this logic to correct ownership mismatch as the container root user to the non-root user that it switches to internally. That only runs when the container is provided with a --user (Docker CLI) / user: (compose.yaml) option to override the container user.

That referenced entrypoint script then sets a umask since for databases, it's wiser to restrict access and this will ensure any new files or folders created at runtime are not accessible to group/other users, only the file owner.

Finally the entrypoint uses exec to execute a command and replace the current shell process with that (which ensures signals are properly forwarded). There are some other parts in the script to support a custom command / CMD from the user when it consists only of options, and an ENV they have for alternatively providing those too (like you're already doing in the current Dockerfile iteration).


Introducing an ENTRYPOINT script was suggested in my previous review feedback, with reluctance expressed regarding it adding friction to the review process.

Not handling this sort of thing properly though may result in support requests / complaints that instead has maintainers drop official image support. So I'm proposing one with references and context to provide confidence in it. There should be little need to modify it going forward.


@polarathene polarathene Feb 23, 2025


Reference

For context, here's where these were implemented/discussed in the official docker images:

If you need anything explained about the bash syntax below, let me know. I have revised it a little with improvements that should help make it easier to grok / maintain.

NOTE: In Debian /bin/sh links to /usr/bin/dash, not bash. In Alpine images you'd get a link to ash (which lacks some bash features like arrays), and other glibc base images may link to bash (Fedora I think), so be sure to use shebangs that explicitly call for bash and consider it as an explicit SHELL instruction in the Dockerfile if potentially using bash in RUN / CMD (without an ENTRYPOINT).

./docker/entrypoint.sh:

#!/usr/bin/env bash
set -e

# Optional, for convenience:
# If the user provided a custom command that starts with a dash,
# then treat the command as args/options to append to the default CMD:
if [[ "${1#-}" != "${1}" ]]; then
    set -- comfyui-start "${@}"
fi

export XDG_DATA_HOME="${XDG_DATA_HOME:-/service/comfyui/data}"
# Workaround until resolved: https://github.com/comfyanonymous/ComfyUI/issues/8110
if [[ ! -d "${XDG_DATA_HOME}/custom_nodes" ]]; then
    mkdir -p "${XDG_DATA_HOME}/custom_nodes"
fi

# Optional, for security (non-root switch) + convenience (chown):
# Support `docker run --user` by only running `chown` when using the default CMD + USER (root):
# 1. Ensure the data directory has the expected ownership of the non-root user (`comfyui` user).
# 2. Switch to the `comfyui` user + group via `setpriv` which will re-run this entrypoint script.
# NOTE: `setpriv` will change the UID/GID to non-root, but will not adjust ENV like HOME.
# If a service requires such, it must be set explicitly prior to starting that service.
if [[ "${1}" == 'comfyui-start' && "$(id -u)" == '0' && "${PG_UID}" != '0' ]]; then
    echo 'Switching to non-root user'
    # Assign ownership to the comfyui users data directory (used as ComfyUI `--base-directory`):
    find "${XDG_DATA_HOME}" -not -user comfyui -exec chown comfyui:comfyui '{}' +
    # Assign ownership to the VENV (this is large, switching will be slow)
    #find "/app/venv" -not -user comfyui -exec chown comfyui:comfyui '{}' +
    # Update HOME env for pip to be able to write cache (required for ComfyUI-Manager)
    export HOME='/home/comfyui'
    # Run this entrypoint script again (with any additional args provided), but as the `comfyui` user:
    exec setpriv --reuid=comfyui --regid=comfyui --clear-groups -- "${0}" "${@}"
fi

# Optional, for extra security:
# When `umask` is the default (0022),
# Ensure only the owner can access files created by the process at runtime:
if [[ "$(umask)" == '0022' ]]; then
    umask 0077
fi

# Run the intended command, replacing the active process (this shell) as PID 1:
exec "${@}"

./docker/start.sh: (could technically exist as part of the entrypoint, but separating responsibility is better)

#!/usr/bin/env bash

# Support for persisting Python venv to a volume mount:
# https://github.com/comfyanonymous/ComfyUI/pull/1469#issuecomment-2034975575
# - Sync the containers internal venv + correct the absolute path in `pyenv.cfg`
# - The PATH env gives preference to the custom location when present
#
# NOTE: rsync will copy over any updates from the source path overwriting the
# equivalent at the destination path. Any other files at the destination that
# don't exist at the source path will be left as-is.
if [[ -d "${VIRTUAL_ENV_CUSTOM}" ]]; then
    echo "Syncing '${VIRTUAL_ENV}/' to '${VIRTUAL_ENV_CUSTOM}/'"
    rsync -a "${VIRTUAL_ENV}/" "${VIRTUAL_ENV_CUSTOM}/"
    sed -i "s!${VIRTUAL_ENV}!${VIRTUAL_ENV_CUSTOM}!g" "${VIRTUAL_ENV_CUSTOM}/pyvenv.cfg"
fi

# Implicitly prepend `--cpu` to the args when running the CPU image variant:
if [[ "${IMAGE_VARIANT}" == 'cpu' ]]; then
    set -- '--cpu' "${@}"
fi

exec python -u /opt/comfyui/app/main.py \
    --listen "${COMFYUI_ADDRESS}" \
    --port "${COMFYUI_PORT}" \
    --base-directory "${XDG_DATA_HOME}" \
    ${COMFYUI_EXTRA_ARGS} ${@}
  • -P flag removed from rsync, there's like 30k files in the venv at first sync, that's a considerable amount of noise to output, just go with a single line to stdout to communicate any delay instead 😅
  • The python -u could drop the -u for ENV PYTHONUNBUFFERED=1 in the Dockerfile which might be better should the container try to use python at any other point (such as a running container with a user shelling in and troubleshooting or running their own python scripts, would also work with python shebangs).
  • I still don't really see the point of the --listen + --port options with ENV set to the defaults ComfyUI already uses... users could just do docker run image-name --listen 127.0.0.1 --port 80 (or equivalent command: in compose.yaml), if they need more flexibility via ENV they could manage that themselves, the image itself doesn't really need to support that 🤷‍♂️
# syntax=docker/dockerfile:1

# Default to CPU:
ARG VARIANT="cpu"
ARG PYTORCH_INSTALL_ARGS="--index-url https://download.pytorch.org/whl/${VARIANT}"

ARG PYTHON_VERSION=3.12
FROM python:${PYTHON_VERSION}-slim

# Fail fast on errors or unset variables
SHELL ["/usr/bin/bash", "-eux", "-o", "pipefail", "-c"]

ARG DEBIAN_FRONTEND=noninteractive
RUN <<HEREDOC
    # Create comfyui user + group (reserving a fixed ID before any packages install):
    # NOTE: Debian specific `adduser` (more convenient than the alternative `useradd`)
    # - `--system` intentionally ensures no login shell or password (system user)
    # - `--home` will create the home dir (system users default to /nonexistent)
    # - `--group` creates a group with the same user and ID (aborts if already taken)
    adduser --system --home /home/comfyui --uid 999 --group comfyui

    apt-get update
    apt-get install -y --no-install-recommends \
        git \
        git-lfs \
        rsync \
        fonts-recommended
HEREDOC

ARG XDG_CACHE_HOME=/cache
# This ENV is used by Python tooling, normally it's set by activating a venv:
ENV VIRTUAL_ENV=/app/venv
# This ENV isn't standard, but will be used by the start script and the PATH below to locate the custom venv
ENV VIRTUAL_ENV_CUSTOM=/app/custom_venv
# Run python from venv (prefer the custom venv when available)
ENV PATH="${VIRTUAL_ENV_CUSTOM}/bin:${VIRTUAL_ENV}/bin:${PATH}"

ARG PYTORCH_INSTALL_ARGS
# Create a virtual environment + install base packages:
RUN --mount=type=cache,target=/cache <<HEREDOC
    python -m venv "${VIRTUAL_ENV}"
    pip install torch torchvision torchaudio ${PYTORCH_INSTALL_ARGS}
HEREDOC

# Separate layer to not invalidate the much larger torch layer,
# This layer will invalidate when requirements.txt is updated.
RUN --mount=type=cache,target=/cache \
    --mount=type=bind,source=requirements.txt,target=requirements.txt \
    <<HEREDOC
    pip install -r requirements.txt

    # Optional: Prevents a concern with installing custom nodes,
    # that introduce an incompatible cv2 dependency:
    # https://github.com/comfyanonymous/ComfyUI/pull/1469#issuecomment-2571452051
    pip install opencv-python-headless
HEREDOC

# Copy over everything from the git clone, except files matching `.dockerignore` patterns:
COPY . /opt/comfyui/app/

# Workaround due to the top-level file organization of the repo:
# This places the scripts in the PATH and adjusts filenames to look more like commands
RUN <<HEREDOC
  cd /opt/comfyui/app
  mv docker/start.sh /usr/local/bin/comfyui-start
  mv docker/entrypoint.sh /usr/local/bin/comfyui-entrypoint
  rm -rf docker/
  chmod +x /usr/local/bin/comfyui-*
HEREDOC

ARG VARIANT
ENV IMAGE_VARIANT="${VARIANT}"

# Default environment variables used by `comfyui-start`:
ENV COMFYUI_ADDRESS=0.0.0.0
ENV COMFYUI_PORT=8188
ENV COMFYUI_EXTRA_ARGS=""

ENTRYPOINT ["/usr/local/bin/comfyui-entrypoint"]
CMD ["comfyui-start"]
  • Removed PIP_CACHE_DIR=/cache/pip, pip will respect XDG_CACHE_HOME. You don't need to make the cache dir upfront either, it'll do that when it's missing.
  • Switched XDG_CACHE_HOME from ENV to ARG, it's only needed for the cache mount during build, afterwards it makes more sense to change the location. I previously highlighted ENV => ARG in prior review feedback.
  • Removed WORKDIR, no apparent need for it. It was used for avoiding more explicit/absolute paths with COPY and calling main.py.
  • I am inclined to drop the ENV for COMFYUI_ADDRESS + COMFYUI_PORT as per prior review feedback, there's little value in it. You shouldn't be adding them to the image by default for "potential" specific requirements... it is far better to have feature requests for those so they're known instead of unknown and can be documented accordingly (even if that's just via git blame history tying a PR to an issue).

Could additionally:

Comment on lines +14 to +21
volumes:
- "./models:/app/models"
- "./input:/app/input"
- "./temp:/app/output/temp"
- "./output:/app/output"
- "./user:/app/user"
- "./custom_venv:/app/custom_venv"
- "./custom_nodes:/app/custom_nodes"


It'd be ideal to minimize these volume mounts, one for the ComfyUI state itself, and probably best to keep another separate for the custom venv.


We can use ComfyUI's own support for these various locations, but I don't think base_path + is_default is sufficient alone? (ComfyUI Manager has some support of its own)

comfyui:
    base_path: /home/comfyui/data
    is_default: true
    input: input/
    user: user/
    checkpoints: models/checkpoints/
    clip: models/clip/
    clip_vision: models/clip_vision/
    configs: models/configs/
    controlnet: models/controlnet/
    diffusion_models: |
                 models/diffusion_models
                 models/unet
    embeddings: models/embeddings/
    loras: models/loras/
    upscale_models: models/upscale_models/
    vae: models/vae/

Even with that, ComfyUI refuses to start if it can't create its own default directories like /user (crash) or /input (error). I was curious about running the app install with it left as root owned, despite the switch to a non-root user. Technically a bug on ComfyUI's end there I think, when new default paths are configured.

Resolved by creating those missing directories in advance.

is_default: true support arrived with the 0.2.3 release (Oct 2024).


UPDATE: The CLI arg --base-directory (is this really not documented anywhere else?) can be used instead of this verbose YAML config.

  • It works better than the base setting above since all paths will actually default to that location (the models/ subpath folders retain that prefix due to folder_paths.py).
  • --base-directory is fairly new (v0.3.13, released Jan 30 2025).
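For example, the start command could then be as simple as this (the path follows the entrypoint sketch above and is otherwise illustrative):

# all state directories (models, input, output, user, ...) rooted under one mount point
python -u main.py --listen 0.0.0.0 --port 8188 --base-directory /home/comfyui/data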

NOTE: ComfyUI Manager 3.0 will leverage the ComfyUI user/ directory for config and generated files at <USER_DIRECTORY>/default/ComfyUI-Manager/.

There's also a snapshot feature, but it states that it's reliant on git for full support.

There's some overlap here with comfy CLI (which presumably an official wrapper to ComfyUI Manager and maybe cm-cli?) 🤔


Another concern with the custom venv syncing is when the image upgrades to a new version of Python, it'll sync over the new site packages folder with PyTorch deps, but you would not have the equivalent for anything third-party, so caution would be needed when upgrading?

Comment on lines +12 to +13
ports:
- "8188:8188"


This makes it more explicit that it's only accessible from the host itself. The default is 0.0.0.0, which for less experienced users will publish the port to public interfaces, which firewalls like UFW last I checked will not prevent (since Docker manages iptables directly and its rules have precedence). When firewalld is used instead, this scopes to the docker zone IIRC so it differs a bit.

Suggested change
ports:
- "8188:8188"
ports:
- "127.0.0.1:8188:8188"

Comment on lines +5 to +11
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]


Just for reference, is it assumed the user has done any required setup to support this?

It's simplified for you if you're using Docker Desktop and Nvidia, but otherwise on Linux IIRC there's a little bit of extra setup.

CDI is where OCI spec is adopting this config more broadly, although Intel and AMD haven't quite got there at creating the necessary host files. Nvidia supports it well IIRC but might still require a bit more work than the snippet chosen here at present.

Intel/AMD is a bit more involved IIRC. Although you're building ROCm images, without guidance here users will likely report issues to project maintainers about this. You'd want to either clarify the lack of support for those image variants or provide guidance on how to configure that.
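For Nvidia on Linux the usual prerequisite is the NVIDIA Container Toolkit; roughly like this (a sketch, exact steps depend on the distro and runtime):

# register the toolkit with Docker after installing it on the host
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# sanity check that the GPU is visible inside a container (image tag may vary)
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
# CDI alternative (Podman, or Docker with CDI enabled)
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml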


Regarding AMD/ROCm, I have seen this issue where the user reports a compose config with related config to support that. From that and others I've seen in the past, the required ENV varies by GPU, so generic guidance for AMD hardware is probably a bit difficult to offer, I guess it depends how much of that is specific to a container deployment when it comes to the burden of support from maintainers.

Probably will improve once AMD has better support for CDI 🤔 (I think that can remove the need for specific ENV for example as relevant config is all generated on the host, only needing to provide a single line to the container runtime to reference it in a standardized manner)

Comment on lines +3 to +4
user: "1000:1000"
build: .


user shouldn't be necessary. It doesn't make much sense with your current iteration vs just using USER in the Dockerfile to set that, since you're tying it to build-time anyway.

build is fine, but you could add image to refer to the nvidia image that would be published. That'll be grabbed instead of the build then. You may want to add the related build-arg if you think it's worth including build in the compose.yaml, but there's not much advantage to that if the user doesn't intend to build regularly vs just running the equivalent CLI command.

NOTE: Mixing image + build would have any build triggered replace the image locally which may be a subtle source of bug reports (similar to relying on latest tags that become stale over time without an explicit pull and are prone to breaking changes).

FWIW: Without image, Compose will build the image and tag it based on the project + service name (so ComfyUI/comfyui:latest). Generally though I think a user interested in this compose file would prefer the registry image over a local build.

Comment on lines +90 to +101
- name: Build and push
uses: docker/build-push-action@v5
with:
context: .
push: ${{ github.event_name != 'pull_request' }}
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
build-args: |
PYTORCH_INSTALL_ARGS=${{ matrix.pytorch_install_args }}
EXTRA_ARGS=${{ matrix.extra_args }}
cache-from: type=gha,scope=${{ github.ref_name }}-${{ matrix.id }}
cache-to: type=gha,mode=max,scope=${{ github.ref_name }}-${{ matrix.id }}


Better to indicate the source of what implicit labels are being configured?:

Suggested change
- name: Build and push
uses: docker/build-push-action@v5
with:
context: .
push: ${{ github.event_name != 'pull_request' }}
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
build-args: |
PYTORCH_INSTALL_ARGS=${{ matrix.pytorch_install_args }}
EXTRA_ARGS=${{ matrix.extra_args }}
cache-from: type=gha,scope=${{ github.ref_name }}-${{ matrix.id }}
cache-to: type=gha,mode=max,scope=${{ github.ref_name }}-${{ matrix.id }}
- name: Build and push
uses: docker/build-push-action@v5
with:
context: .
push: ${{ github.event_name != 'pull_request' }}
tags: ${{ steps.meta.outputs.tags }}
# Defaults:
# https://github.com/docker/metadata-action/blob/8e1d5461f02b7886d3c1a774bfbd873650445aa2/src/meta.ts#L509-L522
labels: ${{ steps.meta.outputs.labels }}
build-args: |
VARIANT=${{ matrix.variant }}
cache-from: type=gha,scope=${{ github.ref_name }}-${{ matrix.id }}
cache-to: type=gha,mode=max,scope=${{ github.ref_name }}-${{ matrix.id }}

Cache wise I can't comment atm as I'm not that familiar with the type=gha one yet. My Dockerfile suggestions should ensure layers could be shared up until the PyTorch package install, but I don't know if there's much size in that to be worthwhile trying to share across variants cache 😅 (I wouldn't want to complicate maintenance further)


Reference

These are the defaults configured, mostly sourced from repository metadata API (docker/metadata-action acquires it here, and this is the equivalent response for this repo):

const res: Array<string> = [
  `org.opencontainers.image.title=${this.repo.name || ''}`,
  `org.opencontainers.image.description=${this.repo.description || ''}`,
  `org.opencontainers.image.url=${this.repo.html_url || ''}`,
  `org.opencontainers.image.source=${this.repo.html_url || ''}`,
  `org.opencontainers.image.version=${this.version.main || ''}`,
  `org.opencontainers.image.created=${this.date.toISOString()}`,
  `org.opencontainers.image.revision=${this.context.sha || ''}`,
  `org.opencontainers.image.licenses=${this.repo.license?.spdx_id || ''}`
];
  • this.version.main is set to the parsed version. For type=edge that'll be the branch name, and type=semver that'd be the release tag (with v prefix stripped), along with any prefix/suffix. So overriding that may be better if it's only meant to represent ComfyUI release itself not a variant like the tag.
  • The created label will default to the workflow run timestamp (ISO 8601 timestamp), but the action itself provides {{ date }} and {{ commit_date }} expressions. Otherwise you'd manually do such like I've done in the past:
steps:
  - name: "Get current timestamp (ISO 8601)"
    id: build-info
    run: echo "date=$(date --utc --iso-8601=seconds)" >> "${GITHUB_OUTPUT}"

  # ...
  # docker/metadata-action step:
          labels: |
            org.opencontainers.image.created=${{ steps.build-info.outputs.date }}

Rough equivalent defaults as explicit labels

So roughly explicit definitions of these using context expressions (this is roughed out and not verified, github.event.repository should mostly be reliable across workflow types IIRC 😅):

labels: |
  org.opencontainers.image.title=${{ github.event.repository.name }}
  org.opencontainers.image.description=${{ github.event.repository.description }}
  org.opencontainers.image.url=${{ github.event.repository.html_url }}
  org.opencontainers.image.source=${{ github.event.repository.html_url }}
  org.opencontainers.image.version=${{ github.ref_type == 'tag' && github.ref_name || 'edge' }}
  org.opencontainers.image.created={{ date }}
  org.opencontainers.image.revision=${{ github.sha }}
  org.opencontainers.image.licenses=${{ github.event.repository.license?.spdx_id }}
  • revision is using the github.sha context, which IIRC is tied to the commit SHA triggering the workflow event; that might not always align with the actual commit being built from the git clone, so care should be taken there 😅
  • created is relying on the {{ date }} placeholder from the action. {{ commit_date }} may be more relevant if it should reflect the date of the source (to align with the revision?). It depends whether it's more useful to know what time the container itself was built; the OCI spec states it should reflect the date the image was built, similar to some other fields regarding documentation and URLs.
  • version applies my earlier suggestion to use the tag when present, or edge as a fallback (instead of the branch name itself). This excludes any variant prefix or suffix that is used in the actual image tags.
  • licenses may not work; I'm not sure if ?. is supported in workflow expressions
  • No || '' fallback needed, as a null (missing) value should already be treated as ''.

.dockerignore Outdated
Comment on lines 1 to 8
.git
__pycache__/
*.py[cod]
input
models
notebooks
output


I think that every Docker user knows that containers do not persist data unless instructed otherwise, thus updates should be handled in other ways (e.g. Docker tags)

Just want to point out that the VOLUME instruction in an image will implicitly create an anonymous volume. Docker doesn't presently have a way to opt out of that like Podman does.

It can be problematic with Docker Compose (where the intent is to persist that volume across image updates, and it does so by associating the volume with the compose service name rather than the image itself), hence an entirely different image with the same VOLUME path will get that same mount. That can be unexpected, e.g. with a service named db, when switching from MongoDB to Redis, both of which use /data internally.

It would also not be wise to use that for data that is necessary for the image to function (such as the venv): if it were on the same internal path that gets updated with image updates, for example, you'd implicitly get failures once the old venv in your VOLUME is no longer compatible.
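
For comparison, here is a minimal compose sketch (the image name and path are hypothetical, purely for illustration) that declares an explicit named volume rather than relying on an implicit VOLUME baked into the image:

services:
  comfyui:
    # Hypothetical image name, only for illustration:
    image: ghcr.io/comfyanonymous/comfyui:latest
    volumes:
      # Explicit named volume: the persistence intent is visible in the compose file,
      # instead of an anonymous volume implicitly created by a VOLUME instruction.
      - models:/app/models

volumes:
  models: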


Generally you make an immutable image and have volumes to persist state. That works well for a typical deployment.

Then there is the alternative, where you have a project that's, say, NodeJS or Python based. Instead of building a custom image with everything baked in, you just use the official image (plus any extra system packages it may require), and you run that with a volume mount to install and persist your project deps. No git clone or anything within the image (a rough compose sketch of this pattern follows below).

I've done the latter before for projects serving thousands of users from a single server. The image just provides the environment, the project is version controlled with git, and there's a bit of manual work (or script automation) to push out an update. Specialized images are kinda redundant/wasteful for that purpose.
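
A minimal sketch of that pattern, assuming ComfyUI's default port 8188 and its --listen flag (untested, paths are placeholders):

services:
  comfyui:
    # The official Python image only provides the runtime environment;
    # the project itself lives in a git checkout on the host.
    image: python:3.11-slim-bookworm
    working_dir: /workspace/ComfyUI
    command: bash -c "pip install -r requirements.txt && python main.py --listen 0.0.0.0"
    ports:
      - "8188:8188"
    volumes:
      - ./ComfyUI:/workspace/ComfyUI   # git-managed source, updated via git pull
      - pip-cache:/root/.cache/pip     # persist downloaded wheels between restarts

volumes:
  pip-cache:

In practice you'd likely keep the venv inside a mounted directory too, so the installed deps persist across container recreation rather than only the wheel cache.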

This type of approach seems to align with what is being done here, with the reliance upon third-party customization / deps. The volume mount isn't just application state; it's messing with deps, and that's going to have some interplay with the Python and PyTorch versions should you use the image to upgrade.

What @TomK32 describes would be a very light image that doesn't have much point in being published or upgraded after install? That's easier to maintain and would effectively be a documented / committed Dockerfile that you build locally instead.

Publishing an image with the different permutations (CPU + GPU toolkits) and image updates with git source baked in makes sense when there's less runtime modification going on. It's meant to be convenient/useful, but when you're no longer persisting state there's going to be more risk of breakage should you expect to rely on updates via image upgrades 🤷‍♂️ (which from the sounds of it will be a concern when adding stuff like ComfyUI Manager)

ComfyUI Manager could instead be bundled if it's that popular, but that doesn't quite solve the issue, given how syncing a venv would be done, nor the fact that this addition still installs more deps. With the Comfy CLI tool, you could at least extend this image as a base and install deps in a deterministic manner. You could then just rebuild that to avoid all these problems. A little inconvenient for users that install via the UI, if there's no way to export a list (ideally one would be managed/updated as a file without user interaction required).

Comment on lines +1 to +8
.*
!.git
__pycache__/
*.py[cod]
input
models
notebooks
output


Prior review feedback of mine to consider.

This ignores the standard folders input, models, and output, preventing them from being copied over. models has one folder with some actual content in it (last updated 2 years ago, so I assume it's not important?). Other than that, the folders are expected to exist at run time, or ComfyUI attempts to create them (and would fail if it lacked the correct permissions).

EDIT: Non-issue, as it seems better to provide the --base-directory arg, like I've mentioned in my review feedback for the compose.yaml volumes config.
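
For illustration, what that might look like in a compose sketch (the image name is hypothetical, and it assumes the image's entrypoint forwards extra args to main.py):

services:
  comfyui:
    image: ghcr.io/comfyanonymous/comfyui:latest   # hypothetical image name
    command: ["--base-directory", "/data"]
    volumes:
      # input/, models/, output/ etc. are created under the mounted base directory,
      # so nothing needs to be baked into (or written inside) the image itself:
      - ./data:/data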

Comment on lines +13 to +14
# Fail fast on errors or unset variables
SHELL ["/bin/bash", "-eux", "-o", "pipefail", "-c"]

@polarathene commented Feb 26, 2025


While it won't affect this image build, should anyone extend it with ComfyUI Manager, note that presently cm-cli would not reliably fail early.

UPDATE:

@@ -0,0 +1,101 @@
name: Build and publish Docker images


Proposed workflow change (rough example; I've run out of time for now to refine and test):

name: 'Build and Publish Docker Images'

on:
  workflow_dispatch:
  push:
    branches:
      - master
    tags:
      - 'v*.*.*'

permissions:
  contents: read
  packages: write

jobs:
  docker:
    name: 'Deploy Docker Images'
    uses: comfyanonymous/ComfyUI/.github/workflows/generic_publish.yml@master
    with:
      image-variant: CUDA
      project-name: comfyui
      revision: ${{ github.sha }}
      source: https://github.com/${{ github.repository }}
      version: ${{ github.ref_type == 'tag' && github.ref_name || 'edge' }}
    secrets: inherit

generic_publish.yml:

name: 'Build and Publish the Docker Image'

on:
  workflow_call:
    inputs:
      image-variant:
        required: false
        type: string
      project-name:
        required: true
        type: string
      revision:
        required: true
        type: string
      source:
        required: true
        type: string
      version:
        required: true
        type: string

permissions:
  contents: read
  packages: write

jobs:
  publish-images:
    name: 'Build and Publish'
    runs-on: ubuntu-latest
    steps:
      - name: 'Checkout'
        uses: actions/checkout@v4

      - name: 'Find the latest tag (sorted via SemVer)'
        id: tag
        env:
          REPO_URL: ${{ github.event.repository.html_url }}
          TAG_FILTER: '^refs/tags/v[0-9]+\.[0-9]+\.[0-9]+$'
        run: |
          TAG=$(
            git ls-remote --exit-code --refs --tags --sort='-v:refname' "${REPO_URL}" \
            | awk '{print $2}' | grep -E "${TAG_FILTER}" | head -n 1
          )

          # Expose the matched tag (refs/tags/ prefix stripped) as a step output:
          echo "tag=${TAG#refs/tags/}" >> "${GITHUB_OUTPUT}"

      # This step will support building non-native platform archs via emulation (slow)
      # Skip this when you can compile for that platform natively, or cross-compile (Zig, Go)
      - name: 'Set up QEMU'
        uses: docker/setup-qemu-action@v3
        with:
          platforms: arm64

      - name: 'Set up Docker Buildx'
        uses: docker/setup-buildx-action@v3

      # The metadata action appends labels and sets the image names + tags to publish.
      # Tags for the image to be published, based on the following rules:
      # https://github.com/docker/metadata-action?tab=readme-ov-file#typeedge
      # https://github.com/docker/metadata-action?tab=readme-ov-file#typesemver
      # Updates to master branch will publish an image with the `:edge` tag (useful between official releases)
      # Pushing a git tag will publish an image with all 3 semver tags
      #
      # NOTE: Publishing the semver tags like this is useful for software that monitors an image
      # for updates only by a digest change to the image tag at the registry, like WatchTower does:
      # https://github.com/containrrr/watchtower/issues/1802
      #
      # NOTE: By default the `:latest` image tag will be updated for the `semver` type below,
      # you may want to have more control over that if your release policy could publish a patch
      # release to an older major/minor release (as this would unintentionally update the latest tag too):
      # https://github.com/docker/metadata-action?tab=readme-ov-file#latest-tag
      - name: 'Prepare image tags'
        id: docker-metadata
        uses: docker/metadata-action@v5
        with:
          images: |
            docker.io/comfyanonymous/${{ inputs.project-name }}
            ghcr.io/comfyanonymous/${{ inputs.project-name }}
          tags: |
            type=edge,branch=master
            # Remove the enable condition once v0 releases are no longer published:
            # https://github.com/docker/metadata-action/tree/master#major-version-zero
            type=semver,pattern={{major}},enable=${{ !startsWith(github.ref, 'refs/tags/v0.') }}
            type=semver,pattern={{major}}.{{minor}}
            type=semver,pattern={{major}}.{{minor}}.{{patch}}
          # https://specs.opencontainers.org/image-spec/annotations/#pre-defined-annotation-keys
          labels: |
            org.opencontainers.image.title=ComfyUI for ${{ inputs.image-variant }}
            org.opencontainers.image.description=${{ github.event.repository.description }}
            # Project + organization info:
            org.opencontainers.image.vendor=Comfy Org
            org.opencontainers.image.url=https://www.comfy.org/
            org.opencontainers.image.documentation=https://docs.comfy.org/
            org.opencontainers.image.authors=https://www.comfy.org/about
            org.opencontainers.image.licenses=GPL-3.0
            # Build info:
            org.opencontainers.image.source=${{ github.event.repository.html_url }}
            org.opencontainers.image.revision=${{ github.sha }}
            org.opencontainers.image.created={{ date }}
            org.opencontainers.image.version=${{ github.ref_type == 'tag' && github.ref_name || 'edge' }}

      # Publish to image registries (GHCR and DockerHub):
      - name: 'Login to GitHub Container Registry'
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      # Requires a maintainer to configure credentials as secrets, unlike GHCR:
      - name: 'Login to DockerHub'
        uses: docker/login-action@v3
        with:
          registry: docker.io
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}


      # Build the image from a Dockerfile for the requested platforms,
      # then publish a multi-platform image at the registries with the tags and labels declared above
      - name: 'Build and publish images'
        uses: docker/build-push-action@v6
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          push: true
          labels: ${{ steps.docker-metadata.outputs.labels }}
          tags: ${{ steps.docker-metadata.outputs.tags }}
          # annotations: ${{ steps.docker-metadata.outputs.annotations }}
          # Customize any Dockerfile ARG values here:
          build-args: |
            VARIANT=${{ inputs.image-variant }}
          # Prefer OCI media types and OCI artifacts (can remove this line once these become default)
          outputs: type=image,oci-mediatypes=true,oci-artifact=true
          # Disable provenance attestation (avoid publishing `unknown/unknown` platforms):
          # https://docs.docker.com/build/attestations/slsa-provenance/
          # https://github.com/docker-mailserver/docker-mailserver/issues/3582#issuecomment-2716365953
          provenance: false

The first action snippet would need to be adapted to use a matrix config like you've already got here. My WIP iteration at the time was a little different:

jobs:
  docker:
    strategy:
      # https://docs.github.com/en/actions/writing-workflows/workflow-syntax-for-github-actions#jobsjob_idstrategyfail-fast
      fail-fast: false
      matrix:
        # compute_platform: [CPU, CUDA, ROCm]
        include:
          - compute-platform: CPU
            pytorch-index: cpu
          - compute-platform: CUDA
            pytorch-index: cu124
            version: 12.4
          - compute-platform: ROCm
            pytorch-index: rocm6.2
            version: '6.2'
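
Hypothetically, feeding that matrix into the reusable workflow from the first snippet could look roughly like this (untested, abbreviated to two variants; generic_publish.yml would also need an input or build-arg for the PyTorch index, which I've omitted here):

jobs:
  docker:
    strategy:
      fail-fast: false
      matrix:
        include:
          - compute-platform: CPU
            pytorch-index: cpu
          - compute-platform: CUDA
            pytorch-index: cu124
    name: 'Deploy Docker Images (${{ matrix.compute-platform }})'
    uses: comfyanonymous/ComfyUI/.github/workflows/generic_publish.yml@master
    with:
      image-variant: ${{ matrix.compute-platform }}
      project-name: comfyui
      revision: ${{ github.sha }}
      source: https://github.com/${{ github.repository }}
      version: ${{ github.ref_type == 'tag' && github.ref_name || 'edge' }}
    secrets: inherit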

@oxc
Author

oxc commented Jun 19, 2025

I currently don't have the resources to keep running ComfyUI and to keep working on this. Is anyone interested in taking this over? @polarathene, you have put a lot of thought and work into your reviews, would you be willing to continue on this?

Any feedback from an official maintainer would also be welcome, to gauge whether this has generally any chance of being merged.

@polarathene

I currently don't have the resources to keep running ComfyUI and to keep working on this.

Did you move on to an alternative project?

Is anyone interested in taking this over? @polarathene, you have put a lot of thought and work into your reviews, would you be willing to continue on this?

Yes I will contribute a PR with my own feedback applied when time permits.

Any feedback from an official maintainer would also be welcome, to gauge whether this has generally any chance of being merged.

I have a small PR open to fix a minor bug where the comfy init code panics if a custom nodes directory doesn't exist, yet that's received no feedback. I am doubtful of a PR for this going through review any time soon unless there is sufficient demand to warrant the maintainers engaging.

We can remain hopeful I guess, but I think for now the maintainers have higher priorities for where their time is spent.

@oxc
Author

oxc commented Jun 19, 2025

I currently don't have the resources to keep running ComfyUI and to keep working on this.

Did you move on to an alternative project?

I have stopped playing with local image generators, especially because my hardware was always a bit underpowered. More importantly though, work is keeping me busy on several fronts.

@polarathene

No worries, I understand. I don't have much time to spend on projects like these myself, but I like to try to stay informed of where AI tooling is going.

I have very little disk space for these kinds of images, so I'm slowly learning more about building images for CUDA and trying to get more confident about the compatibility/performance concerns. These are a fair bit more daunting than my typical Dockerfiles, which don't have the GPU compatibility concerns 😅 (I've since learned that it's not just the CUDA version that's relevant).

I'll comment here again with a link to a new PR once I've managed to tackle that, might be sometime next month.

In the meantime, perhaps leave this PR open for visibility? It can then be closed once my own PR is up. If someone else wants to tackle a new PR, by all means chime in here; otherwise I'll get to it when I can.

@TheRealPSV

If anyone finds it helpful, I have a small repo here doing automatic builds of Docker images for ComfyUI: https://github.com/TheRealPSV/comfyui-docker
