
High RAM usage on GPU mode compared to using apple/ml-stable-diffusion CLI tool #69

Open
@svenkevs

Description


I noticed that the diffusers app, while running in GPU mode, uses just over 13GB of RAM while inferring on the non-quantized SDXL 1.0 model. If I use essentially the same settings with Apple's Core ML Stable Diffusion CLI (https://github.com/apple/ml-stable-diffusion) on the same model, my system uses just under 8GB of RAM. The two also produce different pictures. Hardware: Apple Mac mini M2 Pro, 16GB RAM, latest macOS 14 public beta.

swift-coreml-diffusers settings:

Positive prompt: a photo of an astronaut dog on mars
Negative prompt: [empty]
Guidance Scale: 7.5
Step count: 20
Preview count: 25
Random seed: 4184258190
Advanced: GPU
Disable Safety Checker: Selected

Commandline prompt with arguments:
swift run StableDiffusionSample "a photo of an astronaut dog on mars" --compute-units cpuAndGPU --step-count 20 --seed 4184258190 --resource-path <path to model> --xl --disable-safety --output-path <path to image folder>

I'm assuming here that selecting GPU in the app is in fact the same as the CLI's cpuAndGPU (given that the CLI has no GPU-only option). Perhaps the difference lies there? If so, could CPU & GPU mode support be added to the app?
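For reference, here is a rough sketch of how I understand compute units get selected when the pipeline is constructed with Apple's Swift package. This is not the app's actual code; the pipeline type, parameter names, and the `reduceMemory` option are my reading of the apple/ml-stable-diffusion API, and `reduceMemory` in particular (which loads/unloads model stages to cut peak RAM) is one setting that could explain the gap if the app and the CLI set it differently:

```swift
import CoreML
import StableDiffusion

// Path placeholder, same as in the CLI invocation above.
let resourceURL = URL(fileURLWithPath: "<path to model>")

// MLModelConfiguration is where compute units are chosen; this is what
// the CLI's --compute-units cpuAndGPU maps to, as far as I can tell.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndGPU

// Sketch of constructing the SDXL pipeline. If the app omits
// reduceMemory (it defaults to false, I believe), all model stages
// stay resident at once, which would raise peak RAM.
let pipeline = try StableDiffusionXLPipeline(
    resourcesAt: resourceURL,
    configuration: config,
    reduceMemory: true
)
try pipeline.loadResources()
```

Again, this is just a guess at where the difference might come from, not a claim about what either codebase actually does.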

Loading a model in the app for the first time (e.g. after starting the app or after switching models) also takes a lot longer than loading it via the command line. The app's 13GB of RAM use causes a lot of swap-file activity on my 16GB M2 Pro Mac mini, while the CLI tool does not swap at all, which most likely explains that loading-time difference.

Considering model sizes and RAM usage, it almost looks like the app is loading the model twice. That's pure speculation, and I imagine there's plenty of overhead involved. But given that the app itself uses only 40MB of RAM before a model is loaded, the difference with the command-line tool while generating an image is just over 5GB — about the size of the UNet weights.

I haven't tested non-SDXL models yet; I may follow up if I find time for that (at which point I can also compare RAM use on the Neural Engine).

I'm honestly not sure whether this is a bug or simply the result of different settings/features under the hood that I'm not aware of. But it does affect how usable the software is on machines with less RAM.
