
Add Argmax DiffusionKit Snippet #869


Merged · 8 commits into huggingface:main · Aug 30, 2024

Conversation

ardaatahan (Contributor)

This PR:

  1. Adds a new diffusionkit snippet in model-libraries-snippets.ts.
  2. Adds the diffusionkit library, with the necessary information, to MODEL_LIBRARIES_UI_ELEMENTS in model-libraries.ts.
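
Roughly, the two additions have the following shape. This is an illustrative sketch only: the ModelData parameter, the import path, and the snippets field name are assumed from neighboring entries in these files rather than copied from the final diff.

import type { ModelData } from "./model-data"; // hypothetical import path, for illustration only

// 1) model-libraries-snippets.ts: one exported function per library, returning one
//    string per copyable snippet block shown on the model page.
export const diffusionkit = (model: ModelData): string[] => [
	`# placeholder snippet for ${model.id}; the real snippets are worked out in the review below`,
];

// 2) model-libraries.ts: the entry keyed by the lowercase library tag "diffusionkit".
//    Shown as a standalone object here; in the real file it sits inside MODEL_LIBRARIES_UI_ELEMENTS.
const diffusionkitEntry = {
	prettyLabel: "DiffusionKit",
	repoName: "DiffusionKit",
	repoUrl: "https://github.com/argmaxinc/DiffusionKit",
	snippets: diffusionkit, // assumed field name, mirroring other library entries
};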

@Wauplin (Contributor) left a comment

Looking great, thanks @ardaatahan! I've added some comments to improve the integration and then we should be good. Don't forget to tag models on the Hub as diffusionkit (we usually prefer to have a few examples before merging 😉)

@@ -905,4 +905,33 @@ whisperkit-cli transcribe --audio-path /path/to/audio.mp3
# Or use your preferred model variant
whisperkit-cli transcribe --model "large-v3" --model-prefix "distil" --audio-path /path/to/audio.mp3 --verbose`,
];

export const diffusionkit = (): string[] => [
Contributor

Can you move this section up in the file to preserve alphabetical order as much as possible? (just after diffusers for instance)

prettyLabel: "DiffusionKit",
repoName: "DiffusionKit",
repoUrl: "https://github.com/argmaxinc/DiffusionKit",
docsUrl: "https://github.com/argmaxinc/DiffusionKit?tab=readme-ov-file#-image-generation-with-python-mlx",
Contributor

Suggested change
docsUrl: "https://github.com/argmaxinc/DiffusionKit?tab=readme-ov-file#-image-generation-with-python-mlx",

No need to provide a docsUrl when it's the same as repoUrl

@@ -174,6 +174,13 @@ export const MODEL_LIBRARIES_UI_ELEMENTS = {
filter: true,
/// diffusers has its own more complex "countDownloads" query
},
diffusionkit: {
Contributor

Would be nice to have some models listed in https://huggingface.co/models?other=diffusionkit before merging this PR. You can do that by tagging models as diffusionkit in the model card metadata.

I opened a first PR to do that: https://huggingface.co/argmaxinc/mlx-FLUX.1-schnell/discussions/2

Contributor

Merged 👍

Contributor

I noticed that the SD3 models have the DiffusionKit tag: https://huggingface.co/models?other=DiffusionKit. The correct tag should be diffusionkit (lowercased). Make sure to update them to benefit from this PR.

(the correct key is the key defined here, so diffusionkit in this case. Below we are defining prettyLabel and repoName, but those are just for aesthetics in the UI)

Contributor

We generated these models with diffusionkit well before we had an MLX implementation, and in fact we don't yet have a public CoreML inference implementation of DiffusionKit (it's in the works). These models are actually more directly usable by the Hugging Face swift-coreml-diffusers app, although they were generated with diffusionkit. So in this case we might want to just remove the lowercase diffusionkit tag? @pcuenca what do you think about a local-app deeplink for Diffusers.app? We had discussed adding arbitrary repos in the past.

Contributor

Updated our tags and model cards: https://huggingface.co/models?other=diffusionkit

Comment on lines 910 to 922
`# Install CLI with pip
pip install diffusionkit-cli

# View all available options
diffusionkit-cli --help

# Generate image using default FLUX.1-schnell and save it to out.png
diffusionkit-cli --prompt "a beautiful sunset over the ocean"

# To use Stable Diffusion 3 accept the terms before downloading the checkpoint: https://huggingface.co/stabilityai/stable-diffusion-3-medium
# Once you accept the terms, sign in with your Hugging Face hub token with read access to contents of all public gated repos you can access:
huggingface-cli login --token YOUR_HF_HUB_TOKEN

Contributor

The "code snippet" part is generally a very short snippet to help users get started. It should not contain installation or configuration steps and should be focused on "how to use this model with diffusionkit". Code snippets are shown on each model page and are customized to be ready to use.

Suggested change
`# Install CLI with pip
pip install diffusionkit-cli
# View all available options
diffusionkit-cli --help
# Generate image using default FLUX.1-schnell and save it to out.png
diffusionkit-cli --prompt "a beautiful sunset over the ocean"
# To use Stable Diffusion 3 accept the terms before downloading the checkpoint: https://huggingface.co/stabilityai/stable-diffusion-3-medium
# Once you accept the terms, sign in with your Hugging Face hub token with read access to contents of all public gated repos you can access:
huggingface-cli login --token YOUR_HF_HUB_TOKEN
`

Member

Agree with @Wauplin. In addition, I'd recommend that different examples or use cases are returned as multiple strings, like in the audioseal example. This way snippets are separated visually and can be copied independently:

[Screenshot (2024-08-27): multiple snippets rendered as separate, independently copyable blocks]
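
To make the pattern concrete, here is a rough sketch (not the merged code) that splits the CLI commands already suggested above into separate strings; the (model: ModelData) parameter is assumed, and each array element is rendered as its own copyable block:

export const diffusionkit = (model: ModelData): string[] => [
	// First block: minimal generation command for the current model.
	`# Generate an image and save it to out.png
diffusionkit-cli --prompt "a futuristic cityscape" --model-version ${model.id}`,

	// Second block: same command with explicit dimensions, seed, steps and output path.
	`# Specify dimensions, seed, number of steps and destination file
diffusionkit-cli --prompt "a futuristic cityscape" --model-version ${model.id} \\
  --height 768 --width 1360 --seed 1001 --step 4 --output ~/Desktop/out.png`,
];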

Contributor

Even better yes!

Contributor

Great example, was curious how to do this!

Comment on lines 923 to 924
# Use specific model and set custom output path
diffusionkit-cli --prompt "a futuristic cityscape" --model-version stable-diffusion-3-medium --output-path /path/to/output.png
Contributor

Suggested change
# Use specific model and set custom output path
diffusionkit-cli --prompt "a futuristic cityscape" --model-version stable-diffusion-3-medium --output-path /path/to/output.png
# Generate an image and save it to out.png
diffusionkit-cli --prompt "a futuristic cityscape" --model-version ${model.id}

Comment on lines 926 to 935
# Set seed for reproducibility, specify number of steps, and set custom output image dimensions
diffusionkit-cli --prompt "detailed cinematic dof render of a \
detailed MacBook Pro on a wooden desk in a dim room with items \
around, messy dirty room. On the screen are the letters 'FLUX on \
DiffusionKit' glowing softly. High detail hard surface render" \
--height 768 \
--width 1360 \
--seed 1001 \
--step 4 \
--output ~/Desktop/flux_on_mac.png`,
Contributor

Suggested change
# Set seed for reproducibility, specify number of steps, and set custom output image dimensions
diffusionkit-cli --prompt "detailed cinematic dof render of a \
detailed MacBook Pro on a wooden desk in a dim room with items \
around, messy dirty room. On the screen are the letters 'FLUX on \
DiffusionKit' glowing softly. High detail hard surface render" \
--height 768 \
--width 1360 \
--seed 1001 \
--step 4 \
--output ~/Desktop/flux_on_mac.png`,
# Specify dimensions, seed, number of steps and destination file
diffusionkit-cli \
--prompt "detailed cinematic dof render of a \
detailed MacBook Pro on a wooden desk in a dim room with items \
around, messy dirty room. On the screen are the letters 'FLUX on \
DiffusionKit' glowing softly. High detail hard surface render" \
--model-version ${model.id} \
--height 768 \
--width 1360 \
--seed 1001 \
--step 4 \
--output ~/Desktop/flux_on_mac.png`,

Add --model-version ${model.id} + some nits

@Vaibhavs10 (Member) left a comment

Hey @ardaatahan - thanks for the PR. 🤗

Quick thought: for the libraries PR, let's use the actual modeling code snippets, i.e. https://github.com/argmaxinc/DiffusionKit?tab=readme-ov-file#code

This also makes it easy for people to quickly take a snippet and run it (similar to how we show snippets for transformers).

Then, for the CLI, we can do a separate LocalApps integration, similar to how we currently show drawthings and other text-to-image apps?

I think this would give you the best distribution overall, since you'd promote your library both as a modeling/conversion library and as an app.

cc: @atiorh @enzostvs @pcuenca @Wauplin

Thoughts?

If you think this is a good idea, then the next steps would be to:

  1. Replace the CLI code snippet here with Python inference snippets (a rough sketch follows after this list).
  2. Open a new PR similar to the drawthings integration linked above.
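
For reference, a rough sketch of what such a Python inference snippet could look like, wrapped in a template literal the way the snippets file does. The pipeline arguments mirror fragments quoted later in this thread; the import path, the generate_image call, the 512x512 size, and the prompt text are assumptions taken from the DiffusionKit README rather than from the merged code:

const sd3Snippet = `# Stable Diffusion 3 image generation with DiffusionKit (MLX)
from diffusionkit.mlx import DiffusionPipeline

pipeline = DiffusionPipeline(
    shift=3.0,
    use_t5=False,
    model_version="argmaxinc/mlx-stable-diffusion-3-medium",
)

HEIGHT = 512
WIDTH = 512
NUM_STEPS = 50   # 4 for FLUX.1-schnell, 50 for SD3
CFG_WEIGHT = 5.0 # 0. for FLUX.1-schnell, 5. for SD3

image, _ = pipeline.generate_image(
    "a photo of a cat",
    cfg_weight=CFG_WEIGHT,
    num_steps=NUM_STEPS,
    latent_size=(HEIGHT // 8, WIDTH // 8),
)`;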

@ZachNagengast (Contributor)

Agree with everything here, will adjust based on review.

@Vaibhavs10 thanks for your comment; also curious about thoughts from the rest of the team. I think it's a great suggestion. One caveat is that the MLX implementation of diffusionkit is not really a fully fledged app, but more of a CLI utility into the Python library (similar to how whisperkit-cli provides an interface into the Swift library). On the other hand, it will work with the source repos of some very specific text-to-image models that we support so far (FLUX and SD3), since we do the model dict adjustments inside the library, so we will investigate how to filter that down, similar to drawthings.

@Vaibhavs10 (Member) left a comment

> app, but more of a CLI utility into the Python library (similar to how whisperkit-cli provides an interface into the Swift library). On the other hand, it will work with the source repos of some very specific text-to-image models that we support so far (FLUX and SD3), since we do the model dict adjustments inside the library, so we will investigate how to filter that down, similar to drawthings.

Ah, makes sense. The current structure works for me, with actual model inference snippets over the CLI. We can potentially discuss a CLI LocalApps integration a bit later, once DiffusionKit matures a bit more.

Reviewing the PR real quick.

@Vaibhavs10 (Member) left a comment

LGTM! Much cleaner! 🤗

Note: There's a little conflict, make sure to rebase!


pipeline = DiffusionPipeline(
    shift=3.0,
    use_t5=False,
    model_version="argmaxinc/mlx-stable-diffusion-3-medium",
Member

So is it not possible to make it model-specific at this time, as @Wauplin mentioned?

Contributor

Agree that it would be best to showcase only the relevant code snippet here. How does the CLI determine which pipeline to instantiate? I would reuse the same logic here. Otherwise, it's also possible to tag the models on the Hub as "flux" or "stable-diffusion" (using the tags list in the model card metadata) and use this information to generate the snippets.

Contributor Author

Thank you for your feedback! Both of your proposals are excellent suggestions. I will be implementing the latter approach, using tags in the model card metadata to determine which snippets to generate.
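
A minimal sketch of that tag-based selection (placeholder snippet bodies, ModelData parameter assumed; the selection line mirrors the implementation quoted further down in this thread, and assumes FLUX checkpoints carry a "flux" tag in their model card metadata):

const fluxSnippet = `# FLUX.1 pipeline setup (placeholder body)`;
const sd3Snippet = `# Stable Diffusion 3 pipeline setup (placeholder body)`;
const generateSnippet = `# Image generation (placeholder body)`;

export const diffusionkit = (model: ModelData): string[] => {
	// Pick the pipeline snippet based on the model card tags, then append the shared generation snippet.
	const pipelineSnippet = model.tags.includes("flux") ? fluxSnippet : sd3Snippet;
	return [pipelineSnippet, generateSnippet];
};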

latent_size=(HEIGHT // 8, WIDTH // 8),
)`;

return [sd3Snippet, fluxSnippet, generateSnippet];
Member

Since we have multiple snippets, it may be useful to include a comment line at the beginning of each one to explain what they are about.

Contributor Author

Agreed and added, thanks!

@Wauplin (Contributor) left a comment

Hey! Sorry for being late to the party. Let's add it only as a library for now. Note that since we are displaying multiple snippets, it is still possible to have "Python" snippets followed by "CLI" snippets.


@ardaatahan force-pushed the add-diffusionkit-snippet branch from 1c0d5a9 to 6055b69 on August 28, 2024 at 18:03
@ardaatahan (Contributor, Author)

> app, but more of a CLI utility into the Python library (similar to how whisperkit-cli provides an interface into the Swift library). On the other hand, it will work with the source repos of some very specific text-to-image models that we support so far (FLUX and SD3), since we do the model dict adjustments inside the library, so we will investigate how to filter that down, similar to drawthings.

> Ah, makes sense. The current structure works for me, with actual model inference snippets over the CLI. We can potentially discuss a CLI LocalApps integration a bit later, once DiffusionKit matures a bit more.

> Reviewing the PR real quick.

Hey @Vaibhavs10, thank you for your feedback! After your original comment about opening a new PR similar to the drawthings integration, I made corresponding changes to the local-apps file for DiffusionKit, and I'm curious about your thoughts:

+ const snippetDiffusionKit = (model: ModelData): LocalAppSnippet[] => {
+	const command = (binary: string) =>
+		[
+			"# Load and run the model:",
+			`${binary} \\`,
+			`  --model-version ${model.id} \\`,
+			'  --prompt "a futuristic cityscape" \\',
+			`  --height 768 \\`,
+			`  --width 1360 \\`,
+			`  --seed 1001 \\`,
+			`  --step 4 \\`,
+			`  --output ~/Desktop/out.png`,
+		].join("\n");
+	return [
+		{
+			title: "Install from pip",
+			setup: "pip install diffusionkit",
+			content: command("diffusionkit-cli"),
+		},
+		{
+			title: "Build from source code",
+			setup: [
+				// prettier-ignore
+				"git clone https://github.com/argmaxinc/DiffusionKit.git",
+				"cd DiffusionKit",
+				"pip install -e .",
+			].join("\n"),
+			content: command("diffusionkit-cli"),
+		},
+	];
+ };

Inside LOCAL_APPS:

+ diffusionkit: {
+		prettyLabel: "DiffusionKit",
+		docsUrl: "https://github.com/argmaxinc/DiffusionKit",
+		mainTask: "text-to-image",
+		macOSOnly: true,
+		displayOnModelPage: (model) => model.library_name === "diffusionkit" && model.pipeline_tag === "text-to-image",
+		snippet: snippetDiffusionKit,
+ },

Given that these local app changes are already implemented, would you recommend creating a new PR to handle them?

@Wauplin (Contributor) left a comment

Thanks for the changes! :)

Comment on lines 200 to 201
NUM_STEPS = 4 # 4 for FLUX.1-schnell, 50 for SD3
CFG_WEIGHT = 0. # for FLUX.1-schnell, 5. for SD3
Contributor

Suggested change
NUM_STEPS = 4 # 4 for FLUX.1-schnell, 50 for SD3
CFG_WEIGHT = 0. # for FLUX.1-schnell, 5. for SD3
NUM_STEPS = ${model.tags.includes("flux") ? 4 : 50}
CFG_WEIGHT = ${model.tags.includes("flux") ? 0. : 5}

Let's reuse the flux tag to generate the pipeline config as well :)

(disclaimer: not tested)

Contributor Author

Thanks for the suggestion! I just tested and made the changes.

@ardaatahan (Contributor, Author)

@Vaibhavs10 @pcuenca @Wauplin Let me know if there are any other changes needed here, or if it's good to go. Open to any suggestions you think would help users!

@Wauplin (Contributor) left a comment

Everything looks good to me!
@Vaibhavs10 and/or @pcuenca please have a last check as well and we should be good to merge :)

@pcuenca (Member) left a comment

Looks good to me, thanks a lot for iterating here 🙌 Let's wait for @Wauplin's and @Vaibhavs10's thoughts!

Edit: had the review tab open while @Wauplin was submitting his review.


const pipelineSnippet = model.tags.includes("flux") ? fluxSnippet : sd3Snippet;

return [pipelineSnippet, generateSnippet];
Member

We usually show a single snippet per task (in this case it would be pipeline preparation followed by generation), but I'm not opposed to using two to differentiate instantiation from generation.

@pcuenca (Member)

pcuenca commented Aug 30, 2024

Linter fixed, but two tests are failing. Is it OK to merge, @Wauplin @SBrandeis?

@Wauplin (Contributor)

Wauplin commented Aug 30, 2024

Thanks for the linter fix :) I would say yes, since the failing tests are unrelated to this PR (they're already failing on main and on other PRs).

@Wauplin Wauplin merged commit 94cb7fe into huggingface:main Aug 30, 2024
2 of 4 checks passed