
[Core] fuse_qkv_projection() to Flux #9185


Merged
merged 11 commits into main from fuse-flux on Aug 23, 2024

Conversation

sayakpaul
Member

@sayakpaul sayakpaul commented Aug 15, 2024

What does this PR do?

Adds fuse_qkv_projection() support to Flux.

Will report the performance improvements soon.

Batch size 1 (see footnote):

With fusion: 8.456 seconds (memory: 25.25 GB)
Without fusion: 11.492 seconds (memory: 35.455 GB)

As a reminder, refer to https://github.com/huggingface/diffusers/pull/8829/#issuecomment-2236254834 to understand the scope of when fusion is ideal.

Footnote:

This was run on an A100. For quantization, we use "autoquant" from torchao. We are working on a repository to show the full-blown recipes. It will be made public in a day's time.
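
For reference, here is a minimal sketch of how the fusion could be enabled and benchmarked. The model id, prompt, and step count below are illustrative assumptions, and the pipeline-level method in diffusers is `fuse_qkv_projections()`:

```python
# A hedged sketch: enable QKV fusion on Flux and measure latency / peak memory.
# The model id, prompt, and step count are illustrative assumptions.
import time

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Concatenate the separate Q/K/V projection weights into a single linear
# layer so each attention block runs one larger matmul instead of three.
pipe.fuse_qkv_projections()

torch.cuda.reset_peak_memory_stats()
start = time.perf_counter()
image = pipe("a photo of an astronaut riding a horse", num_inference_steps=28).images[0]
torch.cuda.synchronize()
print(f"latency: {time.perf_counter() - start:.3f} s")
print(f"peak memory: {torch.cuda.max_memory_allocated() / 1024 ** 3:.2f} GB")
```

Fusing concatenates the Q, K, and V projection weights so each attention block performs one larger matmul instead of three smaller ones, which is where the latency win comes from.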

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sayakpaul sayakpaul requested a review from DN6 August 16, 2024 07:13
@sayakpaul sayakpaul marked this pull request as ready for review August 16, 2024 07:13
@sayakpaul sayakpaul requested a review from yiyixuxu August 18, 2024 03:06
@yiyixuxu
Collaborator

Awesome, but I think we will have to update once the refactor PR is in, since I combined the attention processors there (#9074).

@sayakpaul
Member Author

100 percent right. I will repurpose once your PR is in :)

@sayakpaul
Member Author

@yiyixuxu could you give this a look? I have adjusted it accordingly with #9074.

Collaborator

@yiyixuxu yiyixuxu left a comment


PR looks good to me.
Can we run an actual test to see the improvement before merging? Feel free to merge once that's done.

@sayakpaul
Member Author

Check the PR description:

Batch size 1 (see footnote):

With fusion: 8.456 seconds (memory: 25.25 GB)
Without fusion: 11.492 seconds (memory: 35.455 GB)

As a reminder, refer to https://github.com/huggingface/diffusers/pull/8829/#issuecomment-2236254834 to understand the scope of when fusion is ideal.

Footnote:

This was run on an A100. For quantization, we use "autoquant" from [torchao](https://github.com/pytorch/ao/). We are working on a repository to show the full-blown recipes. It will be made public in a day's time.

@yiyixuxu
Collaborator

@sayakpaul ahh I missed it! sorry! very nice!

@sayakpaul sayakpaul merged commit 2d9ccf3 into main Aug 23, 2024
18 checks passed
@sayakpaul sayakpaul deleted the fuse-flux branch August 23, 2024 05:24
@ngaloppo

ngaloppo commented Oct 15, 2024

@sayakpaul This feature doesn't seem to work together with torchao's `quantize_(transformer, int8_weight_only())` quantization. Is that expected? I get an error from torchao:

```
File "/Users/sysperf/miniforge3/envs/flux/lib/python3.11/site-packages/torchao/utils.py", line 389, in _dispatch__torch_dispatch__
    raise NotImplementedError(f"{cls.__name__} dispatch: attempting to run unimplemented operator/function: {func}")
NotImplementedError: AffineQuantizedTensor dispatch: attempting to run unimplemented operator/function: aten.cat.default
```

@sayakpaul
Copy link
Member Author

Please redirect the issue to https://github.com/sayakpaul/diffusers-torchao
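
The `aten.cat.default` failure above appears to come from `torch.cat` being dispatched on already-quantized weights when the projections are fused. A possible workaround, sketched here under the assumption that fusing first lets the concatenation run on plain tensors, is to call `fuse_qkv_projections()` before `quantize_()`:

```python
# A hedged sketch, not a confirmed fix: fuse the QKV projections first so the
# torch.cat over the weight matrices runs on regular tensors, then quantize.
import torch
from diffusers import FluxPipeline
from torchao.quantization import quantize_, int8_weight_only

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

pipe.fuse_qkv_projections()                      # cat happens here, pre-quantization
quantize_(pipe.transformer, int8_weight_only())  # quantize the fused linear layers
```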

sayakpaul added a commit that referenced this pull request Dec 23, 2024
* start fusing flux.

* test

* finish fusion

* fix-copies
@llcnt llcnt mentioned this pull request Apr 25, 2025