Add fused QKV `HQQ` `triton_mm` test #306

jeromeku · 2024-06-03T17:12:53Z

Add fused QKV `HQQ` `triton_mm` test

Adds a test and example showing how to fuse the q, k, and v projections when using the HQQ triton_mm fused int4 matmul kernel.

Description

A common pattern in transformer models is to fuse the query, key, and value weights so that the qkv projection can be computed with a single matmul.
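As a minimal sketch of the unquantized fusion pattern (NumPy stands in for torch tensors here; the shapes and variable names are illustrative and not taken from the PR), stacking the three weights along the output dimension lets one matmul replace three:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, batch = 8, 4

# Separate projection weights, stored as (out_features, in_features).
w_q = rng.standard_normal((hidden, hidden))
w_k = rng.standard_normal((hidden, hidden))
w_v = rng.standard_normal((hidden, hidden))
x = rng.standard_normal((batch, hidden))

# Baseline: three separate matmuls.
q, k, v = x @ w_q.T, x @ w_k.T, x @ w_v.T

# Fused: stack along the output dimension, do one matmul, then split the result.
w_qkv = np.concatenate([w_q, w_k, w_v], axis=0)  # (3 * hidden, hidden)
q_f, k_f, v_f = np.split(x @ w_qkv.T, 3, axis=1)

fused_matches = all(np.allclose(a, b) for a, b in [(q, q_f), (k, k_f), (v, v_f)])
print(fused_matches)
```

The comparison uses a tolerance rather than exact equality because the fused and separate matmuls may accumulate floating-point sums in a different order.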

To use torchao.prototype.hqq.triton_mixed_mm for int4 quantized model training or inference, the query, key, and value weights must be quantized and packed. Quantization and packing make this fusion pattern more complicated to apply.
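One way to see where the complexity comes from: packing two 4-bit codes into one byte does not necessarily commute with concatenation. The sketch below uses a hypothetical packing layout (top half of the rows in the high nibble, bottom half in the low nibble) chosen purely for illustration. It is an assumption, not the layout used by triton_mixed_mm or by this PR's test, and `pack_halves_u8` / `unpack_halves_u8` are made-up helper names.

```python
import numpy as np

def pack_halves_u8(w):
    """Hypothetical packer: top half of the rows -> high nibble, bottom half -> low nibble."""
    step = w.shape[0] // 2
    return (w[:step] << 4) | w[step:]

def unpack_halves_u8(p):
    """Inverse of pack_halves_u8."""
    return np.concatenate([p >> 4, p & 0x0F], axis=0)

rng = np.random.default_rng(0)
# Unpacked 4-bit codes (values 0..15) for q, k, v, shape (out_features, in_features).
wq, wk, wv = (rng.integers(0, 16, size=(4, 6), dtype=np.uint8) for _ in range(3))
expected = np.concatenate([wq, wk, wv], axis=0)

packed = [pack_halves_u8(w) for w in (wq, wk, wv)]

# Naive fusion: concatenate the already-packed tensors.
naive = np.concatenate(packed, axis=0)

# Careful fusion: unpack, concatenate the 4-bit codes, then pack the fused tensor once.
fused_codes = np.concatenate([unpack_halves_u8(p) for p in packed], axis=0)
repacked = pack_halves_u8(fused_codes)

roundtrip_ok = np.array_equal(unpack_halves_u8(repacked), expected)
naive_ok = np.array_equal(unpack_halves_u8(naive), expected)
print(roundtrip_ok, naive_ok)
```

Under this assumed layout, the repacked tensor round-trips to the fused codes while the naive concatenation interleaves rows from different projections. Per-group scales and zero points would need to be concatenated in the same row order as the weights.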

Contribution

This PR adds a test that demonstrates how to properly fuse and pack the query, key, and value projection weights so that the fused weight can be used with the triton_mixed_mm kernel.

The test checks that passing a single fused qkv weight to the kernel gives results equivalent to running separate q, k, and v kernels, in both the transposed and non-transposed cases.
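In the non-transposed case, the fused output is simply the column-wise concatenation of the individual q, k, and v outputs. The transposed case is less obvious. The sketch below assumes that "transposed" means multiplying by the weight without transposing it, as in a backward pass; that reading is an assumption, not something stated in this PR. Plain float arrays stand in for dequantized weights, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 4, 6, 8

# Stand-ins for dequantized weights, each (out_features=N, in_features=K).
w_q, w_k, w_v = (rng.standard_normal((N, K)) for _ in range(3))
w_qkv = np.concatenate([w_q, w_k, w_v], axis=0)  # (3N, K)

# Input whose last dimension matches the fused output dimension.
g = rng.standard_normal((M, 3 * N))
g_q, g_k, g_v = np.split(g, 3, axis=1)

# Multiplying by the fused weight equals the sum of the three individual products.
fused_t = g @ w_qkv
ref_t = g_q @ w_q + g_k @ w_k + g_v @ w_v

transposed_ok = np.allclose(fused_t, ref_t)
print(transposed_ok)
```

Under this reading, the reference for the transposed case is a sum rather than a concatenation, so an equivalence test has to build the two references differently.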

@msaroufim @mobicham @KeremTurgutlu

pytorch-bot · 2024-06-03T17:12:56Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/306

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures)

As of commit 50a9ce7 with merge base 8dbf031:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

Run Regression Tests / test (CUDA 2.2.2, linux.g5.12xlarge.nvidia.gpu, torch==2.2.2, cuda, 12.1) / linux-job (gh) (matched linux rule in flaky-rules.json)
The process '/usr/bin/git' failed with exit code 128
Run Regression Tests / test (CUDA 2.3, linux.g5.12xlarge.nvidia.gpu, torch==2.3.0, cuda, 12.1) / linux-job (gh) (matched linux rule in flaky-rules.json)
The process '/usr/bin/git' failed with exit code 128
Run Regression Tests / test (CUDA Nightly, linux.g5.12xlarge.nvidia.gpu, --pre torch --index-url https://download.pytorc... / linux-job (gh) (matched linux rule in flaky-rules.json)
The process '/usr/bin/git' failed with exit code 128

This comment was automatically generated by Dr. CI and updates every 15 minutes.

add fused qkv triton mm test

4908914

facebook-github-bot added the CLA Signed label Jun 3, 2024

msaroufim approved these changes Jun 4, 2024

msaroufim added 2 commits June 3, 2024 23:07

Merge branch 'main' into hqq_triton_mm_fused

0f6eeea

Merge branch 'main' into hqq_triton_mm_fused

50a9ce7

msaroufim merged commit 729fa4d into pytorch:main Jun 4, 2024
10 of 13 checks passed

dbyoung18 pushed a commit to dbyoung18/ao that referenced this pull request Jul 31, 2024

Add fused QKV HQQ triton_mm test (pytorch#306)

1433cb4
