[Quant] Fix use_mla TypeError and support loading pure-sparsity Compressed Tensors configs #12711

kylesayrs · 2025-02-03T21:37:49Z

Purpose

- Fix a TypeError in use_mla that caused models with quantization but without config_groups to fail to load.
- Support loading models whose Compressed Tensors config has no config_groups. These models are not accelerated, but they can be generated from valid recipes in LLM Compressor and are useful for testing.

Changes

- Fix the TypeError in use_mla
- Log a warning and return no scheme when a model is loaded without quantization
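The fallback described in the second change can be illustrated with a minimal sketch. This is not the code from this PR: the function name `select_quant_scheme`, the logger name, and the simplified target matching are all hypothetical, and the config is assumed to be a plain dict shaped like a Compressed Tensors `quantization_config`.

```python
import logging
from typing import Any, Optional

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("pure_sparsity_sketch")


def select_quant_scheme(
    quant_config: dict[str, Any], layer_name: str
) -> Optional[dict[str, Any]]:
    """Return the config group that targets `layer_name`, or None.

    Assumption: `quant_config` may lack `config_groups` entirely, which is
    the pure-sparsity case. Target matching is deliberately simplified.
    """
    # `or {}` covers both a missing key and an explicit None value.
    groups = quant_config.get("config_groups") or {}
    if not groups:
        logger.warning(
            "No config_groups found; loading %s without a quantization scheme",
            layer_name,
        )
        return None

    for group in groups.values():
        targets = group.get("targets") or []
        # Simplified rule: "Linear" matches every layer, anything else
        # must appear in the layer name.
        if any(t == "Linear" or t in layer_name for t in targets):
            return group
    return None
```

A caller that receives `None` would fall back to its unquantized path, which is consistent with the note above that such models load but are not accelerated.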

Testing

Tested nm-testing/Qwen2-VL-2B-Instruct-Sparse-0.6
- Confirmed that MLA is indeed disabled in the fallback case

WARNING 02-03 16:37:36 config.py:1008] compressed-tensors MLA support requires fp8 activations and weights in group 'group_0', but got activations type 'None' and weights type 'int'.
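A check that yields a message of this shape can be sketched as follows. This is an illustration under stated assumptions, not vLLM's implementation: `check_mla_support` and its helpers are hypothetical names, the config is assumed to be a plain dict, and fp8 is assumed to be expressed as `type == "float"` with `num_bits == 8`. The point of the sketch is that a missing `config_groups` key or a `None` activations entry is handled without raising a TypeError.

```python
from typing import Any, Optional

QuantArgs = Optional[dict[str, Any]]


def _type_of(args: QuantArgs) -> Optional[str]:
    """Return the declared type, or None when the entry is absent."""
    return None if args is None else args.get("type")


def _is_fp8(args: QuantArgs) -> bool:
    """Assumption for this sketch: fp8 means type 'float' with 8 bits."""
    return (
        args is not None
        and args.get("type") == "float"
        and args.get("num_bits") == 8
    )


def check_mla_support(quant_config: dict[str, Any]) -> tuple[bool, list[str]]:
    """Return (supported, messages) for a Compressed-Tensors-style config."""
    # A missing or None `config_groups` is treated as "no groups to check".
    groups = quant_config.get("config_groups") or {}
    messages: list[str] = []
    for name, group in groups.items():
        acts = group.get("input_activations")
        weights = group.get("weights")
        if not (_is_fp8(acts) and _is_fp8(weights)):
            messages.append(
                "compressed-tensors MLA support requires fp8 activations "
                f"and weights in group '{name}', but got activations type "
                f"'{_type_of(acts)}' and weights type '{_type_of(weights)}'."
            )
    return (not messages, messages)
```

In this sketch a config with no groups simply produces no messages; how the real code treats that case is not shown in this PR description.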

Tested neuralmagic/Sparse-Llama-3.1-8B-ultrachat_200k-2of4-quantized.w4a16

github-actions · 2025-02-03T21:38:00Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

- Add the ready label to the PR
- Enable auto-merge.

🚀


kylesayrs requested review from mgoin, robertgshaw2-redhat and tlrmchlsmth as code owners February 3, 2025 21:37

kylesayrs force-pushed the kylesayrs/fix-use_mla-type_error branch from 8a63916 to 92f4d0f Compare February 3, 2025 21:52

kylesayrs added 3 commits February 3, 2025 17:00

support pure-sparsity configs

7e5bafc

Signed-off-by: Kyle Sayers <[email protected]>

simplify line

3d35fc3

Signed-off-by: Kyle Sayers <[email protected]>

Revert "simplify line"

d3c6349

This reverts commit d5ccf6233c3afaf967566d664681e140e1be32fe. Signed-off-by: Kyle Sayers <[email protected]>

kylesayrs force-pushed the kylesayrs/fix-use_mla-type_error branch from a864fd0 to d3c6349 Compare February 3, 2025 22:00

kylesayrs changed the title ~~[Quant] Support loading pure-sparsity Compressed Tensors configs~~ [Quant] Fix use_mla default and support loading pure-sparsity Compressed Tensors configs Feb 3, 2025

kylesayrs mentioned this pull request Feb 3, 2025

[Model][Quant] Fix GLM, Fix fused module mappings for quantization #12634

Merged

kylesayrs changed the title ~~[Quant] Fix use_mla default and support loading pure-sparsity Compressed Tensors configs~~ [Quant] Fix use_mla TypeError and support loading pure-sparsity Compressed Tensors configs Feb 3, 2025

mgoin approved these changes Feb 4, 2025


mgoin added the ready label (ONLY add when PR is ready to merge/full CI is needed) Feb 4, 2025

simon-mo merged commit 4896d0c into vllm-project:main Feb 4, 2025
58 of 65 checks passed

kylesayrs deleted the kylesayrs/fix-use_mla-type_error branch February 5, 2025 01:17

shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025

[Quant] Fix use_mla TypeError and support loading pure-sparsity Compr…

76ccf4a

…essed Tensors configs (vllm-project#12711)
