vllm/model_executor/model_loader/utils.py (72 changes: 57 additions & 15 deletions)

@@ -11,12 +11,16 @@

from vllm.config import ModelConfig, ModelImpl
from vllm.logger import init_logger
+from vllm.model_executor.layers.quantization.awq import AWQConfig
from vllm.model_executor.layers.quantization.base_config import (
    QuantizationConfig)
+from vllm.model_executor.layers.quantization.bitsandbytes import (
+    BitsAndBytesConfig)
from vllm.model_executor.models import ModelRegistry
from vllm.model_executor.models.adapters import (as_classification_model,
                                                 as_embedding_model,
                                                 as_reward_model)
+from vllm.model_executor.models.utils import WeightsMapper

logger = init_logger(__name__)

@@ -153,19 +157,57 @@ def get_sub_modules(self,

def configure_quant_config(quant_config: QuantizationConfig,
                           model_class: Type[nn.Module]):
-    """
-    Pass packed_modules_mapping by reference to quant_config so that
-    quant_config can properly match fused modules
-
-    Note that model attributes are passed by reference to quant_config,
-    enabling them to be updated by model_class.__new__ (ex. chatglm, qwen)
-    """
-    packed_mapping = getattr(model_class, "packed_modules_mapping", None)
-    if packed_mapping is not None:
-        # pass packed_modules_mapping by reference to quant_config
-        quant_config.packed_modules_mapping = packed_mapping
-    else:
-        logger.warning(
-            "The model class %s has not defined `packed_modules_mapping`, "
-            "this may lead to incorrect mapping of quantized or ignored "
-            "modules", model_class.__name__)
+    def _configure_packed_modules_mapping():
+        """
+        Pass packed_modules_mapping by reference to quant_config so that
+        quant_config can properly match fused modules
+
+        Note that model attributes are passed by reference to quant_config,
+        enabling them to be updated by model_class.__new__ (ex. chatglm, qwen)
+        """
+        packed_mapping = getattr(model_class, "packed_modules_mapping", None)
+        if packed_mapping is not None:
+            # pass packed_modules_mapping by reference to quant_config
+            quant_config.packed_modules_mapping = packed_mapping
+        else:
+            logger.warning(
+                "The model class %s has not defined `packed_modules_mapping`, "
+                "this may lead to incorrect mapping of quantized or ignored "
+                "modules", model_class.__name__)

Comment on lines +161 to +177

Member: Why is this needed after we added SupportsQuant (#13104)? I thought getting the packed_modules_mapping from the model to the quant config was the main purpose of that. cc @kylesayrs

Contributor: The _configure_packed_modules_mapping function needs to remain in place until SupportsQuant has been added to all applicable models.
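For context, here is a minimal sketch of the mixin pattern being discussed (illustrative only: the class name, signature, and behaviour are simplified stand-ins, not vLLM's actual SupportsQuant implementation). The idea is that a model class declares its packed_modules_mapping and the mixin copies it onto the quant config at construction time, which is what would eventually make _configure_packed_modules_mapping redundant.

from typing import Any, ClassVar


class SupportsQuantSketch:
    """Simplified stand-in for the SupportsQuant mixin (not vLLM's code)."""

    # Per-model mapping of fused module names to their original sub-modules,
    # e.g. {"qkv_proj": ["q_proj", "k_proj", "v_proj"]}.
    packed_modules_mapping: ClassVar[dict[str, list[str]]] = {}

    def __new__(cls, *args: Any, **kwargs: Any):
        instance = super().__new__(cls)
        quant_config = kwargs.get("quant_config")
        if quant_config is not None:
            # Copy the model's fused-module mapping onto the quant config so
            # the quantization method can match fused projections.
            quant_config.packed_modules_mapping.update(
                cls.packed_modules_mapping)
        return instance

A model that inherits such a mixin and declares its own packed_modules_mapping would no longer need configure_quant_config to patch the mapping in afterwards.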


+    def _configure_quant_skip_modules():
+        """
+        Configures the quantization skip modules for the model based on the
+        provided quantization configuration.
+        This function checks if the model class has a `hf_to_vllm_mapper`
+        attribute. If it does, it uses this mapper to update the list of
+        modules to be skipped for the different quantization configurations:
+        - For `BitsAndBytesConfig`, it updates `llm_int8_skip_modules`.
+        - For `AWQConfig`, it updates `modules_to_not_convert`.
+        """
+        if getattr(model_class, "hf_to_vllm_mapper", None) is None:
+            return
+        hf_to_vllm_mapper: WeightsMapper = model_class.hf_to_vllm_mapper
+
+        # BitsAndBytes
+        if (isinstance(quant_config, BitsAndBytesConfig)
+                and quant_config.llm_int8_skip_modules):
+            quant_config.llm_int8_skip_modules = [
+                hf_to_vllm_mapper._map_name(module)
+                for module in quant_config.llm_int8_skip_modules
+            ]
+        # AWQ
+        elif (isinstance(quant_config, AWQConfig)
+              and quant_config.modules_to_not_convert):
+            quant_config.modules_to_not_convert = [
+                hf_to_vllm_mapper._map_name(module)
+                for module in quant_config.modules_to_not_convert
+            ]
+        # TODO: Support more quantization types.
Comment on lines +196 to +210

Member: Maybe we should introduce a common ignored_modules or ignored_prefixes to QuantizationConfig, like packed_modules_mapping (https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/quantization/base_config.py#L60-L66).

Then each quant config can convert its specific llm_int8_skip_modules, modules_to_not_convert, etc. into a canonical format in ignored_modules. This would also allow us to generalize the is_layer_skipped function.
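A rough sketch of that proposal (every name here is illustrative, not vLLM's actual API): the base config carries a canonical ignored_modules list, each method-specific config populates it from its own field at construction time, and a single generic skip check can then replace the per-method variants.

from typing import Optional


class QuantizationConfigSketch:
    """Illustrative stand-in for QuantizationConfig."""

    def __init__(self) -> None:
        self.packed_modules_mapping: dict[str, list[str]] = {}
        # Canonical list of module names/prefixes that must stay unquantized.
        self.ignored_modules: list[str] = []


class AWQConfigSketch(QuantizationConfigSketch):
    def __init__(self,
                 modules_to_not_convert: Optional[list[str]] = None) -> None:
        super().__init__()
        # Translate the method-specific field into the shared canonical form.
        self.ignored_modules = list(modules_to_not_convert or [])


def is_layer_skipped(prefix: str, ignored_modules: list[str]) -> bool:
    # Generic check usable by every quant method once names are canonical.
    return any(prefix == m or prefix.startswith(m + ".")
               for m in ignored_modules)


# e.g. is_layer_skipped("visual.merger.mlp", ["visual.merger"]) -> True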

kylesayrs (Contributor) commented on Mar 10, 2025:
I'd support an implementation like this as well. This current implementation could fail to properly map module names in nested models.

modules_to_not_convert = ["SubModel.A"]
SubModel.hf_to_vllm_mapper = Mapper(orig_to_new_prefix={"A": "B"})

Note that "SubModel.A" will not match because "SubModel.A" does not start with "A"

This is a fairly minor issue, but something to keep in mind.
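To make the mismatch concrete, a tiny sketch (assuming Mapper above refers to the WeightsMapper imported in the diff, and that names whose prefix does not match pass through _map_name unchanged):

from vllm.model_executor.models.utils import WeightsMapper

# Mapper scoped to the submodel: it only knows local names.
mapper = WeightsMapper(orig_to_new_prefix={"A": "B"})

# The skip list stores the fully qualified name, so the prefix never matches.
print(mapper._map_name("SubModel.A"))  # "SubModel.A" stays unchanged
print(mapper._map_name("A.linear"))    # "B.linear" (local names do map)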

Another implementation could look like this:

  1. Add a mutable ignored_modules attribute to QuantizationConfig
  2. At construction time, use the method-specific constructor to populate the ignored_modules attribute from disk
  3. At initialize time, within SupportsQuant, use the given model prefix and mapper to update the ignored_modules list with the proper model-specific mapping
    a. ignored_modules = [prefix + hf_to_vllm_mapper[module - prefix] for module in ignored_modules]

This has the advantage of further standardizing around the QuantizationConfig base, as well as supporting mapping with nested models.
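A short sketch of step 3a under those assumptions (the helper name and the pass-through behaviour are illustrative, not existing vLLM code):

def remap_ignored_modules(ignored_modules: list[str], prefix: str,
                          hf_to_vllm_mapper) -> list[str]:
    # Strip the model prefix, map the local name, then re-attach the prefix,
    # so nested names like "SubModel.A" are remapped correctly.
    remapped = []
    for module in ignored_modules:
        if module.startswith(prefix):
            local = module[len(prefix):]
            remapped.append(prefix + hf_to_vllm_mapper._map_name(local))
        else:
            remapped.append(module)
    return remapped


# With prefix="SubModel." and a mapper {"A": "B"}:
#   remap_ignored_modules(["SubModel.A"], "SubModel.", mapper) -> ["SubModel.B"]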

Contributor: @jeejeelee Here's a WIP of what that might look like: #14635

Collaborator (Author): @kylesayrs Can you provide an example?


+    _configure_packed_modules_mapping()
+    _configure_quant_skip_modules()