[Usage]: How to access the mlp layer using the current version of vllm (0.4.0) #8278

@waterluck

Your current environment


Description:

I am currently updating code that was written based on an older version of vllm (version 0.2.7). In the previous implementation, I accessed the mlp layer using the following code snippet:

obj = model.llm_engine.driver_worker.model_runner.model.model.layers[i].mlp

However, after updating to the latest version of vllm, this line now raises the following error:

AttributeError: 'LLMEngine' object has no attribute 'driver_worker'

It seems that the architecture of vllm has changed in the newer version, and I am unsure how to access the mlp layer now.
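
From skimming the newer source, it looks like the engine now delegates to a model executor, so my best guess is that the chain gained a model_executor hop, but I have not been able to confirm that this is the supported way:

# My guess for 0.4.x: the driver worker seems to live under a model executor now
# (unverified; the model_executor hop is my assumption and may differ per backend).
obj = model.llm_engine.model_executor.driver_worker.model_runner.model.model.layers[i].mlp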

Below is the relevant part of the code where I use this method:

from types import MethodType

import torch
import torch.nn.functional as F
from vllm import LLM, SamplingParams

# args, is_llama, mask_langs, langs, load_dataset, and sampling_params are
# defined elsewhere in my script.
model = LLM(model=args.model, tensor_parallel_size=torch.cuda.device_count(), enforce_eager=True)

if args.activation_mask:
    activation_masks = torch.load(args.activation_mask)

for activation_mask, mask_lang in zip(activation_masks, mask_langs):
    if activation_mask:
        # Build a forward() that zeroes out the masked activation channels.
        def factory(mask):
            def llama_forward(self, x):
                gate_up, _ = self.gate_up_proj(x)
                i = gate_up.size(-1)
                activation = F.silu(gate_up[:, :, : i // 2])
                activation.index_fill_(2, mask, 0)
                x = activation * gate_up[:, :, i // 2 :]
                x, _ = self.down_proj(x)
                return x

            def bloom_forward(self, x: torch.Tensor):
                x, _ = self.dense_h_to_4h(x)
                x = self.gelu_impl(x)
                x.index_fill_(2, mask, 0)
                x, _ = self.dense_4h_to_h(x)
                return x

            if is_llama:
                return llama_forward
            else:
                return bloom_forward

        # Monkey-patch each layer's mlp.forward with the masked variant.
        for i, layer_mask in enumerate(activation_mask):
            if is_llama:
                obj = model.llm_engine.driver_worker.model_runner.model.model.layers[i].mlp
            else:
                obj = model.llm_engine.driver_worker.model_runner.model.transformer.h[i].mlp
            obj.forward = MethodType(factory(layer_mask.to('cuda')), obj)

    for lang in langs:
        texts, sampling_params = load_dataset(lang, sampling_params)
        outputs = model.generate(texts, sampling_params)
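
The closest I have come to finding the new location is inspecting the engine's attributes at runtime (exploratory only; none of these names are documented API):

# Dump the engine's public attributes to see where the worker/model moved.
engine = model.llm_engine
print([a for a in dir(engine) if not a.startswith("_")])
executor = getattr(engine, "model_executor", None)  # my guess at the new hop
if executor is not None:
    print([a for a in dir(executor) if not a.startswith("_")])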

Questions:

  1. What is the correct method to access the mlp layer in the new version of vllm?
  2. Has there been a change in how the model architecture is structured in the new versions? If so, could you please guide me on how to adjust the above code to work with the updated architecture?

Any guidance would be appreciated. Thanks!

How would you like to use vllm

I don't know how to adapt this code to the new version of vllm.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
