Your current environment
Description:
I am updating code that was written against an older version of vllm (0.2.7). In the previous implementation, I accessed the mlp layer using the following snippet:

obj = model.llm_engine.driver_worker.model_runner.model.model.layers[i].mlp

However, after updating to the latest version of vllm, this line now raises the following error:

AttributeError: 'LLMEngine' object has no attribute 'driver_worker'

It seems that the architecture of vllm has changed in the newer version, and I am unsure how to access the mlp layer now.
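Since the internal layout can change between vllm releases, one version-agnostic option is to search the engine object for the container of decoder layers instead of hard-coding the attribute chain. The helper below is a hypothetical sketch of that idea; the `Engine`/`Model`/`Layer` classes in the demo are stand-ins for illustration, not vllm's real internals:

```python
def find_attr_path(root, predicate, max_depth=6, _path=()):
    """Depth-first search through an object's public, non-callable
    attributes for a value where predicate(value) is True.
    Returns the dotted attribute path, or None if nothing matches."""
    if predicate(root):
        return ".".join(_path)
    if len(_path) >= max_depth:
        return None
    for name in dir(root):
        if name.startswith("_"):
            continue
        try:
            child = getattr(root, name)
        except Exception:
            continue
        if callable(child):
            continue  # skip methods; we only want nested state
        found = find_attr_path(child, predicate, max_depth, _path + (name,))
        if found is not None:
            return found
    return None


# Demo with stand-in classes (hypothetical, not vllm's actual structure):
class MLP:
    pass

class Layer:
    def __init__(self):
        self.mlp = MLP()

class Model:
    def __init__(self):
        self.layers = [Layer(), Layer()]

class Engine:
    def __init__(self):
        self.model = Model()

# Look for a non-empty list whose elements expose an `mlp` attribute.
path = find_attr_path(
    Engine(),
    lambda o: isinstance(o, list) and len(o) > 0 and hasattr(o[0], "mlp"),
)
print(path)  # -> model.layers
```

In practice you would run the same kind of search against `model.llm_engine` (with an `nn.Module`-aware predicate) to recover the new path. Note that in some newer vllm releases the worker appears to have moved under `llm_engine.model_executor` (something like `model.llm_engine.model_executor.driver_worker.model_runner.model`), but this should be verified against the exact version you have installed.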
Below is the relevant part of the code where I use this method:
```python
import torch
import torch.nn.functional as F
from types import MethodType

from vllm import LLM, SamplingParams

model = LLM(model=args.model, tensor_parallel_size=torch.cuda.device_count(), enforce_eager=True)

if args.activation_mask:
    activation_masks = torch.load(args.activation_mask)

    for activation_mask, mask_lang in zip(activation_masks, mask_langs):
        if activation_mask:
            def factory(mask):
                def llama_forward(self, x):
                    gate_up, _ = self.gate_up_proj(x)
                    i = gate_up.size(-1)
                    activation = F.silu(gate_up[:, :, : i // 2])
                    activation.index_fill_(2, mask, 0)
                    x = activation * gate_up[:, :, i // 2 :]
                    x, _ = self.down_proj(x)
                    return x

                def bloom_forward(self, x: torch.Tensor):
                    x, _ = self.dense_h_to_4h(x)
                    x = self.gelu_impl(x)
                    x.index_fill_(2, mask, 0)
                    x, _ = self.dense_4h_to_h(x)
                    return x

                if is_llama:
                    return llama_forward
                else:
                    return bloom_forward

            for i, layer_mask in enumerate(activation_mask):
                if is_llama:
                    obj = model.llm_engine.driver_worker.model_runner.model.model.layers[i].mlp
                else:
                    obj = model.llm_engine.driver_worker.model_runner.model.transformer.h[i].mlp
                obj.forward = MethodType(factory(layer_mask.to('cuda')), obj)

for lang in langs:
    texts, sampling_params = load_dataset(lang, sampling_params)
    outputs = model.generate(texts, sampling_params)
```
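Independently of where the mlp modules now live, the per-instance patching pattern in the code above (a factory closing over a per-layer mask, bound to the module with `MethodType`) can be verified in isolation. A minimal, framework-free sketch, with plain lists standing in for tensors and for `index_fill_`:

```python
from types import MethodType

class MLP:
    """Stand-in module: the 'original' forward doubles every element."""
    def forward(self, x):
        return [v * 2 for v in x]

def factory(mask):
    # mask: set of output indices to zero, mirroring index_fill_(..., mask, 0)
    def patched_forward(self, x):
        out = [v * 2 for v in x]  # replicate the original computation
        return [0 if i in mask else v for i, v in enumerate(out)]
    return patched_forward

mlp = MLP()
# Bind the patched forward to this one instance, as in the issue's code.
mlp.forward = MethodType(factory({1}), mlp)
print(mlp.forward([1, 2, 3]))  # -> [2, 0, 6]
```

Because `MethodType` binds the function to a single instance, each layer can receive its own mask without touching the class or other layers.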
Questions:
- What is the correct method to access the mlp layer in the new version of vllm?
- Has there been a change in how the model architecture is structured in the new versions? If so, could you please guide me on how to adjust the above code to work with the updated architecture?
Any guidance would be appreciated. Thanks!
How would you like to use vllm

I don't know how to integrate it with the new version of vllm.
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.