
Can the model Qwen/Qwen-VL-Chat work well? #962

@wangschang

Description


When I use Qwen/Qwen-VL-Chat, it throws an error and I do not know why:

```
Traceback (most recent call last):
  File "test.py", line 20, in <module>
    model = LLM(model=model_path, tokenizer=model_path, tokenizer_mode='slow', tensor_parallel_size=1, trust_remote_code=True)
  File "/usr/local/miniconda3/lib/python3.8/site-packages/vllm/entrypoints/llm.py", line 66, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/usr/local/miniconda3/lib/python3.8/site-packages/vllm/engine/llm_engine.py", line 220, in from_engine_args
    engine = cls(*engine_configs,
  File "/usr/local/miniconda3/lib/python3.8/site-packages/vllm/engine/llm_engine.py", line 101, in __init__
    self._init_workers(distributed_init_method)
  File "/usr/local/miniconda3/lib/python3.8/site-packages/vllm/engine/llm_engine.py", line 133, in _init_workers
    self._run_workers(
  File "/usr/local/miniconda3/lib/python3.8/site-packages/vllm/engine/llm_engine.py", line 470, in _run_workers
    output = executor(*args, **kwargs)
  File "/usr/local/miniconda3/lib/python3.8/site-packages/vllm/worker/worker.py", line 67, in init_model
    self.model = get_model(self.model_config)
  File "/usr/local/miniconda3/lib/python3.8/site-packages/vllm/model_executor/model_loader.py", line 57, in get_model
    model.load_weights(model_config.model, model_config.download_dir,
  File "/usr/local/miniconda3/lib/python3.8/site-packages/vllm/model_executor/models/qwen.py", line 308, in load_weights
    param = state_dict[name]
KeyError: 'transformer.visual.positional_embedding'
```
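For context: the KeyError happens because the weight loader looks up every tensor in the checkpoint by name in the model's own parameter dict, and vLLM's `qwen.py` implements only the text model, so the Qwen-VL-Chat checkpoint's visual-tower tensors (`transformer.visual.*`) have no matching parameter. Below is a minimal sketch of that failing pattern, with toy names and shapes; it is not vLLM's actual code:

```python
import torch

# Toy stand-in for the parameters vLLM's text-only Qwen model defines.
state_dict = {
    "transformer.wte.weight": torch.zeros(4, 4),
}

# Toy stand-in for the tensors shipped in the Qwen-VL-Chat checkpoint,
# which also contains the visual encoder's weights.
checkpoint_weights = [
    ("transformer.wte.weight", torch.ones(4, 4)),
    ("transformer.visual.positional_embedding", torch.ones(4, 4)),
]

for name, loaded_weight in checkpoint_weights:
    # Raises KeyError on 'transformer.visual.positional_embedding',
    # mirroring the traceback above.
    param = state_dict[name]
    param.data.copy_(loaded_weight)
```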

The code is:

```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer
import time

model_path = "Qwen/Qwen-VL-Chat"

model = LLM(model=model_path, tokenizer=model_path, tokenizer_mode="slow",
            tensor_parallel_size=1, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, legacy=True,
                                          trust_remote_code=True)

sampling_params = SamplingParams(temperature=0, max_tokens=8096)

start = time.time()
prompts = ["你好!"]  # "Hello!"
outputs = model.generate(prompts, sampling_params)
end = time.time()

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    length = len(generated_text)  # character count, not token count
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

cost = end - start
print(cost)
print(f"{length / cost} tokens/s")
```


Labels

new-model: Requests to new models
