Status: Closed as not planned
Labels: bug (Something isn't working), stale (Over 90 days of inactivity)
Description
Your current environment
(Output of `python collect_env.py` not provided.)
🐛 Describe the bug
Launched server with:
vllm serve $MODEL --num-scheduler-steps 8
Sent the following request:
from openai import OpenAI

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    # defaults to os.environ.get("OPENAI_API_KEY")
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id

# Completion API
stream = False
completion = client.completions.create(
    model=model,
    prompt="A robot may not injure a human being",
    echo=False,
    n=2,
    stream=stream)

print("Completion results:")
if stream:
    for c in completion:
        print(c)
else:
    print(completion)
Got the following output:
INFO: Finished server process [1668044]
INFO 08-28 19:29:45 server.py:222] vLLM ZMQ RPC Server was interrupted.
Future exception was never retrieved
future: <Future finished exception=RuntimeError('shape mismatch: value tensor of shape [2] cannot be broadcast to indexing result of shape [1, 1]')>
Traceback (most recent call last):
File "/home/rshaw/vllm/vllm/entrypoints/openai/rpc/server.py", line 111, in generate
async for request_output in results_generator:
File "/home/rshaw/vllm/vllm/engine/async_llm_engine.py", line 1050, in generate
async for output in await self.add_request(
File "/home/rshaw/vllm/vllm/engine/async_llm_engine.py", line 110, in generator
raise result
File "/home/rshaw/vllm/vllm/engine/async_llm_engine.py", line 52, in _log_task_completion
return_value = task.result()
^^^^^^^^^^^^^
File "/home/rshaw/vllm/vllm/engine/async_llm_engine.py", line 916, in run_engine_loop
result = task.result()
^^^^^^^^^^^^^
File "/home/rshaw/vllm/vllm/engine/async_llm_engine.py", line 859, in engine_step
request_outputs = await self.engine.step_async(virtual_engine)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rshaw/vllm/vllm/engine/async_llm_engine.py", line 346, in step_async
output = await self.model_executor.execute_model_async(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rshaw/vllm/vllm/executor/gpu_executor.py", line 178, in execute_model_async
output = await make_async(self.driver_worker.execute_model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rshaw/.pyenv/versions/3.11.9/lib/python3.11/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rshaw/vllm/vllm/worker/worker_base.py", line 327, in execute_model
output = self.model_runner.execute_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rshaw/vllm/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/rshaw/vllm/vllm/worker/multi_step_model_runner.py", line 275, in execute_model
output = self._base_model_runner.execute_model(frozen_model_input,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rshaw/vllm/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/rshaw/vllm/vllm/worker/model_runner.py", line 1489, in execute_model
output: SamplerOutput = self.model.sample(
^^^^^^^^^^^^^^^^^^
File "/home/rshaw/vllm/vllm/model_executor/models/llama.py", line 447, in sample
next_tokens = self.sampler(logits, sampling_metadata)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rshaw/vllm/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rshaw/vllm/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rshaw/vllm/vllm/model_executor/layers/sampler.py", line 153, in forward
sample_results, maybe_sampled_tokens_tensor = _sample(
^^^^^^^^
File "/home/rshaw/vllm/vllm/model_executor/layers/sampler.py", line 771, in _sample
return _sample_with_torch(
^^^^^^^^^^^^^^^^^^^
File "/home/rshaw/vllm/vllm/model_executor/layers/sampler.py", line 633, in _sample_with_torch
sampled_token_ids_tensor[long_sample_indices] = \
~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape mismatch: value tensor of shape [2] cannot be broadcast to indexing result of shape [1, 1]
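For reference, a minimal standalone PyTorch sketch (an assumption about the failure mode, not vLLM's actual code) that reproduces the same broadcasting error: a value tensor holding two sampled tokens (from n=2) is written into an index result sized for a single sequence per step.

import torch

# Hypothetical repro, not vLLM code: buffer sized for one sequence per step
sampled_token_ids_tensor = torch.zeros(1, 1, dtype=torch.long)  # indexing result: shape [1, 1]
long_sample_indices = torch.tensor([0])                         # one sequence group selected
next_tokens = torch.tensor([101, 202])                          # two sampled tokens because n=2

# Raises: RuntimeError: shape mismatch: value tensor of shape [2] cannot be
# broadcast to indexing result of shape [1, 1]
sampled_token_ids_tensor[long_sample_indices] = next_tokens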
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.