
Failed to call Qwen3-Coder model with tool calling enabled #8744

@ontecAI



Relevant environment info

- OS: Windows 11
- Continue version: 1.2.10
- IDE version: VS Code
- Model: Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8
- config:
  
  - name: qwen3-coder-30b
    provider: vllm
    model: Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8
    apiBase: http://ai01-dx-mr300-b04.dev1.xxx.at:7777/v1/
    apiKey: xxxx
    roles:
      - apply
      - chat
      - edit
    capabilities:
      - tool_use

  - name: qwen3-coder-30b-litellm
    provider: openai
    model: qwen3-coder-30b-fp8
    apiBase: https://litellm.dev1.xxx.at/v1/
    apiKey: xxxx
    roles:
      - apply
      - chat
      - edit
    capabilities:
      - tool_use
  

Description

[@continuedev] error: Error streaming response: list index out of range {"context":"llm_stream_chat","model":"Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8","provider":"vllm","useOpenAIAdapter":true,"streamEnabled":true,"templateMessages":true}

(Full stack trace in the Log output section below.)

To reproduce

Chat with the model; I think it happens when tool calling is involved.
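
To take Continue out of the loop, here is a minimal repro sketch that streams the same kind of tool-calling request straight from the vLLM endpoint. Assumptions not in the original report: the openai Python package, and the get_weather tool, which is a dummy invented for illustration; endpoint, API key, and model name are copied from the config above.

# Minimal repro sketch, bypassing Continue: stream a tool-calling request
# straight from the vLLM server and print every raw chunk. If vLLM fails
# while handling the model's tool-call output, the stream should abort
# with the same "list index out of range" message, without Continue involved.
from openai import OpenAI

client = OpenAI(
    base_url="http://ai01-dx-mr300-b04.dev1.xxx.at:7777/v1",  # apiBase from the config above
    api_key="xxxx",                                           # apiKey from the config above
)

# One dummy tool is enough to push the request down the tool-calling path.
# get_weather is invented for this sketch; it is not from the report.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

stream = client.chat.completions.create(
    model="Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8",
    messages=[{"role": "user", "content": "What is the weather in Vienna?"}],
    tools=tools,
    stream=True,
)

for chunk in stream:
    print(chunk)

If this stream aborts with the same "list index out of range" message, the failure reproduces without the extension in the loop.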

vLLM runtime (Docker Compose):

services:
  vllm_openai:
    image: vllm/vllm-openai:v0.11.0
    restart: unless-stopped
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [ gpu ]
    volumes:
      - /opt/projects/.cache:/root/.cache/huggingface # the model is saved here
    ipc: host
    environment:
      HUGGING_FACE_HUB_TOKEN: hf_xxxx
    ports:
      - 7777:8000
    networks:
      - traefik-public
    command:
      - --model
      - "Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8"
      - --api-key
      - "xxx"
      - --gpu-memory-utilization
      - "0.90"
      - --kv-cache-dtype
      - "fp8"
      - --enable-chunked-prefill
      - --swap-space
      - "16"
      - --trust-remote-code
      - --disable-log-requests
      - --tensor-parallel-size
      - "1"        # should match the number of GPUs
      - --enable-auto-tool-choice       # enable tools
      #- --tool-call-parser
      #- "qwen3_xml"    # see vLLM docs - recommended setting for Qwen3-Coder
      - --max-num-seqs
      - "4"              # max concurrent sequences; lower values use less memory (default 128; with a huge context keep it small, e.g. 16)

Log output

[@continuedev] error: Error streaming response: list index out of range {"context":"llm_stream_chat","model":"Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8","provider":"vllm","useOpenAIAdapter":true,"streamEnabled":true,"templateMessages":true}

[Extension Host] Error: Error streaming response: list index out of range
	at parseDataLine (c:\Users\mbuchner\.vscode\extensions\continue.continue-1.2.10-win32-x64\out\extension.js:131119:15)
	at parseSseLine (c:\Users\mbuchner\.vscode\extensions\continue.continue-1.2.10-win32-x64\out\extension.js:131136:33)
	at streamSse (c:\Users\mbuchner\.vscode\extensions\continue.continue-1.2.10-win32-x64\out\extension.js:131151:37)
	at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
	at async Vllm2._streamChat (c:\Users\mbuchner\.vscode\extensions\continue.continue-1.2.10-win32-x64\out\extension.js:244427:26)
	at async Vllm2._streamComplete (c:\Users\mbuchner\.vscode\extensions\continue.continue-1.2.10-win32-x64\out\extension.js:244338:26)
	at async Vllm2.streamChat (c:\Users\mbuchner\.vscode\extensions\continue.continue-1.2.10-win32-x64\out\extension.js:243942:30)
	at async llmStreamChat (c:\Users\mbuchner\.vscode\extensions\continue.continue-1.2.10-win32-x64\out\extension.js:722153:17)
	at async Wd.handleMessage [as value] (c:\Users\mbuchner\.vscode\extensions\continue.continue-1.2.10-win32-x64\out\extension.js:760618:27)

Metadata

Labels

- area:chat (Relates to chat interface)
- ide:vscode (Relates specifically to VS Code extension)
- kind:bug (Indicates an unexpected problem or unintended behavior)
- os:windows (Happening specifically on Windows)
