feat: options and ChatCompletionRequest add property enable_thinking #2940


Open
wants to merge 1 commit into main

Conversation


@xuanmiss commented Apr 29, 2025

related issue: #2941

enable_thinking is used to control whether the Qwen3 model enables thinking mode.
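For illustration, a minimal usage sketch of how the option might be set from application code. The enableThinking(Boolean) builder method is an assumption based on the field added in this PR, and the other builder method names are only meant to reflect current Spring AI style, not confirmed API:

import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.openai.OpenAiChatModel;
import org.springframework.ai.openai.OpenAiChatOptions;

// Sketch only: assumes the builder exposes enableThinking(Boolean) for the new
// enable_thinking field; builder method names may differ between versions.
public class QwenThinkingExample {

    public static ChatResponse ask(OpenAiChatModel chatModel, String question) {
        OpenAiChatOptions options = OpenAiChatOptions.builder()
                .model("Qwen/Qwen3-8B")
                .temperature(0.7)
                .enableThinking(false) // disable Qwen3 thinking mode for this request
                .build();
        return chatModel.call(new Prompt(question, options));
    }
}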

Thank you for taking time to contribute this pull request!
You might have already read the [contributor guide][1], but as a reminder, please make sure to:

  • Sign the contributor license agreement
  • Rebase your changes on the latest main branch and squash your commits
  • Add/Update unit tests as needed
  • Run a build and make sure all tests pass prior to submission

… enable_thinking is used to control whether the Qwen3 model enables the thinking mode.

Signed-off-by: xuanmiss <[email protected]>
/**
 * Whether to enable the thinking mode
 */
private @JsonProperty("enable_thinking") Boolean enableThinking;
Member

I'm not sure what to do with these differences that are emerging, particularly in the reasoning models. This option is not part of the OpenAI API.

Maybe we can have a subclass of OpenAiChatOptions such as QwenAiChatOptions?
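For the sake of discussion, a minimal sketch of what such a subclass might look like, assuming OpenAiChatOptions can be extended; the class and accessor names are illustrative, not existing Spring AI API:

import com.fasterxml.jackson.annotation.JsonProperty;

import org.springframework.ai.openai.OpenAiChatOptions;

// Sketch only: keeps OpenAiChatOptions aligned with the official OpenAI API and
// moves the vendor-specific flag into a Qwen-specific subclass.
public class QwenAiChatOptions extends OpenAiChatOptions {

    /**
     * Whether to enable the Qwen3 thinking mode.
     */
    @JsonProperty("enable_thinking")
    private Boolean enableThinking;

    public Boolean getEnableThinking() {
        return this.enableThinking;
    }

    public void setEnableThinking(Boolean enableThinking) {
        this.enableThinking = enableThinking;
    }
}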

Contributor

How about using something like the template pattern? Beyond the shared OpenAI-compatible API, most models in general differ only in a few fields of the request and response objects.
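A rough, self-contained illustration of that idea in plain Java (not Spring AI code): the base class builds the request shape shared by OpenAI-compatible backends, and each vendor overrides a single hook for its few extra fields.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: the template method builds the shared request body and
// delegates the vendor-specific fields to a hook.
abstract class OpenAiCompatibleRequestBuilder {

    final Map<String, Object> build(String model, String userMessage) {
        Map<String, Object> request = new HashMap<>();
        request.put("model", model);
        request.put("messages", List.of(Map.of("role", "user", "content", userMessage)));
        addVendorSpecificFields(request); // hook for per-vendor differences
        return request;
    }

    protected void addVendorSpecificFields(Map<String, Object> request) {
        // default: no extra fields
    }
}

// Qwen variant adds the vLLM-style chat_template_kwargs wrapper for enable_thinking.
class QwenRequestBuilder extends OpenAiCompatibleRequestBuilder {

    @Override
    protected void addVendorSpecificFields(Map<String, Object> request) {
        request.put("chat_template_kwargs", Map.of("enable_thinking", false));
    }
}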

@apappascs
Contributor

Thank you for the contribution @xuanmiss. Could you please add some integration tests?

Given the documentation, it's not clear that this is the correct structure: https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes.

curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "Qwen/Qwen3-8B",
  "messages": [
    {"role": "user", "content": "Give me a short introduction to large language models."}
  ],
  "temperature": 0.7,
  "top_p": 0.8,
  "top_k": 20,
  "max_tokens": 8192,
  "presence_penalty": 1.5,
  "chat_template_kwargs": {"enable_thinking": false}
}'

As a temporary solution, you can add /think at the end of your prompt.
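If the structure in the vLLM documentation above is authoritative, the flag belongs inside a chat_template_kwargs object rather than as a top-level property. A hypothetical field declaration for that shape, mirroring the style of the field added in this PR (it assumes java.util.Map and Jackson's @JsonProperty are imported; the field name is illustrative):

/**
 * Extra keyword arguments forwarded to the chat template,
 * e.g. {"enable_thinking": false} for Qwen3 served by vLLM.
 */
private @JsonProperty("chat_template_kwargs") Map<String, Object> chatTemplateKwargs;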
