feat: options and ChatCompletionRequest add property enable_thinking #2940


Open
wants to merge 1 commit into main

Conversation


@xuanmiss commented Apr 29, 2025

related issue: #2941

enable_thinking is used to control whether the Qwen3 model enables thinking mode.
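For illustration, a minimal usage sketch of how the option might be set from application code. The enableThinking(Boolean) builder method is an assumption based on the field added in this PR, and the other builder method names are only meant to reflect current Spring AI style, not confirmed API:

import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.openai.OpenAiChatModel;
import org.springframework.ai.openai.OpenAiChatOptions;

// Sketch only: assumes the builder exposes enableThinking(Boolean) for the new
// enable_thinking field; builder method names may differ between versions.
public class QwenThinkingExample {

    public static ChatResponse ask(OpenAiChatModel chatModel, String question) {
        OpenAiChatOptions options = OpenAiChatOptions.builder()
                .model("Qwen/Qwen3-8B")
                .temperature(0.7)
                .enableThinking(false) // disable Qwen3 thinking mode for this request
                .build();
        return chatModel.call(new Prompt(question, options));
    }
}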

Thank you for taking time to contribute this pull request!
You might have already read the [contributor guide][1], but as a reminder, please make sure to:

  • Sign the contributor license agreement
  • Rebase your changes on the latest main branch and squash your commits
  • Add/Update unit tests as needed
  • Run a build and make sure all tests pass prior to submission

… enable_thinking is used to control whether the Qwen3 model enables the thinking mode.

Signed-off-by: xuanmiss <[email protected]>
/**
 * Whether to enable the thinking mode
 */
private @JsonProperty("enable_thinking") Boolean enableThinking;
Member

I'm not sure what to do with these differences that are emerging, particularly in the reasoning models. This option is not part of the OpenAI API.

Maybe we can have a subclass of OpenAiChatOptions such as QwenAiChatOptions?
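For the sake of discussion, a minimal sketch of what such a subclass might look like, assuming OpenAiChatOptions can be extended; the class and accessor names are illustrative, not existing Spring AI API:

import com.fasterxml.jackson.annotation.JsonProperty;

import org.springframework.ai.openai.OpenAiChatOptions;

// Sketch only: keeps OpenAiChatOptions aligned with the official OpenAI API and
// moves the vendor-specific flag into a Qwen-specific subclass.
public class QwenAiChatOptions extends OpenAiChatOptions {

    /**
     * Whether to enable the Qwen3 thinking mode.
     */
    @JsonProperty("enable_thinking")
    private Boolean enableThinking;

    public Boolean getEnableThinking() {
        return this.enableThinking;
    }

    public void setEnableThinking(Boolean enableThinking) {
        this.enableThinking = enableThinking;
    }
}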

Contributor

How about using something like the template pattern? Beyond the shared OpenAI-compatible API, most models in general differ only in a few fields of the request and response objects.
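A rough, self-contained illustration of that idea in plain Java (not Spring AI code): the base class builds the request shape shared by OpenAI-compatible backends, and each vendor overrides a single hook for its few extra fields.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: the template method builds the shared request body and
// delegates the vendor-specific fields to a hook.
abstract class OpenAiCompatibleRequestBuilder {

    final Map<String, Object> build(String model, String userMessage) {
        Map<String, Object> request = new HashMap<>();
        request.put("model", model);
        request.put("messages", List.of(Map.of("role", "user", "content", userMessage)));
        addVendorSpecificFields(request); // hook for per-vendor differences
        return request;
    }

    protected void addVendorSpecificFields(Map<String, Object> request) {
        // default: no extra fields
    }
}

// Qwen variant adds the vLLM-style chat_template_kwargs wrapper for enable_thinking.
class QwenRequestBuilder extends OpenAiCompatibleRequestBuilder {

    @Override
    protected void addVendorSpecificFields(Map<String, Object> request) {
        request.put("chat_template_kwargs", Map.of("enable_thinking", false));
    }
}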

@apappascs
Contributor

Thank you for the contribution @xuanmiss. Could you please add some integration tests?

Given the documentation, it's not clear that this is the correct structure: https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes.

curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "Qwen/Qwen3-8B",
  "messages": [
    {"role": "user", "content": "Give me a short introduction to large language models."}
  ],
  "temperature": 0.7,
  "top_p": 0.8,
  "top_k": 20,
  "max_tokens": 8192,
  "presence_penalty": 1.5,
  "chat_template_kwargs": {"enable_thinking": false}
}'

As a temporary solution, you can add /think at the end of your prompt.
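If the structure in the vLLM documentation above is authoritative, the flag belongs inside a chat_template_kwargs object rather than as a top-level property. A hypothetical field declaration for that shape, mirroring the style of the field added in this PR (it assumes java.util.Map and Jackson's @JsonProperty are imported; the field name is illustrative):

/**
 * Extra keyword arguments forwarded to the chat template,
 * e.g. {"enable_thinking": false} for Qwen3 served by vLLM.
 */
private @JsonProperty("chat_template_kwargs") Map<String, Object> chatTemplateKwargs;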
