feat: options and ChatCompletionRequest add property enable_thinking #2940
base: main
Conversation
… enable_thinking is used to control whether the Qwen3 model enables the thinking mode. Signed-off-by: xuanmiss <[email protected]>
Force-pushed from 5b67970 to 0681646
```java
/**
 * Whether to enable the thinking mode
 */
private @JsonProperty("enable_thinking") Boolean enableThinking;
```
I'm not sure what to do with these differences emerging, particularly in the reasoning models. This option is not part of the OpenAI API.
Maybe we can have a subclass of OpenAiChatOptions such as QwenAiChatOptions?
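One possible shape of that subclass idea, sketched here with a stubbed-down parent (the real `OpenAiChatOptions` lives in Spring AI and carries many more fields; `QwenAiChatOptions` is just the hypothetical name from the comment above):

```java
// Minimal sketch of the subclass idea. OpenAiChatOptions is stubbed down
// here; the real class in Spring AI has many more fields and Jackson
// annotations. QwenAiChatOptions is a hypothetical name, not existing API.
class OpenAiChatOptions {
    private String model;
    private Double temperature;

    public String getModel() { return model; }
    public void setModel(String model) { this.model = model; }
    public Double getTemperature() { return temperature; }
    public void setTemperature(Double temperature) { this.temperature = temperature; }
}

// The Qwen-specific subclass adds only the non-standard field.
class QwenAiChatOptions extends OpenAiChatOptions {
    // Would be annotated @JsonProperty("enable_thinking") in the real class.
    private Boolean enableThinking;

    public Boolean getEnableThinking() { return enableThinking; }
    public void setEnableThinking(Boolean enableThinking) { this.enableThinking = enableThinking; }
}

public class Main {
    public static void main(String[] args) {
        QwenAiChatOptions options = new QwenAiChatOptions();
        options.setModel("Qwen/Qwen3-8B");
        options.setEnableThinking(true);
        System.out.println(options.getModel() + " enable_thinking=" + options.getEnableThinking());
    }
}
```

This keeps the OpenAI-standard options untouched while Qwen deployments opt into the extra field.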
How about utilizing something like the template pattern? Apart from the OpenAI-compatible APIs, most models in general have only a few differences in their request and response objects.
@apappascs can you elaborate more please?
Thank you for the contribution @xuanmiss. Could you please add some integration tests? Given the documentation, it's not so clear that this is the correct structure: https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes.
As a temporary solution, you can add …
This does seem a bit complicated. Although various model providers and deployment inference services like vLLM and SGLang are compatible with OpenAI's API format and protocol, there can still be parameter differences depending on the provider and model. For example, as shown in the documentation, the Qwen3 model deployed with vLLM places additional parameters in `chat_template_kwargs`. I tested an inference model API service from ModelScope with the following parameter structure:

```shell
curl --request POST \
  --url https://api-inference.modelscope.cn/v1/chat/completions \
  --header 'Authorization: Bearer token' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "Qwen/Qwen3-8B",
    "messages": [
      {
        "role": "user",
        "content": "Give me a short introduction to large language models."
      }
    ],
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,
    "stream": true,
    "max_tokens": 8192,
    "presence_penalty": 1.5,
    "enable_thinking": true
  }'
```

Therefore, we might need to consider how to handle this more appropriately, ensuring sufficient flexibility for both the caller and client sides.
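For comparison, the vLLM deployment described in the linked Qwen docs nests the flag under `chat_template_kwargs` rather than at the top level of the request body. A sketch of that shape (the local endpoint and model name are illustrative, not taken from this thread):

```shell
curl --request POST \
  --url http://localhost:8000/v1/chat/completions \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "Qwen/Qwen3-8B",
    "messages": [{"role": "user", "content": "Hello"}],
    "chat_template_kwargs": {"enable_thinking": false}
  }'
```

So the same model can expect `enable_thinking` in different places depending on how it is served, which is exactly the flexibility problem raised above.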
Why not continue spring-ai-qwen-spring-boot-starter?
We should add a way to extend request parameters for model calls; it could be a map.
Because most model vendors support OpenAI-format requests, but some specific request parameters are not part of the standard. With a field for extended request parameters, there would be no need to split out more model integrations.
That's definitely something that could be done (some models already have a map for extra params), but in most cases an extra request param also comes with an extra field or a different type in the response (sometimes dependent on the param, sometimes not).
If OpenAI also adds request parameters in the future, Spring AI, as an integration layer, will have to follow with updates and version bumps. You already know that different models may need extra parameters, and a model that needs them won't necessarily be covered by one of your supported starters. If every such model requires creating a dedicated starter to adapt it, do you think that is really necessary? Adding an extension Map to ChatOptions would cover this need for all models, so why build a dedicated starter for a specific model?
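The extension-map idea discussed above could look roughly like this. A minimal sketch using plain Java collections; the class and method names are hypothetical, and in Spring AI the body would be serialized with Jackson (e.g. via `@JsonAnyGetter`), which is omitted here:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of an "extra parameters" map on the options object.
// ExtendedChatOptions, extraParam and toRequestBody are illustrative
// names, not existing Spring AI API.
class ExtendedChatOptions {
    private String model;
    private final Map<String, Object> extraParams = new LinkedHashMap<>();

    ExtendedChatOptions model(String model) {
        this.model = model;
        return this;
    }

    // Any non-standard, provider-specific parameter goes into the map.
    ExtendedChatOptions extraParam(String key, Object value) {
        extraParams.put(key, value);
        return this;
    }

    // Standard fields first, then the provider-specific extras, so the
    // extras end up as top-level keys in the serialized JSON body.
    Map<String, Object> toRequestBody() {
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("model", model);
        body.putAll(extraParams);
        return body;
    }
}

public class Main {
    public static void main(String[] args) {
        Map<String, Object> body = new ExtendedChatOptions()
                .model("Qwen/Qwen3-8B")
                .extraParam("enable_thinking", true)
                .toRequestBody();
        System.out.println(body); // {model=Qwen/Qwen3-8B, enable_thinking=true}
    }
}
```

As noted in the reply above, this only solves the request side; provider-specific response fields would still need separate handling.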
related issue: #2941
enable_thinking is used to control whether the Qwen3 model enables the thinking mode.