Misc. bug: Qwen 3.0 "enable_thinking" parameter not working #13160
Comments
I would also like to disable thinking in Qwen3.
You can add "/think" or "/no_think" to the prompt to control thinking; the "enable_thinking" parameter is not working.
Second input with /no_think.
(PS: "/no think" is working as well.)
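The soft switch described above can be sketched as a helper that appends the control token to the user message (a minimal sketch; the message content and helper name are illustrative, not part of any API):

```python
# Sketch: toggling Qwen3 thinking per turn via the "/think" / "/no_think"
# soft switch, which is simply appended to the user message text.
# The helper name and message content are illustrative.
def with_soft_switch(user_text: str, thinking: bool) -> dict:
    suffix = " /think" if thinking else " /no_think"
    return {"role": "user", "content": user_text + suffix}

msg = with_soft_switch("What is 2+2?", thinking=False)
print(msg["content"])  # → "What is 2+2? /no_think"
```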
Right, but it affects the output format.
As a workaround, add the following to the beginning of your assistant message:
This is how enable_thinking is implemented in the Jinja template.
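The mechanism can be illustrated with a minimal Jinja template rendered via jinja2: when the flag is off, the template pre-fills an empty think block in the assistant turn. This is a simplified sketch, not the exact Qwen3 template shipped with the model.

```python
# Simplified sketch of how an enable_thinking flag can be handled in a
# chat template: when False, an empty <think> block is pre-filled at the
# start of the assistant turn. NOT the exact Qwen3 template.
from jinja2 import Template

template = Template(
    "<|im_start|>assistant\n"
    "{% if not enable_thinking %}<think>\n\n</think>\n\n{% endif %}"
)

print(template.render(enable_thinking=False))
```

Rendering with `enable_thinking=False` emits the empty `<think>` block, so the model treats its reasoning as already finished; with `True`, the block is omitted and the model generates its own thinking.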
+1. It would be awesome to merge #13196.
Name and Version
llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
version: 5199 (ced44be)
built with MSVC 19.41.34120.0 for x64
Operating systems
Windows 11
Which llama.cpp modules do you know to be affected?
llama-server
Command line
Problem description & steps to reproduce
The parameter
enable_thinking: false
has no effect at all when sent in a request to llama-server (despite appearing in Alibaba's examples). SGLang and vLLM support this via "chat_template_kwargs":
https://qwen.readthedocs.io/en/latest/deployment/sglang.html#thinking-non-thinking-modes
https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes
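For comparison, this is roughly the request body those servers accept, following the Qwen deployment docs linked above. A minimal sketch: the model name is a placeholder, and the payload would be POSTed to an OpenAI-compatible /v1/chat/completions endpoint.

```python
# Sketch of the request body SGLang/vLLM accept for disabling thinking
# via "chat_template_kwargs". Model name is a placeholder; the payload
# would be POSTed to an OpenAI-compatible /v1/chat/completions endpoint.
import json

payload = {
    "model": "Qwen/Qwen3-8B",
    "messages": [
        {"role": "user", "content": "Give me a short introduction to LLMs."}
    ],
    "chat_template_kwargs": {"enable_thinking": False},
}

print(json.dumps(payload, indent=2))
```

The report above is that llama-server silently ignores this key instead of forwarding it to the chat template.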
First Bad Commit
No response
Relevant log output