
Misc. bug: Qwen 3.0 "enable_thinking" parameter not working #13160


Open

celsowm opened this issue Apr 29, 2025 · 7 comments · May be fixed by #13196
Comments

@celsowm commented Apr 29, 2025

Name and Version

llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
version: 5199 (ced44be)
built with MSVC 19.41.34120.0 for x64

Operating systems

Windows 11

Which llama.cpp modules do you know to be affected?

llama-server

Command line

llama-server -m Qwen3-14B-Q5_K_M.gguf
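
For reference, a fuller launch line might look like this (a sketch; the --jinja flag, which makes llama-server use the model's embedded chat template, and the port are assumptions, not part of the original report):

llama-server -m Qwen3-14B-Q5_K_M.gguf --jinja --port 8080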

Problem description & steps to reproduce

The enable_thinking: false parameter has no effect at all when sent in a request to llama-server (despite appearing in Alibaba's examples).

SGLang and vLLM support this via "chat_template_kwargs":

https://qwen.readthedocs.io/en/latest/deployment/sglang.html#thinking-non-thinking-modes
https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes

curl http://localhost:30000/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "Qwen/Qwen3-8B",
  "messages": [
    {"role": "user", "content": "Give me a short introduction to large language models."}
  ],
  "temperature": 0.7,
  "top_p": 0.8,
  "top_k": 20,
  "max_tokens": 8192,
  "presence_penalty": 1.5,
  "chat_template_kwargs": {"enable_thinking": false}
}'

First Bad Commit

No response

Relevant log output

@gnusupport

I would also like to disable thinking in Qwen3.

@zhouxihong1 commented Apr 29, 2025

You can add "/think" or "/no_think" to the prompt to control thinking; the "enable_thinking" parameter is not working.

@WayneJin88888

# Second input with /no_think
user_input_2 = "Then, how many r's in blueberries? /no_think"

(P.S.: "/no think" works as well.)

# Third input with /think
user_input_3 = "Really? /think"
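
Putting these together, a minimal end-to-end sketch (assuming llama-server is running locally on port 8080 with its OpenAI-compatible API; the openai client package and the model name are assumptions, not something the thread specifies):

from openai import OpenAI

# llama-server exposes an OpenAI-compatible endpoint; the api_key is unused locally.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

# The /no_think soft switch asks Qwen3 to emit an empty <think> block
# instead of a full reasoning trace.
resp = client.chat.completions.create(
    model="Qwen3-14B",
    messages=[{"role": "user", "content": "How many r's in blueberries? /no_think"}],
)
print(resp.choices[0].message.content)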

@woshitoutouge

> You can add "/think" or "/no_think" to the prompt to control thinking; the "enable_thinking" parameter is not working.

Right, but it affects the output format.

@snichols (Contributor)

As a workaround, add the following to the beginning of your assistant message:

<think>

</think>

This is how enable_thinking is implemented in the jinja template.
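
For context, the tail of Qwen3's chat template does roughly the following (paraphrased from memory, not a verbatim copy of the template):

{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
    {%- if enable_thinking is defined and enable_thinking is false %}
        {#- Thinking disabled: pre-fill an empty think block so the model skips reasoning. #}
        {{- '<think>\n\n</think>\n\n' }}
    {%- endif %}
{%- endif %}

So manually prepending the empty <think></think> pair reproduces what the template would emit with enable_thinking set to false.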

@celsowm (Author) commented Apr 29, 2025

I hope "chat_template_kwargs": {"enable_thinking": false} gets implemented in llama.cpp too.
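
If llama.cpp adopted the same convention, the request would presumably mirror the SGLang example above (hypothetical until #13196 or equivalent is merged; port 8080 is llama-server's default):

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "messages": [
    {"role": "user", "content": "Give me a short introduction to large language models."}
  ],
  "chat_template_kwargs": {"enable_thinking": false}
}'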

@createthis commented May 8, 2025

+1. It would be awesome to merge #13196.
