
Misc. bug: Qwen 3.0 "enable_thinking" parameter not working #13160


Open

celsowm opened this issue Apr 29, 2025 · 7 comments · May be fixed by #13196
Comments

@celsowm commented Apr 29, 2025

Name and Version

llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
version: 5199 (ced44be)
built with MSVC 19.41.34120.0 for x64

Operating systems

Windows 11

Which llama.cpp modules do you know to be affected?

llama-server

Command line

llama-server -m Qwen3-14B-Q5_K_M.gguf
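
For reference, a fuller launch line might look like this (a sketch; the --jinja flag, which makes llama-server use the model's embedded chat template, and the port are assumptions, not part of the original report):

llama-server -m Qwen3-14B-Q5_K_M.gguf --jinja --port 8080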

Problem description & steps to reproduce

The enable_thinking: false parameter has no effect at all when sent in a request to llama-server (despite appearing in Alibaba's examples).

SGLang and vLLM support this via "chat_template_kwargs":

https://qwen.readthedocs.io/en/latest/deployment/sglang.html#thinking-non-thinking-modes
https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes

curl http://localhost:30000/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "Qwen/Qwen3-8B",
  "messages": [
    {"role": "user", "content": "Give me a short introduction to large language models."}
  ],
  "temperature": 0.7,
  "top_p": 0.8,
  "top_k": 20,
  "max_tokens": 8192,
  "presence_penalty": 1.5,
  "chat_template_kwargs": {"enable_thinking": false}
}'

First Bad Commit

No response

Relevant log output

@gnusupport

I would also like to disable thinking in Qwen3.

@zhouxihong1 commented Apr 29, 2025

You can add "/think" or "/no_think" to the prompt to control thinking; the "enable_thinking" parameter is not working.

@WayneJin88888

# Second input with /no_think
user_input_2 = "Then, how many r's in blueberries? /no_think"

(P.S.: "/no think" works as well.)

# Third input with /think
user_input_3 = "Really? /think"
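
Putting these together, a minimal end-to-end sketch (assuming llama-server is running locally on port 8080 with its OpenAI-compatible API; the openai client package and the model name are assumptions, not something the thread specifies):

from openai import OpenAI

# llama-server exposes an OpenAI-compatible endpoint; the api_key is unused locally.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

# The /no_think soft switch asks Qwen3 to emit an empty <think> block
# instead of a full reasoning trace.
resp = client.chat.completions.create(
    model="Qwen3-14B",
    messages=[{"role": "user", "content": "How many r's in blueberries? /no_think"}],
)
print(resp.choices[0].message.content)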

@woshitoutouge

> You can add "/think" or "/no_think" to the prompt to control thinking; the "enable_thinking" parameter is not working.

Right, but it affects the output format.

@snichols (Contributor)

As a workaround, add the following to the beginning of your assistant message:

<think>

</think>

This is how enable_thinking is implemented in the jinja template.
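
For context, the tail of Qwen3's chat template does roughly the following (paraphrased from memory, not a verbatim copy of the template):

{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
    {%- if enable_thinking is defined and enable_thinking is false %}
        {#- Thinking disabled: pre-fill an empty think block so the model skips reasoning. #}
        {{- '<think>\n\n</think>\n\n' }}
    {%- endif %}
{%- endif %}

So manually prepending the empty <think></think> pair reproduces what the template would emit with enable_thinking set to false.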

@celsowm (Author) commented Apr 29, 2025

I hope "chat_template_kwargs": {"enable_thinking": false} gets implemented in llama.cpp too.
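
If llama.cpp adopted the same convention, the request would presumably mirror the SGLang example above (hypothetical until #13196 or equivalent is merged; port 8080 is llama-server's default):

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "messages": [
    {"role": "user", "content": "Give me a short introduction to large language models."}
  ],
  "chat_template_kwargs": {"enable_thinking": false}
}'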

@createthis commented May 8, 2025

+1. It would be awesome to merge #13196.
