Name and Version

611aa91

Operating systems

Linux

GGML backends

CUDA

Hardware

NVIDIA

Models

Qwen3

Problem description & steps to reproduce

I don't know if it's too soon, but I'm opening this issue to keep track of the problem.

The original Qwen3 template is not supported, but the bug can be reproduced with a modified template.

The Qwen3 template contains the following check (stripped down to the relevant section):

```jinja
{%- if loop.index0 > ns.last_query_index %}
    {%- if loop.last or (not loop.last and reasoning_content) %}
        KEEP REASONING TOKENS
```

Meaning that, in the common case, the reasoning tokens are kept when the last role is `assistant` and discarded when the last role is `user`.

The problem is that at the start of the turn, the following pseudo-code is executed:

The `diff` is not computed correctly, since the assistant message used in v1 has the thinking tokens preserved, while the same assistant message in v2 has the thinking tokens removed.

Relevant section of the code: `common/chat.cpp`, line 320 at commit 611aa91.
First Bad Commit
No response
Relevant log output