Name and Version

611aa91

Operating systems

Linux

GGML backends

CUDA

Hardware

NVIDIA

Models

Qwen3

Problem description & steps to reproduce

I don't know if it's too soon, but I'm opening this issue to keep track of the problem.

The original Qwen3 template is not supported, but the bug can be reproduced with a modified template.

The Qwen3 template contains the following check (stripped down to the relevant section):

```jinja
{%- if loop.index0 > ns.last_query_index %}
    {%- if loop.last or (not loop.last and reasoning_content) %}
        KEEP REASONING TOKENS
```

Meaning that, in the common case, the reasoning tokens are kept when the last role is `assistant` and discarded when the last role is `user`.

The problem is that at the start of the turn, the following pseudo-code is executed:

The `diff` is not computed correctly, since the assistant message used in v1 has the thinking tokens preserved, while the same assistant message in v2 has the thinking tokens removed.

Relevant section of the code: `common/chat.cpp`, line 320 at commit 611aa91.
First Bad Commit
No response
Relevant log output