
Eval bug: llama-cli, Qwen3 jinja template will break CLI multiturn conversation #13404


Open

matteoserva (Contributor) opened this issue May 9, 2025 · 0 comments

Name and Version

611aa91

Operating systems

Linux

GGML backends

CUDA

Hardware

NVIDIA

Models

Qwen3

Problem description & steps to reproduce

I don't know if it's too early, but I'm opening this issue to keep track of the problem.
The original Qwen3 template is not supported yet, but the bug can be reproduced with a modified template.

The Qwen3 template contains the following check (stripped down to the relevant section):

{%- if loop.index0 > ns.last_query_index %}
    {%- if loop.last or (not loop.last and reasoning_content) %}
        {# keep the reasoning tokens #}
    {%- endif %}
{%- endif %}

In the common case this means the reasoning tokens are kept when the last message is from the assistant and discarded when the last message is from the user.

The problem is that at the start of each turn, the following pseudo-code is executed:

- messages.append(user_message)
- fmt_past_msg = apply_chat_template(messages)
- messages.append(assistant_message)
- fmt_new_msg = apply_chat_template(messages)
- diff = fmt_new_msg - fmt_past_msg

The diff is not computed correctly: the assistant message in fmt_past_msg has its thinking tokens preserved, while the same message in fmt_new_msg has them removed, so fmt_past_msg is no longer a byte-for-byte prefix of fmt_new_msg.

Relevant section of the code:

ss << fmt_new_msg.substr(fmt_past_msg.size(), fmt_new_msg.size() - fmt_past_msg.size());

First Bad Commit

No response

Relevant log output

std::string common_chat_format_single(...) {
[...]
fmt_past_msg = common_chat_templates_apply(tmpls, inputs).prompt;
[...]
inputs.messages.push_back(new_msg);
[...]
auto fmt_new_msg = common_chat_templates_apply(tmpls, inputs).prompt;
// get the diff part
ss << fmt_new_msg.substr(fmt_past_msg.size(), fmt_new_msg.size() - fmt_past_msg.size());