Support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client #13196
base: master
Conversation
Can you add chat_template_kwargs as a CLI argument as well?
I added it and tested it with the updated command; you might want to check the escaping of the double quotes.
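For reference, a hedged sketch of the client-side use (the `chat_template_kwargs` field name is taken from this PR; the model alias and message content here are just illustrative assumptions):

```python
import json

# Sketch of a request body for llama-server's OpenAI-compatible
# /v1/chat/completions endpoint. The top-level "chat_template_kwargs"
# object is forwarded to the jinja chat template (field name per this PR).
payload = {
    "model": "qwen3",  # assumed model alias
    "messages": [{"role": "user", "content": "Hello"}],
    "chat_template_kwargs": {"enable_thinking": False},
}
body = json.dumps(payload)
print(body)
```

On the command line the same object would be passed as a single JSON string to `--chat-template-kwargs`, which is where quote escaping becomes fiddly.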
Very useful for the Qwen3 series. +1 for this feature!
The --chat-template-kwargs option does not work from the CLI: `error: invalid argument: --chat-template-kwargs`
@ggerganov is there any reason why this PR has not been accepted and merged yet?
This PR is implemented only for llama-server and its webui. llama-cli has unresolved bugs that prevent me from enabling this feature there.
Hope you'll integrate it for the CLI environment soon, thanks!
It would be nice to have an enable_thinking checkbox or something like that in the llama.cpp webui too.
@celsowm Lack of eyes on this area would be my guess. With 438 open PRs (many obsolete), I've come to accept that I'll need to pull in the PRs of interest to me when building.
vLLM and SGLang had this feature the day Qwen3 was released. Meanwhile, many useful enhancement and fix PRs become obsolete simply because of merge delays in the llama.cpp community. Really sad about that.
coding standard: cosmetic changes Co-authored-by: Georgi Gerganov <[email protected]>
This is so necessary when dealing with Qwen3! Can't wait to see this merged and be able to use the latest version with this <3
FYI, now that #13573 is merged, the official template should work as expected and there's no need to use the modified one. We're one step closer to having proper support for Qwen3. One remaining thing is the correct handling of the ... part in previous assistant messages.
@matteoserva Sorry for only getting to review this now! First couple of thoughts:
- While being able to pass kwargs will be very useful in general (🎉), I think for the particular case of enable_thinking we will probably want to special-case it, since there are a few thinking models around, some of which force the thinking tag open (a common --disable-thinking flag could close their tags, and set `enable_thinking: false` for Qwen3). Building on `server`: streaming of tool calls and thoughts when `--jinja` is on (#12379) would make this easy, for instance: 506e712
- We'll want to pass the params even when there are tools (right now they are only set up in common_chat_params_init_without_tools). Ideally after the diffs PR goes in 😅
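A rough Python sketch (hypothetical helper and names, not llama.cpp code) of the special-casing described above: one common --disable-thinking flag that sets `enable_thinking: false` for Qwen3-style templates and closes the tag for models whose templates force it open:

```python
def apply_disable_thinking(template_family: str, kwargs: dict, prompt: str):
    """Hypothetical sketch of how a common --disable-thinking flag could act."""
    if template_family == "qwen3":
        # Qwen3's template honours an enable_thinking kwarg directly.
        kwargs = {**kwargs, "enable_thinking": False}
    else:
        # Some templates force the thinking tag open; close it immediately
        # so the model produces no reasoning tokens.
        prompt += "</think>\n\n"
    return kwargs, prompt

kw, _ = apply_disable_thinking("qwen3", {}, "")
print(kw)
```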
```diff
@@ -73,6 +74,7 @@ struct common_chat_templates_inputs {
     bool parallel_tool_calls = false;
     bool extract_reasoning = true;
     std::chrono::system_clock::time_point now = std::chrono::system_clock::now();
+    std::map<std::string, std::string> chat_template_kwargs;
```
This is a map from string keys to stringified JSON values. Why not just store the stringified top-level JSON? (Maybe name it `chat_additional_context_json` or `chat_template_kwargs_json`?)
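To make the question concrete, a small illustration (plain Python, not llama.cpp code) of the two representations being discussed: a per-key map of stringified values versus one top-level JSON string. The `budget` key is just an invented extra kwarg for the example; both forms decode to the same template context:

```python
import json

# 1) map from string keys to individually stringified JSON values
kwargs_map = {"enable_thinking": json.dumps(False), "budget": json.dumps(1024)}

# 2) the whole top-level object kept as a single JSON string
#    (roughly the suggested chat_template_kwargs_json)
kwargs_json = json.dumps({"enable_thinking": False, "budget": 1024})

# both yield the same values when injected into the template context
ctx_from_map = {k: json.loads(v) for k, v in kwargs_map.items()}
ctx_from_json = json.loads(kwargs_json)
print(ctx_from_map == ctx_from_json)
```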
Why not accept this PR first, to get the general "pass kwargs to chat template" feature, and then implement the …
This PR implements handling of additional jinja template parameters, used for example to set enable_thinking in Qwen3 models.
The official template is only partially compatible; I modified it to use only supported features. It's here:
https://pastebin.com/16ZpCLHk
https://pastebin.com/GGuTbFRc
It should be loaded with:
`llama-server --jinja --chat-template-file {template_file}`
It fixes #13160 and #13189.
Test it with:
```json
{"prompt":"\n<|im_start|>user\nGive me a short introduction to large language models.<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"}
```
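A quick sanity check (plain Python) that the test prompt above really pre-fills the assistant turn with an empty `<think>` block, which is what suppresses reasoning output in the Qwen3 template:

```python
# prompt string copied from the test request above
prompt = ("\n<|im_start|>user\nGive me a short introduction to large "
          "language models.<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n")
# the assistant turn ends with an empty, already-closed reasoning block
assert "<think>\n\n</think>" in prompt
print("ok")
```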