Add DeepSeek-R1-0528 function call chat template #18874
Conversation
I use the following command to start the server,
and curl the server with this body,
but I got the following result:
It seems like the tool parser failed to construct the tool_calls parameters. Have I used the wrong command?
@markluofd you can add tool_choice="required" in your request.
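For reference, a minimal sketch of such a request as a curl call (the port, served model name, and tool definition are illustrative placeholders, not values from this thread):

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "DeepSeek-R1",
    "messages": [
      {"role": "user", "content": "What is the weather like in Hangzhou?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a city",
          "parameters": {
            "type": "object",
            "properties": {
              "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"]
          }
        }
      }
    ],
    "tool_choice": "required"
  }'
```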
Failed too; it seems the response is not in JSON format 😂
Response:
I extracted the prompt from the vLLM log as:
Seems reasonable to me, thanks
I'm encountering an issue when trying to use the DeepSeek-R1-0528-Qwen3-8B model. It appears unsupported, returning Error 400:
Thanks for adding the support, @Xu-Wenqing. Btw, could you paste the test results in the PR description? Also, do you want to include the updated version in this PR? It's fine to have another one to include the updated template.
I'm experiencing the same with DeepSeek-R1-0528-Qwen3-8B as @NaiveYan is. IDK if it helps, but here's an ollama chat template that has tool calling working with this model: https://ollama.com/okamototk/deepseek-r1:8b/blobs/e94a8ecb9327
@houseroad @wukaixingxp @markluofd @NaiveYan @alllexx88 Sorry for the late reply. The past few days were the Chinese Dragon Boat Festival, so I didn't check messages. I'll try out the chat template on some test datasets again. Meanwhile, it seems DeepSeek updated the DeepSeek-R1-0528 chat template (https://huggingface.co/deepseek-ai/DeepSeek-R1-0528/commit/4236a6af538feda4548eca9ab308586007567f52#d2h-846292). I will also update the template here.
I start the server with:
vllm serve /deepseek-ai/DeepSeek-R1-0528-Qwen3-8B --tensor-parallel-size 8 --host 0.0.0.0 --port 10001 --api-key none --rope-scaling '{"factor": 2.0, "original_max_position_embeddings": 32768, "rope_type": "yarn"}' --gpu-memory-utilization 0.9 --enable-reasoning --reasoning-parser deepseek_r1 --guided_decoding_backend guidance --enable-auto-tool-choice --tool-call-parser deepseek_v3 --chat-template /home/ubuntu/wzr/LLM-MODELS/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B/tool_chat_template_deepseekr1.jinja --served-model-name DeepSeek-R1
If I use a langchain tool call agent, it gets the following error:
Error code: 400 - {'object': 'error', 'message': 'DeepSeek-V3 Tool parser could not locate tool call start/end tokens in the tokenizer! None', 'type': 'BadRequestError', 'param': None, 'code': 400}
I tried --tool-call-parser deepseek_v3 --chat-template examples/tool_chat_template_deepseekr1.jinja and got Error code: 400 - {'object': 'error', 'message': 'DeepSeek-V3 Tool parser could not locate tool call start/end tokens in the tokenizer! None', 'type': 'BadRequestError', 'param': None, 'code': 400}. If I use --tool-call-parser hermes, the vLLM backend shows: The following fields were present in the request but ignored: {'function_call'}. I am using a langchain agent to make tool calls; QwQ-32B and the Qwen3 series work fine for me.
docs/features/tool_calling.md
* `deepseek-ai/DeepSeek-V3-0324`
* `deepseek-ai/DeepSeek-R1-0528`
How would this look in the docs?
Suggested change:
* `deepseek-ai/DeepSeek-V3-0324` (`--tool-call-parser deepseek_v3 --chat-template examples/tool_chat_template_deepseekv3.jinja`)
* `deepseek-ai/DeepSeek-R1-0528` (`--tool-call-parser deepseek_v3 --chat-template examples/tool_chat_template_deepseekr1.jinja`)
@hmellor Updated the markdown file.
@houseroad Updated the chat template and added test results in the description.
Thanks!
Hi,
Has anybody had success using this model with this PR?
I succeeded with tool calling. Maybe you did not put the template file in the right path.
Well, that's an absolute path and I can read the file. Are you using the same model? I am running vLLM 0.9.0.1 on an NVIDIA B200, if that helps...
I use Docker: I put the template inside the container and start the model with docker run, and it works. You can follow this bash command:
docker run -d \
--name vllm-deepseek-r1 \
--restart unless-stopped \
--runtime=nvidia \
--health-cmd="wget -qO- http://localhost:12345/v1/models >/dev/null 2>&1 || exit 1" \
--health-interval=30s \
--health-timeout=5s \
--health-retries=3 \
--health-start-period=300s \
--gpus all \
-p 12345:12345 \
--ipc=host \
--log-driver json-file \
--log-opt max-size=100m \
--log-opt max-file=3 \
-v /data1/models:/data1/models \
-v /data/vllm-cache:/data/vllm-cache \
-v /data/app/vllm/log:/workspace/logs \
-v /data/vllm-extra/tool_chat_template_deepseekr1.jinja:/vllm-workspace/examples/tool_chat_template_deepseekr1.jinja:ro \
-e VLLM_CACHE_ROOT=/data/vllm-cache \
-e VLLM_WORKER_MULTIPROC_METHOD=spawn \
-e VLLM_MARLIN_USE_ATOMIC_ADD=1 \
-e HF_HUB_OFFLINE=1 \
-e VLLM_USE_MODELSCOPE=true \
-e OMP_NUM_THREADS=1 \
-e VLLM_USE_V1=1 \
vllm/vllm-openai:latest \
--host 0.0.0.0 \
--port 12345 \
--model /data1/models/DeepSeek-R1-0528 \
--served-model-name deepseek-reasoning \
--tensor-parallel-size 8 \
--gpu-memory-utilization 0.98 \
--max-model-len 131071 \
--max-seq-len-to-capture 8192 \
--max-num-seqs 16 \
--enable-chunked-prefill \
--enable-prefix-caching \
--enable-auto-tool-choice \
--tool-call-parser deepseek_v3 \
--chat-template examples/tool_chat_template_deepseekr1.jinja \
--trust-remote-code
Is this test using BFCL FC or Prompt mode?
Hi all, it seems this PR was merged even though there was still ambiguity around its validity (see the comments above). When I run the following command: vllm serve cognitivecomputations/DeepSeek-R1-0528-AWQ, the tool call JSON only gets parsed from the model response about 50% of the time. See below for an example of a false output:
As you can see, when using this updated chat template and the recommended tool parser, we still get an empty function_call/tool_call in the response. Can anyone recommend any solutions? Thanks.
The DeepSeek-R1-0528 model supports function calling; this PR adds a function call chat template.
Usage:
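A minimal serving sketch based on the flags documented in this PR (adjust the model path, parallelism, and memory flags for your own setup):

```bash
vllm serve deepseek-ai/DeepSeek-R1-0528 \
  --enable-auto-tool-choice \
  --tool-call-parser deepseek_v3 \
  --chat-template examples/tool_chat_template_deepseekr1.jinja
```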
Function Call Test
Use the Berkeley Function Calling Leaderboard (BFCL) to evaluate the function call template.
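The results below are from BFCL's "simple" category. For reference, a rough sketch of how such a run is typically invoked with the BFCL CLI from the gorilla repository; the subcommand names and flags are assumptions here and may differ between BFCL versions, and the model endpoint must be configured per the BFCL docs:

```bash
# Assumed BFCL CLI usage; verify against the berkeley-function-call-leaderboard README
bfcl generate --model DeepSeek-R1-0528 --test-category simple   # produce model responses
bfcl evaluate --model DeepSeek-R1-0528 --test-category simple   # score them (writes data_overall.csv)
```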
Evaluation Result:
🦍 Model: DeepSeek-R1-0528
🔍 Running test: simple
✅ Test completed: simple. 🎯 Accuracy: 0.9325
Number of models evaluated: 100%|███████████████████████████████████████████| 1/1 [00:00<00:00, 41.24it/s]
📈 Aggregating data to generate leaderboard score table...
🏁 Evaluation completed. See /Users/xuwenqing/function_call_eval/score/data_overall.csv for overall evaluation results on BFCL V3.