Insights: PaddlePaddle/FastDeploy
Overview
40 Pull requests merged by 22 people
- 【Feature】support qwen2 some func (#2740, merged Jul 8, 2025)
- [SOT] Remove BreakGraph with paddle.maximum (#2731, merged Jul 8, 2025)
- [Bug fix] fix compile bug when sm < 89 (#2738, merged Jul 8, 2025)
- [Optimize] Optimize tensorwise fp8 performance (#2729, merged Jul 7, 2025)
- [iluvatar_gpu] Adapt for iluvatar gpu (#2684, merged Jul 7, 2025)
- support FastDeploy version setting (#2725, merged Jul 7, 2025)
- remove redundant install of the fastdeploy whl (#2726, merged Jul 7, 2025)
- [RL] Check if the controller port is available (#2724, merged Jul 7, 2025)
- [Doc] Update eb45-0.3B minimum memory requirement (#2686, merged Jul 7, 2025)
- [LLM] support multi-node deploy (#2708, merged Jul 6, 2025)
- Modify XPU CI, test=model (#2721, merged Jul 6, 2025)
- fix bug. (#2718) (#2720, merged Jul 5, 2025)
- fix bug. (#2718, merged Jul 5, 2025)
- spec token map lazy. (#2715, merged Jul 4, 2025)
- [BugFix] fix paddle_git_commit_id error (#2714, merged Jul 4, 2025)
- add support for QWQ enable_thinking (#2706, merged Jul 4, 2025)
- [CI] Add validation for MTP and CUDAGraph (#2710, merged Jul 4, 2025)
- Add XPU CI, test=model (#2701, merged Jul 4, 2025)
- Extract eh_proj Layer from ParallelLMHead for MTP to Avoid Weight Transposition Issue (#2707, merged Jul 4, 2025)
- [feature] add fd whl version info (#2698, merged Jul 4, 2025)
- [RL] update reschedule finish reason (#2709, merged Jul 4, 2025)
- [MTP] Support chunked_prefill in speculative decoding (MTP) (#2705, merged Jul 4, 2025)
- [Doc] modify reasoning_output docs (#2696, merged Jul 4, 2025)
- add quick benchmark script (#2703, merged Jul 4, 2025)
- [feat] support fa3 backend for pd disaggregated (#2695, merged Jul 3, 2025)
- [Bug] fix logger format (#2689, merged Jul 3, 2025)
- [doc] update docs (#2692, merged Jul 3, 2025)
- [doc] update docs (#2690, merged Jul 3, 2025)
- [Sync] Update to latest code (#2679, merged Jul 3, 2025)
- add --force-reinstall --no-cache-dir when pip install fastdeploy*.whl (#2682, merged Jul 2, 2025)
- Update gh-pages.yml (#2680, merged Jul 2, 2025)
- add wint2 performance (#2673, merged Jul 2, 2025)
- Update CI test cases (#2671, merged Jul 2, 2025)
- update iluvatar gpu fastdeploy whl (#2675, merged Jul 2, 2025)
- fix ci.yml (#2665, merged Jul 1, 2025)
- 【Inference Optimize】Support ERNIE-4_5-300B-A47B-2BITS-Paddle model TP2/TP4 Inference (#2666, merged Jul 1, 2025)
- 【Docs】fix speculative docs (#2669, merged Jul 1, 2025)
- Update kunlunxin_xpu.md (#2662, merged Jul 1, 2025)
- 【Update Doc】update quantization doc (#2659, merged Jul 1, 2025)
- Update kunlunxin_xpu.md (#2657, merged Jul 1, 2025)
17 Pull requests opened by 16 people
- [WIP] optimize wint2 moe_group_gemm. (#2661, opened Jul 1, 2025)
- Feat/blackwell sm100 support (#2670, opened Jul 1, 2025)
- update iluvatar gpu fastdeploy whl (#2674, opened Jul 2, 2025)
- Add with_output version AppendAttention (#2694, opened Jul 3, 2025)
- [GCU] Support gcu platform (#2702, opened Jul 3, 2025)
- [feat] add loadtimequantization modelloader (#2711, opened Jul 4, 2025)
- [Stop Sequences] support stop sequences (#2712, opened Jul 4, 2025)
- [RL Feature] add rl qwen model support (#2713, opened Jul 4, 2025)
- Support using safetensors with paddle.MmapStorage to load model files (#2730, opened Jul 7, 2025)
- add precision check for ci (#2732, opened Jul 7, 2025)
- [SOT] Make custom_op dy&st unified (#2733, opened Jul 7, 2025)
- [draft] change rejection sampling topk=40 (#2734, opened Jul 7, 2025)
- [SOT] Enable SOT Dy2St in Multimodal Model (#2735, opened Jul 7, 2025)
- [Bug fix] Fixed the garbled text issues in Qwen3-8B (#2737, opened Jul 7, 2025)
- Opt wint2 (#2741, opened Jul 8, 2025)
- [Bug fix] fix the missing position args in expert_service.py (#2742, opened Jul 8, 2025)
- [Bug fix] fix attention rank init (#2743, opened Jul 8, 2025)
6 Issues closed by 6 people
- When running ernie-4.5-vl with FastDeploy, the [enable_thinking] parameter in the OpenAI configuration has no effect (#2727, closed Jul 7, 2025)
- Inference results for the PP-Vehicle model differ between FD and PaddleDetection (#2681, closed Jul 2, 2025)
- Support for CUDA 12.8 / Blackwell SM120 (#2656, closed Jul 2, 2025)
- Offline inference with the official docker image fails at 94% of model loading, possibly related to libnvidia-ml (#2667, closed Jul 1, 2025)
- P800 docker run reports an error (#2660, closed Jul 1, 2025)
- Is fastdeploy-2.0.0a0 only compatible with Paddle-3.1? (#2658, closed Jul 1, 2025)
13 Issues opened by 12 people
- ERNIE-4.5-VL-28B-A3B-Paddle hangs while loading, on both a single 4090 48G and dual 4090 48G (#2739, opened Jul 7, 2025)
- ERNIE-4.5-VL-424B-A47B-Paddle hangs while loading (#2723, opened Jul 6, 2025)
- Poor OpenAI API compatibility and some other issues (#2722, opened Jul 5, 2025)
- Deploying ernie-21B with fastdeploy (#2704, opened Jul 4, 2025)
- Feature Request: Add Support for max_completion_tokens Parameter (OpenAI API Deprecation) (#2697, opened Jul 3, 2025)
- Feature Request: FastDeploy Architecture Overview (#2691, opened Jul 3, 2025)
- Deploying ERNIE-4.5-VL-424B-A47B-Paddle on 8x H200 fails (#2683, opened Jul 2, 2025)
- ERNIE-4.5-300B-A47B-2Bits-Paddle dual-GPU deployment reports an error (#2678, opened Jul 2, 2025)
- Error during one-click build of FastDeploy (#2676, opened Jul 2, 2025)
- How to get logprobs when deploying an OpenAI-format server (#2672, opened Jul 1, 2025)
- Offline inference with the official docker image fails at 94% of model loading, possibly related to libnvidia-ml (#2668, opened Jul 1, 2025)
- Loading the int4-quantized ERNIE-4.5-VL-28B-A3B-Paddle succeeds on a single 4090 but fails on dual GPUs (#2663, opened Jul 1, 2025)
2 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Startup failure (#2655, commented on Jul 1, 2025 • 0 new comments)
- Error on startup when using the official image and documented steps (#2651, commented on Jul 1, 2025 • 0 new comments)