
[Inference Optimize] Update qwen2_5vl inference optimization #1171

Open
wants to merge 5 commits into base: develop
Changes from 1 commit
update name
chang-wenbin committed Apr 2, 2025
commit 9254cb4b83d797face5750e2284547a37281d10b
6 changes: 3 additions & 3 deletions deploy/qwen2_5_vl/README.md
@@ -60,7 +60,7 @@ python deploy/qwen2_5_vl/qwen2_5_vl_infer.py \
--inference_model True \
--mode dynamic \
--dtype bfloat16 \
- --enable_stream_output False \
+ --output_via_mq False \
--benchmark True
```

@@ -82,7 +82,7 @@ python deploy/qwen2_5_vl/qwen2_5_vl_infer.py \
--inference_model True \
--mode dynamic \
--dtype bfloat16 \
- --enable_stream_output False \
+ --output_via_mq False \
--quant_type "weight_only_int8" \
--benchmark True
```
@@ -105,7 +105,7 @@ python -m paddle.distributed.launch --gpus "0,1,2,3" deploy/qwen2_5_vl/qwen2_5_v
--mode dynamic \
--append_attn 1 \
--dtype bfloat16 \
- --enable_stream_output False \
+ --output_via_mq False \
--benchmark True
```

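The rename above swaps `--enable_stream_output` for `--output_via_mq` in every documented launch command, with boolean values passed as the strings `True`/`False`. A minimal sketch of how such a flag could be consumed in an argparse-based entry point; the parser below is illustrative only and is not the actual `qwen2_5_vl_infer.py` implementation:

```python
import argparse

def str2bool(value: str) -> bool:
    # Accept the "True"/"False" strings used in the launch commands above.
    if value.lower() in ("true", "1", "yes"):
        return True
    if value.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")

parser = argparse.ArgumentParser(description="illustrative flag parsing only")
# Renamed flag from this PR: output is delivered via a message queue
# rather than streamed (semantics assumed from the name).
parser.add_argument("--output_via_mq", type=str2bool, default=True)
parser.add_argument("--benchmark", type=str2bool, default=False)

args = parser.parse_args(["--output_via_mq", "False", "--benchmark", "True"])
print(args.output_via_mq, args.benchmark)  # → False True
```

Using `type=str2bool` rather than `action="store_true"` is what lets the shell scripts pass explicit `True`/`False` values.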
7 changes: 4 additions & 3 deletions deploy/qwen2_5_vl/scripts/qwen2_5_vl.sh
@@ -12,6 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

+ export PYTHONPATH=/root/paddlejob/workspace/env_run/output/changwenbin/PaddleMIX/PaddleNLP

export CUDA_VISIBLE_DEVICES=2
export USE_FASTER_TOP_P_SAMPLING=1
@@ -33,7 +34,7 @@ python deploy/qwen2_5_vl/qwen2_5_vl_infer.py \
--mode dynamic \
--append_attn 1 \
--dtype bfloat16 \
- --enable_stream_output False \
+ --output_via_mq False \
--benchmark True


@@ -53,7 +54,7 @@ python deploy/qwen2_5_vl/qwen2_5_vl_infer.py \
# --inference_model True \
# --mode dynamic \
# --dtype bfloat16 \
- # --enable_stream_output False \
+ # --output_via_mq False \
# --quant_type "weight_only_int8" \
# --benchmark True

@@ -75,5 +76,5 @@ python deploy/qwen2_5_vl/qwen2_5_vl_infer.py \
# --mode dynamic \
# --append_attn 1 \
# --dtype bfloat16 \
- # --enable_stream_output False \
+ # --output_via_mq False \
# --benchmark True
2 changes: 1 addition & 1 deletion deploy/qwen2_vl/README.md
@@ -50,7 +50,7 @@ python deploy/qwen2_vl/single_image_infer.py\
--inference_model True \
--mode dynamic \
--dtype bfloat16 \
- --enable_stream_output False \
+ --output_via_mq False \
--benchmark True

### 3.2. High-performance inference with text & video input
7 changes: 4 additions & 3 deletions deploy/qwen2_vl/scripts/qwen2_vl.sh
@@ -12,6 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

+ export PYTHONPATH=/root/paddlejob/workspace/env_run/output/changwenbin/PaddleMIX/PaddleNLP

export CUDA_VISIBLE_DEVICES=0
# fp16 high-performance inference
@@ -29,7 +30,7 @@ python deploy/qwen2_vl/single_image_infer.py\
--inference_model True \
--mode dynamic \
--dtype bfloat16 \
- --enable_stream_output False \
+ --output_via_mq False \
--benchmark True


@@ -49,7 +50,7 @@ python deploy/qwen2_vl/single_image_infer.py\
# --inference_model True \
# --mode dynamic \
# --dtype bfloat16 \
- # --enable_stream_output False \
+ # --output_via_mq False \
# --quant_type "weight_only_int8" \
# --benchmark True

@@ -69,5 +70,5 @@ python deploy/qwen2_vl/single_image_infer.py\
# --inference_model True \
# --mode dynamic \
# --dtype bfloat16 \
- # --enable_stream_output False \
+ # --output_via_mq False \
# --benchmark True
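Every README and script in this change switches to the new flag name, so an invocation that still passes `--enable_stream_output` would no longer be recognized. A hypothetical compatibility shim (not part of this PR, and not in the PaddleMIX codebase) could rewrite the old flag before parsing:

```python
import warnings

def migrate_argv(argv: list[str]) -> list[str]:
    """Map the pre-rename flag onto the new name so older launch
    scripts keep working. Illustrative only; names follow this PR."""
    migrated = []
    for arg in argv:
        if arg == "--enable_stream_output" or arg.startswith("--enable_stream_output="):
            warnings.warn("--enable_stream_output is renamed to --output_via_mq")
            arg = arg.replace("--enable_stream_output", "--output_via_mq", 1)
        migrated.append(arg)
    return migrated

print(migrate_argv(["--enable_stream_output", "False", "--benchmark", "True"]))
# → ['--output_via_mq', 'False', '--benchmark', 'True']
```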