[BugFix] Update python to python3 calls for image; fix prefix & input calculations. #21391
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a reduced set of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Code Review

This pull request correctly fixes a bug in the auto_tune.sh script where the total number of tokens for benchmarking was miscalculated when using prefix caching. The changes to adjust input_len and prefix_len are sound. I've provided one comment to make the calculation more robust against integer-division truncation errors. The update to use python3 is also a good improvement.
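For reference, the truncation concern can be addressed by computing the prefix length once and deriving the adjusted input length by subtraction, so the two always sum exactly to the requested total. A minimal sketch, assuming variable names like INPUT_LEN and MIN_CACHE_HIT_PCT (these names and the exact form are assumptions, not necessarily the PR's diff):

```bash
# Compute the prefix once, then derive the remainder by subtraction so that
# prefix_len + adjusted_input_len == INPUT_LEN even when the division truncates.
prefix_len=$(( INPUT_LEN * MIN_CACHE_HIT_PCT / 100 ))
adjusted_input_len=$(( INPUT_LEN - prefix_len ))
```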
Thanks for catching this issue! Left a nit.
… calculations Signed-off-by: Eric Hanley <[email protected]>
Make adjusted_input_len calculation more robust Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Eric Hanley <[email protected]>
Thanks for the fix, LGTM.
LGTM!
Purpose
Fix an issue in the auto_tune.sh script where prefix_len and input_len were being passed to benchmark_serving.py incorrectly.
Also update python calls to python3 to stay in step with the vllm/vllm-openai:latest image.
When attempting to execute with the following parameters:
The following issue appears in the vllm_... logs:
Per the current benchmark_serving.py documentation, input_len and prefix_len need to be calculated differently before being passed to benchmark_serving.py, as done in this PR.
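To make the arithmetic concrete, here is a worked example with illustrative numbers (invented for illustration, not taken from this PR's runs):

```bash
# Illustration only: with a 4000-token target and a 60% cache-hit target,
# the prefix should cover 2400 tokens and the random portion the remaining 1600.
INPUT_LEN=4000
MIN_CACHE_HIT_PCT=60
prefix_len=$(( INPUT_LEN * MIN_CACHE_HIT_PCT / 100 ))   # 2400
adjusted_input_len=$(( INPUT_LEN - prefix_len ))        # 1600
# Passing the full 4000 as the input length on top of the 2400-token prefix
# would instead produce 6400-token prompts, skewing the benchmark.
```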
Test Plan
Update the code to calculate adjusted_input_len and pass it to benchmark_serving.py as shown:
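Since the exact snippet isn't reproduced above, the following is a rough sketch of how the adjusted values could be forwarded to benchmark_serving.py. The variable names and the random-dataset flags (--random-prefix-len, --random-input-len, --random-output-len) are assumptions based on the upstream benchmark script and should be checked against the actual diff:

```bash
# Sketch only; not the PR's exact change.
prefix_len=$(( INPUT_LEN * MIN_CACHE_HIT_PCT / 100 ))
adjusted_input_len=$(( INPUT_LEN - prefix_len ))

python3 benchmarks/benchmark_serving.py \
    --model "$MODEL" \
    --dataset-name random \
    --random-prefix-len "$prefix_len" \
    --random-input-len "$adjusted_input_len" \
    --random-output-len "$OUTPUT_LEN" \
    --num-prompts "$NUM_PROMPTS" \
    --port "$PORT"
```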
Test Result
Logs report a successful run with the expected cache_hit_rate_pct:
(Optional) Documentation Update