[Misc] Clean up MM profiling warnings #25222
Conversation
Signed-off-by: Roger Wang <[email protected]>
Code Review
This pull request cleans up multimodal profiling warnings that are no longer relevant with the deprecation of the V0 scheduler. The changes remove warnings related to sequence length limitations that are now handled by the V1 scheduler's chunked prefill mechanism. The modifications are straightforward and align with the goal of removing obsolete code. I have reviewed the changes and found no high or critical issues.
```python
if total_mm_tokens > seq_len:
    logger.warning_once(
        "The sequence length (%d) is smaller than the pre-defined"
        " worst-case total number of multimodal tokens (%d). "
        "This may cause certain multi-modal inputs to fail during "
        "inference. To avoid this, you should increase "
        "`max_model_len` or reduce `mm_counts`.",
        seq_len,
        total_mm_tokens,
    )
```
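For context, `warning_once` emits a given warning only the first time it is triggered, so this check would not spam the log during repeated profiling runs. A minimal sketch of that deduplication idea (this is an illustration, not vLLM's actual logger implementation):

```python
import logging

# Hypothetical "warn once" helper: deduplicate by (message, args) key.
# This is a sketch of the concept, not vLLM's real logger.warning_once.
_seen_warnings: set = set()
logger = logging.getLogger("mm_profiling_sketch")

def warning_once(msg: str, *args) -> None:
    """Emit a warning only the first time this (msg, args) pair is seen."""
    key = (msg, args)
    if key in _seen_warnings:
        return
    _seen_warnings.add(key)
    logger.warning(msg, *args)
```

Calling `warning_once` twice with the same message and arguments logs a single record.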
This warning will now show for the QwenVL model series by default, since we modified the profiling logic in #24312.
Since we already return the following error message (without crashing the server) if the user actually sends a request longer than the context window, I think this warning is no longer necessary and would be rather confusing in V1.
openai.BadRequestError: Error code: 400 - {'error': {'message': 'The decoder prompt (length 131072) is longer than the maximum model length of 128000. Make sure that `max_model_len` is no smaller than the number of text tokens plus multimodal tokens. For image inputs, the number of image tokens depends on the number of images, and possibly their aspect ratios as well.', 'type': 'BadRequestError', 'param': None, 'code': 400}}
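The 400 above reflects a server-side budget check: a request is rejected when its text tokens plus multimodal tokens exceed `max_model_len`. A minimal sketch of that check (function name and message are illustrative, not vLLM's actual code):

```python
# Hypothetical sketch of the context-window validation described above.
# Names here are illustrative; vLLM's real check lives server-side.
def check_fits_context(num_text_tokens: int,
                       num_mm_tokens: int,
                       max_model_len: int) -> None:
    """Raise ValueError if text + multimodal tokens exceed the model length."""
    total = num_text_tokens + num_mm_tokens
    if total > max_model_len:
        raise ValueError(
            f"The decoder prompt (length {total}) is longer than the "
            f"maximum model length of {max_model_len}. Make sure that "
            f"`max_model_len` is no smaller than the number of text "
            f"tokens plus multimodal tokens."
        )
```

Because this validation returns a clean per-request error rather than crashing the server, the startup-time warning adds little value.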
Signed-off-by: charlifu <[email protected]>
Signed-off-by: xuebwang-amd <[email protected]>
Purpose
As we're deprecating V0, some of the multimodal profiling warnings can be removed.
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
Update `supported_models.md` and `examples` for a new model.