Conversation

ywang96
Member

@ywang96 ywang96 commented Sep 19, 2025

Purpose

As we're deprecating V0, some of the multimodal profiling warnings can be removed.

Test Plan

Test Result



Signed-off-by: Roger Wang <[email protected]>
@mergify mergify bot added the multi-modality Related to multi-modality (#4194) label Sep 19, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request cleans up multimodal profiling warnings that are no longer relevant with the deprecation of the V0 scheduler. The changes remove warnings related to sequence length limitations that are now handled by the V1 scheduler's chunked prefill mechanism. The modifications are straightforward and align with the goal of removing obsolete code. I have reviewed the changes and found no high or critical issues.

Comment on lines -279 to -288
if total_mm_tokens > seq_len:
    logger.warning_once(
        "The sequence length (%d) is smaller than the pre-defined"
        " worst-case total number of multimodal tokens (%d). "
        "This may cause certain multi-modal inputs to fail during "
        "inference. To avoid this, you should increase "
        "`max_model_len` or reduce `mm_counts`.",
        seq_len,
        total_mm_tokens,
    )
Member Author

@ywang96 ywang96 Sep 19, 2025

This warning will now show for the QwenVL model series by default since we modified profiling logic in #24312

Since we already return the following error message (without crashing the server) if a user actually sends a request longer than the context window, I think this warning is no longer necessary and would be rather confusing in V1.

openai.BadRequestError: Error code: 400 - {'error': {'message': 'The decoder prompt (length 131072) is longer than the maximum model length of 128000. Make sure that `max_model_len` is no smaller than the number of text tokens plus multimodal tokens. For image inputs, the number of image tokens depends on the number of images, and possibly their aspect ratios as well.', 'type': 'BadRequestError', 'param': None, 'code': 400}}
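To illustrate the behavior described above: in V1, an over-length prompt is rejected per-request with an OpenAI-style 400 payload instead of being guarded by a profiling-time warning. The sketch below is a simplified, hypothetical stand-in for that kind of length check (the function name and payload shape are assumptions for illustration, not vLLM's actual implementation; the payload merely mirrors the error format quoted above).

```python
from typing import Optional


def validate_prompt_length(prompt_len: int, max_model_len: int) -> Optional[dict]:
    """Return an OpenAI-style 400 error payload if the prompt exceeds the
    model's context window, else None.

    Simplified illustration only -- not vLLM's real request validation.
    `prompt_len` should already include multimodal tokens, since for image
    inputs the token count depends on the number of images (and possibly
    their aspect ratios).
    """
    if prompt_len <= max_model_len:
        return None
    return {
        "error": {
            "message": (
                f"The decoder prompt (length {prompt_len}) is longer than "
                f"the maximum model length of {max_model_len}. Make sure "
                "that `max_model_len` is no smaller than the number of "
                "text tokens plus multimodal tokens."
            ),
            "type": "BadRequestError",
            "param": None,
            "code": 400,
        }
    }
```

With the numbers from the error above, `validate_prompt_length(131072, 128000)` returns a 400 payload while the server keeps serving other requests, which is why the old warning is redundant in V1.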

@ywang96 ywang96 requested a review from Isotr0py September 19, 2025 02:49
@Isotr0py Isotr0py enabled auto-merge (squash) September 19, 2025 03:25
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 19, 2025
@Isotr0py Isotr0py merged commit 31a8a2a into vllm-project:main Sep 19, 2025
53 checks passed
ywang96 added a commit to ywang96/vllm that referenced this pull request Sep 19, 2025
debroy-rh pushed a commit to debroy-rh/vllm that referenced this pull request Sep 19, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025