
Conversation


@Isotr0py Isotr0py commented Sep 3, 2025

Purpose

  • To ensure video processing correctness for GLM4.5V and the upcoming Qwen3-VL, we currently have to add --media-io-kwargs '{"video": {"num_frames": -1}}', which is not safe: for long input videos it causes extremely high RAM usage and can crash the server.
  • This PR adds a new video loader that supports GLM4.5V-style dynamic sampling, so we no longer need to decode every frame (see the sketch after this list for the general idea).
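As a rough illustration of the idea only (not the PR's actual implementation; the function name, `target_fps`, and `max_frames` below are hypothetical), dynamic sampling seeks to the selected frame indices instead of decoding the whole clip and subsampling afterwards:

```python
# Hypothetical sketch of GLM4.5V-style dynamic sampling: decode only the
# frames we actually keep, so RAM usage stays bounded for long videos.
import cv2
import numpy as np

def sample_frames_dynamic(path: str, target_fps: float = 2.0, max_frames: int = 64):
    cap = cv2.VideoCapture(path)
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    native_fps = cap.get(cv2.CAP_PROP_FPS) or 24.0
    # Pick indices at the target sampling rate, capped at max_frames.
    step = max(int(round(native_fps / target_fps)), 1)
    indices = list(range(0, total_frames, step))[:max_frames]
    frames, timestamps = [], []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek instead of decoding everything
        ok, frame = cap.read()
        if ok:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            timestamps.append(idx / native_fps)
    cap.release()
    return np.stack(frames), timestamps
```

The full-decoding path, by contrast, materializes every frame before the processor subsamples them, which is what the `num_frames: -1` workaround forces.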

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Isotr0py <[email protected]>
@mergify mergify bot added the multi-modality Related to multi-modality (#4194) label Sep 3, 2025
@Isotr0py Isotr0py (Member, Author)

Benchmark results

Script: https://gist.github.com/Isotr0py/921b17edaeef1ed8bc211e22b47c84b4
Hardware: AMD Ryzen Threadripper 3970X 32-Core Processor

[Full decoding backend processing] start
[Full decoding backend processing] memory cost: 4396.312MB
[Full decoding backend processing] time cost: 27.156s

[Dynamic decoding backend processing] start
[Dynamic decoding backend processing] memory cost: 500.066MB
[Dynamic decoding backend processing] time cost: 12.666s
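For reference, a minimal measurement harness in the same spirit (hypothetical; the gist above is the authoritative script, and this reports an RSS delta rather than true peak memory):

```python
# Hypothetical timing/memory wrapper producing output like the results above.
import time
import psutil

def measure(label, fn, *args, **kwargs):
    proc = psutil.Process()
    rss_before = proc.memory_info().rss
    print(f"[{label}] start")
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    rss_delta_mb = (proc.memory_info().rss - rss_before) / (1024 * 1024)
    print(f"[{label}] memory cost: {rss_delta_mb:.3f}MB")
    print(f"[{label}] time cost: {elapsed:.3f}s")
    return result
```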

@Isotr0py Isotr0py marked this pull request as ready for review September 11, 2025 08:34
Signed-off-by: Isotr0py <[email protected]>
Review thread on the following code excerpt from the PR diff:

        input_ids)[0]
    if "do_sample_frames" in mm_kwargs and not mm_kwargs[
            "do_sample_frames"]:
        # Transformers v4.55 has incorrect timestamps issue for ...

@DarkLight1337 DarkLight1337 Sep 11, 2025

Is there a link to the relevant issue so we know when to remove this workaround?


@Isotr0py Isotr0py Sep 11, 2025

The root issue is the hardcoded 24 fps in Transformers v4.55's no-sampling code path:
https://github.com/huggingface/transformers/blob/d79b2d981f28b2730d402244ac3c2e9a8c054eee/src/transformers/models/glm4v/video_processing_glm4v.py#L173-L176

I think huggingface/transformers#39600 should have fixed this issue, so we can remove the workaround after the Transformers v4.56 update. (The current GLM4.1V multimodal processor in vLLM is broken on Transformers v4.56 anyway; I would like to fix both together in a follow-up PR 😅)
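For context, a hedged sketch of the guard pattern the quoted snippet implements (function and argument names are hypothetical; the real vLLM and Transformers code differs): when `do_sample_frames` is False because frames were already sampled by the new loader, timestamps should come from the clip's real fps rather than the hardcoded 24 fps.

```python
# Hypothetical illustration of the workaround, not the actual vLLM code.
def compute_timestamps(mm_kwargs: dict, num_frames: int, video_fps: float):
    if "do_sample_frames" in mm_kwargs and not mm_kwargs["do_sample_frames"]:
        # Frames were pre-sampled upstream: derive timestamps from the real
        # video fps instead of the 24 fps hardcoded in Transformers v4.55.
        return [i / video_fps for i in range(num_frames)]
    # Otherwise, defer to the HF video processor's own sampling and timestamps.
    return None
```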

Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 11, 2025
@vllm-bot vllm-bot merged commit bcbe2a4 into vllm-project:main Sep 11, 2025
38 of 41 checks passed
@Isotr0py Isotr0py deleted the glm-video-loader branch September 11, 2025 18:05
skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
dsxsteven pushed a commit to dsxsteven/vllm_splitPR that referenced this pull request Sep 15, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025