[VLM] Optimize GLM4.5-V-style video processing to only decode necessary frames #24161
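The optimization named in the title can be illustrated with a minimal, stdlib-only sketch. It assumes a hypothetical sampling rule (evenly spaced frames at roughly `target_fps`, capped at `max_frames`) and hypothetical helpers `sample_frame_indices`, `decoded_bytes`, and `keep_sampled`. None of these are vLLM's or Transformers' actual API, and the real GLM4.5-V sampling rule may differ. The idea shown is that frame indices can be chosen from metadata alone (frame count and fps) before decoding, so only the selected frames ever need to be materialized, rather than decoding all frames up front.

```python
import math
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def sample_frame_indices(total_frames: int, video_fps: float,
                         target_fps: float, max_frames: int) -> List[int]:
    """Choose evenly spaced frame indices at about `target_fps`, capped at
    `max_frames`, using only container metadata (hypothetical rule)."""
    if total_frames <= 0 or video_fps <= 0:
        return []
    duration_s = total_frames / video_fps
    num = max(1, min(max_frames, math.floor(duration_s * target_fps)))
    num = min(num, total_frames)  # never ask for more frames than exist
    step = total_frames / num
    return [min(total_frames - 1, int(i * step)) for i in range(num)]


def decoded_bytes(num_frames: int, height: int, width: int,
                  channels: int = 3) -> int:
    """Memory needed to hold decoded uint8 frames (1 byte per channel)."""
    return num_frames * height * width * channels


def keep_sampled(frames: Iterable[T], indices: List[int]) -> Iterator[T]:
    """Yield only the sampled frames from a frame stream and stop once past
    the last needed index, so unneeded frames are never retained."""
    wanted = set(indices)
    last = max(indices) if indices else -1
    for i, frame in enumerate(frames):
        if i > last:
            break
        if i in wanted:
            yield frame


# Example: a 10-minute, 30 fps, 1080p video (18,000 frames).
indices = sample_frame_indices(18_000, 30.0, target_fps=2.0, max_frames=300)
print(len(indices), indices[:3], indices[-1])
print(decoded_bytes(18_000, 1080, 1920), decoded_bytes(len(indices), 1080, 1920))
```

For that example clip, the rule keeps 300 frames; as uint8 RGB, that is roughly 1.87 GB of decoded pixels versus about 112 GB for every frame. `keep_sampled` only bounds memory: a real decoder would additionally seek to the chosen indices to also save decode time.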
Signed-off-by: Isotr0py <[email protected]>
Benchmark results
Script: https://gist.github.com/Isotr0py/921b17edaeef1ed8bc211e22b47c84b4
```python
    input_ids)[0]
if "do_sample_frames" in mm_kwargs and not mm_kwargs[
        "do_sample_frames"]:
    # Transformers v4.55 has incorrect timestamps issue for
```
Is there a link to the relevant issue so we know when to remove this workaround?
The root issue is the hardcoded 24 fps in Transformers v4.55's no-sampling code path:
https://github.com/huggingface/transformers/blob/d79b2d981f28b2730d402244ac3c2e9a8c054eee/src/transformers/models/glm4v/video_processing_glm4v.py#L173-L176
I think huggingface/transformers#39600 should have fixed this issue, so we can remove this workaround after upgrading to Transformers v4.56. (The current GLM4.1V vLLM multimodal processor is broken on Transformers v4.56, though, so I'd like to fix both together in a follow-up PR 😅)
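To make the timestamp problem concrete, here is a small sketch of why a hardcoded frame rate matters once frames have already been sampled upstream. The helper names `frame_timestamps` and `needs_timestamp_workaround` are hypothetical, not vLLM or Transformers code. A pre-sampled frame's timestamp is its index divided by the video's real fps, so assuming 24 fps shifts every timestamp for videos recorded at any other rate. `needs_timestamp_workaround` reproduces the condition from the snippet under review.

```python
from typing import List

# Value reported in this thread for Transformers v4.55's no-sampling path.
HARDCODED_FPS = 24.0


def frame_timestamps(indices: List[int], video_fps: float) -> List[float]:
    """Timestamp (seconds) of each pre-sampled frame, from the real fps."""
    if video_fps <= 0:
        raise ValueError("video_fps must be positive")
    return [i / video_fps for i in indices]


def needs_timestamp_workaround(mm_kwargs: dict) -> bool:
    """Same condition as the snippet under review. Assumed meaning: frames
    were already sampled upstream, so the HF processor must not resample."""
    return "do_sample_frames" in mm_kwargs and not mm_kwargs["do_sample_frames"]


sampled = [0, 60, 120]                            # indices picked upstream
print(frame_timestamps(sampled, 30.0))            # real 30 fps video
print(frame_timestamps(sampled, HARDCODED_FPS))   # 24 fps assumption
print(needs_timestamp_workaround({"do_sample_frames": False}))
```

For frames 0, 60, and 120 of a 30 fps video, the correct timestamps are 0, 2, and 4 seconds, whereas a 24 fps assumption yields 0, 2.5, and 5 seconds.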
…ry frames (vllm-project#24161) Signed-off-by: Isotr0py <[email protected]>
…ry frames (vllm-project#24161) Signed-off-by: Isotr0py <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>
Purpose
`--media-io-kwargs '{"video": {"num_frames": -1}}'`, which is not safe enough: it can cause extremely high RAM usage and crash the server if the input video is long.
Test Plan
Test Result