Disable torch.compile for dynamic rope models in Transformers backend #23738
Conversation
… backend Signed-off-by: Harry Mellor <[email protected]>
… rope Signed-off-by: Harry Mellor <[email protected]>
Code Review
This pull request adds support for the InternVLForConditionalGeneration architecture and disables torch.compile for models with dynamic rope scaling. The changes for the new model support are correct. However, I've identified a critical bug in the implementation for disabling torch.compile where a potential AttributeError can occur. Please see my comment for the suggested fix.
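For context, here is a minimal sketch of the kind of guard such a fix typically involves, assuming the check reads `rope_scaling` from the Hugging Face config; the helper name `uses_dynamic_rope` is illustrative, not the PR's actual code:

```python
# Illustrative sketch, not the PR's actual code: detect dynamic rope scaling
# in a Hugging Face config without risking an AttributeError when the
# attribute is absent.
from transformers import PretrainedConfig


def uses_dynamic_rope(config: PretrainedConfig) -> bool:
    # Not every config defines `rope_scaling`, so use getattr with a default
    # instead of accessing the attribute directly.
    rope_scaling = getattr(config, "rope_scaling", None)
    if not rope_scaling:
        return False
    # HF configs store the scaling strategy under "rope_type" (newer) or
    # "type" (older); "dynamic" marks dynamic NTK scaling.
    rope_type = rope_scaling.get("rope_type", rope_scaling.get("type"))
    return rope_type == "dynamic"
```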
Signed-off-by: Harry Mellor <[email protected]>
tests/models/registry.py
Outdated
"3.5-qwen3moe": "OpenGVLab/InternVL3_5-30B-A3B", # noqa: E501 | ||
"3.5-gptoss": "OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview"}, # noqa: E501 | ||
trust_remote_code=True), | ||
"InternVLForConditionalGeneration": _HfExamplesInfo("OpenGVLab/InternVL3-1B-hf"), # noqa: E501 |
I plan to enable native support through the existing InternS1ForConditionalGeneration with video input support in #23742, because their implementations are exactly the same.
Could we add video support to the Transformers backend instead 👀
Long term, if it's not necessary to reimplement a model in vLLM, we should avoid doing so.
Could we add video support to the Transformers backend instead 👀
I think this needs extra refactoring on the Transformers side first, because IIRC we previously only added a helper function in the processor for images.
Long term, if it's not necessary to reimplement a model in vLLM we should avoid it if possible
Yes, but since InternS1ForConditionalGeneration is already here, we can simply reuse it with video support, and there is no need to disable torch.compile:
"InternVLForConditionalGeneration": ("interns1", "InternS1ForConditionalGeneration")
Yes, but since InternS1ForConditionalGeneration is already here, we can reuse it simply with video support
For now I will remove the InternVL stuff from my PR
no need to disable torch.compile
InternVLChatModel and InternS1ForConditionalGeneration do not support torch.compile in vLLM. Neither of them is decorated with @support_torch_compile.
InternVLChatModel and InternS1ForConditionalGeneration do not support torch.compile in vLLM. Neither of them is decorated with @support_torch_compile.
No, we only compile the text backbone for VLMs (no ViT compilation yet); @support_torch_compile only takes effect at the language model level. And Qwen2 and Qwen3MoE (without MRoPE), which these models use, already support @support_torch_compile.
(There are no VLM implementations wrapped by @support_torch_compile, because we only added it to the backbone.)
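To illustrate the structure being described, here is a minimal, self-contained sketch with a stand-in decorator; the real one is vLLM's support_torch_compile, and the classes below are placeholders, not vLLM's actual models:

```python
# Illustrative sketch: only the language backbone carries the decorator, so
# only it (and its submodules) would be compiled; the VLM wrapper composes it
# without being decorated itself.
import torch
import torch.nn as nn


def support_torch_compile(cls):
    # Stand-in for vLLM's decorator: the real one arranges for the module's
    # forward pass to be compiled; here we just mark the class.
    cls._compiled_backbone = True
    return cls


@support_torch_compile
class LanguageBackbone(nn.Module):
    # Only this module (and its submodules) would be compiled.
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.layers = nn.Sequential(nn.Linear(hidden, hidden), nn.GELU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)


class VisionLanguageModel(nn.Module):
    # The VLM wrapper itself is not decorated, so the vision tower stays eager.
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.vision_tower = nn.Linear(hidden, hidden)   # runs eagerly
        self.language_model = LanguageBackbone(hidden)  # the compiled backbone

    def forward(self, pixels: torch.Tensor) -> torch.Tensor:
        return self.language_model(self.vision_tower(pixels))
```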
Oh, I wasn't aware that's how it worked.
TransformersForMultimodalLM is wrapped with @support_torch_compile and it seems to work fine. In TransformersForMultimodalLM there is no separate backbone to decorate, as we use inheritance to add MM support to TransformersForCausalLM. If we removed @support_torch_compile from TransformersForMultimodalLM, would the @support_torch_compile on TransformersForCausalLM do anything?
Yes, torch.compile will be limited to the modules that have the decorator (and their submodules).
TransformersForMultimodalLM is a subclass of TransformersForCausalLM, not a submodule. Do you know how @support_torch_compile interacts with inheritance?
I think torch.compile on the base class is also applied to subclasses. Maybe @youkaichao can confirm this.
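As a plain-Python illustration of how a class decorator interacts with inheritance (not vLLM-specific, and not how support_torch_compile is actually implemented): a decorator that rewrites a method on the base class is inherited by subclasses unless they override that method.

```python
# Illustrative sketch of class-decorator inheritance semantics in plain Python.
import functools


def mark_compiled(cls):
    # Wrap `forward` on the decorated class; subclasses that do not override
    # `forward` inherit the wrapped version.
    original = cls.forward

    @functools.wraps(original)
    def wrapped(self, *args, **kwargs):
        print(f"compiled path for {type(self).__name__}")
        return original(self, *args, **kwargs)

    cls.forward = wrapped
    return cls


@mark_compiled
class Base:
    def forward(self, x):
        return x + 1


class Child(Base):
    # No `forward` override, so the wrapped Base.forward is inherited.
    pass


class ChildWithOverride(Base):
    def forward(self, x):
        # Overriding `forward` bypasses the wrapper added on Base.
        return x + 2


Base().forward(1)               # prints "compiled path for Base"
Child().forward(1)              # prints "compiled path for Child"
ChildWithOverride().forward(1)  # prints nothing: wrapper bypassed
```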
Title changed: "InternVLForConditionalGeneration and disable torch.compile for dynamic rope models" → "Disable torch.compile for dynamic rope models in Transformers backend"
Signed-off-by: Harry Mellor <[email protected]>
…nd (vllm-project#23738) Signed-off-by: Harry Mellor <[email protected]>
Fix the Transformers backend torch.compile support by disabling it when models with dynamic rope scaling are loaded.
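A minimal sketch of the guard this summary describes, assuming the backend can check the loaded config (for example with a dynamic-rope check like the one sketched earlier) before deciding whether to compile; the names here are illustrative, not the actual vLLM code:

```python
# Illustrative sketch, not vLLM's actual code: compile the module only when
# the loaded model does not use dynamic rope scaling.
import torch
import torch.nn as nn


def maybe_compile(module: nn.Module, uses_dynamic_rope: bool) -> nn.Module:
    if uses_dynamic_rope:
        # Dynamic rope recomputes its rotary frequencies as sequence length
        # grows, which would force graph breaks or repeated recompilation,
        # so the module is left in eager mode.
        return module
    return torch.compile(module)
```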