[Bugfix] Fix Qwen2.5-VL quantized model weights loading #23512
Conversation
This PR makes sure quantized model weights are loaded correctly. Currently, loading `RedHatAI/Qwen2.5-VL-7B-Instruct-FP8-Dynamic` crashes on A100s:

```
[core.py:708]   File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors.py", line 661, in process_weights_after_loading
[core.py:708]     layer.scheme.process_weights_after_loading(layer)
[core.py:708]   File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a16_fp8.py", line 59, in process_weights_after_loading
[core.py:708]     prepare_fp8_layer_for_marlin(layer)
[core.py:708]   File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/layers/quantization/utils/marlin_utils_fp8.py", line 107, in prepare_fp8_layer_for_marlin
[core.py:708]     marlin_qweight = ops.gptq_marlin_repack(b_q_weight=qweight,
[core.py:708]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[core.py:708]   File "/opt/venv/lib/python3.12/site-packages/vllm/_custom_ops.py", line 938, in gptq_marlin_repack
[core.py:708]     return torch.ops._C.gptq_marlin_repack(b_q_weight, perm, size_k, size_n,
[core.py:708]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[core.py:708]   File "/opt/venv/lib/python3.12/site-packages/torch/_ops.py", line 1243, in __call__
[core.py:708]     return self._op(*args, **kwargs)
[core.py:708]            ^^^^^^^^^^^^^^^^^^^^^^^^^
[core.py:708] RuntimeError: size_n = 6840 is not divisible by tile_n_size = 64
```

The issue was introduced in vllm-project#22066, which changed the model implementation to use a `MergedColumnParallelLinear` layer and pack the `gate_proj` and `up_proj` params.

Signed-off-by: Zifei Tong <[email protected]>
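For illustration, the `RuntimeError` at the bottom of the traceback comes from a shape precondition: the Marlin repack step requires the weight's output dimension (`size_n`) to be a multiple of its tile size (64, per the error message). A minimal sketch of that check, with names assumed for illustration rather than taken from the vLLM source:

```python
# Hypothetical sketch of the shape precondition that fails above:
# the Marlin repack kernel tiles the output dimension in chunks of 64,
# so size_n must be a multiple of the tile size.
TILE_N_SIZE = 64  # tile_n_size from the error message


def repack_shape_ok(size_n: int) -> bool:
    """Return True if size_n can be tiled by the Marlin kernel."""
    return size_n % TILE_N_SIZE == 0


# The mis-sharded weight from the bug report: 6840 = 64 * 106 + 56,
# so the repack precondition fails and vLLM raises a RuntimeError.
print(repack_shape_ok(6840))   # the failing size from the traceback
print(repack_shape_ok(6784))   # a correctly tiled size (64 * 106)
```

The mapping fix in this PR makes the loader shard the fused weight correctly, so `size_n` ends up tile-aligned and the check passes.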
Code Review
This pull request correctly addresses a crash that occurs when loading quantized Qwen2.5-VL models. The issue stems from a missing `packed_modules_mapping` entry for the `gate_up_proj` layer, which was introduced when the model was updated to use a `MergedColumnParallelLinear` layer. By adding the necessary mapping, this change ensures that the quantization logic can correctly handle the packed weights, resolving the shape mismatch error during the `gptq_marlin_repack` operation. The fix is targeted, necessary, and follows the established pattern in vLLM for supporting packed layers in quantized models. The change is approved.
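To make the review concrete: vLLM model classes declare a `packed_modules_mapping` dict that tells the weight loader which per-checkpoint weights (e.g. `gate_proj`, `up_proj`) are fused into a single module (e.g. `gate_up_proj`) in the model implementation. The sketch below shows the shape of that mapping and a toy resolver; the helper function is hypothetical, not vLLM API, and the exact mapping added by this PR should be read from the diff itself:

```python
# Sketch of the packed_modules_mapping pattern used by vLLM model
# classes. The dict maps a fused layer name to the list of checkpoint
# weight names packed into it, in shard order.
packed_modules_mapping = {
    "qkv_proj": ["q_proj", "k_proj", "v_proj"],
    "gate_up_proj": ["gate_proj", "up_proj"],
}


def resolve_packed(checkpoint_name: str) -> tuple[str, int]:
    """Hypothetical helper: map a checkpoint weight name to the fused
    layer it belongs to and its shard index within that layer."""
    for fused_name, shards in packed_modules_mapping.items():
        if checkpoint_name in shards:
            return fused_name, shards.index(checkpoint_name)
    # Not packed: the weight loads under its own name, shard 0.
    return checkpoint_name, 0


print(resolve_packed("up_proj"))  # loads into gate_up_proj, shard 1
```

Without the `gate_up_proj` entry, the quantization path treats the fused weight as a single unpacked tensor, producing the misaligned `size_n` seen in the traceback.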
Thanks for the fix!