[Bugfix] Fix use_cascade_attention handling for Alibi-based models on vllm/v1 #15211

h-sugi · 2025-03-20T11:20:58Z

When running ALiBi-based models such as MPT on vLLM V1, the following assertion error is raised:

AssertionError: Cascade attention does not support ALiBi.

This happens because the logic that determines use_cascade in gpu_model_runner.py passes a hard-coded use_alibi=False, so cascade attention is wrongly enabled for models that use ALiBi.

This PR sets use_alibi from the ALiBi setting in the model's config.json (for MPT models) instead of hard-coding it.
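As a rough illustration of the idea (not the code merged in this PR), the sketch below derives an ALiBi flag from a config object and uses it to gate cascade attention. The helper names `detect_alibi` and `should_use_cascade`, the `attn_config` / `alibi` field names, and the prefix-length condition are all assumptions made for the example.

```python
from types import SimpleNamespace
from typing import Any


def detect_alibi(hf_text_config: Any) -> bool:
    """Hypothetical helper: report whether a config object enables ALiBi.

    Assumes an MPT-style layout where the flag lives under
    `attn_config["alibi"]` (or `attn_config.alibi`); other model families
    may store it elsewhere.
    """
    attn_config = getattr(hf_text_config, "attn_config", None)
    if attn_config is None:
        return False
    if isinstance(attn_config, dict):
        return bool(attn_config.get("alibi", False))
    return bool(getattr(attn_config, "alibi", False))


def should_use_cascade(common_prefix_len: int, use_alibi: bool) -> bool:
    """Hypothetical gate: never select cascade attention when ALiBi is on."""
    if use_alibi:
        return False
    # Placeholder condition; the real heuristic considers more factors.
    return common_prefix_len > 0


# Stand-ins for parsed config.json contents (illustrative only).
mpt_like = SimpleNamespace(attn_config={"alibi": True})
llama_like = SimpleNamespace()

print(should_use_cascade(256, detect_alibi(mpt_like)))
print(should_use_cascade(256, detect_alibi(llama_like)))
```

The point of the sketch is only that the flag is computed once from the model config and passed through, rather than being fixed to False at the call site.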

github-actions · 2025-03-20T11:21:11Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

WoosukKwon

@h-sugi Thanks for the PR! Left some minor comments.

vllm/v1/worker/gpu_model_runner.py


mergify · 2025-03-22T11:46:49Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @h-sugi.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


WoosukKwon · 2025-03-27T05:47:42Z

@h-sugi Thanks for updating the PR! I will merge if the tests go well.


h-sugi requested review from WoosukKwon, alexm-redhat, comaniac, njhill, robertgshaw2-redhat and ywang96 as code owners March 20, 2025 11:20

mergify bot added the v1 label Mar 20, 2025

h-sugi force-pushed the fix-use_alibi branch from ef160e4 to 42ea17d Compare March 20, 2025 11:39

WoosukKwon reviewed Mar 20, 2025


h-sugi and others added 4 commits March 21, 2025 09:00

fix use_alibi=False

5cb0c43

Signed-off-by: h-sugi <[email protected]>

fix use_alibi logic

f2e7899

Signed-off-by: h-sugi <[email protected]>

delete unnecessary line in vllm/v1/worker/gpu_model_runner.py

7c89692

Co-authored-by: Woosuk Kwon <[email protected]> Signed-off-by: h-sugi <[email protected]>

change hf_config to hf_text_config

687d085

Co-authored-by: Woosuk Kwon <[email protected]> Signed-off-by: h-sugi <[email protected]>

h-sugi force-pushed the fix-use_alibi branch from 9e89819 to 687d085 Compare March 21, 2025 00:00

adapt alibi models other than MPT

996657a

Signed-off-by: h-sugi <[email protected]>

WoosukKwon approved these changes Mar 21, 2025


fommater

893ef8a

Signed-off-by: h-sugi <[email protected]>

mergify bot added the needs-rebase label Mar 22, 2025

Merge branch 'main' into fix-use_alibi

01a6917

mergify bot removed the needs-rebase label Mar 22, 2025

h-sugi added 3 commits March 22, 2025 21:51

precommit

6928f61

Signed-off-by: h-sugi <[email protected]>

Merge branch 'main' into fix-use_alibi

75767d8

merge main

235715b

Signed-off-by: h-sugi <[email protected]>

WoosukKwon added the ready label (ONLY add when PR is ready to merge/full CI is needed) Mar 27, 2025

DarkLight1337 merged commit 8958217 into vllm-project:main Mar 27, 2025
39 checks passed

ckhordiasma mentioned this pull request Apr 17, 2025

[do not merge] pr test for nm changes into 2.20 red-hat-data-services/vllm#107

Closed

lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025

[Bugfix] Fix use_cascade_attention handling for Alibi-based models on…

a93eb2c

… vllm/v1 (vllm-project#15211) Signed-off-by: h-sugi <[email protected]> Co-authored-by: Woosuk Kwon <[email protected]>

shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025

[Bugfix] Fix use_cascade_attention handling for Alibi-based models on…

36d1471

… vllm/v1 (vllm-project#15211) Signed-off-by: h-sugi <[email protected]> Co-authored-by: Woosuk Kwon <[email protected]>


3 participants
