
Conversation

luccafong (Collaborator) commented:
Summary: On GB200, FlashInfer prefill is not compatible with CutlassMLA FP8; this change adds an option to disable it for now.

Differential Revision: D81994905

@facebook-github-bot commented:

@luccafong has exported this pull request. If you are a Meta employee, you can view the originating diff in D81994905.

@mergify bot added the v1 label on Sep 19, 2025
@gemini-code-assist bot (Contributor) left a comment:


Code Review

This pull request introduces a new environment variable, VLLM_DISABLE_FLASHINFER_PREFILL, to provide an option to disable FlashInfer prefill. This change addresses a compatibility issue on GB200 with CutlassMLA FP8. The implementation adds the new flag in vllm/envs.py and correctly uses it in vllm/v1/attention/backends/mla/common.py to control the feature. The default behavior is unchanged. The changes are correct and well-contained.
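As a rough illustration (not the actual diff), an opt-out flag like this is usually wired up in two places: a one-time read of the environment variable and a guard at the point where the feature is selected. The helper `_supports_flashinfer_prefill` below is hypothetical, standing in for whatever capability check the backend already performs:

```python
import os

# Sketch of the flag as it might appear in vllm/envs.py: read from the
# environment, defaulting to "0" so FlashInfer prefill remains enabled
# unless the operator explicitly opts out.
VLLM_DISABLE_FLASHINFER_PREFILL: bool = (
    os.environ.get("VLLM_DISABLE_FLASHINFER_PREFILL", "0") == "1"
)


def _supports_flashinfer_prefill() -> bool:
    """Hypothetical stand-in for the backend's real capability check."""
    return True


# Sketch of the guard as it might appear in
# vllm/v1/attention/backends/mla/common.py.
def _use_flashinfer_prefill() -> bool:
    if VLLM_DISABLE_FLASHINFER_PREFILL:
        # Operator opted out, e.g. GB200 with CutlassMLA FP8.
        return False
    return _supports_flashinfer_prefill()
```

With that shape, working around the GB200 incompatibility is just a matter of setting `VLLM_DISABLE_FLASHINFER_PREFILL=1` in the serving environment, while the default leaves existing behavior unchanged.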

@mgoin (Member) commented on Sep 19, 2025:

Seems reasonable for now, thanks

@mgoin enabled auto-merge (squash) on September 19, 2025 21:08
@github-actions bot added the ready label (ONLY add when PR is ready to merge / full CI is needed) on Sep 19, 2025
@mgoin merged commit ee7a66d into vllm-project:main on Sep 19, 2025
52 checks passed
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025

Labels

ready, v1

Projects

None yet

Development

No linked issues.

3 participants