Conversation

WoosukKwon (Collaborator)

No description provided.

@mergify bot added the `qwen` (Related to Qwen models) label on Sep 11, 2025.
@gemini-code-assist bot (Contributor) left a comment:

Code Review

This pull request adds a new configuration file for the fused Mixture-of-Experts (MoE) kernel, tuned for the NVIDIA H200 GPU. The configuration targets a model with 512 experts and a sharded intermediate size of 64, which, per the pull request title, is likely the Qwen3-Next model. The JSON file maps batch sizes to tuned kernel parameters; its structure is consistent with the existing configurations, and the values appear to come from a tuning run. The change is straightforward, and I found no issues.
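
For context, vLLM's fused MoE tuning configs are JSON files whose top-level keys are batch sizes (M) and whose values are Triton launch parameters. The sketch below shows the expected shape as a Python dict; the field names are the standard ones used by these config files, but the numeric values and the `example_h200_config` name are placeholders, not the tuned numbers added by this PR.

```python
# Illustrative sketch only: keys follow vLLM's fused MoE config format,
# but the values here are placeholders, not the tuned values from this PR.
example_h200_config = {
    "1": {  # batch size M = 1
        "BLOCK_SIZE_M": 16,
        "BLOCK_SIZE_N": 64,
        "BLOCK_SIZE_K": 128,
        "GROUP_SIZE_M": 1,
        "num_warps": 4,
        "num_stages": 3,
    },
    "256": {  # batch size M = 256
        "BLOCK_SIZE_M": 64,
        "BLOCK_SIZE_N": 128,
        "BLOCK_SIZE_K": 64,
        "GROUP_SIZE_M": 16,
        "num_warps": 8,
        "num_stages": 4,
    },
}
```

At runtime, vLLM looks up the entry whose batch size is closest to the current M and launches the fused MoE kernel with those parameters, which is why the file enumerates many batch sizes.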

@simon-mo merged commit c733bd5 into main on Sep 11, 2025 (9 of 13 checks passed).
@simon-mo deleted the woosuk/qwen3-next-h200 branch on September 11, 2025 at 19:40.
@WoosukKwon (Collaborator, Author) commented:

@simon-mo Actually, this only includes the config for TP=8. I will add the TP=1, 2, and 4 configs shortly.
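
As a hedged aside on what those per-TP files would be named: vLLM's fused MoE config filenames follow the pattern `E={E},N={N},device_name={device}.json`, where N is the per-GPU shard of the MoE intermediate size. Assuming Qwen3-Next's moe_intermediate_size is 512 (inferred from N=64 at TP=8 in the review above, not stated in this PR), the expected filenames are:

```python
# Sketch under assumptions: E = 512 experts is stated in the review above;
# MOE_INTERMEDIATE_SIZE = 512 is inferred from N = 64 at TP = 8 and is an
# assumption, not a value confirmed by this PR.
NUM_EXPERTS = 512
MOE_INTERMEDIATE_SIZE = 512  # assumed: 64 * 8

for tp in (1, 2, 4, 8):
    n = MOE_INTERMEDIATE_SIZE // tp  # per-GPU shard of the intermediate size
    print(f"E={NUM_EXPERTS},N={n},device_name=NVIDIA_H200.json")
# TP=8 prints E=512,N=64,device_name=NVIDIA_H200.json, matching this PR.
```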

skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
dsxsteven pushed a commit to dsxsteven/vllm_splitPR that referenced this pull request Sep 15, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025