[Bugfix] Fix Qwen3-coder moe tuned config #24072
Conversation
Signed-off-by: Jee Jee Li <[email protected]>
Code Review
This pull request fixes a crash that occurs when deploying the Qwen3-Coder model with MoE. The root cause was an incorrect tuned configuration file generated with `use_deep_gemm`. The fix includes the correctly re-tuned configuration and a safeguard in the `benchmark_moe.py` script to prevent `use_deep_gemm` during tuning, since tuning only supports Triton kernels.
The changes are logical and address the issue. I have one suggestion on `benchmark_moe.py` to make the safeguard more explicit by raising an error instead of silently correcting the flag, which will prevent potential user confusion and the use of suboptimal configurations. Overall, this is a good fix.
benchmarks/kernels/benchmark_moe.py (Outdated)
print("Only supports tuning triton kernels, set use_deep_gemm=False.") | ||
use_deep_gemm = False |
While this change correctly prevents the crash by disabling `use_deep_gemm` during tuning, it might be better to fail explicitly rather than silently changing the behavior. A user might intend to tune for DeepGEMM and miss the print statement in the logs, leading them to unknowingly use a Triton-tuned configuration for DeepGEMM workloads, which could be suboptimal. Consider raising a `ValueError` to make the incompatibility explicit and guide the user to the correct usage.
print("Only supports tuning triton kernels, set use_deep_gemm=False.") | |
use_deep_gemm = False | |
raise ValueError( | |
"Tuning with --use-deep-gemm is not supported as it only tunes Triton kernels. Please remove the flag.") |
IIUC, because tuning only targets the Triton MoE kernel, if use_deep_gemm is used during tuning, there will be issues with the results.
@bnellnm Curious why we introduce use_deep_gemm here?
This script can not only tune the MoE kernel but also benchmark it.
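To make that split concrete, below is a rough sketch of the two modes and why the DeepGEMM flag only belongs to the benchmark side. The helper names are stand-ins for illustration, not functions from benchmark_moe.py.

```python
from types import SimpleNamespace


def search_triton_configs(args):
    """Stand-in for the real tuning sweep: pick Triton launch parameters
    (block sizes, warps, stages) per batch size."""
    return {"1": {"BLOCK_SIZE_M": 16, "num_warps": 4}}


def benchmark_kernel(args, use_deep_gemm: bool):
    """Stand-in for the timing path; choosing DeepGEMM is meaningful here."""
    print(f"benchmarking, use_deep_gemm={use_deep_gemm}")


def run(args):
    if args.tune:
        # Tune mode only ever produces Triton kernel configs, so a DeepGEMM
        # flag has nothing meaningful to control in this branch.
        print("tuned Triton configs:", search_triton_configs(args))
    else:
        # Benchmark mode may time either the Triton or the DeepGEMM path.
        benchmark_kernel(args, use_deep_gemm=args.use_deep_gemm)


run(SimpleNamespace(tune=True, use_deep_gemm=False))
```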
Makes sense to me then. Could you also update it as Gemini suggests? We should avoid implicitly changing the param.
I probably just added the flag everywhere mechanically.
Signed-off-by: Jee Jee Li <[email protected]>
Force-pushed from 5b350f8 to cc160c9
Done in cc160c9
LGTM, thanks for the work!
Purpose
When deploying the qwen3-coder model using the following script (H20*8):
the following error is raised.
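(The serving script and traceback referenced above are not reproduced in this excerpt. Purely as a hypothetical illustration of this kind of deployment, a minimal offline-API sketch follows; the checkpoint name and parallel settings are assumptions, not the original command.)

```python
# Hypothetical sketch only: checkpoint and settings are assumed, not taken
# from the original report.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8",  # assumed checkpoint
    tensor_parallel_size=8,                           # e.g. 8x H20
)
outputs = llm.generate(
    ["Write a Python function that reverses a string."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```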
The root cause should be that #21700 used use-deep-gemm when tuning MoE, resulting in an incorrect tuned config. This PR re-tunes the related configuration and, to avoid similar problems in the future, guards the script so that use_deep_gemm cannot be used when tuning.
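For context, the artifact being re-tuned is a per-device JSON of Triton launch parameters keyed by batch size. The snippet below only illustrates the general shape, written as a Python dict; the values are invented and are not the re-tuned Qwen3-Coder numbers.

```python
# Illustrative shape of a fused-MoE tuned config: for each batch size M,
# the Triton launch parameters that won the tuning sweep. Values are made up.
tuned_config = {
    "1": {"BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 64, "BLOCK_SIZE_K": 128,
          "GROUP_SIZE_M": 1, "num_warps": 4, "num_stages": 3},
    "64": {"BLOCK_SIZE_M": 32, "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 128,
           "GROUP_SIZE_M": 8, "num_warps": 4, "num_stages": 4},
}
# If these parameters are selected while timing a DeepGEMM path instead of the
# Triton kernel that actually reads them, the saved file can be invalid or
# badly suboptimal, which is what this PR corrects.
```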
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.