[Inference] Add new wint2.75/wint2.5 quant type and support DeepseekV3 #10578
base: develop
Conversation
Thanks for your contribution!
Codecov Report
Attention: Patch coverage is 0.00%.
❌ Your patch check has failed because the patch coverage (0.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
@@ Coverage Diff @@
##           develop   #10578      +/-   ##
===========================================
- Coverage    46.98%   46.63%   -0.36%
===========================================
  Files          799      805       +6
  Lines       132246   133244     +998
===========================================
+ Hits         62135    62137       +2
- Misses       70111    71107     +996
)
import json

with open(mix_bit_path, "r") as f:
Verify that mix_bit_path exists; if the file is missing, initialize a default MixBitConfig and emit the necessary logs.
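A minimal sketch of the suggested fallback, assuming a MixBitConfig class with a kwargs-style constructor; the stub class, helper name, and logging setup here are all hypothetical, not the PR's actual code:

```python
import json
import logging
import os
from dataclasses import dataclass, field

logger = logging.getLogger(__name__)


@dataclass
class MixBitConfig:  # stand-in stub; the PR's real class may differ
    weight_bits: dict = field(default_factory=dict)


def load_mix_bit_config(mix_bit_path):
    """Load the mixed-bit config, falling back to defaults with a log
    message when the file is absent (the reviewer's suggestion)."""
    if mix_bit_path is None or not os.path.exists(mix_bit_path):
        logger.warning(
            "mix_bits_config.json not found at %r; using default MixBitConfig.",
            mix_bit_path,
        )
        return MixBitConfig()
    with open(mix_bit_path, "r") as f:
        return MixBitConfig(weight_bits=json.load(f))
```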
This code path runs only under WINTX quantization or when mix_bit_path is explicitly given. In that case, a missing file already raises an error, so there is no need to log it repeatedly.
@@ -1238,9 +1272,292 @@ def set_state_dict(self, state_dict):
self.transformer_block.shared_expert_ffn1_weights[idx].set_value(shared_expert_ffn1_weight)
self.transformer_block.shared_expert_ffn2_weights[idx].set_value(shared_expert_ffn2_weight)

@paddle.no_grad()
def set_wintx_state_dict(self, state_dict):
It would be easier to maintain if you extracted the code shared with set_state_dict() into helper functions.
The original set_state_dict is already very bloated and does not support loading quantized weights directly; folding the WINTX loading logic into it would make it even more complicated.
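Concretely, the reviewer's suggestion amounts to hoisting repeated assignments like the ones in the diff above into a shared helper; the helper name below is hypothetical:

```python
def _set_shared_expert_ffn_weights(self, idx, ffn1_weight, ffn2_weight):
    # Both set_state_dict and set_wintx_state_dict assign the shared-expert
    # FFN weights the same way, so the assignment can live in one place.
    self.transformer_block.shared_expert_ffn1_weights[idx].set_value(ffn1_weight)
    self.transformer_block.shared_expert_ffn2_weights[idx].set_value(ffn2_weight)
```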
@@ -0,0 +1,19 @@
# WINTX Triton kernel |
What format should mix_bits_config.json follow? You may consider documenting it directly in the README.
It will be given when exporting the model, so users don't need to worry about this at present.
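The schema itself is not shown in this PR, since the export step produces it. Purely to illustrate the idea under discussion (per-weight bit widths keyed by parameter name), a sketch with entirely hypothetical keys and values:

```python
import json

# Entirely hypothetical illustration: the real schema is produced by the
# model export step and may look nothing like this. The idea is a mapping
# from weight names to their quantization bit widths.
example_mix_bits_config = {
    "layers.0.mlp.experts.0.down_proj.weight": 2,
    "layers.0.mlp.shared_experts.up_gate_proj.weight": 4,
}

print(json.dumps(example_mix_bits_config, indent=2))
```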
return ffn2_out


class FusedMultiTransformerWINTX(FusedMultiTransformerBase):
This class has a lot in common with the existing classes; could it be implemented on top of one of the existing Transformer classes?
The most similar existing class is FusedMultiTransformerWeightOnly, but basing the implementation on it would require extensive changes. The main issues are:
- FusedMultiTransformerWeightOnly does not support a flexible mixed-bit configuration
- Its weight shape setup differs significantly from the weights as currently exported
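For context, the resulting design is a direct subclass of the generic base rather than of FusedMultiTransformerWeightOnly. A minimal skeleton, assuming PaddleNLP's existing module layout, with the body elided and the constructor argument purely hypothetical:

```python
# Module path assumed from PaddleNLP's existing layout; sketch only.
from paddlenlp.experimental.transformers.fused_transformer_layers import (
    FusedMultiTransformerBase,
)


class FusedMultiTransformerWINTX(FusedMultiTransformerBase):
    """Mixed-bit (WINTX) transformer. Subclassing the base class keeps the
    attention/FFN control flow while leaving weight shapes free to vary per
    layer, which FusedMultiTransformerWeightOnly's uniform-bit, fixed-shape
    assumptions do not allow (see the reply above)."""

    def __init__(self, config, mix_bit_config=None):
        super().__init__(config)
        # Hypothetical attribute: per-layer bit widths driving packed shapes.
        self.mix_bit_config = mix_bit_config
```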
PR types
New features
PR changes
Mixed-bit quantization support
WINTX quantization for DeepseekV3
WINTX weight loading for dsv2/v3
WINT4 cutlass support
Ultra-low-bit inference (Triton)
Description
Support wint2.75/wint2.5 inference for DeepseekV3.