[Trainer] fix save_model #9286
Thanks for your contribution!
if isinstance(self.model, LoRAModel) and (self.model.quantized or self.args.pipeline_parallel_degree > 1):
    self.save_model(output_dir, False, signal_dir)
elif isinstance(self.model, LoRAModel) or isinstance(self.model, PrefixModelForCausalLM):
    self.save_model(output_dir, True, signal_dir)
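To make the branch order in the snippet above easier to follow, here is a minimal sketch that reproduces only the dispatch. The two classes are hypothetical stand-ins, not the PaddleNLP classes, and `second_save_model_arg` is an illustrative helper that returns the boolean passed as the second positional argument to `save_model`. The quoted snippet does not show what happens for other model types, so the sketch returns `None` there.

```python
from typing import Optional


class LoRAModel:
    """Hypothetical stand-in for the real LoRAModel class."""

    def __init__(self, quantized: bool = False):
        self.quantized = quantized


class PrefixModelForCausalLM:
    """Hypothetical stand-in for the real PrefixModelForCausalLM class."""


def second_save_model_arg(model, pipeline_parallel_degree: int) -> Optional[bool]:
    """Return the boolean the snippet passes as save_model's second argument."""
    # Quantized LoRA models, or LoRA under pipeline parallelism, take the first branch.
    if isinstance(model, LoRAModel) and (model.quantized or pipeline_parallel_degree > 1):
        return False
    # Any remaining LoRA model, or a prefix-tuning model, takes the second branch.
    elif isinstance(model, (LoRAModel, PrefixModelForCausalLM)):
        return True
    # Assumption: other model types are handled outside the quoted snippet.
    return None
```

Because the first condition is checked before the second, a quantized `LoRAModel` never reaches the `True` branch even though it also satisfies the `elif` test.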
signal_dir = os.path.join(signal_dir, os.path.split(output_dir)[-1])
5b20bd3 to 6ebe5b6
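As an illustration of what the suggested `signal_dir` expression computes, the sketch below appends the last path component of `output_dir` to `signal_dir`. The helper name and the example paths are hypothetical. It uses `posixpath` (the POSIX flavour of `os.path`) only so that the results are the same on every platform.

```python
import posixpath  # POSIX flavour of os.path, so separators are platform-independent


def per_checkpoint_signal_dir(signal_dir: str, output_dir: str) -> str:
    """Mirror the last component of output_dir under signal_dir."""
    return posixpath.join(signal_dir, posixpath.split(output_dir)[-1])


# The checkpoint folder name is carried over to the signal directory.
print(per_checkpoint_signal_dir("/tmp/signals", "/data/run/checkpoint-500"))
# /tmp/signals/checkpoint-500

# With a trailing slash, split() returns an empty tail, so no folder name is appended.
print(per_checkpoint_signal_dir("/tmp/signals", "/data/run/checkpoint-500/"))
# /tmp/signals/
```

The second call shows a caveat of this idiom: if `output_dir` can end with a separator, normalizing it first (for example with `os.path.normpath`) keeps the folder name from being dropped.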
ZHUI
left a comment
LGTM
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9286     +/-   ##
===========================================
- Coverage    53.27%   53.09%   -0.19%
===========================================
  Files          657      657
  Lines       107194   106533     -661
===========================================
- Hits         57104    56559     -545
+ Misses       50090    49974     -116
693d6fb to 2eafad3
wawltor
left a comment
LGTM
* bug fix
* bug fix
* [Unified Checkpoint] Support expert parallel (#9055)
* update code
* [Unified Checkpoint] Fix generation config save (#9223)
* [Unified Checkpoint] update async_save_info in develop (#9173)
* [Unified Checkpoint] update async save logic (#9274)
* update async save signal
* fix async save hang
* bug fix
* bug fix
* [Trainer] fix save_model (#9286)
* bug fix
* bug fix
---------
Co-authored-by: Weiguo Zhu <[email protected]>
PR types
Others
PR changes
Others
Description
Modify the `save_model` call to enhance compatibility.