
Fix bugs #4150

Merged: 5 commits, May 9, 2025
1 change: 1 addition & 0 deletions docs/source/Instruction/Megatron-SWIFT训练.md
@@ -280,3 +280,4 @@ Megatron training parameters inherit from Megatron parameters and basic parameters
- lazy_tokenize: Default is False. If this parameter is set to False, all dataset samples are tokenized before training (this avoids errors surfacing mid-training); if set to True, tokenization occurs during training (this saves memory).
- dataloader_persistent_workers: A parameter passed through to the dataloader; default is True.
- dataloader_prefetch_factor: A parameter passed through to the dataloader; default is 10.
+- max_epochs: Forces the training to exit after reaching `max_epochs`, and performs validation and saving of the model weights. This parameter is especially useful when using a streaming dataset. Default is None.
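Since `dataloader_persistent_workers` and `dataloader_prefetch_factor` are passed straight through, their effect is that of the standard PyTorch `DataLoader` arguments. A minimal sketch of what the passthrough amounts to (the dataset and batch size here are placeholders, not ms-swift internals):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(1024).float())  # placeholder dataset

# Equivalent of dataloader_persistent_workers=True and
# dataloader_prefetch_factor=10; both require num_workers > 0.
loader = DataLoader(
    dataset,
    batch_size=8,
    num_workers=2,
    persistent_workers=True,  # keep worker processes alive across epochs
    prefetch_factor=10,       # each worker preloads 10 batches ahead
)
```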
4 changes: 2 additions & 2 deletions docs/source/Instruction/命令行参数.md
@@ -136,8 +136,8 @@
- logging_steps: Logging interval; default is 5.
- predict_with_generate: Whether to use a generative approach during validation; default is False.
- metric_for_best_model: Default is None, meaning that when `predict_with_generate` is set to False it is set to 'loss', otherwise to 'rouge-l' (no default is set during PPO training; GRPO training sets it to 'reward').
-- greater_is_better: Default is None, meaning it is set to False when `metric_for_best_model` contains 'loss', otherwise to True.
-- max_epochs: Forces the training to exit after reaching `max_epochs`, and performs validation and saving of the model weights. This parameter is especially useful when using a streaming dataset.
+- greater_is_better: Default is None, meaning it is set to False when `metric_for_best_model` contains 'loss', otherwise to True
+- max_epochs: Forces the training to exit after reaching `max_epochs`, and performs validation and saving of the model weights. This parameter is especially useful when using a streaming dataset. Default is None.

Other important parameters:
- 🔥num_train_epochs: Number of training epochs; default is 3.
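To make these interacting defaults concrete, here is a small sketch of the resolution logic described above (an illustrative paraphrase, not the actual ms-swift source):

```python
def resolve_metric_defaults(predict_with_generate, metric_for_best_model=None,
                            greater_is_better=None, train_type='sft'):
    """Mirrors the documented defaults for metric_for_best_model / greater_is_better."""
    if metric_for_best_model is None:
        if train_type == 'grpo':
            metric_for_best_model = 'reward'
        elif train_type != 'ppo':  # PPO training leaves the metric unset
            metric_for_best_model = 'rouge-l' if predict_with_generate else 'loss'
    if greater_is_better is None and metric_for_best_model is not None:
        # lower is better for loss-like metrics, higher for everything else
        greater_is_better = 'loss' not in metric_for_best_model
    return metric_for_best_model, greater_is_better

print(resolve_metric_defaults(False))  # ('loss', False)
print(resolve_metric_defaults(True))   # ('rouge-l', True)
```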
2 changes: 1 addition & 1 deletion docs/source_en/Instruction/Command-line-parameters.md
@@ -142,7 +142,7 @@ This parameter list inherits from transformers `Seq2SeqTrainingArguments`, with
- predict_with_generate: Whether to use generative method during validation, default is False.
- metric_for_best_model: Default is None, which means that when predict_with_generate is set to False, it is set to 'loss'; otherwise, it is set to 'rouge-l' (during PPO training, the default value is not set; in GRPO training, it is set to 'reward').
- greater_is_better: Defaults to None, which sets it to False when `metric_for_best_model` contains 'loss', otherwise sets to True.
-- max_epochs: Forces the training to exit after reaching `max_epochs`, and performs validation and saving of the model weights. This parameter is especially useful when using a streaming dataset.
+- max_epochs: Forces the training to exit after reaching `max_epochs`, and performs validation and saving of the model weights. This parameter is especially useful when using a streaming dataset. Default is None.

Other important parameters:
- 🔥num_train_epochs: Number of training epochs, default is 3.
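As a usage note: a streamed dataset has no known length, so epoch-based stopping alone cannot bound the run, which is why `max_epochs` forces an exit. A minimal sketch, assuming the `TrainArguments`/`sft_main` Python entrypoints from `swift.llm` and using placeholder model/dataset IDs:

```python
from swift.llm import TrainArguments, sft_main

# Sketch only: the model and dataset identifiers below are placeholders.
result = sft_main(TrainArguments(
    model='Qwen/Qwen2.5-7B-Instruct',   # placeholder model ID
    dataset=['my-org/my-stream-data'],  # placeholder streaming dataset
    streaming=True,                     # iterable dataset with no fixed length
    max_epochs=1,                       # hard stop, then validate and save weights
))
```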
1 change: 1 addition & 0 deletions docs/source_en/Instruction/Megatron-SWIFT-Training.md
@@ -292,3 +292,4 @@ Megatron training parameters inherit from Megatron parameters and basic parameters
- lazy_tokenize: Default is False. If this parameter is set to False, all dataset samples are tokenized before training (this avoids errors during training); if set to True, tokenization occurs during training (this saves memory).
- dataloader_persistent_workers: A parameter passed directly to the dataloader, with a default value of True.
- dataloader_prefetch_factor: A parameter passed directly to the dataloader, with a default value of 10.
+- max_epochs: Forces the training to exit after reaching `max_epochs`, and performs validation and saving of the model weights. This parameter is especially useful when using a streaming dataset. Default is None.
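The `lazy_tokenize` trade-off is the standard eager-versus-lazy dataset pattern; a generic sketch (plain PyTorch, not ms-swift internals) of what "tokenization occurs during training" means:

```python
from torch.utils.data import Dataset

class LazyTokenizeDataset(Dataset):
    """Generic illustration: tokenize each sample on access instead of up front.

    This saves memory (no pre-tokenized copy of the corpus is held), but a bad
    sample only raises once the dataloader actually reaches it, mid-training.
    """

    def __init__(self, texts, tokenizer):
        self.texts = texts          # raw strings; nothing tokenized yet
        self.tokenizer = tokenizer

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        # lazy_tokenize=True behavior: the work happens here, during training
        return self.tokenizer(self.texts[idx])
```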
3 changes: 2 additions & 1 deletion swift/llm/train/sft.py
@@ -211,7 +211,8 @@ def train(self, trainer):
        try:
            trainer.train(trainer.args.resume_from_checkpoint)
        finally:
-            return self._save_trainer_state(trainer)
+            res = self._save_trainer_state(trainer)
+            return res

    def _prepare_callbacks(self):
        from .callback import DynamicLayerActivationCallback, TrainerAdapterCallback
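One behavioral note on this pattern: the `finally` block runs whether `train()` succeeds or raises, which is what guarantees the trainer state gets saved; but a `return` inside `finally` also suppresses any in-flight exception. A self-contained demonstration of that Python semantics (not ms-swift code):

```python
def train_then_save():
    try:
        raise RuntimeError('training crashed')  # stand-in for a failing train()
    finally:
        # Runs on success and failure alike; the `return` here also swallows
        # the RuntimeError, so the caller never sees it.
        return 'trainer state saved'

print(train_then_save())  # prints 'trainer state saved'; no exception escapes
```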
3 changes: 3 additions & 0 deletions swift/megatron/argument/train_args.py
@@ -41,6 +41,9 @@ def _init_save(self):
        os.makedirs(self.save, exist_ok=True)

    def __post_init__(self):
+        if self.sequence_parallel_size > 1:
+            # please use `--sequence_parallel` or `--context_parallel_size`.
+            self.sequence_parallel_size = 1
        self.load = to_abspath(self.load, check_path_exist=True)
        BaseArguments.__post_init__(self)
        self._init_save()
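The effect of the new guard, isolated into a toy dataclass (a hypothetical mirror, not the real Megatron train-argument class): any `sequence_parallel_size > 1` is silently reset to 1, steering users toward `--sequence_parallel` or `--context_parallel_size` instead.

```python
from dataclasses import dataclass

@dataclass
class ToyTrainArguments:
    """Hypothetical mirror of the guard added in __post_init__ above."""
    sequence_parallel_size: int = 1

    def __post_init__(self):
        if self.sequence_parallel_size > 1:
            # Deprecated here; use --sequence_parallel or --context_parallel_size.
            self.sequence_parallel_size = 1

args = ToyTrainArguments(sequence_parallel_size=4)
print(args.sequence_parallel_size)  # -> 1: the oversized value was reset
```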