
Fix bugs #4150

Merged: 5 commits, May 9, 2025
1 change: 1 addition & 0 deletions docs/source/Instruction/Megatron-SWIFT训练.md
@@ -280,3 +280,4 @@ Megatron training parameters inherit from Megatron parameters and basic parameters
- lazy_tokenize: Default is False. If this parameter is set to False, all dataset samples are tokenized before training (this avoids errors surfacing mid-training); if set to True, tokenization occurs during training (this saves memory).
- dataloader_persistent_workers: A parameter passed through to the dataloader; default is True.
- dataloader_prefetch_factor: A parameter passed through to the dataloader; default is 10.
+- max_epochs: Forces the training to exit after reaching `max_epochs`, and performs validation and saving of the model weights. This parameter is especially useful when using a streaming dataset. Default is None.
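Since `dataloader_persistent_workers` and `dataloader_prefetch_factor` are passed straight through, their effect is that of the standard PyTorch `DataLoader` arguments. A minimal sketch of what the passthrough amounts to (the dataset and batch size here are placeholders, not ms-swift internals):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(1024).float())  # placeholder dataset

# Equivalent of dataloader_persistent_workers=True and
# dataloader_prefetch_factor=10; both require num_workers > 0.
loader = DataLoader(
    dataset,
    batch_size=8,
    num_workers=2,
    persistent_workers=True,  # keep worker processes alive across epochs
    prefetch_factor=10,       # each worker preloads 10 batches ahead
)
```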
4 changes: 2 additions & 2 deletions docs/source/Instruction/命令行参数.md
@@ -136,8 +136,8 @@
- logging_steps: Logging interval; default is 5.
- predict_with_generate: Whether to use a generative approach during validation; default is False.
- metric_for_best_model: Default is None, meaning that when `predict_with_generate` is set to False it is set to 'loss', otherwise to 'rouge-l' (no default is set during PPO training; GRPO training sets it to 'reward').
-- greater_is_better: Default is None, meaning it is set to False when `metric_for_best_model` contains 'loss', otherwise to True.
-- max_epochs: Forces the training to exit after reaching `max_epochs`, and performs validation and saving of the model weights. This parameter is especially useful when using a streaming dataset.
+- greater_is_better: Default is None, meaning it is set to False when `metric_for_best_model` contains 'loss', otherwise to True
+- max_epochs: Forces the training to exit after reaching `max_epochs`, and performs validation and saving of the model weights. This parameter is especially useful when using a streaming dataset. Default is None.

Other important parameters:
- 🔥num_train_epochs: Number of training epochs; default is 3.
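To make these interacting defaults concrete, here is a small sketch of the resolution logic described above (an illustrative paraphrase, not the actual ms-swift source):

```python
def resolve_metric_defaults(predict_with_generate, metric_for_best_model=None,
                            greater_is_better=None, train_type='sft'):
    """Mirrors the documented defaults for metric_for_best_model / greater_is_better."""
    if metric_for_best_model is None:
        if train_type == 'grpo':
            metric_for_best_model = 'reward'
        elif train_type != 'ppo':  # PPO training leaves the metric unset
            metric_for_best_model = 'rouge-l' if predict_with_generate else 'loss'
    if greater_is_better is None and metric_for_best_model is not None:
        # lower is better for loss-like metrics, higher for everything else
        greater_is_better = 'loss' not in metric_for_best_model
    return metric_for_best_model, greater_is_better

print(resolve_metric_defaults(False))  # ('loss', False)
print(resolve_metric_defaults(True))   # ('rouge-l', True)
```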
2 changes: 1 addition & 1 deletion docs/source_en/Instruction/Command-line-parameters.md
@@ -142,7 +142,7 @@ This parameter list inherits from transformers `Seq2SeqTrainingArguments`, with
- predict_with_generate: Whether to use generative method during validation, default is False.
- metric_for_best_model: Default is None, which means that when predict_with_generate is set to False, it is set to 'loss'; otherwise, it is set to 'rouge-l' (during PPO training, the default value is not set; in GRPO training, it is set to 'reward').
- greater_is_better: Defaults to None, which sets it to False when `metric_for_best_model` contains 'loss', otherwise sets to True.
-- max_epochs: Forces the training to exit after reaching `max_epochs`, and performs validation and saving of the model weights. This parameter is especially useful when using a streaming dataset.
+- max_epochs: Forces the training to exit after reaching `max_epochs`, and performs validation and saving of the model weights. This parameter is especially useful when using a streaming dataset. Default is None.

Other important parameters:
- 🔥num_train_epochs: Number of training epochs, default is 3.
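As a usage note: a streamed dataset has no known length, so epoch-based stopping alone cannot bound the run, which is why `max_epochs` forces an exit. A minimal sketch, assuming the `TrainArguments`/`sft_main` Python entrypoints from `swift.llm` and using placeholder model/dataset IDs:

```python
from swift.llm import TrainArguments, sft_main

# Sketch only: the model and dataset identifiers below are placeholders.
result = sft_main(TrainArguments(
    model='Qwen/Qwen2.5-7B-Instruct',   # placeholder model ID
    dataset=['my-org/my-stream-data'],  # placeholder streaming dataset
    streaming=True,                     # iterable dataset with no fixed length
    max_epochs=1,                       # hard stop, then validate and save weights
))
```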
1 change: 1 addition & 0 deletions docs/source_en/Instruction/Megatron-SWIFT-Training.md
@@ -292,3 +292,4 @@ Megatron training parameters inherit from Megatron parameters and basic parameters
- lazy_tokenize: Default is False. If this parameter is set to False, all dataset samples are tokenized before training (this avoids errors during training); if set to True, tokenization occurs during training (this saves memory).
- dataloader_persistent_workers: A parameter passed directly to the dataloader, with a default value of True.
- dataloader_prefetch_factor: A parameter passed directly to the dataloader, with a default value of 10.
+- max_epochs: Forces the training to exit after reaching `max_epochs`, and performs validation and saving of the model weights. This parameter is especially useful when using a streaming dataset. Default is None.
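The `lazy_tokenize` trade-off is the standard eager-versus-lazy dataset pattern; a generic sketch (plain PyTorch, not ms-swift internals) of what "tokenization occurs during training" means:

```python
from torch.utils.data import Dataset

class LazyTokenizeDataset(Dataset):
    """Generic illustration: tokenize each sample on access instead of up front.

    This saves memory (no pre-tokenized copy of the corpus is held), but a bad
    sample only raises once the dataloader actually reaches it, mid-training.
    """

    def __init__(self, texts, tokenizer):
        self.texts = texts          # raw strings; nothing tokenized yet
        self.tokenizer = tokenizer

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        # lazy_tokenize=True behavior: the work happens here, during training
        return self.tokenizer(self.texts[idx])
```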
3 changes: 2 additions & 1 deletion swift/llm/train/sft.py
@@ -211,7 +211,8 @@ def train(self, trainer):
        try:
            trainer.train(trainer.args.resume_from_checkpoint)
        finally:
-            return self._save_trainer_state(trainer)
+            res = self._save_trainer_state(trainer)
+            return res

    def _prepare_callbacks(self):
        from .callback import DynamicLayerActivationCallback, TrainerAdapterCallback
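One behavioral note on this pattern: the `finally` block runs whether `train()` succeeds or raises, which is what guarantees the trainer state gets saved; but a `return` inside `finally` also suppresses any in-flight exception. A self-contained demonstration of that Python semantics (not ms-swift code):

```python
def train_then_save():
    try:
        raise RuntimeError('training crashed')  # stand-in for a failing train()
    finally:
        # Runs on success and failure alike; the `return` here also swallows
        # the RuntimeError, so the caller never sees it.
        return 'trainer state saved'

print(train_then_save())  # prints 'trainer state saved'; no exception escapes
```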
3 changes: 3 additions & 0 deletions swift/megatron/argument/train_args.py
@@ -41,6 +41,9 @@ def _init_save(self):
        os.makedirs(self.save, exist_ok=True)

    def __post_init__(self):
+        if self.sequence_parallel_size > 1:
+            # please use `--sequence_parallel` or `--context_parallel_size`.
+            self.sequence_parallel_size = 1
        self.load = to_abspath(self.load, check_path_exist=True)
        BaseArguments.__post_init__(self)
        self._init_save()
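The effect of the new guard, isolated into a toy dataclass (a hypothetical mirror, not the real Megatron train-argument class): any `sequence_parallel_size > 1` is silently reset to 1, steering users toward `--sequence_parallel` or `--context_parallel_size` instead.

```python
from dataclasses import dataclass

@dataclass
class ToyTrainArguments:
    """Hypothetical mirror of the guard added in __post_init__ above."""
    sequence_parallel_size: int = 1

    def __post_init__(self):
        if self.sequence_parallel_size > 1:
            # Deprecated here; use --sequence_parallel or --context_parallel_size.
            self.sequence_parallel_size = 1

args = ToyTrainArguments(sequence_parallel_size=4)
print(args.sequence_parallel_size)  # -> 1: the oversized value was reset
```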