fix grpo resume_from_checkpoint #4035

Merged
3 changes: 2 additions & 1 deletion docs/source/Instruction/Megatron-SWIFT训练.md
@@ -1,7 +1,7 @@

# Megatron-SWIFT Training

SWIFT introduces Megatron parallelism techniques to accelerate the training of large models, including data parallelism, tensor parallelism, pipeline parallelism, sequence parallelism, context parallelism, and expert parallelism. It supports pre-training and fine-tuning of models such as Qwen3, Qwen3-MoE, Llama3, and the Deepseek-R1 distillation series. For the complete list of supported models, see the [Supported Models and Datasets documentation](./支持的模型和数据集.md).
SWIFT introduces Megatron parallelism techniques to accelerate the training of large models, including data parallelism, tensor parallelism, pipeline parallelism, sequence parallelism, context parallelism, and expert parallelism. It supports pre-training and fine-tuning of models such as Qwen3, [Qwen3-MoE](https://github.com/modelscope/ms-swift/blob/main/examples/train/megatron/qwen3_moe.sh), Qwen2.5, Llama3, and the Deepseek-R1 distillation series. For the complete list of supported models, see the [Supported Models and Datasets documentation](./支持的模型和数据集.md).

## Environment Setup
To use Megatron-SWIFT, in addition to installing the swift dependencies, you also need to install the following:
@@ -174,6 +174,7 @@ I am a language model developed by swift, you can call me swift-robot. How can I

**Checkpoint parameters**:
- 🔥save: Output directory for checkpoints, default None. During training, if this parameter is not set, it defaults to `f'megatron_output/{model_suffix}'`, e.g. `'megatron_output/Qwen2.5-7B-Instruct'`.
  - Note: When training on multiple machines, make sure the save path on every node points to the same location; otherwise you will need to consolidate the weights manually after training.
- 🔥save_interval: Interval between checkpoint saves (in steps), default 500.
  - Note: Weights are always saved at the end of training.
- 🔥no_save_optim: Do not save the optimizer, default False.
3 changes: 2 additions & 1 deletion docs/source_en/Instruction/Megatron-SWIFT-Training.md
@@ -1,7 +1,7 @@

# Megatron-SWIFT Training

SWIFT incorporates Megatron's parallelization techniques to accelerate the training of large models, including data parallelism, tensor parallelism, pipeline parallelism, sequence parallelism, context parallelism, and expert parallelism. It supports the pre-training and fine-tuning of models such as Qwen3, Qwen3-MoE, Llama3, and the Deepseek-R1 distillation series. For a complete list of supported models, please refer to the [Supported Models and Datasets documentation](./Supported-models-and-datasets.md).
SWIFT incorporates Megatron's parallelization techniques to accelerate the training of large models, including data parallelism, tensor parallelism, pipeline parallelism, sequence parallelism, context parallelism, and expert parallelism. It supports the pre-training and fine-tuning of models such as Qwen3, [Qwen3-MoE](https://github.com/modelscope/ms-swift/blob/main/examples/train/megatron/qwen3_moe.sh), Qwen2.5, Llama3, and the Deepseek-R1 distillation series. For a complete list of supported models, please refer to the [Supported Models and Datasets documentation](./Supported-models-and-datasets.md).

## Environment Setup

@@ -181,6 +181,7 @@ seq_length: Defaults to None, meaning it is set to `max_length`. To restrict the
**Checkpoint Parameters**:

- 🔥save: Output directory for checkpoints, default is None. During training, if this parameter is not set, it defaults to `f'megatron_output/{model_suffix}'`, e.g., `'megatron_output/Qwen2.5-7B-Instruct'`.
- Note: When training on multiple machines, ensure that the save paths on each node point to the same location. Otherwise, you will need to manually consolidate these weights after training.
- 🔥save_interval: Checkpoint saving interval (steps), default is 500.
- Note: Weights will always be saved at the end of training.
- 🔥no_save_optim: Do not save optimizer, default is False.
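Putting the checkpoint parameters above together, a multi-node launch might look like the following sketch. This is illustrative only: the `megatron sft` entry point, node variables, and the shared filesystem path are assumptions; only `--save`, `--save_interval`, and `--no_save_optim` come from the parameter list above.

```shell
# Every node writes checkpoints to the SAME shared path (assumption: a shared
# filesystem is mounted at /mnt/shared), so no manual consolidation of weights
# is needed after training.
SHARED_SAVE=/mnt/shared/megatron_output/Qwen2.5-7B-Instruct

# Hypothetical launch on node 0 of 2; repeat with NODE_RANK=1 on the other node.
NNODES=2 NODE_RANK=0 \
megatron sft \
    --model Qwen/Qwen2.5-7B-Instruct \
    --save "$SHARED_SAVE" \
    --save_interval 500 \
    --no_save_optim false
```

If `--save` is omitted, the default `megatron_output/{model_suffix}` resolves relative to each node's working directory, which is exactly the situation where weights end up scattered across machines.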
9 changes: 5 additions & 4 deletions swift/llm/argument/train_args.py
@@ -141,10 +141,11 @@ def __post_init__(self) -> None:
'Please specify `--attn_impl flash_attn`.')
         if self.resume_from_checkpoint:
             self.resume_from_checkpoint = to_abspath(self.resume_from_checkpoint, True)
-            if self.train_type == 'full':
-                self.model = self.resume_from_checkpoint
-            else:
-                self.adapters = [self.resume_from_checkpoint]
+            if self.resume_only_model:
+                if self.train_type == 'full':
+                    self.model = self.resume_from_checkpoint
+                else:
+                    self.adapters = [self.resume_from_checkpoint]
         BaseArguments.__post_init__(self)
         Seq2SeqTrainingOverrideArguments.__post_init__(self)
         TunerArguments.__post_init__(self)
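The behavior change in the diff above can be sketched with a minimal stand-in for the argument class (the class and method names here are hypothetical, for illustration only). With the fix, `model`/`adapters` are overwritten by the checkpoint path only when `resume_only_model` is set; a full resume, which also restores optimizer and scheduler state, keeps the original model path so the trainer can locate the checkpoint itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Args:
    """Hypothetical stand-in for the trainer arguments."""
    train_type: str = 'full'
    resume_from_checkpoint: Optional[str] = None
    resume_only_model: bool = False
    model: Optional[str] = None
    adapters: List[str] = field(default_factory=list)

    def resolve_resume(self) -> None:
        # Mirrors the fixed logic: only redirect model/adapters to the
        # checkpoint when resume_only_model is set. A full resume must keep
        # the original model path untouched.
        if self.resume_from_checkpoint and self.resume_only_model:
            if self.train_type == 'full':
                self.model = self.resume_from_checkpoint
            else:
                self.adapters = [self.resume_from_checkpoint]


# Full resume (e.g. GRPO resuming optimizer state): model path stays None.
a = Args(train_type='full', resume_from_checkpoint='/ckpt/v1')
a.resolve_resume()
print(a.model)  # None

# Model-only resume with a LoRA-style train_type: checkpoint becomes the adapter.
b = Args(train_type='lora', resume_from_checkpoint='/ckpt/v1', resume_only_model=True)
b.resolve_resume()
print(b.adapters)  # ['/ckpt/v1']
```

Before this fix, the `model`/`adapters` override ran unconditionally, which broke GRPO's `resume_from_checkpoint` path where the trainer expects the original model argument to survive a full resume.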