
Commit bca413b

[doc] fix image link (#4674)
1 parent 88dae7c commit bca413b

4 files changed, +35 −4 lines


docs/source/Instruction/GRPO/DeveloperGuide/多轮训练.md

Lines changed: 3 additions & 2 deletions
@@ -28,7 +28,8 @@ pip install -e .
 The multi-turn scheduler is the core component of multi-turn training; its workflow is shown in the figure below:
 
 
-<img src="https://github.com/modelscope/ms-swift/tree/main/docs/resources/multiturn_pipeline.png" width="300" />
+<img src="https://raw.githubusercontent.com/modelscope/ms-swift/main/docs/resources/multiturn_pipeline.png" width="300" />
+
 
 The multi-turn scheduler has two main responsibilities:
 - Termination check: the check_finished method decides whether inference should stop at the current turn
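For orientation, a minimal sketch of what such a termination check might look like. The `check_finished` hook is named in the doc text above; the base-class shape, method signature, and `InferRequest` stand-in are assumptions for illustration, not ms-swift's confirmed API:

```python
# Hypothetical sketch of the termination check described above; the exact
# ms-swift class layout and signatures may differ.
from dataclasses import dataclass, field
from typing import List


@dataclass
class InferRequest:
    """Stand-in for the engine's request object (assumed shape)."""
    messages: List[dict] = field(default_factory=list)


class MultiTurnSchedulerBase:
    """Stand-in for the MultiTurnScheduler base class from the docs."""

    def __init__(self, max_turns: int = 4):
        self.max_turns = max_turns

    def check_finished(self, request: InferRequest, completion: str,
                       current_turn: int) -> bool:
        raise NotImplementedError


class StopOnAnswerScheduler(MultiTurnSchedulerBase):
    """Stops a rollout when the model emits an answer tag or the turn budget runs out."""

    def check_finished(self, request: InferRequest, completion: str,
                       current_turn: int) -> bool:
        if current_turn >= self.max_turns:
            return True  # turn budget exhausted: end this rollout
        return "<answer>" in completion  # final answer produced: end early


scheduler = StopOnAnswerScheduler(max_turns=4)
print(scheduler.check_finished(InferRequest(), "let me think...", 1))      # False
print(scheduler.check_finished(InferRequest(), "<answer>42</answer>", 2))  # True
```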
@@ -105,7 +106,7 @@ RolloutResponseChoice(
 
 AsyncEngine is recommended for efficient asynchronous multi-turn sampling over batched data (only supported in external server mode); it reduces computational bubbles during multi-turn inference (see figure below)
 
-<img src="https://github.com/modelscope/ms-swift/tree/main/docs/resources/asyncengine.png" width="400" />
+<img src="https://raw.githubusercontent.com/modelscope/ms-swift/main/docs/resources/asyncengine.png" width="400" />
 
 
 Use the `use_async_engine` parameter in the `rollout` command to specify the engine type

docs/source/Instruction/GRPO/GetStarted/GRPO.md

Lines changed: 15 additions & 0 deletions
@@ -281,3 +281,18 @@ $$
 When `val_dataset` is not passed explicitly, the `split_dataset_ratio` parameter splits off part of `dataset` as the validation set (1% of the data by default)
 
 Set `--split_dataset_ratio 0` to disable validation
+
+**7. How to set the training `mini-batch size`**
+
+In GRPO training, mini-batch updates can be configured in either of two ways:
+
+1. Configuration options:
+    - Set `generation_batch_size` to an integer multiple of the training global batch size
+    - Or set `steps_per_generation` to an integer multiple of `gradient_accumulation_steps`
+
+2. Typical configuration example. With:
+    steps_per_generation = 16
+    gradient_accumulation_steps = 8
+
+the results of one rollout are split into two mini-batches for updates
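As a quick check of the arithmetic in the example above, here is a small sketch; the helper function is illustrative only and not part of ms-swift, though the parameter names mirror the options described in this hunk:

```python
# Illustrative only: computes how many mini-batches one rollout is split
# into under the steps_per_generation / gradient_accumulation_steps rule
# described above. Not ms-swift code.
def mini_batches_per_rollout(steps_per_generation: int,
                             gradient_accumulation_steps: int) -> int:
    if steps_per_generation % gradient_accumulation_steps != 0:
        raise ValueError("steps_per_generation should be an integer "
                         "multiple of gradient_accumulation_steps")
    return steps_per_generation // gradient_accumulation_steps


# The typical configuration from the hunk above: 16 / 8 = 2 mini-batches.
print(mini_batches_per_rollout(steps_per_generation=16,
                               gradient_accumulation_steps=8))  # -> 2
```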

docs/source_en/Instruction/GRPO/DeveloperGuide/multi_turn.md

Lines changed: 2 additions & 2 deletions
@@ -24,7 +24,7 @@ We can customize and set a multi-round sampling planner through the parameter `m
 ## MultiTurnScheduler
 The multi-turn scheduler is the core component of multi-round training, and its workflow is shown in the following diagram:
 
-<img src="https://github.com/modelscope/ms-swift/tree/main/docs/resources/multiturn_pipeline.png" width="300" />
+<img src="https://raw.githubusercontent.com/modelscope/ms-swift/main/docs/resources/multiturn_pipeline.png" width="300" />
 
 The multi-turn scheduler primarily performs two functions:
 - Termination condition judgment: Determines whether the current round of reasoning should end via the `check_finished` method.
@@ -101,7 +101,7 @@ The default check_finished logic stops reasoning under two conditions:
 
 It is recommended to use AsyncEngine for efficient batch-data asynchronous multi-round sampling (only supported in external server mode). AsyncEngine can reduce computational bubbles during multi-round reasoning (as shown in the diagram).
 
-<img src="https://github.com/modelscope/ms-swift/tree/main/docs/resources/asyncengine.png" width="400" />
+<img src="https://raw.githubusercontent.com/modelscope/ms-swift/main/docs/resources/asyncengine.png" width="400" />
 
 Use the `use_async_engine` parameter in the `rollout` command to specify the engine type:
 ```

docs/source_en/Instruction/GRPO/GetStarted/GRPO.md

Lines changed: 15 additions & 0 deletions
@@ -283,3 +283,18 @@ Refer to [issue](https://github.com/huggingface/open-r1/issues/239#issuecomment-
 When `val_dataset` is not explicitly passed, the `split_dataset_ratio` parameter is responsible for splitting part of the `dataset` into a validation dataset, which defaults to splitting 1% of the data.
 
 To disable the validation process, set `--split_dataset_ratio 0`.
+
+**7. How to set the training `mini-batch size`**
+
+In GRPO training, we can configure mini-batch updates in the following two ways:
+
+1. Configuration options:
+    - Set `generation_batch_size` to be an integer multiple of the training global batch size.
+    - Or set `steps_per_generation` to be an integer multiple of `gradient_accumulation_steps`.
+
+2. Typical configuration example. When configured with:
+    steps_per_generation = 16
+    gradient_accumulation_steps = 8
+
+The results from one rollout will be split into two mini-batch updates.
