
Commit bca413b

[doc] fix image link (#4674)
1 parent 88dae7c commit bca413b

4 files changed, +35 −4 lines


docs/source/Instruction/GRPO/DeveloperGuide/多轮训练.md

Lines changed: 3 additions & 2 deletions
@@ -28,7 +28,8 @@ pip install -e .
 The multi-turn scheduler is the core component of multi-turn training; its workflow is shown in the figure below:
 
 
-<img src="https://github.com/modelscope/ms-swift/tree/main/docs/resources/multiturn_pipeline.png" width="300" />
+<img src="https://raw.githubusercontent.com/modelscope/ms-swift/main/docs/resources/multiturn_pipeline.png" width="300" />
+
 
 The multi-turn scheduler has two main responsibilities:
 - Termination check: the check_finished method decides whether inference should stop at the current turn
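For orientation, a minimal sketch of what such a termination check might look like. The `check_finished` hook is named in the doc text above; the base-class shape, method signature, and `InferRequest` stand-in are assumptions for illustration, not ms-swift's confirmed API:

```python
# Hypothetical sketch of the termination check described above; the exact
# ms-swift class layout and signatures may differ.
from dataclasses import dataclass, field
from typing import List


@dataclass
class InferRequest:
    """Stand-in for the engine's request object (assumed shape)."""
    messages: List[dict] = field(default_factory=list)


class MultiTurnSchedulerBase:
    """Stand-in for the MultiTurnScheduler base class from the docs."""

    def __init__(self, max_turns: int = 4):
        self.max_turns = max_turns

    def check_finished(self, request: InferRequest, completion: str,
                       current_turn: int) -> bool:
        raise NotImplementedError


class StopOnAnswerScheduler(MultiTurnSchedulerBase):
    """Stops a rollout when the model emits an answer tag or the turn budget runs out."""

    def check_finished(self, request: InferRequest, completion: str,
                       current_turn: int) -> bool:
        if current_turn >= self.max_turns:
            return True  # turn budget exhausted: end this rollout
        return "<answer>" in completion  # final answer produced: end early


scheduler = StopOnAnswerScheduler(max_turns=4)
print(scheduler.check_finished(InferRequest(), "let me think...", 1))      # False
print(scheduler.check_finished(InferRequest(), "<answer>42</answer>", 2))  # True
```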
@@ -105,7 +106,7 @@ RolloutResponseChoice(
 
 AsyncEngine is recommended for efficient asynchronous multi-turn sampling over batched data (only supported in external server mode); it reduces computational bubbles during multi-turn inference (see figure below)
 
-<img src="https://github.com/modelscope/ms-swift/tree/main/docs/resources/asyncengine.png" width="400" />
+<img src="https://raw.githubusercontent.com/modelscope/ms-swift/main/docs/resources/asyncengine.png" width="400" />
 
 
 Use the `use_async_engine` parameter in the `rollout` command to specify the engine type

docs/source/Instruction/GRPO/GetStarted/GRPO.md

Lines changed: 15 additions & 0 deletions
@@ -281,3 +281,18 @@ $$
 When `val_dataset` is not passed explicitly, the `split_dataset_ratio` parameter splits off part of `dataset` as the validation set (1% of the data by default)
 
 Set `--split_dataset_ratio 0` to disable validation
+
+**7. How to set the training `mini-batch size`**
+
+In GRPO training, mini-batch updates can be configured in either of two ways:
+
+1. Configuration options:
+    - Set `generation_batch_size` to an integer multiple of the training global batch size
+    - Or set `steps_per_generation` to an integer multiple of `gradient_accumulation_steps`
+
+2. Typical configuration example. With:
+    steps_per_generation = 16
+    gradient_accumulation_steps = 8
+
+the results of one rollout are split into two mini-batches for updates
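As a quick check of the arithmetic in the example above, here is a small sketch; the helper function is illustrative only and not part of ms-swift, though the parameter names mirror the options described in this hunk:

```python
# Illustrative only: computes how many mini-batches one rollout is split
# into under the steps_per_generation / gradient_accumulation_steps rule
# described above. Not ms-swift code.
def mini_batches_per_rollout(steps_per_generation: int,
                             gradient_accumulation_steps: int) -> int:
    if steps_per_generation % gradient_accumulation_steps != 0:
        raise ValueError("steps_per_generation should be an integer "
                         "multiple of gradient_accumulation_steps")
    return steps_per_generation // gradient_accumulation_steps


# The typical configuration from the hunk above: 16 / 8 = 2 mini-batches.
print(mini_batches_per_rollout(steps_per_generation=16,
                               gradient_accumulation_steps=8))  # -> 2
```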

docs/source_en/Instruction/GRPO/DeveloperGuide/multi_turn.md

Lines changed: 2 additions & 2 deletions
@@ -24,7 +24,7 @@ We can customize and set a multi-round sampling planner through the parameter `m
 ## MultiTurnScheduler
 The multi-turn scheduler is the core component of multi-round training, and its workflow is shown in the following diagram:
 
-<img src="https://github.com/modelscope/ms-swift/tree/main/docs/resources/multiturn_pipeline.png" width="300" />
+<img src="https://raw.githubusercontent.com/modelscope/ms-swift/main/docs/resources/multiturn_pipeline.png" width="300" />
 
 The multi-turn scheduler primarily performs two functions:
 - Termination condition judgment: Determines whether the current round of reasoning should end via the `check_finished` method.
@@ -101,7 +101,7 @@ The default check_finished logic stops reasoning under two conditions:
 
 It is recommended to use AsyncEngine for efficient batch-data asynchronous multi-round sampling (only supported in external server mode). AsyncEngine can reduce computational bubbles during multi-round reasoning (as shown in the diagram).
 
-<img src="https://github.com/modelscope/ms-swift/tree/main/docs/resources/asyncengine.png" width="400" />
+<img src="https://raw.githubusercontent.com/modelscope/ms-swift/main/docs/resources/asyncengine.png" width="400" />
 
 Use the `use_async_engine` parameter in the `rollout` command to specify the engine type:
 ```

docs/source_en/Instruction/GRPO/GetStarted/GRPO.md

Lines changed: 15 additions & 0 deletions
@@ -283,3 +283,18 @@ Refer to [issue](https://github.com/huggingface/open-r1/issues/239#issuecomment-
 When `val_dataset` is not explicitly passed, the `split_dataset_ratio` parameter is responsible for splitting part of the `dataset` into a validation dataset, which defaults to splitting 1% of the data.
 
 To disable the validation process, set `--split_dataset_ratio 0`.
+
+**7. How to set the training `mini-batch size`**
+
+In GRPO training, we can configure mini-batch updates in the following two ways:
+
+1. Configuration options:
+    - Set `generation_batch_size` to be an integer multiple of the training global batch size.
+    - Or set `steps_per_generation` to be an integer multiple of `gradient_accumulation_steps`.
+
+2. Typical configuration example. When configured with:
+    steps_per_generation = 16
+    gradient_accumulation_steps = 8
+
+The results from one rollout will be split into two mini-batch updates.
