[Doc] Refine training tricks documentation (open-mmlab#2755)

MeowZheng · web-flow · commit 1ea397509e64 · 2023-03-22T14:37:54.000+08:00
diff --git a/docs/en/advanced_guides/training_tricks.md b/docs/en/advanced_guides/training_tricks.md
@@ -1,4 +1,4 @@
-# \[WIP\] Training Tricks
+# Training Tricks
 
 MMSegmentation supports the following training tricks out of the box.
 
@@ -9,18 +9,19 @@ In semantic segmentation, some methods make the LR of heads larger than backbone
 In MMSegmentation, you may add the following lines to the config to make the LR of heads 10 times that of the backbone.
 
 ```python
-optimizer=dict(
+optim_wrapper=dict(
     paramwise_cfg = dict(
         custom_keys={
             'head': dict(lr_mult=10.)}))
 ```
 
 With this modification, the LR of any parameter group with `'head'` in its name will be multiplied by 10.
-You may refer to [MMCV doc](https://mmcv.readthedocs.io/en/latest/api.html#mmcv.runner.DefaultOptimizerConstructor) for further details.
+You may refer to the [MMEngine documentation](https://mmengine.readthedocs.io/en/latest/tutorials/optim_wrapper.html#advanced-usages) for further details.
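
To make the `custom_keys` rule concrete, here is a minimal sketch, assuming a simplified substring match of each key against the parameter name. The helper `build_param_groups` and the parameter names are hypothetical illustrations, not MMEngine's optimizer constructor; trying longer keys first is likewise an assumption of this sketch.

```python
from typing import Dict, List


def build_param_groups(param_names: List[str], base_lr: float,
                       custom_keys: Dict[str, dict]) -> List[dict]:
    """Hypothetical helper: one group per parameter name with its effective LR."""
    groups = []
    for name in param_names:
        lr = base_lr
        # Try longer keys first so that a more specific key wins (assumption).
        for key in sorted(custom_keys, key=len, reverse=True):
            if key in name:  # plain substring match on the parameter name
                lr = base_lr * custom_keys[key].get('lr_mult', 1.0)
                break
        groups.append({'name': name, 'lr': lr})
    return groups


if __name__ == '__main__':
    names = ['backbone.layer1.conv1.weight',
             'decode_head.conv_seg.weight',
             'auxiliary_head.conv_seg.bias']
    for group in build_param_groups(names, base_lr=0.01,
                                    custom_keys={'head': dict(lr_mult=10.)}):
        print(group['name'], group['lr'])
```

Both `decode_head.*` and `auxiliary_head.*` contain `'head'`, so both get the 10x LR, while the backbone parameter keeps the base LR.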
 
 ## Online Hard Example Mining (OHEM)
 
-We implement a pixel sampler [here](https://github.com/open-mmlab/mmsegmentation/tree/master/mmseg/core/seg/sampler) for training sampling.
+We implement pixel samplers for training, such as OHEM (Online Hard Example Mining),
+which removes the "easy" examples so that model training focuses on the hard ones.
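
The selection idea behind OHEM can be sketched in plain Python, assuming each pixel comes with the predicted probability of its ground-truth class. The function `ohem_pixel_weights` is hypothetical; its `thresh` and `min_kept` arguments only echo the kind of options an OHEM sampler config exposes and are not MMSegmentation's implementation.

```python
from typing import List


def ohem_pixel_weights(gt_probs: List[float], labels: List[int],
                       thresh: float = 0.7, min_kept: int = 2,
                       ignore_index: int = 255) -> List[float]:
    """Hypothetical helper: 1.0 for hard pixels kept in the loss, 0.0 otherwise."""
    weights = [0.0] * len(labels)
    valid = [i for i, lbl in enumerate(labels) if lbl != ignore_index]
    if not valid:
        return weights
    # Valid pixels sorted from hardest (lowest GT probability) to easiest.
    sorted_probs = sorted(gt_probs[i] for i in valid)
    # Raise the threshold if needed so that about `min_kept` pixels survive.
    kth_prob = sorted_probs[min(min_kept, len(sorted_probs) - 1)]
    threshold = max(kth_prob, thresh)
    for i in valid:
        if gt_probs[i] < threshold:  # low confidence on the GT class = hard pixel
            weights[i] = 1.0
    return weights
```

Easy pixels (high confidence on the correct class) and ignored pixels get weight 0, so they do not contribute to the loss.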
 Here is an example config of training PSPNet with OHEM enabled.
 
 ```python
@@ -58,33 +59,17 @@ For loss calculation, we support multiple losses training concurrently. Here is
 ```python
 _base_ = './fcn_unet_s5-d16_64x64_40k_drive.py'
 model = dict(
-    decode_head=dict(loss_decode=[dict(type='CrossEntropyLoss', loss_name='loss_ce', loss_weight=1.0),
-            dict(type='DiceLoss', loss_name='loss_dice', loss_weight=3.0)]),
-    auxiliary_head=dict(loss_decode=[dict(type='CrossEntropyLoss', loss_name='loss_ce',loss_weight=1.0),
-            dict(type='DiceLoss', loss_name='loss_dice', loss_weight=3.0)]),
-    )
+    decode_head=dict(loss_decode=[
+        dict(type='CrossEntropyLoss', loss_name='loss_ce', loss_weight=1.0),
+        dict(type='DiceLoss', loss_name='loss_dice', loss_weight=3.0)
+    ]),
+    auxiliary_head=dict(loss_decode=[
+        dict(type='CrossEntropyLoss', loss_name='loss_ce', loss_weight=1.0),
+        dict(type='DiceLoss', loss_name='loss_dice', loss_weight=3.0)
+    ]),
+)
 ```
 
 In this way, `loss_weight` and `loss_name` will be the weight and the name (in the training log) of the corresponding loss, respectively.
 
 Note: If you want this loss item to be included in the backward graph, `loss_` must be the prefix of its name.
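
How named, weighted loss terms turn into logged values and one total can be sketched as follows. This is a simplified illustration of the `loss_` prefix convention described in the note, not MMEngine's actual loss parsing; `compute_named_losses` and `total_loss` are hypothetical helpers.

```python
from typing import Dict, List, Tuple


def compute_named_losses(terms: List[Tuple[str, float, float]]) -> Dict[str, float]:
    """Hypothetical helper: (loss_name, loss_weight, raw_value) -> {loss_name: weighted value}."""
    losses: Dict[str, float] = {}
    for name, weight, raw in terms:
        # Accumulate if the same name appears more than once.
        losses[name] = losses.get(name, 0.0) + weight * raw
    return losses


def total_loss(losses: Dict[str, float]) -> float:
    """Sum only items whose name carries the 'loss_' prefix (the documented convention)."""
    return sum(v for k, v in losses.items() if k.startswith('loss_'))


if __name__ == '__main__':
    logged = compute_named_losses([('loss_ce', 1.0, 0.5),
                                   ('loss_dice', 3.0, 0.2),
                                   ('ce_no_prefix', 1.0, 0.7)])
    print(logged)              # every item appears in the log
    print(total_loss(logged))  # only loss_ce and loss_dice are summed
```

The `ce_no_prefix` item is still logged, but because its name lacks the `loss_` prefix it does not enter the total used for back-propagation in this sketch.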
-
-## Ignore specified label index in loss calculation
-
-In the default setting, `avg_non_ignore=False`, which means every pixel counts in the loss calculation even though some of them belong to ignore-index labels.
-
-For loss calculation, we support ignoring certain label indices via `avg_non_ignore` and `ignore_index`. In this way, the average loss is calculated only over non-ignored labels, which may achieve better performance; here is the [reference](https://github.com/open-mmlab/mmsegmentation/pull/1409). Here is an example config of training `unet` on the `Cityscapes` dataset: the loss calculation ignores label 0, which is background, and the loss average is calculated only over non-ignored labels:
-
-```python
-_base_ = './unet-s5-d16_fcn_4xb4-160k_cityscapes-512x1024.py'
-model = dict(
-    decode_head=dict(
-        ignore_index=0,
-        loss_decode=dict(
-            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0, avg_non_ignore=True)),
-    auxiliary_head=dict(
-        ignore_index=0,
-        loss_decode=dict(
-            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0, avg_non_ignore=True)),
-)
-```
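
The effect of `avg_non_ignore` described in the section above can be sketched with per-pixel loss values, assuming ignored pixels contribute 0 to the sum and only the denominator changes between the two modes. `reduce_pixel_loss` is a hypothetical helper, not the MMSegmentation loss code.

```python
from typing import List


def reduce_pixel_loss(pixel_losses: List[float], labels: List[int],
                      ignore_index: int, avg_non_ignore: bool) -> float:
    """Hypothetical helper: average per-pixel losses in one of two modes."""
    kept = [loss for loss, lbl in zip(pixel_losses, labels) if lbl != ignore_index]
    total = sum(kept)  # ignored pixels add nothing to the numerator
    if avg_non_ignore:
        # Divide by the number of non-ignored pixels only.
        return total / len(kept) if kept else 0.0
    # Divide by all pixels, so ignored pixels dilute the average.
    return total / len(pixel_losses) if pixel_losses else 0.0


if __name__ == '__main__':
    losses = [0.8, 0.4, 0.0, 0.0]
    labels = [1, 2, 0, 0]  # label 0 is the ignored index here
    print(reduce_pixel_loss(losses, labels, ignore_index=0, avg_non_ignore=False))
    print(reduce_pixel_loss(losses, labels, ignore_index=0, avg_non_ignore=True))
```

With half of the pixels ignored, the default mode yields half of the non-ignored average, which is the dilution that `avg_non_ignore=True` avoids.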
diff --git a/docs/zh_cn/advanced_guides/training_tricks.md b/docs/zh_cn/advanced_guides/training_tricks.md
@@ -1,4 +1,4 @@
-# Training Tricks (to be updated)
+# Training Tricks
 
 MMSegmentation supports the following training tricks:
 
@@ -9,17 +9,17 @@ MMSegmentation 支持如下训练技巧：
 In MMSegmentation, you can also add the following lines to the config file to make the learning rate of the decode head 10 times that of the backbone.
 
 ```python
-optimizer=dict(
+optim_wrapper=dict(
     paramwise_cfg = dict(
         custom_keys={
             'head': dict(lr_mult=10.)}))
 ```
 
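-With this modification, the learning rate of any parameter grouped under `'head'` will be multiplied by 10. You may also refer to the [MMCV documentation](https://mmcv.readthedocs.io/en/latest/api.html#mmcv.runner.DefaultOptimizerConstructor) for more details.
+With this modification, the learning rate of any parameter grouped under `'head'` will be multiplied by 10. You may also refer to the [MMEngine documentation](https://mmengine.readthedocs.io/zh_CN/latest/tutorials/optim_wrapper.html#id6) for more details.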
 
 ## Online Hard Example Mining (OHEM)
 
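-For training sampling, we implement a pixel sampler [here](https://github.com/open-mmlab/mmsegmentation/tree/master/mmseg/core/seg/sampler).
+MMSegmentation implements a pixel sampler that can sample specific pixels during training, such as OHEM (Online Hard Example Mining), which can address sample imbalance.
 The following example is a config for training PSPNet with the OHEM strategy: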
 
 ```python
@@ -58,38 +58,17 @@ model=dict(
 ```python
 _base_ = './fcn_unet_s5-d16_64x64_40k_drive.py'
 model = dict(
-    decode_head=dict(loss_decode=[dict(type='CrossEntropyLoss', loss_name='loss_ce', loss_weight=1.0),
-            dict(type='DiceLoss', loss_name='loss_dice', loss_weight=3.0)]),
-    auxiliary_head=dict(loss_decode=[dict(type='CrossEntropyLoss', loss_name='loss_ce',loss_weight=1.0),
-            dict(type='DiceLoss', loss_name='loss_dice', loss_weight=3.0)]),
-    )
+    decode_head=dict(loss_decode=[
+        dict(type='CrossEntropyLoss', loss_name='loss_ce', loss_weight=1.0),
+        dict(type='DiceLoss', loss_name='loss_dice', loss_weight=3.0)
+    ]),
+    auxiliary_head=dict(loss_decode=[
+        dict(type='CrossEntropyLoss', loss_name='loss_ce', loss_weight=1.0),
+        dict(type='DiceLoss', loss_name='loss_dice', loss_weight=3.0)
+    ]),
+)
 ```
 
 In this way, `loss_weight` sets the weight of each loss during training and `loss_name` sets its name in the training log.
 
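-Note: `loss_name` must have the `loss_` prefix so that it can be included in the backward graph.
-
-## Ignore specific label classes in the loss function
-
-The default setting is `avg_non_ignore=False`, i.e. every pixel is used to calculate the loss, even though some of these pixels belong to classes that should be ignored.
-
-For loss calculation during training, we currently support ignoring specific label classes via `avg_non_ignore` and `ignore_index`. In this way, the loss is averaged only over pixels of non-ignored classes, which achieves better performance. Here is the [related PR](https://github.com/open-mmlab/mmsegmentation/pull/1409). Taking `unet` trained on the `Cityscapes` dataset as an example,
-when calculating the loss, the background with label 0 is ignored and the mean is computed only over pixels that are not ignored. The config file is written as: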
-
-```python
-_base_ = './fcn_unet_s5-d16_4x4_512x1024_160k_cityscapes.py'
-model = dict(
-    decode_head=dict(
-        ignore_index=0,
-        loss_decode=dict(
-            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0, avg_non_ignore=True)),
-    auxiliary_head=dict(
-        ignore_index=0,
-        loss_decode=dict(
-            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0, avg_non_ignore=True)),
-)
-```
-
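-In this way, `loss_weight` sets the weight of each loss during training and `loss_name` sets its name in the training log.
-
-Note: `loss_name` must have the `loss_` prefix so that it can be included in the backward graph.
+Note: `loss_name` must have the `loss_` prefix so that it can be included in the computation graph.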