[Doc] Update train test doc #2061

xiexinch · 2022-09-13T06:43:18Z

Motivation

As title.

Modification

Update the structure:

Tutorial 4: Train and test with existing models
- Training and testing on a single machine with a single GPU
  - Training on a single GPU
  - Testing on a single GPU
- Training and testing on multiple GPUs and multiple machines
  - Training on multiple GPUs
  - Testing on multiple GPUs
  - Launch multiple jobs on a single machine
  - Train with multiple machines
- Manage jobs with Slurm
  - Training on a cluster with Slurm
  - Testing on a cluster with Slurm

MeowZheng · 2022-09-15T01:44:14Z

docs/en/user_guides/4_train_test.md

-# use the pre-trained model for the whole PSPNet
-load_from = 'https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes/pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth'  # model path can be found in model zoo
-```
+## Training and testing on a single machine with a single GPU


Suggested change

## Training and testing on a single machine with a single GPU

## Training and testing on a single GPU

MeowZheng · 2022-09-15T08:00:53Z

docs/en/user_guides/4_train_test.md

 MMSegmentation also provides out-of-the-box tools for training models.
 This section will show how to train and test models on standard datasets.


These lines read a little oddly and repeat the last sentence.

MeowZheng · 2022-09-15T08:02:19Z

docs/en/user_guides/4_train_test.md

-Difference between `--resume` and `load-from`:
-`--resume` loads both the model weights and optimizer status, and the iteration is also inherited from the specified checkpoint.
+**Note:** Difference between the argument `--resume` and the field `load-from` in the config file:
+`--resume` loads both the model weights and optimizer status and the iteration is also inherited from the specified checkpoint.


`--resume` doesn't support loading weights.

MeowZheng · 2022-09-15T08:08:05Z

docs/en/user_guides/4_train_test.md

+`--resume` loads both the model weights and optimizer status and the iteration is also inherited from the specified checkpoint.
 It is usually used for resuming the training process that is interrupted accidentally.

 `load-from` only loads the model weights and the training iteration starts from 0. It is usually used for fine-tuning.


The note might not be needed, since `--resume` doesn't support loading a specific checkpoint, and it could cause confusion between `--resume` and `load_from`.

MeowZheng · 2022-09-15T08:10:34Z

docs/en/user_guides/4_train_test.md

+
+- `--work-dir`: If specified, results will be saved in this directory. If not specified, the results will be automatically saved to `work_dirs/{CONFIG_NAME}`.
+- `--show`: Show prediction results at runtime, available when `--show-dir` is not specified.
+- `--show-dir`: If specified, the visualized segmentation mask will be saved in the specified directory.


mmsegmentation/tools/test.py

Lines 24 to 28 in 3388cfd

    parser.add_argument(
        '--show-dir',
        help='directory where painted images will be saved. '
        'If specified, it will be automatically saved '
        'to the work_dir/timestamp/show_dir')
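
For readers comparing the doc text with the help string quoted above, the two path rules can be sketched in Python. This is only an illustration under assumptions: `default_work_dir` and `resolve_show_dir` are hypothetical helper names, not functions from `tools/test.py`, and the sketch follows the wording of the doc (`work_dirs/{CONFIG_NAME}`) and of the help string (`work_dir/timestamp/show_dir`) rather than the actual implementation.

```python
import os.path as osp
from typing import Optional


def default_work_dir(config_path: str) -> str:
    """Hypothetical helper: work_dirs/{CONFIG_NAME} when --work-dir is not given.

    CONFIG_NAME is assumed to be the config file name without its extension.
    """
    config_name = osp.splitext(osp.basename(config_path))[0]
    return osp.join("work_dirs", config_name)


def resolve_show_dir(work_dir: str, timestamp: str,
                     show_dir: Optional[str]) -> Optional[str]:
    """Hypothetical helper: where painted images would be saved.

    Follows the quoted help string (work_dir/timestamp/show_dir).
    Returns None when --show-dir is not specified.
    """
    if show_dir is None:
        return None
    return osp.join(work_dir, timestamp, show_dir)


# Illustrative values only; the config path and timestamp are made up.
work_dir = default_work_dir("configs/pspnet/example_config.py")
painted = resolve_show_dir(work_dir, "20220915_080000", "vis")
```

If this reading of the help string is right, the doc sentence for `--show-dir` may be clearer if it says the images land under the work directory rather than in the specified directory itself.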

MeowZheng · 2022-09-15T08:21:51Z

docs/en/user_guides/4_train_test.md

+### Launch multiple jobs on a single machine
+
+If you launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs, you need to specify different ports (29500 by default) for each job to avoid communication conflict. Otherwise, there will be an error message saying `RuntimeError: Address already in use`.
+If you use `dist_train.sh` to launch training jobs, you can set the port in commands with the environment variable \`PORT\`\`.


Suggested change

If you use `dist_train.sh` to launch training jobs, you can set the port in commands with the environment variable \`PORT\`\`.

If you use `dist_train.sh` to launch training jobs, you can set the port in commands with the environment variable `PORT`.
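
The port rule in this paragraph can be illustrated with a short Python sketch. It is a sketch under assumptions, not part of the PR: `port_in_use` and `job_env` are hypothetical helpers, the port `29501` is an arbitrary second choice next to the default `29500`, and splitting the GPUs with `CUDA_VISIBLE_DEVICES` is an assumption about how the two 4-GPU jobs would be separated. Only the `PORT` variable comes from the text above.

```python
import os
import socket
from typing import Dict, Mapping


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if binding host:port fails, i.e. something already holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Same class of failure as "Address already in use".
            return True
    return False


def job_env(port: int, gpu_ids: str,
            base: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build an environment for one job with its own PORT.

    CUDA_VISIBLE_DEVICES is an assumption used to give each job its own GPUs.
    """
    env = dict(base)
    env["PORT"] = str(port)
    env["CUDA_VISIBLE_DEVICES"] = gpu_ids
    return env


# Two 4-GPU jobs on one 8-GPU machine, each with a different port.
env_a = job_env(29500, "0,1,2,3")
env_b = job_env(29501, "4,5,6,7")
```

Each dictionary could then be passed as the `env` argument of `subprocess.run` when launching `dist_train.sh`; checking `port_in_use` first is optional and only reduces the chance of hitting the conflict.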

MeowZheng · 2022-09-16T10:57:06Z

docs/en/user_guides/4_train_test.md

+**Note:** Difference between the argument `--resume` and the field `load-from` in the config file:

-`load-from` only loads the model weights and the training iteration starts from 0. It is usually used for fine-tuning.
+`--resume` only determines whether to resume from the latest checkpoint in the work_dir. It is usually used for resuming the training process that is interrupted accidentally.

-### Training on CPU
+`load-from` will specify the checkpoint to be loaded and the training iteration starts from 0. It is usually used for fine-tuning.

-The process of training on the CPU is consistent with single GPU training if machine does not have GPU. If it has GPUs but not wanting to use it, we just need to disable GPUs before the training process.


Note:
If you would like to resume training from a specific checkpoint, you can use `--resume` with `--cfg-options load_from=$CHECKPOINT`.
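
To make the three cases in this thread concrete, here is a minimal Python sketch of the decision as the comments describe it. It is an assumption-labeled illustration, not the runner's real code: `LoadPlan` and `plan_loading` are hypothetical names, and the behavior when `--resume` is given but the work directory holds no checkpoint (start from scratch) is assumed rather than stated in the discussion.

```python
from typing import NamedTuple, Optional


class LoadPlan(NamedTuple):
    checkpoint: Optional[str]      # checkpoint to read, or None
    restore_training_state: bool   # True: continue from the saved iteration


def plan_loading(resume: bool, load_from: Optional[str],
                 latest_in_work_dir: Optional[str]) -> LoadPlan:
    """Hypothetical sketch of how --resume and load_from combine."""
    if resume and load_from is not None:
        # Resume from a specific checkpoint (the case in the note above).
        return LoadPlan(load_from, True)
    if resume:
        # Resume from the latest checkpoint in the work_dir.
        # Assumption: with no checkpoint there, training starts from scratch.
        return LoadPlan(latest_in_work_dir, latest_in_work_dir is not None)
    if load_from is not None:
        # Load the checkpoint only; the iteration starts from 0 (fine-tuning).
        return LoadPlan(load_from, False)
    return LoadPlan(None, False)
```

Writing the cases out this way shows why `--resume` on its own never names a checkpoint: the path comes either from the work directory or from `load_from`.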

* draft
* refine structure
* fix typo
* rename single gpu title and redefine --resume
* update introduction
* add notes to load_from

xiexinch added 2 commits September 9, 2022 20:59

draft

f8a4e1d

refine structure

49b1b46

mm-assistant bot assigned xiexinch Sep 13, 2022

fix typo

6f996b5

MeowZheng reviewed Sep 15, 2022

rename single gpu title and redefine --resume

7e6deac

MeowZheng reviewed Sep 15, 2022

update introduction

6114b5c

MeowZheng reviewed Sep 16, 2022

add notes to load_from

6375b78

MeowZheng approved these changes Sep 16, 2022

MeowZheng merged commit 52ce34c into open-mmlab:1.x Sep 16, 2022

MeowZheng pushed a commit to MeowZheng/mmsegmentation that referenced this pull request Nov 1, 2022

[Doc] Update train test doc (open-mmlab#2061)

b8d87d7

* draft
* refine structure
* fix typo
* rename single gpu title and redefine --resume
* update introduction
* add notes to load_from

MeowZheng added the Doc label Nov 2, 2022

wjkim81 pushed a commit to wjkim81/mmsegmentation that referenced this pull request Dec 3, 2023

Bump version to v1.0.0rc1 (open-mmlab#2061)

617d4d0
