Commit c4c2fdc

[Fix] Fix iter bug when resuming checkpoint in distributed train (open-mmlab#866)
* [Fix] Fix iter bug when resuming checkpoint in distributed train
* fix lint error

Signed-off-by: FreyWang <[email protected]>
1 parent 872e544 commit c4c2fdc


1 file changed: +4 -1 lines changed


tools/train.py

Lines changed: 4 additions & 1 deletion
@@ -7,7 +7,7 @@
 
 import mmcv
 import torch
-from mmcv.runner import init_dist
+from mmcv.runner import get_dist_info, init_dist
 from mmcv.utils import Config, DictAction, get_git_hash
 
 from mmseg import __version__
@@ -94,6 +94,9 @@ def main():
     else:
         distributed = True
         init_dist(args.launcher, **cfg.dist_params)
+        # gpu_ids is used to calculate iter when resuming checkpoint,
+        _, world_size = get_dist_info()
+        cfg.gpu_ids = range(world_size)
 
     # create work_dir
     mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
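
The added comment says `gpu_ids` is used to calculate the iteration count when resuming. A minimal sketch of why a stale `gpu_ids` could skew that count is given below. It assumes the runner rescales the saved iteration by the ratio of the saved GPU count to the current world size, and that before this change a distributed run recorded a single-element `gpu_ids`. Both points are assumptions, not confirmed by this diff, and `rescale_iter` is a hypothetical helper, not an mmcv function.

```python
def rescale_iter(saved_iter, saved_gpu_ids, world_size):
    """Hypothetical helper: rescale a saved iteration count when the
    number of GPUs recorded in the checkpoint differs from the current
    world size (assumed behavior, not the mmcv implementation)."""
    if saved_gpu_ids and len(saved_gpu_ids) != world_size:
        return int(saved_iter * len(saved_gpu_ids) / world_size)
    return saved_iter


# Assumed pre-fix case: an 8-GPU distributed run stored gpu_ids of length 1,
# so resuming on the same 8 GPUs shrinks the iteration count.
stale = rescale_iter(8000, range(1), world_size=8)

# With this commit, gpu_ids = range(world_size), so the count is unchanged.
fixed = rescale_iter(8000, range(8), world_size=8)

print(stale, fixed)
```

Under those assumptions, setting `cfg.gpu_ids = range(world_size)` right after `init_dist` keeps the recorded GPU count in line with the real number of processes, so resuming on the same hardware leaves the iteration untouched.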
