
Commit 42b7567

Merge pull request CSAILVision#189 from CSAILVision/hang

add HRNet model

2 parents d224f03 + 9b66e85 commit 42b7567

20 files changed: +690 -194 lines

README.md (26 additions, 20 deletions)
@@ -5,7 +5,7 @@ This is a PyTorch implementation of semantic segmentation models on MIT ADE20K s
 ADE20K is the largest open source dataset for semantic segmentation and scene parsing, released by the MIT Computer Vision team. Follow the link below to find the repository for our dataset and implementations on Caffe and Torch7:
 https://github.com/CSAILVision/sceneparsing
 
-If you simply want to play with our demo, please try this link: http://scenesegmentation.csail.mit.edu You can upload your own photo and segment it!
+If you simply want to play with our demo, please try this link: http://scenesegmentation.csail.mit.edu You can upload your own photo and parse it!
 
 All pretrained models can be found at:
 http://sceneparsing.csail.mit.edu/model/pytorch
@@ -19,6 +19,7 @@ https://docs.google.com/spreadsheets/d/1se8YEtb2detS7OuPE86fXGyD269pMycAWe2mtKUj
 
 ## Updates
 - We use configuration files to store most options that were previously in the argument parser. The definitions of the options are detailed in ```config/defaults.py```.
+- HRNet model is now supported.
 
 
 ## Highlights
@@ -36,21 +37,24 @@ For the task of semantic segmentation, it is good to keep aspect ratio of images
 
 <sup>*Now the batch size of a dataloader always equals the number of GPUs*; each element will be sent to a GPU. It is also compatible with multi-processing. Note that the file index for the multi-processing dataloader is stored on the master process, which contradicts our goal that each worker maintain its own file list. So we use a trick: although the master process still gives the dataloader an index for the `__getitem__` function, we simply ignore that request and send a random batch dict. Also, *the multiple workers forked by the dataloader all have the same seed*, so if we used the above-mentioned trick directly, multiple workers would yield exactly the same data. Therefore, we add one line of code which sets the default seed for `numpy.random` before activating the multiple workers in the dataloader.</sup>
 
-### An Efficient and Effective Framework: UPerNet
-UPerNet is a model based on Feature Pyramid Network (FPN) and Pyramid Pooling Module (PPM). It doesn't need dilated convolution, an operator that is time-and-memory consuming. *Without bells and whistles*, it is comparable or even better compared with PSPNet, while requiring much shorter training time and less GPU memory (e.g., you cannot train a PSPNet-101 on TITAN Xp GPUs with only 12GB memory, while you can train a UPerNet-101 on such GPUs). Thanks to the efficient network design, we will soon open source stronger models of UPerNet based on ResNeXt that is able to run on normal GPUs. Please refer to [UperNet](https://arxiv.org/abs/1807.10221) for details.
+### State-of-the-Art models
+- **PSPNet** is a scene parsing network that aggregates global representations with its Pyramid Pooling Module (PPM). It is the winning model of the ILSVRC'16 MIT Scene Parsing Challenge. Please refer to [https://arxiv.org/abs/1612.01105](https://arxiv.org/abs/1612.01105) for details.
+- **UPerNet** is a model based on the Feature Pyramid Network (FPN) and the Pyramid Pooling Module (PPM). It doesn't need dilated convolution, a time- and memory-consuming operator. *Without bells and whistles*, it is comparable to or even better than PSPNet, while requiring much shorter training time and less GPU memory. Please refer to [https://arxiv.org/abs/1807.10221](https://arxiv.org/abs/1807.10221) for details.
+- **HRNet** is a recently proposed model that retains high-resolution representations throughout the network, without the traditional bottleneck design. It achieves SOTA performance on a series of pixel-labeling tasks. Please refer to [https://arxiv.org/abs/1904.04514](https://arxiv.org/abs/1904.04514) for details.
 
 
 ## Supported models
-We split our models into encoder and decoder, where encoders are usually modified directly from classification networks, and decoders consist of final convolutions and upsampling.
+We split our models into encoder and decoder, where encoders are usually modified directly from classification networks, and decoders consist of final convolutions and upsampling. We have provided some pre-configured models in the ```config``` folder.
 
 Encoder:
 - MobileNetV2dilated
-- ResNet18dilated
-- ResNet50dilated
-- ResNet101dilated
+- ResNet18/ResNet18dilated
+- ResNet50/ResNet50dilated
+- ResNet101/ResNet101dilated
+- HRNet (HRNetV2-W48)
 
 Decoder:
-- C1 (1 convolution module)
+- C1 (one convolution module)
 - C1_deepsup (C1 + deep supervision trick)
 - PPM (Pyramid Pooling Module, see [PSPNet](https://hszhao.github.io/projects/pspnet) paper for details.)
 - PPM_deepsup (PPM + deep supervision trick)
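
A minimal sketch of the per-worker seeding fix described in the `<sup>` note above. The repository's actual fix is a single line that seeds `numpy.random`; the `worker_init_fn` below is the common PyTorch idiom for the same problem, so treat the names here as illustrative:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def worker_init_fn(worker_id):
    # Workers forked by the DataLoader inherit identical numpy RNG state;
    # without re-seeding, every worker would draw exactly the same "random"
    # batch. torch.initial_seed() already differs per worker.
    np.random.seed(torch.initial_seed() % 2 ** 32)

dataset = TensorDataset(torch.arange(8.0))  # toy stand-in for the ADE20K dataset
loader = DataLoader(dataset, num_workers=2, worker_init_fn=worker_init_fn)
```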
@@ -66,12 +70,10 @@ IMPORTANT: We use our self-trained base model on ImageNet. The model takes the i
 <th valign="bottom">Pixel Accuracy(%)</th>
 <th valign="bottom">Overall Score</th>
 <th valign="bottom">Inference Speed(fps)</th>
-<th valign="bottom">Training Time(hours)</th>
 <tr>
 <td rowspan="2">MobileNetV2dilated + C1_deepsup</td>
 <td>No</td><td>34.84</td><td>75.75</td><td>54.07</td>
 <td>17.2</td>
-<td rowspan="2">0.8 * 20 = 16</td>
 </tr>
 <tr>
 <td>Yes</td><td>33.84</td><td>76.80</td><td>55.32</td>
@@ -81,7 +83,6 @@ IMPORTANT: We use our self-trained base model on ImageNet. The model takes the i
 <td rowspan="2">MobileNetV2dilated + PPM_deepsup</td>
 <td>No</td><td>35.76</td><td>77.77</td><td>56.27</td>
 <td>14.9</td>
-<td rowspan="2">0.9 * 20 = 18.0</td>
 </tr>
 <tr>
 <td>Yes</td><td>36.28</td><td>78.26</td><td>57.27</td>
@@ -91,7 +92,6 @@ IMPORTANT: We use our self-trained base model on ImageNet. The model takes the i
 <td rowspan="2">ResNet18dilated + C1_deepsup</td>
 <td>No</td><td>33.82</td><td>76.05</td><td>54.94</td>
 <td>13.9</td>
-<td rowspan="2">0.42 * 20 = 8.4</td>
 </tr>
 <tr>
 <td>Yes</td><td>35.34</td><td>77.41</td><td>56.38</td>
@@ -101,7 +101,6 @@ IMPORTANT: We use our self-trained base model on ImageNet. The model takes the i
 <td rowspan="2">ResNet18dilated + PPM_deepsup</td>
 <td>No</td><td>38.00</td><td>78.64</td><td>58.32</td>
 <td>11.7</td>
-<td rowspan="2">1.1 * 20 = 22.0</td>
 </tr>
 <tr>
 <td>Yes</td><td>38.81</td><td>79.29</td><td>59.05</td>
@@ -111,7 +110,6 @@ IMPORTANT: We use our self-trained base model on ImageNet. The model takes the i
 <td rowspan="2">ResNet50dilated + PPM_deepsup</td>
 <td>No</td><td>41.26</td><td>79.73</td><td>60.50</td>
 <td>8.3</td>
-<td rowspan="2">1.67 * 20 = 33.4</td>
 </tr>
 <tr>
 <td>Yes</td><td>42.14</td><td>80.13</td><td>61.14</td>
@@ -121,35 +119,42 @@ IMPORTANT: We use our self-trained base model on ImageNet. The model takes the i
 <td rowspan="2">ResNet101dilated + PPM_deepsup</td>
 <td>No</td><td>42.19</td><td>80.59</td><td>61.39</td>
 <td>6.8</td>
-<td rowspan="2">3.82 * 25 = 95.5</td>
 </tr>
 <tr>
 <td>Yes</td><td>42.53</td><td>80.91</td><td>61.72</td>
 <td>2.0</td>
 </tr>
 <tr>
-<td rowspan="2"><b>UperNet50</b></td>
+<td rowspan="2">UperNet50</td>
 <td>No</td><td>40.44</td><td>79.80</td><td>60.12</td>
 <td>8.4</td>
-<td rowspan="2">1.75 * 20 = 35.0</td>
 </tr>
 <tr>
 <td>Yes</td><td>41.55</td><td>80.23</td><td>60.89</td>
 <td>2.9</td>
 </tr>
 <tr>
-<td rowspan="2"><b>UperNet101</b></td>
+<td rowspan="2">UperNet101</td>
 <td>No</td><td>42.00</td><td>80.79</td><td>61.40</td>
 <td>7.8</td>
-<td rowspan="2">2.5 * 25 = 62.5</td>
 </tr>
 <tr>
 <td>Yes</td><td>42.66</td><td>81.01</td><td>61.84</td>
 <td>2.3</td>
 </tr>
+<tr>
+<td rowspan="2">HRNetV2-W48</td>
+<td>No</td><td>41.74</td><td>80.59</td><td>61.17</td>
+<td>5.8</td>
+</tr>
+<tr>
+<td>Yes</td><td>42.99</td><td>81.25</td><td>62.12</td>
+<td>1.9</td>
+</tr>
+
 </tbody></table>
 
-The training is benchmarked on a server with 8 NVIDIA Pascal Titan Xp GPUs (12GB GPU memory), ***except for*** ResNet101dilated, which is benchmarked on a server with 8 NVIDIA Tesla P40 GPUS (22GB GPU memory), because of the insufficient memory issue when using dilated conv on a very deep network. The inference speed is benchmarked a single NVIDIA Pascal Titan Xp GPU, without visualization.
+The training is benchmarked on a server with 8 NVIDIA Pascal Titan Xp GPUs (12GB GPU memory); the inference speed is benchmarked on a single NVIDIA Pascal Titan Xp GPU, without visualization.
 
 ## Environment
 The code is developed under the following configurations.
@@ -220,6 +225,7 @@ python3 eval_multipro.py --gpus GPUS --cfg config/ade20k-resnet50dilated-ppm_dee
 * Evaluate UPerNet101
 ```bash
 python3 eval_multipro.py --gpus GPUS --cfg config/ade20k-resnet101-upernet.yaml
+```
 
 ## Reference

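As an aside on the PPM decoders listed above: the Pyramid Pooling Module pools the feature map at several bin sizes, projects each pooled map with a 1x1 convolution, upsamples it back, and concatenates everything with the input. A minimal sketch with the usual PSPNet bin sizes; the repository's decoders add batch norm, dropout, deep supervision, and a classifier on top, so the names and dimensions here are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PPM(nn.Module):
    """Minimal Pyramid Pooling Module sketch (bin sizes as in PSPNet)."""
    def __init__(self, in_dim=2048, reduction_dim=512, bins=(1, 2, 3, 6)):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(b),                          # pool to b x b
                nn.Conv2d(in_dim, reduction_dim, 1, bias=False),  # project channels
                nn.ReLU(inplace=True),
            )
            for b in bins
        ])

    def forward(self, x):
        h, w = x.shape[2:]
        # upsample each pooled branch back to the input size, then concatenate
        out = [x] + [
            F.interpolate(stage(x), size=(h, w), mode="bilinear", align_corners=False)
            for stage in self.stages
        ]
        return torch.cat(out, dim=1)  # in_dim + len(bins) * reduction_dim channels

feats = torch.randn(1, 2048, 8, 8)
print(PPM()(feats).shape)  # torch.Size([1, 4096, 8, 8])
```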
config/ade20k-hrnetv2.yaml (42 additions, 0 deletions)
@@ -0,0 +1,42 @@
+DATASET:
+  root_dataset: "./data/"
+  list_train: "./data/training.odgt"
+  list_val: "./data/validation.odgt"
+  num_class: 150
+  imgSizes: (300, 375, 450, 525, 600)
+  imgMaxSize: 1000
+  padding_constant: 32
+  segm_downsampling_rate: 4
+  random_flip: True
+
+MODEL:
+  arch_encoder: "hrnetv2"
+  arch_decoder: "c1"
+  fc_dim: 720
+
+TRAIN:
+  batch_size_per_gpu: 2
+  num_epoch: 20
+  start_epoch: 0
+  epoch_iters: 5000
+  optim: "SGD"
+  lr_encoder: 0.02
+  lr_decoder: 0.02
+  lr_pow: 0.9
+  beta1: 0.9
+  weight_decay: 1e-4
+  deep_sup_scale: 0.4
+  fix_bn: False
+  workers: 16
+  disp_iter: 20
+  seed: 304
+
+VAL:
+  visualize: False
+  checkpoint: "epoch_20.pth"
+
+TEST:
+  checkpoint: "epoch_20.pth"
+  result: "./"
+
+DIR: "ckpt/ade20k-hrnetv2-c1"
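
Two notes on this config. `fc_dim: 720` matches the concatenated HRNetV2-W48 branch widths (48 + 96 + 192 + 384 = 720), since the HRNetV2 head upsamples all resolution streams to the highest resolution and concatenates them. And `padding_constant: 32` pads inputs so both spatial dimensions divide the encoder's total stride; a minimal sketch of the rounding involved (the helper name is borrowed from this repository's data pipeline, but treat it as an assumption):

```python
def round2nearest_multiple(x, p):
    # round x up to the nearest multiple of p
    return ((x - 1) // p + 1) * p

# with padding_constant: 32, a 600x450 image is padded to 608x480 so every
# downsampled feature map keeps integer spatial dimensions
print(round2nearest_multiple(600, 32), round2nearest_multiple(450, 32))  # 608 480
```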

config/ade20k-mobilenetv2dilated-c1_deepsup.yaml (3 additions, 5 deletions)
@@ -12,12 +12,10 @@ DATASET:
 MODEL:
   arch_encoder: "mobilenetv2dilated"
   arch_decoder: "c1_deepsup"
-  weights_encoder: ""
-  weights_decoder: ""
   fc_dim: 320
 
 TRAIN:
-  batch_size_per_gpu: 2
+  batch_size_per_gpu: 3
   num_epoch: 20
   start_epoch: 0
   epoch_iters: 5000
@@ -35,10 +33,10 @@ TRAIN:
 
 VAL:
   visualize: False
-  suffix: "_epoch_20.pth"
+  checkpoint: "epoch_20.pth"
 
 TEST:
-  suffix: "_epoch_20.pth"
+  checkpoint: "epoch_20.pth"
   result: "./"
 
 DIR: "ckpt/ade20k-mobilenetv2dilated-c1_deepsup"

config/ade20k-resnet101-upernet.yaml (2 additions, 4 deletions)
@@ -12,8 +12,6 @@ DATASET:
 MODEL:
   arch_encoder: "resnet101"
   arch_decoder: "upernet"
-  weights_encoder: ""
-  weights_decoder: ""
   fc_dim: 2048
 
 TRAIN:
@@ -35,10 +33,10 @@ TRAIN:
 
 VAL:
   visualize: False
-  suffix: "_epoch_40.pth"
+  checkpoint: "epoch_40.pth"
 
 TEST:
-  suffix: "_epoch_40.pth"
+  checkpoint: "epoch_40.pth"
   result: "./"
 
 DIR: "ckpt/ade20k-resnet101-upernet"

config/ade20k-resnet101dilated-ppm_deepsup.yaml (2 additions, 4 deletions)
@@ -12,8 +12,6 @@ DATASET:
 MODEL:
   arch_encoder: "resnet101dilated"
   arch_decoder: "ppm_deepsup"
-  weights_encoder: ""
-  weights_decoder: ""
   fc_dim: 2048
 
 TRAIN:
@@ -35,10 +33,10 @@ TRAIN:
 
 VAL:
   visualize: False
-  suffix: "_epoch_25.pth"
+  checkpoint: "epoch_25.pth"
 
 TEST:
-  suffix: "_epoch_25.pth"
+  checkpoint: "epoch_25.pth"
   result: "./"
 
 DIR: "ckpt/ade20k-resnet101dilated-ppm_deepsup"

config/ade20k-resnet18dilated-ppm_deepsup.yaml (2 additions, 4 deletions)
@@ -12,8 +12,6 @@ DATASET:
 MODEL:
   arch_encoder: "resnet18dilated"
   arch_decoder: "ppm_deepsup"
-  weights_encoder: ""
-  weights_decoder: ""
   fc_dim: 512
 
 TRAIN:
@@ -35,10 +33,10 @@ TRAIN:
 
 VAL:
   visualize: False
-  suffix: "_epoch_20.pth"
+  checkpoint: "epoch_20.pth"
 
 TEST:
-  suffix: "_epoch_20.pth"
+  checkpoint: "epoch_20.pth"
   result: "./"
 
 DIR: "ckpt/ade20k-resnet18dilated-ppm_deepsup"

config/ade20k-resnet50-upernet.yaml (42 additions, 0 deletions)
@@ -0,0 +1,42 @@
+DATASET:
+  root_dataset: "./data/"
+  list_train: "./data/training.odgt"
+  list_val: "./data/validation.odgt"
+  num_class: 150
+  imgSizes: (300, 375, 450, 525, 600)
+  imgMaxSize: 1000
+  padding_constant: 32
+  segm_downsampling_rate: 4
+  random_flip: True
+
+MODEL:
+  arch_encoder: "resnet50"
+  arch_decoder: "upernet"
+  fc_dim: 2048
+
+TRAIN:
+  batch_size_per_gpu: 2
+  num_epoch: 40
+  start_epoch: 0
+  epoch_iters: 5000
+  optim: "SGD"
+  lr_encoder: 0.02
+  lr_decoder: 0.02
+  lr_pow: 0.9
+  beta1: 0.9
+  weight_decay: 1e-4
+  deep_sup_scale: 0.4
+  fix_bn: False
+  workers: 16
+  disp_iter: 20
+  seed: 304
+
+VAL:
+  visualize: False
+  checkpoint: "epoch_40.pth"
+
+TEST:
+  checkpoint: "epoch_40.pth"
+  result: "./"
+
+DIR: "ckpt/ade20k-resnet50-upernet"
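
Both new configs carry `deep_sup_scale: 0.4`, the weight the `_deepsup` decoders give to the loss of their auxiliary head. A minimal sketch of how such a weighted auxiliary loss is combined, with illustrative names rather than the repository's exact training code:

```python
import torch
import torch.nn.functional as F

def segmentation_loss(pred, pred_deepsup, target, deep_sup_scale=0.4):
    # the main head is trained at full weight; the auxiliary (deep
    # supervision) head is down-weighted by deep_sup_scale
    loss_main = F.cross_entropy(pred, target, ignore_index=-1)
    loss_aux = F.cross_entropy(pred_deepsup, target, ignore_index=-1)
    return loss_main + deep_sup_scale * loss_aux

# toy shapes: batch of 2, 150 classes (num_class above), 8x8 logits
pred = torch.randn(2, 150, 8, 8)
pred_aux = torch.randn(2, 150, 8, 8)
target = torch.randint(0, 150, (2, 8, 8))
print(segmentation_loss(pred, pred_aux, target).item())
```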

config/ade20k-resnet50dilated-ppm_deepsup.yaml (2 additions, 4 deletions)
@@ -12,8 +12,6 @@ DATASET:
 MODEL:
   arch_encoder: "resnet50dilated"
   arch_decoder: "ppm_deepsup"
-  weights_encoder: ""
-  weights_decoder: ""
   fc_dim: 2048
 
 TRAIN:
@@ -35,10 +33,10 @@ TRAIN:
 
 VAL:
   visualize: False
-  suffix: "_epoch_20.pth"
+  checkpoint: "epoch_20.pth"
 
 TEST:
-  suffix: "_epoch_20.pth"
+  checkpoint: "epoch_20.pth"
   result: "./"
 
 DIR: "ckpt/ade20k-resnet50dilated-ppm_deepsup"

config/defaults.py (3 additions, 3 deletions)
@@ -49,7 +49,7 @@
 # epochs to train for
 _C.TRAIN.num_epoch = 20
 # epoch to start training. useful if continuing from a checkpoint
-_C.TRAIN.start_epoch = 1
+_C.TRAIN.start_epoch = 0
 # iterations of each epoch (irrelevant to batch size)
 _C.TRAIN.epoch_iters = 5000
 
@@ -83,7 +83,7 @@
 # output visualization during validation
 _C.VAL.visualize = False
 # the checkpoint to evaluate on
-_C.VAL.suffix = "_epoch_20.pth"
+_C.VAL.checkpoint = "epoch_20.pth"
 
 # -----------------------------------------------------------------------------
 # Testing
8989
# Testing
@@ -92,6 +92,6 @@
9292
# currently only supports 1
9393
_C.TEST.batch_size = 1
9494
# the checkpoint to test on
95-
_C.TEST.suffix = "_epoch_20.pth"
95+
_C.TEST.checkpoint = "epoch_20.pth"
9696
# folder to output visualization results
9797
_C.TEST.result = "./"
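
The `suffix` to `checkpoint` rename means configs now name the checkpoint file directly instead of assembling it from a filename suffix. `config/defaults.py` follows the yacs pattern, where a YAML file and command-line pairs override these defaults; a minimal, self-contained sketch (the condensed `_C` and the `get_cfg` helper are illustrative, not the repository's exact API):

```python
from yacs.config import CfgNode as CN

# condensed stand-in for config/defaults.py (the real file also defines
# the DATASET, MODEL, and remaining TRAIN options in full)
_C = CN()
_C.TRAIN = CN()
_C.TRAIN.start_epoch = 0            # this commit changes the default from 1 to 0
_C.VAL = CN()
_C.VAL.checkpoint = "epoch_20.pth"  # renamed from VAL.suffix = "_epoch_20.pth"
_C.TEST = CN()
_C.TEST.checkpoint = "epoch_20.pth"

def get_cfg(yaml_path=None, opts=None):
    cfg = _C.clone()
    if yaml_path is not None:
        cfg.merge_from_file(yaml_path)  # e.g. "config/ade20k-hrnetv2.yaml"
    if opts:
        cfg.merge_from_list(opts)       # e.g. ["VAL.checkpoint", "epoch_40.pth"]
    cfg.freeze()
    return cfg

cfg = get_cfg(opts=["VAL.checkpoint", "epoch_40.pth"])
print(cfg.VAL.checkpoint)  # epoch_40.pth
```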
