Commit fa16a37

Merge pull request CSAILVision#49 from CSAILVision/xtt
Add new models and UPerNet
2 parents e8ce1a7 + 23ea6d2 commit fa16a37

9 files changed, +704 −62 lines changed

README.md

Lines changed: 56 additions & 25 deletions
@@ -29,26 +29,28 @@ Different from image classification task, where the input images are resized to
So we re-implement the `DataParallel` module, and make it support distributing data to multiple GPUs as a Python dict. The dataloader also operates differently: *the batch size of the dataloader always equals the number of GPUs*, and each element of the batch is sent to one GPU. It remains compatible with multi-processing. Note that the file index for the multi-processing dataloader is stored on the master process, which contradicts our goal that each worker maintain its own file list. So we use a trick: although the master process still passes an index to the `__getitem__` function of the dataloader, we ignore that index and return a random batch dict instead. Also, *the multiple workers forked by the dataloader all have the same seed*, so if you used the above trick directly, you would find that the workers yield exactly the same data. We therefore add one line of code that sets a distinct default seed for `numpy.random` before activating the multiple workers in the dataloader.
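To illustrate the seeding issue, here is a minimal, self-contained sketch (not the repository's exact code; `RandomBatchDataset` and the seed constant are hypothetical). Without a per-worker re-seed, every worker forked by the `DataLoader` inherits the same NumPy state and yields identical "random" batches; a `worker_init_fn` is one common way to apply the fix described above:

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class RandomBatchDataset(Dataset):
    """Toy stand-in for the trick above: ignore the index passed by the
    master process and return a randomly assembled sample instead."""
    def __len__(self):
        return 100

    def __getitem__(self, index):
        # index is ignored; the sample comes from numpy's global RNG
        return torch.from_numpy(np.random.rand(3, 224, 224).astype(np.float32))

def worker_init_fn(worker_id):
    # re-seed numpy in each forked worker so workers stop yielding identical data
    np.random.seed(1234 + worker_id)

loader = DataLoader(RandomBatchDataset(), batch_size=8,  # = number of GPUs
                    num_workers=4, worker_init_fn=worker_init_fn)
```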
+
+### An Efficient and Effective Framework: UPerNet
+UPerNet is based on the Feature Pyramid Network (FPN) and the Pyramid Pooling Module (PPM), with down-sampling rates of 4, 8, and 16. It does not need dilated convolution, an operator that is both time- and memory-consuming. *Without bells and whistles*, it is comparable to or even better than PSPNet, while requiring much less training time and GPU memory. For example, you cannot train a PSPNet-101 on TITAN Xp GPUs with only 12GB of memory, but you can train a UPerNet-101 on such GPUs.
+
+Thanks to the efficient network design, we will soon open-source stronger UPerNet models based on ResNeXt that are able to run on normal GPUs.
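For readers unfamiliar with the design, below is a minimal sketch of a PPM + FPN decoder head in the spirit of UPerNet. It follows the paper's layout (PPM on the deepest feature map, a top-down FPN pathway, and fusion of all levels at the finest resolution); the channel sizes, the stride-4/8/16/32 pyramid, and all names here are illustrative assumptions, not this repository's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def upsample_to(x, size):
    return F.interpolate(x, size=size, mode='bilinear', align_corners=False)

class UPerNetHead(nn.Module):
    """PPM on the deepest feature map, fused top-down with FPN laterals."""
    def __init__(self, in_dims=(256, 512, 1024, 2048), fpn_dim=512,
                 pool_scales=(1, 2, 3, 6), num_class=150):
        super().__init__()
        # Pyramid Pooling Module branches on the deepest (coarsest) feature
        self.ppm_pool = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(s),
                          nn.Conv2d(in_dims[-1], fpn_dim, 1))
            for s in pool_scales])
        self.ppm_fuse = nn.Conv2d(in_dims[-1] + len(pool_scales) * fpn_dim,
                                  fpn_dim, 3, padding=1)
        # 1x1 lateral convs for the shallower FPN levels
        self.lateral = nn.ModuleList(
            [nn.Conv2d(d, fpn_dim, 1) for d in in_dims[:-1]])
        self.classifier = nn.Conv2d(len(in_dims) * fpn_dim, num_class, 1)

    def forward(self, feats):  # feats: backbone maps at strides 4, 8, 16, 32
        f = feats[-1]
        ppm = [upsample_to(p(f), f.shape[2:]) for p in self.ppm_pool]
        fpn = [self.ppm_fuse(torch.cat([f] + ppm, 1))]
        # top-down pathway: add upsampled coarser maps to lateral projections
        for lat, feat in zip(reversed(self.lateral), reversed(feats[:-1])):
            fpn.append(lat(feat) + upsample_to(fpn[-1], feat.shape[2:]))
        # fuse every pyramid level at the finest (stride-4) resolution
        out = torch.cat([upsample_to(x, fpn[-1].shape[2:]) for x in fpn], 1)
        return self.classifier(out)  # logits at 1/4 of the input size
```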
+

## Supported models
We split our models into encoders and decoders: encoders are usually modified directly from classification networks, and decoders consist of final convolutions and upsampling.

Encoder (resnetXX_dilatedYY: customized resnetXX with dilated convolutions; the output feature map is 1/YY of the input size):
-- resnet34_dilated16, resnet34_dilated8
-- resnet50_dilated16, resnet50_dilated8
+- ResNet50: resnet50_dilated16, resnet50_dilated8
+- ResNet101: resnet101_dilated16, resnet101_dilated8

***Coming soon***:
-- resnet101_dilated16, resnet101_dilated8
+- ResNeXt101: resnext101_dilated16, resnext101_dilated8

Decoder:
- c1_bilinear (1 conv + bilinear upsample)
- c1_bilinear_deepsup (c1_bilinear + deep supervision trick)
- ppm_bilinear (pyramid pooling + bilinear upsample; see the [PSPNet](https://hszhao.github.io/projects/pspnet) paper for details)
- ppm_bilinear_deepsup (ppm_bilinear + deep supervision trick)
-
-***Coming soon***:
-- UPerNet based on Feature Pyramid Network (FPN) and Pyramid Pooling Module (PPM), with down-sampling rates of 4, 8, and 16. It does not need dilated convolution, an operator that is both time- and memory-consuming. *Without bells and whistles*, it is comparable to or even better than PSPNet, while requiring much less training time and GPU memory.
+- upernet (pyramid pooling + FPN head)

## Performance:
IMPORTANT: We use our self-trained base model on ImageNet. The model takes input in BGR order (consistent with OpenCV) instead of the RGB order used by the default PyTorch implementation. The base model will be automatically downloaded when needed.
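As a concrete example, a minimal sketch of preparing an input in the expected BGR order (`cv2.imread` already returns BGR; the file name and the 1/255 scaling are illustrative, not necessarily this repository's preprocessing):

```python
import cv2
import numpy as np
import torch

img = cv2.imread('example.jpg')          # HWC, uint8, BGR channel order
img = img.astype(np.float32) / 255.0
# if the image came from an RGB loader (e.g., PIL), flip the channels first:
# img = img[:, :, ::-1]
tensor = torch.from_numpy(img.transpose(2, 0, 1).copy())  # CHW float tensor
```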
@@ -63,44 +65,53 @@ IMPORTANT: We use our self-trained base model on ImageNet. The model takes the i
<tr>
<td>ResNet-50_dilated8 + c1_bilinear_deepsup</td>
<td>No</td><td>34.88</td><td>76.54</td><td>55.71</td>
-<td>27.5 hours</td>
+<td>1.38 * 20 = 27.6 hours</td>
</tr>
<tr>
<td rowspan="2">ResNet-50_dilated8 + ppm_bilinear_deepsup</td>
<td>No</td><td>41.26</td><td>79.73</td><td>60.50</td>
-<td rowspan="2">33.4 hours</td>
+<td rowspan="2">1.67 * 20 = 33.4 hours</td>
</tr>
<tr>
<td>Yes</td><td>42.04</td><td>80.23</td><td>61.14</td>
</tr>
<tr>
-<td>ResNet-101_dilated8 + c1_bilinear_deepsup</td>
-<td>-</td><td>-</td><td>-</td><td>-</td>
-<td>- hours</td>
+<td rowspan="2">ResNet-101_dilated8 + ppm_bilinear_deepsup</td>
+<td>No</td><td>42.19</td><td>80.59</td><td>61.39</td>
+<td rowspan="2">3.82 * 25 = 95.5 hours</td>
</tr>
<tr>
-<td>ResNet-101_dilated8 + ppm_bilinear_deepsup</td>
-<td>-</td><td>-</td><td>-</td><td>-</td>
-<td>- hours</td>
+<td>Yes</td><td>42.53</td><td>80.91</td><td>61.72</td>
</tr>
<tr>
-<td>UPerNet-50 (coming soon!)</td>
-<td>-</td><td>-</td><td>-</td><td>-</td>
-<td>- hours</td>
+<td rowspan="2"><b>UPerNet-50</b></td>
+<td>No</td><td>40.44</td><td>79.80</td><td>60.12</td>
+<td rowspan="2">1.75 * 20 = 35.0 hours</td>
</tr>
<tr>
-<td>UPerNet-101 (coming soon!)</td>
+<td>Yes</td><td>41.55</td><td>80.23</td><td>60.89</td>
+</tr>
+<tr>
+<td rowspan="2"><b>UPerNet-101</b></td>
+<td>No</td><td>41.98</td><td>80.63</td><td>61.34</td>
+<td rowspan="2">2.5 * 25 = 50.0 hours</td>
+</tr>
+<tr>
+<td>Yes</td><td>42.66</td><td>81.01</td><td>61.84</td>
+</tr>
+<tr>
+<td>UPerNet-ResNeXt101 (coming soon!)</td>
<td>-</td><td>-</td><td>-</td><td>-</td>
<td>- hours</td>
</tr>
</tbody></table>

-The speed is benchmarked on a server with 8 NVIDIA Pascal Titan Xp GPUs (12GB GPU memory), except for ResNet-101_dilated8, which is benchmarked on a server with 8 NVIDIA Tesla P40 GPUs (22GB GPU memory), because of insufficient memory when using dilated convolutions in a very deep network.
+The speed is benchmarked on a server with 8 NVIDIA Pascal Titan Xp GPUs (12GB GPU memory), ***except for*** ResNet-101_dilated8, which is benchmarked on a server with 8 NVIDIA Tesla P40 GPUs (22GB GPU memory), because of insufficient memory when using dilated convolutions in a very deep network.

## Environment
The code is developed under the following configurations.
- Hardware: 2-8 GPUs (each with at least 12GB of GPU memory) (change ```[--num_gpus NUM_GPUS]``` accordingly)
-- Software: Ubuntu 16.04.3 LTS, CUDA 8.0, ***Python3.5***, ***PyTorch 0.4.0***
+- Software: Ubuntu 16.04.3 LTS, CUDA 8.0, ***Python>=3.5***, ***PyTorch>=0.4.0***

*Warning:* We no longer support the outdated Python 2. PyTorch 0.4.0 or higher is required to run the code.

@@ -134,11 +145,22 @@ usage: test.py [-h] --test_img TEST_IMG --model_path MODEL_PATH
chmod +x download_ADE20K.sh
./download_ADE20K.sh
```
-2. Train a network (default: ResNet-50_dilated8 + ppm_bilinear_deepsup). During training, checkpoints will be saved in the folder ```ckpt```.
+2. Train a default network (ResNet-50_dilated8 + ppm_bilinear_deepsup). During training, checkpoints will be saved in the folder ```ckpt```.
```bash
python3 train.py --num_gpus NUM_GPUS
```

+Train UPerNet (with, e.g., a ResNet-50 or ResNet-101 encoder); the extra flags down-sample the segmentation labels 4x to match the finest FPN level, and pad inputs to a multiple of 32, the maximum down-sampling rate inside the network:
+```bash
+python3 train.py --num_gpus NUM_GPUS --arch_encoder resnet50 --arch_decoder upernet \
+--segm_downsampling_rate 4 --padding_constant 32
+```
+or
+```bash
+python3 train.py --num_gpus NUM_GPUS --arch_encoder resnet101 --arch_decoder upernet \
+--segm_downsampling_rate 4 --padding_constant 32
+```
+

3. Input arguments (see the full list via ```python3 train.py -h```):
```bash
usage: train.py [-h] [--id ID] [--arch_encoder ARCH_ENCODER]
@@ -163,10 +185,20 @@ usage: train.py [-h] [--id ID] [--arch_encoder ARCH_ENCODER]

## Evaluation
-1. Evaluate a trained network on the validation set. Add the ```--visualize``` option to output visualizations shown in the teaser.
+1. Evaluate a trained network on the validation set. Add the ```--visualize``` option to output visualizations as shown in the teaser.
```bash
python3 eval.py --id MODEL_ID --suffix SUFFIX
```
+Evaluate UPerNet (e.g., UPerNet-50):
+```bash
+python3 eval.py --id MODEL_ID --suffix SUFFIX \
+--arch_encoder resnet50 --arch_decoder upernet --padding_constant 32
+```
+
+***We also provide a multi-GPU evaluation script.*** It is extremely easy to use: for example, to run the evaluation code on 8 GPUs, simply add ```--device 0-7```. You can also choose which GPUs to use, for example, ```--device 0,2,4,6```.
+```bash
+python3 eval_multipro.py --id MODEL_ID --suffix SUFFIX --device DEVICE_ID
+```
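As an aside, a hypothetical sketch of how the two ```--device``` forms above could be parsed; the script's actual parsing logic may differ:

```python
def parse_devices(device_str):
    """Turn '0-7' into [0..7], or '0,2,4,6' into [0, 2, 4, 6]."""
    if '-' in device_str:
        lo, hi = device_str.split('-')
        return list(range(int(lo), int(hi) + 1))
    return [int(d) for d in device_str.split(',')]

assert parse_devices('0-7') == [0, 1, 2, 3, 4, 5, 6, 7]
assert parse_devices('0,2,4,6') == [0, 2, 4, 6]
```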

2. Input arguments (see the full list via ```python3 eval.py -h```):
```bash
@@ -176,8 +208,7 @@ usage: eval.py [-h] --id ID [--suffix SUFFIX] [--arch_encoder ARCH_ENCODER]
[--num_val NUM_VAL] [--num_class NUM_CLASS]
[--batch_size BATCH_SIZE] [--imgSize IMGSIZE]
[--imgMaxSize IMGMAXSIZE] [--padding_constant PADDING_CONSTANT]
-[--segm_downsampling_rate SEGM_DOWNSAMPLING_RATE] [--ckpt CKPT]
-[--visualize] [--result RESULT] [--gpu_id GPU_ID]
+[--ckpt CKPT] [--visualize] [--result RESULT] [--gpu_id GPU_ID]
```

dataset.py

Lines changed: 5 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -158,14 +158,12 @@ def __len__(self):
158158

159159

160160
class ValDataset(torchdata.Dataset):
161-
def __init__(self, odgt, opt, max_sample=-1):
161+
def __init__(self, odgt, opt, max_sample=-1, start_idx=-1, end_idx=-1):
162162
self.root_dataset = opt.root_dataset
163163
self.imgSize = opt.imgSize
164164
self.imgMaxSize = opt.imgMaxSize
165165
# max down sampling rate of network to avoid rounding during conv or pooling
166166
self.padding_constant = opt.padding_constant
167-
# down sampling rate of segm labe
168-
self.segm_downsampling_rate = opt.segm_downsampling_rate
169167

170168
# mean and std
171169
self.img_transform = transforms.Compose([
@@ -176,11 +174,14 @@ def __init__(self, odgt, opt, max_sample=-1):
176174

        if max_sample > 0:
            self.list_sample = self.list_sample[0:max_sample]
+
+        if start_idx >= 0 and end_idx >= 0:  # divide file list
+            self.list_sample = self.list_sample[start_idx:end_idx]
+
        self.num_sample = len(self.list_sample)
        assert self.num_sample > 0
        print('# samples: {}'.format(self.num_sample))

-
    def __getitem__(self, index):
        this_record = self.list_sample[index]
        # load image and label
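The new `start_idx`/`end_idx` arguments let a multi-GPU evaluator hand each process its own slice of the validation list. A hypothetical driver is sketched below (the 2000-image validation set size and the even split are illustrative; `eval_multipro.py` may divide the list differently):

```python
num_files = 2000                     # e.g., the ADE20K validation set
gpus = [0, 1, 2, 3]
per_gpu = (num_files + len(gpus) - 1) // len(gpus)   # ceiling division

for rank, gpu in enumerate(gpus):
    start_idx = rank * per_gpu
    end_idx = min(start_idx + per_gpu, num_files)
    # each process then builds its own dataset slice, e.g.:
    # dataset_val = ValDataset(odgt, opt, start_idx=start_idx, end_idx=end_idx)
    print('gpu {}: samples [{}, {})'.format(gpu, start_idx, end_idx))
```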

eval.py

Lines changed: 0 additions & 2 deletions
@@ -169,8 +169,6 @@ def main(args):
                        help='maximum input image size of long edge')
    parser.add_argument('--padding_constant', default=8, type=int,
                        help='maximum downsampling rate of the network')
-    parser.add_argument('--segm_downsampling_rate', default=8, type=int,
-                        help='downsampling rate of the segmentation label')

    # Misc arguments
    parser.add_argument('--ckpt', default='./ckpt',
