Commit 7118eea

[Feature] Add segformer's benchmark on cityscapes (open-mmlab#1155)

* add segformer cityscapes' benchmark
* Update configs/segformer/README.md

Co-authored-by: Junjun2016 <[email protected]>
1 parent 98a353b commit 7118eea

8 files changed: +221 -0 lines changed

configs/segformer/README.md

Lines changed: 13 additions & 0 deletions
@@ -93,3 +93,16 @@ test_pipeline = [
         ])
 ]
 ```
+
+### Cityscapes
+
+The lower fps results are caused by the sliding-window inference scheme (window size: 1024x1024).
+
+| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
+| ------ | -------- | --------- | ------: | -------: | -------------- | ---: | ------------- | ------ | -------- |
+| Segformer | MIT-B0 | 1024x1024 | 160000 | 3.64 | 4.74 | 76.54 | 78.22 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segformer/segformer_mit-b0_8x1_1024x1024_160k_cityscapes.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b0_8x1_1024x1024_160k_cityscapes/segformer_mit-b0_8x1_1024x1024_160k_cityscapes_20211208_101857-e7f88502.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b0_8x1_1024x1024_160k_cityscapes/segformer_mit-b0_8x1_1024x1024_160k_cityscapes_20211208_101857.log.json) |
+| Segformer | MIT-B1 | 1024x1024 | 160000 | 4.49 | 4.3 | 78.56 | 79.73 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segformer/segformer_mit-b1_8x1_1024x1024_160k_cityscapes.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b1_8x1_1024x1024_160k_cityscapes/segformer_mit-b1_8x1_1024x1024_160k_cityscapes_20211208_064213-655c7b3f.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b1_8x1_1024x1024_160k_cityscapes/segformer_mit-b1_8x1_1024x1024_160k_cityscapes_20211208_064213.log.json) |
+| Segformer | MIT-B2 | 1024x1024 | 160000 | 7.42 | 3.36 | 81.08 | 82.18 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segformer/segformer_mit-b2_8x1_1024x1024_160k_cityscapes.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b2_8x1_1024x1024_160k_cityscapes/segformer_mit-b2_8x1_1024x1024_160k_cityscapes_20211207_134205-6096669a.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b2_8x1_1024x1024_160k_cityscapes/segformer_mit-b2_8x1_1024x1024_160k_cityscapes_20211207_134205.log.json) |
+| Segformer | MIT-B3 | 1024x1024 | 160000 | 10.86 | 2.53 | 81.94 | 83.14 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segformer/segformer_mit-b3_8x1_1024x1024_160k_cityscapes.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b3_8x1_1024x1024_160k_cityscapes/segformer_mit-b3_8x1_1024x1024_160k_cityscapes_20211206_224823-a8f8a177.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b3_8x1_1024x1024_160k_cityscapes/segformer_mit-b3_8x1_1024x1024_160k_cityscapes_20211206_224823.log.json) |
+| Segformer | MIT-B4 | 1024x1024 | 160000 | 15.07 | 1.88 | 81.89 | 83.38 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segformer/segformer_mit-b4_8x1_1024x1024_160k_cityscapes.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b4_8x1_1024x1024_160k_cityscapes/segformer_mit-b4_8x1_1024x1024_160k_cityscapes_20211207_080709-07f6c333.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b4_8x1_1024x1024_160k_cityscapes/segformer_mit-b4_8x1_1024x1024_160k_cityscapes_20211207_080709.log.json) |
+| Segformer | MIT-B5 | 1024x1024 | 160000 | 18.00 | 1.39 | 82.25 | 83.48 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segformer/segformer_mit-b5_8x1_1024x1024_160k_cityscapes.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b5_8x1_1024x1024_160k_cityscapes/segformer_mit-b5_8x1_1024x1024_160k_cityscapes_20211206_072934-87a052ec.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b5_8x1_1024x1024_160k_cityscapes/segformer_mit-b5_8x1_1024x1024_160k_cityscapes_20211206_072934.log.json) |
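
The README note attributes the lower fps to sliding-window inference. A minimal sketch of how such a window grid could be laid out is given here, using the `crop_size=(1024, 1024)` and `stride=(768, 768)` values from the B0 config's `test_cfg` in this commit. The helper name `sliding_windows` is hypothetical, and the 2048x1024 full-resolution Cityscapes image size is an assumption not stated in the commit; this is an illustration, not the library's implementation.

```python
def sliding_windows(img_h, img_w, crop=(1024, 1024), stride=(768, 768)):
    """Return (y1, x1, y2, x2) boxes that tile an image with overlapping crops."""
    crop_h, crop_w = crop
    stride_h, stride_w = stride
    # Number of window positions along each axis (ceil division, at least one).
    rows = max(img_h - crop_h + stride_h - 1, 0) // stride_h + 1
    cols = max(img_w - crop_w + stride_w - 1, 0) // stride_w + 1
    boxes = []
    for r in range(rows):
        for c in range(cols):
            # Clamp the far edge to the image, then shift the near edge back
            # so that every window keeps the full crop size where possible.
            y2 = min(r * stride_h + crop_h, img_h)
            x2 = min(c * stride_w + crop_w, img_w)
            y1 = max(y2 - crop_h, 0)
            x1 = max(x2 - crop_w, 0)
            boxes.append((y1, x1, y2, x2))
    return boxes


# Assumed Cityscapes resolution: 1024 rows by 2048 columns.
windows = sliding_windows(1024, 2048)
print(len(windows), windows)
```

Under these assumptions each image is covered by three overlapping 1024x1024 windows, so one evaluation image would need three forward passes, which is consistent with the lower fps reported in the table.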

configs/segformer/segformer.yml

Lines changed: 133 additions & 0 deletions
@@ -3,6 +3,7 @@ Collections:
   Metadata:
     Training Data:
     - ADE20K
+    - Cityscapes
   Paper:
     URL: https://arxiv.org/abs/2105.15203
     Title: resize image to multiple of 32, improve SegFormer by 0.5-1.0 mIoU.
@@ -167,3 +168,135 @@ Models:
       mIoU(ms+flip): 50.36
   Config: configs/segformer/segformer_mit-b5_640x640_160k_ade20k.py
   Weights: https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b5_640x640_160k_ade20k/segformer_mit-b5_640x640_160k_ade20k_20210801_121243-41d2845b.pth
+- Name: segformer_mit-b0_8x1_1024x1024_160k_cityscapes
+  In Collection: segformer
+  Metadata:
+    backbone: MIT-B0
+    crop size: (1024,1024)
+    lr schd: 160000
+    inference time (ms/im):
+    - value: 210.97
+      hardware: V100
+      backend: PyTorch
+      batch size: 1
+      mode: FP32
+      resolution: (1024,1024)
+    Training Memory (GB): 3.64
+  Results:
+  - Task: Semantic Segmentation
+    Dataset: Cityscapes
+    Metrics:
+      mIoU: 76.54
+      mIoU(ms+flip): 78.22
+  Config: configs/segformer/segformer_mit-b0_8x1_1024x1024_160k_cityscapes.py
+  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b0_8x1_1024x1024_160k_cityscapes/segformer_mit-b0_8x1_1024x1024_160k_cityscapes_20211208_101857-e7f88502.pth
+- Name: segformer_mit-b1_8x1_1024x1024_160k_cityscapes
+  In Collection: segformer
+  Metadata:
+    backbone: MIT-B1
+    crop size: (1024,1024)
+    lr schd: 160000
+    inference time (ms/im):
+    - value: 232.56
+      hardware: V100
+      backend: PyTorch
+      batch size: 1
+      mode: FP32
+      resolution: (1024,1024)
+    Training Memory (GB): 4.49
+  Results:
+  - Task: Semantic Segmentation
+    Dataset: Cityscapes
+    Metrics:
+      mIoU: 78.56
+      mIoU(ms+flip): 79.73
+  Config: configs/segformer/segformer_mit-b1_8x1_1024x1024_160k_cityscapes.py
+  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b1_8x1_1024x1024_160k_cityscapes/segformer_mit-b1_8x1_1024x1024_160k_cityscapes_20211208_064213-655c7b3f.pth
+- Name: segformer_mit-b2_8x1_1024x1024_160k_cityscapes
+  In Collection: segformer
+  Metadata:
+    backbone: MIT-B2
+    crop size: (1024,1024)
+    lr schd: 160000
+    inference time (ms/im):
+    - value: 297.62
+      hardware: V100
+      backend: PyTorch
+      batch size: 1
+      mode: FP32
+      resolution: (1024,1024)
+    Training Memory (GB): 7.42
+  Results:
+  - Task: Semantic Segmentation
+    Dataset: Cityscapes
+    Metrics:
+      mIoU: 81.08
+      mIoU(ms+flip): 82.18
+  Config: configs/segformer/segformer_mit-b2_8x1_1024x1024_160k_cityscapes.py
+  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b2_8x1_1024x1024_160k_cityscapes/segformer_mit-b2_8x1_1024x1024_160k_cityscapes_20211207_134205-6096669a.pth
+- Name: segformer_mit-b3_8x1_1024x1024_160k_cityscapes
+  In Collection: segformer
+  Metadata:
+    backbone: MIT-B3
+    crop size: (1024,1024)
+    lr schd: 160000
+    inference time (ms/im):
+    - value: 395.26
+      hardware: V100
+      backend: PyTorch
+      batch size: 1
+      mode: FP32
+      resolution: (1024,1024)
+    Training Memory (GB): 10.86
+  Results:
+  - Task: Semantic Segmentation
+    Dataset: Cityscapes
+    Metrics:
+      mIoU: 81.94
+      mIoU(ms+flip): 83.14
+  Config: configs/segformer/segformer_mit-b3_8x1_1024x1024_160k_cityscapes.py
+  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b3_8x1_1024x1024_160k_cityscapes/segformer_mit-b3_8x1_1024x1024_160k_cityscapes_20211206_224823-a8f8a177.pth
+- Name: segformer_mit-b4_8x1_1024x1024_160k_cityscapes
+  In Collection: segformer
+  Metadata:
+    backbone: MIT-B4
+    crop size: (1024,1024)
+    lr schd: 160000
+    inference time (ms/im):
+    - value: 531.91
+      hardware: V100
+      backend: PyTorch
+      batch size: 1
+      mode: FP32
+      resolution: (1024,1024)
+    Training Memory (GB): 15.07
+  Results:
+  - Task: Semantic Segmentation
+    Dataset: Cityscapes
+    Metrics:
+      mIoU: 81.89
+      mIoU(ms+flip): 83.38
+  Config: configs/segformer/segformer_mit-b4_8x1_1024x1024_160k_cityscapes.py
+  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b4_8x1_1024x1024_160k_cityscapes/segformer_mit-b4_8x1_1024x1024_160k_cityscapes_20211207_080709-07f6c333.pth
+- Name: segformer_mit-b5_8x1_1024x1024_160k_cityscapes
+  In Collection: segformer
+  Metadata:
+    backbone: MIT-B5
+    crop size: (1024,1024)
+    lr schd: 160000
+    inference time (ms/im):
+    - value: 719.42
+      hardware: V100
+      backend: PyTorch
+      batch size: 1
+      mode: FP32
+      resolution: (1024,1024)
+    Training Memory (GB): 18.0
+  Results:
+  - Task: Semantic Segmentation
+    Dataset: Cityscapes
+    Metrics:
+      mIoU: 82.25
+      mIoU(ms+flip): 83.48
+  Config: configs/segformer/segformer_mit-b5_8x1_1024x1024_160k_cityscapes.py
+  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/segformer/segformer_mit-b5_8x1_1024x1024_160k_cityscapes/segformer_mit-b5_8x1_1024x1024_160k_cityscapes_20211206_072934-87a052ec.pth
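
The README table reports speed in fps, while the metafile entries report `inference time (ms/im)`. The short sketch below checks that the two sets of numbers agree if the metafile values are taken to be `1000 / fps` rounded to two decimals. That derivation is an assumption about how the values were produced, and the helper name `fps_to_ms_per_image` is hypothetical.

```python
# fps values copied from the "Inf time (fps)" column of the README table.
fps_by_backbone = {
    "MIT-B0": 4.74,
    "MIT-B1": 4.3,
    "MIT-B2": 3.36,
    "MIT-B3": 2.53,
    "MIT-B4": 1.88,
    "MIT-B5": 1.39,
}


def fps_to_ms_per_image(fps):
    """Convert frames per second to milliseconds per image."""
    return round(1000.0 / fps, 2)


ms_by_backbone = {name: fps_to_ms_per_image(fps)
                  for name, fps in fps_by_backbone.items()}
for name, ms in ms_by_backbone.items():
    print(f"{name}: {ms} ms/im")
```

The printed values match the `value` fields of the six Cityscapes entries in `segformer.yml`.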

configs/segformer/segformer_mit-b0_8x1_1024x1024_160k_cityscapes.py

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+_base_ = [
+    '../_base_/models/segformer_mit-b0.py',
+    '../_base_/datasets/cityscapes_1024x1024.py',
+    '../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py'
+]
+
+model = dict(
+    backbone=dict(
+        init_cfg=dict(type='Pretrained', checkpoint='pretrain/mit_b0.pth')),
+    test_cfg=dict(mode='slide', crop_size=(1024, 1024), stride=(768, 768)))
+
+# optimizer
+optimizer = dict(
+    _delete_=True,
+    type='AdamW',
+    lr=0.00006,
+    betas=(0.9, 0.999),
+    weight_decay=0.01,
+    paramwise_cfg=dict(
+        custom_keys={
+            'pos_block': dict(decay_mult=0.),
+            'norm': dict(decay_mult=0.),
+            'head': dict(lr_mult=10.)
+        }))
+
+lr_config = dict(
+    _delete_=True,
+    policy='poly',
+    warmup='linear',
+    warmup_iters=1500,
+    warmup_ratio=1e-6,
+    power=1.0,
+    min_lr=0.0,
+    by_epoch=False)
+
+data = dict(samples_per_gpu=1, workers_per_gpu=1)
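
To make the `lr_config` above more concrete, here is a rough sketch of a polynomial decay schedule with linear warmup, using the values from this config (`lr=0.00006`, `warmup_iters=1500`, `warmup_ratio=1e-6`, `power=1.0`, `min_lr=0.0`). The total of 160000 iterations is assumed from the `schedule_160k.py` base file name, and the formula is my reading of how a `poly` policy with `linear` warmup is commonly defined, not code taken from the training framework. The function name `poly_lr` is hypothetical.

```python
def poly_lr(iteration, base_lr=0.00006, max_iters=160000, power=1.0,
            min_lr=0.0, warmup_iters=1500, warmup_ratio=1e-6):
    """Learning rate at a given iteration: poly decay with linear warmup."""
    # Polynomial decay from base_lr down to min_lr over max_iters.
    coeff = (1 - iteration / max_iters) ** power
    lr = (base_lr - min_lr) * coeff + min_lr
    if iteration < warmup_iters:
        # Linear warmup: start near warmup_ratio * lr and ramp up to lr.
        k = (1 - iteration / warmup_iters) * (1 - warmup_ratio)
        lr *= 1 - k
    return lr


for it in (0, 750, 1500, 80000, 160000):
    print(it, poly_lr(it))
```

With `power=1.0` the decay is linear, so under these assumptions the rate is roughly half of the base value at iteration 80000 and reaches zero at the end of training. Parameters matched by the `'head'` key would use a base rate ten times larger because of `lr_mult=10.`.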

configs/segformer/segformer_mit-b1_8x1_1024x1024_160k_cityscapes.py

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+_base_ = ['./segformer_mit-b0_8x1_1024x1024_160k_cityscapes.py']
+
+model = dict(
+    backbone=dict(
+        init_cfg=dict(type='Pretrained', checkpoint='pretrain/mit_b1.pth'),
+        embed_dims=64),
+    decode_head=dict(in_channels=[64, 128, 320, 512]))

configs/segformer/segformer_mit-b2_8x1_1024x1024_160k_cityscapes.py

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+_base_ = ['./segformer_mit-b0_8x1_1024x1024_160k_cityscapes.py']
+
+model = dict(
+    backbone=dict(
+        init_cfg=dict(type='Pretrained', checkpoint='pretrain/mit_b2.pth'),
+        embed_dims=64,
+        num_layers=[3, 4, 6, 3]),
+    decode_head=dict(in_channels=[64, 128, 320, 512]))

configs/segformer/segformer_mit-b3_8x1_1024x1024_160k_cityscapes.py

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+_base_ = ['./segformer_mit-b0_8x1_1024x1024_160k_cityscapes.py']
+
+model = dict(
+    backbone=dict(
+        init_cfg=dict(type='Pretrained', checkpoint='pretrain/mit_b3.pth'),
+        embed_dims=64,
+        num_layers=[3, 4, 18, 3]),
+    decode_head=dict(in_channels=[64, 128, 320, 512]))

configs/segformer/segformer_mit-b4_8x1_1024x1024_160k_cityscapes.py

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+_base_ = ['./segformer_mit-b0_8x1_1024x1024_160k_cityscapes.py']
+
+model = dict(
+    backbone=dict(
+        init_cfg=dict(type='Pretrained', checkpoint='pretrain/mit_b4.pth'),
+        embed_dims=64,
+        num_layers=[3, 8, 27, 3]),
+    decode_head=dict(in_channels=[64, 128, 320, 512]))

configs/segformer/segformer_mit-b5_8x1_1024x1024_160k_cityscapes.py

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+_base_ = ['./segformer_mit-b0_8x1_1024x1024_160k_cityscapes.py']
+
+model = dict(
+    backbone=dict(
+        init_cfg=dict(type='Pretrained', checkpoint='pretrain/mit_b5.pth'),
+        embed_dims=64,
+        num_layers=[3, 6, 40, 3]),
+    decode_head=dict(in_channels=[64, 128, 320, 512]))
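
The B1 to B5 configs list only the keys that differ from the B0 config named in `_base_`. The sketch below illustrates that inheritance idea with a simplified recursive dictionary merge. It is a stand-in written for illustration, not the config loader's actual code, and `merge_config` is a hypothetical name. The B0 values for `embed_dims`, `num_layers` and `in_channels` are assumptions, since the base model file is not part of this diff.

```python
import copy


def merge_config(base, override):
    """Merge `override` into a copy of `base`; nested dicts merge key by key."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict):
            value = dict(value)
            # `_delete_=True` means: drop the inherited dict instead of merging.
            delete = value.pop("_delete_", False)
            if not delete and isinstance(merged.get(key), dict):
                merged[key] = merge_config(merged[key], value)
            else:
                merged[key] = merge_config({}, value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Assumed B0 model settings (embed_dims, num_layers, in_channels are guesses).
b0_model = dict(
    backbone=dict(
        embed_dims=32,
        num_layers=[2, 2, 2, 2],
        init_cfg=dict(type="Pretrained", checkpoint="pretrain/mit_b0.pth")),
    decode_head=dict(in_channels=[32, 64, 160, 256]),
    test_cfg=dict(mode="slide", crop_size=(1024, 1024), stride=(768, 768)),
)

# Overrides copied from the B2 config in this commit.
b2_override = dict(
    backbone=dict(
        init_cfg=dict(type="Pretrained", checkpoint="pretrain/mit_b2.pth"),
        embed_dims=64,
        num_layers=[3, 4, 6, 3]),
    decode_head=dict(in_channels=[64, 128, 320, 512]),
)

b2_model = merge_config(b0_model, b2_override)
print(b2_model)
```

In this sketch the merged B2 model keeps the inherited sliding-window `test_cfg` while the backbone and decode head pick up the overridden values. The `_delete_=True` entries in the B0 `optimizer` and `lr_config` would take the other branch and replace the inherited dictionaries outright.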
