
Commit 737544f

Author: 谢昕辰

add configs for vit backbone plus decode_heads (open-mmlab#520)

* add config
* add cityscapes config
* add default value to docstring
* fix lint
* add deit-s and deit-b
* add readme
* add eps at norm_cfg
* add drop_path_rate experiment
* add deit case at init_weight
* add upernet result
* update result and add upernet 160k config
* update upernet result and fix settings
* Update iters number
* update result and delete some configs
* fix import error
* fix drop_path_rate
* update result and restore config
* update benchmark result
* remove cityscapes exp
* remove neck
* neck exp
* add more configs
* fix init error
* fix ffn setting
* update result
* update results
* update result
* update results and fill table
* delete or rename configs
* fix link delimiter
* rename configs and fix link
* rename neck to mln
1 parent 36c8144 commit 737544f

15 files changed (+270, -2 lines)
Lines changed: 58 additions & 0 deletions

```python
# model settings
norm_cfg = dict(type='SyncBN', requires_grad=True)
model = dict(
    type='EncoderDecoder',
    pretrained='https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-vitjx/jx_vit_base_p16_224-80ecf9dd.pth',  # noqa
    backbone=dict(
        type='VisionTransformer',
        img_size=(512, 512),
        patch_size=16,
        in_channels=3,
        embed_dims=768,
        num_layers=12,
        num_heads=12,
        mlp_ratio=4,
        out_indices=(2, 5, 8, 11),
        qkv_bias=True,
        drop_rate=0.0,
        attn_drop_rate=0.0,
        drop_path_rate=0.0,
        with_cls_token=True,
        norm_cfg=dict(type='LN', eps=1e-6),
        act_cfg=dict(type='GELU'),
        norm_eval=False,
        out_shape='NCHW',
        interpolate_mode='bicubic'),
    neck=dict(
        type='MultiLevelNeck',
        in_channels=[768, 768, 768, 768],
        out_channels=768,
        scales=[4, 2, 1, 0.5]),
    decode_head=dict(
        type='UPerHead',
        in_channels=[768, 768, 768, 768],
        in_index=[0, 1, 2, 3],
        pool_scales=(1, 2, 3, 6),
        channels=512,
        dropout_ratio=0.1,
        num_classes=19,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    auxiliary_head=dict(
        type='FCNHead',
        in_channels=768,
        in_index=3,
        channels=256,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=19,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
    # model training and testing settings
    train_cfg=dict(),
    test_cfg=dict(mode='whole'))  # yapf: disable
```
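These model settings are consumed by mmseg's registry-based builder. A minimal sketch of instantiating them, assuming an mmsegmentation checkout matching this commit (so the backbone accepts the arguments above), mmcv installed, and the file above saved as `vit_upernet_base.py` (a hypothetical name; the diff does not show the file path):

```python
# Minimal sketch: load the model settings above and instantiate the segmentor.
# Assumes mmsegmentation v0.x + mmcv; 'vit_upernet_base.py' is a hypothetical
# stand-in for the file shown in this diff.
import torch
from mmcv import Config
from mmseg.models import build_segmentor

cfg = Config.fromfile('vit_upernet_base.py')
cfg.model.pretrained = None  # skip downloading ImageNet-pretrained ViT weights for this quick check
model = build_segmentor(cfg.model)  # train_cfg/test_cfg are already embedded in cfg.model
model.eval()

# The ViT backbone emits features from layers 2, 5, 8 and 11 (out_indices), and
# MultiLevelNeck rescales them by [4, 2, 1, 0.5] before UPerHead fuses them.
with torch.no_grad():
    feats = model.extract_feat(torch.randn(1, 3, 512, 512))
print([tuple(f.shape) for f in feats])
```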

configs/vit/README.md

Lines changed: 32 additions & 0 deletions
# Vision Transformer

## Introduction

<!-- [ALGORITHM] -->

```latex
@article{dosovitskiy2020,
  title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
  author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
  journal={arXiv preprint arXiv:2010.11929},
  year={2020}
}
```
## Results and models

### ADE20K

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| ------- | ----------------- | --------- | ------: | -------- | -------------- | ----: | ------------: | ------ | -------- |
| UPerNet | ViT-B + MLN | 512x512 | 80000 | 9.20 | 6.94 | 47.71 | 49.51 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/vit/upernet_vit-b16_mln_512x512_80k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/vit/upernet_vit-b16_mln_512x512_80k_ade20k/upernet_vit-b16_mln_512x512_80k_ade20k-0403cee1.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/vit/upernet_vit-b16_mln_512x512_80k_ade20k/20210624_130547.log.json) |
| UPerNet | ViT-B + MLN | 512x512 | 160000 | 9.20 | 7.58 | 46.75 | 48.46 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/vit/upernet_vit-b16_mln_512x512_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/vit/upernet_vit-b16_mln_512x512_160k_ade20k/upernet_vit-b16_mln_512x512_160k_ade20k-852fa768.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/vit/upernet_vit-b16_mln_512x512_160k_ade20k/20210623_192432.log.json) |
| UPerNet | ViT-B + LN + MLN | 512x512 | 160000 | 9.21 | 6.82 | 47.73 | 49.95 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/vit/upernet_vit-b16_ln_mln_512x512_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/vit/upernet_vit-b16_ln_mln_512x512_160k_ade20k/upernet_vit-b16_ln_mln_512x512_160k_ade20k-f444c077.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/vit/upernet_vit-b16_ln_mln_512x512_160k_ade20k/20210621_172828.log.json) |
| UPerNet | DeiT-S | 512x512 | 80000 | 4.68 | 29.85 | 42.96 | 43.79 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/vit/upernet_deit-s16_512x512_80k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/vit/upernet_deit-s16_512x512_80k_ade20k/upernet_deit-s16_512x512_80k_ade20k-afc93ec2.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/vit/upernet_deit-s16_512x512_80k_ade20k/20210624_095228.log.json) |
| UPerNet | DeiT-S | 512x512 | 160000 | 4.68 | 29.19 | 42.87 | 43.79 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/vit/upernet_deit-s16_512x512_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/vit/upernet_deit-s16_512x512_160k_ade20k/upernet_deit-s16_512x512_160k_ade20k-5110d916.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/vit/upernet_deit-s16_512x512_160k_ade20k/20210621_160903.log.json) |
| UPerNet | DeiT-S + MLN | 512x512 | 160000 | 5.69 | 11.18 | 43.82 | 45.07 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/vit/upernet_deit-s16_mln_512x512_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/vit/upernet_deit-s16_mln_512x512_160k_ade20k/upernet_deit-s16_mln_512x512_160k_ade20k-fb9a5dfb.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/vit/upernet_deit-s16_mln_512x512_160k_ade20k/20210621_161021.log.json) |
| UPerNet | DeiT-S + LN + MLN | 512x512 | 160000 | 5.69 | 12.39 | 43.52 | 45.01 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/vit/upernet_deit-s16_ln_mln_512x512_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/vit/upernet_deit-s16_ln_mln_512x512_160k_ade20k/upernet_deit-s16_ln_mln_512x512_160k_ade20k-c0cd652f.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/vit/upernet_deit-s16_ln_mln_512x512_160k_ade20k/20210621_161021.log.json) |
| UPerNet | DeiT-B | 512x512 | 80000 | 7.75 | 9.69 | 45.24 | 46.73 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/vit/upernet_deit-b16_512x512_80k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/vit/upernet_deit-b16_512x512_80k_ade20k/upernet_deit-b16_512x512_80k_ade20k-1e090789.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/vit/upernet_deit-b16_512x512_80k_ade20k/20210624_130529.log.json) |
| UPerNet | DeiT-B | 512x512 | 160000 | 7.75 | 10.39 | 45.36 | 47.16 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/vit/upernet_deit-b16_512x512_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/vit/upernet_deit-b16_512x512_160k_ade20k/upernet_deit-b16_512x512_160k_ade20k-828705d7.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/vit/upernet_deit-b16_512x512_160k_ade20k/20210621_180100.log.json) |
| UPerNet | DeiT-B + MLN | 512x512 | 160000 | 9.21 | 7.78 | 45.46 | 47.16 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/vit/upernet_deit-b16_mln_512x512_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/vit/upernet_deit-b16_mln_512x512_160k_ade20k/upernet_deit-b16_mln_512x512_160k_ade20k-4e1450f3.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/vit/upernet_deit-b16_mln_512x512_160k_ade20k/20210621_191949.log.json) |
| UPerNet | DeiT-B + LN + MLN | 512x512 | 160000 | 9.21 | 7.75 | 45.37 | 47.23 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/vit/upernet_deit-b16_ln_mln_512x512_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/vit/upernet_deit-b16_ln_mln_512x512_160k_ade20k/upernet_deit-b16_ln_mln_512x512_160k_ade20k-8a959c14.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/vit/upernet_deit-b16_ln_mln_512x512_160k_ade20k/20210623_153535.log.json) |
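In the table, "MLN" marks variants that keep the extra MultiLevelNeck and "LN" marks variants that enable the backbone's final layer norm (`final_norm=True`), matching the config overrides in the files below. The linked checkpoints can be loaded directly for inference; a minimal sketch, assuming mmsegmentation v0.x is installed, the snippet runs from a repository checkout that contains these configs, and `demo/demo.png` is any image on disk (hypothetical path):

```python
# Minimal sketch: single-image inference with the first checkpoint in the table.
# Assumes mmsegmentation v0.x; 'demo/demo.png' is a hypothetical input image path.
from mmseg.apis import init_segmentor, inference_segmentor

config = 'configs/vit/upernet_vit-b16_mln_512x512_80k_ade20k.py'
checkpoint = ('https://download.openmmlab.com/mmsegmentation/v0.5/vit/'
              'upernet_vit-b16_mln_512x512_80k_ade20k/'
              'upernet_vit-b16_mln_512x512_80k_ade20k-0403cee1.pth')

model = init_segmentor(config, checkpoint, device='cuda:0')
result = inference_segmentor(model, 'demo/demo.png')
print(result[0].shape)  # (H, W) array of ADE20K class indices
```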
Lines changed: 6 additions & 0 deletions

```python
_base_ = './upernet_vit-b16_mln_512x512_160k_ade20k.py'

model = dict(
    pretrained='https://dl.fbaipublicfiles.com/deit/deit_base_patch16_224-b5f2ef4d.pth',  # noqa
    backbone=dict(drop_path_rate=0.1),
    neck=None)  # yapf: disable
```
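This child config only overrides the pretrained weights, the stochastic-depth rate, and the neck; everything else is inherited through `_base_`. A minimal sketch of how the merge resolves, assuming mmcv is installed, the snippet runs from an mmsegmentation checkout containing these configs, and this file is `configs/vit/upernet_deit-b16_512x512_160k_ade20k.py` (the name used in the README table; the diff itself omits filenames):

```python
# Minimal sketch of _base_ inheritance, assuming mmcv is installed and the file
# above is configs/vit/upernet_deit-b16_512x512_160k_ade20k.py (hypothetical
# mapping; the diff omits filenames).
from mmcv import Config

cfg = Config.fromfile('configs/vit/upernet_deit-b16_512x512_160k_ade20k.py')
print(cfg.model.pretrained)               # DeiT-B weights, replacing the jx_vit checkpoint
print(cfg.model.backbone.drop_path_rate)  # 0.1, overriding the base value of 0.0
print(cfg.model.neck)                     # None: the MultiLevelNeck from the base is disabled
```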
Lines changed: 6 additions & 0 deletions

```python
_base_ = './upernet_vit-b16_mln_512x512_80k_ade20k.py'

model = dict(
    pretrained='https://dl.fbaipublicfiles.com/deit/deit_base_patch16_224-b5f2ef4d.pth',  # noqa
    backbone=dict(drop_path_rate=0.1),
    neck=None)  # yapf: disable
```
Lines changed: 5 additions & 0 deletions

```python
_base_ = './upernet_vit-b16_mln_512x512_160k_ade20k.py'

model = dict(
    pretrained='https://dl.fbaipublicfiles.com/deit/deit_base_patch16_224-b5f2ef4d.pth',  # noqa
    backbone=dict(drop_path_rate=0.1, final_norm=True))  # yapf: disable
```
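The `final_norm=True` flag is what the "LN" label in the README table refers to. Conceptually it applies one extra LayerNorm to the token sequence after the last transformer layer before the features are handed to the decode head; a rough sketch of that idea (not mmseg's actual `VisionTransformer` code):

```python
# Conceptual sketch of final_norm=True, not mmseg's actual implementation:
# an extra LayerNorm on the output tokens of the last transformer layer.
import torch
import torch.nn as nn

embed_dims = 768                                 # ViT-B/DeiT-B width; DeiT-S uses 384
final_norm = nn.LayerNorm(embed_dims, eps=1e-6)  # matches norm_cfg=dict(type='LN', eps=1e-6)

tokens = torch.randn(1, 1025, embed_dims)        # (batch, cls token + 32*32 patches, dim) at 512x512
normed = final_norm(tokens)                      # normalized before reshaping for the decode head
```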
Lines changed: 5 additions & 0 deletions

```python
_base_ = './upernet_vit-b16_mln_512x512_160k_ade20k.py'

model = dict(
    pretrained='https://dl.fbaipublicfiles.com/deit/deit_base_patch16_224-b5f2ef4d.pth',  # noqa
    backbone=dict(drop_path_rate=0.1),)  # yapf: disable
```
Lines changed: 8 additions & 0 deletions

```python
_base_ = './upernet_vit-b16_mln_512x512_160k_ade20k.py'

model = dict(
    pretrained='https://dl.fbaipublicfiles.com/deit/deit_small_patch16_224-cd65a155.pth',  # noqa
    backbone=dict(num_heads=6, embed_dims=384, drop_path_rate=0.1),
    decode_head=dict(num_classes=150, in_channels=[384, 384, 384, 384]),
    neck=None,
    auxiliary_head=dict(num_classes=150, in_channels=384))  # yapf: disable
```
Lines changed: 8 additions & 0 deletions

```python
_base_ = './upernet_vit-b16_mln_512x512_80k_ade20k.py'

model = dict(
    pretrained='https://dl.fbaipublicfiles.com/deit/deit_small_patch16_224-cd65a155.pth',  # noqa
    backbone=dict(num_heads=6, embed_dims=384, drop_path_rate=0.1),
    decode_head=dict(num_classes=150, in_channels=[384, 384, 384, 384]),
    neck=None,
    auxiliary_head=dict(num_classes=150, in_channels=384))  # yapf: disable
```
Lines changed: 12 additions & 0 deletions

```python
_base_ = './upernet_vit-b16_mln_512x512_160k_ade20k.py'

model = dict(
    pretrained='https://dl.fbaipublicfiles.com/deit/deit_small_patch16_224-cd65a155.pth',  # noqa
    backbone=dict(
        num_heads=6,
        embed_dims=384,
        drop_path_rate=0.1,
        final_norm=True),
    decode_head=dict(num_classes=150, in_channels=[384, 384, 384, 384]),
    neck=dict(in_channels=[384, 384, 384, 384], out_channels=384),
    auxiliary_head=dict(num_classes=150, in_channels=384))  # yapf: disable
```
Lines changed: 8 additions & 0 deletions

```python
_base_ = './upernet_vit-b16_mln_512x512_160k_ade20k.py'

model = dict(
    pretrained='https://dl.fbaipublicfiles.com/deit/deit_small_patch16_224-cd65a155.pth',  # noqa
    backbone=dict(num_heads=6, embed_dims=384, drop_path_rate=0.1),
    decode_head=dict(num_classes=150, in_channels=[384, 384, 384, 384]),
    neck=dict(in_channels=[384, 384, 384, 384], out_channels=384),
    auxiliary_head=dict(num_classes=150, in_channels=384))  # yapf: disable
```
