
Commit 70477d2

[NEW][Feature]Support SegNeXt(NeurIPS'2022) in master branch (#2600)
## Motivation

Support SegNeXt. Because the WIP stage went on too long, the original PR accumulated many commits and changed files (this could perhaps have been resolved with `git merge` or `git rebase`), so this PR was created only as a backup of the old PR #2247.

Co-authored-by: MeowZheng <[email protected]>
Co-authored-by: Miao Zheng <[email protected]>
1 parent b2fdae7 commit 70477d2

15 files changed (+1216, -2 lines changed)

README.md

Lines changed: 1 addition & 0 deletions
@@ -145,6 +145,7 @@ Supported backbones:
 - [x] [ConvNeXt (CVPR'2022)](configs/convnext)
 - [x] [MAE (CVPR'2022)](configs/mae)
 - [x] [PoolFormer (CVPR'2022)](configs/poolformer)
+- [x] [SegNeXt (NeurIPS'2022)](configs/segnext)
 
 Supported methods:

README_zh-CN.md

Lines changed: 1 addition & 0 deletions
@@ -128,6 +128,7 @@ MMSegmentation 是一个基于 PyTorch 的语义分割开源工具箱。它是 O (MMSegmentation is an open source semantic segmentation toolbox based on PyTorch. It is O…)
 - [x] [ConvNeXt (CVPR'2022)](configs/convnext)
 - [x] [MAE (CVPR'2022)](configs/mae)
 - [x] [PoolFormer (CVPR'2022)](configs/poolformer)
+- [x] [SegNeXt (NeurIPS'2022)](configs/segnext)
 
 已支持的算法: (Supported methods:)

configs/segnext/README.md

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
# SegNeXt

[SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation](https://arxiv.org/abs/2209.08575)

## Introduction

<!-- [ALGORITHM] -->

<a href="https://github.com/visual-attention-network/segnext">Official Repo</a>

<a href="https://github.com/open-mmlab/mmsegmentation/blob/v0.31.0/mmseg/models/backbones/mscan.py#L328">Code Snippet</a>

## Abstract

<!-- [ABSTRACT] -->

We present SegNeXt, a simple convolutional network architecture for semantic segmentation. Recent transformer-based models have dominated the field of semantic segmentation due to the efficiency of self-attention in encoding spatial information. In this paper, we show that convolutional attention is a more efficient and effective way to encode contextual information than the self-attention mechanism in transformers. By re-examining the characteristics owned by successful segmentation models, we discover several key components leading to the performance improvement of segmentation models. This motivates us to design a novel convolutional attention network that uses cheap convolutional operations. Without bells and whistles, our SegNeXt significantly improves the performance of previous state-of-the-art methods on popular benchmarks, including ADE20K, Cityscapes, COCO-Stuff, Pascal VOC, Pascal Context, and iSAID. Notably, SegNeXt outperforms EfficientNet-L2 w/ NAS-FPN and achieves 90.6% mIoU on the Pascal VOC 2012 test leaderboard using only 1/10 parameters of it. On average, SegNeXt achieves about 2.0% mIoU improvements compared to the state-of-the-art methods on the ADE20K datasets with the same or fewer computations. Code is available at [this https URL](https://github.com/uyzhang/JSeg) (Jittor) and [this https URL](https://github.com/Visual-Attention-Network/SegNeXt) (Pytorch).

<!-- [IMAGE] -->

<div align=center>
<img src="https://user-images.githubusercontent.com/24582831/215688018-5d4c8366-7793-4fdf-9397-960a09fac951.png" width="70%"/>
</div>

```bibtex
@article{guo2022segnext,
  title={SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation},
  author={Guo, Meng-Hao and Lu, Cheng-Ze and Hou, Qibin and Liu, Zhengning and Cheng, Ming-Ming and Hu, Shi-Min},
  journal={arXiv preprint arXiv:2209.08575},
  year={2022}
}
```

## Pretrained model

The pretrained MSCAN backbone weights (the `mscan_*.pth` files referenced by each config's `init_cfg`) can be found [here](https://cloud.tsinghua.edu.cn/d/c15b25a6745946618462/), provided by the [original repo](https://github.com/Visual-Attention-Network/SegNeXt). Download them and put them in the `./pretrain` folder.

## Results and models

### ADE20K

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| ------- | -------- | --------- | ------- | -------- | -------------- | ----- | ------------- | ------ | -------- |
| SegNeXt | MSCAN-T | 512x512 | 160000 | 17.88 | 52.38 | 41.50 | 42.59 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segnext/segnext_mscan-t_1x16_512x512_adamw_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segnext/segnext_mscan-t_1x16_512x512_adamw_160k_ade20k/segnext_mscan-t_1x16_512x512_adamw_160k_ade20k_20230210_140244-05bd8466.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/segnext/segnext_mscan-t_1x16_512x512_adamw_160k_ade20k/segnext_mscan-t_1x16_512x512_adamw_160k_ade20k_20230210_140244.log.json) |
| SegNeXt | MSCAN-S | 512x512 | 160000 | 21.47 | 42.27 | 44.16 | 45.81 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segnext/segnext_mscan-s_1x16_512x512_adamw_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segnext/segnext_mscan-s_1x16_512x512_adamw_160k_ade20k/segnext_mscan-s_1x16_512x512_adamw_160k_ade20k_20230214_113014-43013668.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/segnext/segnext_mscan-s_1x16_512x512_adamw_160k_ade20k/segnext_mscan-s_1x16_512x512_adamw_160k_ade20k_20230214_113014.log.json) |
| SegNeXt | MSCAN-B | 512x512 | 160000 | 31.03 | 35.15 | 48.03 | 49.68 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segnext/segnext_mscan-b_1x16_512x512_adamw_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segnext/segnext_mscan-b_1x16_512x512_adamw_160k_ade20k/segnext_mscan-b_1x16_512x512_adamw_160k_ade20k_20230209_172053-b6f6c70c.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/segnext/segnext_mscan-b_1x16_512x512_adamw_160k_ade20k/segnext_mscan-b_1x16_512x512_adamw_160k_ade20k_20230209_172053.log.json) |
| SegNeXt | MSCAN-L | 512x512 | 160000 | 43.32 | 22.91 | 50.99 | 52.10 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segnext/segnext_mscan-l_1x16_512x512_adamw_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segnext/segnext_mscan-l_1x16_512x512_adamw_160k_ade20k/segnext_mscan-l_1x16_512x512_adamw_160k_ade20k_20230209_172055-19b14b63.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/segnext/segnext_mscan-l_1x16_512x512_adamw_160k_ade20k/segnext_mscan-l_1x16_512x512_adamw_160k_ade20k_20230209_172055.log.json) |

Note:

- The total batch size is 16. We trained SegNeXt on a single GPU because performance degrades significantly when using PyTorch 1.9's `SyncBN` (mainly in the `OverlapPatchEmbed` modules of `MSCAN`).

- Test results can vary slightly between runs because the Non-negative Matrix Factorization (NMF) in `LightHamHead` is initialized randomly. To control this randomness, set the random seed before testing, for example by modifying [`./tools/test.py`](https://github.com/open-mmlab/mmsegmentation/blob/master/tools/test.py) as follows:

```python
def main():
    from mmseg.apis import set_random_seed
    random_seed = xxx  # replace xxx with the random seed recorded in the training log
    set_random_seed(random_seed, deterministic=False)
    ...
```

- Model performance is sensitive to the random seed; the seed used for each reported result is recorded in its log file. With a different seed, results may differ from those in the table. For example, the mIoU of SegNeXt with MSCAN-L ranges from 49.60 to 51.0.
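The last two notes stem from the random initialization of the NMF bases in `LightHamHead`. The following is a minimal NumPy sketch, not the `LightHamHead` code: it assumes a single group (as with `MD_S=1` in the MSCAN-T config), rank `MD_R=16`, `train_steps=6`, coefficients initialized by a softmax sharpened with `inv_t=100`, and plain multiplicative NMF updates with random bases (`rand_init=True`). The function name `nmf_matrix_decomposition` and the input sizes are hypothetical. It shows that the same seed reproduces the output exactly, while a different seed changes it.

```python
import numpy as np


def nmf_matrix_decomposition(x, rank, steps, inv_t, rng):
    """Hypothetical, simplified NMF block: approximate x (D x N) by bases @ coef.T.

    x must be non-negative (e.g. features after a ReLU). The bases are drawn
    from `rng`, so the result depends on the random seed.
    """
    eps = 1e-6
    bases = rng.random((x.shape[0], rank))                 # (D, R), random init
    bases /= np.linalg.norm(bases, axis=0, keepdims=True)  # unit-norm basis vectors

    # Initial coefficients: softmax over the rank axis, sharpened by inv_t.
    logits = inv_t * (x.T @ bases)                         # (N, R)
    logits -= logits.max(axis=1, keepdims=True)            # numerical stability
    coef = np.exp(logits)
    coef /= coef.sum(axis=1, keepdims=True)

    # Multiplicative updates keep both factors non-negative.
    for _ in range(steps):
        coef *= (x.T @ bases) / (coef @ (bases.T @ bases) + eps)
        bases *= (x @ coef) / (bases @ (coef.T @ coef) + eps)

    return bases @ coef.T                                  # low-rank reconstruction (D, N)


# Hypothetical non-negative features: 32 channels x 64 spatial positions.
features = np.random.default_rng(0).random((32, 64))

run_a = nmf_matrix_decomposition(features, rank=16, steps=6, inv_t=100,
                                 rng=np.random.default_rng(42))
run_b = nmf_matrix_decomposition(features, rank=16, steps=6, inv_t=100,
                                 rng=np.random.default_rng(42))
run_c = nmf_matrix_decomposition(features, rank=16, steps=6, inv_t=100,
                                 rng=np.random.default_rng(7))

print(np.array_equal(run_a, run_b))  # True: same seed, identical output
print(np.allclose(run_a, run_c))     # a different seed changes the output
```

In this sketch, fixing the generator seed plays the role that `set_random_seed` plays for PyTorch's random number generator in the `tools/test.py` snippet.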

configs/segnext/segnext.yml

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
Collections:
- Name: SegNeXt
  Metadata:
    Training Data:
    - ADE20K
  Paper:
    URL: https://arxiv.org/abs/2209.08575
    Title: 'SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation'
  README: configs/segnext/README.md
  Code:
    URL: https://github.com/open-mmlab/mmsegmentation/blob/v0.31.0/mmseg/models/backbones/mscan.py#L328
    Version: v0.31.0
  Converted From:
    Code: https://github.com/visual-attention-network/segnext
Models:
- Name: segnext_mscan-t_1x16_512x512_adamw_160k_ade20k
  In Collection: SegNeXt
  Metadata:
    backbone: MSCAN-T
    crop size: (512,512)
    lr schd: 160000
    inference time (ms/im):
    - value: 19.09
      hardware: A100
      backend: PyTorch
      batch size: 1
      mode: FP32
      resolution: (512,512)
    Training Memory (GB): 17.88
  Results:
  - Task: Semantic Segmentation
    Dataset: ADE20K
    Metrics:
      mIoU: 41.5
      mIoU(ms+flip): 42.59
  Config: configs/segnext/segnext_mscan-t_1x16_512x512_adamw_160k_ade20k.py
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/segnext/segnext_mscan-t_1x16_512x512_adamw_160k_ade20k/segnext_mscan-t_1x16_512x512_adamw_160k_ade20k_20230210_140244-05bd8466.pth
- Name: segnext_mscan-s_1x16_512x512_adamw_160k_ade20k
  In Collection: SegNeXt
  Metadata:
    backbone: MSCAN-S
    crop size: (512,512)
    lr schd: 160000
    inference time (ms/im):
    - value: 23.66
      hardware: A100
      backend: PyTorch
      batch size: 1
      mode: FP32
      resolution: (512,512)
    Training Memory (GB): 21.47
  Results:
  - Task: Semantic Segmentation
    Dataset: ADE20K
    Metrics:
      mIoU: 44.16
      mIoU(ms+flip): 45.81
  Config: configs/segnext/segnext_mscan-s_1x16_512x512_adamw_160k_ade20k.py
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/segnext/segnext_mscan-s_1x16_512x512_adamw_160k_ade20k/segnext_mscan-s_1x16_512x512_adamw_160k_ade20k_20230214_113014-43013668.pth
- Name: segnext_mscan-b_1x16_512x512_adamw_160k_ade20k
  In Collection: SegNeXt
  Metadata:
    backbone: MSCAN-B
    crop size: (512,512)
    lr schd: 160000
    inference time (ms/im):
    - value: 28.45
      hardware: A100
      backend: PyTorch
      batch size: 1
      mode: FP32
      resolution: (512,512)
    Training Memory (GB): 31.03
  Results:
  - Task: Semantic Segmentation
    Dataset: ADE20K
    Metrics:
      mIoU: 48.03
      mIoU(ms+flip): 49.68
  Config: configs/segnext/segnext_mscan-b_1x16_512x512_adamw_160k_ade20k.py
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/segnext/segnext_mscan-b_1x16_512x512_adamw_160k_ade20k/segnext_mscan-b_1x16_512x512_adamw_160k_ade20k_20230209_172053-b6f6c70c.pth
- Name: segnext_mscan-l_1x16_512x512_adamw_160k_ade20k
  In Collection: SegNeXt
  Metadata:
    backbone: MSCAN-L
    crop size: (512,512)
    lr schd: 160000
    inference time (ms/im):
    - value: 43.65
      hardware: A100
      backend: PyTorch
      batch size: 1
      mode: FP32
      resolution: (512,512)
    Training Memory (GB): 43.32
  Results:
  - Task: Semantic Segmentation
    Dataset: ADE20K
    Metrics:
      mIoU: 50.99
      mIoU(ms+flip): 52.1
  Config: configs/segnext/segnext_mscan-l_1x16_512x512_adamw_160k_ade20k.py
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/segnext/segnext_mscan-l_1x16_512x512_adamw_160k_ade20k/segnext_mscan-l_1x16_512x512_adamw_160k_ade20k_20230209_172055-19b14b63.pth
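The SegNeXt README table reports inference speed in fps, whereas `segnext.yml` stores latency in ms per image. Assuming both columns come from the same A100, batch-size-1, FP32 measurement, they should be related by fps = 1000 / (ms/im). The short check below (the helper name `ms_to_fps` is hypothetical) recomputes the README column from the yml values.

```python
# Latency per image (ms/im) from segnext.yml, keyed by backbone.
ms_per_image = {
    'MSCAN-T': 19.09,
    'MSCAN-S': 23.66,
    'MSCAN-B': 28.45,
    'MSCAN-L': 43.65,
}


def ms_to_fps(ms):
    """Convert a per-image latency in milliseconds to images (frames) per second."""
    return 1000.0 / ms


fps = {name: round(ms_to_fps(ms), 2) for name, ms in ms_per_image.items()}
for name, value in fps.items():
    print(f'{name}: {value:.2f} fps')
# MSCAN-T: 52.38 fps
# MSCAN-S: 42.27 fps
# MSCAN-B: 35.15 fps
# MSCAN-L: 22.91 fps
```

The rounded values match the `Inf time (fps)` column of the README table.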
configs/segnext/segnext_mscan-b_1x16_512x512_adamw_160k_ade20k.py

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
_base_ = './segnext_mscan-t_1x16_512x512_adamw_160k_ade20k.py'
# model settings
ham_norm_cfg = dict(type='GN', num_groups=32, requires_grad=True)
model = dict(
    type='EncoderDecoder',
    backbone=dict(
        embed_dims=[64, 128, 320, 512],
        depths=[3, 3, 12, 3],
        init_cfg=dict(type='Pretrained', checkpoint='pretrain/mscan_b.pth'),
        drop_path_rate=0.1,
        norm_cfg=dict(type='BN', requires_grad=True)),
    decode_head=dict(
        type='LightHamHead',
        in_channels=[128, 320, 512],
        in_index=[1, 2, 3],
        channels=512,
        ham_channels=512,
        dropout_ratio=0.1,
        num_classes=150,
        norm_cfg=ham_norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    # model training and testing settings
    train_cfg=dict(),
    test_cfg=dict(mode='whole'))
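The MSCAN-B config lists only the fields that differ from the MSCAN-T base file; everything else (for example the backbone `type='MSCAN'`, `mlp_ratios`, the attention kernel settings, and `ham_kwargs`) is inherited through `_base_`. MMCV's `Config` merges inherited dicts recursively, while a dict marked `_delete_=True` (as in the MSCAN-T `optimizer` and `lr_config`) replaces the inherited value. The sketch below is a simplified, hypothetical `merge_cfg` that mimics only these two rules; it is not MMCV's implementation, and the SGD values standing in for the inherited schedule are assumed for illustration.

```python
import copy


def merge_cfg(base, override):
    """Simplified sketch of `_base_` inheritance (hypothetical helper).

    Nested dicts are merged key by key; other values (including lists) replace
    the base value; a dict with `_delete_=True` replaces the base dict entirely.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        value = copy.deepcopy(value)
        if isinstance(value, dict) and value.pop('_delete_', False):
            merged[key] = value  # replace instead of merging
        elif isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_cfg(merged[key], value)
        else:
            merged[key] = value
    return merged


# Excerpt of the MSCAN-T settings (the base file).
base = dict(
    model=dict(
        backbone=dict(type='MSCAN', embed_dims=[32, 64, 160, 256],
                      mlp_ratios=[8, 8, 4, 4], depths=[3, 3, 5, 2]),
        decode_head=dict(type='LightHamHead', channels=256,
                         ham_kwargs=dict(MD_S=1, MD_R=16))),
    # Assumed SGD settings standing in for the inherited schedule file.
    optimizer=dict(type='SGD', lr=0.01, momentum=0.9))

# Excerpt of the MSCAN-B overrides, plus an optimizer replaced via `_delete_`.
override = dict(
    model=dict(
        backbone=dict(embed_dims=[64, 128, 320, 512], depths=[3, 3, 12, 3]),
        decode_head=dict(channels=512, ham_channels=512)),
    optimizer=dict(_delete_=True, type='AdamW', lr=0.00006))

cfg = merge_cfg(base, override)
print(cfg['model']['backbone']['type'])           # MSCAN (inherited)
print(cfg['model']['backbone']['depths'])         # [3, 3, 12, 3] (overridden)
print(cfg['model']['decode_head']['ham_kwargs'])  # {'MD_S': 1, 'MD_R': 16} (inherited)
print(cfg['optimizer'])                           # {'type': 'AdamW', 'lr': 6e-05}
```

Lists such as `depths` are replaced wholesale rather than merged element by element, which is why each variant config restates the full list.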
configs/segnext/segnext_mscan-l_1x16_512x512_adamw_160k_ade20k.py

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
_base_ = './segnext_mscan-t_1x16_512x512_adamw_160k_ade20k.py'
# model settings
ham_norm_cfg = dict(type='GN', num_groups=32, requires_grad=True)
model = dict(
    type='EncoderDecoder',
    backbone=dict(
        embed_dims=[64, 128, 320, 512],
        depths=[3, 5, 27, 3],
        init_cfg=dict(type='Pretrained', checkpoint='pretrain/mscan_l.pth'),
        drop_path_rate=0.3,
        norm_cfg=dict(type='BN', requires_grad=True)),
    decode_head=dict(
        type='LightHamHead',
        in_channels=[128, 320, 512],
        in_index=[1, 2, 3],
        channels=1024,
        ham_channels=1024,
        dropout_ratio=0.1,
        num_classes=150,
        norm_cfg=ham_norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    # model training and testing settings
    train_cfg=dict(),
    test_cfg=dict(mode='whole'))
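MSCAN-L deepens the third stage to 27 blocks and raises `drop_path_rate` to 0.3. Assuming MSCAN spreads stochastic depth linearly over all blocks, as SegFormer- and VAN-style backbones do with `torch.linspace(0, drop_path_rate, sum(depths))`, the per-block drop probabilities can be computed with the hypothetical helper below (plain Python, no PyTorch needed).

```python
def drop_path_schedule(depths, drop_path_rate):
    """Hypothetical helper: linearly spaced drop-path rates, split per stage.

    Mirrors torch.linspace(0, drop_path_rate, sum(depths)) followed by
    slicing the values stage by stage.
    """
    total = sum(depths)
    step = drop_path_rate / (total - 1) if total > 1 else 0.0
    rates = [i * step for i in range(total)]
    stages, start = [], 0
    for depth in depths:
        stages.append(rates[start:start + depth])
        start += depth
    return stages


# MSCAN-L settings from the config above.
for idx, stage in enumerate(drop_path_schedule([3, 5, 27, 3], 0.3), start=1):
    print(f'stage {idx}: {len(stage)} blocks, '
          f'drop path {stage[0]:.4f} -> {stage[-1]:.4f}')
```

Under this assumption, the 38 blocks of MSCAN-L span rates from 0 to 0.3, and the 27 blocks of stage 3 cover roughly 0.065 to 0.276.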
configs/segnext/segnext_mscan-s_1x16_512x512_adamw_160k_ade20k.py

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
_base_ = './segnext_mscan-t_1x16_512x512_adamw_160k_ade20k.py'
# model settings
ham_norm_cfg = dict(type='GN', num_groups=32, requires_grad=True)
model = dict(
    type='EncoderDecoder',
    backbone=dict(
        embed_dims=[64, 128, 320, 512],
        depths=[2, 2, 4, 2],
        init_cfg=dict(type='Pretrained', checkpoint='./pretrain/mscan_s.pth'),
        norm_cfg=dict(type='BN', requires_grad=True)),
    decode_head=dict(
        type='LightHamHead',
        in_channels=[128, 320, 512],
        in_index=[1, 2, 3],
        channels=256,
        ham_channels=256,
        ham_kwargs=dict(MD_R=16),
        dropout_ratio=0.1,
        num_classes=150,
        norm_cfg=ham_norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    # model training and testing settings
    train_cfg=dict(),
    test_cfg=dict(mode='whole'))
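The MSCAN-S head keeps `ham_channels=256` and sets `ham_kwargs=dict(MD_R=16)`, so the matrix decomposition factors the head features into a rank-16 product. Assuming the head operates at 1/8 of the input resolution (the stride of the first selected stage, `in_index=1`) and uses a single group (`MD_S=1`, inherited from the MSCAN-T config), the arithmetic below estimates how much smaller the factors are than the feature matrix for a 512x512 crop. The helper name `nmf_sizes` and the stride value are assumptions; MSCAN-L is included because it inherits the same `MD_R=16` with `ham_channels=1024`.

```python
def nmf_sizes(crop_size, ham_channels, md_s=1, md_r=16, stride=8):
    """Hypothetical helper: sizes of the feature matrix vs. its NMF factors.

    Features are reshaped to md_s groups of D x N, with D = ham_channels // md_s
    channels and N = H * W spatial positions at the given (assumed) stride.
    """
    n = (crop_size[0] // stride) * (crop_size[1] // stride)
    d = ham_channels // md_s
    dense = md_s * d * n             # entries of the reshaped feature matrices
    factors = md_s * md_r * (d + n)  # entries of bases (D x R) plus coefficients (N x R)
    return d, n, dense, factors


for name, channels in [('MSCAN-S', 256), ('MSCAN-L', 1024)]:
    d, n, dense, factors = nmf_sizes((512, 512), channels)
    print(f'{name}: D={d}, N={n}, dense={dense}, factors={factors}, '
          f'ratio={dense / factors:.1f}x')
```

For MSCAN-S this gives a 256 x 4096 matrix (1,048,576 entries) represented by 69,632 factor entries, about 15x fewer.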
configs/segnext/segnext_mscan-t_1x16_512x512_adamw_160k_ade20k.py

Lines changed: 125 additions & 0 deletions
@@ -0,0 +1,125 @@
_base_ = [
    '../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py'
]
# model settings
ham_norm_cfg = dict(type='GN', num_groups=32, requires_grad=True)
model = dict(
    type='EncoderDecoder',
    pretrained=None,
    backbone=dict(
        type='MSCAN',
        init_cfg=dict(type='Pretrained', checkpoint='./pretrain/mscan_t.pth'),
        embed_dims=[32, 64, 160, 256],
        mlp_ratios=[8, 8, 4, 4],
        drop_rate=0.0,
        drop_path_rate=0.1,
        depths=[3, 3, 5, 2],
        attention_kernel_sizes=[5, [1, 7], [1, 11], [1, 21]],
        attention_kernel_paddings=[2, [0, 3], [0, 5], [0, 10]],
        act_cfg=dict(type='GELU'),
        norm_cfg=dict(type='BN', requires_grad=True)),
    decode_head=dict(
        type='LightHamHead',
        in_channels=[64, 160, 256],
        in_index=[1, 2, 3],
        channels=256,
        ham_channels=256,
        dropout_ratio=0.1,
        num_classes=150,
        norm_cfg=ham_norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
        ham_kwargs=dict(
            MD_S=1,
            MD_R=16,
            train_steps=6,
            eval_steps=7,
            inv_t=100,
            rand_init=True)),
    # model training and testing settings
    train_cfg=dict(),
    test_cfg=dict(mode='whole'))

# dataset settings
dataset_type = 'ADE20KDataset'
data_root = 'data/ade/ADEChallengeData2016'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
crop_size = (512, 512)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', reduce_zero_label=True),
    dict(type='Resize', img_scale=(2048, 512), ratio_range=(0.5, 2.0)),
    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(2048, 512),
        # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='ResizeToMultiple', size_divisor=32),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=16,
    workers_per_gpu=4,
    train=dict(
        type='RepeatDataset',
        times=50,
        dataset=dict(
            type=dataset_type,
            data_root=data_root,
            img_dir='images/training',
            ann_dir='annotations/training',
            pipeline=train_pipeline)),
    val=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir='images/validation',
        ann_dir='annotations/validation',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir='images/validation',
        ann_dir='annotations/validation',
        pipeline=test_pipeline))

# optimizer
optimizer = dict(
    _delete_=True,
    type='AdamW',
    lr=0.00006,
    betas=(0.9, 0.999),
    weight_decay=0.01,
    paramwise_cfg=dict(
        custom_keys={
            'pos_block': dict(decay_mult=0.),
            'norm': dict(decay_mult=0.),
            'head': dict(lr_mult=10.)
        }))

lr_config = dict(
    _delete_=True,
    policy='poly',
    warmup='linear',
    warmup_iters=1500,
    warmup_ratio=1e-6,
    power=1.0,
    min_lr=0.0,
    by_epoch=False)
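The `lr_config` of the MSCAN-T config combines polynomial decay (`power=1.0`, i.e. linear decay) with a linear warmup over the first 1500 iterations. The sketch below reproduces the per-iteration learning rate, assuming MMCV's usual poly and linear-warmup formulas and a total of 160000 iterations from `schedule_160k.py`; the helper name `lr_at` is hypothetical. Parameters whose names contain `head` get `lr_mult=10.`, so their learning rate is ten times the value returned here.

```python
def lr_at(cur_iter, base_lr=6e-5, max_iters=160000, power=1.0, min_lr=0.0,
          warmup_iters=1500, warmup_ratio=1e-6):
    """Hypothetical helper: poly decay with linear warmup, per iteration."""
    # Poly decay from base_lr down to min_lr over max_iters.
    regular_lr = (base_lr - min_lr) * (1 - cur_iter / max_iters) ** power + min_lr
    if cur_iter >= warmup_iters:
        return regular_lr
    # Linear warmup: scale the decayed lr from warmup_ratio up to 1.
    k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
    return regular_lr * (1 - k)


for it in (0, 750, 1500, 80000, 160000):
    print(f'iter {it:>6}: backbone lr={lr_at(it):.3e}, '
          f'head lr={10 * lr_at(it):.3e}')
```

With `power=1.0` the post-warmup learning rate falls linearly, reaching half of `lr=0.00006` at iteration 80000 and zero at the end of training.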
