
Commit 6c746fa

[Feature] Add PoolFormer (CVPR'2022) (open-mmlab#1537)
* [Feature] Add PoolFormer (CVPR'2022)
* Upload README.md, models and log.json
* fix wrong base config name in config file
* refactor alignresize
* delete align_resize.py
* change config name
* use ResizeToMultiple to replace AlignResize
* update readme
* fix config bug
* resolve conflict
1 parent ee25adc commit 6c746fa

11 files changed with 334 additions and 0 deletions

README.md

Lines changed: 1 addition & 0 deletions
@@ -130,6 +130,7 @@ Supported backbones:
 - [x] [BEiT (ICLR'2022)](configs/beit)
 - [x] [ConvNeXt (CVPR'2022)](configs/convnext)
 - [x] [MAE (CVPR'2022)](configs/mae)
+- [x] [PoolFormer (CVPR'2022)](configs/poolformer)
 
 Supported methods:
 

README_zh-CN.md

Lines changed: 1 addition & 0 deletions
@@ -127,6 +127,7 @@ MMSegmentation is an open-source semantic segmentation toolbox based on PyTorch. It is O
 - [x] [BEiT (ICLR'2022)](configs/beit)
 - [x] [ConvNeXt (CVPR'2022)](configs/convnext)
 - [x] [MAE (CVPR'2022)](configs/mae)
+- [x] [PoolFormer (CVPR'2022)](configs/poolformer)
 
 Supported methods:
 

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+# model settings
+norm_cfg = dict(type='SyncBN', requires_grad=True)
+checkpoint_file = 'https://download.openmmlab.com/mmclassification/v0/poolformer/poolformer-s12_3rdparty_32xb128_in1k_20220414-f8d83051.pth' # noqa
+custom_imports = dict(imports='mmcls.models', allow_failed_imports=False)
+model = dict(
+    type='EncoderDecoder',
+    backbone=dict(
+        type='mmcls.PoolFormer',
+        arch='s12',
+        init_cfg=dict(
+            type='Pretrained', checkpoint=checkpoint_file, prefix='backbone.'),
+        in_patch_size=7,
+        in_stride=4,
+        in_pad=2,
+        down_patch_size=3,
+        down_stride=2,
+        down_pad=1,
+        drop_rate=0.,
+        drop_path_rate=0.,
+        out_indices=(0, 2, 4, 6),
+        frozen_stages=0,
+    ),
+    neck=dict(
+        type='FPN',
+        in_channels=[256, 512, 1024, 2048],
+        out_channels=256,
+        num_outs=4),
+    decode_head=dict(
+        type='FPNHead',
+        in_channels=[256, 256, 256, 256],
+        in_index=[0, 1, 2, 3],
+        feature_strides=[4, 8, 16, 32],
+        channels=128,
+        dropout_ratio=0.1,
+        num_classes=19,
+        norm_cfg=norm_cfg,
+        align_corners=False,
+        loss_decode=dict(
+            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
+    # model training and testing settings
+    train_cfg=dict(),
+    test_cfg=dict(mode='whole'))
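In this base config, `out_indices=(0, 2, 4, 6)` feeds four backbone outputs to the FPN, and the head declares `feature_strides=[4, 8, 16, 32]`. The sketch below uses hypothetical helper names. It shows how those strides follow from the patch-embedding settings (`in_patch_size=7`, `in_stride=4`, `in_pad=2`, `down_patch_size=3`, `down_stride=2`, `down_pad=1`) for a 512x512 crop.

It relies on one assumption, taken from MMClassification's `PoolFormer` documentation: the backbone alternates stages and downsampling layers, so even indices are stage outputs. Treat the sketch as an illustration, not the library's code.

```python
# Hypothetical helpers illustrating the geometry of the base PoolFormer config.
# Assumption: the backbone's internal module list is
# [stage1, downsample, stage2, downsample, stage3, downsample, stage4],
# so the even positions 0, 2, 4, 6 are the four stage outputs.


def stage_output_indices(num_stages: int = 4) -> tuple:
    """Positions of stage outputs when stages alternate with downsamplers."""
    return tuple(2 * i for i in range(num_stages))


def conv_out_size(size: int, kernel: int, stride: int, pad: int) -> int:
    """Standard output-size formula for a strided patch-embedding conv."""
    return (size + 2 * pad - kernel) // stride + 1


def stage_sizes(size: int, in_patch_size: int = 7, in_stride: int = 4,
                in_pad: int = 2, down_patch_size: int = 3,
                down_stride: int = 2, down_pad: int = 1,
                num_stages: int = 4) -> list:
    """Spatial size of each stage output, using the values from the config."""
    sizes = [conv_out_size(size, in_patch_size, in_stride, in_pad)]
    for _ in range(num_stages - 1):
        sizes.append(
            conv_out_size(sizes[-1], down_patch_size, down_stride, down_pad))
    return sizes


crop = 512
sizes = stage_sizes(crop)
strides = [crop // s for s in sizes]
print(stage_output_indices())  # (0, 2, 4, 6)
print(sizes)                   # [128, 64, 32, 16]
print(strides)                 # [4, 8, 16, 32], matching feature_strides
```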

configs/poolformer/README.md

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+# PoolFormer
+
+[MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418)
+
+## Introduction
+
+<!-- [BACKBONE] -->
+
+<a href="https://github.com/sail-sg/poolformer/tree/main/segmentation">Official Repo</a>
+
+<a href="https://github.com/open-mmlab/mmclassification/blob/v0.23.0/mmcls/models/backbones/poolformer.py#L198">Code Snippet</a>
+
+## Abstract
+
+<!-- [ABSTRACT] -->
+
+Transformers have shown great potential in computer vision tasks. A common belief is their attention-based token mixer module contributes most to their competence. However, recent works show the attention-based module in transformers can be replaced by spatial MLPs and the resulted models still perform quite well. Based on this observation, we hypothesize that the general architecture of the transformers, instead of the specific token mixer module, is more essential to the model's performance. To verify this, we deliberately replace the attention module in transformers with an embarrassingly simple spatial pooling operator to conduct only the most basic token mixing. Surprisingly, we observe that the derived model, termed as PoolFormer, achieves competitive performance on multiple computer vision tasks. For example, on ImageNet-1K, PoolFormer achieves 82.1% top-1 accuracy, surpassing well-tuned vision transformer/MLP-like baselines DeiT-B/ResMLP-B24 by 0.3%/1.1% accuracy with 35%/52% fewer parameters and 48%/60% fewer MACs. The effectiveness of PoolFormer verifies our hypothesis and urges us to initiate the concept of "MetaFormer", a general architecture abstracted from transformers without specifying the token mixer. Based on the extensive experiments, we argue that MetaFormer is the key player in achieving superior results for recent transformer and MLP-like models on vision tasks. This work calls for more future research dedicated to improving MetaFormer instead of focusing on the token mixer modules. Additionally, our proposed PoolFormer could serve as a starting baseline for future MetaFormer architecture design. Code is available at [this https URL](https://github.com/sail-sg/poolformer)
+
+<!-- [IMAGE] -->
+
+<div align=center>
+<img src="https://user-images.githubusercontent.com/15921929/144710761-1635f59a-abde-4946-984c-a2c3f22a19d2.png" width="70%"/>
+</div>
+
+## Citation
+
+```bibtex
+@inproceedings{yu2022metaformer,
+  title={Metaformer is actually what you need for vision},
+  author={Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng},
+  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
+  pages={10819--10829},
+  year={2022}
+}
+```
+
+### Usage
+
+- The PoolFormer backbone requires [MMClassification](https://github.com/open-mmlab/mmclassification), which provides a wide range of backbones for downstream tasks, to be installed first:
+
+```shell
+pip install "mmcls>=0.23.0"
+```
+
+- The pretrained models can also be downloaded from the [PoolFormer configs in MMClassification](https://github.com/open-mmlab/mmclassification/tree/master/configs/poolformer).
+
+## Results and models
+
+### ADE20K
+
+| Method | Backbone | Crop Size | Pretrain | Batch Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | mIoU\* | mIoU\*(ms+flip) | config | download |
+| ------ | -------------- | --------- | ----------- | ---------- | ------- | -------- | -------------- | ----- | ------------: | ------ | --------------: | ------ | -------- |
+| FPN | PoolFormer-S12 | 512x512 | ImageNet-1K | 32 | 40000 | 4.17 | 23.48 | 36.0 | 36.42 | 37.07 | 38.44 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/poolformer/fpn_poolformer_s12_8x4_512x512_40k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/poolformer/fpn_poolformer_s12_8x4_512x512_40k_ade20k/fpn_poolformer_s12_8x4_512x512_40k_ade20k_20220501_115154-b5aa2f49.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/poolformer/fpn_poolformer_s12_8x4_512x512_40k_ade20k/fpn_poolformer_s12_8x4_512x512_40k_ade20k_20220501_115154.log.json) |
+| FPN | PoolFormer-S24 | 512x512 | ImageNet-1K | 32 | 40000 | 5.47 | 15.74 | 39.35 | 39.73 | 40.36 | 41.08 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/poolformer/fpn_poolformer_s24_8x4_512x512_40k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/poolformer/fpn_poolformer_s24_8x4_512x512_40k_ade20k/fpn_poolformer_s24_8x4_512x512_40k_ade20k_20220503_222049-394a7cf7.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/poolformer/fpn_poolformer_s24_8x4_512x512_40k_ade20k/fpn_poolformer_s24_8x4_512x512_40k_ade20k_20220503_222049.log.json) |
+| FPN | PoolFormer-S36 | 512x512 | ImageNet-1K | 32 | 40000 | 6.77 | 11.34 | 40.64 | 40.99 | 41.81 | 42.72 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/poolformer/fpn_poolformer_s36_8x4_512x512_40k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/poolformer/fpn_poolformer_s36_8x4_512x512_40k_ade20k/fpn_poolformer_s36_8x4_512x512_40k_ade20k_20220501_151122-b47e607d.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/poolformer/fpn_poolformer_s36_8x4_512x512_40k_ade20k/fpn_poolformer_s36_8x4_512x512_40k_ade20k_20220501_151122.log.json) |
+| FPN | PoolFormer-M36 | 512x512 | ImageNet-1K | 32 | 40000 | 8.59 | 8.97 | 40.91 | 41.28 | 42.35 | 43.34 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/poolformer/fpn_poolformer_m36_8x4_512x512_40k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/poolformer/fpn_poolformer_m36_8x4_512x512_40k_ade20k/fpn_poolformer_m36_8x4_512x512_40k_ade20k_20220501_164230-3dc83921.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/poolformer/fpn_poolformer_m36_8x4_512x512_40k_ade20k/fpn_poolformer_m36_8x4_512x512_40k_ade20k_20220501_164230.log.json) |
+| FPN | PoolFormer-M48 | 512x512 | ImageNet-1K | 32 | 40000 | 10.48 | 6.69 | 41.82 | 42.2 | 42.76 | 43.57 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/poolformer/fpn_poolformer_m48_8x4_512x512_40k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/poolformer/fpn_poolformer_m48_8x4_512x512_40k_ade20k/fpn_poolformer_m48_8x4_512x512_40k_ade20k_20220504_003923-64168d3b.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/poolformer/fpn_poolformer_m48_8x4_512x512_40k_ade20k/fpn_poolformer_m48_8x4_512x512_40k_ade20k_20220504_003923.log.json) |
+
+Note:
+
+- We replace `AlignedResize` from the original PoolFormer implementation with `Resize + ResizeToMultiple`.
+
+- `mIoU` marked with * is evaluated with `Resize + ResizeToMultiple` in `test_pipeline`; the `mIoU` values in the logs are collected the same way.
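The `8x4` in each config name and the Batch Size column are linked. Under the usual MMSegmentation naming convention, `8x4` means 8 GPUs with `samples_per_gpu=4` each.

The worked calculation below shows what the 40k-iteration schedule covers. It assumes the standard ADE20K (ADEChallengeData2016) training split of 20,210 images and the `RepeatDataset(times=50)` wrapper used in the S12 config. The constant names are illustrative.

```python
# Worked arithmetic for the 8x4 / 40k ADE20K schedule.
# Assumptions: 8 GPUs (from the "8x4" naming convention) and 20,210 training
# images in ADEChallengeData2016.
NUM_GPUS = 8
SAMPLES_PER_GPU = 4      # data.samples_per_gpu in the config
ITERS = 40000            # Lr schd column / schedule_40k
TRAIN_IMAGES = 20210
REPEAT_TIMES = 50        # RepeatDataset(times=50)

batch_size = NUM_GPUS * SAMPLES_PER_GPU          # matches the table's 32
samples_seen = ITERS * batch_size                # total training crops drawn
passes_over_data = samples_seen / TRAIN_IMAGES   # passes over the raw split
repeated_len = TRAIN_IMAGES * REPEAT_TIMES       # length of RepeatDataset
repeat_epochs = samples_seen / repeated_len      # passes over RepeatDataset

print(batch_size, samples_seen, round(passes_over_data, 1),
      repeated_len, round(repeat_epochs, 2))
# 32 1280000 63.3 1010500 1.27
```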
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+_base_ = './fpn_poolformer_s12_8x4_512x512_40k_ade20k.py'
+checkpoint_file = 'https://download.openmmlab.com/mmclassification/v0/poolformer/poolformer-m36_3rdparty_32xb128_in1k_20220414-c55e0949.pth' # noqa
+
+# model settings
+model = dict(
+    backbone=dict(
+        arch='m36',
+        init_cfg=dict(
+            type='Pretrained', checkpoint=checkpoint_file,
+            prefix='backbone.')),
+    neck=dict(in_channels=[96, 192, 384, 768]))
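Variant configs such as this one list only the keys they change; everything else is inherited from `_base_`. The sketch below is a simplified model of that inheritance. It assumes MMCV-style semantics:

- nested dicts are merged key by key;
- `_delete_=True` (used for the AdamW optimizer in the S12 config) discards the inherited dict instead of merging into it.

`merge_cfg` is a hypothetical helper, not MMCV's `Config` implementation. The SGD base values are illustrative only.

```python
import copy


def merge_cfg(base: dict, child: dict) -> dict:
    """Recursively merge `child` into a copy of `base` (simplified sketch).

    Nested dicts are merged key by key; a child dict carrying
    `_delete_=True` replaces the inherited dict instead of merging.
    """
    merged = copy.deepcopy(base)
    for key, value in child.items():
        if isinstance(value, dict):
            value = dict(value)  # shallow copy so pop() leaves child intact
            delete = value.pop('_delete_', False)
            if not delete and isinstance(merged.get(key), dict):
                merged[key] = merge_cfg(merged[key], value)
            else:
                merged[key] = merge_cfg({}, value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# A trimmed-down stand-in for the S12 config.
s12 = {
    'model': {'backbone': {'type': 'mmcls.PoolFormer', 'arch': 's12'},
              'neck': {'type': 'FPN', 'in_channels': [64, 128, 320, 512]}},
}
m36_child = {'model': {'backbone': {'arch': 'm36'},
                       'neck': {'in_channels': [96, 192, 384, 768]}}}
m36 = merge_cfg(s12, m36_child)

# Illustrative SGD base: without _delete_, 'momentum' would leak into AdamW.
sgd_base = {'optimizer': {'type': 'SGD', 'lr': 0.01, 'momentum': 0.9}}
adamw = merge_cfg(sgd_base, {'optimizer': {
    '_delete_': True, 'type': 'AdamW', 'lr': 0.0002, 'weight_decay': 0.0001}})
leaky = merge_cfg(sgd_base, {'optimizer': {'type': 'AdamW', 'lr': 0.0002}})
print(m36['model']['backbone'])
print(adamw['optimizer'])
print(leaky['optimizer'])
```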
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+_base_ = './fpn_poolformer_s12_8x4_512x512_40k_ade20k.py'
+checkpoint_file = 'https://download.openmmlab.com/mmclassification/v0/poolformer/poolformer-m48_3rdparty_32xb128_in1k_20220414-9378f3eb.pth' # noqa
+
+# model settings
+model = dict(
+    backbone=dict(
+        arch='m48',
+        init_cfg=dict(
+            type='Pretrained', checkpoint=checkpoint_file,
+            prefix='backbone.')),
+    neck=dict(in_channels=[96, 192, 384, 768]))
Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
+_base_ = [
+    '../_base_/models/fpn_poolformer_s12.py', '../_base_/default_runtime.py',
+    '../_base_/schedules/schedule_40k.py'
+]
+
+# model settings
+model = dict(
+    neck=dict(in_channels=[64, 128, 320, 512]),
+    decode_head=dict(num_classes=150))
+
+# optimizer
+optimizer = dict(_delete_=True, type='AdamW', lr=0.0002, weight_decay=0.0001)
+optimizer_config = dict()
+# learning policy
+lr_config = dict(policy='poly', power=0.9, min_lr=0.0, by_epoch=False)
+
+# dataset settings
+dataset_type = 'ADE20KDataset'
+data_root = 'data/ade/ADEChallengeData2016'
+img_norm_cfg = dict(
+    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
+crop_size = (512, 512)
+train_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(type='LoadAnnotations', reduce_zero_label=True),
+    dict(type='Resize', img_scale=(2048, 512), ratio_range=(0.5, 2.0)),
+    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
+    dict(type='RandomFlip', prob=0.5),
+    dict(type='PhotoMetricDistortion'),
+    dict(type='Normalize', **img_norm_cfg),
+    dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
+    dict(type='DefaultFormatBundle'),
+    dict(type='Collect', keys=['img', 'gt_semantic_seg']),
+]
+test_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(
+        type='MultiScaleFlipAug',
+        img_scale=(2048, 512),
+        # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
+        flip=False,
+        transforms=[
+            dict(type='Resize', keep_ratio=True),
+            dict(type='ResizeToMultiple', size_divisor=32),
+            dict(type='RandomFlip'),
+            dict(type='Normalize', **img_norm_cfg),
+            dict(type='ImageToTensor', keys=['img']),
+            dict(type='Collect', keys=['img']),
+        ])
+]
+data = dict(
+    samples_per_gpu=4,
+    workers_per_gpu=4,
+    train=dict(
+        type='RepeatDataset',
+        times=50,
+        dataset=dict(
+            type=dataset_type,
+            data_root=data_root,
+            img_dir='images/training',
+            ann_dir='annotations/training',
+            pipeline=train_pipeline)),
+    val=dict(
+        type=dataset_type,
+        data_root=data_root,
+        img_dir='images/validation',
+        ann_dir='annotations/validation',
+        pipeline=test_pipeline),
+    test=dict(
+        type=dataset_type,
+        data_root=data_root,
+        img_dir='images/validation',
+        ann_dir='annotations/validation',
+        pipeline=test_pipeline))
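The S12 `test_pipeline` first applies a keep-ratio `Resize` to `img_scale=(2048, 512)`, then `ResizeToMultiple(size_divisor=32)`. The sketch below reproduces only the size arithmetic of those two steps, under two assumptions:

- the rescale follows MMCV's keep-ratio rule: take the smaller of `max(scale) / long_edge` and `min(scale) / short_edge`, then round to the nearest integer;
- `ResizeToMultiple` rounds each side up to the next multiple of the divisor.

The helper names and the 640x480 input are hypothetical.

```python
import math


def rescale_size(w: int, h: int, scale=(2048, 512)) -> tuple:
    """Keep-ratio rescale: the long edge may not exceed max(scale) and the
    short edge may not exceed min(scale); the smaller factor wins."""
    long_max, short_max = max(scale), min(scale)
    factor = min(long_max / max(w, h), short_max / min(w, h))
    return int(w * factor + 0.5), int(h * factor + 0.5)


def round_up_to_multiple(w: int, h: int, divisor: int = 32) -> tuple:
    """Round each side up to the nearest multiple of `divisor`."""
    return (math.ceil(w / divisor) * divisor,
            math.ceil(h / divisor) * divisor)


w, h = rescale_size(640, 480)       # short edge 480 -> 512
print((w, h))                       # (683, 512)
print(round_up_to_multiple(w, h))   # (704, 512)
```

One plausible motivation for `size_divisor=32` is that the backbone's overall stride is 32. Sides divisible by 32 then give integer-sized feature maps at every stage.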
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+_base_ = './fpn_poolformer_s12_8x4_512x512_40k_ade20k.py'
+checkpoint_file = 'https://download.openmmlab.com/mmclassification/v0/poolformer/poolformer-s24_3rdparty_32xb128_in1k_20220414-d7055904.pth' # noqa
+# model settings
+model = dict(
+    backbone=dict(
+        arch='s24',
+        init_cfg=dict(
+            type='Pretrained', checkpoint=checkpoint_file,
+            prefix='backbone.')))
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+_base_ = './fpn_poolformer_s12_8x4_512x512_40k_ade20k.py'
+checkpoint_file = 'https://download.openmmlab.com/mmclassification/v0/poolformer/poolformer-s36_3rdparty_32xb128_in1k_20220414-d78ff3e8.pth' # noqa
+
+# model settings
+model = dict(
+    backbone=dict(
+        arch='s36',
+        init_cfg=dict(
+            type='Pretrained', checkpoint=checkpoint_file,
+            prefix='backbone.')))
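Across the variant configs, only three things change:

- `arch`;
- the checkpoint;
- for the M variants, the FPN `in_channels`.

The sketch below builds the corresponding `model` override for any variant. It assumes the stage widths used in these configs: [64, 128, 320, 512] for S12/S24/S36 and [96, 192, 384, 768] for M36/M48. `variant_override` and `STAGE_CHANNELS` are hypothetical names. The checkpoint path is left to the caller.

```python
# Stage widths per PoolFormer arch, as passed to the FPN neck in these configs.
STAGE_CHANNELS = {
    's12': [64, 128, 320, 512],
    's24': [64, 128, 320, 512],
    's36': [64, 128, 320, 512],
    'm36': [96, 192, 384, 768],
    'm48': [96, 192, 384, 768],
}


def variant_override(arch: str, checkpoint: str) -> dict:
    """Build the `model` override a variant config would contain."""
    if arch not in STAGE_CHANNELS:
        raise ValueError(f'unknown PoolFormer arch: {arch!r}')
    return dict(
        backbone=dict(
            arch=arch,
            init_cfg=dict(type='Pretrained', checkpoint=checkpoint,
                          prefix='backbone.')),
        neck=dict(in_channels=list(STAGE_CHANNELS[arch])))


m48 = variant_override('m48', 'path/to/poolformer-m48.pth')
print(m48['neck'])  # {'in_channels': [96, 192, 384, 768]}
```

The S24 and S36 configs above omit `neck` because they inherit the S12 widths. Setting it explicitly, as this helper always does, is redundant but harmless.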

configs/poolformer/poolformer.yml

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
+Models:
+- Name: fpn_poolformer_s12_8x4_512x512_40k_ade20k
+  In Collection: FPN
+  Metadata:
+    backbone: PoolFormer-S12
+    crop size: (512,512)
+    lr schd: 40000
+    inference time (ms/im):
+    - value: 42.59
+      hardware: V100
+      backend: PyTorch
+      batch size: 1
+      mode: FP32
+      resolution: (512,512)
+    Training Memory (GB): 4.17
+  Results:
+  - Task: Semantic Segmentation
+    Dataset: ADE20K
+    Metrics:
+      mIoU: 36.0
+      mIoU(ms+flip): 36.42
+  Config: configs/poolformer/fpn_poolformer_s12_8x4_512x512_40k_ade20k.py
+  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/poolformer/fpn_poolformer_s12_8x4_512x512_40k_ade20k/fpn_poolformer_s12_8x4_512x512_40k_ade20k_20220501_115154-b5aa2f49.pth
+- Name: fpn_poolformer_s24_8x4_512x512_40k_ade20k
+  In Collection: FPN
+  Metadata:
+    backbone: PoolFormer-S24
+    crop size: (512,512)
+    lr schd: 40000
+    inference time (ms/im):
+    - value: 63.53
+      hardware: V100
+      backend: PyTorch
+      batch size: 1
+      mode: FP32
+      resolution: (512,512)
+    Training Memory (GB): 5.47
+  Results:
+  - Task: Semantic Segmentation
+    Dataset: ADE20K
+    Metrics:
+      mIoU: 39.35
+      mIoU(ms+flip): 39.73
+  Config: configs/poolformer/fpn_poolformer_s24_8x4_512x512_40k_ade20k.py
+  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/poolformer/fpn_poolformer_s24_8x4_512x512_40k_ade20k/fpn_poolformer_s24_8x4_512x512_40k_ade20k_20220503_222049-394a7cf7.pth
+- Name: fpn_poolformer_s36_8x4_512x512_40k_ade20k
+  In Collection: FPN
+  Metadata:
+    backbone: PoolFormer-S36
+    crop size: (512,512)
+    lr schd: 40000
+    inference time (ms/im):
+    - value: 88.18
+      hardware: V100
+      backend: PyTorch
+      batch size: 1
+      mode: FP32
+      resolution: (512,512)
+    Training Memory (GB): 6.77
+  Results:
+  - Task: Semantic Segmentation
+    Dataset: ADE20K
+    Metrics:
+      mIoU: 40.64
+      mIoU(ms+flip): 40.99
+  Config: configs/poolformer/fpn_poolformer_s36_8x4_512x512_40k_ade20k.py
+  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/poolformer/fpn_poolformer_s36_8x4_512x512_40k_ade20k/fpn_poolformer_s36_8x4_512x512_40k_ade20k_20220501_151122-b47e607d.pth
+- Name: fpn_poolformer_m36_8x4_512x512_40k_ade20k
+  In Collection: FPN
+  Metadata:
+    backbone: PoolFormer-M36
+    crop size: (512,512)
+    lr schd: 40000
+    inference time (ms/im):
+    - value: 111.48
+      hardware: V100
+      backend: PyTorch
+      batch size: 1
+      mode: FP32
+      resolution: (512,512)
+    Training Memory (GB): 8.59
+  Results:
+  - Task: Semantic Segmentation
+    Dataset: ADE20K
+    Metrics:
+      mIoU: 40.91
+      mIoU(ms+flip): 41.28
+  Config: configs/poolformer/fpn_poolformer_m36_8x4_512x512_40k_ade20k.py
+  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/poolformer/fpn_poolformer_m36_8x4_512x512_40k_ade20k/fpn_poolformer_m36_8x4_512x512_40k_ade20k_20220501_164230-3dc83921.pth
+- Name: fpn_poolformer_m48_8x4_512x512_40k_ade20k
+  In Collection: FPN
+  Metadata:
+    backbone: PoolFormer-M48
+    crop size: (512,512)
+    lr schd: 40000
+    inference time (ms/im):
+    - value: 149.48
+      hardware: V100
+      backend: PyTorch
+      batch size: 1
+      mode: FP32
+      resolution: (512,512)
+    Training Memory (GB): 10.48
+  Results:
+  - Task: Semantic Segmentation
+    Dataset: ADE20K
+    Metrics:
+      mIoU: 41.82
+      mIoU(ms+flip): 42.2
+  Config: configs/poolformer/fpn_poolformer_m48_8x4_512x512_40k_ade20k.py
+  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/poolformer/fpn_poolformer_m48_8x4_512x512_40k_ade20k/fpn_poolformer_m48_8x4_512x512_40k_ade20k_20220504_003923-64168d3b.pth
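`poolformer.yml` records inference time in milliseconds per image, while the README table reports frames per second. The two are reciprocals (fps = 1000 / ms). The short check below uses the values from the metadata above and reproduces the README's "Inf time (fps)" column to two decimals. `ms_to_fps` is a hypothetical helper name.

```python
# Inference time per image (ms/im) from poolformer.yml.
MS_PER_IMAGE = {
    'PoolFormer-S12': 42.59,
    'PoolFormer-S24': 63.53,
    'PoolFormer-S36': 88.18,
    'PoolFormer-M36': 111.48,
    'PoolFormer-M48': 149.48,
}


def ms_to_fps(ms: float) -> float:
    """Convert milliseconds per image to images per second."""
    return 1000.0 / ms


fps = {name: round(ms_to_fps(ms), 2) for name, ms in MS_PER_IMAGE.items()}
for name, value in fps.items():
    print(f'{name}: {value:.2f} fps')
```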

model-index.yml

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ Import:
 - configs/nonlocal_net/nonlocal_net.yml
 - configs/ocrnet/ocrnet.yml
 - configs/point_rend/point_rend.yml
+- configs/poolformer/poolformer.yml
 - configs/psanet/psanet.yml
 - configs/pspnet/pspnet.yml
 - configs/resnest/resnest.yml
