diff --git a/‎README.md
Lines changed: 1 addition & 0 deletions b/‎README.md
diff --git a/‎README_zh-CN.md
Lines changed: 1 addition & 0 deletions b/‎README_zh-CN.md
diff --git a/‎configs/segnext/README.md
Lines changed: 63 additions & 0 deletions b/‎configs/segnext/README.md
diff --git a/‎configs/segnext/segnext.yml
Lines changed: 103 additions & 0 deletions b/‎configs/segnext/segnext.yml
diff --git a/‎configs/segnext/segnext_mscan-b_1x16_512x512_adamw_160k_ade20k.py
Lines changed: 26 additions & 0 deletions b/‎configs/segnext/segnext_mscan-b_1x16_512x512_adamw_160k_ade20k.py
diff --git a/‎configs/segnext/segnext_mscan-l_1x16_512x512_adamw_160k_ade20k.py
Lines changed: 26 additions & 0 deletions b/‎configs/segnext/segnext_mscan-l_1x16_512x512_adamw_160k_ade20k.py
diff --git a/‎configs/segnext/segnext_mscan-s_1x16_512x512_adamw_160k_ade20k.py
Lines changed: 26 additions & 0 deletions b/‎configs/segnext/segnext_mscan-s_1x16_512x512_adamw_160k_ade20k.py
diff --git a/‎configs/segnext/segnext_mscan-t_1x16_512x512_adamw_160k_ade20k.py
Lines changed: 125 additions & 0 deletions b/‎configs/segnext/segnext_mscan-t_1x16_512x512_adamw_160k_ade20k.py
@@ -145,6 +145,7 @@ Supported backbones:
 - [x] [ConvNeXt (CVPR'2022)](configs/convnext)
 - [x] [MAE (CVPR'2022)](configs/mae)
 - [x] [PoolFormer (CVPR'2022)](configs/poolformer)
+- [x] [SegNeXt (NeurIPS'2022)](configs/segnext)
 
 Supported methods:
 
 
@@ -128,6 +128,7 @@ MMSegmentation 是一个基于 PyTorch 的语义分割开源工具箱。它是 O
 - [x] [ConvNeXt (CVPR'2022)](configs/convnext)
 - [x] [MAE (CVPR'2022)](configs/mae)
 - [x] [PoolFormer (CVPR'2022)](configs/poolformer)
+- [x] [SegNeXt (NeurIPS'2022)](configs/segnext)
 
 已支持的算法：
 
 
@@ -0,0 +1,63 @@
+# SegNeXt
+
+[SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation](https://arxiv.org/abs/2209.08575)
+
+## Introduction
+
+<!-- [ALGORITHM] -->
+
+<a href="https://github.com/visual-attention-network/segnext">Official Repo</a>
+
+<a href="https://github.com/open-mmlab/mmsegmentation/blob/v0.31.0/mmseg/models/backbones/mscan.py#L328">Code Snippet</a>
+
+## Abstract
+
+<!-- [ABSTRACT] -->
+
+We present SegNeXt, a simple convolutional network architecture for semantic segmentation. Recent transformer-based models have dominated the field of semantic segmentation due to the efficiency of self-attention in encoding spatial information. In this paper, we show that convolutional attention is a more efficient and effective way to encode contextual information than the self-attention mechanism in transformers. By re-examining the characteristics owned by successful segmentation models, we discover several key components leading to the performance improvement of segmentation models. This motivates us to design a novel convolutional attention network that uses cheap convolutional operations. Without bells and whistles, our SegNeXt significantly improves the performance of previous state-of-the-art methods on popular benchmarks, including ADE20K, Cityscapes, COCO-Stuff, Pascal VOC, Pascal Context, and iSAID. Notably, SegNeXt outperforms EfficientNet-L2 w/ NAS-FPN and achieves 90.6% mIoU on the Pascal VOC 2012 test leaderboard using only 1/10 parameters of it. On average, SegNeXt achieves about 2.0% mIoU improvements compared to the state-of-the-art methods on the ADE20K datasets with the same or fewer computations. Code is available at [this https URL](https://github.com/uyzhang/JSeg) (Jittor) and [this https URL](https://github.com/Visual-Attention-Network/SegNeXt) (Pytorch).
+
+<!-- [IMAGE] -->
+
+<div align=center>
+<img src="https://user-images.githubusercontent.com/24582831/215688018-5d4c8366-7793-4fdf-9397-960a09fac951.png" width="70%"/>
+</div>
+
+```bibtex
+@article{guo2022segnext,
+  title={SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation},
+  author={Guo, Meng-Hao and Lu, Cheng-Ze and Hou, Qibin and Liu, Zhengning and Cheng, Ming-Ming and Hu, Shi-Min},
+  journal={arXiv preprint arXiv:2209.08575},
+  year={2022}
+}
+```
+
+## Pretrained model
+
+The pretrained models are provided [here](https://cloud.tsinghua.edu.cn/d/c15b25a6745946618462/) by the [original repo](https://github.com/Visual-Attention-Network/SegNeXt). Download them and place them in the `./pretrain` folder.
+
+## Results and models
+
+### ADE20K
+
+| Method  | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU  | mIoU(ms+flip) | config                                                                                                                               | download                                                                                                                                                                                                                                                                                                                                                                                   |
+| ------- | -------- | --------- | ------- | -------- | -------------- | ----- | ------------- | ------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| SegNeXt | MSCAN-T  | 512x512   | 160000  | 17.88    | 52.38          | 41.50 | 42.59         | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segnext/segnext_mscan-t_1x16_512x512_adamw_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segnext/segnext_mscan-t_1x16_512x512_adamw_160k_ade20k/segnext_mscan-t_1x16_512x512_adamw_160k_ade20k_20230210_140244-05bd8466.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/segnext/segnext_mscan-t_1x16_512x512_adamw_160k_ade20k/segnext_mscan-t_1x16_512x512_adamw_160k_ade20k_20230210_140244.log.json) |
+| SegNeXt | MSCAN-S  | 512x512   | 160000  | 21.47    | 42.27          | 44.16 | 45.81         | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segnext/segnext_mscan-s_1x16_512x512_adamw_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segnext/segnext_mscan-s_1x16_512x512_adamw_160k_ade20k/segnext_mscan-s_1x16_512x512_adamw_160k_ade20k_20230214_113014-43013668.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/segnext/segnext_mscan-s_1x16_512x512_adamw_160k_ade20k/segnext_mscan-s_1x16_512x512_adamw_160k_ade20k_20230214_113014.log.json) |
+| SegNeXt | MSCAN-B  | 512x512   | 160000  | 31.03    | 35.15          | 48.03 | 49.68         | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segnext/segnext_mscan-b_1x16_512x512_adamw_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segnext/segnext_mscan-b_1x16_512x512_adamw_160k_ade20k/segnext_mscan-b_1x16_512x512_adamw_160k_ade20k_20230209_172053-b6f6c70c.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/segnext/segnext_mscan-b_1x16_512x512_adamw_160k_ade20k/segnext_mscan-b_1x16_512x512_adamw_160k_ade20k_20230209_172053.log.json) |
+| SegNeXt | MSCAN-L  | 512x512   | 160000  | 43.32    | 22.91          | 50.99 | 52.10         | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segnext/segnext_mscan-l_1x16_512x512_adamw_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segnext/segnext_mscan-l_1x16_512x512_adamw_160k_ade20k/segnext_mscan-l_1x16_512x512_adamw_160k_ade20k_20230209_172055-19b14b63.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/segnext/segnext_mscan-l_1x16_512x512_adamw_160k_ade20k/segnext_mscan-l_1x16_512x512_adamw_160k_ade20k_20230209_172055.log.json) |
+
+Note:
+
+- The total batch size is 16. We trained SegNeXt on a single GPU because performance degrades significantly when using `SyncBN` (mainly in the `OverlapPatchEmbed` modules of `MSCAN`) with PyTorch 1.9.
+
+- Test results will vary slightly because the Non-negative Matrix Factorization (NMF) in `LightHamHead` is initialized randomly. To control this randomness, set the random seed when testing. You can modify [`./tools/test.py`](https://github.com/open-mmlab/mmsegmentation/blob/master/tools/test.py) like this:
+
+```python
+def main():
+    from mmseg.apis import set_random_seed
+    random_seed = xxx # set random seed recorded in training log
+    set_random_seed(random_seed, deterministic=False)
+    ...
+```
+
+- Model performance is sensitive to the random seed; refer to the log file for the specific seed used. With a different seed, results may differ from those in the table. For example, SegNeXt Large results range from 49.60 to 51.0 mIoU.
@@ -0,0 +1,103 @@
+Collections:
+- Name: SegNeXt
+  Metadata:
+    Training Data:
+    - ADE20K
+  Paper:
+    URL: https://arxiv.org/abs/2209.08575
+    Title: 'SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation'
+  README: configs/segnext/README.md
+  Code:
+    URL: https://github.com/open-mmlab/mmsegmentation/blob/v0.31.0/mmseg/models/backbones/mscan.py#L328
+    Version: v0.31.0
+  Converted From:
+    Code: https://github.com/visual-attention-network/segnext
+Models:
+- Name: segnext_mscan-t_1x16_512x512_adamw_160k_ade20k
+  In Collection: SegNeXt
+  Metadata:
+    backbone: MSCAN-T
+    crop size: (512,512)
+    lr schd: 160000
+    inference time (ms/im):
+    - value: 19.09
+      hardware: A100
+      backend: PyTorch
+      batch size: 1
+      mode: FP32
+      resolution: (512,512)
+    Training Memory (GB): 17.88
+  Results:
+  - Task: Semantic Segmentation
+    Dataset: ADE20K
+    Metrics:
+      mIoU: 41.5
+      mIoU(ms+flip): 42.59
+  Config: configs/segnext/segnext_mscan-t_1x16_512x512_adamw_160k_ade20k.py
+  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/segnext/segnext_mscan-t_1x16_512x512_adamw_160k_ade20k/segnext_mscan-t_1x16_512x512_adamw_160k_ade20k_20230210_140244-05bd8466.pth
+- Name: segnext_mscan-s_1x16_512x512_adamw_160k_ade20k
+  In Collection: SegNeXt
+  Metadata:
+    backbone: MSCAN-S
+    crop size: (512,512)
+    lr schd: 160000
+    inference time (ms/im):
+    - value: 23.66
+      hardware: A100
+      backend: PyTorch
+      batch size: 1
+      mode: FP32
+      resolution: (512,512)
+    Training Memory (GB): 21.47
+  Results:
+  - Task: Semantic Segmentation
+    Dataset: ADE20K
+    Metrics:
+      mIoU: 44.16
+      mIoU(ms+flip): 45.81
+  Config: configs/segnext/segnext_mscan-s_1x16_512x512_adamw_160k_ade20k.py
+  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/segnext/segnext_mscan-s_1x16_512x512_adamw_160k_ade20k/segnext_mscan-s_1x16_512x512_adamw_160k_ade20k_20230214_113014-43013668.pth
+- Name: segnext_mscan-b_1x16_512x512_adamw_160k_ade20k
+  In Collection: SegNeXt
+  Metadata:
+    backbone: MSCAN-B
+    crop size: (512,512)
+    lr schd: 160000
+    inference time (ms/im):
+    - value: 28.45
+      hardware: A100
+      backend: PyTorch
+      batch size: 1
+      mode: FP32
+      resolution: (512,512)
+    Training Memory (GB): 31.03
+  Results:
+  - Task: Semantic Segmentation
+    Dataset: ADE20K
+    Metrics:
+      mIoU: 48.03
+      mIoU(ms+flip): 49.68
+  Config: configs/segnext/segnext_mscan-b_1x16_512x512_adamw_160k_ade20k.py
+  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/segnext/segnext_mscan-b_1x16_512x512_adamw_160k_ade20k/segnext_mscan-b_1x16_512x512_adamw_160k_ade20k_20230209_172053-b6f6c70c.pth
+- Name: segnext_mscan-l_1x16_512x512_adamw_160k_ade20k
+  In Collection: SegNeXt
+  Metadata:
+    backbone: MSCAN-L
+    crop size: (512,512)
+    lr schd: 160000
+    inference time (ms/im):
+    - value: 43.65
+      hardware: A100
+      backend: PyTorch
+      batch size: 1
+      mode: FP32
+      resolution: (512,512)
+    Training Memory (GB): 43.32
+  Results:
+  - Task: Semantic Segmentation
+    Dataset: ADE20K
+    Metrics:
+      mIoU: 50.99
+      mIoU(ms+flip): 52.1
+  Config: configs/segnext/segnext_mscan-l_1x16_512x512_adamw_160k_ade20k.py
+  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/segnext/segnext_mscan-l_1x16_512x512_adamw_160k_ade20k/segnext_mscan-l_1x16_512x512_adamw_160k_ade20k_20230209_172055-19b14b63.pth
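Review note: the `inference time (ms/im)` values in `segnext.yml` and the `Inf time (fps)` column in the README table appear to describe the same measurement in different units. The short check below converts one to the other. It is only a consistency sketch; the dictionary and function names are made up for illustration, and the numbers are copied from the two files in this diff.

```python
# Latencies in milliseconds per image, copied from segnext.yml in this diff.
latency_ms = {"MSCAN-T": 19.09, "MSCAN-S": 23.66, "MSCAN-B": 28.45, "MSCAN-L": 43.65}

# Frames per second, copied from the README results table in this diff.
readme_fps = {"MSCAN-T": 52.38, "MSCAN-S": 42.27, "MSCAN-B": 35.15, "MSCAN-L": 22.91}


def ms_to_fps(ms_per_image: float) -> float:
    """Convert a per-image latency in milliseconds to frames per second."""
    return 1000.0 / ms_per_image


# Hypothetical helper output: FPS derived from the metafile latencies.
derived_fps = {name: round(ms_to_fps(ms), 2) for name, ms in latency_ms.items()}

for name, fps in derived_fps.items():
    print(f"{name}: {fps:.2f} fps (README lists {readme_fps[name]:.2f})")
```

If the two files are ever updated independently, a check like this would flag a mismatch between them.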
@@ -0,0 +1,26 @@
+_base_ = './segnext_mscan-t_1x16_512x512_adamw_160k_ade20k.py'
+# model settings
+ham_norm_cfg = dict(type='GN', num_groups=32, requires_grad=True)
+model = dict(
+    type='EncoderDecoder',
+    backbone=dict(
+        embed_dims=[64, 128, 320, 512],
+        depths=[3, 3, 12, 3],
+        init_cfg=dict(type='Pretrained', checkpoint='pretrain/mscan_b.pth'),
+        drop_path_rate=0.1,
+        norm_cfg=dict(type='BN', requires_grad=True)),
+    decode_head=dict(
+        type='LightHamHead',
+        in_channels=[128, 320, 512],
+        in_index=[1, 2, 3],
+        channels=512,
+        ham_channels=512,
+        dropout_ratio=0.1,
+        num_classes=150,
+        norm_cfg=ham_norm_cfg,
+        align_corners=False,
+        loss_decode=dict(
+            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
+    # model training and testing settings
+    train_cfg=dict(),
+    test_cfg=dict(mode='whole'))
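Review note: this file sets `_base_` to the tiny config and only overrides some backbone and head keys, so keys such as `type='MSCAN'` and `mlp_ratios` come from the base file. The sketch below illustrates that behaviour with a plain recursive dictionary merge. It is a simplified stand-in written for this note, not MMCV's actual `Config` code: `merge_config` is a hypothetical helper, and special keys such as `_delete_` are not handled.

```python
import copy


def merge_config(base: dict, override: dict) -> dict:
    """Recursively merge `override` into a copy of `base` (simplified illustration)."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested dicts are merged key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars and lists replace the base value wholesale.
            merged[key] = copy.deepcopy(value)
    return merged


# Subset of the tiny (base) backbone settings from this diff.
tiny_cfg = dict(backbone=dict(
    type='MSCAN',
    embed_dims=[32, 64, 160, 256],
    mlp_ratios=[8, 8, 4, 4],
    drop_path_rate=0.1,
    depths=[3, 3, 5, 2]))

# Subset of the MSCAN-B overrides from this diff.
mscan_b_override = dict(backbone=dict(
    embed_dims=[64, 128, 320, 512],
    depths=[3, 3, 12, 3],
    drop_path_rate=0.1))

merged_cfg = merge_config(tiny_cfg, mscan_b_override)
print(merged_cfg['backbone'])
```

Under this simplified model the merged backbone keeps `type` and `mlp_ratios` from the tiny config while taking `embed_dims` and `depths` from the override.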
@@ -0,0 +1,26 @@
+_base_ = './segnext_mscan-t_1x16_512x512_adamw_160k_ade20k.py'
+# model settings
+ham_norm_cfg = dict(type='GN', num_groups=32, requires_grad=True)
+model = dict(
+    type='EncoderDecoder',
+    backbone=dict(
+        embed_dims=[64, 128, 320, 512],
+        depths=[3, 5, 27, 3],
+        init_cfg=dict(type='Pretrained', checkpoint='pretrain/mscan_l.pth'),
+        drop_path_rate=0.3,
+        norm_cfg=dict(type='BN', requires_grad=True)),
+    decode_head=dict(
+        type='LightHamHead',
+        in_channels=[128, 320, 512],
+        in_index=[1, 2, 3],
+        channels=1024,
+        ham_channels=1024,
+        dropout_ratio=0.1,
+        num_classes=150,
+        norm_cfg=ham_norm_cfg,
+        align_corners=False,
+        loss_decode=dict(
+            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
+    # model training and testing settings
+    train_cfg=dict(),
+    test_cfg=dict(mode='whole'))
@@ -0,0 +1,26 @@
+_base_ = './segnext_mscan-t_1x16_512x512_adamw_160k_ade20k.py'
+# model settings
+ham_norm_cfg = dict(type='GN', num_groups=32, requires_grad=True)
+model = dict(
+    type='EncoderDecoder',
+    backbone=dict(
+        embed_dims=[64, 128, 320, 512],
+        depths=[2, 2, 4, 2],
+        init_cfg=dict(type='Pretrained', checkpoint='./pretrain/mscan_s.pth'),
+        norm_cfg=dict(type='BN', requires_grad=True)),
+    decode_head=dict(
+        type='LightHamHead',
+        in_channels=[128, 320, 512],
+        in_index=[1, 2, 3],
+        channels=256,
+        ham_channels=256,
+        ham_kwargs=dict(MD_R=16),
+        dropout_ratio=0.1,
+        num_classes=150,
+        norm_cfg=ham_norm_cfg,
+        align_corners=False,
+        loss_decode=dict(
+            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
+    # model training and testing settings
+    train_cfg=dict(),
+    test_cfg=dict(mode='whole'))
@@ -0,0 +1,125 @@
+_base_ = [
+    '../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py'
+]
+# model settings
+ham_norm_cfg = dict(type='GN', num_groups=32, requires_grad=True)
+model = dict(
+    type='EncoderDecoder',
+    pretrained=None,
+    backbone=dict(
+        type='MSCAN',
+        init_cfg=dict(type='Pretrained', checkpoint='./pretrain/mscan_t.pth'),
+        embed_dims=[32, 64, 160, 256],
+        mlp_ratios=[8, 8, 4, 4],
+        drop_rate=0.0,
+        drop_path_rate=0.1,
+        depths=[3, 3, 5, 2],
+        attention_kernel_sizes=[5, [1, 7], [1, 11], [1, 21]],
+        attention_kernel_paddings=[2, [0, 3], [0, 5], [0, 10]],
+        act_cfg=dict(type='GELU'),
+        norm_cfg=dict(type='BN', requires_grad=True)),
+    decode_head=dict(
+        type='LightHamHead',
+        in_channels=[64, 160, 256],
+        in_index=[1, 2, 3],
+        channels=256,
+        ham_channels=256,
+        dropout_ratio=0.1,
+        num_classes=150,
+        norm_cfg=ham_norm_cfg,
+        align_corners=False,
+        loss_decode=dict(
+            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
+        ham_kwargs=dict(
+            MD_S=1,
+            MD_R=16,
+            train_steps=6,
+            eval_steps=7,
+            inv_t=100,
+            rand_init=True)),
+    # model training and testing settings
+    train_cfg=dict(),
+    test_cfg=dict(mode='whole'))
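Review note: each entry in `attention_kernel_paddings` looks like the usual "same" padding for the matching entry in `attention_kernel_sizes`. The check below verifies that relationship, assuming stride 1 and dilation 1, which this config does not state explicitly. `same_padding` and `flatten` are hypothetical helpers written for this note.

```python
# Values copied from the MSCAN backbone settings in this config.
attention_kernel_sizes = [5, [1, 7], [1, 11], [1, 21]]
attention_kernel_paddings = [2, [0, 3], [0, 5], [0, 10]]


def same_padding(kernel_size: int) -> int:
    """Padding that keeps the spatial size for an odd kernel at stride 1, dilation 1."""
    return (kernel_size - 1) // 2


def flatten(values):
    """Flatten a list that mixes ints and lists of ints into a flat list of ints."""
    flat = []
    for value in values:
        if isinstance(value, (list, tuple)):
            flat.extend(value)
        else:
            flat.append(value)
    return flat


kernels = flatten(attention_kernel_sizes)
paddings = flatten(attention_kernel_paddings)
expected_paddings = [same_padding(k) for k in kernels]

for k, p, e in zip(kernels, paddings, expected_paddings):
    print(f"kernel {k:>2}: configured padding {p:>2}, same-size padding {e:>2}")
```

With these values, output size `in + 2 * padding - kernel + 1` equals the input size for every kernel, under the stated stride and dilation assumption.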
+
+# dataset settings
+dataset_type = 'ADE20KDataset'
+data_root = 'data/ade/ADEChallengeData2016'
+img_norm_cfg = dict(
+    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
+crop_size = (512, 512)
+train_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(type='LoadAnnotations', reduce_zero_label=True),
+    dict(type='Resize', img_scale=(2048, 512), ratio_range=(0.5, 2.0)),
+    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
+    dict(type='RandomFlip', prob=0.5),
+    dict(type='PhotoMetricDistortion'),
+    dict(type='Normalize', **img_norm_cfg),
+    dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
+    dict(type='DefaultFormatBundle'),
+    dict(type='Collect', keys=['img', 'gt_semantic_seg']),
+]
+test_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(
+        type='MultiScaleFlipAug',
+        img_scale=(2048, 512),
+        # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
+        flip=False,
+        transforms=[
+            dict(type='Resize', keep_ratio=True),
+            dict(type='ResizeToMultiple', size_divisor=32),
+            dict(type='RandomFlip'),
+            dict(type='Normalize', **img_norm_cfg),
+            dict(type='ImageToTensor', keys=['img']),
+            dict(type='Collect', keys=['img']),
+        ])
+]
+data = dict(
+    samples_per_gpu=16,
+    workers_per_gpu=4,
+    train=dict(
+        type='RepeatDataset',
+        times=50,
+        dataset=dict(
+            type=dataset_type,
+            data_root=data_root,
+            img_dir='images/training',
+            ann_dir='annotations/training',
+            pipeline=train_pipeline)),
+    val=dict(
+        type=dataset_type,
+        data_root=data_root,
+        img_dir='images/validation',
+        ann_dir='annotations/validation',
+        pipeline=test_pipeline),
+    test=dict(
+        type=dataset_type,
+        data_root=data_root,
+        img_dir='images/validation',
+        ann_dir='annotations/validation',
+        pipeline=test_pipeline))
+
+# optimizer
+optimizer = dict(
+    _delete_=True,
+    type='AdamW',
+    lr=0.00006,
+    betas=(0.9, 0.999),
+    weight_decay=0.01,
+    paramwise_cfg=dict(
+        custom_keys={
+            'pos_block': dict(decay_mult=0.),
+            'norm': dict(decay_mult=0.),
+            'head': dict(lr_mult=10.)
+        }))
+
+lr_config = dict(
+    _delete_=True,
+    policy='poly',
+    warmup='linear',
+    warmup_iters=1500,
+    warmup_ratio=1e-6,
+    power=1.0,
+    min_lr=0.0,
+    by_epoch=False)
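Review note: the `lr_config` above combines a linear warmup with a polynomial decay. The sketch below shows roughly how the base learning rate would evolve, assuming 160000 total iterations (taken from the `160k` schedule name) and the commonly used formulas for poly decay and linear warmup. It is an illustration, not the scheduler code itself; `poly_lr` and `lr_at` are hypothetical names.

```python
def poly_lr(iteration: int, base_lr: float = 6e-5, max_iters: int = 160_000,
            power: float = 1.0, min_lr: float = 0.0) -> float:
    """Polynomial decay from base_lr to min_lr over max_iters iterations."""
    coeff = (1 - iteration / max_iters) ** power
    return (base_lr - min_lr) * coeff + min_lr


def lr_at(iteration: int, warmup_iters: int = 1500,
          warmup_ratio: float = 1e-6) -> float:
    """Learning rate at `iteration`, with linear warmup applied on top of poly decay."""
    regular = poly_lr(iteration)
    if iteration < warmup_iters:
        # Assumed linear warmup: scale grows from warmup_ratio to 1 over warmup_iters.
        k = (1 - iteration / warmup_iters) * (1 - warmup_ratio)
        return regular * (1 - k)
    return regular


for it in (0, 750, 1500, 80_000, 160_000):
    print(f"iter {it:>6}: lr = {lr_at(it):.3e}")
```

Parameters matched by the `head` key in `custom_keys` use `lr_mult=10.`, so under the same assumptions their learning rate would be ten times the values printed here.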