# Extract Free Dense Labels from CLIP
```
 ███╗   ███╗ █████╗ ███████╗██╗  ██╗ ██████╗██╗     ██╗██████╗
 ████╗ ████║██╔══██╗██╔════╝██║ ██╔╝██╔════╝██║     ██║██╔══██╗
 ██╔████╔██║███████║███████╗█████╔╝ ██║     ██║     ██║██████╔╝
 ██║╚██╔╝██║██╔══██║╚════██║██╔═██╗ ██║     ██║     ██║██╔═══╝
 ██║ ╚═╝ ██║██║  ██║███████║██║  ██╗╚██████╗███████╗██║██║
 ╚═╝     ╚═╝╚═╝  ╚═╝╚══════╝╚═╝  ╚═╝ ╚═════╝╚══════╝╚═╝╚═╝
```
This is the code for our paper: [Extract Free Dense Labels from CLIP](https://arxiv.org/abs/2112.01071).

This repo is a fork of [mmsegmentation](https://github.com/open-mmlab/mmsegmentation), so installation and data preparation are largely the same.

# Installation
**Step 0.** Install PyTorch and Torchvision following [official instructions](https://pytorch.org/get-started/locally/), e.g.,

```shell
pip install torch torchvision
# FYI, we're using torch==1.9.1 and torchvision==0.10.1
```
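
If you need a CUDA-specific build, pinning the exact wheels also works. This is only a sketch under the assumption of CUDA 11.1; pick the tag that matches your driver from the PyTorch instructions above.
```shell
# Assumed example: CUDA 11.1 wheels of the versions we used.
pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 \
    -f https://download.pytorch.org/whl/torch_stable.html
```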

**Step 1.** Install [MMCV](https://github.com/open-mmlab/mmcv) using [MIM](https://github.com/open-mmlab/mim).
```shell
pip install -U openmim
mim install mmcv-full
```
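
Since this repo tracks mmsegmentation v0.x, the latest mmcv-full release may be newer than what the codebase expects. If you hit a compatibility error, pinning an older 1.x release is a reasonable fallback (the version below is an assumption, not a requirement stated by the authors):
```shell
# Assumed example pin; choose a mmcv-full 1.x build that matches your torch/CUDA versions.
mim install mmcv-full==1.4.0
```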

**Step 2.** Install [CLIP](https://github.com/openai/CLIP).
```shell
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
```

**Step 3.** Install MaskCLIP.
```shell
git clone https://github.com/chongzhou96/MaskCLIP.git
cd MaskCLIP
pip install -v -e .
# "-v" means verbose, i.e., more output
# "-e" means installing the project in editable mode,
# so any local modifications to the code take effect without reinstalling.
```
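
As a quick sanity check of the environment (not part of the official steps), you can verify that the key packages import and report their versions; `clip.available_models()` and the `__version__` attributes are standard APIs of these packages.
```shell
# Prints the installed torch/mmcv/mmseg versions and the CLIP models available for download.
python -c "import torch, mmcv, mmseg, clip; print(torch.__version__, mmcv.__version__, mmseg.__version__); print(clip.available_models())"
```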

# Dataset Preparation
Please refer to [dataset_prepare.md](docs/en/dataset_prepare.md#prepare-datasets). In our paper, we experiment with [Pascal VOC](docs/en/dataset_prepare.md#pascal-voc), [Pascal Context](docs/en/dataset_prepare.md#pascal-context), and [COCO Stuff 164k](docs/en/dataset_prepare.md#coco-stuff-164k).
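
If the datasets already live elsewhere on disk, symlinking them into `data/` is usually the easiest route. The sketch below assumes the standard mmsegmentation layout (`data/VOCdevkit`, `data/coco_stuff164k`); double-check the exact directory names against dataset_prepare.md.
```shell
# Assumed layout, following the mmsegmentation convention used by the configs.
mkdir -p data
ln -s /path/to/VOCdevkit data/VOCdevkit            # Pascal VOC 2012 / Pascal Context (VOC2010)
ln -s /path/to/coco_stuff164k data/coco_stuff164k  # COCO Stuff 164k
```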

# MaskCLIP
MaskCLIP doesn't require any training. We only need to (1) download and convert the CLIP model and (2) prepare the text embeddings of the objects of interest.

**Step 0.** Download and convert the CLIP models, e.g.,
```shell
mkdir -p pretrain
python tools/maskclip_utils/convert_clip_weights.py --model ViT16 --backbone
# Other options for model: RN50, RN101, RN50x4, RN50x16, RN50x64, ViT32, ViT16, ViT14
```
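
The converted backbone weights should end up under `pretrain/`; the evaluation example in Step 2 below loads `pretrain/ViT16_clip_backbone.pth`.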

**Step 1.** Prepare the text embeddings of the objects of interest, e.g.,
```shell
python tools/maskclip_utils/prompt_engineering.py --model ViT16 --class-set context
# Other options for model: RN50, RN101, RN50x4, RN50x16, ViT32, ViT16
# Other options for class-set: voc, context, stuff
# We've also experimented with many more interesting target classes (see prompt_engineering.py).
```

**Step 2.** Get quantitative results (mIoU):
```shell
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} --eval mIoU
# e.g., python tools/test.py configs/maskclip/maskclip_vit16_520x520_pascal_context_59.py pretrain/ViT16_clip_backbone.pth --eval mIoU
```
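
With multiple GPUs, the distributed test launcher inherited from mmsegmentation should also work (a sketch using the standard `tools/dist_test.sh` script):
```shell
sh tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} --eval mIoU
# e.g., sh tools/dist_test.sh configs/maskclip/maskclip_vit16_520x520_pascal_context_59.py pretrain/ViT16_clip_backbone.pth 4 --eval mIoU
```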

**Step 3 (optional).** Get qualitative results:
```shell
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} --show-dir ${OUTPUT_DIR}
# e.g., python tools/test.py configs/maskclip/maskclip_vit16_520x520_pascal_context_59.py pretrain/ViT16_clip_backbone.pth --show-dir output/
```
# MaskCLIP+
MaskCLIP+ trains another segmentation model with pseudo labels extracted from MaskCLIP.

**Step 0.** Download and convert the CLIP models, e.g.,
```shell
mkdir -p pretrain
python tools/maskclip_utils/convert_clip_weights.py --model ViT16
# Other options for model: RN50, RN101, RN50x4, RN50x16, RN50x64, ViT32, ViT16, ViT14
```

**Step 1.** Prepare the text embeddings of the target dataset, e.g.,
```shell
python tools/maskclip_utils/prompt_engineering.py --model ViT16 --class-set context
# Other options for model: RN50, RN101, RN50x4, RN50x16, ViT32, ViT16
# Other options for class-set: voc, context, stuff
```

**Train.** Depending on your setup (single or multiple GPUs, one or more machines), the training command differs. Here we give an example for multiple GPUs on a single machine; for more information, please refer to [train.md](docs/en/train.md).
```shell
sh tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM}
# e.g., sh tools/dist_train.sh configs/maskclip_plus/zero_shot/maskclip_plus_r50_deeplabv3plus_r101-d8_480x480_40k_pascal_context.py 4
```
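
For a quick run on a single GPU, the non-distributed entry point from mmsegmentation should work as well; `--work-dir` is the standard mmseg option, and the output directory name below is just an example.
```shell
# Single-GPU training sketch; the work directory is a hypothetical path.
python tools/train.py configs/maskclip_plus/zero_shot/maskclip_plus_r50_deeplabv3plus_r101-d8_480x480_40k_pascal_context.py \
    --work-dir work_dirs/maskclip_plus_context
```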

**Inference.** See Steps 2 and 3 under the MaskCLIP section. (We will release the trained models soon.)

# Citation
If you use MaskCLIP or this codebase in your work, please cite:
```bibtex
@InProceedings{zhou2022maskclip,
  author = {Zhou, Chong and Loy, Chen Change and Dai, Bo},
  title = {Extract Free Dense Labels from CLIP},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = {2022}
}
```

# Contact
For questions about our paper or code, please contact [Chong Zhou](mailto:[email protected]).