# Extract Free Dense Labels from CLIP
```
 ███╗   ███╗ █████╗ ███████╗██╗  ██╗ ██████╗██╗     ██╗██████╗
 ████╗ ████║██╔══██╗██╔════╝██║ ██╔╝██╔════╝██║     ██║██╔══██╗
 ██╔████╔██║███████║███████╗█████╔╝ ██║     ██║     ██║██████╔╝
 ██║╚██╔╝██║██╔══██║╚════██║██╔═██╗ ██║     ██║     ██║██╔═══╝
 ██║ ╚═╝ ██║██║  ██║███████║██║  ██╗╚██████╗███████╗██║██║
 ╚═╝     ╚═╝╚═╝  ╚═╝╚══════╝╚═╝  ╚═╝ ╚═════╝╚══════╝╚═╝╚═╝
```
This is the code for our paper: [Extract Free Dense Labels from CLIP](https://arxiv.org/abs/2112.01071).

This repo is a fork of [mmsegmentation](https://github.com/open-mmlab/mmsegmentation), so installation and data preparation are largely the same.

# Installation
**Step 0.** Install PyTorch and Torchvision following [official instructions](https://pytorch.org/get-started/locally/), e.g.,

```shell
pip install torch torchvision
# FYI, we're using torch==1.9.1 and torchvision==0.10.1
```
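
If you need a CUDA-specific build, pinning the exact wheels also works. This is only a sketch under the assumption of CUDA 11.1; pick the tag that matches your driver from the PyTorch instructions above.
```shell
# Assumed example: CUDA 11.1 wheels of the versions we used.
pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 \
    -f https://download.pytorch.org/whl/torch_stable.html
```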

**Step 1.** Install [MMCV](https://github.com/open-mmlab/mmcv) using [MIM](https://github.com/open-mmlab/mim).
```shell
pip install -U openmim
mim install mmcv-full
```
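
Since this repo tracks mmsegmentation v0.x, the latest mmcv-full release may be newer than what the codebase expects. If you hit a compatibility error, pinning an older 1.x release is a reasonable fallback (the version below is an assumption, not a requirement stated by the authors):
```shell
# Assumed example pin; choose a mmcv-full 1.x build that matches your torch/CUDA versions.
mim install mmcv-full==1.4.0
```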

**Step 2.** Install [CLIP](https://github.com/openai/CLIP).
```shell
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
```

**Step 3.** Install MaskCLIP.
```shell
git clone https://github.com/chongzhou96/MaskCLIP.git
cd MaskCLIP
pip install -v -e .
# "-v" means verbose, i.e., more output
# "-e" means installing the project in editable mode,
# so any local modifications to the code take effect without reinstalling.
```
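
As a quick sanity check of the environment (not part of the official steps), you can verify that the key packages import and report their versions; `clip.available_models()` and the `__version__` attributes are standard APIs of these packages.
```shell
# Prints the installed torch/mmcv/mmseg versions and the CLIP models available for download.
python -c "import torch, mmcv, mmseg, clip; print(torch.__version__, mmcv.__version__, mmseg.__version__); print(clip.available_models())"
```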

# Dataset Preparation
Please refer to [dataset_prepare.md](docs/en/dataset_prepare.md#prepare-datasets). In our paper, we experiment with [Pascal VOC](docs/en/dataset_prepare.md#pascal-voc), [Pascal Context](docs/en/dataset_prepare.md#pascal-context), and [COCO Stuff 164k](docs/en/dataset_prepare.md#coco-stuff-164k).
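
If the datasets already live elsewhere on disk, symlinking them into `data/` is usually the easiest route. The sketch below assumes the standard mmsegmentation layout (`data/VOCdevkit`, `data/coco_stuff164k`); double-check the exact directory names against dataset_prepare.md.
```shell
# Assumed layout, following the mmsegmentation convention used by the configs.
mkdir -p data
ln -s /path/to/VOCdevkit data/VOCdevkit            # Pascal VOC 2012 / Pascal Context (VOC2010)
ln -s /path/to/coco_stuff164k data/coco_stuff164k  # COCO Stuff 164k
```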

# MaskCLIP
MaskCLIP doesn't require any training. We only need to (1) download and convert the CLIP model and (2) prepare the text embeddings of the objects of interest.

**Step 0.** Download and convert the CLIP models, e.g.,
```shell
mkdir -p pretrain
python tools/maskclip_utils/convert_clip_weights.py --model ViT16 --backbone
# Other options for model: RN50, RN101, RN50x4, RN50x16, RN50x64, ViT32, ViT16, ViT14
```
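
The converted backbone weights should end up under `pretrain/`; the evaluation example in Step 2 below loads `pretrain/ViT16_clip_backbone.pth`.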

**Step 1.** Prepare the text embeddings of the objects of interest, e.g.,
```shell
python tools/maskclip_utils/prompt_engineering.py --model ViT16 --class-set context
# Other options for model: RN50, RN101, RN50x4, RN50x16, ViT32, ViT16
# Other options for class-set: voc, context, stuff
# We've also experimented with many more interesting target classes (see prompt_engineering.py).
```

**Step 2.** Get quantitative results (mIoU):
```shell
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} --eval mIoU
# e.g., python tools/test.py configs/maskclip/maskclip_vit16_520x520_pascal_context_59.py pretrain/ViT16_clip_backbone.pth --eval mIoU
```
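
With multiple GPUs, the distributed test launcher inherited from mmsegmentation should also work (a sketch using the standard `tools/dist_test.sh` script):
```shell
sh tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} --eval mIoU
# e.g., sh tools/dist_test.sh configs/maskclip/maskclip_vit16_520x520_pascal_context_59.py pretrain/ViT16_clip_backbone.pth 4 --eval mIoU
```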

**Step 3 (optional).** Get qualitative results:
```shell
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} --show-dir ${OUTPUT_DIR}
# e.g., python tools/test.py configs/maskclip/maskclip_vit16_520x520_pascal_context_59.py pretrain/ViT16_clip_backbone.pth --show-dir output/
```
# MaskCLIP+
MaskCLIP+ trains another segmentation model with pseudo labels extracted from MaskCLIP.

**Step 0.** Download and convert the CLIP models, e.g.,
```shell
mkdir -p pretrain
python tools/maskclip_utils/convert_clip_weights.py --model ViT16
# Other options for model: RN50, RN101, RN50x4, RN50x16, RN50x64, ViT32, ViT16, ViT14
```

**Step 1.** Prepare the text embeddings of the target dataset, e.g.,
```shell
python tools/maskclip_utils/prompt_engineering.py --model ViT16 --class-set context
# Other options for model: RN50, RN101, RN50x4, RN50x16, ViT32, ViT16
# Other options for class-set: voc, context, stuff
```

**Train.** Depending on your setup (single or multiple GPUs, one or more machines), the training command differs. Here we give an example for multiple GPUs on a single machine; for more information, please refer to [train.md](docs/en/train.md).
```shell
sh tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM}
# e.g., sh tools/dist_train.sh configs/maskclip_plus/zero_shot/maskclip_plus_r50_deeplabv3plus_r101-d8_480x480_40k_pascal_context.py 4
```
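
For a quick run on a single GPU, the non-distributed entry point from mmsegmentation should work as well; `--work-dir` is the standard mmseg option, and the output directory name below is just an example.
```shell
# Single-GPU training sketch; the work directory is a hypothetical path.
python tools/train.py configs/maskclip_plus/zero_shot/maskclip_plus_r50_deeplabv3plus_r101-d8_480x480_40k_pascal_context.py \
    --work-dir work_dirs/maskclip_plus_context
```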

**Inference.** See Steps 2 and 3 under the MaskCLIP section. (We will release the trained models soon.)

# Citation
If you use MaskCLIP or this codebase in your work, please cite:
```bibtex
@InProceedings{zhou2022maskclip,
  author = {Zhou, Chong and Loy, Chen Change and Dai, Bo},
  title = {Extract Free Dense Labels from CLIP},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = {2022}
}
```

# Contact
For questions about our paper or code, please contact [Chong Zhou](mailto:[email protected]).