A collection of referring image segmentation papers and datasets.
Feel free to create a PR or an issue.

| Short name | Paper | Source | Code/Project Link |
|---|---|---|---|
| MeViS | MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions | ICCV 2023 | [dataset] [project] |
| gRefCOCO | GRES: Generalized Referring Expression Segmentation | CVPR 2023 | [dataset] [project] |
| ClevrTex | ClevrTex: A Texture-Rich Benchmark for Unsupervised Multi-Object Segmentation | NeurIPS Datasets and Benchmarks 2021 | [project] |
| ScanRefer | ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language | ECCV 2020 | [project] |
| VGPhraseCut | PhraseCut: Language-based Image Segmentation in the Wild | CVPR 2020 | [project] |
| CLEVR-Ref+ | CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions | CVPR 2019 | [project] |
| UNC | Modeling Context in Referring Expressions | ECCV 2016 | [dataset] |
| UNC+ | Modeling Context in Referring Expressions | ECCV 2016 | [dataset] |
| Google-Ref | Generation and Comprehension of Unambiguous Object Descriptions | CVPR 2016 | [dataset] |
| ReferIt | ReferItGame: Referring to Objects in Photographs of Natural Scenes | EMNLP 2014 | [project] |

| Name | Workshop | Date | Submission Link |
|---|---|---|---|
| 1st MeViS Challenge | CVPR 2024 Workshop: Pixel-level Video Understanding in the Wild | May 2024 | [CodaLab] |
| RVOS Challenge | ECCV 2024 Workshop: The 6th Large-scale Video Object Segmentation Challenge | Aug 2024 | [CodaLab] |

| Short name | Paper | Source | Code/Project Link |
|---|---|---|---|
| UniPixel | UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning | NeurIPS 2025 | [code] [webpage] |
| PhraseClick | PhraseClick: Toward Achieving Flexible Interactive Segmentation by Phrase and Click | ECCV 2020 | |

| Short name | Paper | Source | Code/Project Link |
|---|---|---|---|
| OV-BIS | OV-BIS: Open-Vocabulary Boundary Guide Zero-Shot 3D Instance Segmentation | TMM 2025 | |
| X-RefSeg3D | X-RefSeg3D: Enhancing Referring 3D Instance Segmentation via Structured Cross-Modal Graph Neural Networks | AAAI 2024 | [code] |
| 3D-STMN | 3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation | AAAI 2024 | [code] |
| SegPoint | SegPoint: Segment Any Point Cloud via Large Language Model | ECCV 2024 | [project] |
| 3D-GRES | 3D-GRES: Generalized 3D Referring Expression Segmentation | ACM MM 2024 | [code] |
| RefMask3D | RefMask3D: Language-Guided Transformer for 3D Referring Segmentation | ACM MM 2024 | [code] |
| TGNN | Text-Guided Graph Neural Networks for Referring 3D Instance Segmentation | AAAI 2021 | |
| InstanceRefer | InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring | ICCV 2021 | [code] |
