Official implementation of the paper: "Vision-Language Subspace Prompting".
- (April 27, 2025)
  - Released pre-trained models and evaluation scripts to reproduce SuPr's official benchmark results.
  - Released training scripts for SuPr.
- This repository also supports other prompting methods, including DePT (CVPR'24), TCP (CVPR'24), PromptSRC (ICCV'23), KgCoOp (CVPR'23), MaPLe (CVPR'23), CoOp (IJCV'22), and Co-CoOp (CVPR'22).
In adapting vision-language models like CLIP to downstream tasks, existing methods often struggle to balance task-specific objectives with the need to preserve CLIP’s generalizable embedding space. Traditional regularization techniques constrain optimization flexibility, limiting the adaptability of soft prompts to new tasks (left figure).
In contrast, our Subspace Prompting (SuPr) method circumvents this trade-off: it learns high-dimensional, semantically rich subspaces that capture task-specific knowledge while retaining CLIP's generalizable features (right figure).
Abstract:
Prompting vision-language models (e.g., CLIP) to adapt to downstream tasks has emerged as a crucial research topic. A prominent approach is context optimization, which replaces a subset of text tokens with learnable parameters, known as soft prompts. However, conventional pipelines leverage only a single vector embedding derived from these soft prompts for visual classification.
This design risks overfitting to the base-class training data and degrades performance on novel classes. Previous works attempt to address this by regularizing soft prompts toward handcrafted hard prompts, yet excessive regularization hampers adaptability on base classes. To strike a better balance, we introduce SuPr, a subspace-based prompting method. SuPr models a shared subspace between learnable soft prompts and textual hard prompts, enabling flexible yet structured adaptation and achieving superior performance on both base and novel classes.
With the advantages of subspace modeling, SuPr demonstrates strong effectiveness across diverse scenarios, including domain generalization, domain adaptation, cross-dataset transfer, and few-shot learning. Moreover, we provide extensive analysis by visualizing the learned subspace and applying SuPr to text-to-image generation tasks to understand the nature of the learned prompts.
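For intuition, the snippet below sketches one plausible form of subspace-based scoring: each class is represented by several prompt embeddings (soft and handcrafted) whose span forms a subspace, and an image is scored by the length of its projection onto that subspace. This is an illustrative sketch only, not the repository's implementation; the function name `subspace_logits` and the projection-norm scoring rule are assumptions for exposition.

```python
import torch
import torch.nn.functional as F

def subspace_logits(image_feats, class_embeddings):
    """
    Illustrative subspace scoring (not the official SuPr code).

    image_feats:      (B, D)    L2-normalized CLIP image features.
    class_embeddings: (C, K, D) K text embeddings per class (e.g. soft and
                      hard prompts) whose span defines a class subspace.

    Each class is scored by the norm of the image feature's projection onto
    an orthonormal basis of its subspace.
    """
    logits = []
    for embeds in class_embeddings:               # (K, D) per class
        q, _ = torch.linalg.qr(embeds.T)          # (D, K) orthonormal basis
        proj = image_feats @ q                    # (B, K) subspace coordinates
        logits.append(proj.norm(dim=-1))          # projection length as similarity
    return torch.stack(logits, dim=-1)            # (B, C)

# toy usage
B, C, K, D = 4, 10, 3, 512
img = F.normalize(torch.randn(B, D), dim=-1)
txt = F.normalize(torch.randn(C, K, D), dim=-1)
print(subspace_logits(img, txt).shape)            # torch.Size([4, 10])
```

Because the score depends on a whole span rather than a single prototype vector, the text side can absorb intra-class variation without collapsing each class to one embedding.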
Method | Paper/Reference | Configurations | Training Scripts |
---|---|---|---|
Independent V-L Prompting | - | link | link |
CoOp | IJCV 2022 | link | link |
Co-CoOp | CVPR 2022 | link | link |
MaPLe | CVPR 2023 | link | link |
KgCoOp | CVPR 2023 | link | link |
PromptSRC | ICCV 2023 | link | link |
TCP | CVPR 2024 | link | link |
DePT | CVPR 2024 | link | link |
SuPr (ours) | arXiv | link | link |
Model | Base Accuracy | Novel Accuracy | Harmonic Mean (HM) |
---|---|---|---|
CLIP | 69.34 | 74.22 | 71.70 |
Independent V-L Prompting | 84.14 | 71.42 | 77.26 |
SuPr (Ours) | 84.15 | 76.48 | 80.13 |
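The Harmonic Mean column is the standard base-to-novel summary metric, HM = 2·Base·Novel / (Base + Novel). A minimal check against the SuPr row:

```python
def harmonic_mean(base_acc: float, novel_acc: float) -> float:
    """Harmonic mean of base and novel accuracy (the HM column)."""
    return 2 * base_acc * novel_acc / (base_acc + novel_acc)

print(round(harmonic_mean(84.15, 76.48), 2))  # 80.13
```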
SuPr's subspace modeling captures diverse intra-class variations, including fine-grained features like color, texture, and depiction style. This enables richer semantic representations compared to traditional soft prompts, which often focus only on dominant concepts. Additionally, interpolations within the subspace reveal smooth semantic transitions along various attributes.
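As an illustration of how such interpolations can be probed, the sketch below walks linearly between two points of a class subspace and maps each point back to CLIP's embedding space. The helper `interpolate_in_subspace` is hypothetical and assumes an orthonormal basis of the learned subspace is available; the resulting embeddings could then be passed to a retrieval or text-to-image pipeline to visualize the semantic transition.

```python
import torch
import torch.nn.functional as F

def interpolate_in_subspace(basis, coeff_a, coeff_b, steps=5):
    """
    Walk between two points of a class subspace (illustrative only).

    basis:            (D, K) orthonormal basis of the learned subspace.
    coeff_a, coeff_b: (K,)   subspace coordinates of two prompt embeddings.
    Returns (steps, D) unit-norm embeddings along the interpolation path.
    """
    ts = torch.linspace(0.0, 1.0, steps).unsqueeze(-1)   # (steps, 1)
    coeffs = (1 - ts) * coeff_a + ts * coeff_b           # (steps, K)
    return F.normalize(coeffs @ basis.T, dim=-1)         # back to CLIP space

# toy usage: basis via QR of prompt embeddings, coordinates via projection
D, K = 512, 3
prompts = torch.randn(K, D)
basis, _ = torch.linalg.qr(prompts.T)                    # (D, K)
a, b = prompts[0] @ basis, prompts[1] @ basis            # (K,) coordinates
print(interpolate_in_subspace(basis, a, b).shape)        # torch.Size([5, 512])
```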
Please follow the instructions in INSTALL.md for environment setup and package requirements.
Datasets required for training and evaluation can be prepared by following DATASETS.md.
Configurations | Model Checkpoints |
---|---|
SuPr | link |
SuPr + PromptSRC | link |
SuPr Ens | link |
Please refer to TRAIN.md for detailed instructions on training SuPr, PromptSRC, and IVLP baselines from scratch.
Please refer to EVAL.md for reproducing official results using our pre-trained models.
For questions, issues, or discussions, please open an issue in this repository or contact: [email protected]
Our codebase builds upon and extends the following repositories:
We sincerely thank the authors for sharing their codebases. If you find our work useful, please also consider citing these related works.
If you find our work useful, please consider citing:
@misc{supr2025,
title={Vision-Language Subspace Prompting},
author={Your Name and Collaborators},
year={2025},
eprint={2307.06948},
archivePrefix={arXiv},
primaryClass={cs.CV}
}