PRIS-CV/Subspace-Prompting

Vision-Language Subspace Prompting (SuPr)

Official implementation of the paper: "Vision-Language Subspace Prompting".



✨ Introduction

In adapting vision-language models like CLIP to downstream tasks, existing methods often struggle to balance task-specific objectives with the need to preserve CLIP’s generalizable embedding space. Traditional regularization techniques constrain optimization flexibility, limiting the adaptability of soft prompts to new tasks (left figure).
In contrast, our Subspace Prompting (SuPr) method sidesteps this trade-off. It learns high-dimensional, semantically rich subspaces that capture task-specific knowledge while retaining CLIP's generalizable features (right figure).


Abstract:
Prompting vision-language models (e.g., CLIP) to adapt to downstream tasks has emerged as a crucial research topic. A prominent approach is context optimization, which replaces a subset of text tokens with learnable parameters, known as soft prompts. However, conventional pipelines leverage only a single vector embedding derived from these soft prompts for visual classification.
This design risks overfitting to base class training data and leads to degraded performance on novel classes. Previous works attempt to address this by regularizing soft prompts toward handcrafted hard prompts. Yet, excessive regularization hampers model adaptability on base classes.
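The single-embedding pipeline described above can be sketched in a few lines. This is a toy NumPy illustration, not the repository's code; `classify_single_prompt` and the random stand-in features are hypothetical.

```python
import numpy as np

def classify_single_prompt(image_feat, class_text_feats):
    """Conventional soft-prompt pipeline: each class is represented by ONE
    text embedding; classification picks the highest cosine similarity."""
    # both inputs assumed L2-normalized, so dot product = cosine similarity
    logits = class_text_feats @ image_feat
    return int(np.argmax(logits))

# toy stand-ins: 3 classes, 4-dim embeddings (real CLIP features are 512-d or more)
rng = np.random.default_rng(0)
text_feats = rng.normal(size=(3, 4))
text_feats /= np.linalg.norm(text_feats, axis=1, keepdims=True)

print(classify_single_prompt(text_feats[2], text_feats))  # → 2
```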

To strike a better balance, we introduce SuPr, a subspace-based prompting method. SuPr models a shared subspace between learnable soft prompts and textual hard prompts, enabling flexible yet structured adaptation. This approach achieves superior performance on both base and novel classes.

With the advantages of subspace modeling, SuPr demonstrates strong effectiveness across diverse scenarios, including domain generalization, domain adaptation, cross-dataset transfer, and few-shot learning. Moreover, we provide extensive analysis by visualizing the learned subspace and applying SuPr to text-to-image generation tasks to understand the nature of the learned prompts.
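One way to make the subspace idea concrete: instead of a single cosine similarity per class, score each class by how well the image feature is explained by the span of that class's prompt embeddings. The sketch below is an illustrative projection-based scoring rule under that assumption; `subspace_score` is a hypothetical helper, not the paper's exact objective.

```python
import numpy as np

def subspace_score(image_feat, prompt_embeds):
    """Score a class by the norm of the image feature's projection onto the
    subspace spanned by that class's prompt embeddings (illustrative only).

    prompt_embeds: (k, d) text embeddings (soft + hard prompts) whose span
    defines the class subspace."""
    q, _ = np.linalg.qr(prompt_embeds.T)   # (d, k) orthonormal basis columns
    proj = q @ (q.T @ image_feat)          # projection onto the subspace
    return float(np.linalg.norm(proj))     # in [0, 1] for a unit image_feat

# subspace spanned by the first two coordinate axes of a 4-dim space
basis = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0]])
print(subspace_score(np.array([1.0, 0.0, 0.0, 0.0]), basis))  # ≈ 1.0 (inside)
print(subspace_score(np.array([0.0, 0.0, 1.0, 0.0]), basis))  # ≈ 0.0 (orthogonal)
```

Because the score depends on a whole spanned region rather than one point, a class can match an image along any direction inside its subspace, which is the flexibility the abstract refers to.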


📦 Supported Methods

| Method | Paper/Reference | Configurations | Training Scripts |
|---|---|---|---|
| Independent V-L Prompting | - | link | link |
| CoOp | IJCV 2022 | link | link |
| Co-CoOp | CVPR 2022 | link | link |
| MaPLe | CVPR 2023 | link | link |
| KgCoOp | CVPR 2023 | link | link |
| PromptSRC | ICCV 2023 | link | link |
| TCP | CVPR 2024 | link | link |
| DePT | CVPR 2024 | link | link |
| SuPr (ours) | arXiv | link | link |

📊 Results

| Model | Base Accuracy | Novel Accuracy | Harmonic Mean (HM) |
|---|---|---|---|
| CLIP | 69.34 | 74.22 | 71.70 |
| Independent V-L Prompting | 84.14 | 71.42 | 77.26 |
| SuPr (Ours) | 84.15 | 76.48 | 80.13 |
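The HM column is the harmonic mean of base and novel accuracy, which penalizes a large gap between the two; the table values can be checked directly:

```python
def harmonic_mean(base, novel):
    """Harmonic mean of base- and novel-class accuracy (the HM column)."""
    return 2 * base * novel / (base + novel)

print(round(harmonic_mean(84.15, 76.48), 2))  # → 80.13
```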

🎨 Visualization

SuPr's subspace modeling captures diverse intra-class variations, including fine-grained features like color, texture, and depiction style. This enables richer semantic representations compared to traditional soft prompts, which often focus only on dominant concepts. Additionally, interpolations within the subspace reveal smooth semantic transitions along various attributes.
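The interpolations mentioned above amount to moving between embeddings inside the learned subspace. A minimal sketch follows; `interpolate_in_subspace` is a hypothetical helper, and the renormalization to the unit sphere assumes CLIP-style L2-normalized embeddings.

```python
import numpy as np

def interpolate_in_subspace(e0, e1, n=5):
    """Linearly interpolate between two embeddings, renormalizing so every
    intermediate point lies on the unit sphere. Since e0 and e1 lie in the
    class subspace, every interpolated point does too."""
    ts = np.linspace(0.0, 1.0, n)
    pts = np.outer(1.0 - ts, e0) + np.outer(ts, e1)
    return pts / np.linalg.norm(pts, axis=1, keepdims=True)

path = interpolate_in_subspace(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
print(path.shape)  # → (5, 2)
```

Decoding each point on such a path is what produces the smooth attribute transitions shown in the visualizations.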


⚙️ Installation

Please follow the instructions in INSTALL.md for environment setup and package requirements.


📂 Data Preparation

Datasets required for training and evaluation can be prepared by following DATASETS.md.


🏛️ Model Zoo

| Configurations | Model Checkpoints |
|---|---|
| SuPr | link |
| SuPr + PromptSRC | link |
| SuPr Ens | link |

🏋️ Training

Please refer to TRAIN.md for detailed instructions on training SuPr, PromptSRC, and IVLP baselines from scratch.


📈 Evaluation

Please refer to EVAL.md for reproducing official results using our pre-trained models.


📬 Contact

For questions, issues, or discussions, please open an issue in this repository or contact: [email protected]


🙏 Acknowledgements

Our codebase builds upon and extends the following repositories:

We sincerely thank the authors for sharing their codebases. If you find our work useful, please also consider citing these related works.


🔖 Citation

If you find our work useful, please consider citing:

```bibtex
@misc{supr2025,
  title={Vision-Language Subspace Prompting},
  author={Your Name and Collaborators},
  year={2025},
  eprint={2307.06948},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
