[IJCAI 2025] Leveraging MLLM Embeddings and Attribute Smoothing for Compositional Zero-Shot Learning (Trident)
- 🧑‍💻 Authors: Xudong Yan, Songhe Feng, Yang Zhang, Jian Yang, Yueguan Lin, Haojun Fei
- 🏛️ Affiliations: Beijing Jiaotong University, Qifu Technology
- 🔍 More details: [arXiv version] | [IJCAI version] | [code]
Note: The supplementary material is provided in the paper's arXiv version.
TL;DR: We use an LLM and an MLLM to guide attribute-object disentanglement for CZSL: the LLM generates auxiliary attributes, and the MLLM provides the representations of primitive words.
Our work is implemented in PyTorch. Create a conda environment named trident using:
conda create --name trident python=3.8.0
conda activate trident
pip install -r requirements.txt
Datasets: We conduct experiments on three datasets: MIT-States, C-GQA, and VAW-CZSL. VAW-CZSL can be downloaded from this website. To download MIT-States and C-GQA, run:
bash utils/download_data.sh
Pre-trained models: ViT-Large-Patch14-336px (the backbone) can be downloaded here. LLaVA-v1.5-7b can be found here.
Before training Trident, generate the auxiliary attributes with GPT-3.5 through the official OpenAI API and extract the last hidden states of LLaVA v1.5 offline. The code for both steps can be found in the utils folder.
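As a rough illustration of the auxiliary-attribute step, the sketch below builds a prompt for one attribute-object composition and parses a comma-separated reply into a short attribute list. The prompt wording, the function names, and the `complete` callable are assumptions made for this example and are not taken from the scripts in the utils folder; in practice `complete` would wrap a chat request to GPT-3.5.

```python
from typing import Callable, List


def build_prompt(attribute: str, obj: str, k: int = 3) -> str:
    """Hypothetical prompt; the wording used by the authors may differ."""
    return (
        f"Describe a {attribute} {obj}. "
        f"List {k} other attributes it is likely to have, "
        "as a comma-separated list of single words."
    )


def parse_attributes(reply: str, k: int = 3) -> List[str]:
    """Split a reply into at most k unique, lowercase attributes."""
    seen, result = set(), []
    # Treat newlines like commas so multi-line replies are handled too.
    for token in reply.replace("\n", ",").split(","):
        word = token.strip().strip(".").lower()
        if word and word not in seen:
            seen.add(word)
            result.append(word)
    return result[:k]


def auxiliary_attributes(
    attribute: str, obj: str, complete: Callable[[str], str], k: int = 3
) -> List[str]:
    """`complete` is an assumed callable mapping a prompt to the model's reply."""
    return parse_attributes(complete(build_prompt(attribute, obj, k)), k)
```

Passing the model call in as a function keeps the parsing logic testable offline, for example with a stub that returns a fixed string.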
Train the Trident model with a specified configuration file using:
python train.py --cfg config/{DATASET_NAME}.yml
Evaluate the Trident model using:
python test.py --cfg config/{DATASET_NAME}.yml --load TRIDENT_MODEL.pth
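To run all datasets in turn, the two commands above can be generated from the files in the config folder. The following is a minimal sketch, assuming every `*.yml` file in that folder corresponds to one dataset; the helper names `build_command` and `training_commands` are hypothetical and not part of the repository.

```python
import shlex
from pathlib import Path
from typing import List, Optional


def build_command(cfg_path: str, checkpoint: Optional[str] = None) -> str:
    """Return the training command, or the evaluation command if a checkpoint is given."""
    if checkpoint is None:
        args = ["python", "train.py", "--cfg", cfg_path]
    else:
        args = ["python", "test.py", "--cfg", cfg_path, "--load", checkpoint]
    # shlex.join quotes arguments that contain spaces or shell metacharacters.
    return shlex.join(args)


def training_commands(config_dir: str = "config") -> List[str]:
    """One training command per *.yml file in config_dir, in sorted order."""
    return [
        build_command(cfg.as_posix())
        for cfg in sorted(Path(config_dir).glob("*.yml"))
    ]
```

Printing the result of `training_commands()` gives one line per configuration file, which can then be pasted into a shell or job script. Evaluation is built per dataset with `build_command(cfg_path, checkpoint)`, since each dataset has its own checkpoint.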
If you find our work helpful, please cite our paper:
@inproceedings{Yan_2025_IJCAI,
title = {Leveraging MLLM Embeddings and Attribute Smoothing for Compositional Zero-Shot Learning},
author = {Yan, Xudong and Feng, Songhe and Zhang, Yang and Yang, Jian and Lin, Yueguan and Fei, Haojun},
booktitle = {Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, {IJCAI-25}},
pages = {2179--2187},
year = {2025},
}
or
@inproceedings{Yan_2025_IJCAI,
title={Leveraging MLLM Embeddings and Attribute Smoothing for Compositional Zero-Shot Learning},
author={Yan, Xudong and Feng, Songhe and Zhang, Yang and Yang, Jian and Lin, Yueguan and Fei, Haojun},
journal={arXiv preprint arXiv:2411.12584},
year={2024}
}
We thank the authors of OADis and LLaVA for making their code publicly available.
If you have any questions or are interested in collaboration, please feel free to contact me at [email protected] / [email protected] .
