Our codebase requires CUDA version 11.8.
```
conda create -n symmpo python=3.10 -y
conda activate symmpo
pip install -r requirements.txt
```

1. Prepare data

The training dataset can be downloaded from SymMPO_Dataset.
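Before moving on, you can confirm that the conda environment from the setup step is active; a minimal sanity check (standard library only, the helper name is ours):

```python
import sys

def python_matches(required=(3, 10)):
    """Return True if the running interpreter's major.minor version equals `required`."""
    return tuple(sys.version_info[:2]) == tuple(required)

if __name__ == "__main__":
    # In the symmpo environment created above, this should report True.
    print(python_matches())
```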
2. Download the pretrained models
Download LLaVA model from liuhaotian/llava-v1.5-7b.
Download the vision tower model (CLIP) from openai/clip-vit-large-patch14-336.
3. Modify model paths
To integrate the downloaded models, update the following paths in the code:
- Set the path to the LLaVA model in the 3rd line of run.sh.
- Set the path to the CLIP model:
  - In the 4th line of run.sh.
  - In the 6th line of llava/model/multimodal_encoder/builder.py.
  - In the 14th line of llava/model/multimodal_encoder/clip_encoder.py.
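The line edits above can also be scripted. A minimal sketch, assuming you want to overwrite whole lines in place (the helper name, variable names, and example paths are illustrative, not part of the repo):

```python
from pathlib import Path

def set_line(script, lineno, new_text):
    """Replace the 1-indexed line `lineno` of `script` with `new_text`."""
    lines = Path(script).read_text().splitlines()
    lines[lineno - 1] = new_text
    Path(script).write_text("\n".join(lines) + "\n")

# Example (hypothetical download locations and variable names):
# set_line("run.sh", 3, "MODEL_PATH=/models/llava-v1.5-7b")
# set_line("run.sh", 4, "VISION_TOWER=/models/clip-vit-large-patch14-336")
```

Check the actual contents of run.sh and the two encoder files first, since the exact variable names on those lines may differ.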
4. Start Training
Run the following command to start training:

```
bash run.sh
```

During evaluation, HallusionBench, Object-HalBench, and MMHal-Bench need to be assessed using DeepSeek-V3, GPT-3.5, and GPT-4, respectively.
- Download Questions and Annotations and Figures.
- Eval model.

```
bash script/eval/eval_hallusion.sh [ckpt_path] [base_path if use lora ckpt else "No"] [YOUR_DEEPSEEK_API_KEY] [GPU_ID]
```

We use DeepSeek-V3 by default. Please replace {YOUR_DEEPSEEK_API_KEY} with a valid DeepSeek API key, or directly modify the 48th line in eval/hallusion_evaluation.py.
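If you prefer not to paste the key on the command line or edit line 48, it can be read from an environment variable; a sketch (the variable name DEEPSEEK_API_KEY is our convention, not something the repo defines):

```python
import os

def get_api_key(env_var="DEEPSEEK_API_KEY"):
    """Fetch an API key from the environment, failing loudly if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before running the eval script.")
    return key
```

The same pattern applies to the OpenAI key used by the benchmarks below.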
- Download data from COCO.
- Download the eval supplement models in Python.

```
import nltk
nltk.download('wordnet')
nltk.download('punkt')
```

- Download the eval supplement model in the terminal.

```
python -m spacy download en_core_web_trf
```

- Eval model.

```
bash script/eval/eval_objhal.sh [ckpt_path] [base_path if use lora ckpt else "No"] [YOUR_OPENAI_API_KEY] [GPU_ID]
```

We use gpt-3.5-turbo-0125 by default. Please replace {YOUR_OPENAI_API_KEY} with a valid OpenAI API key, or directly modify the 51st line in eval/gpt4_grpc.py.
- Download data from MMHal-Bench.
- Eval model.

```
bash script/eval/eval_mmhal.sh [ckpt_path] [base_path if use lora ckpt else "No"] [YOUR_OPENAI_API_KEY] [GPU_ID]
```

We use gpt-4-1106-preview by default. Please replace {YOUR_OPENAI_API_KEY} with a valid OpenAI API key, or directly modify the 51st line in eval/gpt4_grpc.py.
- Download the eval supplement model in the terminal.

```
python -m spacy download en_core_web_lg
```

- Eval model.

```
bash script/eval/eval_amber.sh [ckpt_path] [base_path if use lora ckpt else "No"] [GPU_ID] [data_dir]
```
- Download data from MMSTAR.
- Eval model.

```
bash script/eval/eval_mmstar.sh [ckpt_path] [base_path if use lora ckpt else "No"] [GPU_ID] [data_dir]
```

- TPO and RLAIF-V: This work extends the implementations provided by these projects, whose concise and effective DPO solutions are greatly appreciated.
- LLaVA: The training process was carried out on the LLaVA model, and we acknowledge the valuable contributions of this work to our research.
If you find our work helpful, please consider citing it:
```
@article{liu2025mitigating,
  title={Mitigating Hallucination Through Theory-Consistent Symmetric Multimodal Preference Optimization},
  author={Liu, Wenqi and Song, Xuemeng and Li, Jiaxi and Wei, Yinwei and Zheng, Na and Yin, Jianhua and Nie, Liqiang},
  journal={arXiv preprint arXiv:2506.11712},
  year={2025}
}
```