Our codebase requires CUDA version 11.8.
```
conda create -n symmpo python=3.10 -y
conda activate symmpo
pip install -r requirements.txt
```

1. Prepare data

The training dataset can be downloaded from SymMPO_Dataset.
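Before moving on, you can confirm that the conda environment from the setup step is active; a minimal sanity check (standard library only, the helper name is ours):

```python
import sys

def python_matches(required=(3, 10)):
    """Return True if the running interpreter's major.minor version equals `required`."""
    return tuple(sys.version_info[:2]) == tuple(required)

if __name__ == "__main__":
    # In the symmpo environment created above, this should report True.
    print(python_matches())
```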
2. Download the pretrained models
Download LLaVA model from liuhaotian/llava-v1.5-7b.
Download the vision tower model (CLIP) from openai/clip-vit-large-patch14-336.
3. Modify model paths
To integrate the downloaded models, update the following paths in the code:
- Set the path to the LLaVA model in the 3rd line of run.sh.
- Set the path to the CLIP model:
  - In the 4th line of run.sh.
  - In the 6th line of llava/model/multimodal_encoder/builder.py.
  - In the 14th line of llava/model/multimodal_encoder/clip_encoder.py.
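The line edits above can also be scripted. A minimal sketch, assuming you want to overwrite whole lines in place (the helper name, variable names, and example paths are illustrative, not part of the repo):

```python
from pathlib import Path

def set_line(script, lineno, new_text):
    """Replace the 1-indexed line `lineno` of `script` with `new_text`."""
    lines = Path(script).read_text().splitlines()
    lines[lineno - 1] = new_text
    Path(script).write_text("\n".join(lines) + "\n")

# Example (hypothetical download locations and variable names):
# set_line("run.sh", 3, "MODEL_PATH=/models/llava-v1.5-7b")
# set_line("run.sh", 4, "VISION_TOWER=/models/clip-vit-large-patch14-336")
```

Check the actual contents of run.sh and the two encoder files first, since the exact variable names on those lines may differ.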
4. Start Training
Run the following command to start training:

```
bash run.sh
```

During evaluation, HallusionBench, Object-HalBench, and MMHal-Bench need to be assessed using DeepSeek-V3, GPT-3.5, and GPT-4, respectively.
- Download Questions and Annotations and Figures.
- Eval model.

```
bash script/eval/eval_hallusion.sh [ckpt_path] [base_path if use lora ckpt else "No"] [YOUR_DEEPSEEK_API_KEY] [GPU_ID]
```

We use DeepSeek-V3 by default. Please replace {YOUR_DEEPSEEK_API_KEY} with a valid DeepSeek API key, or directly modify the 48th line in eval/hallusion_evaluation.py.
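If you prefer not to paste the key on the command line or edit line 48, it can be read from an environment variable; a sketch (the variable name DEEPSEEK_API_KEY is our convention, not something the repo defines):

```python
import os

def get_api_key(env_var="DEEPSEEK_API_KEY"):
    """Fetch an API key from the environment, failing loudly if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before running the eval script.")
    return key
```

The same pattern applies to the OpenAI key used by the benchmarks below.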
- Download data from COCO.
- Download the eval supplement models in Python.

```
import nltk
nltk.download('wordnet')
nltk.download('punkt')
```

- Download the eval supplement model in the terminal.

```
python -m spacy download en_core_web_trf
```

- Eval model.

```
bash script/eval/eval_objhal.sh [ckpt_path] [base_path if use lora ckpt else "No"] [YOUR_OPENAI_API_KEY] [GPU_ID]
```

We use gpt-3.5-turbo-0125 by default. Please replace {YOUR_OPENAI_API_KEY} with a valid OpenAI API key, or directly modify the 51st line in eval/gpt4_grpc.py.
- Download data from MMHal-Bench.
- Eval model.

```
bash script/eval/eval_mmhal.sh [ckpt_path] [base_path if use lora ckpt else "No"] [YOUR_OPENAI_API_KEY] [GPU_ID]
```

We use gpt-4-1106-preview by default. Please replace {YOUR_OPENAI_API_KEY} with a valid OpenAI API key, or directly modify the 51st line in eval/gpt4_grpc.py.
- Download the eval supplement model in the terminal.

```
python -m spacy download en_core_web_lg
```

- Eval model.

```
bash script/eval/eval_amber.sh [ckpt_path] [base_path if use lora ckpt else "No"] [GPU_ID] [data_dir]
```
- Download data from MMSTAR.
- Eval model.

```
bash script/eval/eval_mmstar.sh [ckpt_path] [base_path if use lora ckpt else "No"] [GPU_ID] [data_dir]
```

- TPO and RLAIF-V: This work extends the implementations provided by these projects, whose concise and effective DPO solutions are greatly appreciated.
- LLaVA: The training process was carried out on the LLaVA model, and we acknowledge the valuable contributions of this work to our research.
If you find our work helpful, please consider citing it:
```
@article{liu2025mitigating,
  title={Mitigating Hallucination Through Theory-Consistent Symmetric Multimodal Preference Optimization},
  author={Liu, Wenqi and Song, Xuemeng and Li, Jiaxi and Wei, Yinwei and Zheng, Na and Yin, Jianhua and Nie, Liqiang},
  journal={arXiv preprint arXiv:2506.11712},
  year={2025}
}
```