[NeurIPS 2025] Mitigating Hallucination Through Theory-Consistent Symmetric Multimodal Preference Optimization

Install

Our codebase requires CUDA version 11.8.

conda create -n symmpo python=3.10 -y
conda activate symmpo
pip install -r requirements.txt
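
As a quick sanity check, you can verify that the installed PyTorch build matches the required CUDA version (this assumes requirements.txt installs a CUDA-enabled PyTorch build, which is an assumption about the environment rather than something stated in this README):

python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"   # expect 11.8 and True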

Train

1. Prepare data

The training dataset can be downloaded from SymMPO_Dataset.

2. Download the pretrained models

Download LLaVA model from liuhaotian/llava-v1.5-7b.

Download the vision tower model (CLIP) from openai/clip-vit-large-patch14-336.

3. Modify model paths

To integrate the downloaded models, update the following paths in the code (an illustrative sketch follows the list):

  1. Set the path to the LLaVA model in the 3rd line of run.sh.
  2. Set the path to the CLIP model:
    • In the 4th line of run.sh.
    • In the 6th line of llava/model/multimodal_encoder/builder.py.
    • In the 14th line of llava/model/multimodal_encoder/clip_encoder.py.
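
For reference, the lines in question are typically just path variables; the names below are illustrative assumptions, not copied from the repository:

# Hypothetical run.sh excerpt (lines 3-4; variable names are assumptions)
MODEL_PATH=/path/to/llava-v1.5-7b                      # line 3: local LLaVA-v1.5-7B checkpoint
VISION_TOWER=/path/to/clip-vit-large-patch14-336       # line 4: local CLIP vision tower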

4. Start Training

Run the following command to start training.

bash run.sh

Evaluation

During evaluation, HallusionBench, Object-HalBench, and MMHal-Bench are scored with DeepSeek-V3, GPT-3.5, and GPT-4, respectively.

HallusionBench

  1. Download the Questions and Annotations and the Figures.
  2. Evaluate the model.
bash script/eval/eval_hallusion.sh [ckpt_path] [base_path if use lora ckpt else "No"] [YOUR_DEEPSEEK_API_KEY] [GPU_ID]

We use DeepSeek-V3 by default. Please replace [YOUR_DEEPSEEK_API_KEY] with a valid DeepSeek API key, or directly modify line 48 of eval/hallusion_evaluation.py.
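
For example, a full invocation might look like the following (the checkpoint path and GPU id are placeholders, and the checkpoint is assumed to be a full model rather than a LoRA checkpoint, hence "No" for the base path):

bash script/eval/eval_hallusion.sh checkpoints/symmpo-llava-v1.5-7b No [YOUR_DEEPSEEK_API_KEY] 0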

Object-HalBench

  1. Download data from COCO.

  2. Download the NLTK resources in Python.

import nltk
nltk.download('wordnet')
nltk.download('punkt')

  3. Download the spaCy model in the terminal.

python -m spacy download en_core_web_trf

  4. Evaluate the model.

bash script/eval/eval_objhal.sh [ckpt_path] [base_path if use lora ckpt else "No"] [YOUR_OPENAI_API_KEY] [GPU_ID]

We use gpt-3.5-turbo-0125 by default. Please replace [YOUR_OPENAI_API_KEY] with a valid OpenAI API key, or directly modify line 51 of eval/gpt4_grpc.py.

MMHal-Bench

  1. Download data from MMHal-Bench.

  2. Evaluate the model.

bash script/eval/eval_mmhal.sh [ckpt_path] [base_path if use lora ckpt else "No"] [YOUR_OPENAI_API_KEY] [GPU_ID]

We use gpt-4-1106-preview by default. Please replace [YOUR_OPENAI_API_KEY] with a valid OpenAI API key, or directly modify line 51 of eval/gpt4_grpc.py.

AMBER

  1. Download the AMBER data and images.

  2. Download the spaCy model in the terminal.

python -m spacy download en_core_web_lg

  3. Evaluate the model.

bash script/eval/eval_amber.sh [ckpt_path] [base_path if use lora ckpt else "No"] [GPU_ID] [data_dir]
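
Following the same pattern, a concrete AMBER invocation might look like this (the checkpoint path, GPU id, and data directory are placeholders; the checkpoint is again assumed to be a full model rather than a LoRA checkpoint):

bash script/eval/eval_amber.sh checkpoints/symmpo-llava-v1.5-7b No 0 data/AMBER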

MMSTAR

  1. Download data from MMSTAR.

  2. Evaluate the model.

bash script/eval/eval_mmstar.sh [ckpt_path] [base_path if use lora ckpt else "No"] [GPU_ID] [data_dir]

Acknowledgement

  • TPO and RLAIF-V: This work extends the implementations provided by these projects, whose concise and effective DPO solutions are greatly appreciated.
  • LLaVA: The training process was carried out on the LLaVA model, and we acknowledge the valuable contributions of this work to our research.

Citation

If you find our work helpful, please consider citing it:

@article{liu2025mitigating,
  title={Mitigating Hallucination Through Theory-Consistent Symmetric Multimodal Preference Optimization},
  author={Liu, Wenqi and Song, Xuemeng and Li, Jiaxi and Wei, Yinwei and Zheng, Na and Yin, Jianhua and Nie, Liqiang},
  journal={arXiv preprint arXiv:2506.11712},
  year={2025}
}
