Qi Zhang, Zhouhang Luo, Tao Yu, and Hui Huang. View Transformation Robustness for Multi-View 3D Object Reconstruction with Reconstruction Error-Guided View Selection. AAAI 2025
This is the official implementation of the AAAI 2025 paper "VTR".
View transformation robustness (VTR) is critical for deep-learning-based multi-view 3D object reconstruction models; it indicates a method's stability under inputs with various view transformations. However, existing research seldom focuses on view transformation robustness in multi-view 3D object reconstruction. One direct way to improve a model's VTR is to produce data with more view transformations and add them to model training. Recent progress on large vision models, particularly Stable Diffusion models, has provided great potential for generating 3D models or synthesizing novel-view images from only a single input image. To fully utilize the power of Stable Diffusion models without causing extra inference computation burdens, we propose to generate novel views with Stable Diffusion models for better view transformation robustness. Instead of synthesizing random views, we propose a reconstruction error-guided view selection method, which considers the spatial distribution of the reconstruction errors of the 3D predictions and chooses the views that cover the reconstruction errors as much as possible. The methods are trained and tested on sets with large view transformations to validate the 3D reconstruction models' robustness to view transformations. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art 3D reconstruction methods and other view transformation robustness comparison methods.
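For intuition, the core idea of error-guided view selection on voxel grids can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the XOR error mask, the orthographic error-projection score, the candidate view ring, and helper names such as `error_coverage` and `select_views` are assumptions made for this example.

```python
import numpy as np

def view_direction(azimuth_deg, elevation_deg):
    """Unit vector from the object centre toward a camera at the given angles."""
    az, el = np.deg2rad(azimuth_deg), np.deg2rad(elevation_deg)
    return np.array([np.cos(el) * np.cos(az),
                     np.cos(el) * np.sin(az),
                     np.sin(el)])

def error_coverage(error_mask, cam_dir, image_res=32):
    """Score a view by orthographically projecting erroneous voxel centres onto
    the camera's image plane and counting the distinct pixels they cover."""
    coords = np.argwhere(error_mask)              # (N, 3) indices of error voxels
    if len(coords) == 0:
        return 0
    up = np.array([0.0, 0.0, 1.0])
    if abs(np.dot(up, cam_dir)) > 0.9:            # avoid a degenerate 'up' vector
        up = np.array([0.0, 1.0, 0.0])
    right = np.cross(up, cam_dir)
    right /= np.linalg.norm(right)
    true_up = np.cross(cam_dir, right)
    centred = coords - coords.mean(axis=0)
    u, v = centred @ right, centred @ true_up     # image-plane coordinates
    scale = image_res / (1.5 * error_mask.shape[0])
    pixels = np.stack([(u * scale).astype(int), (v * scale).astype(int)], axis=1)
    return len(np.unique(pixels, axis=0))         # number of error-covering pixels

def select_views(pred_voxels, gt_voxels, candidate_views, k=2, thresh=0.5):
    """Return the k (azimuth, elevation) candidates covering the most errors."""
    errors = (pred_voxels > thresh) ^ (gt_voxels > thresh)   # per-voxel error mask
    scores = [error_coverage(errors, view_direction(az, el))
              for az, el in candidate_views]
    best = np.argsort(scores)[::-1][:k]
    return [candidate_views[i] for i in best]

if __name__ == "__main__":
    # Toy 32^3 grids and a ring of candidate azimuths at 30 degrees elevation.
    pred = np.random.rand(32, 32, 32)
    gt = (np.random.rand(32, 32, 32) > 0.9).astype(np.float32)
    candidates = [(az, 30) for az in range(0, 360, 45)]
    print(select_views(pred, gt, candidates, k=2))
```

In the paper, the selected views are then synthesized with a Stable Diffusion model (hence the Zero123 dependency below) rather than captured as extra inputs, so no additional inference burden is introduced.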
Some preliminary setup is required before installing the dependencies:
git clone https://github.com/CompVis/taming-transformers.git
pip install -e taming-transformers/
git clone https://github.com/openai/CLIP.git
pip install -e CLIP/
- python
- pytorch & torchvision
- tensorboardX
- numpy
- matplotlib
- pillow
- opencv-python
- tqdm
- argparse
- shutil
- rich
- Pix2Vox++, following their instructions
- LRGT, following their instructions
- Zero123, following their instructions
For training, first pretrain the reconstruction models, Pix2Vox++ and LRGT (pretrained models are provided below). Then simply run the following command: sh train_ours.sh
We provide the testing script, which you can run as follows: sh test_on_voxel.sh
You can download the checkpoints at this link (code: r9s3).
This work was supported in part by NSFC (62202312, U21B2023), the Guangdong Basic and Applied Basic Research Foundation (2023B1515120026), the Shenzhen Science and Technology Program (KQTD20210811090044003, RCJC20200714114435012), and Scientific Development Funds from Shenzhen University.
If you find this paper useful or use its code, please cite our paper:
@inproceedings{zhang2025view,
title={View Transformation Robustness for Multi-View 3D Object Reconstruction with Reconstruction Error-Guided View Selection},
author={Zhang, Qi and Luo, Zhouhang and Yu, Tao and Huang, Hui},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={39},
number={10},
pages={10076--10084},
year={2025}
}
