PoseProbe enables realistic view synthesis from few input images without pose priors.
Radiance fields, including NeRFs and 3D Gaussians, demonstrate great potential for high-fidelity rendering and scene reconstruction, but they require a substantial number of posed images as input. COLMAP is frequently employed in preprocessing to estimate poses. However, COLMAP needs a large number of feature matches to operate effectively, and it struggles in scenes with sparse features, large baselines, or only a few input views.
We aim to tackle few-view NeRF reconstruction using only 3 to 6 unposed scene images, eliminating the need for COLMAP initialization.
Inspired by the calibration boards used in traditional pose calibration, we propose a novel approach that uses everyday objects, commonly found both in images and in real life, as pose probes.
The probe object is initialized as a cube, and a dual-branch volume rendering optimization (an object NeRF and a scene NeRF) constrains the pose optimization while jointly refining the geometry. PnP matching incrementally initializes poses between images, requiring only a few feature matches.
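To illustrate the incremental pose-initialization step, here is a minimal sketch of recovering a camera projection from 3D-2D correspondences on the cube probe's corners. It uses a plain Direct Linear Transform fit as a simplified stand-in for PnP matching; the camera intrinsics, pose, and cube geometry below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def estimate_projection_dlt(pts3d, pts2d):
    """Fit a 3x4 projection matrix from >= 6 non-coplanar 3D-2D
    correspondences via the Direct Linear Transform (a simplified
    stand-in for PnP-based pose initialization)."""
    A = []
    for (X, Y, Z), (u, v) in zip(pts3d, pts2d):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    # The null vector of A (last right-singular vector) gives P up to scale.
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 4)

def project(P, pts3d):
    """Project homogeneous 3D points and dehomogenize to pixels."""
    h = np.hstack([pts3d, np.ones((len(pts3d), 1))]) @ P.T
    return h[:, :2] / h[:, 2:3]

# Hypothetical setup: the 8 corners of a unit cube act as the pose probe.
cube = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)], float)
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])  # assumed intrinsics
theta = 0.3  # assumed camera yaw
R = np.array([[np.cos(theta), 0, np.sin(theta)],
              [0, 1, 0],
              [-np.sin(theta), 0, np.cos(theta)]])
t = np.array([[-0.5], [-0.5], [4.0]])
P_true = K @ np.hstack([R, t])

pts2d = project(P_true, cube)            # noise-free synthetic matches
P_est = estimate_projection_dlt(cube, pts2d)
err = np.abs(project(P_est, cube) - pts2d).max()
print(f"max reprojection error: {err:.2e} px")
```

With exact correspondences the recovered matrix reprojects the probe corners to sub-pixel accuracy; in practice one would use a RANSAC-wrapped PnP solver to tolerate outlier matches.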
PoseProbe achieves state-of-the-art performance in pose estimation and novel view synthesis in experiments across multiple datasets. We demonstrate its effectiveness particularly in few-view and large-baseline scenes where COLMAP struggles. In ablations, using different objects in a scene yields comparable performance, showing that PoseProbe is robust to the choice of probe object. Our project page is available at: https://zhirui-gao.github.io/PoseProbe.github.io/
We leverage generic objects in few-view input images as pose probes. The pose probe is automatically segmented by SAM with prompts and initialized with a cubic shape. The method incurs no extra burden yet successfully facilitates pose estimation in feature-sparse scenes.
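One simple way to realize the cubic initialization is to seed the object branch's geometry with an analytic cube signed-distance function and map it to a volume-rendering density. This is only a sketch: the half-extent, the soft density mapping, and its scale are assumptions for illustration, not the paper's exact parameterization.

```python
import numpy as np

def cube_sdf(points, half_extent=0.5):
    """Signed distance from points (N, 3) to an origin-centered,
    axis-aligned cube: negative inside, zero on the surface."""
    q = np.abs(points) - half_extent
    outside = np.linalg.norm(np.maximum(q, 0.0), axis=-1)
    inside = np.minimum(q.max(axis=-1), 0.0)
    return outside + inside

def sdf_to_density(sdf, scale=10.0):
    """Map SDF to a volume-rendering density: high inside the cube,
    near zero outside. (The sigmoid mapping is an illustrative choice.)"""
    return scale / (1.0 + np.exp(sdf * scale))

pts = np.array([[0.0, 0.0, 0.0],   # cube center
                [1.0, 0.0, 0.0],   # outside the cube
                [0.5, 0.0, 0.0]])  # on the surface
print(cube_sdf(pts))  # center: -0.5, outside: 0.5, surface: 0.0
```

Initializing the object NeRF's density against such a shape gives the pose optimization a well-defined geometric anchor before the cube is refined toward the true probe shape.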
This dataset is generated using BlenderProc with wide-baseline views. We only have access to 3 input views, without any camera pose prior. Our method renders clearer details with fewer artifacts than other pose-free baselines. Note that the camera poses derived via PnP in our method serve as the initial poses for all baselines, for a fair comparison.
DTU is composed of complex object-centric scenes, with wide-baseline views spanning a half-hemisphere. We only have access to 3 input views, without any camera pose prior. As before, all baselines suffer from blurriness and inaccurate scene geometry, while our approach produces much better-quality novel-view renderings thanks to the pose probe constraint. Note that the camera poses derived via PnP in our method serve as the initial poses for all NeRF baselines, for a fair comparison.
If you want to cite our work, please use:
@article{gao2024generic,
  title={Generic Objects as Pose Probes for Few-Shot View Synthesis},
  author={Gao, Zhirui and Yi, Renjiao and Zhu, Chenyang and Zhuang, Ke and Chen, Wei and Xu, Kai},
  journal={arXiv preprint arXiv:2408.16690},
  year={2024}
}