Mesh R-CNN is a 3D reconstruction and object understanding framework developed by Facebook Research that extends Mask R-CNN into the 3D domain. Built on top of Detectron2 and PyTorch3D, it predicts 3D meshes end to end from a single RGB image: the model detects each object in a natural image, segments it, and reconstructs a detailed triangle mesh of its shape, connecting 2D perception with 3D understanding. Rather than stopping at voxels or point clouds, Mesh R-CNN first predicts a coarse voxel occupancy grid for each detected object, converts that grid into a triangle mesh, and then refines the vertex positions with graph convolutions. Because the mesh operations are differentiable, the surface geometry can be refined during training while preserving fine spatial detail. The system thus combines the 2D detection of Mask R-CNN with 3D reasoning modules that output full mesh reconstructions aligned with the input image. It has been evaluated on datasets such as Pix3D, where its authors report state-of-the-art results for reconstructing real-world object geometry.
Features
- Extends Mask R-CNN to enable 3D mesh reconstruction from images
- Built on Detectron2 (for 2D vision) and PyTorch3D (for 3D operations)
- Predicts detailed 3D surface meshes instead of voxels or point clouds
- End-to-end differentiable framework for joint 2D-3D reasoning
- Pretrained model available, trained on the Pix3D dataset
- Supports demo visualization and easy integration with Detectron2 pipelines
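
To make the difference between a voxel grid and a surface mesh concrete, here is a minimal pure-Python sketch. It is an illustration only and not the project's code: the names `cubify`, `displace`, and `SIDES` are hypothetical helpers written for this example, and the real framework performs these steps on batched tensors with PyTorch3D. The sketch turns a set of occupied voxels into a triangle mesh (shared vertices, exposed faces only) and then moves the vertices while leaving the face list unchanged, which is the basic idea behind refining a mesh without changing its topology.

```python
from typing import Dict, List, Set, Tuple

Voxel = Tuple[int, int, int]
Vertex = Tuple[float, float, float]
Face = Tuple[int, int, int]

# For each of the 6 cube sides: the offset to the neighbouring voxel and the
# 4 corner offsets of that side, listed counter-clockwise seen from outside.
SIDES = [
    ((-1, 0, 0), [(0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 0)]),
    ((1, 0, 0), [(1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 0, 1)]),
    ((0, -1, 0), [(0, 0, 0), (1, 0, 0), (1, 0, 1), (0, 0, 1)]),
    ((0, 1, 0), [(0, 1, 0), (0, 1, 1), (1, 1, 1), (1, 1, 0)]),
    ((0, 0, -1), [(0, 0, 0), (0, 1, 0), (1, 1, 0), (1, 0, 0)]),
    ((0, 0, 1), [(0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]),
]


def cubify(occupied: Set[Voxel]) -> Tuple[List[Vertex], List[Face]]:
    """Convert occupied voxels into a triangle mesh of their outer surface."""
    index: Dict[Voxel, int] = {}
    verts: List[Vertex] = []
    faces: List[Face] = []

    def vertex_id(corner: Voxel) -> int:
        # Reuse a vertex when neighbouring cubes share the same corner.
        if corner not in index:
            index[corner] = len(verts)
            verts.append(corner)
        return index[corner]

    for x, y, z in sorted(occupied):
        for (dx, dy, dz), corners in SIDES:
            if (x + dx, y + dy, z + dz) in occupied:
                continue  # side is hidden between two occupied voxels
            a, b, c, d = (vertex_id((x + cx, y + cy, z + cz)) for cx, cy, cz in corners)
            # Split the square side into two triangles.
            faces.append((a, b, c))
            faces.append((a, c, d))
    return verts, faces


def displace(verts: List[Vertex], offsets: List[Vertex]) -> List[Vertex]:
    """Move each vertex by its offset; the face list stays valid as-is."""
    return [(x + dx, y + dy, z + dz) for (x, y, z), (dx, dy, dz) in zip(verts, offsets)]


verts, faces = cubify({(0, 0, 0)})
print(len(verts), len(faces))  # 8 12

verts2, faces2 = cubify({(0, 0, 0), (1, 0, 0)})
print(len(verts2), len(faces2))  # 12 20

# Hypothetical refinement step: nudge every vertex slightly along +x.
refined = displace(verts2, [(0.1, 0.0, 0.0)] * len(verts2))
```

Two adjacent voxels share four corners and hide the two sides between them, so the mesh stores 12 vertices and 20 triangles instead of two separate cubes. In this sketch the offsets are fixed by hand; in a learned model they would be predicted from image features.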