Skip to content

devinli123/MV-SAM3D

Repository files navigation

MV-SAM3D: Adaptive Multi-View 3D Reconstruction

An enhanced multi-view extension for SAM 3D Objects, featuring adaptive fusion strategies for improved 3D reconstruction quality from multiple viewpoints.

🔗 Basic Version: For a simpler averaging-based approach, check out our basic multi-view fork.

🔥 Highlights

  • Adaptive Multi-View Fusion: Unlike simple averaging, we employ a confidence-aware fusion mechanism that automatically weighs contributions from different views based on their reliability.

  • Multiple Weighting Strategies:

    • Entropy-based: Uses attention entropy as uncertainty measure
    • Visibility-based: Uses self-occlusion detection via DDA ray tracing
    • Mixed: Combines both strategies for robust weighting
  • Per-Latent Weighting: Our method operates at the latent level, enabling fine-grained control over how information from different views is combined for each spatial location.

  • Improved Reconstruction Quality: Better handling of occluded regions and view-dependent artifacts through intelligent fusion.

📢 Research in Progress

This is an active research project. We are continuously exploring new fusion strategies and will release updates as our research progresses.

Results Comparison

Single-View (View 3) Single-View (View 6) Multi-View (Adaptive Fusion)
Input Image
Input Image
Input Images
↓ 3D Reconstruction ↓
3D Result
3D Result
3D Result

Installation

Please follow the installation instructions in the basic multi-view version or in SAM 3D Objects.

Quick Start

Basic Usage (Both Stages Weighted by Default)

python run_inference_weighted.py \
    --input_path ./data/example \
    --mask_prompt stuffed_toy \
    --image_names 0,1,2,3,4,5,6,7

Disable All Weighting (Simple Average)

python run_inference_weighted.py \
    --input_path ./data/example \
    --mask_prompt stuffed_toy \
    --image_names 0,1,2,3,4,5,6,7 \
    --no_stage1_weighting --no_stage2_weighting

Using Visibility Weighting (Requires DA3)

To use visibility-based weighting for Stage 2, you need to first run Depth Anything 3 (DA3) to obtain camera poses:

Step 1: Install Depth Anything 3

Please follow the installation instructions at Depth Anything 3.

Step 2: Run DA3 to get camera poses

python scripts/run_da3.py \
    --image_dir ./data/example/images \
    --output_dir ./da3_outputs/example

Step 3: Run weighted inference with visibility

python run_inference_weighted.py \
    --input_path ./data/example \
    --mask_prompt stuffed_toy \
    --image_names 0,1,2,3,4,5,6,7 \
    --da3_output ./da3_outputs/example/da3_output.npz \
    --stage2_weight_source visibility

Key Parameters

Basic

Parameter Description Default
--input_path Path to input directory Required
--mask_prompt Mask folder name None
--image_names Image names (comma-separated) All images
--da3_output Path to DA3 output (for visibility weighting) None

Stage 1 (Shape) Weighting

Parameter Description Default
--no_stage1_weighting Disable Stage 1 weighting False (enabled)
--stage1_entropy_layer Attention layer for weight computation 9
--stage1_entropy_alpha Entropy weighting sharpness 30.0

Stage 2 (Texture) Weighting

Parameter Description Default
--no_stage2_weighting Disable Stage 2 weighting False (enabled)
--stage2_weight_source entropy, visibility, or mixed entropy
--stage2_entropy_alpha Entropy weighting sharpness 30.0
--stage2_visibility_alpha Visibility weighting sharpness 30.0
--stage2_attention_layer Attention layer for weight computation 6
--self_occlusion_tolerance Tolerance for visibility detection 4.0

Visualization

Parameter Description Default
--visualize_weights Visualize fusion weights False
--compute_latent_visibility Visualize latent visibility per view False
--overlay_pointmap Overlay result on View 0 pointmap False
--merge_da3_glb Merge result with DA3 scene False

📖 Full Parameters: See README_PARAMETERS.md for detailed parameter documentation.

Data Structure

input_path/
├── images/
│   ├── 0.png
│   ├── 1.png
│   └── ...
└── object_name/  # mask folder
    ├── 0.png
    ├── 1.png
    └── ...

Mask Format: RGBA format where alpha channel stores mask (alpha=255 for object, alpha=0 for background).

Acknowledgments

This project builds upon SAM 3D Objects by Meta. We thank the original authors for their excellent work.

License

This project inherits the SAM License from the original SAM 3D Objects project.

About

SAM 3D Objects with Multi-view Images

Resources

License

Code of conduct

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published