This is the official repository of "Multimodal Contrastive Pretraining of CBCT and IOS for Enhanced Tooth Segmentation."
# Multimodal Contrastive Pretraining of CBCT and IOS for Enhanced Tooth Segmentation
Moo Hyun (Kyle) Son1, Juyoung (Justin) Bae1, Zelin Qiu1, Jiale Peng3, Kai Xin Li2, Yifan Lin3, Hao Chen1
1The Hong Kong University of Science and Technology (HKUST), 2Delun Dental Hospital, 3The University of Hong Kong (HKU)

**Abstract.** Digital dentistry represents a transformative shift in modern dental practice. The foundational step in this transformation is the accurate digital representation of the patient's dentition, which is obtained from segmented Cone-Beam Computed Tomography (CBCT) and Intraoral Scans (IOS). Despite the growing interest in digital dental technologies, existing segmentation methodologies frequently lack rigorous validation and demonstrate limited performance and clinical applicability. To the best of our knowledge, this is the first work to introduce a multimodal pretraining framework for tooth segmentation. We present ToothMCL, a Tooth Multimodal Contrastive Learning framework for pretraining that integrates volumetric (CBCT) and surface-based (IOS) modalities. By capturing modality-invariant representations through multimodal contrastive learning, our approach effectively models fine-grained anatomical features, enabling precise multi-class segmentation and accurate identification of Fédération Dentaire Internationale (FDI) tooth numbering. Along with the framework, we curated CBCT-IOS3.8K, the largest paired CBCT and IOS dataset to date, comprising 3,867 patients. We then evaluated ToothMCL on a comprehensive collection of independent datasets, representing the largest and most diverse evaluation to date. Our method achieves state-of-the-art performance in both internal and external testing, with an increase of 12% for CBCT segmentation and 8% for IOS segmentation in the Dice Similarity Coefficient (DSC). Furthermore, ToothMCL consistently surpasses existing approaches across tooth groups and demonstrates robust generalizability across varying imaging conditions and clinical scenarios.
Our findings underscore the transformative potential of large-scale multimodal pretraining in digital dentistry and highlight the critical importance of effectively leveraging paired multimodal data. Our approach lays the foundation for enhanced clinical workflows, including caries detection, orthodontic simulation, and dental prosthesis design.
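The contrastive objective described above pulls paired CBCT and IOS embeddings of the same patient together while pushing non-matching pairs apart. As a rough illustration only (not the exact loss, temperature, or encoders used in the paper), a symmetric InfoNCE over a batch of paired embeddings can be sketched as:

```python
import numpy as np

def multimodal_info_nce(z_cbct, z_ios, tau=0.07):
    """Symmetric InfoNCE: paired CBCT/IOS embeddings attract, others repel.

    z_cbct, z_ios: (N, D) arrays; row i of each comes from the same patient.
    tau: softmax temperature (0.07 is a common default, assumed here).
    """
    a = z_cbct / np.linalg.norm(z_cbct, axis=1, keepdims=True)
    b = z_ios / np.linalg.norm(z_ios, axis=1, keepdims=True)
    sim = a @ b.T / tau  # (N, N) cosine similarities; diagonal = positive pairs

    def xent(logits):
        # cross-entropy with the diagonal entry as the target class
        logits = logits - logits.max(axis=1, keepdims=True)
        logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    # average over both retrieval directions: CBCT->IOS and IOS->CBCT
    return 0.5 * (xent(sim) + xent(sim.T))
```

Matched pairs drive this loss toward zero while mismatched pairs keep it high, which is what pushes the two encoders toward modality-invariant features.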
## Getting Started

To get started with this project, clone this repository to your local machine using the following commands:

```bash
git clone https://github.com/mhson-kyle/ToothMCL.git
cd ToothMCL
```

Before training the model, make sure you have the following requirements installed:

```bash
pip install -r requirements.txt
```

## Preprocessing

- CBCT: Resample to a 0.3 mm isotropic voxel size.
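The resampling script presumably interpolates each volume onto a uniform 0.3 mm grid. A minimal sketch of that idea with SciPy, assuming the volume is a NumPy array with known per-axis voxel spacing (this is an illustration, not the repository's `voxel_resample.py`):

```python
import numpy as np
from scipy.ndimage import zoom

def resample_isotropic(volume, spacing_mm, target_mm=0.3):
    """Linearly interpolate a CBCT volume onto an isotropic voxel grid."""
    factors = [s / target_mm for s in spacing_mm]  # >1 upsamples, <1 downsamples
    return zoom(volume, factors, order=1)          # order=1: trilinear

# e.g. a 0.6 mm volume doubles in size along every axis
vol = np.zeros((10, 12, 14))
iso = resample_isotropic(vol, spacing_mm=(0.6, 0.6, 0.6))
```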
```bash
cd preprocessing
python voxel_resample.py --input_dir path/to/input_dir --output_dir path/to/output_dir
```

- CBCT: Crop to the region of interest (ROI).
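The crop step presumably trims each scan down to the labeled tooth region. A minimal sketch, assuming the image and label are aligned NumPy volumes and using a hypothetical `margin` parameter (not the repository's `crop_roi.py`):

```python
import numpy as np

def crop_to_roi(image, label, margin=8):
    """Crop image and label to the bounding box of nonzero label voxels."""
    coords = np.argwhere(label > 0)
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + 1 + margin, label.shape)
    roi = tuple(slice(l, h) for l, h in zip(lo, hi))
    return image[roi], label[roi]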
```bash
cd preprocessing
python crop_roi.py --images_dir path/to/images_dir --labels_dir path/to/labels_dir --save_dir path/to/save_dir
```

- IOS: Farthest point sampling to 24,000 points per jaw.
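Greedy farthest point sampling keeps the subset of points that best covers the scanned surface: each iteration picks the point farthest from everything chosen so far. The repository's `fps.py` runs on the GPU (hence `CUDA_VISIBLE_DEVICES`), but a minimal CPU sketch of the same idea is:

```python
import numpy as np

def farthest_point_sampling(points, n_samples):
    """Return indices of n_samples points chosen by greedy FPS."""
    chosen = np.zeros(n_samples, dtype=int)  # start from point 0
    min_d2 = np.full(len(points), np.inf)    # squared distance to chosen set
    for i in range(1, n_samples):
        d2 = ((points - points[chosen[i - 1]]) ** 2).sum(axis=1)
        min_d2 = np.minimum(min_d2, d2)      # update distance to nearest chosen
        chosen[i] = int(np.argmax(min_d2))   # pick the farthest remaining point
    return chosen
```

For an IOS jaw this would be called with `n_samples=24000` over the mesh vertices.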
```bash
cd preprocessing
CUDA_VISIBLE_DEVICES=0 python fps.py --obj_dir path/to/obj_dir --json_dir path/to/json_dir --save_dir path/to/save_dir
```

## Training

- Prepare your dataset in the required format.
- Adjust the configuration files to suit your training needs.
- Run the following commands to train the model:
### Pretraining

```bash
cd pretrain
CUDA_VISIBLE_DEVICES=0 python train.py --config path/to/config.yaml
```

### Fine-tuning: CBCT Segmentation

```bash
cd finetune/cbct_segmentation
sh scripts/train.sh
```

### Fine-tuning: IOS Segmentation

```bash
cd finetune/ios_segmentation
CUDA_VISIBLE_DEVICES=0 python train.py --config path/to/config.yaml
```

## Acknowledgements

We sincerely thank those who have open-sourced their work, including but not limited to the repositories below:
## Citation

If you find this repository useful, please consider citing:

```bibtex
@article{Son2025ToothMCL,
  title={Multimodal Contrastive Pretraining of CBCT and IOS for Enhanced Tooth Segmentation},
  author={Son, Moo Hyun and Bae, Juyoung and Qiu, Zelin and Peng, Jiale and Li, Kai Xin and Lin, Yifan and Chen, Hao},
  journal={Under Review},
  year={2025}
}
```