
# EyeCLIP: A Multimodal Visual–Language Foundation Model for Computational Ophthalmology

Python 3.8+ · PyTorch

Official repository for **EyeCLIP**, a vision–language foundation model designed for multimodal ophthalmic image analysis.

📝 **Paper:** *A Multimodal Visual–Language Foundation Model for Computational Ophthalmology* (npj Digital Medicine, 2025)


## 🔍 Overview

EyeCLIP adapts the CLIP (Contrastive Language–Image Pretraining) architecture to the unique challenges of ophthalmology. It combines self-supervised learning, multimodal image contrastive learning, and hierarchical keyword-guided vision–language supervision. Together, these components enable zero-shot disease recognition, cross-modal retrieval, and efficient fine-tuning across a wide range of ophthalmic and systemic conditions.
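
At the core of this recipe is CLIP's symmetric image–text contrastive objective. The sketch below is illustrative only, not the repository's training code; EyeCLIP's full objective adds the self-supervised and keyword-guided terms described in the paper.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched image-text pairs."""
    image_emb = F.normalize(image_emb, dim=-1)           # unit-normalize embeddings
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature      # pairwise similarities
    targets = torch.arange(len(logits), device=logits.device)  # diagonal = matched pairs
    loss_i2t = F.cross_entropy(logits, targets)          # image -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)      # text -> image direction
    return (loss_i2t + loss_t2i) / 2
```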


## ✨ Key Features

- 🧠 **Multimodal Support.** Natively pretrained on 11 ophthalmic modalities with a single shared encoder, including:
  - Color Fundus Photography (CFP)
  - Optical Coherence Tomography (OCT)
  - Fundus Fluorescein Angiography (FFA)
  - Indocyanine Green Angiography (ICGA)
  - Fundus Autofluorescence (FAF)
  - Slit Lamp Photography
  - Ocular Ultrasound (OUS)
  - Specular Microscopy
  - External Eye Photography
  - Corneal Photography
  - RetCam Imaging
- 🔗 **CLIP-based Vision–Language Pretraining.** A tailored adaptation of OpenAI's CLIP for ophthalmic imaging and medical-language semantics.
- 🚀 **Zero-Shot Generalization.** Classifies both ophthalmic and systemic diseases from natural-language prompts, without task-specific fine-tuning.
- 🧩 **Versatile and Adaptable.** Easily fine-tuned for downstream diagnostic tasks, including multi-label classification, systemic disease prediction, and rare disease diagnosis.


## 🗞️ News

- **2025-07:** Initial release of pretrained EyeCLIP model weights
- **2025-06:** Paper accepted by npj Digital Medicine
- **2025-03:** Public release of the EyeCLIP codebase

## ⚙️ Installation

Set up the environment using conda and pip:

```bash
conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
git clone https://github.com/Michi-3000/EyeCLIP.git
cd EyeCLIP
```
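
A quick sanity check that the environment is working (downloads the base ViT-B/32 weights on first run, so it assumes internet access):

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the base CLIP backbone that EyeCLIP builds on.
model, preprocess = clip.load("ViT-B/32", device=device)
print(clip.available_models())
```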

## 🎯 Pretrained Weights

| Model Name | Description | Download Link |
| --- | --- | --- |
| `eyeclip_visual` | Multimodal foundation model trained on diverse ophthalmic data | 🔗 Google Drive |

## 📁 Dataset Preparation

To prepare datasets for pretraining or downstream evaluation:

1. Download the datasets referenced in the paper.
2. Organize them into the following layout:

```
dataset_root/
├── images/
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
└── labels.csv
```

3. `labels.csv` should follow this format:

```
impath,class
/path/to/image1.jpg,0
/path/to/image2.jpg,1
```
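
For reference, a minimal PyTorch dataset reading this layout might look like the sketch below. This is a hypothetical helper for illustration; the repository's fine-tuning scripts implement their own data loading.

```python
import pandas as pd
from PIL import Image
from torch.utils.data import Dataset

class LabeledImageDataset(Dataset):
    """Hypothetical loader for the labels.csv format above."""

    def __init__(self, csv_path, transform=None):
        self.df = pd.read_csv(csv_path)   # columns: impath, class
        self.transform = transform        # e.g. CLIP's preprocess

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        image = Image.open(row["impath"]).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image, int(row["class"])
```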

## 🚀 Quick Start

### 🔍 Zero-Shot Evaluation

```bash
python zero_shot.py \
    --data_path ./your_dataset \
    --text_prompts "normal retina,diabetic retinopathy,glaucoma"
```
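
`zero_shot.py` is the supported entry point. For intuition, the minimal sketch below shows how CLIP-style zero-shot scoring works: each prompt is embedded, and the image is classified by its most similar prompt. The checkpoint-loading step and `example_fundus.jpg` are assumptions for illustration; check `zero_shot.py` for the exact weight-loading logic.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Assumption: eyeclip_visual.pt stores weights for the CLIP visual encoder;
# see the repository's scripts for the exact checkpoint format.
state = torch.load("eyeclip_visual.pt", map_location=device)
model.visual.load_state_dict(state, strict=False)

prompts = ["normal retina", "diabetic retinopathy", "glaucoma"]
image = preprocess(Image.open("example_fundus.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(prompts).to(device)

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(text)
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_feat @ text_feat.T).softmax(dim=-1)

print(dict(zip(prompts, probs[0].tolist())))
```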

### 🩺 Fine-Tuning

#### Ophthalmic Disease Classification

```bash
bash scripts/cls_ophthal.sh
```

Or run the script's contents directly with custom parameters:

```bash
current_time=$(date +"%Y-%m-%d-%H%M")
FINETUNE_CHECKPOINT="eyeclip_visual.pt"   # pretrained EyeCLIP weights
DATA_PATH="/data/public/"

# Downstream ophthalmic classification benchmarks
DATASET_NAMES=("IDRiD" "Aptos2019" "Glaucoma_Fundus" "JSIEC" "Retina" "MESSIDOR2" "OCTID" "PAPILA")

for DATASET_NAME in "${DATASET_NAMES[@]}"; do
    echo "=============================================="
    echo "Processing dataset: $DATASET_NAME"
    echo "=============================================="

    python main_finetune_ophthal.py \
        --data_path "${DATA_PATH}" \
        --data_name "${DATASET_NAME}" \
        --finetune "${FINETUNE_CHECKPOINT}" \
        --clip_model_type "ViT-B/32" \
        --batch_size 64 \
        --epochs 50 \
        --lr 1e-4 \
        --weight_decay 0.01 \
        --output_dir "./classification_results/${current_time}" \
        --warmup_epochs 5 \
        --test_num 5

    echo "Finished processing dataset: $DATASET_NAME"
    echo ""
done

echo "All datasets processed successfully!"
```

#### Systemic Disease Classification

```bash
bash scripts/cls_chro.sh
```

Or with custom parameters:

```bash
current_time=$(date +"%Y-%m-%d-%H%M")
FINETUNE_CHECKPOINT="eyeclip_visual.pt"

CUDA_VISIBLE_DEVICES=0 python main_finetune_chro.py \
    --finetune "${FINETUNE_CHECKPOINT}" \
    --clip_model_type "ViT-B/32" \
    --batch_size 64 \
    --epochs 50 \
    --lr 1e-4 \
    --weight_decay 0.01 \
    --output_dir "./classification_results/${current_time}" \
    --warmup_epochs 5 \
    --test_num 5
```

## 🧪 Pretraining from Scratch

To pretrain EyeCLIP on your own dataset:

```bash
python CLIP_ft_all_1enc_all.py
```

## 📚 Scripts and Utilities

| Script | Purpose |
| --- | --- |
| `main_finetune_ophthal.py` | Fine-tuning on ophthalmic disease datasets |
| `main_finetune_chro.py` | Fine-tuning for systemic (chronic) disease detection |
| `zero_shot.py` | Zero-shot classification using language prompts |
| `retrieval.py` | Cross-modal image–text retrieval |
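
`retrieval.py` handles retrieval end to end. Conceptually, cross-modal retrieval reduces to ranking precomputed embeddings by cosine similarity, as in this small sketch (function name and shapes are illustrative, not the script's API):

```python
import torch

def retrieve_top_k(query_emb, gallery_embs, k=5):
    """Rank gallery embeddings against a single query embedding.

    Assumes both sides are L2-normalized (as in the zero-shot example),
    so the dot product is cosine similarity.
    Shapes: gallery_embs (N, D), query_emb (D,).
    """
    scores = gallery_embs @ query_emb                 # (N,) similarity scores
    topk = torch.topk(scores, k=min(k, len(scores)))  # best-matching items
    return topk.indices.tolist(), topk.values.tolist()
```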

## 📖 Citation

If you use EyeCLIP in your research, please cite:

```bibtex
@article{shi2025multimodal,
  title={A multimodal visual--language foundation model for computational ophthalmology},
  author={Shi, Danli and Zhang, Weiyi and Yang, Jiancheng and Huang, Siyu and Chen, Xiaolan and Xu, Pusheng and Jin, Kai and Lin, Shan and Wei, Jin and Yusufu, Mayinuer and others},
  journal={npj Digital Medicine},
  volume={8},
  number={1},
  pages={381},
  year={2025},
  publisher={Nature Publishing Group UK London}
}
```

## 🤝 Acknowledgements

This project builds upon prior open-source contributions, especially:

- **CLIP** – Contrastive Language–Image Pretraining by OpenAI
- **MAE** – Masked Autoencoders by Facebook AI Research

We thank the open-source community and the medical imaging research ecosystem for their invaluable contributions.
