Stars
Separate mask2former from framework
[IEEE TPAMI 2025] Uncertainty-aware Medical Diagnostic Phrase Identification and Grounding
[ICCV 2025] Official implementation of LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models
A collection of resources on personalized image generation.
A manually annotated test set for DeepLesion with 3D lesion boxes
Papers and resources on Controllable Generation using Diffusion Models, including ControlNet, DreamBooth, IP-Adapter.
A general fine-tuning kit geared toward diffusion models.
ControlLoRA Version 3: LoRA Is All You Need to Control the Spatial Information of Stable Diffusion.
[CVPR 2023] Label-Free Liver Tumor Segmentation
Repository for "ZePT: Zero-Shot Pan-Tumor Segmentation via Query-Disentangling and Self-Prompting," accepted at CVPR 2024.
ControlLoRA: A Lightweight Neural Network To Control Stable Diffusion Spatial Information
Diffusion Models in Medical Imaging (Published in Medical Image Analysis Journal)
Improved tumor synthesis leveraging radiology reports as prompts for diffusion models.
[ICLR 2025] This is the official repository of our paper "MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine"
[IEEE Transactions on Medical Imaging/TMI 2023] This repo is the official implementation of "LViT: Language meets Vision Transformer in Medical Image Segmentation"
Code for the paper "Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation", ECCV 2024
📚 A collection of papers about Referring Image Segmentation.
U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation
SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation
Semi-Supervised Learning for Medical Image Segmentation: a collection of literature reviews and code implementations.
A repo for Masked Image Modeling for 3D Medical Images
[ICCV 2023] VPD is a framework that leverages the high-level and low-level knowledge of a pre-trained text-to-image diffusion model for downstream visual perception tasks.
Collection of AWESOME vision-language models for vision tasks
Tracking and collecting papers/projects/others related to Segment Anything.
A curated list of foundation models for vision and language tasks in medical imaging