Stars
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Ph…
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
Solve Visual Understanding with Reinforced VLMs
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Medical imaging processing for AI applications.
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning
This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incentivize reas…
BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks
BiomedParse: A Foundation Model for Joint Segmentation, Detection, and Recognition of Biomedical Objects Across Nine Modalities
Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"
Automated Quality Control and visual reports for Quality Assessment of structural (T1w, T2w) and functional MRI of the brain
[ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'
Learning to Use Medical Tools with Multi-modal Agent
Implementation of Medfusion - A latent diffusion model for medical image synthesis.
Unofficial code for the VPT (Visual Prompt Tuning) paper, arXiv 2203.12119
U-Bench: A Comprehensive Understanding of U-Net through 100-Variant Benchmarking
[ICML 2025] MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding
Chinese clinical named entity recognition using pre-trained BERT model
OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement
The official repository of the paper 'A Refer-and-Ground Multimodal Large Language Model for Biomedicine'
Small tool using Selenium to get a temporary API endpoint for the ChatGPT image input / image recognition feature. Made very quickly; do not rely on it in production.
[NAACL 2025] VividMed: Vision Language Model with Versatile Visual Grounding for Medicine
This is the repository of Quality Sentinel, a label quality evaluation model for medical image segmentation.
Weixiang-Sun / samexporter_all
Forked from vietanhdev/samexporter
Export Segment Anything Models to ONNX
