Stars
Tongyi Deep Research, the Leading Open-source Deep Research Agent
U-Bench: A Comprehensive Understanding of U-Net through 100-Variant Benchmarking
[ICML 2025] MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Automated Quality Control and visual reports for Quality Assessment of structural (T1w, T2w) and functional MRI of the brain
Medical imaging processing for AI applications.
Codes for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning
Learning to Use Medical Tools with Multi-modal Agent
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Ph…
This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incentivize reas…
[ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement
[NAACL 2025] VividMed: Vision Language Model with Versatile Visual Grounding for Medicine
Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"
Official repository for the paper 'A Refer-and-Ground Multimodal Large Language Model for Biomedicine'
BiomedParse: A Foundation Model for Joint Segmentation, Detection, and Recognition of Biomedical Objects Across Nine Modalities
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
Solve Visual Understanding with Reinforced VLMs
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.
This is the repository of Quality Sentinel, a label quality evaluation model for medical image segmentation.
Weixiang-Sun / samexporter_all
Forked from vietanhdev/samexporter
Export Segment Anything Models to ONNX
