
SAM with text prompt

Python 2,452 284 Updated Aug 28, 2025

[NeurIPS 2025] Reasoning MLLM, Share-GRPO, advantage vanishing, sparse reward

Python 26 1 Updated Sep 19, 2025

AgentFlow: In-the-Flow Agentic System Optimization

Python 1,206 145 Updated Nov 5, 2025

Splices the vision head of SmolVLM2 onto the Qwen3-0.6B model and fine-tunes the combination

Python 423 44 Updated Sep 8, 2025

OpenAGI: When LLM Meets Domain Experts

Python 2,205 200 Updated Nov 28, 2024

Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.

1,095 37 Updated Oct 4, 2025

MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search tools.

Python 347 17 Updated Aug 26, 2025

An open-source AI agent that brings the power of Gemini directly into your terminal.

TypeScript 81,967 9,167 Updated Nov 9, 2025

A powerful tool for creating fine-tuning datasets for LLM

JavaScript 11,720 1,133 Updated Nov 8, 2025

MM-Eureka V0, also called R1-Multimodal-Journey. The latest version is in MM-Eureka

Python 320 10 Updated Jun 21, 2025

✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 2,441 177 Updated Mar 28, 2025

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

Python 3,330 540 Updated Nov 8, 2025
Python 4,377 418 Updated Sep 14, 2025

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 16,142 1,288 Updated Oct 27, 2025

Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024]

Python 231 20 Updated Mar 23, 2025

MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone

Python 22,199 1,664 Updated Sep 24, 2025

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python 1,960 131 Updated Nov 7, 2025
Python 142 13 Updated May 23, 2024

[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…

Jupyter Notebook 8,471 543 Updated May 18, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

C++ 12,081 1,850 Updated Nov 10, 2025

Large World Model -- Modeling Text and Video with Millions Context

Python 7,366 560 Updated Oct 19, 2024

VideoSys: An easy and efficient system for video generation

Python 2,005 132 Updated Aug 27, 2025

Open-Sora: Democratizing Efficient Video Production for All

Python 27,804 2,760 Updated Apr 30, 2025

【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models

Python 2,266 140 Updated Jul 15, 2025

LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills

Python 760 58 Updated Feb 1, 2024

A Next-Generation Training Engine Built for Ultra-Large MoE Models

Python 4,969 381 Updated Nov 9, 2025

[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Python 942 48 Updated Oct 16, 2024
Python 714 47 Updated Mar 6, 2024
Python 629 33 Updated Feb 15, 2024

[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Python 397 37 Updated May 8, 2025