Fine Tuning MLLMs with Reasoning Priors from DeepSeekR1

Python 6 Updated Oct 29, 2025

OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.

Python 524 45 Updated Oct 29, 2025

SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images (AAAI2023)

Python 100 8 Updated Mar 31, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 2,887 159 Updated Oct 9, 2025

Contexts Optical Compression

Python 20,396 1,654 Updated Oct 25, 2025

StreamingVLM: Real-Time Understanding for Infinite Video Streams

Python 695 42 Updated Oct 15, 2025

Building a Foundational Guardrail for General Agentic Systems via Synthetic Data

Python 33 1 Updated Oct 26, 2025

🔥🔥🔥 Latest Papers, Codes and Datasets on Video-LMM Post-Training

Python 161 9 Updated Oct 28, 2025

VideoNSA: Native Sparse Attention Scales Video Understanding

Python 60 1 Updated Nov 1, 2025

SpatialVID: A Large-Scale Video Dataset with Spatial Annotations

Python 410 13 Updated Nov 5, 2025

Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.

1,121 36 Updated Oct 4, 2025

[EMNLP 2025 Oral] Official codebase for Seeing More, Saying More: Lightweight Language Experts are Dynamic Video Token Compressors.

Python 13 Updated Sep 7, 2025

The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"

Python 64 Updated Oct 15, 2025

🔥 🔥 🔥 Awesome MLLMs/Benchmarks for Short/Long/Streaming Video Understanding 📹

52 1 Updated Sep 1, 2025

Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"

Python 667 53 Updated Oct 23, 2025

Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM!

Python 8,609 659 Updated Nov 13, 2025

Repository for PrePrint: "LayerT2V: Interactive Multi-Object Trajectory Layering for Video Generation"

3 Updated Aug 7, 2025

✨A static blog template built with Astro.

Astro 3,520 939 Updated Nov 6, 2025

Latest Papers, Codes and Datasets on VTG-LLMs.

52 1 Updated Oct 14, 2025

Python 1,091 96 Updated Oct 22, 2025

video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions, which is developed by the Department of Electronic Engineering at Tsin…

Python 112 8 Updated Oct 21, 2025

🚀 Efficient implementations of state-of-the-art linear attention models

Python 3,830 299 Updated Nov 14, 2025

Reference PyTorch implementation and models for DINOv3

Jupyter Notebook 8,278 565 Updated Nov 3, 2025

Streamlining Cartoon Production with Generative Post-Keyframing

Python 466 41 Updated Aug 20, 2025

✏️ Storyboarder makes it easy to visualize a story as fast as you can draw stick figures.

JavaScript 3,493 342 Updated Mar 17, 2024

Accompanying material for the sleep-time compute paper

Python 117 13 Updated Apr 30, 2025

Renderer for the harmony response format to be used with gpt-oss

Rust 4,002 225 Updated Nov 5, 2025

Structured Video Comprehension of Real-World Shorts

Python 216 8 Updated Sep 21, 2025

Physics of Language Models, Part 4

HTML 257 13 Updated Jul 29, 2025