Yolo Y. Tang yunlong10

🕹️

Focusing

Ph.D. Student @ UR CS

125 followers · 53 following

Stars

WikiChao / DRIFT

Fine Tuning MLLMs with Reasoning Priors from DeepSeekR1

Python 6 Updated Oct 29, 2025

NVlabs / OmniVinci

OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.

Python 524 45 Updated Oct 29, 2025

nttmdlab-nlp / SlideVQA

SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images (AAAI2023)

Python 100 8 Updated Mar 31, 2025

MoonshotAI / Kimi-Linear

1,159 51 Updated Oct 31, 2025

QwenLM / Qwen3-Omni

Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 2,887 159 Updated Oct 9, 2025

deepseek-ai / DeepSeek-OCR

Contexts Optical Compression

Python 20,396 1,654 Updated Oct 25, 2025

mit-han-lab / streaming-vlm

StreamingVLM: Real-Time Understanding for Infinite Video Streams

Python 695 42 Updated Oct 15, 2025

HowieHwong / Agentic-Guardian

Building a Foundational Guardrail for General Agentic Systems via Synthetic Data

Python 33 1 Updated Oct 26, 2025

yunlong10 / Awesome-Video-LMM-Post-Training

🔥🔥🔥 Latest Papers, Codes and Datasets on Video-LMM Post-Training

Python 161 9 Updated Oct 28, 2025

Espere-1119-Song / VideoNSA

VideoNSA: Native Sparse Attention Scales Video Understanding

Python 60 1 Updated Nov 1, 2025

NJU-3DV / SpatialVID

SpatialVID: A Large-Scale Video Dataset with Spatial Annotations

Python 410 13 Updated Nov 5, 2025

zhaochen0110 / Awesome_Think_With_Images

Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.

1,121 36 Updated Oct 4, 2025

NIneeeeeem / LangDC

[EMNLP 2025 Oral] Official codebase for Seeing More, Saying More: Lightweight Language Experts are Dynamic Video Token Compressors.

Python 13 Updated Sep 7, 2025

zhang9302002 / ThinkingWithVideos

The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"

Python 64 Updated Oct 15, 2025

pipixin321 / Awesome-Video-MLLMs

🔥 🔥 🔥 Awesome MLLMs/Benchmarks for Short/Long/Streaming Video Understanding 📹

52 1 Updated Sep 1, 2025

NVlabs / Fast-dLLM

Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"

Python 667 53 Updated Oct 23, 2025

oumi-ai / oumi

Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM!

Python 8,609 659 Updated Nov 13, 2025

Kr-Panghu / LayerT2V-public

Repository for PrePrint: "LayerT2V: Interactive Multi-Object Trajectory Layering for Video Generation"

3 Updated Aug 7, 2025

saicaca / fuwari

✨A static blog template built with Astro.

Astro 3,520 939 Updated Nov 6, 2025

ki-lw / Awesome-MLLMs-for-Video-Temporal-Grounding

Latest Papers, Codes and Datasets on VTG-LLMs.

52 1 Updated Oct 14, 2025

ByteDance-Seed / m3-agent

Python 1,091 96 Updated Oct 22, 2025

bytedance / video-SALMONN-2

video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions, which is developed by the Department of Electronic Engineering at Tsin…

Python 112 8 Updated Oct 21, 2025

fla-org / flash-linear-attention

🚀 Efficient implementations of state-of-the-art linear attention models

Python 3,830 299 Updated Nov 14, 2025

facebookresearch / dinov3

Reference PyTorch implementation and models for DINOv3

Jupyter Notebook 8,278 565 Updated Nov 3, 2025

TencentARC / ToonComposer

Streamlining Cartoon Production with Generative Post-Keyframing

Python 466 41 Updated Aug 20, 2025

wonderunit / storyboarder

✏️ Storyboarder makes it easy to visualize a story as fast as you can draw stick figures.

JavaScript 3,493 342 Updated Mar 17, 2024

letta-ai / sleep-time-compute

Accompanying material for the sleep-time compute paper

Python 117 13 Updated Apr 30, 2025

openai / harmony

Renderer for the harmony response format to be used with gpt-oss

Rust 4,002 225 Updated Nov 5, 2025

TencentARC / ARC-Hunyuan-Video-7B

Structured Video Comprehension of Real-World Shorts

Python 216 8 Updated Sep 21, 2025

facebookresearch / PhysicsLM4

Physics of Language Models, Part 4

HTML 257 13 Updated Jul 29, 2025
