🤡 School
- 🤡 Gotham
- clownrat6.github.io
- @clownrat66
Stars
[NeurIPS'25] One-Step Diffusion for Detail-Rich and Temporally Consistent Video Super-Resolution
Efficient Triton Kernels for LLM Training
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
📰 Must-read papers and blogs on LLM-based Long Context Modeling 🔥
Unified automatic quality assessment for speech, music, and sound.
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
Common Bilibili API calls. Supports video, anime series, users, channels, audio, and more. Original repository: https://github.com/MoyuScript/bilibili-api
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions, which is developed by the Department of Electronic Engineering at Tsin…
[ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
A paper list of recent work on token compression for ViT and VLM
Kimi K2 is the large language model series developed by the Moonshot AI team
[CVPR 2024 Highlight] Official GraCo: Granularity-Controllable Interactive Segmentation.
Summary about Video-to-Text datasets. This repository is part of the review paper *Bridging Vision and Language from the Video-to-Text Perspective: A Comprehensive Review*
[MathCoder, MathCoder-VL] Family of LLMs/LMMs for mathematical reasoning.
Official repository for "AM-RADIO: Reduce All Domains Into One"
DELTA: Dense Efficient Long-range 3D Tracking for Any video (ICLR 2025)
[CVPR 2023] Official implementation of the paper: Fine-grained Audible Video Description
Official repository of "Event-based Video Frame Interpolation with Cross-Modal Asymmetric Bidirectional Motion Fields", CVPR 2023 paper (highlight)
[ICLR 2025, Oral] EmbodiedSAM: Online Segment Any 3D Thing in Real Time
🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels



