This repo has a Pinokio 1-click-installer available here:
Installing outside of Pinokio requires ffmpeg on PATH and a self-installed PyTorch. Torch install info can be seen here:
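For a manual install, a quick check like the one below can confirm that both prerequisites are visible to Python before launching the app. This is a minimal sketch and not part of this repo: the function name `check_prerequisites` is hypothetical, and it only tests that `ffmpeg` is discoverable on PATH and that a `torch` package is importable, not that the installed build matches your GPU.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Report whether ffmpeg and PyTorch can be found (hypothetical helper)."""
    return {
        # shutil.which returns the executable's path, or None if it is not on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # find_spec returns None when the top-level package cannot be located.
        "torch": importlib.util.find_spec("torch") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If either entry reports as missing, fix that before starting the UI.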
Forked from: lihaoyun6/FlashVSR_plus
Original Project: OpenImagingLab/FlashVSR
A user-friendly fork of FlashVSR, enhanced and packaged for the Pinokio community. This version is optimized for consumer-grade hardware, enabling users to access powerful video and image upscaling without the demanding VRAM requirements of the original project.
FlashVSR was generously released to the open-source community by OpenImagingLab. Their team's README is reproduced below!
This project builds upon the excellent fork lihaoyun6/FlashVSR_plus, which introduced several key optimizations to the original FlashVSR project.
The FlashVSR_plus fork laid the groundwork with several notable enhancements, including:
- Replaced Block-Sparse-Attention with Sparse_SageAttention.
- Added DiT tiling and other memory optimizations to significantly reduce VRAM requirements.
- Implemented the initial Gradio user interface.
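To illustrate the general idea behind tiling as a memory optimization, the sketch below splits a frame into overlapping tiles so that each tile can be processed separately and peak memory scales with the tile size instead of the full frame. It is an assumption-laden illustration, not the fork's actual implementation: the function names, the default tile size of 256, and the overlap of 32 are all hypothetical.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of tiles of size `tile` that cover [0, length)."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    if tile >= length:
        return [0]  # a single tile already covers the whole axis
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the far edge
    return starts


def tile_boxes(height, width, tile=256, overlap=32):
    """Return (top, left, bottom, right) boxes covering a height x width frame.

    Hypothetical helper: tile and overlap defaults are illustrative only.
    """
    return [
        (y, x, min(y + tile, height), min(x + tile, width))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]
```

In a real pipeline the overlapping borders would typically be blended when the processed tiles are stitched back together, which helps hide seams.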
This fork further refines the user experience and expands functionality, focusing on quality-of-life improvements and adding several new tools:
- Enhanced Gradio UI: The interface has been redesigned for a more intuitive workflow, including dedicated tabs for additional tasks.
- Improved Memory Management: Optimizations and internal fixes to ensure smooth operation on consumer hardware.
- Chunked Video Processing: Easily upscale longer videos without running into memory limitations.
- Image Upscaling: A new feature that brings the power of FlashVSR to still images.
- Post-Processing Toolbox: A suite of useful post-processing tools for RIFE frame interpolation, seamless video looping, and extra compression/export options.
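The chunked processing mentioned above can be pictured as follows: a long video is split into fixed-size runs of frames, each run is upscaled on its own, and the results are concatenated. This is only a sketch under stated assumptions. `chunk_ranges`, `upscale_in_chunks`, the `upscale_chunk` callback, and the default chunk size of 48 frames are hypothetical names and values, not this repo's API, and the actual implementation may handle chunk boundaries differently.

```python
def chunk_ranges(total_frames, chunk_size):
    """Yield (start, end) frame index pairs; `end` is exclusive."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total_frames, chunk_size):
        yield start, min(start + chunk_size, total_frames)


def upscale_in_chunks(frames, upscale_chunk, chunk_size=48):
    """Apply `upscale_chunk` to successive slices of `frames`.

    Only one chunk is handed to the model at a time, so peak memory
    depends on chunk_size and not on the length of the whole video.
    """
    output = []
    for start, end in chunk_ranges(len(frames), chunk_size):
        output.extend(upscale_chunk(frames[start:end]))
    return output
```

For example, a 100-frame clip with a chunk size of 48 is processed as frames 0-47, 48-95, and 96-99.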
Towards Real-Time Diffusion-Based Streaming Video Super-Resolution
Authors: Junhao Zhuang, Shi Guo, Xin Cai, Xiaohui Li, Yihao Liu, Chun Yuan, Tianfan Xue
Your star means a lot to us as we develop this project! ⭐
Diffusion models have recently advanced video restoration, but applying them to real-world video super-resolution (VSR) remains challenging due to high latency, prohibitive computation, and poor generalization to ultra-high resolutions. Our goal in this work is to make diffusion-based VSR practical by achieving efficiency, scalability, and real-time performance. To this end, we propose FlashVSR, the first diffusion-based one-step streaming framework towards real-time VSR. FlashVSR runs at ∼17 FPS for 768 × 1408 videos on a single A100 GPU by combining three complementary innovations: (i) a train-friendly three-stage distillation pipeline that enables streaming super-resolution, (ii) locality-constrained sparse attention that cuts redundant computation while bridging the train–test resolution gap, and (iii) a tiny conditional decoder that accelerates reconstruction without sacrificing quality. To support large-scale training, we also construct VSR-120K, a new dataset with 120k videos and 180k images. Extensive experiments show that FlashVSR scales reliably to ultra-high resolutions and achieves state-of-the-art performance with up to ∼12× speedup over prior one-step diffusion VSR models.
The overview of FlashVSR. This framework features:
- Three-Stage Distillation Pipeline for streaming VSR training.
- Locality-Constrained Sparse Attention to cut redundant computation and bridge the train–test resolution gap.
- Tiny Conditional Decoder for efficient, high-quality reconstruction.
- VSR-120K Dataset consisting of 120k videos and 180k images, supporting joint training on both images and videos.
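As a rough intuition for why constraining attention to a local neighborhood cuts computation, the toy example below builds a 1D mask in which each query token may only attend to keys within a fixed radius, then measures what fraction of query-key pairs survive. This is a deliberately simplified illustration and not FlashVSR's actual attention mechanism: the function names and the 1D windowing scheme are assumptions made for the example.

```python
def local_attention_mask(num_tokens, radius):
    """mask[q][k] is True when key k lies within `radius` of query q."""
    return [
        [abs(q - k) <= radius for k in range(num_tokens)]
        for q in range(num_tokens)
    ]


def kept_fraction(mask):
    """Fraction of query-key pairs that the mask leaves active."""
    total = sum(len(row) for row in mask)
    kept = sum(sum(row) for row in mask)
    return kept / total


if __name__ == "__main__":
    mask = local_attention_mask(num_tokens=1024, radius=32)
    print(f"active pairs: {kept_fraction(mask):.1%}")
```

With a fixed radius, the number of active pairs grows linearly with the token count, whereas dense attention grows quadratically.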
We welcome feedback and issues. Thank you for trying FlashVSR!
We gratefully acknowledge the following open-source projects:
- DiffSynth Studio — https://github.com/modelscope/DiffSynth-Studio
- Block-Sparse-Attention — https://github.com/mit-han-lab/Block-Sparse-Attention
- taehv — https://github.com/madebyollin/taehv
- Junhao Zhuang Email: [email protected]
@misc{zhuang2025flashvsrrealtimediffusionbasedstreaming,
title={FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution},
author={Junhao Zhuang and Shi Guo and Xin Cai and Xiaohui Li and Yihao Liu and Chun Yuan and Tianfan Xue},
year={2025},
eprint={2510.12747},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2510.12747},
}
