
[v.0.6] FlashVSR - Video (and Image) Upscaler. [Runs on 12GB vram, 32GB ram] Diffusion-Based Streaming Video Super-Resolution


ai-anchorite/FlashVSR_plus

 
 


This repo has a Pinokio 1-click-installer available here:

Installing outside of Pinokio requires ffmpeg on PATH and a self-installed PyTorch. Torch install info can be seen here:

Forked from: lihaoyun6/FlashVSR_plus

Original Project: OpenImagingLab/FlashVSR
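
For installs outside of Pinokio, the two prerequisites mentioned above (ffmpeg on PATH and an installed PyTorch) can be checked before launching. The snippet below is a minimal sketch using only the Python standard library; the function name `check_prerequisites` is illustrative and not part of this repo.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Report whether ffmpeg and PyTorch are discoverable (illustrative helper)."""
    return {
        # shutil.which returns the executable's full path, or None if it is not on PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # find_spec locates the package without importing it
        "torch": importlib.util.find_spec("torch") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

This only confirms that both are discoverable; it does not verify that the installed PyTorch build matches your GPU and driver.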

FlashVSR: Efficient & High-Quality Video Super-Resolution

A user-friendly fork of FlashVSR, enhanced and packaged for the Pinokio community. This version is optimized for consumer-grade hardware, enabling users to access powerful video and image upscaling without the demanding VRAM requirements of the original project.

Tab Screenshots
[Screenshots: video tab, image tab, toolbox tab]

Project Background

FlashVSR was generously released to the open-source community by OpenImagingLab. Their original README is reproduced below.

This project builds upon the excellent fork lihaoyun6/FlashVSR_plus, which introduced several key optimizations to the original FlashVSR project.

Features from Upstream Fork

The FlashVSR_plus fork laid the groundwork with several notable enhancements, including:

  • Replaced Block-Sparse-Attention with Sparse_SageAttention.
  • Added DiT tiling and other memory optimizations to significantly reduce VRAM requirements.
  • Implemented the initial Gradio user interface.
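
To illustrate the idea behind tiling in general: a large frame is split into overlapping tiles that are processed one at a time, so peak memory depends on the tile size rather than the full frame. The sketch below only computes tile coordinates. It is not the fork's implementation, and the `tile` and `overlap` defaults are assumed values chosen for illustration.

```python
def tile_boxes(height, width, tile=256, overlap=32):
    """Yield (top, left, bottom, right) boxes covering a height x width frame.

    Hypothetical helper: tile and overlap are illustrative defaults.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than tile")
    stride = tile - overlap

    def starts(size):
        # A single tile is enough when the frame fits inside one tile.
        if size <= tile:
            return [0]
        out = list(range(0, size - tile, stride))
        out.append(size - tile)  # last tile sits flush with the frame edge
        return out

    for top in starts(height):
        for left in starts(width):
            yield top, left, min(top + tile, height), min(left + tile, width)


if __name__ == "__main__":
    boxes = list(tile_boxes(768, 1408))
    print(len(boxes), "tiles; first:", boxes[0], "last:", boxes[-1])
```

Neighbouring tiles share `overlap` pixels so their results can be blended at the seams; the blending itself is omitted here.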

Enhancements in This Version

This fork further refines the user experience with a focus on quality-of-life improvements, and expands functionality with several new tools.

  • Enhanced Gradio UI: The interface has been redesigned for a more intuitive workflow, including dedicated tabs for additional tasks.
  • Improved Memory Management: Optimizations and internal fixes to ensure smooth operation on consumer hardware.
  • Chunked Video Processing: Easily upscale longer videos without running into memory limitations.
  • Image Upscaling: A new feature that brings the power of FlashVSR to still images.
  • Post-Processing Toolbox: A suite of useful post-processing tools for RIFE frame interpolation, seamless video looping, and extra compression/export options.
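
As a rough picture of how chunked processing keeps memory bounded, a long video can be split into fixed-size frame ranges that are upscaled one after another. The sketch below is a generic illustration, not this fork's code; `chunk_size` and `overlap` are assumed parameters, and whether the app overlaps chunks at all is not stated here.

```python
def chunk_ranges(num_frames, chunk_size=64, overlap=4):
    """Return (start, end) frame ranges, end exclusive, covering num_frames.

    Hypothetical helper: chunk_size and overlap are illustrative defaults.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    ranges = []
    start = 0
    while start < num_frames:
        end = min(start + chunk_size, num_frames)
        ranges.append((start, end))
        if end == num_frames:
            break
        # Step back by `overlap` frames so adjacent chunks share context.
        start = end - overlap
    return ranges


if __name__ == "__main__":
    print(chunk_ranges(150))  # [(0, 64), (60, 124), (120, 150)]
```

Only one chunk's frames need to be resident at a time, which is why the total video length stops being the limiting factor.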

Original Project Details below:

⚡ FlashVSR

Towards Real-Time Diffusion-Based Streaming Video Super-Resolution

Authors: Junhao Zhuang, Shi Guo, Xin Cai, Xiaohui Li, Yihao Liu, Chun Yuan, Tianfan Xue

       

Your star means a lot to us as we develop this project!


🌟 Abstract

Diffusion models have recently advanced video restoration, but applying them to real-world video super-resolution (VSR) remains challenging due to high latency, prohibitive computation, and poor generalization to ultra-high resolutions. Our goal in this work is to make diffusion-based VSR practical by achieving efficiency, scalability, and real-time performance. To this end, we propose FlashVSR, the first diffusion-based one-step streaming framework towards real-time VSR. FlashVSR runs at ∼17 FPS for 768 × 1408 videos on a single A100 GPU by combining three complementary innovations: (i) a train-friendly three-stage distillation pipeline that enables streaming super-resolution, (ii) locality-constrained sparse attention that cuts redundant computation while bridging the train–test resolution gap, and (iii) a tiny conditional decoder that accelerates reconstruction without sacrificing quality. To support large-scale training, we also construct VSR-120K, a new dataset with 120k videos and 180k images. Extensive experiments show that FlashVSR scales reliably to ultra-high resolutions and achieves state-of-the-art performance with up to ∼12× speedup over prior one-step diffusion VSR models.


🛠️ Method

The overview of FlashVSR. This framework features:

  • Three-Stage Distillation Pipeline for streaming VSR training.
  • Locality-Constrained Sparse Attention to cut redundant computation and bridge the train–test resolution gap.
  • Tiny Conditional Decoder for efficient, high-quality reconstruction.
  • VSR-120K Dataset of 120k videos and 180k images, supporting joint training on both images and videos.
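
To give a feel for why restricting attention to a local neighbourhood cuts computation, the sketch below counts query-key pairs on a 2D token grid when each token attends only to tokens within a fixed radius, and compares that with dense attention. This is a toy illustration under assumed names (`local_window_pairs`, `radius`); it is not the FlashVSR attention kernel, and the actual sparsity pattern in the paper may differ.

```python
def dense_pairs(grid_h, grid_w):
    """Number of query-key pairs in full attention over a grid_h x grid_w token grid."""
    n = grid_h * grid_w
    return n * n


def local_window_pairs(grid_h, grid_w, radius):
    """Pairs kept when each token attends only within `radius` rows and columns.

    Hypothetical helper: the square window is an illustrative choice.
    """
    kept = 0
    for qy in range(grid_h):
        for qx in range(grid_w):
            # Window is clipped at the grid borders.
            rows = min(qy + radius, grid_h - 1) - max(qy - radius, 0) + 1
            cols = min(qx + radius, grid_w - 1) - max(qx - radius, 0) + 1
            kept += rows * cols
    return kept


if __name__ == "__main__":
    h, w, r = 48, 88, 8  # illustrative grid size and radius
    kept, full = local_window_pairs(h, w, r), dense_pairs(h, w)
    print(f"kept {kept} of {full} pairs ({kept / full:.1%})")
```

With a fixed radius the kept pairs grow linearly with the number of tokens, while dense attention grows quadratically, so the saving increases at higher resolutions.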


🤗 Feedback & Support

We welcome feedback and issues. Thank you for trying FlashVSR!


📄 Acknowledgments

We gratefully acknowledge the following open-source projects:


📞 Contact


📜 Citation

@misc{zhuang2025flashvsrrealtimediffusionbasedstreaming,
      title={FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution}, 
      author={Junhao Zhuang and Shi Guo and Xin Cai and Xiaohui Li and Yihao Liu and Chun Yuan and Tianfan Xue},
      year={2025},
      eprint={2510.12747},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.12747}, 
}
