Towards Real-Time Diffusion-Based Streaming Video Super-Resolution — An efficient one-step diffusion framework for streaming VSR with locality-constrained sparse attention and a tiny conditional decoder.

PeterTPE/FlashVSR_plus

 
 

⚡ FlashVSR+

An optimized inference pipeline based on the FlashVSR project

Authors: Junhao Zhuang, Shi Guo, Xin Cai, Xiaohui Li, Yihao Liu, Chun Yuan, Tianfan Xue

Modified by: lihaoyun6

     

Your star means a lot to us as we develop this project!


🤔 What's New?

  • Replaced Block-Sparse-Attention with Sparse_SageAttention to avoid building complex CUDA kernels.
  • With the new tile_dit method, 1080p output is possible even with 8 GB of VRAM.
  • Added support for copying audio tracks to output files (powered by FFmpeg).
  • Introduced Blackwell GPU support for FlashVSR.
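
As a rough illustration of what tiled inference involves, the sketch below splits a frame into overlapping windows using the `--tile-size` and `--overlap` defaults from the help output (256 and 24). It is not taken from the FlashVSR+ code: the function names are hypothetical, and whether the project tiles the input frame or a latent, and how it blends the overlapping regions, are assumptions left open here.

```python
from typing import Iterator


def tile_starts(length: int, tile: int = 256, overlap: int = 24) -> list[int]:
    """Evenly spaced start offsets of windows covering [0, length)."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    stride = tile - overlap
    # Smallest window count whose neighbours overlap by at least `overlap`.
    count = max(1, -(-(length - overlap) // stride))
    if count == 1:
        return [0]  # the whole axis fits in a single window
    span = length - tile  # start offset of the last window
    return [i * span // (count - 1) for i in range(count)]


def tile_boxes(
    width: int, height: int, tile: int = 256, overlap: int = 24
) -> Iterator[tuple[int, int, int, int]]:
    """Yield (x0, y0, x1, y1) boxes in row-major order."""
    for y0 in tile_starts(height, tile, overlap):
        for x0 in tile_starts(width, tile, overlap):
            # Clip boxes when the frame is smaller than one tile.
            yield x0, y0, min(x0 + tile, width), min(y0 + tile, height)


if __name__ == "__main__":
    for box in tile_boxes(480, 270):
        print(box)
```

With these defaults a 480×270 frame is covered by four windows. Processing one window at a time is what bounds peak memory; the overlap gives neighbouring windows shared pixels that can be blended to hide seams.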

🚀 Getting Started

Follow these steps to set up and run FlashVSR on your local machine:

⚠️ Note: This project is primarily designed and optimized for 4× video super-resolution.
We strongly recommend using the 4× SR setting to achieve better results and stability. ✅

1️⃣ Clone the Repository

git clone https://github.com/lihaoyun6/FlashVSR_plus
cd FlashVSR_plus

2️⃣ Set Up the Python Environment

Create and activate the environment (Python 3.11.13):

conda create -n flashvsr python=3.11.13
conda activate flashvsr

Install project dependencies:

pip install -r requirements.txt

3️⃣ Download Model Weights

  • When you run FlashVSR+ for the first time, it will automatically download all required models from HuggingFace.

  • You can also manually download all files from FlashVSR and put them in the following location:

./models/FlashVSR/
│
├── LQ_proj_in.ckpt                                   
├── TCDecoder.ckpt                                    
├── Wan2.1_VAE.pth                                    
├── diffusion_pytorch_model_streaming_dmd.safetensors 
└── README.md
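
If you download the files manually, a quick check along these lines can confirm the layout before the first run. This is a minimal sketch, not part of the project: `missing_weights` is a hypothetical helper, and it only tests that the four weight files listed above exist, not that they are complete or uncorrupted.

```python
from pathlib import Path

# File names copied from the directory listing above (README.md is optional).
EXPECTED_FILES = (
    "LQ_proj_in.ckpt",
    "TCDecoder.ckpt",
    "Wan2.1_VAE.pth",
    "diffusion_pytorch_model_streaming_dmd.safetensors",
)


def missing_weights(model_dir: str = "./models/FlashVSR") -> list[str]:
    """Return the expected weight files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected weight files are present.")
```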

4️⃣ Run Inference

For example:

python run.py -i ./inputs/example0.mp4 -s 4 ./

To see all available options, run:

python run.py -h

usage: run.py [-h] [-i INPUT] [-s SCALE] [-m {tiny,full}] [--tiled-vae] [--tiled-dit] [--tile-size TILE_SIZE]
              [--overlap OVERLAP] [--unload-dit] [--color-fix] [--seed SEED] [-t {fp16,bf16}] [-d DEVICE]
              output_folder

FlashVSR+: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution.

positional arguments:
  output_folder         Path to save output video

options:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Path to video file or folder of images
  -s SCALE, --scale SCALE
                        Upscale factor, default=4
  -m {tiny,full}, --mode {tiny,full}
                        The type of pipeline to use, default=tiny
  --tiled-vae           Enable tile decoding
  --tiled-dit           Enable tile inference
  --tile-size TILE_SIZE
                        Chunk size of tile inference, default=256
  --overlap OVERLAP     Overlap size of tile inference, default=24
  --unload-dit          Unload DiT before decoding
  --color-fix           Correct output video color
  --seed SEED           Random Seed, default=0
  -t {fp16,bf16}, --dtype {fp16,bf16}
                        Data type for processing, default=bf16
  -d DEVICE, --device DEVICE
                        Device to run FlashVSR
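
For batch jobs it can be convenient to assemble the command line from Python. The sketch below uses only the flags shown in the help output above; `build_command` is a hypothetical helper, and combining `--tiled-vae`, `--tiled-dit`, and `--unload-dit` for low-VRAM runs is an assumption rather than a documented recommendation.

```python
import sys


def build_command(input_path, output_folder, *, scale=4, mode="tiny",
                  tiled_vae=False, tiled_dit=False, tile_size=None,
                  overlap=None, unload_dit=False, color_fix=False,
                  seed=None, dtype=None, device=None):
    """Build an argument list for run.py from the documented options."""
    if mode not in ("tiny", "full"):
        raise ValueError("mode must be 'tiny' or 'full'")
    if dtype is not None and dtype not in ("fp16", "bf16"):
        raise ValueError("dtype must be 'fp16' or 'bf16'")

    cmd = [sys.executable, "run.py", "-i", str(input_path),
           "-s", str(scale), "-m", mode]
    # Boolean switches are appended only when enabled.
    for enabled, flag in ((tiled_vae, "--tiled-vae"),
                          (tiled_dit, "--tiled-dit"),
                          (unload_dit, "--unload-dit"),
                          (color_fix, "--color-fix")):
        if enabled:
            cmd.append(flag)
    # Valued options are omitted when None so run.py uses its own defaults.
    for value, flag in ((tile_size, "--tile-size"), (overlap, "--overlap"),
                        (seed, "--seed"), (dtype, "-t"), (device, "-d")):
        if value is not None:
            cmd += [flag, str(value)]
    cmd.append(str(output_folder))  # positional argument goes last
    return cmd


if __name__ == "__main__":
    # Pass the result to subprocess.run(cmd, check=True) from the repository root.
    print(build_command("./inputs/example0.mp4", "./",
                        tiled_vae=True, tiled_dit=True, unload_dit=True))
```

Building a list rather than a single shell string avoids quoting problems with paths that contain spaces.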

🤗 Feedback & Support

We welcome feedback and issues. Thank you for trying FlashVSR+!


📄 Acknowledgments

We gratefully acknowledge the following open-source projects:


📞 Contact


📜 Citation

@misc{zhuang2025flashvsrrealtimediffusionbasedstreaming,
      title={FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution}, 
      author={Junhao Zhuang and Shi Guo and Xin Cai and Xiaohui Li and Yihao Liu and Chun Yuan and Tianfan Xue},
      year={2025},
      eprint={2510.12747},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.12747}, 
}
