Optimized inference pipeline based on the FlashVSR project.
Authors: Junhao Zhuang, Shi Guo, Xin Cai, Xiaohui Li, Yihao Liu, Chun Yuan, Tianfan Xue
Modified by: lihaoyun6

Your star means a lot to us in developing this project! ⭐
- Replaced Block-Sparse-Attention with Sparse_SageAttention to avoid building complex CUDA kernels.
- With the new `--tiled-dit` method, you can output 1080p video on as little as 8 GB of VRAM.
- Supports copying audio tracks to output files (powered by FFmpeg).
- Introduced Blackwell GPU support for FlashVSR.
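To illustrate the tiled-inference idea behind `--tiled-dit`, here is a minimal sketch of how a frame could be split into overlapping tiles using the CLI defaults (`--tile-size 256`, `--overlap 24`). The helper `tile_starts` is hypothetical, written for illustration only; it is not part of the FlashVSR+ codebase.

```python
# Sketch of overlapped tiling, as used conceptually by tiled DiT inference.
# `tile_starts` is a hypothetical helper, not FlashVSR+ code.

def tile_starts(length: int, tile: int = 256, overlap: int = 24) -> list[int]:
    """Return 1-D start offsets so tiles of `tile` px cover `length` px,
    with adjacent tiles sharing `overlap` px so seams can be blended."""
    stride = tile - overlap
    starts = list(range(0, max(length - tile, 0) + 1, stride))
    if starts[-1] + tile < length:  # ensure the far edge is covered
        starts.append(length - tile)
    return starts

# A 1920x1080 frame with the default tile size and overlap:
xs = tile_starts(1920)
ys = tile_starts(1080)
print(len(xs) * len(ys), "tiles")  # prints "45 tiles"
```

Processing one tile at a time bounds peak activation memory by the tile size rather than the full frame, which is why larger outputs fit in less VRAM.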
Follow these steps to set up and run FlashVSR on your local machine:
⚠️ Note: This project is primarily designed and optimized for 4× video super-resolution.
We strongly recommend using the 4× SR setting to achieve better results and stability. ✅
Clone the repository:

```shell
git clone https://github.com/lihaoyun6/FlashVSR_plus
cd FlashVSR_plus
```

Create and activate the environment (Python 3.11.13):

```shell
conda create -n flashvsr python=3.11.13
conda activate flashvsr
```
Install project dependencies:

```shell
pip install -r requirements.txt
```

- When you run FlashVSR+ for the first time, it will automatically download all required models from HuggingFace.
- You can also manually download all files from FlashVSR and put them in the following location:
```
./models/FlashVSR/
│
├── LQ_proj_in.ckpt
├── TCDecoder.ckpt
├── Wan2.1_VAE.pth
├── diffusion_pytorch_model_streaming_dmd.safetensors
└── README.md
```
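As an alternative to downloading files one by one, the Hugging Face CLI can fetch a whole repository. This is a hypothetical convenience, not a documented FlashVSR+ step, and the repo id below is an assumption; verify it against the FlashVSR model page linked above before running.

```shell
# Hypothetical manual download via the Hugging Face CLI.
# The repo id "JunhaoZhuang/FlashVSR" is an assumption -- verify it first.
pip install -U "huggingface_hub[cli]"
huggingface-cli download JunhaoZhuang/FlashVSR --local-dir ./models/FlashVSR
```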
For example:

```shell
python run.py -i ./inputs/example0.mp4 -s 4 ./
```

Or you can run:
```shell
python run.py -h
```

```
usage: run.py [-h] [-i INPUT] [-s SCALE] [-m {tiny,full}] [--tiled-vae] [--tiled-dit] [--tile-size TILE_SIZE]
              [--overlap OVERLAP] [--unload-dit] [--color-fix] [--seed SEED] [-t {fp16,bf16}] [-d DEVICE]
              output_folder

FlashVSR+: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution.

positional arguments:
  output_folder         Path to save output video

options:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Path to video file or folder of images
  -s SCALE, --scale SCALE
                        Upscale factor, default=4
  -m {tiny,full}, --mode {tiny,full}
                        The type of pipeline to use, default=tiny
  --tiled-vae           Enable tile decoding
  --tiled-dit           Enable tile inference
  --tile-size TILE_SIZE
                        Chunk size of tile inference, default=256
  --overlap OVERLAP     Overlap size of tile inference, default=24
  --unload-dit          Unload DiT before decoding
  --color-fix           Correct output video color
  --seed SEED           Random Seed, default=0
  -t {fp16,bf16}, --dtype {fp16,bf16}
                        Data type for processing, default=bf16
  -d DEVICE, --device DEVICE
                        Device to run FlashVSR
```

We welcome feedback and issues. Thank you for trying FlashVSR+!
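Putting the memory-saving flags together, a low-VRAM run might look like the following. All flags come from the help text above; the input and output paths are placeholders you should replace with your own.

```shell
# Example low-VRAM invocation: tiled DiT inference plus tiled VAE decoding,
# unloading the DiT before decode. Paths are placeholders.
python run.py -i ./inputs/example0.mp4 -s 4 \
    --tiled-dit --tile-size 256 --overlap 24 \
    --tiled-vae --unload-dit \
    ./outputs
```

Smaller `--tile-size` values reduce peak VRAM further at the cost of more tiles (and thus more inference passes) per frame.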
We gratefully acknowledge the following open-source projects:
- FlashVSR — https://github.com/OpenImagingLab/FlashVSR
- DiffSynth Studio — https://github.com/modelscope/DiffSynth-Studio
- Sparse_SageAttention — https://github.com/jt-zhang/Sparse_SageAttention_API
- taehv — https://github.com/madebyollin/taehv
- Junhao Zhuang Email: [email protected]
```bibtex
@misc{zhuang2025flashvsrrealtimediffusionbasedstreaming,
      title={FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution},
      author={Junhao Zhuang and Shi Guo and Xin Cai and Xiaohui Li and Yihao Liu and Chun Yuan and Tianfan Xue},
      year={2025},
      eprint={2510.12747},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.12747},
}
```