
One-Step Diffusion for Detail-Rich and Temporally Consistent Video Super-Resolution

Yujing Sun1,2, * | Lingchen Sun1,2, * | Shuaizheng Liu1,2 | Rongyuan Wu1,2 | Zhengqiang Zhang1,2 | Lei Zhang1,2

1The Hong Kong Polytechnic University, 2OPPO Research Institute

📍 NeurIPS 2025

Visual Results

⏰ Update

  • 2025.10.16: We released an improved version of DLoRAL. Thanks to @Feynman1999 for the bug fixes!
  • 2025.09.18: DLoRAL was accepted to NeurIPS 2025 🎉
  • 2025.07.14: Colab demo is available. ✨ No local GPU or setup needed - just upload and enhance!
  • 2025.07.08: The inference code and pretrained weights are available.
  • 2025.06.24: The project page is available, including a brief 2-minute explanation video, more visual results, and related research.
  • 2025.06.17: The repo is released.

⭐ If DLoRAL is helpful to your videos or projects, please help star this repo. Thanks! 🤗

😊 You may also want to check our relevant works:

  1. OSEDiff (NeurIPS 2024) Paper | Code

    Real-time Image SR algorithm that has been applied to the OPPO Find X8 series.

  2. PiSA-SR (CVPR2025) Paper | Code

    Pioneering exploration of Dual-LoRA paradigm in Image SR.

  3. TVT-SR (ICCV2025) Paper | Code

    A compact VAE and compute-efficient UNet able to handle fine-grained structures.

  4. Awesome Diffusion Models for Video Super-Resolution Repo

    A curated list of resources for Video Super-Resolution (VSR) using Diffusion Models.

👀 TODO

  • Release inference code.
  • Colab demo for convenient test.
  • Release training code.
  • Release training data.

🌟 Overview Framework

DLoRAL Framework

Training: A dynamic dual-stage training scheme alternates between optimizing temporal coherence (consistency stage) and refining high-frequency spatial details (enhancement stage) with smooth loss interpolation to ensure stability.
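The smooth loss interpolation can be sketched as a linear ramp on the blend weight around each stage switch. This is a toy illustration of the idea only; the function and parameter names are assumptions, not the repo's actual API:

```python
def blended_loss(consistency_loss, enhancement_loss, step, switch_step, ramp_steps):
    """Blend the two stage losses with a weight that ramps linearly
    from 0 to 1 over `ramp_steps` steps after `switch_step`, so the
    training objective changes smoothly rather than abruptly."""
    t = min(max((step - switch_step) / ramp_steps, 0.0), 1.0)
    return (1.0 - t) * consistency_loss + t * enhancement_loss

# Before the switch only the consistency loss is active; halfway
# through the ramp the two losses contribute equally.
before = blended_loss(1.0, 0.0, step=0, switch_step=100, ramp_steps=50)
midway = blended_loss(1.0, 0.0, step=125, switch_step=100, ramp_steps=50)
```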

Inference: During inference, both C-LoRA and D-LoRA are merged into the frozen diffusion UNet, enabling one-step enhancement of low-quality inputs into high-quality outputs.
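The merge step can be illustrated with a minimal NumPy sketch: each LoRA branch is a low-rank update scale * (B @ A) folded into the frozen weight, after which a single matmul replaces base-plus-adapter branches. The `merge_lora` helper, ranks, and scales below are illustrative assumptions, not DLoRAL's actual implementation:

```python
import numpy as np

def merge_lora(W, lora_pairs):
    """Fold low-rank LoRA updates into a frozen weight matrix.

    Each (B, A, scale) pair contributes scale * (B @ A) to W.
    """
    W_merged = W.copy()
    for B, A, scale in lora_pairs:
        W_merged = W_merged + scale * (B @ A)
    return W_merged

# Frozen 4x4 weight with two rank-1 adapters standing in for
# C-LoRA and D-LoRA.
W = np.zeros((4, 4))
c_lora = (np.ones((4, 1)), np.ones((1, 4)), 0.5)
d_lora = (np.ones((4, 1)), np.ones((1, 4)), 0.5)
W_merged = merge_lora(W, [c_lora, d_lora])
```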

🔧 Dependencies and Installation

  1. Clone repo

    git clone https://github.com/yjsunnn/DLoRAL.git
    cd DLoRAL
  2. Install dependent packages

    conda create -n DLoRAL python=3.10 -y
    conda activate DLoRAL
    pip install -r requirements.txt
    # install mmcv-full via openmim, then mmedit via pip
    pip install openmim
    mim install mmcv-full
    pip install mmedit
  3. Download Models

Dependent Models

  • RAM --> put into /path/to/DLoRAL/preset/models/ram_swin_large_14m.pth
  • DAPE --> put into /path/to/DLoRAL/preset/models/DAPE.pth
  • Pretrained Weights --> put into /path/to/DLoRAL/preset/models/checkpoints/model.pkl
    • If your goal is to reproduce the results from the paper, we recommend using this version of the weights instead.

Each path can be changed to suit your setup; if you change one, apply the corresponding change to the command line and the code as well.

🖼️ Quick Inference

For Real-World Video Super-Resolution:

python src/test_DLoRAL.py \
    --pretrained_model_path stabilityai/stable-diffusion-2-1-base \
    --ram_ft_path /path/to/DLoRAL/preset/models/DAPE.pth \
    --ram_path '/path/to/DLoRAL/preset/models/ram_swin_large_14m.pth' \
    --merge_and_unload_lora False \
    --process_size 512 \
    --pretrained_model_name_or_path stabilityai/stable-diffusion-2-1-base \
    --vae_encoder_tiled_size 4096 \
    --load_cfr \
    --pretrained_path /path/to/DLoRAL/preset/models/checkpoints/model.pkl \
    --stages 1 \
    -i /path/to/input_videos/ \
    -o /path/to/results

⚙️ Training

For Real-World Video Super-Resolution:

bash train_scripts.sh

Key parameters and their meanings:

| Param | Description | Example Value |
| --- | --- | --- |
| `--quality_iter` | Step at which training first switches from the consistency stage to the quality stage | 5000 |
| `--quality_iter_1_final` | Step at which training switches from the quality stage back to the consistency stage | 13000 |
| `--quality_iter_2` | Number of steps after `quality_iter_1_final` before switching back to the quality stage (the switch happens at `quality_iter_1_final + quality_iter_2`) | 5000 |
| `--lsdir_txt_path` | Dataset path for the first stage | "/path/to/your/dataset" |
| `--pexel_txt_path` | Dataset path for the second stage | "/path/to/your/dataset" |
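Under the `quality_iter*` thresholds described above, the stage alternation can be sketched as follows (a hypothetical helper for illustration, not code from the repo):

```python
def active_stage(step, quality_iter, quality_iter_1_final, quality_iter_2):
    """Return which training stage is active at a given global step,
    following the quality_iter* thresholds."""
    if step < quality_iter:
        return "consistency"  # initial consistency stage
    if step < quality_iter_1_final:
        return "quality"      # first quality (detail-enhancement) stage
    if step < quality_iter_1_final + quality_iter_2:
        return "consistency"  # switch back to consistency
    return "quality"          # final switch back to quality

# With the example values above (5000, 13000, 5000), the quality
# stage resumes at step 18000.
stage_at_14000 = active_stage(14000, 5000, 13000, 5000)
```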

💬 Contact:

If you have any problem (not only about DLoRAL, but also questions regarding burst/video super-resolution), please feel free to contact me at [email protected]

Citations

If our code helps your research or work, please consider citing our paper. The BibTeX reference is:

@misc{sun2025onestepdiffusiondetailrichtemporally,
      title={One-Step Diffusion for Detail-Rich and Temporally Consistent Video Super-Resolution}, 
      author={Yujing Sun and Lingchen Sun and Shuaizheng Liu and Rongyuan Wu and Zhengqiang Zhang and Lei Zhang},
      year={2025},
      eprint={2506.15591},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2506.15591}, 
}
