Real-ESRGAN: Synthetic Data Super-Resolution
Abstract

Though many attempts have been made in blind super-resolution to restore low-resolution images with unknown and complex degradations, they are still far from addressing general real-world degraded images. In this work, we extend the powerful ESRGAN to a practical restoration application (namely, Real-ESRGAN), which is trained with pure synthetic data. Specifically, a high-order degradation modeling process is introduced to better simulate complex real-world degradations. We also consider the common ringing and overshoot artifacts in the synthesis process. In addition, we employ a U-Net discriminator with spectral normalization to increase discriminator capability and stabilize the training dynamics. Extensive comparisons show superior visual performance over prior works on various real datasets. We also provide efficient implementations to synthesize training pairs on the fly.

*Liangbin Xie is an intern in Applied Research Center, Tencent PCG.

1. Introduction

Single image super-resolution (SR) [12, 9, 26] is an active research topic, which aims at reconstructing a high-resolution (HR) image from its low-resolution (LR) counterpart. Since the pioneering work of SRCNN [8], deep convolutional neural network (CNN) approaches have brought prosperous developments in the SR field. However, most approaches [20, 26, 19, 24, 48] assume an ideal bicubic downsampling kernel, which is different from real degradations. This degradation mismatch makes those approaches impractical in real-world scenarios.

Blind super-resolution [34, 2, 54], on the contrary, aims to restore low-resolution images suffering from unknown and complex degradations. Existing approaches can be roughly categorized into explicit modeling and implicit modeling, according to the underlying degradation process. The classical degradation model [10, 28], which consists of blur, downsampling, noise and JPEG compression (more details in Sec. 3.1), is widely adopted in explicit modeling methods [54, 15, 33].
However, real-world degradations are usually too complex to be modeled with a simple combination of multiple degradations. Thus, these methods easily fail on real-world samples. Implicit modeling methods [52, 11, 44] utilize data distribution learning with a Generative Adversarial Network (GAN) [13] to obtain the degradation model. Yet, they are limited to the degradations within their training datasets, and could not generalize well to out-of-distribution images. Readers are encouraged to refer to a recent blind SR survey [27] for a more comprehensive taxonomy.

In this work, we aim to extend the powerful ESRGAN [48] to restore general real-world LR images by synthesizing training pairs with a more practical degradation process. Real complex degradations usually come from complicated combinations of different degradation processes, such as the imaging system of cameras, image editing, and Internet transmission. For example, when we take a photo with our cellphone, the photo may have several degradations, such as camera blur, sensor noise, sharpening artifacts, and JPEG compression. We then do some editing and upload it to a social media app, which introduces further compression and unpredictable noise. The above process becomes more complicated when the image is shared several times on the Internet.

This motivates us to extend the classical "first-order" degradation model to "high-order" degradation modeling for real-world degradations, i.e., the degradations are modeled with several repeated degradation processes, each process being the classical degradation model. Empirically, we adopt a second-order degradation process for a good balance between simplicity and effectiveness. A recent work [53] also proposes a random shuffling strategy to synthesize more practical degradations. However, it still involves a fixed number of degradation processes, and whether all the shuffled degradations are useful is unclear. Instead, high-order degradation modeling is more flexible and attempts to mimic the real degradation generation process. We further incorporate sinc filters in the synthesis process to simulate common ringing and overshoot artifacts.

As the degradation space is much larger than that of ESRGAN, training also becomes challenging. Specifically, 1) the discriminator requires a more powerful capability to discriminate realness from complex training outputs, while the gradient feedback from the discriminator needs to be more accurate for local detail enhancement. Therefore, we improve the VGG-style discriminator in ESRGAN to a U-Net design [39, 50, 37]. 2) The U-Net structure and complicated degradations also increase training instability. Thus, we employ spectral normalization (SN) regularization [35, 39] to stabilize the training dynamics. Equipped with these dedicated improvements, we are able to easily train our Real-ESRGAN and achieve a good balance of local detail enhancement and artifact suppression.

To summarize, in this work, 1) we propose a high-order degradation process to model practical degradations, and utilize sinc filters to model common ringing and overshoot artifacts. 2) We employ several essential modifications (e.g., a U-Net discriminator with spectral normalization) to increase discriminator capability and stabilize the training dynamics. 3) Real-ESRGAN trained with pure synthetic data is able to restore most real-world images and achieves better visual performance than previous works, making it more practical in real-world applications.

2. Related Work

The image super-resolution field [20, 23, 43, 16, 24, 26, 56, 21, 42, 55, 7, 29] has witnessed a variety of developments since SRCNN [8, 9]. To achieve visually pleasing results, a generative adversarial network [14] is usually employed as loss supervision to push the solutions closer to the natural manifold [25, 38, 48, 47]. Most methods assume a bicubic downsampling kernel and usually fail on real images. Recent works also incorporate reinforcement learning or GAN priors into image restoration [51, 6, 45].

There have been several excellent explorations in blind SR. The first category involves explicit degradation representations and typically consists of two components: degradation prediction and conditional restoration. These two components are performed either separately [2, 54] or jointly (iteratively) [15, 33, 44]. Such approaches rely on predefined degradation representations (e.g., degradation types and levels), and usually consider simple synthetic degradations. Moreover, inaccurate degradation estimations will inevitably result in artifacts.

Another category is to obtain or generate training pairs as close to real data as possible, and then train a unified network to address blind SR. The training pairs are usually 1) captured with specific cameras followed by tedious alignments [5, 49]; 2) directly learned from unpaired data with a cycle consistency loss [52, 32]; or 3) synthesized with estimated blur kernels and extracted noise patches [58, 18]. However, 1) the captured data is constrained to degradations associated with specific cameras, and thus could not generalize well to other real images; 2) learning fine-grained degradations from unpaired data is challenging, and the results are usually unsatisfactory.

Degradation models. The classical degradation model [10, 28] is widely adopted in blind SR methods [54, 15, 33]. Yet, real-world degradations are usually too complex to be explicitly modeled. Thus, implicit modeling attempts to learn a degradation generation process within networks [52, 11, 44]. In this work, we propose a flexible high-order degradation model to synthesize more practical degradations.
3. Methodology

3.1. Classical Degradation Model

Blind SR aims to restore high-resolution images from low-resolution ones with unknown and complex degradations. The classical degradation model [10, 28] is usually adopted to synthesize the low-resolution input. Generally, the ground-truth image y is first convolved with a blur kernel k. Then, a downsampling operation with scale factor r is performed. The low-resolution x is obtained by adding noise n. Finally, JPEG compression is also adopted, as it is widely used in real-world images:

\[
\bm{x} = \mathcal{D}(\bm{y}) = \big[(\bm{y} \circledast \bm{k}) \downarrow_{r} + \bm{n}\big]_{\mathtt{JPEG}}, \tag{1}
\]

where \(\mathcal{D}\) denotes the degradation process. In the following, we briefly revisit these commonly-used degradations. The detailed settings are specified in Sec. 4.1.
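Eq. (1) maps directly onto a short image-processing pipeline. Below is a minimal sketch in NumPy/OpenCV, assuming a pre-sampled 2D blur kernel; the function name and default parameter values are illustrative and not the released Real-ESRGAN implementation.

```python
# A minimal sketch of Eq. (1): blur -> downsample -> noise -> JPEG.
import cv2
import numpy as np

def classical_degradation(img, kernel, scale=4, noise_sigma=5.0, jpeg_q=50):
    """img: HxWx3 float32 in [0, 1]; kernel: 2D float kernel summing to 1."""
    out = cv2.filter2D(img, -1, kernel)                         # y * k
    h, w = out.shape[:2]
    out = cv2.resize(out, (w // scale, h // scale),
                     interpolation=cv2.INTER_CUBIC)             # downsample by r
    out = out + np.random.normal(0.0, noise_sigma / 255.0,
                                 out.shape).astype(np.float32)  # + n
    out = np.clip(out, 0.0, 1.0)
    _, enc = cv2.imencode('.jpg', (out * 255).round().astype(np.uint8),
                          [cv2.IMWRITE_JPEG_QUALITY, jpeg_q])   # [.]_JPEG
    return cv2.imdecode(enc, cv2.IMREAD_COLOR).astype(np.float32) / 255.0
```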
Blur. We typically model blur degradation as a convolution with a linear blur filter (kernel). Isotropic and anisotropic Gaussian filters are common choices. For a Gaussian blur kernel k with a kernel size of 2t + 1, its (i, j) ∈ [−t, t] element is sampled from a Gaussian distribution, formally:

\[
\bm{k}(i, j) = \frac{1}{N} \exp\left(-\frac{1}{2} \bm{C}^{T} \bm{\Sigma}^{-1} \bm{C}\right), \quad \bm{C} = [i, j]^{T}, \tag{2}
\]

where Σ is the covariance matrix, C is the spatial coordinate, and N is the normalization constant. The covariance matrix can be further represented as:

\[
\bm{\Sigma} = \bm{R} \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix} \bm{R}^{T} \tag{3}
\]
\[
\phantom{\bm{\Sigma}} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix} \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}, \tag{4}
\]

where R is the rotation matrix, σ1 and σ2 are the standard deviations along the two principal axes (i.e., their squares are the eigenvalues of the covariance matrix), and θ is the rotation degree. When σ1 = σ2, k is an isotropic Gaussian blur kernel; otherwise k is anisotropic.

Discussion. Though Gaussian blur kernels are widely used to model blur degradation, they may not approximate real camera blur well. To include more diverse kernel shapes, we further adopt generalized Gaussian blur kernels [30] and a plateau-shaped distribution. Their probability density functions (pdf) are \(\frac{1}{N} \exp\big(-\frac{1}{2}(\bm{C}^{T}\bm{\Sigma}^{-1}\bm{C})^{\beta}\big)\) and \(\frac{1}{N} \cdot \frac{1}{1 + (\bm{C}^{T}\bm{\Sigma}^{-1}\bm{C})^{\beta}}\), respectively, where β is a shape parameter. Empirically, we find that including these blur kernels produces sharper outputs for several real samples.
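A sketch of sampling these kernels on a pixel grid follows. Setting beta = 1 with kind='gaussian' recovers Eq. (2); the other settings follow the generalized Gaussian and plateau pdfs above. The default parameter values are illustrative only.

```python
# A sketch of the blur kernels from Eqs. (2)-(4) and the Discussion above.
import numpy as np

def blur_kernel(size=21, sigma1=2.0, sigma2=0.8, theta=0.6,
                beta=1.0, kind='gaussian'):
    t = size // 2                                    # kernel size is 2t + 1
    xx, yy = np.meshgrid(np.arange(-t, t + 1), np.arange(-t, t + 1))
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    sigma = R @ np.diag([sigma1 ** 2, sigma2 ** 2]) @ R.T   # Eqs. (3)-(4)
    inv = np.linalg.inv(sigma)
    # Quadratic form C^T Sigma^{-1} C evaluated on the whole grid.
    q = inv[0, 0] * xx**2 + 2 * inv[0, 1] * xx * yy + inv[1, 1] * yy**2
    if kind == 'gaussian':       # generalized Gaussian; beta = 1 is Eq. (2)
        k = np.exp(-0.5 * q ** beta)
    else:                        # plateau-shaped distribution
        k = 1.0 / (1.0 + q ** beta)
    return k / k.sum()           # the 1/N normalization
```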
Noise. We consider two commonly-used noise types: 1) additive Gaussian noise and 2) Poisson noise. Additive Gaussian noise has a probability density function equal to that of the Gaussian distribution, and the noise intensity is controlled by the standard deviation (i.e., the sigma value) of the Gaussian distribution. When each channel of an RGB image has independently sampled noise, the synthetic noise is color noise. We also synthesize gray noise by applying the same sampled noise to all three channels [53, 36].

Poisson noise follows the Poisson distribution. It is usually used to approximately model sensor noise caused by statistical quantum fluctuations, that is, variation in the number of photons sensed at a given exposure level. Poisson noise has an intensity proportional to the image intensity, and the noise at different pixels is independent.
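A sketch of both noise branches is given below. Gray noise shares one noise map across the RGB channels; the Poisson 'scale' parameterization here is an assumption for illustration, not the released implementation's exact scheme.

```python
# A sketch of the two noise types described above.
import numpy as np

def add_gaussian_noise(img, sigma=10.0, gray=False):
    """img: HxWx3 float32 in [0, 1]; sigma is given on the 0-255 scale."""
    shape = img.shape[:2] + (1,) if gray else img.shape  # broadcast if gray
    noise = np.random.normal(0.0, sigma / 255.0, shape).astype(np.float32)
    return np.clip(img + noise, 0.0, 1.0)

def add_poisson_noise(img, scale=1.0):
    """Shot noise: variance grows with intensity; pixels are independent."""
    levels = 255.0 / scale                # fewer levels -> stronger noise
    noisy = np.random.poisson(img * levels) / levels
    return np.clip(noisy, 0.0, 1.0).astype(np.float32)
```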
Resize (Downsampling). Downsampling is a basic operation for synthesizing low-resolution images in SR. More generally, we consider both downsampling and upsampling, i.e., the resize operation. There are several resize algorithms: nearest-neighbor interpolation, area resize, bilinear interpolation, and bicubic interpolation. Different resize operations bring different effects: some produce blurry results, while others may output over-sharp images with overshoot artifacts. In order to include more diverse and complex resize effects, we consider a random resize operation from the above choices. As nearest-neighbor interpolation introduces a misalignment issue, we exclude it and only consider the area, bilinear and bicubic operations.
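A sketch of this random resize step follows; the scale range is an illustrative assumption, while the choice of modes matches the text above (nearest-neighbor excluded).

```python
# A sketch of the random resize operation: a random mode from
# {area, bilinear, bicubic} and a random scale.
import random
import cv2

def random_resize(img, scale_range=(0.3, 1.5)):
    mode = random.choice([cv2.INTER_AREA, cv2.INTER_LINEAR, cv2.INTER_CUBIC])
    scale = random.uniform(*scale_range)
    h, w = img.shape[:2]
    return cv2.resize(img, (max(1, int(w * scale)), max(1, int(h * scale))),
                      interpolation=mode)
```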
JPEG compression. JPEG compression is a commonly-used technique of lossy compression for digital images. It first converts images into the YCbCr color space and downsamples the chroma channels. Images are then split into 8 × 8 blocks, and each block is transformed with a two-dimensional discrete cosine transform (DCT), followed by quantization of the DCT coefficients. More details of JPEG compression algorithms can be found in [41]. Unpleasant block artifacts are usually introduced by JPEG compression. The quality of compressed images is determined by a quality factor q ∈ [0, 100], where a lower q indicates a higher compression ratio and worse quality. We use the PyTorch implementation DiffJPEG [31].
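The toy block below illustrates only the lossy core of JPEG on a single 8 × 8 block; real JPEG additionally performs YCbCr conversion, chroma subsampling, per-frequency quantization tables and entropy coding, and the paper itself uses the differentiable DiffJPEG [31] instead.

```python
# A toy illustration of where JPEG block artifacts come from:
# 2D block DCT followed by coarse quantization of the coefficients.
import numpy as np
from scipy.fftpack import dct, idct

def jpeg_block_roundtrip(block, q_step=32.0):
    """block: 8x8 float array; a larger q_step mimics a lower quality factor."""
    coeffs = dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')
    coeffs = np.round(coeffs / q_step) * q_step       # the lossy step
    return idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')
```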
Figure 2: Overview of the pure synthetic data generation adopted in Real-ESRGAN. It utilizes a second-order degradation process to model more practical degradations, where each degradation process adopts the classical degradation model. The detailed choices for blur, resize, noise and JPEG compression are listed. We also employ a sinc filter to synthesize common ringing and overshoot artifacts.
3.2. High-order Degradation Model

When we adopt the above classical degradation model to synthesize training pairs, the trained model can indeed handle some real samples. However, it still cannot resolve some complicated degradations in the real world, especially unknown noises and complex artifacts (see Fig. 3). This is because the synthetic low-resolution images still have a large gap from realistic degraded images. We thus extend the classical degradation model to a high-order degradation process to model more practical degradations.

Figure 3: Models trained with synthetic data from the classical degradation model can resolve some real samples (left). Yet, they amplify noise or introduce ringing artifacts for complex real-world images (right). Zoom in for best view.

The classical degradation model only includes a fixed number of basic degradations, which can be regarded as first-order modeling. However, real-life degradation processes are quite diverse, and usually comprise a series of procedures including the imaging system of cameras, image editing, Internet transmission, etc. For instance, when we want to restore a low-quality image downloaded from the Internet, its underlying degradation involves a complicated combination of different degradation processes. Specifically, the original image might have been taken with a cellphone many years ago, which inevitably contains degradations such as camera blur, sensor noise, low resolution and JPEG compression. The image was then edited with sharpening and resize operations, bringing in overshoot and blur artifacts. After that, it was uploaded to some social media application, which introduced further compression and unpredictable noises. As digital transmission also brings artifacts, this process becomes more complicated when the image spreads several times on the Internet.

Such a complicated deterioration process could not be modeled with the classical first-order model. Thus, we propose a high-order degradation model. An n-order model involves n repeated degradation processes (as shown in Eq. 5), where each degradation process adopts the classical degradation model (Eq. 1) with the same procedure but different hyper-parameters:

\[
\bm{x} = \mathcal{D}^{n}(\bm{y}) = (\mathcal{D}_{n} \circ \cdots \circ \mathcal{D}_{2} \circ \mathcal{D}_{1})(\bm{y}). \tag{5}
\]

Note that "high-order" here is different from its use for mathematical functions; it mainly refers to the number of times the same operation is applied. The random shuffling strategy in [53] may also include repeated degradation processes (e.g., double blur or JPEG). But we highlight that the high-order degradation process is the key, indicating that not all the shuffled degradations are necessary. In order to keep the image resolution in a reasonable range, the downsampling operation in Eq. 1 is replaced with a random resize operation. Empirically, we adopt a second-order degradation process, as it can resolve most real cases while keeping simplicity. Fig. 2 depicts the overall pipeline of our pure synthetic data generation.

It is worth noting that the improved high-order degradation process is not perfect and cannot cover the whole degradation space in the real world. Instead, it merely extends the solvable degradation boundary of previous blind SR methods by modifying the data synthesis process. Several typical limitation scenarios can be found in Fig. 11.
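A sketch of Eq. (5) with n = 2 is shown below, composing the helper sketches from Sec. 3.1. Each stage re-samples its own hyper-parameters; the ranges here loosely follow Sec. 4.1 but are simplified to identical ranges for both stages, which is an assumption of this sketch.

```python
# A sketch of the second-order degradation process (Eq. 5 with n = 2),
# reusing blur_kernel, random_resize and add_gaussian_noise defined above.
import random
import numpy as np
import cv2

def second_order_degradation(img, scale=4):
    h, w = img.shape[:2]
    for _ in range(2):                                   # D2 o D1
        k = blur_kernel(sigma1=random.uniform(0.2, 3.0),
                        sigma2=random.uniform(0.2, 3.0),
                        theta=random.uniform(0.0, np.pi))
        img = cv2.filter2D(img, -1, k)
        img = random_resize(img)              # random resize, not fixed downsampling
        img = add_gaussian_noise(img, sigma=random.uniform(1, 30))
        u8 = (np.clip(img, 0, 1) * 255).round().astype(np.uint8)
        _, enc = cv2.imencode('.jpg', u8,
                              [cv2.IMWRITE_JPEG_QUALITY, random.randint(30, 95)])
        img = cv2.imdecode(enc, cv2.IMREAD_COLOR).astype(np.float32) / 255.0
    # Finally bring the LR image to the target resolution for a x4 model.
    return cv2.resize(img, (w // scale, h // scale),
                      interpolation=cv2.INTER_CUBIC)
```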
3.3. Ringing and overshoot artifacts

Ringing artifacts often appear as spurious edges near sharp transitions in an image; they visually look like bands or "ghosts" near edges. Overshoot artifacts are usually combined with ringing artifacts, and manifest as an increased jump at the edge transition. The main cause of these artifacts is that the signal is band-limited without high frequencies. These artifacts are very common and are usually produced by sharpening algorithms, JPEG compression, etc.
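Such band-limited artifacts can be synthesized by filtering with a 2D sinc kernel, i.e., an ideal circular low-pass filter. A sketch is given below, with cutoff frequency omega_c in (0, pi] and J1 the first-order Bessel function of the first kind; the exact formulation and parameter ranges used in the released code may differ.

```python
# A sketch of a 2D sinc (ideal circular low-pass) kernel for synthesizing
# ringing and overshoot artifacts.
import numpy as np
from scipy.special import j1

def sinc_kernel(cutoff=np.pi / 3, size=21):
    t = size // 2
    xx, yy = np.meshgrid(np.arange(-t, t + 1), np.arange(-t, t + 1))
    r = np.sqrt(xx ** 2 + yy ** 2)
    r_safe = np.where(r == 0, 1.0, r)              # avoid divide-by-zero
    k = cutoff * j1(cutoff * r_safe) / (2 * np.pi * r_safe)
    k[r == 0] = cutoff ** 2 / (4 * np.pi)          # analytic limit at the origin
    return k / k.sum()
```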
Figure 4: Real-ESRGAN adopts the same generator network (RRDB blocks) as that in ESRGAN. For scale factors of ×2 and ×1, it first employs a pixel-unshuffle operation to reduce the spatial size and re-arrange information into the channel dimension.
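The pixel-unshuffle trick in Fig. 4 lets the ×2 and ×1 models reuse the ×4 network body at a reduced spatial resolution. A minimal sketch with PyTorch:

```python
# Pixel-unshuffle trades spatial size for channels before the RRDB trunk.
import torch
import torch.nn.functional as F

lr = torch.randn(1, 3, 128, 128)              # an LR input for the x2 model
x = F.pixel_unshuffle(lr, downscale_factor=2)
print(x.shape)                                # torch.Size([1, 12, 64, 64])
```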
The training process is divided into two stages. First, we train a PSNR-oriented model with the L1 loss; the obtained model is named Real-ESRNet. We then use the trained PSNR-oriented model as an initialization of the generator, and train Real-ESRGAN with a combination of L1 loss, perceptual loss [19] and GAN loss [13, 25, 4].
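A sketch of the second-stage objective with the weights {1, 1, 0.1} given in Sec. 4.1 follows; 'perceptual' and 'adversarial' stand in for the VGG19 feature loss and the discriminator-based GAN term, which are defined elsewhere.

```python
# A sketch of the generator objective: L1 + perceptual + GAN.
import torch.nn.functional as F

def generator_loss(sr, hr, perceptual, adversarial):
    l_pix = F.l1_loss(sr, hr)          # the L1 term (used alone in stage one)
    l_percep = perceptual(sr, hr)      # VGG feature-space distance
    l_gan = adversarial(sr)            # realness score from the discriminator
    return 1.0 * l_pix + 1.0 * l_percep + 0.1 * l_gan
```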
4. Experiments

4.1. Datasets and Implementation

Training details. Similar to ESRGAN, we adopt the DIV2K [1], Flickr2K [43] and OutdoorSceneTraining [47] datasets for training. The training HR patch size is set to 256. We train our models with four NVIDIA V100 GPUs and a total batch size of 48. We employ the Adam optimizer [22]. Real-ESRNet is finetuned from ESRGAN for faster convergence. We train Real-ESRNet for 1000K iterations with learning rate 2 × 10−4, and Real-ESRGAN for 400K iterations with learning rate 1 × 10−4. We adopt an exponential moving average (EMA) for more stable training and better performance. Real-ESRGAN is trained with a combination of L1 loss, perceptual loss and GAN loss, with weights {1, 1, 0.1}, respectively. We use the {conv1, ..., conv5} feature maps (with weights {0.1, 0.1, 1, 1, 1}) before activation in the pre-trained VGG19 network [19] for the perceptual loss. Our implementation is based on BasicSR [46].
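As a small configuration sketch, the perceptual-loss layer weights above could be expressed as follows; the layer names follow a common VGG19 naming convention and are an assumption here, not necessarily the exact keys of the released config.

```python
# A sketch of the VGG19 perceptual-loss weights from the paragraph above.
vgg_layer_weights = {
    'conv1_2': 0.1, 'conv2_2': 0.1,   # early features, small weights
    'conv3_4': 1.0, 'conv4_4': 1.0, 'conv5_4': 1.0,
}
```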
Degradation details. We employ a second-order degradation model for a good balance of simplicity and effectiveness. Unless otherwise specified, the two degradation processes use the same settings. We adopt Gaussian kernels, generalized Gaussian kernels and plateau-shaped kernels, with probabilities {0.7, 0.15, 0.15}. The blur kernel size is randomly selected from {7, 9, ..., 21}. The blur standard deviation σ is sampled from [0.2, 3] ([0.2, 1.5] for the second degradation process). The shape parameter β is sampled from [0.5, 4] and [1, 2] for generalized Gaussian and plateau-shaped kernels, respectively. We also use a sinc kernel with a probability of 0.1. We skip the second blur degradation with a probability of 0.2.

We employ Gaussian noise and Poisson noise with probabilities {0.5, 0.5}. The noise sigma range and Poisson noise scale are set to [1, 30] and [0.05, 3], respectively ([1, 25] and [0.05, 2.5] for the second degradation process). The gray noise probability is set to 0.4. The JPEG compression quality factor is sampled from [30, 95]. The final sinc filter is applied with a probability of 0.8. More details can be found in the released code.
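Collected as one configuration sketch (the key names are illustrative, not the released config's exact schema), these hyper-parameters read as:

```python
# The Sec. 4.1 degradation hyper-parameters as a config sketch.
degradation_cfg = {
    'kernel_sizes': list(range(7, 22, 2)),            # {7, 9, ..., 21}
    'kernel_probs': {'gaussian': 0.7, 'generalized': 0.15, 'plateau': 0.15},
    'sinc_prob': 0.1,
    'blur_sigma': [0.2, 3.0],      'blur_sigma2': [0.2, 1.5],
    'betag_range': [0.5, 4.0],     'betap_range': [1.0, 2.0],
    'second_blur_skip_prob': 0.2,
    'noise_type_probs': {'gaussian': 0.5, 'poisson': 0.5},
    'noise_sigma': [1, 30],        'noise_sigma2': [1, 25],
    'poisson_scale': [0.05, 3.0],  'poisson_scale2': [0.05, 2.5],
    'gray_noise_prob': 0.4,
    'jpeg_quality': [30, 95],
    'final_sinc_prob': 0.8,
}
```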
Training pair pool. In order to improve training efficiency, all degradation processes are implemented in PyTorch with CUDA acceleration, so that we are able to synthesize training pairs on the fly. However, batch processing limits the diversity of synthetic degradations within a batch; for example, samples in a batch could not have different resize scaling factors. Therefore, we employ a training pair pool to increase the degradation diversity in a batch. At each iteration, the training samples are randomly selected from the training pair pool to form a training batch. We set the pool size to 180 in our implementation.
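A sketch of this pool trick: freshly synthesized (lq, gt) pairs are swapped with randomly chosen pooled pairs, so one batch mixes degradation settings from many past iterations. The class and method names are illustrative; the pool size of 180 follows Sec. 4.1.

```python
# A sketch of the training pair pool.
import random

class TrainingPairPool:
    def __init__(self, size=180):
        self.size = size
        self.pool = []

    def exchange(self, fresh_pairs):
        """Trade freshly synthesized pairs into the pool; return a mixed batch."""
        batch = []
        for pair in fresh_pairs:
            if len(self.pool) < self.size:    # warm-up: fill the pool first
                self.pool.append(pair)
                batch.append(pair)
            else:
                idx = random.randrange(self.size)
                batch.append(self.pool[idx])  # draw a pooled pair out
                self.pool[idx] = pair         # and put the fresh pair in
        return batch
```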
Sharpen ground-truth images during training. We further show a training trick to visually improve sharpness without introducing visible artifacts. A typical way of sharpening images is to employ a post-processing algorithm, such as unsharp masking (USM). However, this algorithm tends to introduce overshoot artifacts. We empirically find that sharpening ground-truth images during training achieves a better balance between sharpness and overshoot artifact suppression. We denote the model trained with sharpened ground-truth images as Real-ESRGAN+ (comparisons are shown in Fig. 7).
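For reference, a minimal sketch of unsharp masking follows; here it is applied to ground-truth images during training (the Real-ESRGAN+ trick) rather than as a post-process on outputs, and the radius/amount defaults are illustrative.

```python
# A sketch of unsharp masking (USM): add back the high-frequency residual.
import cv2

def unsharp_mask(img, radius=5, amount=0.5):
    """Sharpen img by img + amount * (img - GaussianBlur(img))."""
    blurred = cv2.GaussianBlur(img, (0, 0), sigmaX=radius)
    return cv2.addWeighted(img, 1 + amount, blurred, -amount, 0)
```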
4.2. Comparisons with Prior Works

We compare our Real-ESRGAN with several state-of-the-art methods, including ESRGAN [48], DAN [33], CDC [49], RealSR [18] and BSRGAN [53]. We test on several datasets with real-world images, including RealSR [5], DRealSR [49], OST300 [47], DPED [17], the ADE20K validation set [57] and Internet images. Since existing metrics for perceptual quality cannot well reflect actual human perceptual preferences at a fine-grained scale [3], we present several representative visual samples in Fig. 7.

It can be observed from Fig. 7 that our Real-ESRGAN outperforms previous approaches in both removing artifacts and restoring texture details. Real-ESRGAN+ (trained with sharpened ground-truths) can further boost visual sharpness. Specifically, the first sample contains overshoot artifacts (white edges around letters). Direct upsampling will inevitably amplify those artifacts (e.g., DAN and BSRGAN). Real-ESRGAN takes such common artifacts into consideration and simulates them with the sinc filter, thus effectively removing ringing and overshoot artifacts. The second sample contains unknown and complicated degradations; most algorithms cannot effectively eliminate them, while Real-ESRGAN, trained with second-order degradation processes, can. Real-ESRGAN is also capable of restoring more realistic textures (e.g., brick, mountain and tree textures) for real-world samples, while other methods either fail to remove degradations or add unnatural textures (e.g., RealSR and BSRGAN).

Figure 7: Qualitative comparisons on several representative real-world samples with an upsampling scale factor of 4. Our Real-ESRGAN outperforms previous approaches in both removing artifacts and restoring texture details. Real-ESRGAN+ (trained with sharpened ground-truths) can further boost visual sharpness. Other methods may either fail to remove overshoot (the 1st sample) and complicated artifacts (the 2nd sample), or fail to restore realistic and natural textures for various scenes (the 3rd, 4th, 5th samples). Zoom in for best view.
4.3. Ablation Studies

Second-order degradation model. We conduct ablation studies of degradations on Real-ESRNet, as it is more controllable and can better reflect the influence of degradations.

Figure 8: Top: Real-ESRNet results with and without the second-order degradation process. Bottom: Real-ESRNet results with and without sinc filters. Zoom in for best view.

Figure 9: Ablation on the discriminator design (ESRGAN setting vs. U-Net discriminator vs. U-Net discriminator w/ SN). Zoom in for best view.
References

[1] Eirikur Agustsson and Radu Timofte. NTIRE 2017 challenge on single image super-resolution: Dataset and study. In CVPRW, 2017.
[2] Sefi Bell-Kligler, Assaf Shocher, and Michal Irani. Blind super-resolution kernel estimation using an internal-GAN. In NeurIPS, 2019.
[3] Yochai Blau, Roey Mechrez, Radu Timofte, Tomer Michaeli, and Lihi Zelnik-Manor. The 2018 PIRM challenge on perceptual image super-resolution. In ECCVW, 2018.
[4] Yochai Blau and Tomer Michaeli. The perception-distortion tradeoff. In CVPR, 2018.
[5] Jianrui Cai, Hui Zeng, Hongwei Yong, Zisheng Cao, and Lei Zhang. Toward real-world single image super-resolution: A new benchmark and a new model. In ICCV, 2019.
[6] Kelvin C.K. Chan, Xintao Wang, Xiangyu Xu, Jinwei Gu, and Chen Change Loy. GLEAN: Generative latent bank for large-factor image super-resolution. In CVPR, 2021.
[7] Tao Dai, Jianrui Cai, Yongbing Zhang, Shu-Tao Xia, and Lei Zhang. Second-order attention network for single image super-resolution. In CVPR, 2019.
[8] Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. Learning a deep convolutional network for image super-resolution. In ECCV, 2014.
[9] Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. Image super-resolution using deep convolutional networks. IEEE TPAMI, 38(2):295–307, 2016.
[10] Michael Elad and Arie Feuer. Restoration of a single super-resolution image from several blurred, noisy, and undersampled measured images. IEEE Transactions on Image Processing, 6(12):1646–1658, 1997.
[11] Manuel Fritsche, Shuhang Gu, and Radu Timofte. Frequency separation for real-world super-resolution. In ICCVW, 2019.
[12] Daniel Glasner, Shai Bagon, and Michal Irani. Super-resolution from a single image. In ICCV, 2009.
[13] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NeurIPS, 2014.
[14] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NeurIPS, 2014.
[15] Jinjin Gu, Hannan Lu, Wangmeng Zuo, and Chao Dong. Blind super-resolution with iterative kernel correction. In CVPR, 2019.
[16] Muhammad Haris, Greg Shakhnarovich, and Norimichi Ukita. Deep back-projection networks for super-resolution. In CVPR, 2018.
[17] Andrey Ignatov, Nikolay Kobyshev, Radu Timofte, Kenneth Vanhoey, and Luc Van Gool. DSLR-quality photos on mobile devices with deep convolutional networks. In ICCV, 2017.
[18] Xiaozhong Ji, Yun Cao, Ying Tai, Chengjie Wang, Jilin Li, and Feiyue Huang. Real-world super-resolution via kernel estimation and noise injection. In CVPRW, 2020.
[19] Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In ECCV, 2016.
[20] Jiwon Kim, Jung Kwon Lee, and Kyoung Mu Lee. Accurate image super-resolution using very deep convolutional networks. In CVPR, 2016.
[21] Jiwon Kim, Jung Kwon Lee, and Kyoung Mu Lee. Deeply-recursive convolutional network for image super-resolution. In CVPR, 2016.
[22] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
[23] Wei-Sheng Lai, Jia-Bin Huang, Narendra Ahuja, and Ming-Hsuan Yang. Deep Laplacian pyramid networks for fast and accurate super-resolution. In CVPR, 2017.
[24] Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In CVPR, 2017.
[25] Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In CVPR, 2017.
[26] Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee. Enhanced deep residual networks for single image super-resolution. In CVPRW, 2017.
[27] Anran Liu, Yihao Liu, Jinjin Gu, Yu Qiao, and Chao Dong. Blind image super-resolution: A survey and beyond. arXiv:2107.03055, 2021.
[28] Ce Liu and Deqing Sun. On Bayesian adaptive video super resolution. IEEE TPAMI, 36(2):346–360, 2013.
[29] Ding Liu, Bihan Wen, Yuchen Fan, Chen Change Loy, and Thomas S Huang. Non-local recurrent network for image restoration. In NeurIPS, 2018.
[30] Yu-Qi Liu, Xin Du, Hui-Liang Shen, and Shu-Jie Chen. Estimating generalized Gaussian blur kernels for out-of-focus image deblurring. IEEE Transactions on Circuits and Systems for Video Technology, 2020.
[31] Michael R Lomnitz. DiffJPEG. https://github.com/mlomnitz/DiffJPEG, 2021.
[32] Andreas Lugmayr, Martin Danelljan, and Radu Timofte. Unsupervised learning for real-world super-resolution. In ICCVW, 2019.
[33] Zhengxiong Luo, Yan Huang, Shang Li, Liang Wang, and Tieniu Tan. Unfolding the alternating optimization for blind super resolution. In NeurIPS, 2020.
[34] Tomer Michaeli and Michal Irani. Nonparametric blind super-resolution. In ICCV, 2013.
[35] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. In ICLR, 2018.
[36] Seonghyeon Nam, Youngbae Hwang, Yasuyuki Matsushita, and Seon Joo Kim. A holistic approach to cross-channel image noise modeling and its application to image denoising. In CVPR, 2016.
[37] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In MICCAI, 2015.
[38] Mehdi S. M. Sajjadi, Bernhard Schölkopf, and Michael Hirsch. EnhanceNet: Single image super-resolution through automated texture synthesis. In ICCV, 2017.
[39] Edgar Schonfeld, Bernt Schiele, and Anna Khoreva. A U-Net based discriminator for generative adversarial networks. In CVPR, 2020.
[40] Wenzhe Shi, Jose Caballero, Ferenc Huszár, Johannes Totz, Andrew P. Aitken, Rob Bishop, Daniel Rueckert, and Zehan Wang. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In CVPR, 2016.
[41] Richard Shin and Dawn Song. JPEG-resistant adversarial images. In NeurIPS Workshop on Machine Learning and Computer Security, 2017.
[42] Ying Tai, Jian Yang, and Xiaoming Liu. Image super-resolution via deep recursive residual network. In CVPR, 2017.
[43] Radu Timofte, Eirikur Agustsson, Luc Van Gool, Ming-Hsuan Yang, Lei Zhang, Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, Kyoung Mu Lee, et al. NTIRE 2017 challenge on single image super-resolution: Methods and results. In CVPRW, 2017.
[44] Longguang Wang, Yingqian Wang, Xiaoyu Dong, Qingyu Xu, Jungang Yang, Wei An, and Yulan Guo. Unsupervised degradation representation learning for blind super-resolution. In CVPR, 2021.
[45] Xintao Wang, Yu Li, Honglun Zhang, and Ying Shan. Towards real-world blind face restoration with generative facial prior. In CVPR, 2021.
[46] Xintao Wang, Ke Yu, Kelvin C.K. Chan, Chao Dong, and Chen Change Loy. BasicSR. https://github.com/xinntao/BasicSR, 2020.
[47] Xintao Wang, Ke Yu, Chao Dong, and Chen Change Loy. Recovering realistic texture in image super-resolution by deep spatial feature transform. In CVPR, 2018.
[48] Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Yu Qiao, and Chen Change Loy. ESRGAN: Enhanced super-resolution generative adversarial networks. In ECCVW, 2018.
[49] Pengxu Wei, Ziwei Xie, Hannan Lu, Zongyuan Zhan, Qixiang Ye, Wangmeng Zuo, and Liang Lin. Component divide-and-conquer for real-world image super-resolution. In ECCV, 2020.
[50] Yitong Yan, Chuangchuang Liu, Changyou Chen, Xianfang Sun, Longcun Jin, Peng Xinyi, and Xiang Zhou. Fine-grained attention and feature-sharing generative adversarial networks for single image super-resolution. IEEE Transactions on Multimedia, 2021.
[51] Ke Yu, Xintao Wang, Chao Dong, Xiaoou Tang, and Chen Change Loy. Path-Restore: Learning network path selection for image restoration. arXiv:1904.10343, 2019.
[52] Yuan Yuan, Siyuan Liu, Jiawei Zhang, Yongbing Zhang, Chao Dong, and Liang Lin. Unsupervised image super-resolution using cycle-in-cycle generative adversarial networks. In CVPRW, 2018.
[53] Kai Zhang, Jingyun Liang, Luc Van Gool, and Radu Timofte. Designing a practical degradation model for deep blind image super-resolution. arXiv:2103.14006, 2021.
[54] Kai Zhang, Wangmeng Zuo, and Lei Zhang. Learning a single convolutional super-resolution network for multiple degradations. In CVPR, 2018.
[55] Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, and Yun Fu. Image super-resolution using very deep residual channel attention networks. In ECCV, 2018.
[56] Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, and Yun Fu. Residual dense network for image super-resolution. In CVPR, 2018.
[57] Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fidler, Adela Barriuso, and Antonio Torralba. Semantic understanding of scenes through the ADE20K dataset. International Journal of Computer Vision, 2019.
[58] Ruofan Zhou and Sabine Susstrunk. Kernel modeling super-resolution on real low-resolution images. In ICCV, 2019.