Generative Adversarial Networks
Moez Krichen
ReDCAD, University of Sfax, Sfax, Tunisia
[email protected]
both theoretical and practical insights for researchers and practitioners in the field, and to help demystify the often confusing and intimidating aspects of GANs.

The primary contributions of this paper are:

1) A comprehensive overview of the GAN architecture, including the GeneratorNetwork and DiscriminatorNetwork, and the key design choices and variations.
2) An in-depth review of the loss functions utilized in GANs, including the original minimax objective, as well as more recent approaches s.a. Wasserstein distance and gradient penalty.
3) A survey of the various training methods utilized in GANs, including alternating optimization, minibatch discrimination, and spectral normalization.
4) A review of the different applications of GANs across domains s.a. computer vision, NLP, and audio synthesis.
5) An exploration of the evaluation metrics utilized to assess the diversity and quality of GAN-produced data.
6) A discussion of the challenges and open issues in GAN research, including training instability, mode collapse, and ethical considerations.
7) A glimpse into the future directions of GAN research, including improving scalability, developing new architectures, incorporating domain knowledge, and exploring new applications.
In Section II, we provide a brief background on GANs and related work. In Section III, we provide a detailed overview of the GAN architecture, including the GeneratorNetwork and DiscriminatorNetwork, and the key design choices and variations. In Section IV, we review the loss functions utilized in GANs, including the original minimax objective, as well as more recent approaches s.a. Wasserstein distance and gradient penalty. In Section V, we discuss the training methods utilized in GANs, including alternating optimization, minibatch discrimination, and spectral normalization. In Section VI, we survey the various applications of GANs across domains s.a. computer vision, NLP, and audio synthesis. In Section VII, we explore the evaluation metrics utilized to assess the diversity and quality of GAN-produced data, s.a. the Fréchet Inception Distance and the Inception Score. In Section VIII, we discuss the challenges and open issues in GAN research, including training instability, mode collapse, and ethical considerations. Finally, in Section IX, we provide a glimpse into the future directions of GAN research, including improving scalability, developing new architectures, incorporating domain knowledge, and exploring new applications. We conclude the paper in Section X, summarizing our contributions and discussing the broader impact and potential of GANs.

II. BACKGROUND

In recent years, DL has emerged as an important tool for solving a wide range of ML problems, s.a. image classification, speech recognition, and NLP [4], [5]. DL algorithms are based on NNs, which are composed of layers of interconnected processing nodes that can learn to recognize patterns in data through a process of supervised or unsupervised training. Generative models are a class of DL algorithms that can produce new data that is similar to the TrainingData. They have many types of applications, from image synthesis to speech generation. One of the most popular types of generative models is the GAN.

The basic concept of GANs was introduced by Ian Goodfellow and his colleagues in 2014 [6]. As illustrated in Figure 2, GANs consist of two NNs: a GeneratorNetwork and a DiscriminatorNetwork. The GeneratorNetwork takes as input a random noise vector and produces a new sample that is intended to be similar to the TrainingData. The DiscriminatorNetwork takes as input a sample and tries to differentiate between samples produced by the GeneratorNetwork and samples from the TrainingData. The GeneratorNetwork is trained to produce samples that are difficult for the DiscriminatorNetwork to distinguish from the TrainingData, while the DiscriminatorNetwork is trained to classify samples correctly as either real or fake. The training process (TrainingProcess) for GANs is iterative and involves alternating between training the GeneratorNetwork and the DiscriminatorNetwork. During training, the GeneratorNetwork learns to produce more realistic samples, while the DiscriminatorNetwork learns to become more accurate at differentiating between real and fake samples. The goal is to find an equilibrium where the GeneratorNetwork produces samples that are indistinguishable from the TrainingData and the DiscriminatorNetwork is not able to differentiate between real and fake samples.
Several types of GAN architectures have been proposed, including deep convolutional GANs (DCGANs), Wasserstein GANs (WGANs) [7], and conditional GANs (cGANs) [8]. DCGANs are a type of GAN that use CNNs in the GeneratorNetwork and DiscriminatorNetwork to produce high-quality images. WGANs are a type of GAN that use the Wasserstein distance metric instead of the traditional Jensen-Shannon divergence to evaluate the distance between the produced and real distributions. cGANs are a type of GAN that condition the GeneratorNetwork and DiscriminatorNetwork on additional information, s.a. class labels or attribute vectors.

In addition to image synthesis, GANs have been applied to a wide range of problems, s.a. data augmentation, style transfer [9], and anomaly detection [10]. GAN-based image synthesis has seen important advances in recent years, with the introduction of progressive GANs, StyleGAN [11], and BigGAN. These models are able to create high-quality images with high resolution and diverse styles.

III. GAN ARCHITECTURE

GANs [6] are a type of generative model that learns to generate new data samples that resemble a given TrainingData. The basic GAN architecture consists of two NNs: a GeneratorNetwork and a DiscriminatorNetwork. The GeneratorNetwork takes a random noise vector v ∈ R^d as input and produces a synthetic data (SyntheticData) sample ŝ ∈ R^m as output. The DiscriminatorNetwork takes a data sample s ∈ R^m as input and produces a scalar value D(s) ∈ [0, 1] as output, indicating the probability that s is a RealData sample (as opposed to a synthetic sample produced by the GeneratorNetwork).
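To make this concrete, here is a minimal sketch (ours, not code from the paper) of the two networks in PyTorch, assuming simple fully connected architectures; the noise dimension d = 100, the data dimension m = 784, and the hidden sizes are illustrative choices only:

```python
import torch
import torch.nn as nn

class GeneratorNetwork(nn.Module):
    """Maps a noise vector v in R^d to a synthetic sample s_hat in R^m."""
    def __init__(self, noise_dim=100, data_dim=784, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, data_dim), nn.Tanh(),  # outputs scaled to [-1, 1]
        )

    def forward(self, v):
        return self.net(v)

class DiscriminatorNetwork(nn.Module):
    """Maps a sample s in R^m to D(s) in [0, 1], the probability s is real."""
    def __init__(self, data_dim=784, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(data_dim, hidden), nn.LeakyReLU(0.2),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, s):
        return self.net(s)

G = GeneratorNetwork()
D = DiscriminatorNetwork()
v = torch.randn(16, 100)   # batch of 16 noise vectors drawn from the prior
s_hat = G(v)               # SyntheticData samples, shape (16, 784)
p_real = D(s_hat)          # probabilities in [0, 1], shape (16, 1)
```

A DCGAN-style variant would replace the linear layers with (transposed) convolutions, but the input/output contract stays the same.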
The GeneratorNetwork and the DiscriminatorNetwork are trained in an adversarial way, with the GeneratorNetwork attempting to generate synthetic samples that resemble RealData samples and the DiscriminatorNetwork attempting to differentiate between RealData and SyntheticData samples. The TrainingProcess can be modeled as a 2-player minimax game in which the GeneratorNetwork G attempts to minimize the following objective function:

$$\min_G \max_D \; \mathbb{E}_{s \sim p_{\mathrm{data}}(s)}[\log D(s)] + \mathbb{E}_{v \sim p_v(v)}[\log(1 - D(G(v)))], \qquad (1)$$

where:
• p_data(s) is the true data distribution;
• v is the noise vector;
• p_v(v) is the prior distribution of v;
• E denotes the expected value.

The first term in Equation 1 encourages the DiscriminatorNetwork to correctly classify RealData samples as real, while the second term encourages the GeneratorNetwork to generate synthetic samples that the DiscriminatorNetwork classifies as real. The DiscriminatorNetwork D tries to maximize the following objective function:

$$\max_D \; \mathbb{E}_{s \sim p_{\mathrm{data}}(s)}[\log D(s)] + \mathbb{E}_{v \sim p_v(v)}[\log(1 - D(G(v)))]. \qquad (2)$$

IV. LOSS FUNCTIONS FOR GANS

The success of GANs in generating high-quality SyntheticData samples is closely tied to the design of their loss functions (LossFunction). In this section, we review some of the most commonly utilized LFs for GANs and their properties.

A. The Original GAN LossFunction

The original GAN LossFunction [6] is given by Equation 1, which encourages the GeneratorNetwork G to generate synthetic samples that are indistinguishable from real samples by the DiscriminatorNetwork D. While the original GAN LossFunction has been successful in generating high-quality SyntheticData samples, it suffers from several problems, including instability during training and mode collapse, where the GeneratorNetwork learns to produce a limited set of samples that do not represent the full diversity of the true data distribution.

B. Improved GAN LFs

To address the problems with the original GAN LF, several improved GAN LFs have been proposed in the literature. Wasserstein GANs (WGANs) [7] use the Wasserstein distance as a LF, which has been shown to yield more stable training and higher-quality samples. The WGAN LossFunction is given by Equation 3:

$$\min_G \max_D \; \mathbb{E}_{s \sim p_{\mathrm{data}}(s)}[D(s)] - \mathbb{E}_{v \sim p_v(v)}[D(G(v))], \qquad (3)$$

where D acts as a critic whose output is no longer constrained to [0, 1].
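To illustrate the practical difference between Equations 1–3, the sketch below (our own illustration, not code from [6] or [7]) computes both LossFunctions for one batch. It assumes the DiscriminatorNetwork outputs probabilities for the original loss and unconstrained critic scores for the WGAN loss; the Lipschitz constraint that WGAN additionally requires (weight clipping or a gradient penalty) is omitted:

```python
import torch

def original_gan_losses(d_real, d_fake):
    """Equations 1-2: D maximizes E[log D(s)] + E[log(1 - D(G(v)))],
    G minimizes E[log(1 - D(G(v)))]. Inputs are probabilities in (0, 1)."""
    eps = 1e-8  # numerical safety for the logarithms
    d_loss = -(torch.log(d_real + eps).mean()
               + torch.log(1.0 - d_fake + eps).mean())  # minimize = maximize Eq. (2)
    g_loss = torch.log(1.0 - d_fake + eps).mean()
    return d_loss, g_loss

def wgan_losses(c_real, c_fake):
    """Equation 3: the critic maximizes E[D(s)] - E[D(G(v))],
    G minimizes -E[D(G(v))]. Inputs are unbounded critic scores."""
    d_loss = -(c_real.mean() - c_fake.mean())
    g_loss = -c_fake.mean()
    return d_loss, g_loss
```

In practice, many implementations replace the generator term of Equation 1 with the non-saturating variant −E[log D(G(v))], which provides stronger gradients early in training.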
In conclusion, the choice of LossFunction is critical for the success of GANs in generating high-quality SyntheticData samples. While the original GAN LossFunction has been successful in many applications, several improved LFs have been proposed that address its limitations and produce more stable training and higher-quality samples. The type of LossFunction used is determined by the purpose and problem at hand.

V. TRAINING GANS

GANs are typically trained using a 2-player minimax game, where a GeneratorNetwork learns to produce SyntheticData samples and a DiscriminatorNetwork learns to differentiate between RealData and SyntheticData samples. The TrainingProcess involves iteratively updating the parameters of the GeneratorNetwork and DiscriminatorNetwork to improve their performance.
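One common form of this alternating optimization is sketched below, reusing the GeneratorNetwork, DiscriminatorNetwork, and original_gan_losses sketched earlier; the Adam hyperparameters, batch size, and the random stand-in for real TrainingData are illustrative assumptions:

```python
import torch

G = GeneratorNetwork()
D = DiscriminatorNetwork()
g_opt = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
d_opt = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

# Stand-in for batches of real TrainingData samples of shape (64, 784).
real_loader = [torch.randn(64, 784) for _ in range(100)]

for real in real_loader:
    # Discriminator step: hold G fixed, push D(real) up and D(fake) down.
    v = torch.randn(real.size(0), 100)
    fake = G(v).detach()  # detach so no gradient flows into G here
    d_loss, _ = original_gan_losses(D(real), D(fake))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator step: hold D fixed, make D classify fresh fakes as real.
    v = torch.randn(real.size(0), 100)
    _, g_loss = original_gan_losses(D(real).detach(), D(G(v)))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```

Variants of this loop differ mainly in how many DiscriminatorNetwork updates are taken per GeneratorNetwork update and in which LossFunction is plugged in.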
A. Challenges in Training GANs

Training GANs can be challenging due to several factors, including instability and mode collapse. Instability can arise when the DiscriminatorNetwork is too powerful and quickly learns to differentiate between RealData and SyntheticData samples, making it difficult for the GeneratorNetwork to learn. Mode collapse, on the other hand, can occur when the GeneratorNetwork learns to create a restricted number of samples that fail to accurately reflect the diversity of the underlying data distribution. To address instability, several approaches have been proposed, s.a. modifying the GAN LossFunction to make it more stable during training. For example, the Wasserstein GAN (WGAN) [7] replaces the original GAN LossFunction with the Wasserstein distance, which can produce more stable training. Several strategies have been developed to address mode collapse, s.a. adding noise to the input of the GeneratorNetwork, using feature matching [14], or using different architectures for the GeneratorNetwork and DiscriminatorNetwork, s.a. the CycleGAN [15].
B. Stabilizing GAN Training

Several strategies for stabilizing GAN training and addressing the aforementioned issues have been proposed. One such technique is minibatch discrimination [14], which involves adding additional features to the DiscriminatorNetwork that allow it to compare multiple samples at once and differentiate between them. This enhances the diversity of the produced samples and helps to prevent mode collapse. Another technique is spectral normalization [13], which involves normalizing the weights of the DiscriminatorNetwork to ensure that the Lipschitz constant of the network is bounded. This helps to prevent the DiscriminatorNetwork from becoming too powerful and stabilizes the TrainingProcess. Other techniques include using different LFs, s.a. the least-squares GAN (LSGAN) LossFunction [12], to improve the stability of the TrainingProcess. Additionally, regularization techniques, s.a. weight decay and dropout, can be utilized to prevent overfitting and improve the generalization performance of the models.
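PyTorch ships a utility implementing spectral normalization, so a sketch of a spectrally normalized discriminator (with arbitrary, illustrative layer sizes) only needs to wrap each weight-bearing layer:

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Each wrapped layer divides its weight matrix by an estimate of its largest
# singular value (computed by power iteration) on every forward pass, keeping
# the layer's spectral norm near 1 and thus bounding the Lipschitz constant
# of the whole network (the LeakyReLU activations are already 1-Lipschitz).
discriminator = nn.Sequential(
    spectral_norm(nn.Linear(784, 256)),
    nn.LeakyReLU(0.2),
    spectral_norm(nn.Linear(256, 256)),
    nn.LeakyReLU(0.2),
    spectral_norm(nn.Linear(256, 1)),
)
```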
Consequently, training GANs is a challenging task that requires careful consideration of several factors to achieve stable and high-quality results. The key challenges in GAN training include instability and mode collapse, which can be addressed using various techniques, s.a. modifying the GAN LF, using different architectures, and adding regularization. Further research is needed to develop more effective techniques for training GANs and improving their performance in various applications.

VI. APPLICATIONS OF GANS

GANs have attracted considerable interest recently because of their capacity to produce high-quality SyntheticData that closely matches RealData. GANs have various applications in different fields, including computer vision, NLP, and healthcare.
A. Image Synthesis

One of the most famous applications of GANs is image synthesis, where GANs are utilized to produce new images that are similar to a given set of training images. GANs can create highly realistic images that can be utilized for various purposes, s.a. video games, virtual reality, and creating SyntheticData for training ML models. Recent advances in GAN-based image synthesis have led to the development of several new techniques s.a. progressive GANs, StyleGAN [16], and BigGAN [17]. Progressive GANs produce high-resolution images by incrementally increasing the size of the produced images, while StyleGAN allows for the control of different aspects of the produced images s.a. style, pose, and facial expression. BigGAN is capable of generating high-quality images at resolutions up to 512×512 pixels.
B. Data Augmentation

GANs are also well suited to data augmentation, which involves creating SyntheticData to expand the size of the TrainingData. By supplying additional TrainingData that is close to the RealData, data augmentation using GANs may increase the performance of ML models. This approach has been successfully applied in various areas s.a. object detection, image classification, and speech recognition [18].
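A minimal sketch of this idea, assuming a GeneratorNetwork G that has already been trained as in the earlier sketches, is simply to sample it and concatenate the SyntheticData with the real TrainingData before fitting the downstream model; the sample counts here are arbitrary:

```python
import torch

n_synthetic = 1000
with torch.no_grad():                  # sampling only, no gradients needed
    v = torch.randn(n_synthetic, 100)  # noise from the prior p_v(v)
    synthetic = G(v)                   # SyntheticData, shape (1000, 784)

real = torch.randn(5000, 784)          # stand-in for the real TrainingData
augmented = torch.cat([real, synthetic], dim=0)  # enlarged training set
```

The downstream ML model is then trained on the augmented set exactly as it would be on the original TrainingData.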
C. Style Transfer

GANs can also be used for style transfer, the process of transferring the style of one image to another. This method can be used to create novel pictures by merging the content of one picture with the style of another. Style transfer using GANs has shown promising results in various domains, s.a. fashion design, art, and photography [19].

D. Emerging Applications

GANs are also being utilized for emerging applications s.a. video synthesis and text-to-image synthesis. In video synthesis, GANs are utilized to produce new video frames that are similar to the existing frames. This technique can be utilized to create high-quality videos with less manual effort. Text-to-image synthesis using GANs involves generating images from textual descriptions. This approach has potential applications in fashion design, interior design, and other areas where the ability to produce images from textual descriptions can be useful [20].
VII. EVALUATION OF GANS

Due to the lack of a clear objective function, evaluating the performance of GANs is a difficult undertaking. GANs produce SyntheticData by learning the underlying distribution of the TrainingData, and the quality of the produced data depends on different factors s.a. the architecture of the GeneratorNetwork and DiscriminatorNetwork, the optimization algorithm utilized, and the choice of hyperparameters. To evaluate the performance of GANs, various metrics have been defined, including the Inception Score (IS) and the Fréchet Inception Distance (FID). Based on the classification precision of a pre-trained Inception model, the IS assesses the diversity and quality of the generated images. Using a pre-trained Inception model, the FID calculates the separation in feature space between the distributions of the real and generated images.
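Concretely, the FID fits a Gaussian to each set of Inception features and computes FID = ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}). A minimal sketch of this computation, assuming the Inception features have already been extracted into two (N, D) arrays, is:

```python
import numpy as np
from scipy import linalg

def fid_from_features(feats_real, feats_gen):
    """FID between two feature sets of shape (N, D), e.g. Inception
    pool3 activations for the real and the generated images."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_g = np.cov(feats_gen, rowvar=False)
    covmean = linalg.sqrtm(sigma_r @ sigma_g).real  # matrix square root
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean))
```

A lower FID indicates that the generated feature distribution is closer to the real one.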
While these metrics have been widely utilized in GAN research, they have limitations. For example, the IS is known to favor models that produce images that are easily classified by the Inception model, even if they are low-quality or lack diversity. The FID can be sensitive to noise and image artifacts, and may not always correlate with visual quality. Moreover, both metrics require pre-trained Inception models, which may not be readily available or may not be suitable for all types of data. There is ongoing research to develop new evaluation metrics that can better capture the performance of GANs. Some recent proposals include the Kernel Inception Distance (KID), which measures the distance between the distributions of the features extracted from the Inception model, and the Learned Perceptual Image Patch Similarity (LPIPS), which measures the perceptual similarity between images based on the activations of a pre-trained deep NN.

In conclusion, evaluating the performance of GANs is an important aspect of GAN research, and various metrics have been proposed for this purpose. However, current evaluation metrics have limitations, and there is a need for new metrics that can better capture the performance of GANs.

VIII. CHALLENGES AND OPEN ISSUES

GANs have demonstrated enormous potential for producing realistic photos, videos, and other forms of data. However, various obstacles and unresolved concerns must be addressed in order to increase the performance and usability of GANs. Among these obstacles and unresolved issues are:

1) Mode collapse: Mode collapse is a common problem in GANs, where the GeneratorNetwork creates only a limited set of outputs, ignoring other possible outputs. This can lead to a lack of diversity in the generated data. One possible cause of mode collapse is the DiscriminatorNetwork being too strong compared to the GeneratorNetwork, which leads to the GeneratorNetwork outputting similar samples that fool the DiscriminatorNetwork. Researchers are exploring various techniques to address mode collapse, s.a. adding regularization terms to the LossFunction or using alternative training methods. Other techniques include modifying the architecture of the GeneratorNetwork and DiscriminatorNetwork or using more advanced optimization methods.
2) Training instability: It can be challenging to train GANs, and the TrainingProcess can be unstable, leading to oscillations or divergence in the GeneratorNetwork and DiscriminatorNetwork losses. This can make it difficult to achieve good performance. One possible cause of training instability is an imbalance between the GeneratorNetwork and DiscriminatorNetwork, where one dominates the other. Another cause is the vanishing gradient problem, where the gradients of the LossFunction become too small to update the parameters. Researchers are investigating various approaches to improve the stability of GAN training, s.a. adjusting the learning rate, using different optimization algorithms, or adding noise to the TrainingProcess. Another approach is to use more advanced architectures, s.a. Wasserstein GANs or Spectral Normalization GANs, which have been shown to be more stable during training.

3) Evaluation metrics: There is a lack of widely accepted evaluation metrics for GANs, making it difficult to compare different models and assess their performance objectively. Some proposed evaluation metrics include the FID, which evaluates the distance between the distribution of produced samples and the distribution of real samples in a feature space, and the IS, which measures the diversity and quality of produced samples based on their classification scores from a pre-trained classifier. However, these metrics have limitations and may not capture all aspects of the produced data. Researchers are exploring alternative evaluation metrics and methods to better quantify the performance of GANs.

4) Scalability: GANs can be computationally expensive to train and require enormous volumes of data. As a result, their scalability and applicability to real-world challenges may be limited. Transfer learning, which involves fine-tuning a pre-trained model on a new dataset, is one method for improving GAN scalability. Another option is to leverage parallel computing, s.a. distributed training over numerous GPUs, or to use cloud-based computing resources. Researchers are also looking into techniques to reduce the quantity of data necessary for GAN training, s.a. semi-supervised learning, or to enrich the TrainingData with generative models.

5) Ethical implications: As with any technology, GANs raise ethical implications, particularly in the context of generating realistic images or videos. GANs can be utilized to create fake content that can be utilized for malicious purposes.