Generative AI
Generative Adversarial Networks
Discover the world at Leiden University
Outline
What are Generative Adversarial Networks?
Extensions to ‘Vanilla’ Generative Adversarial Networks
Applications of Generative Adversarial Networks
Tutorial Exercise
What are Generative Adversarial Networks?
What are Generative Adversarial Networks?
An Adversarial Game
Generative adversarial networks are based on a game, in the sense of game theory, between two machine learning models.
The generator defines p_model(x) implicitly: it cannot evaluate the density function p_model, but it can draw samples from it.
A prior distribution p(z) over a vector z is used as input to a generator function G(z; θ^(G)), where θ^(G) is a set of learnable parameters defining the generator's strategy in the game.
The prior distribution p(z) is typically unstructured, e.g., a high-dimensional normal distribution. Consequently, samples z are noise.
The generator must learn the function G(z) that transforms unstructured noise z into realistic samples.
The discriminator examines samples x and returns an estimate D(x; θ^(D)) of whether x is real (drawn from p_data) or fake (drawn from p_model by running the generator).
Each player incurs a cost (a loss): J^(G)(θ^(G), θ^(D)) for the generator and J^(D)(θ^(G), θ^(D)) for the discriminator. Each player attempts to minimise its own cost.
The discriminator's cost encourages it to correctly classify data as real or fake.
The generator's cost encourages it to generate samples that the discriminator incorrectly classifies as real.
There have been different formulations of these loss functions.
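These costs can be made concrete. The following is a minimal NumPy sketch (not the lecture's code) of the standard discriminator loss and the non-saturating generator loss from the original GAN formulation; d_real and d_fake are the discriminator's probability outputs on real and generated samples.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-8):
    """J^(D): encourage D(x) -> 1 on real samples and D(G(z)) -> 0 on fakes."""
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))

def generator_loss(d_fake, eps=1e-8):
    """Non-saturating J^(G): encourage D(G(z)) -> 1, i.e. fakes classed as real."""
    return -np.mean(np.log(d_fake + eps))

# A perfectly confused discriminator outputs 0.5 everywhere:
d = np.full(4, 0.5)
print(discriminator_loss(d, d))  # ≈ 2·ln 2 ≈ 1.386
print(generator_loss(d))         # ≈ ln 2 ≈ 0.693
```

The generator's loss falls as the discriminator assigns higher probability of "real" to the fakes, which is exactly the incentive described above.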
Generative Adversarial Networks (GANs)
Discriminator: learns to classify inputs as "fake" or "real" using "ground truth" examples. Goal: minimise the classification error.
Generator: learns to transform a random vector into outputs capable of fooling the discriminator into classifying them as "real". Goal: maximise the classification error.
Schematic of a Generative Adversarial Network
Training GANs
The key to the success of GANs is how training is alternated between the two networks:
At the start, the generator outputs noisy images and the discriminator predicts randomly.
By training the generator, it becomes better at generating fake observations.
By training the discriminator, it becomes better at identifying fake observations.
As the generator improves, the discriminator must adapt to identify fakes.
As the discriminator improves, the generator must find new ways to produce fakes.
Schematic of Generative Adversarial Network Training
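The alternating scheme can be sketched end-to-end on a toy problem. This is an illustrative sketch, not the tutorial's code: the real data is 1-D Gaussian noise around 3, the "generator" is a single learnable shift b applied to noise, the "discriminator" is a logistic regression, and the gradients are derived by hand.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

w, c = 0.1, 0.0   # discriminator D(x) = sigmoid(w*x + c)
b = 0.0           # generator G(z) = z + b; real data ~ N(3, 1)

for step in range(3000):
    real = rng.normal(3.0, 1.0, 64)
    fake = rng.normal(0.0, 1.0, 64) + b

    # Discriminator step: minimise -log D(real) - log(1 - D(fake))
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    grad_w = -np.mean((1 - d_real) * real) + np.mean(d_fake * fake)
    grad_c = -np.mean(1 - d_real) + np.mean(d_fake)
    w -= 0.05 * grad_w
    c -= 0.05 * grad_c

    # Generator step: minimise -log D(fake) (non-saturating loss)
    fake = rng.normal(0.0, 1.0, 64) + b
    d_fake = sigmoid(w * fake + c)
    b -= 0.05 * -np.mean((1 - d_fake) * w)

print(b)  # the generator's shift should drift towards the real mean of 3
```

Early in training the discriminator easily separates the two distributions; as b approaches the real mean, it can no longer distinguish them and its weights shrink towards the "predict randomly" state, mirroring the dynamics described above.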
Extensions to Generative Adversarial Networks
Conditional GANs
Conditioning the generator on some data other than the noise vector provides contextual information.
The discriminator may use the label to decide real versus fake, or may be trained to classify images.
Schematic of a Conditional GAN
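In the simplest conditioning scheme (an illustrative sketch, not taken from the slides), the label is one-hot encoded and concatenated onto the noise vector before it enters the generator:

```python
import numpy as np

def conditional_input(z, label, num_classes):
    """Concatenate a one-hot class label onto the noise vector z."""
    onehot = np.zeros(num_classes)
    onehot[label] = 1.0
    return np.concatenate([z, onehot])

z = np.random.default_rng(0).normal(size=100)  # noise vector
x = conditional_input(z, label=3, num_classes=10)
print(x.shape)  # (110,): the generator sees noise plus the class label
```

The discriminator can be conditioned the same way, so that a sample only counts as "real" if it is both realistic and consistent with the given label.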
Conditional Image Generation
Examples of Outputs from a Conditional GAN
InfoGAN
InfoGAN is similar to a Conditional GAN, but instead of a label the goal is to generate codes (c) that will organise the latent space.
The latent codes are learned as part of the training process by another network, Q.
Schematic of InfoGAN
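The generator input in InfoGAN is a concatenation of unstructured noise z and structured codes c; the auxiliary network Q is trained to recover c from the generated sample. A sketch of the input construction, using the latent sizes reported for MNIST in the InfoGAN paper:

```python
import numpy as np

rng = np.random.default_rng(0)

noise_dim, n_categories, n_continuous = 62, 10, 2  # MNIST sizes from the paper
z = rng.normal(size=noise_dim)                     # unstructured noise
c_cat = np.eye(n_categories)[rng.integers(n_categories)]  # categorical code
c_cont = rng.uniform(-1, 1, n_continuous)          # continuous codes

latent = np.concatenate([z, c_cat, c_cont])
print(latent.shape)  # (74,): the generator sees noise and codes together
```

Because Q must predict c from the output alone, the generator is pushed to make each code control a visible, disentangled factor (e.g. digit class, rotation, stroke width).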
Illustration of organisation of latent space in InfoGAN (Source: Chen et al, 2016)
Applications of GANs
Example Applications of GANs
Image Generation: generate an image based on an existing dataset, e.g., DCGAN
High Quality Image Generation: generate HQ images e.g., ProGAN, BigGAN
Image-to-Image Translation: convert one class of image to another, e.g., pix2pix
Image Super-resolution: from low resolution to high resolution images, e.g., SRGAN
Next Frame Prediction: generate the next frame in a video, e.g., FutureGAN
Text-to-Image Generation: generate an image from a text description, e.g., StackGAN
Text-to-Speech Generation: generate speech from text input, e.g., GAN-TTS
Image Generation
Deep Convolutional Generative Adversarial Network (DCGAN)
Extended the original GAN architecture to use convolutional layers.
Greatly improves the ability of the generator to produce images and of the discriminator to classify images.
Demonstrated the ability to perform vector arithmetic in the latent space of noise vectors input to the generator.
Source: Radford et al (2016)
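The vector-arithmetic result can be sketched as follows. In the paper's experiments, averaging the latent vectors of images sharing an attribute and combining the averages, e.g. "smiling woman" - "neutral woman" + "neutral man", yields a latent vector that decodes to a smiling man. The arrays below are stand-ins, not real DCGAN latents:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 100  # DCGAN used 100-dimensional latent vectors

# Stand-ins for averaged latent vectors of images sharing an attribute:
smiling_woman = rng.normal(size=(3, dim)).mean(axis=0)
neutral_woman = rng.normal(size=(3, dim)).mean(axis=0)
neutral_man   = rng.normal(size=(3, dim)).mean(axis=0)

# Arithmetic in latent space; feeding `result` to a trained generator
# produced an image of a smiling man in the paper's experiments.
result = smiling_woman - neutral_woman + neutral_man
print(result.shape)  # (100,)
```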
High Quality Image Generation
ProGAN progressively increases the size of the images generated as training progresses.
Allows stable learning of much higher quality images than previous approaches.
Demonstrated the ability of GANs to work with high quality images.
Source: Karras et al (2018)
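One detail from the ProGAN paper makes the progressive growth stable: when a new, higher-resolution layer is added, it is faded in smoothly. The output is a blend of the upsampled lower-resolution image and the new layer's output, weighted by a factor alpha that ramps from 0 to 1. A minimal NumPy sketch of that blend:

```python
import numpy as np

def upsample2x(img):
    """Nearest-neighbour 2x upsampling of an (H, W) image."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def fade_in(lowres_img, new_layer_img, alpha):
    """Blend the upsampled low-res output with the new layer's output.

    alpha ramps 0 -> 1 during training, so the new layer takes over gradually.
    """
    return (1.0 - alpha) * upsample2x(lowres_img) + alpha * new_layer_img

low = np.ones((4, 4))           # output of the 4x4 stage
new = np.zeros((8, 8))          # output of the freshly added 8x8 stage
out = fade_in(low, new, 0.25)   # still mostly the upsampled 4x4 image
print(out.shape)                # (8, 8)
```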
This Person Does Not Exist
Source: https://thispersondoesnotexist.com
High Quality Image Generation
BigGAN
Combined multiple improvements and large-scale training to build a large model of images.
Latent Space
Random variables provided to the generator define a space that can be sampled to produce images not in the training set.
Source: Brock, Donahue and Simonyan, 2018
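Sampling that latent space is where BigGAN's "truncation trick" comes in: any latent components falling outside a threshold are resampled, trading diversity for sample fidelity. An illustrative NumPy sketch (the latent dimension here is a placeholder):

```python
import numpy as np

def truncated_normal(size, threshold, rng):
    """Sample z ~ N(0, I), resampling components with |z_i| > threshold."""
    z = rng.normal(size=size)
    while True:
        mask = np.abs(z) > threshold
        if not mask.any():
            return z
        z[mask] = rng.normal(size=mask.sum())

rng = np.random.default_rng(0)
z = truncated_normal(128, threshold=0.5, rng=rng)  # 128 is an illustrative size
print(np.abs(z).max() <= 0.5)  # True: every component is within the threshold
```

A small threshold keeps samples near the centre of the latent distribution, where the generator is most reliable; a large threshold restores the full diversity of the space.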
Image-to-Image Translation
pix2pix and other image-to-image translation GANs can perform multiple tasks:
Semantic images to photos
Satellite photos to maps
Day to night conversion
Black & white to colour
Sketches to photos
Photo in-painting
Thermal to RGB
Daytime to nighttime conversion (Source: Isola et al, 2016)
Semantic image to photo translation (Source: Isola et al, 2016)
Sketch to photo conversion (Source: Isola et al, 2016)
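In pix2pix the discriminator judges pairs: the input image and a candidate output are concatenated along the channel axis, so it learns whether the output is a plausible translation of that particular input, not merely a plausible image. A sketch with illustrative shapes:

```python
import numpy as np

def discriminator_input(condition_img, candidate_img):
    """Stack the conditioning image and the candidate output channel-wise."""
    return np.concatenate([condition_img, candidate_img], axis=-1)

edges = np.zeros((256, 256, 1))   # e.g. an edge map / sketch input
photo = np.zeros((256, 256, 3))   # real or generated photo
pair = discriminator_input(edges, photo)
print(pair.shape)  # (256, 256, 4): the discriminator sees input and output together
```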
Learning to See: Gloomy Sunday (Source: Memo Akten)
Image Super-Resolution
Super-Resolution (SR) GANs have generators that are trained to convert low resolution images to high resolution images.
The input to the generator is a combination of the low resolution image and a noise vector.
They were shown to outperform the state-of-the-art, producing sharp details in SR images.
Comparison of image super-resolution (Source: Ledig et al, 2017)
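In the SRGAN paper the generator's loss combines a content term (a pixel- or feature-space distance to the ground-truth high-resolution image) with an adversarial term weighted by 10^-3. A sketch of that combination, using plain MSE as a stand-in for the content term:

```python
import numpy as np

def srgan_generator_loss(sr, hr, d_sr, adv_weight=1e-3, eps=1e-8):
    """Perceptual loss = content loss + weighted adversarial loss.

    sr: super-resolved image, hr: ground-truth high-res image,
    d_sr: discriminator's probability that sr is a real high-res image.
    """
    content = np.mean((sr - hr) ** 2)  # MSE stand-in for the content term
    adversarial = -np.log(d_sr + eps)  # encourage D(sr) -> 1
    return content + adv_weight * adversarial

sr = np.zeros((8, 8)); hr = np.ones((8, 8))
print(srgan_generator_loss(sr, hr, d_sr=0.5))  # 1.0 + 1e-3·ln 2 ≈ 1.0007
```

The small adversarial weight is what nudges the generator away from the blurry averages that pure MSE produces, towards the sharp details noted above.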
Photo In-painting
Photo in-painting requires the generator to be conditioned on an image with a missing section and to produce a plausible completed image.
Although the Context Encoder model shares many of the features of a GAN, it is not referred to in the paper as a GAN model.
Examples of photo in-painting (Source: Pathak et al, 2016)
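Conditioning on an image with a missing section can be sketched by zeroing out a region; the generator then fills the hole, and the loss is computed over the masked region. An illustrative NumPy sketch (not the Context Encoder's code):

```python
import numpy as np

def mask_center(img, hole):
    """Zero out a central (hole x hole) region; return the masked image and mask."""
    h, w = img.shape[:2]
    top, left = (h - hole) // 2, (w - hole) // 2
    mask = np.zeros((h, w), dtype=bool)
    mask[top:top + hole, left:left + hole] = True
    masked = img.copy()
    masked[mask] = 0.0
    return masked, mask

img = np.ones((64, 64))
masked, mask = mask_center(img, hole=32)
print(int(masked.sum()))  # 64*64 - 32*32 = 3072 pixels remain
```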
Next Frame Prediction
FutureGAN is an example of a GAN with a generator trained to predict the next frame in a video.
The input to the generator is conditioned on one or more previous frames, and the generator is required to produce the next frame.
FutureGAN builds on ProGAN and takes a progressive approach to training the network.
Examples of next frame prediction (See: Aigner and Körner, 2018)
Text-to-Image Generation
text2image goes further and learns a mapping from natural language descriptions to images.
Text is first encoded, e.g., with an LSTM, and combined with noise.
Early papers showed the ability to generate low resolution images.
StackGAN showed the output could be improved using a pair of GANs.
An architecture for text-to-image generation (Source: Reed et al, 2016)
Architecture of StackGAN (Source: Zhang et al, 2017)
Examples of image improvement in StackGAN (Source: Zhang et al, 2017)
CAN: Creative Adversarial Networks (Elgammal et al., 2017)
Adjust the loss function to produce novel styles.
Discriminator: minimise the Real/Fake (Art/Not Art) error and the art-style classification error.
Generator: maximise the Real/Fake (Art/Not Art) error and the style ambiguity.
Source: Creative Adversarial Networks
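Style ambiguity in the CAN paper is a cross-entropy between the discriminator's style-class predictions and the uniform distribution: the generator is rewarded for works that no single style can claim. An illustrative sketch:

```python
import numpy as np

def style_ambiguity_loss(style_probs, eps=1e-8):
    """Cross-entropy between the predicted style distribution and the uniform one.

    Minimised when the discriminator is maximally unsure which art style
    the generated work belongs to.
    """
    k = style_probs.shape[-1]
    uniform = np.full(k, 1.0 / k)
    return -np.sum(uniform * np.log(style_probs + eps))

confident = np.array([0.97, 0.01, 0.01, 0.01])  # clearly one style: high loss
ambiguous = np.full(4, 0.25)                    # no clear style: minimal loss
print(style_ambiguity_loss(confident) > style_ambiguity_loss(ambiguous))  # True
```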
GAN Challenges
GAN Challenges
Uninformative Loss: the value of the loss is less informative than in traditional networks, making training trickier.
Oscillating Loss: the loss of the discriminator and the generator can start to oscillate wildly, rather than exhibiting long-term stability.
Oscillating Loss (Source: Generative Deep Learning)
Mode Collapse
If the generator finds a small number of outputs that fool the discriminator, the pressure on the generator to produce diverse outputs reduces dramatically.
The generator tends to map every point in the latent space to these outputs.
The gradient of the loss function collapses to near 0.
Mode collapse results in outputs being very similar (Source: Generative Deep Learning)
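One quick diagnostic for mode collapse (an illustrative heuristic, not from the slides) is to measure the diversity of a batch of generated samples, e.g. the mean pairwise distance; a collapsed generator produces near-identical outputs and a value near zero:

```python
import numpy as np

def mean_pairwise_distance(samples):
    """Average Euclidean distance between all pairs in a batch of samples."""
    n = len(samples)
    dists = [np.linalg.norm(samples[i] - samples[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists))

rng = np.random.default_rng(0)
healthy = rng.normal(size=(16, 64))                # diverse outputs
collapsed = np.tile(rng.normal(size=64), (16, 1))  # every output identical
print(mean_pairwise_distance(healthy) > 1.0)       # True
print(mean_pairwise_distance(collapsed))           # 0.0
```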
Tutorial Exercise
Today’s tutorial exercise is to build and train a GAN on the MNIST dataset.
The GAN is a Deep Convolutional GAN, so it is able to learn high-level features from the MNIST dataset.
The tutorial includes a graded assignment to apply and extend the approach to a different dataset.
Some suggestions are given for other small datasets, but even these will require some experimentation with the architecture of the GAN to be effective.