06 Generative Adversarial Networks

Generative AI

Generative Adversarial Networks

Discover the world at Leiden University
Outline
What are Generative Adversarial Networks?

Extensions to ‘Vanilla’ Generative Adversarial Networks

Applications of Generative Adversarial Networks

GAN Challenges

Tutorial Exercise

What are Generative Adversarial Networks?

What are Generative Adversarial Networks?
An Adversarial Game
Generative adversarial networks are based on a game, in the sense of game theory,
between two machine learning models.
The generator defines p_model(x) implicitly
It is not able to evaluate the density function p_model but can draw samples from it.
A prior distribution p(z) over a vector z is used as input to a generator function G(z; θ_G), where θ_G is the set of learnable parameters defining the generator’s strategy in the game.
The prior distribution p(z) is typically unstructured, e.g., a high-dimensional normal distribution. Consequently, samples z are noise.
The generator must learn the function G(z) that transforms unstructured noise z into realistic samples.
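As a minimal sketch (not part of the original slides), the prior and the generator function can be illustrated in NumPy; the single linear map W, b below is an illustrative stand-in for a real deep generator network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Prior p(z): an unstructured high-dimensional normal distribution.
latent_dim = 100
z = rng.standard_normal(latent_dim)

# Toy generator G(z; theta_G): one linear layer mapping noise to a
# 28x28 "image". W and b stand in for the learnable parameters theta_G.
W = rng.standard_normal((28 * 28, latent_dim)) * 0.01
b = np.zeros(28 * 28)

def G(z):
    # tanh keeps outputs in [-1, 1], a common image range for GANs
    return np.tanh(W @ z + b).reshape(28, 28)

sample = G(z)
print(sample.shape)  # (28, 28)
```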

What are Generative Adversarial Networks?
An Adversarial Game
Generative adversarial networks are based on a game, in the sense of game theory,
between two machine learning models.
The discriminator examines samples x and returns an estimate D(x; θ_D) of whether x is real (drawn from p_data) or fake (drawn from p_model by running the generator).
Each player incurs a cost (a loss): J_G(θ_G, θ_D) for the generator and J_D(θ_G, θ_D) for the discriminator. Each player attempts to minimise its own cost.
The discriminator’s cost encourages it to correctly classify data as real or fake.
The generator’s cost encourages it to generate samples that the discriminator incorrectly classifies as real.
There have been different formulations of these loss functions.
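One standard formulation (the original of Goodfellow et al., 2014, with the non-saturating generator loss) can be sketched numerically; the logits below are illustrative stand-ins for a discriminator's outputs, not values from a trained model.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Illustrative discriminator outputs (logits) on real and fake batches.
logits_real = np.array([2.0, 1.5, 0.5])    # D leans towards "real"
logits_fake = np.array([-1.0, -0.5, 0.2])  # D leans towards "fake"

D_real = sigmoid(logits_real)
D_fake = sigmoid(logits_fake)

# Discriminator cost: binary cross-entropy, real -> 1, fake -> 0.
J_D = -np.mean(np.log(D_real)) - np.mean(np.log(1.0 - D_fake))

# Generator cost, non-saturating form: maximise log D(G(z)) rather
# than minimising log(1 - D(G(z))), which gives stronger gradients
# early in training when D easily rejects the fakes.
J_G = -np.mean(np.log(D_fake))

print(J_D, J_G)
```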

Generative Adversarial Networks (GANs)
Discriminator: learns to classify inputs as “fake” or “real” using “ground truth” examples.
Goal: to minimise classification error.
Generator: learns to transform a random vector into outputs capable of fooling the discriminator into classifying them as “real”.
Goal: to maximise classification error.
Schematic of a Generative Adversarial Network

Training GANs
The key to the success of GANs is how training
is alternated between the two networks:
At the start, the generator outputs noisy
images and the discriminator predicts
randomly
By training the generator, it becomes better
at generating fake observations
By training the discriminator, it becomes
better at identifying fake observations
As the generator improves, the discriminator
must adapt to identify fakes
As the discriminator improves, the generator
must find new ways to produce fakes

Schematic of a Generative Adversarial Network Training
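The alternating training scheme can be sketched with a toy 1-D GAN; everything here (a linear generator, a logistic-regression discriminator, hand-derived gradients) is a deliberately simplified stand-in for real networks, used only to show the alternation of updates.

```python
import numpy as np

rng = np.random.default_rng(42)

# Real data: x ~ N(3, 0.5). Generator: G(z) = a*z + b, z ~ N(0, 1).
# Discriminator: D(x) = sigmoid(w*x + c).
a, b = 1.0, 0.0          # generator parameters
w, c = 0.1, 0.0          # discriminator parameters
lr, batch = 0.02, 64

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

for step in range(3000):
    x_real = 3.0 + 0.5 * rng.standard_normal(batch)
    z = rng.standard_normal(batch)
    x_fake = a * z + b

    # Discriminator step: minimise -log D(real) - log(1 - D(fake))
    d_real = sigmoid(w * x_real + c)
    d_fake = sigmoid(w * x_fake + c)
    grad_w = np.mean(-(1 - d_real) * x_real) + np.mean(d_fake * x_fake)
    grad_c = np.mean(-(1 - d_real)) + np.mean(d_fake)
    w -= lr * grad_w
    c -= lr * grad_c

    # Generator step: minimise -log D(fake) (non-saturating loss),
    # re-evaluating D with its freshly updated parameters
    d_fake = sigmoid(w * x_fake + c)
    grad_a = np.mean(-(1 - d_fake) * w * z)
    grad_b = np.mean(-(1 - d_fake) * w)
    a -= lr * grad_a
    b -= lr * grad_b

print(round(b, 2))  # b drifts towards the data mean (3.0)
```

Each iteration takes one discriminator step and one generator step, mirroring the alternation described above: as D learns that large x is "real", the generator's offset b is pushed towards the data.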

Extensions to Generative Adversarial Networks

Conditional GANs
Conditioning the generator
on some data other than
the noise vector provides
contextual information
The discriminator may use the label to decide real versus fake, or may be trained to classify images

Schematic of Conditional GAN
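A common way to implement the conditioning (a sketch assuming a one-hot class label, not tied to any particular paper's code) is to concatenate the label with the noise vector before it enters the generator.

```python
import numpy as np

rng = np.random.default_rng(0)

latent_dim, num_classes = 100, 10

# Noise vector plus a one-hot class label (e.g. an MNIST digit class).
z = rng.standard_normal(latent_dim)
label = 3
y = np.zeros(num_classes)
y[label] = 1.0

# The conditional generator receives the concatenation [z, y]; the
# discriminator can likewise receive the label alongside x to judge
# real versus fake *for that label*.
generator_input = np.concatenate([z, y])
print(generator_input.shape)  # (110,)
```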

Conditional Image Generation

Examples of Outputs from a Conditional GAN

InfoGAN
InfoGAN is similar to a Conditional GAN, but instead of a label the generator receives latent codes (c) that learn to organise the latent space
The latent codes are recovered from the generated output as part of the training process by an additional network Q

Schematic of InfoGAN
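The Q network's contribution to the loss can be sketched for a categorical code (illustrative numbers; in practice the term is estimated over a batch): InfoGAN adds -log Q(c | G(z, c)) to the generator's objective, a variational lower bound on the mutual information between the code and the generated sample.

```python
import numpy as np

def softmax(t):
    e = np.exp(t - t.max())
    return e / e.sum()

rng = np.random.default_rng(0)

# The latent code c is one of 10 categories, sampled alongside z.
c = 7                               # true code fed to the generator
q_logits = rng.standard_normal(10)  # Q's prediction from G(z, c)

q = softmax(q_logits)
# Mutual-information term added to the generator's loss.
info_loss = -np.log(q[c])
print(info_loss)
```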

InfoGAN

Illustration of organisation of latent space in InfoGAN (Source: Chen et al, 2016)

Applications of GANs

Example Applications of GANs
Image Generation: generate images resembling an existing dataset, e.g., DCGAN

High Quality Image Generation: generate HQ images e.g., ProGAN, BigGAN

Image-to-Image Translation: convert one class of image to another, e.g., pix2pix

Image Super-resolution: from low resolution to high resolution images, e.g., SRGAN

Next Frame Prediction: generate the next frame in a video, e.g., FutureGAN

Text-to-Image Generation: generate an image from a text description, e.g., StackGAN

Text-to-Speech Generation: generate speech from text input, e.g., GAN-TTS

Image Generation
Deep Convolutional
Generative Adversarial
Network (DCGAN)
Extended original GAN
architecture to use
convolutional layers
Greatly improves ability of
generator to produce
images and discriminator
to classify images
Demonstrated the ability to perform vector arithmetic in the latent space of noise vectors input to the generator
Source: Radford et al (2016)
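The vector arithmetic can be sketched as follows; the three z vectors below are random stand-ins for the averaged latent codes Radford et al. actually used (e.g. "man with glasses" - "man" + "woman").

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 100

# Stand-ins for averaged latent codes of samples showing each concept.
z_man_glasses = rng.standard_normal(latent_dim)
z_man = rng.standard_normal(latent_dim)
z_woman = rng.standard_normal(latent_dim)

# Arithmetic in latent space; decoding z_result with a trained DCGAN
# generator would be expected to produce a woman with glasses.
z_result = z_man_glasses - z_man + z_woman
print(z_result.shape)  # (100,)
```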

High Quality Image Generation
ProGAN progressively
increases the size of the
images generated as
training progresses
Allows stable learning of
much higher quality
images than previous
approaches
Demonstrated ability of
GANs to work with high
quality images

Source: Karras et al (2018)
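The transition between resolutions uses ProGAN's fade-in blend; the sketch below is a simplified single-channel version, with nearest-neighbour upsampling standing in for the network's upsampling layer.

```python
import numpy as np

def fade_in(alpha, upsampled_old, new_layer_out):
    """Blend the doubled-resolution output of a newly added layer with
    the upsampled output of the previous, lower-resolution stage.
    alpha ramps from 0 to 1 as training of the new stage progresses."""
    return alpha * new_layer_out + (1.0 - alpha) * upsampled_old

# Toy 4x4 -> 8x8 transition.
rng = np.random.default_rng(0)
old_4x4 = rng.standard_normal((4, 4))
upsampled = np.kron(old_4x4, np.ones((2, 2)))  # nearest-neighbour 8x8
new_8x8 = rng.standard_normal((8, 8))

blended = fade_in(alpha=0.3, upsampled_old=upsampled, new_layer_out=new_8x8)
print(blended.shape)  # (8, 8)
```

At alpha = 0 the network behaves exactly like the old low-resolution model, which is what makes the growth step stable.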

This Person Does Not Exist

Source: https://thispersondoesnotexist.com

High Quality Image Generation
BigGAN
Combined multiple
improvements and large-
scale training to build a
large model of images.

Latent Space
Random variables provided to the generator define a space that can be sampled to produce images not in the training set.

Source: Brock, Donahue and Simonyan, 2018
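At sampling time BigGAN draws z with the "truncation trick": components whose magnitude exceeds a threshold are resampled, trading sample diversity for individual sample quality. A sketch:

```python
import numpy as np

def truncated_normal(size, threshold, rng):
    """Resample any z component whose magnitude exceeds the threshold.
    Smaller thresholds give higher-quality but less diverse samples."""
    z = rng.standard_normal(size)
    while True:
        mask = np.abs(z) > threshold
        if not mask.any():
            return z
        z[mask] = rng.standard_normal(mask.sum())

rng = np.random.default_rng(0)
z = truncated_normal(128, threshold=0.5, rng=rng)
print(np.abs(z).max())  # always <= 0.5 by construction
```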

Image-to-Image Translation
pix2pix and other image-
to-image translation GANs
can perform multiple tasks
Semantic images to photos
Satellite photos to maps
Day to night conversion
Black & white to colour
Sketches to photos

Daytime to nighttime conversion (Source: Isola et al, 2016)

Image-to-Image Translation
pix2pix and other image-
to-image translation GANs
can perform multiple tasks
Semantic images to photos
Satellite photos to maps
Day to night conversion
Black & white to colour
Sketches to photos

Semantic image to photo translation (Source: Isola et al, 2016)

Image-to-Image Translation
pix2pix and other image-
to-image translation GANs
can perform multiple tasks
Semantic images to photos
Satellite photos to maps
Day to night conversion
Black & white to colour
Sketches to photos
Photo in-painting
Thermal to RGB

Sketch to photo conversion (Source: Isola et al, 2016)

Learning to see: Gloomy Sunday (Source: Memo Akten)

Image Super-Resolution
Super-Resolution (SR)
GANs have generators that
are trained to convert low
resolution images to high
resolution images
The input to the generator
is a combination of the low
resolution image and a
noise vector
They were shown to
outperform the state-of-
the-art: producing sharp
details in SR images

Comparison of image super-resolution (Source: Ledig et al, 2017)
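The SRGAN objective can be sketched with stand-in feature maps: a VGG-feature content loss plus a small adversarial term (the 1e-3 weighting follows Ledig et al., 2017; the arrays below are illustrative, not real VGG features).

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for VGG feature maps of the super-resolved (SR) and the
# ground-truth high-resolution (HR) image.
feat_sr = rng.standard_normal((16, 16, 64))
feat_hr = feat_sr + 0.1 * rng.standard_normal((16, 16, 64))
d_sr = 0.4  # discriminator's "real" probability for the SR image

# Perceptual loss: feature-space content loss + weighted adversarial term.
content_loss = np.mean((feat_sr - feat_hr) ** 2)
adversarial_loss = -np.log(d_sr)
perceptual_loss = content_loss + 1e-3 * adversarial_loss
print(perceptual_loss)
```

The adversarial term is what pushes the generator towards sharp, photo-realistic detail rather than the blurry average that a pure pixel-wise loss produces.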

Photo In-painting
Photo in-painting requires
the generator to be
conditioned on an image
with a missing section and
produce a plausible
completed image
Although the Context Encoder model shares many of the features of a GAN, it is not referred to as a GAN in the paper

Examples of photo in-painting (Source: Pathak et al, 2016)
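The conditioning on an image with a missing section can be sketched with a binary mask (a simplified stand-in for the Context Encoder's input pipeline; sizes are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)

# A 32x32 grey-scale image with a square region removed.
image = rng.random((32, 32))
mask = np.ones((32, 32))
mask[10:22, 10:22] = 0.0     # 0 marks the missing region

masked_image = image * mask  # generator input: image with a hole
# The generator must synthesise plausible content for the zeroed
# region; a reconstruction loss is applied there, plus an adversarial
# loss on the completed image.
hole_pixels = int((mask == 0).sum())
print(masked_image.shape, hole_pixels)  # (32, 32) 144
```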

Next Frame Prediction
FutureGAN is an example
of a GAN with a generator
trained to predict the next
frame in a video
The generator is conditioned on one or more previous frames and is required to produce the next frame
FutureGAN builds on
ProGAN and takes a
progressive approach to
training the network

Examples of next frame prediction (Source: Aigner and Körner, 2018)

Text-to-Image Generation
text2image goes further
and learns a mapping from
natural language
descriptions to images
Text is first encoded, e.g., with an LSTM, and combined with noise
Early papers showed
ability to generate low
resolution images
StackGAN showed the
output could be improved
using a pair of GANs
Architecture of StackGAN (Source: Zhang et al, 2017)
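The text conditioning can be sketched as follows; the sentence embedding, the compression matrix, and all dimensions are illustrative assumptions, and StackGAN's "conditioning augmentation" (resampling the text code) is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a sentence embedding, e.g. the final LSTM hidden state
# for "a small bird with a red head".
text_embedding = rng.standard_normal(128)
z = rng.standard_normal(100)

# Compress the text embedding, then concatenate it with the noise to
# form the Stage-I generator's input.
W_compress = rng.standard_normal((32, 128)) * 0.05
text_code = np.tanh(W_compress @ text_embedding)
stage1_input = np.concatenate([z, text_code])
print(stage1_input.shape)  # (132,)
```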

Text-to-Image Generation
Examples of image improvement in StackGAN (Source: Zhang et al, 2017)

CAN: Creative Adversarial Networks (Elgammal et al., 2017)
Adjust Loss Function to
Produce Novel Styles
Discriminator: Minimise
Real/Fake = Art/Not Art
and Art-style classification
Generator: Maximise
Real/Fake = Art/Not Art
and Style Ambiguity

Source: Creative Adversarial Networks
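The style-ambiguity term can be sketched as the cross-entropy between the style classifier's prediction and the uniform distribution over styles (the logits below are illustrative): the generator minimises this term, which is smallest when the work cannot be assigned to any one known style.

```python
import numpy as np

def softmax(t):
    e = np.exp(t - t.max())
    return e / e.sum()

# Probabilities the discriminator's style head assigns a generated
# image over K art styles (illustrative numbers).
K = 5
style_logits = np.array([0.2, -0.1, 0.05, 0.3, -0.4])
p = softmax(style_logits)

# Cross-entropy to the uniform distribution; minimised (= log K) when
# the style prediction is maximally ambiguous.
style_ambiguity_loss = -np.sum((1.0 / K) * np.log(p))
print(style_ambiguity_loss)
```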

GAN Challenges

GAN Challenges
Uninformative Loss
Value of loss is less
informative than in
traditional networks,
making training trickier

Oscillating Loss
The loss of the
discriminator and
generator can start to
oscillate wildly, rather than
exhibiting long-term
stability.

Oscillating Loss (Source: Generative Deep Learning)

Mode Collapse
If the generator finds a small number of outputs that fool the discriminator:
Pressure on the generator
to produce diverse outputs
reduces dramatically
Generator tends to map
every point in the latent
space to these outputs
Gradient of loss function
collapses to near 0

Mode collapse results in outputs being very similar (Source: Generative Deep Learning)

Tutorial Exercise
Today’s tutorial exercise is to build and
train a GAN on the MNIST dataset
The GAN is a Deep Convolutional GAN, so it is able to learn high-level features from the MNIST dataset

The tutorial includes a graded assignment to apply and extend the approach to a different dataset
Some suggestions are given for other
small datasets, but even these will require
some experimentation with the
architecture of the GAN to be effective
