
Commit b6400d0

add GANs
1 parent 8be551f commit b6400d0


README.md

Lines changed: 23 additions & 1 deletion
@@ -473,6 +473,7 @@ and learns from new input (input node * input gate)
* [Clustering](#clustering)
* [Principal Component Analysis](#principal-component-analysis)
* [Autoencoder](#autoencoder)
* [Generative Adversarial Network](#generative-adversarial-network)

### Clustering

@@ -490,6 +491,7 @@ and the algorithm iteratively finds the cluster each data point belongs to
[back to top](#data-science-question-answer)

### Principal Component Analysis

* Principal Component Analysis (PCA) is a dimension reduction technique that projects
@@ -517,10 +519,30 @@ Here is a visual explanation of PCA
* An autoencoder always consists of two parts, the encoder and the decoder. The encoder would find a lower dimension representation (latent variable) of the original input, while the decoder is used to reconstruct from the lower-dimension vector such that the distance between the original and reconstruction is minimized
* Can be used for data denoising and dimensionality reduction

![](assets/autoencoder.png)


### Generative Adversarial Network

* A Generative Adversarial Network (GAN) is an unsupervised learning algorithm that also has a supervised flavor: it uses a supervised loss as part of training
* A GAN typically has two major components: the **generator** and the **discriminator**. The generator tries to generate "fake" data (e.g., images or sentences) that fool the discriminator into thinking they're real, while the discriminator tries to distinguish between real and generated data. It's a fight between the two players, hence the name adversarial, and this fight drives both sides to improve until the "fake" data are indistinguishable from the real data
* How does it work, intuitively
  - The generator takes a **random** input and generates a sample of data
  - The discriminator then takes either the generated sample or a real data sample and tries to predict whether the input is real or generated (i.e., it solves a binary classification problem)
  - Given a truth score in the range [0, 1], ideally we'd like the discriminator to give a low score to generated data and a high score to real data. At the same time, we want the generated data to fool the discriminator. This tension drives both sides to become stronger (the standard objective shown after this list formalizes it)
* How does it work, from a training perspective
  - Before training, the generator produces only 'garbage' data, while the discriminator is too 'innocent' to tell the difference between fake and real data
  - Usually we first train the discriminator on both real data (label 1) and generated data (label 0) for N epochs, so that it develops a good judgement of what is real vs. fake
  - Then we **set the discriminator non-trainable** and train the generator. Even though the discriminator is non-trainable at this stage, we still use it as a classifier so that **error signals can be back-propagated through it, enabling the generator to learn**
  - These two steps alternate until neither side can improve further (see the training-loop sketch after the figure below)
* Here are some [tips and tricks to make GANs work](https://github.com/soumith/ganhacks)
* One caveat is that the **adversarial part is only auxiliary: the end goal of using a GAN is to generate data that even experts cannot tell apart from real data**
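For reference, this tug-of-war is usually written as the minimax objective from the original GAN paper (Goodfellow et al., 2014), where D(x) is the discriminator's score for a sample x and G(z) is the generator's output for random noise z (standard background, not part of the notes above):

```
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
```

The discriminator maximizes V (score real data high, generated data low), while the generator minimizes it (pushes D(G(z)) toward 1).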
![gan](assets/gan.jpg)
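Below is a minimal Keras-style sketch of the alternating training loop described above. It is illustrative only: the layer sizes, optimizer settings, and data shapes are assumptions, not something specified in the notes.

```python
# Minimal sketch of the alternating GAN training loop (TensorFlow/Keras assumed).
# All shapes and hyperparameters below are illustrative assumptions.
import numpy as np
from tensorflow.keras import layers, models, optimizers

latent_dim = 100   # dimension of the random noise fed to the generator (assumption)
data_dim = 784     # e.g. flattened 28x28 images (assumption)

# Generator: random noise -> "fake" data sample
generator = models.Sequential([
    layers.Dense(128, activation="relu", input_shape=(latent_dim,)),
    layers.Dense(data_dim, activation="tanh"),
])

# Discriminator: data sample -> probability that the sample is real
discriminator = models.Sequential([
    layers.Dense(128, activation="relu", input_shape=(data_dim,)),
    layers.Dense(1, activation="sigmoid"),
])
discriminator.compile(optimizer=optimizers.Adam(1e-4), loss="binary_crossentropy")

# Combined model for training the generator: freeze the discriminator first,
# but keep it in the graph so its error signal still reaches the generator.
discriminator.trainable = False
gan = models.Sequential([generator, discriminator])
gan.compile(optimizer=optimizers.Adam(1e-4), loss="binary_crossentropy")

def train_step(real_batch, batch_size=64):
    # Step 1: train the discriminator on real data (label 1) and generated data (label 0)
    noise = np.random.normal(size=(batch_size, latent_dim))
    fake_batch = generator.predict(noise, verbose=0)
    d_loss_real = discriminator.train_on_batch(real_batch, np.ones((batch_size, 1)))
    d_loss_fake = discriminator.train_on_batch(fake_batch, np.zeros((batch_size, 1)))

    # Step 2: train the generator through the frozen discriminator; the generator
    # wants the frozen discriminator to output "real" (label 1) for its samples
    noise = np.random.normal(size=(batch_size, latent_dim))
    g_loss = gan.train_on_batch(noise, np.ones((batch_size, 1)))
    return d_loss_real, d_loss_fake, g_loss
```

In practice `train_step` would run inside a loop over mini-batches of real data; see the tips-and-tricks link above for ways to keep this training stable.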
[back to top](#data-science-question-answer)

## Reinforcement Learning

[TODO]
