[back to top](#data-science-question-answer)
### Principal Component Analysis
* Principal Component Analysis (PCA) is a dimension reduction technique that projects the data onto a lower-dimensional subspace spanned by the directions of maximal variance (the principal components)
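As a minimal sketch (numpy only; the data and variable names are illustrative, not from any particular library), the principal components can be obtained from the SVD of the centered data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # toy data: 100 samples, 5 features
X_centered = X - X.mean(axis=0)        # PCA requires centered data

# SVD of the centered data: rows of Vt are the principal directions,
# sorted by decreasing singular value
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

k = 2
X_proj = X_centered @ Vt[:k].T         # project onto the top-k components

explained = (S ** 2) / (S ** 2).sum()  # variance ratio per component
print(X_proj.shape)                    # (100, 2)
```

`sklearn.decomposition.PCA` does essentially this (plus centering) under the hood.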
Here is a visual explanation of PCA

### Autoencoder
* An autoencoder always consists of two parts, the encoder and the decoder. The encoder finds a lower-dimensional representation (latent variable) of the original input, while the decoder reconstructs the input from that lower-dimensional vector such that the distance between the original and the reconstruction is minimized
* Can be used for data denoising and dimensionality reduction
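The encode/decode/minimize-reconstruction-error loop above can be sketched with a deliberately tiny *linear* autoencoder in plain numpy (dimensions, learning rate, and names are illustrative assumptions; real autoencoders use nonlinear networks):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                # toy data: 200 samples, 8 features
d, k = X.shape[1], 3                         # latent dimension k < d

W_enc = rng.normal(scale=0.1, size=(d, k))   # encoder weights
W_dec = rng.normal(scale=0.1, size=(k, d))   # decoder weights
lr = 0.5                                     # illustrative learning rate

def reconstruction_loss(X, W_enc, W_dec):
    Z = X @ W_enc                            # encode: latent representation
    X_hat = Z @ W_dec                        # decode: reconstruction
    return ((X - X_hat) ** 2).mean()

initial_loss = reconstruction_loss(X, W_enc, W_dec)
for _ in range(500):
    Z = X @ W_enc
    X_hat = Z @ W_dec
    err = 2 * (X_hat - X) / X.size           # dLoss/dX_hat
    g_dec = Z.T @ err                        # gradient w.r.t. decoder weights
    g_enc = X.T @ (err @ W_dec.T)            # gradient w.r.t. encoder weights
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc
final_loss = reconstruction_loss(X, W_enc, W_dec)
print(initial_loss, final_loss)              # reconstruction error decreases
```

A purely linear autoencoder like this one ends up spanning the same subspace as PCA; the nonlinearities in real autoencoders are what let them learn richer representations.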
### Generative Adversarial Network
* A Generative Adversarial Network (GAN) is an unsupervised learning algorithm that nonetheless has a supervised flavor: a supervised (classification) loss is used as part of training
* A GAN typically has two major components: the **generator** and the **discriminator**. The generator tries to generate "fake" data (e.g., images or sentences) that fool the discriminator into thinking they're real, while the discriminator tries to distinguish between real and generated data. It's a contest between the two players, hence the name adversarial, and this contest drives both sides to improve until the "fake" data are indistinguishable from the real data
* How does it work, intuitively
- The generator takes a **random** input and generates a sample of data
- The discriminator then either takes the generated sample or a real data sample, and tries to predict whether the input is real or generated (i.e., solving a binary classification problem)
    - Given a score range of [0, 1], ideally we'd like the discriminator to give low scores to generated data and high scores to real data. At the same time, we want the generated data to fool the discriminator. This tension drives both sides to become stronger
* How does it work, from a training perspective
    - Without training, the generator creates only 'garbage' data, while the discriminator is too 'innocent' to tell the difference between fake and real data
- Usually we would first train the discriminator with both real (label 1) and generated data (label 0) for N epochs so it would have a good judgement of what is real vs. fake
    - Then we **set the discriminator to non-trainable** and train the generator. Even though the discriminator is frozen at this stage, it still acts as a classifier, so **error signals can be backpropagated through it, which is what enables the generator to learn**
    - The two steps above alternate until neither side can improve further
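The alternating procedure above can be sketched end to end with a deliberately tiny 1-D GAN in plain numpy (all parameters and hyperparameters are illustrative assumptions, not from any framework): the "generator" is just a learned shift `b` applied to noise, and the "discriminator" is a logistic regression:

```python
import numpy as np

rng = np.random.default_rng(0)
w, c = 0.0, 0.0      # discriminator params: D(x) = sigmoid(w*x + c)
b = 0.0              # generator param:      G(z) = z + b
lr, n = 0.05, 64     # illustrative learning rate and batch size

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(1000):
    real = rng.normal(4.0, 1.0, n)           # real data ~ N(4, 1)
    fake = rng.normal(0.0, 1.0, n) + b       # generated data

    # --- Discriminator step: binary cross-entropy, real=1, fake=0 ---
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    grad_w = (-(1 - d_real) * real + d_fake * fake).mean()
    grad_c = (-(1 - d_real) + d_fake).mean()
    w -= lr * grad_w
    c -= lr * grad_c

    # --- Generator step (discriminator frozen): maximize log D(fake) ---
    # The error signal flows *through* the frozen discriminator into b.
    fake = rng.normal(0.0, 1.0, n) + b
    d_fake = sigmoid(w * fake + c)
    grad_b = (-(1 - d_fake) * w).mean()      # d(-log D(fake))/db
    b -= lr * grad_b

print(b)   # b drifts toward the real mean (4.0)
```

Note the generator step uses the non-saturating objective (maximize log D(fake) rather than minimize log(1 - D(fake))), a standard trick to keep gradients alive early in training.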
* Here are some [tips and tricks to make GANs work](https://github.com/soumith/ganhacks)
* One caveat is that **the adversarial setup is only auxiliary: the end goal of using a GAN is to generate data that even experts cannot tell is real or fake**