
Commit a99e49c

committed
starting explaining PCA
1 parent 8b1b194 commit a99e49c


tutorial/general_concepts.rst

Lines changed: 33 additions & 1 deletion
@@ -379,11 +379,43 @@ not use any kind of labels.
An unsupervised learning model will try to fit its parameters so
as to best summarize regularities found in the data.

The following introduces the main variants of unsupervised learning
algorithms.

Dimensionality Reduction and visualization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Dimensionality reduction is the task of **deriving a set of new
artificial features that is smaller than the original feature set
while retaining most of the variance of the original data**.

The most common technique for dimensionality reduction is called
**Principal Component Analysis** (PCA).

PCA computes linear combinations of the original features using a
truncated Singular Value Decomposition of the matrix ``X``, so as to
project the data onto a basis spanned by the top singular vectors.

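As a rough illustration of that idea (a sketch added here for
exposition rather than part of the original tutorial, assuming ``X``
holds the iris samples as rows), a basic, non-whitened version of this
projection can be written directly with NumPy's SVD::

    >>> import numpy as np
    >>> X_centered = X - X.mean(axis=0)  # PCA operates on centered data
    >>> U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    >>> X_reduced = np.dot(X_centered, Vt[:3].T)  # keep the top 3 singular vectors
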
If the number of retained components is 2 or 3, PCA can be used to
visualize the dataset::

    >>> from scikits.learn.pca import RandomizedPCA
    >>> pca = RandomizedPCA(3, whiten=True).fit(X)

Once fitted, the ``pca`` model exposes the singular vectors in the
``components_`` attribute::

    >>> pca.components_.T
    array([[ 0.17650757, -0.04015901,  0.41812992,  0.17516725],
           [ 1.33840478,  1.48757227, -0.35831476, -0.15229463],
           [-2.08029843,  2.13551363,  0.25967715,  1.96594819]])

Let us project the iris dataset along those first 3 dimensions::

    >>> X_pca = pca.transform(X)

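To get a quick look at the projected data (a minimal plotting sketch
that is not part of the original tutorial; it assumes ``pylab`` is
installed and that ``y`` holds the iris class labels), the first two
projected dimensions can be plotted against each other::

    import pylab as pl

    # color each projected sample by its iris class label
    pl.scatter(X_pca[:, 0], X_pca[:, 1], c=y)
    pl.xlabel('first principal component')
    pl.ylabel('second principal component')
    pl.show()
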
Clustering
