not use any kind of labels.
An unsupervised learning model will try to fit its parameters so
as to best summarize regularities found in the data.

The following introduces the main variants of unsupervised learning
algorithms.


Dimensionality Reduction and visualization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Dimensionality reduction is the task of **deriving a set of new
artificial features that is smaller than the original feature set
while retaining most of the variance of the original data**.

The most common technique for dimensionality reduction is called
**Principal Component Analysis** (PCA).

PCA computes linear combinations of the original features using a
truncated Singular Value Decomposition of the matrix ``X``, so as to
project the data onto a basis of the top singular vectors.
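
To make the link between PCA and the SVD concrete, here is a minimal
sketch of the same projection written directly with NumPy (without the
whitening step; the variable names are illustrative, not part of the
scikits.learn API)::

    >>> import numpy as np
    >>> X_centered = X - X.mean(axis=0)   # PCA assumes centered data
    >>> U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    >>> X_projected = np.dot(X_centered, Vt[:3].T)  # top 3 singular vectors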

If the number of retained components is 2 or 3, PCA can be used to
visualize the dataset::

    >>> from scikits.learn.pca import RandomizedPCA
    >>> pca = RandomizedPCA(n_components=3, whiten=True).fit(X)

Once fitted, the ``pca`` model exposes the singular vectors in its
``components_`` attribute::

    >>> pca.components_.T
    array([[ 0.17650757, -0.04015901,  0.41812992,  0.17516725],
           [ 1.33840478,  1.48757227, -0.35831476, -0.15229463],
           [-2.08029843,  2.13551363,  0.25967715,  1.96594819]])

Let us project the iris dataset along those first 3 dimensions::

    >>> X_pca = pca.transform(X)
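
To actually look at the projection, ``X_pca`` can be fed to a plotting
library. The following is an illustrative sketch using matplotlib's 3D
scatter plot (matplotlib is assumed to be installed separately, and the
``iris.target`` labels used for coloring are assumed to come from the
iris dataset loader)::

    >>> import pylab as pl
    >>> from mpl_toolkits.mplot3d import Axes3D
    >>> fig = pl.figure()
    >>> ax = Axes3D(fig)  # attach a 3D axes to the figure
    >>> p3d = ax.scatter(X_pca[:, 0], X_pca[:, 1], X_pca[:, 2],
    ...                  c=iris.target)
    >>> pl.show()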


Clustering