TODO

Main Takeaway points
--------------------

- Start by extracting the features into an array ``X`` with shape
  ``(n_samples, n_features)``: one feature vector per sample

- The distance metric used in feature space should preserve the intuitive
  pairwise "closeness" of samples: similar samples should get nearby
  feature vectors

- Supervised learning: ``clf.fit(X, y)`` and then ``clf.predict(X_new)``

  - classification: ``y`` is an array of integers (class labels)

  - regression: ``y`` is an array of floats (continuous targets)

- Unsupervised learning: ``clf.fit(X)``

  - dimensionality reduction: project new data with ``clf.transform(X_new)``

  - clustering: find a group id for each sample

- Some models work much better when the data is first normalized with PCA
  (whitening)

- Simple linear models can fail completely when the data is not linearly
  separable

- Simple linear models are nevertheless often very useful in practice
  (especially with a large ``n_features``)

- Before training any model, split the data into a training set and a
  test set:

  - use the training set for model selection and fitting

  - use the test set for model evaluation
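
The first two points can be made concrete with a minimal sketch, assuming NumPy and scikit-learn are installed. The (height in metres, weight in kilograms) records are hypothetical, and standardizing each feature with ``StandardScaler`` is only one possible way to stop a feature measured in large units from dominating Euclidean distances.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.preprocessing import StandardScaler

# Hypothetical raw records: (height in metres, weight in kilograms)
records = [(1.60, 55.0), (1.62, 80.0), (1.90, 56.0)]

# One row per sample, one column per feature
X = np.array(records)
print(X.shape)  # (3, 2)

# Raw distances are dominated by the weight column (larger numbers)
raw = euclidean_distances(X)

# Rescale every feature to zero mean and unit variance, then compare again
X_scaled = StandardScaler().fit_transform(X)
scaled = euclidean_distances(X_scaled)

print(raw[0, 1], raw[0, 2])
print(scaled[0, 1], scaled[0, 2])
```

On the raw array the first person looks far closer to the third than to the second, even though their heights differ by 30 cm; after scaling, both pairs end up roughly equally distant.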
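
The supervised-learning pattern above (``fit`` on ``X`` and ``y``, then ``predict`` on ``X_new``) is the same for classification and regression. In this sketch the one-feature toy data and the choice of ``KNeighborsClassifier`` and ``LinearRegression`` are illustrative assumptions; other scikit-learn classifiers and regressors expose the same two methods.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: 4 samples, 1 feature
X = np.array([[0.0], [1.0], [2.0], [3.0]])
X_new = np.array([[0.2], [2.9]])

# Classification: y holds integer class labels
y_class = np.array([0, 0, 1, 1])
clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(X, y_class)
print(clf.predict(X_new))  # [0 1]

# Regression: y holds floats (here exactly y = 2 * x + 1)
y_reg = 2.0 * X.ravel() + 1.0
reg = LinearRegression()
reg.fit(X, y_reg)
print(reg.predict(X_new))  # approximately [1.4 6.8]
```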
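
For unsupervised learning only ``X`` is passed to ``fit``. The sketch below uses ``PCA`` for dimensionality reduction and ``KMeans`` for clustering on two hypothetical, well-separated Gaussian blobs; the data, ``n_components=2`` and ``n_clusters=2`` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# Hypothetical data: two blobs of 50 samples each in 5 dimensions
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 5)),
               rng.normal(10.0, 1.0, size=(50, 5))])

# Dimensionality reduction: learn the projection from X ...
pca = PCA(n_components=2).fit(X)
# ... then apply it to any new data with the same 5 features
X_new = rng.normal(5.0, 1.0, size=(3, 5))
print(pca.transform(X_new).shape)  # (3, 2)

# Clustering: labels_ holds one group id per training sample
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_
```

The group ids are arbitrary: cluster ``0`` is not guaranteed to be the first blob, only the grouping itself is meaningful.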
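
To see a simple linear model fail, the sketch below fits ``LogisticRegression`` to two concentric rings generated with ``make_circles``, which no straight line can separate, and compares it with an RBF-kernel ``SVC``. The dataset parameters are arbitrary choices, and the kernel SVC is just one possible non-linear alternative.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Two concentric rings: the classes are not linearly separable
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = LogisticRegression().fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)

# Training accuracy: the linear boundary stays far from perfect,
# while the RBF kernel can wrap a boundary around the inner ring
linear_acc = linear.score(X, y)
rbf_acc = rbf.score(X, y)
print(f"linear: {linear_acc:.2f}, rbf: {rbf_acc:.2f}")
```

These are training accuracies, only meant to contrast the two decision boundaries; a real evaluation would use a held-out test set, as in the last point of the list.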
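
The train / test split, combined with PCA normalization, can be sketched with a scikit-learn pipeline on the bundled iris dataset. Reading "normalized with PCA" as ``PCA(whiten=True)`` is an assumption, as are ``n_components=2``, ``test_size=0.25`` and the default ``SVC``.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hold out 25% of the samples before any fitting or model selection
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# whiten=True rescales each retained component to unit variance;
# inside the pipeline, PCA is fitted on the training set only
model = make_pipeline(PCA(n_components=2, whiten=True), SVC())
model.fit(X_train, y_train)

# Evaluate on samples the model has never seen
test_acc = model.score(X_test, y_test)
print(f"test accuracy: {test_acc:.2f}")
```

Keeping PCA inside the pipeline means its projection is learned from ``X_train`` alone, so no information from the test set leaks into training. Model selection, for example with ``GridSearchCV`` fitted on ``X_train``, should likewise only ever see the training set.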