Commit 4651e08

added takeaway points section
1 parent 74e225a commit 4651e08

1 file changed: +35 -0 lines changed

tutorial/general_concepts.rst

Lines changed: 35 additions & 0 deletions
@@ -677,3 +677,38 @@ Training set, test sets and overfitting
 TODO
 
 
+Main Takeaway points
+--------------------
+
+- Start by extracting a feature array ``X`` with shape
+  ``(n_samples, n_features)``
+
+- Metrics in feature space should try to preserve the intuitive pairwise
+  "closeness" of samples
+
+- Supervised learning: ``clf.fit(X, y)`` and then ``clf.predict(X_new)``
+
+  - classification: ``y`` is an array of integers
+
+  - regression: ``y`` is an array of floats
+
+- Unsupervised learning: ``clf.fit(X)``
+
+  - dimensionality reduction with ``clf.transform(X_new)``
+
+  - clustering to find a group id for each sample
+
+- Some models work much better with data normalized with PCA
+
+- Simple linear models can fail completely (non-linearly separable data)
+
+- Simple linear models are often very useful in practice (esp. with
+  large ``n_features``)
+
+- Before training models, split the data into train / test sets:
+
+  - use the training set for model selection and fitting
+
+  - use the test set for model evaluation
+
+
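The points added in this hunk can be illustrated with a short sketch. It is not part of the commit: it assumes a recent scikit-learn release (the ``sklearn.model_selection`` module and ``load_iris(return_X_y=True)``), and the choice of the iris dataset, ``LogisticRegression``, ``PCA`` and ``KMeans`` is purely illustrative.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# X has shape (n_samples, n_features); y is an array of integer class labels.
# (The iris dataset is an illustrative assumption, not taken from the commit.)
X, y = load_iris(return_X_y=True)

# Split before training, so the test set is only used for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Supervised learning: clf.fit(X, y), then clf.predict(X_new).
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
test_accuracy = clf.score(X_test, y_test)

# Unsupervised learning: fit on X alone, then transform new samples
# (here, dimensionality reduction to 2 components).
pca = PCA(n_components=2)
pca.fit(X_train)
X_test_2d = pca.transform(X_test)

# Clustering: find a group id for each sample.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X_train)
```

For regression the same ``fit`` / ``predict`` pattern applies, with ``y`` holding floats instead of integer labels.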
