@@ -208,25 +208,6 @@ Practical implementations of such feature extraction strategies
208208will be presented in the last sections of this tutorial.
209209
210210
211- How to devise a "good" feature extraction strategy
212- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
213-
214- The feature extraction strategy both depends on the task we are
215- trying to perform and the nature of the collected data. Therefore
216- there is no formal rule to define which strategy is the best.
217-
218- A good rule of thumb is to imagine a human-being performing the
219- task the machine is trying to accomplish using only the numerical
220- features provided to the machine.
221-
222- Usually the feature extraction is useful if and only if two samples
223- **judged similar in real life ** by the human-being are **close
224- according to some similarity metric of the feature space **.
225-
226- In other words, the feature extraction strategy must somehow preserve
227- the intuitive topology of the sample set.
228-
229-
230211Supervised Learning: ``model.fit(X, y)``
231212----------------------------------------
232213
@@ -279,6 +260,13 @@ of thereof)::
279260 >>> from scikits.learn.svm import LinearSVC
280261 >>> clf = LinearSVC()
281262
263+ .. note::
264+
265+ Whenever you import a scikit-learn class or function for the first time,
266+ you are advised to read the docstring by using the ``?`` magic suffix
267+ of IPython, for instance type: ``LinearSVC?``.
268+
269+
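For readers who are not using IPython, the built-in ``help`` function is a
minimal alternative that prints much the same information::

    >>> help(LinearSVC)    # prints the class docstring and constructor parameters
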
282270``clf`` is a statistical model whose parameters control the
283271learning algorithm (those parameters are sometimes called the
284272hyper-parameters). Those hyper-parameters can be supplied by the
@@ -752,7 +740,7 @@ using for fitting the model:
752740
753741
754742The overfitting issue
755- +++++++++++++++++++++
743+ ~~~~~~~~~~~~~~~~~~~~~
756744
757745The problem is that some models can be subject to the
758746**overfitting** issue: they can **learn the training data by heart**
@@ -769,7 +757,7 @@ whether your model is overfitting or not.
769757
770758
771759Solutions to overfitting
772- ++++++++++++++++++++++++
760+ ~~~~~~~~~~~~~~~~~~~~~~~~
773761
774762The solution to this issue is twofold:
775763
@@ -786,7 +774,7 @@ The solution to this issue is twofold:
786774
787775
788776Measuring classification performance on a test set
789- ++++++++++++++++++++++++++++++++++++++++++++++++++
777+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
790778
791779Here is an example of how to split the data of the iris dataset.
792780
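One simple way to perform such a split is to shuffle the sample indices with
NumPy and slice the data arrays; the following is a minimal sketch of this
approach, using the ``datasets.load_iris`` helper to load the data::

    >>> import numpy as np
    >>> from scikits.learn import datasets

    >>> iris = datasets.load_iris()
    >>> n_samples = iris.data.shape[0]

    >>> # shuffle the sample indices with a fixed seed for reproducibility
    >>> np.random.seed(0)
    >>> indices = np.random.permutation(n_samples)

    >>> # hold out the last 10% of the shuffled samples as a test set
    >>> split = int(0.9 * n_samples)
    >>> X_train, y_train = iris.data[indices[:split]], iris.target[indices[:split]]
    >>> X_test, y_test = iris.data[indices[split:]], iris.target[indices[split:]]
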
@@ -846,9 +834,6 @@ Key takeaway points
846834
847835- Build ``X`` (feature vectors) with shape ``(n_samples, n_features)``
848836
849- - Metrics in feature space should try to preserve the intuitive pairwise
850- "closeness" of samples
851-
852837- Supervised learning: ``clf.fit(X, y)`` and then ``clf.predict(X_new)``
853838
854839  - Classification: ``y`` is an array of integers