Commit 388f000

FIX address comments from forum (INRIA#341)
1 parent f283299 commit 388f000

1 file changed (+15 -12 lines changed)

python_scripts/cross_validation_train_test.py

Lines changed: 15 additions & 12 deletions
@@ -27,7 +27,7 @@
 # notebook. The target to be predicted is a continuous variable and not anymore
 # discrete. This task is called regression.
 #
-# Therefore, we will use predictive model specific to regression and not to
+# This, we will use a predictive model specific to regression and not to
 # classification.
 
 # %%
@@ -173,7 +173,10 @@
 # record their statistical performance on each variant of the test set.
 #
 # To evaluate the statistical performance of our regressor, we can use
-# `cross_validate` with a `ShuffleSplit` object:
+# [`sklearn.model_selection.cross_validate`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_validate.html)
+# with a
+# [`sklearn.model_selection.ShuffleSplit`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.ShuffleSplit.html)
+# object:
 
 # %%
 from sklearn.model_selection import cross_validate
@@ -221,9 +224,9 @@
 cv_results.head(10)
 
 # %% [markdown]
-# We get timing information to fit and predict at each round of
-# cross-validation. Also, we get the test score, which corresponds to the
-# testing error on each of the split.
+# We get timing information to fit and predict at each cross-validation
+# iteration. Also, we get the test score, which corresponds to the testing
+# error on each of the splits.
 
 # %%
 len(cv_results)
@@ -258,7 +261,7 @@
 # 46.36 +/- 1.17 k\$.
 #
 # If we were to train a single model on the full dataset (without
-# cross-validation) and then had later access to an unlimited amount of test
+# cross-validation) and then later had access to an unlimited amount of test
 # data, we would expect its true testing error to fall close to that
 # region.
 #
@@ -281,7 +284,7 @@
 #
 # We notice that the mean estimate of the testing error obtained by
 # cross-validation is a bit smaller than the natural scale of variation of the
-# target variable. Furthermore the standard deviation of the cross validation
+# target variable. Furthermore, the standard deviation of the cross validation
 # estimate of the testing error is even smaller.
 #
 # This is a good start, but not necessarily enough to decide whether the
@@ -298,15 +301,15 @@
 # mean absolute percentage error would have been a much better choice.
 #
 # But in all cases, an error of 47 k\$ might be too large to automatically use
-# our model to tag house value without expert supervision.
+# our model to tag house values without expert supervision.
 #
 # ## More detail regarding `cross_validate`
 #
 # During cross-validation, many models are trained and evaluated. Indeed, the
 # number of elements in each array of the output of `cross_validate` is a
-# result from one of this `fit`/`score`. To make it explicit, it is possible
-# to retrieve theses fitted models for each of the fold by passing the option
-# `return_estimator=True` in `cross_validate`.
+# result from one of these `fit`/`score` procedures. To make it explicit, it is
+# possible to retrieve theses fitted models for each of the splits/folds by
+# passing the option `return_estimator=True` in `cross_validate`.
 
 # %%
 cv_results = cross_validate(regressor, data, target, return_estimator=True)
@@ -321,7 +324,7 @@
 # because it allows to inspect the internal fitted parameters of these
 # regressors.
 #
-# In the case where you are interested only about the test score, scikit-learn
+# In the case where you only are interested in the test score, scikit-learn
 # provide a `cross_val_score` function. It is identical to calling the
 # `cross_validate` function and to select the `test_score` only (as we
 # extensively did in the previous notebooks).
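
For reviewers who want to check the behaviour described in the changed comments, here is a minimal sketch. It is not part of the commit. It assumes a synthetic dataset from `make_regression` in place of the notebook's `data` and `target`, and a `DecisionTreeRegressor` as a hypothetical stand-in for `regressor`, since the diff does not show how either is defined. The split count and test size are also illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import ShuffleSplit, cross_val_score, cross_validate
from sklearn.tree import DecisionTreeRegressor

# Assumption: synthetic stand-in for the notebook's `data` and `target`.
data, target = make_regression(
    n_samples=200, n_features=5, noise=10.0, random_state=0
)

# Assumption: the diff only shows a variable named `regressor`.
regressor = DecisionTreeRegressor(random_state=0)

# A fixed integer seed makes ShuffleSplit produce the same splits on each call.
cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)

# return_estimator=True keeps the model fitted on each split.
cv_results = cross_validate(
    regressor, data, target, cv=cv, return_estimator=True
)
n_estimators = len(cv_results["estimator"])

# cross_val_score returns only the per-split test scores.
scores = cross_val_score(regressor, data, target, cv=cv)
same_scores = np.allclose(scores, cv_results["test_score"])

print(sorted(cv_results.keys()))
print(n_estimators, same_scores)
```

With identical splits and a deterministic estimator, the array from `cross_val_score` should match the `test_score` entry of `cross_validate`, and the `estimator` entry should hold one fitted model per split.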
