Commit 9b183a8

Merge pull request scikit-learn#29 from glouppe/adaboost
FIX: some of Gael comments
2 parents: 6711dfc + 8ecc504

File tree

5 files changed: +22 −24 lines

doc/modules/ensemble.rst

Lines changed: 1 addition & 1 deletion
@@ -281,7 +281,7 @@ concentrate on the examples that are missed by the previous ones in the sequence
 AdaBoost can be used both for classification and regression problems:
 
 - For multi-class classification, :class:`AdaBoostClassifier` implements
-  AdaBoost-SAMME [ZZRH2009]_.
+  AdaBoost-SAMME and AdaBoost-SAMME.R [ZZRH2009]_.
 
 - For regression, :class:`AdaBoostRegressor` implements AdaBoost.R2 [D1997]_.
 
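The two estimators named in the documentation entry above can be exercised with a minimal sketch. This assumes a scikit-learn installation where `AdaBoostClassifier` and `make_gaussian_quantiles` are importable as shown; the `algorithm` choice (SAMME vs. SAMME.R) that this commit documents is left at the library default, since its spelling has varied across releases.

```python
from sklearn.datasets import make_gaussian_quantiles
from sklearn.ensemble import AdaBoostClassifier

# Three concentric-quantile classes, as in the multi-class AdaBoost example.
X, y = make_gaussian_quantiles(n_samples=300, n_features=2,
                               n_classes=3, random_state=0)

# Boost 50 decision stumps (the default base estimator).
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy, well above the 1/3 chance level
```

`AdaBoostRegressor` follows the same fit/score pattern for AdaBoost.R2 on regression targets.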

examples/ensemble/plot_adaboost_multiclass.py

Lines changed: 9 additions & 6 deletions
@@ -10,12 +10,15 @@
 spheres such that roughly equal numbers of samples are in each class (quantiles
 of the :math:`\chi^2` distribution).
 
-The performance of the SAMME and SAMME.R [1] algorithms are compared.
-The error of each algorithm on the test set after each boosting iteration is
-shown on the left, the classification error on the test set of each tree is
-shown in the middle, and the boost weight of each tree is shown on the right.
-All trees have a weight of one in the SAMME.R algorithm and therefore are not
-shown.
+The performance of the SAMME and SAMME.R [1] algorithms are compared. SAMME.R
+uses the probability estimates to update the additive model, while SAMME uses
+the classifications only. As the example illustrates, the SAMME.R algorithm
+typically converges faster than SAMME, achieving a lower test error with fewer
+boosting iterations. The error of each algorithm on the test set after each
+boosting iteration is shown on the left, the classification error on the test
+set of each tree is shown in the middle, and the boost weight of each tree is
+shown on the right. All trees have a weight of one in the SAMME.R algorithm and
+therefore are not shown.
 
 .. [1] J. Zhu, H. Zou, S. Rosset, T. Hastie, "Multi-class AdaBoost", 2009.
 
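The "test error after each boosting iteration" curve that this example plots can be sketched with `staged_predict`, which yields predictions of the partially built ensemble after each boosting step. This is a simplified sketch for a single ensemble, not the full two-algorithm comparison in the example script; it assumes `staged_predict` behaves as in current scikit-learn releases.

```python
import numpy as np
from sklearn.datasets import make_gaussian_quantiles
from sklearn.ensemble import AdaBoostClassifier

X, y = make_gaussian_quantiles(n_samples=1000, n_features=10,
                               n_classes=3, random_state=1)
X_train, X_test = X[:700], X[700:]
y_train, y_test = y[:700], y[700:]

clf = AdaBoostClassifier(n_estimators=30, random_state=0)
clf.fit(X_train, y_train)

# Test-set error of the ensemble truncated after each boosting iteration
# (the quantity shown in the left panel of the plot).
staged_errors = [np.mean(y_pred != y_test)
                 for y_pred in clf.staged_predict(X_test)]
```

Plotting `staged_errors` against the iteration index reproduces the shape of the left panel: the error of the growing additive model as trees are added.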

sklearn/datasets/samples_generator.py

Lines changed: 3 additions & 4 deletions
@@ -331,7 +331,7 @@ def make_hastie_10_2(n_samples=12000, random_state=None):
     The ten features are standard independent Gaussian and
     the target ``y`` is defined by::
 
-      y[i] = 1 if np.sum(X[i]**2) > 9.34 else -1
+      y[i] = 1 if np.sum(X[i] ** 2) > 9.34 else -1
 
     Parameters
     ----------
@@ -1291,9 +1291,8 @@ def make_gaussian_quantiles(mean=None, cov=1., n_samples=100,
     # Label by quantile
     step = n_samples // n_classes
 
-    y = np.hstack([
-        np.repeat(np.arange(n_classes), step),
-        np.repeat(n_classes - 1, n_samples - step * n_classes)])
+    y = np.hstack([np.repeat(np.arange(n_classes), step),
+                   np.repeat(n_classes - 1, n_samples - step * n_classes)])
 
     if shuffle:
         X, y = util_shuffle(X, y, random_state=generator)
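The reflowed labelling line in `make_gaussian_quantiles` can be checked in isolation. The sketch below uses toy values (`n_samples=10`, `n_classes=3`, chosen here for illustration): each class label is repeated `step` times, and the remainder left when `n_samples` is not divisible by `n_classes` is assigned to the last class.

```python
import numpy as np

n_samples, n_classes = 10, 3
step = n_samples // n_classes  # 3 samples per class

# Same expression as in the diff: n_classes blocks of `step` labels,
# then the leftover n_samples - step * n_classes samples get the last label.
y = np.hstack([np.repeat(np.arange(n_classes), step),
               np.repeat(n_classes - 1, n_samples - step * n_classes)])
print(y)  # [0 0 0 1 1 1 2 2 2 2]
```

When `n_samples` divides evenly, the second `np.repeat` contributes an empty array and every class gets exactly `step` samples.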

sklearn/ensemble/tests/test_weight_boosting.py

Lines changed: 2 additions & 4 deletions
@@ -9,6 +9,8 @@
 from nose.tools import assert_true
 from nose.tools import assert_raises
 
+from sklearn.dummy import DummyClassifier
+from sklearn.dummy import DummyRegressor
 from sklearn.grid_search import GridSearchCV
 from sklearn.ensemble import AdaBoostClassifier
 from sklearn.ensemble import AdaBoostRegressor
@@ -179,10 +181,6 @@ def test_importances():
 
 def test_error():
     """Test that it gives proper exception on deficient input."""
-    from sklearn.dummy import DummyClassifier
-    from sklearn.dummy import DummyRegressor
-
-    # Invalid values for parameters
     assert_raises(ValueError,
                   AdaBoostClassifier(learning_rate=-1).fit,
                   X, y)

sklearn/ensemble/weight_boosting.py

Lines changed: 7 additions & 9 deletions
@@ -438,15 +438,13 @@ def _boost_real(self, iboost, X, y, sample_weight):
         if estimator_error <= 0:
             return sample_weight, 1., 0.
 
-        """
-        Construct y coding as described in Zhu et al [2]::
-
-            y_k = 1 if c == k else -1 / (K - 1)
-
-        where K == n_classes_ and c, k in [0, K) are indices along the second
-        axis of the y coding with c being the index corresponding to the true
-        class label.
-        """
+        # Construct y coding as described in Zhu et al [2]:
+        #
+        #   y_k = 1 if c == k else -1 / (K - 1)
+        #
+        # where K == n_classes_ and c, k in [0, K) are indices along the second
+        # axis of the y coding with c being the index corresponding to the true
+        # class label.
         n_classes = self.n_classes_
         classes = self.classes_
         y_codes = np.array([-1. / (n_classes - 1), 1.])
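The y coding that the rewritten comment describes can be illustrated numerically. The values below (`K = 3` classes, labels `[0, 2, 1]`) are toy inputs chosen for this sketch, not taken from the commit; the indexing trick mirrors the `y_codes` line visible in the diff context.

```python
import numpy as np

n_classes = 3                      # K
classes = np.arange(n_classes)
y = np.array([0, 2, 1])            # true class label of each sample

# y_codes[0] = -1 / (K - 1) for "not the true class", y_codes[1] = 1 for it.
y_codes = np.array([-1. / (n_classes - 1), 1.])

# (classes == y[:, np.newaxis]) is True only in the true-class column;
# casting the mask to 0/1 and indexing y_codes yields the SAMME.R coding.
y_coding = y_codes[(classes == y[:, np.newaxis]).astype(int)]
print(y_coding)
```

Each row contains a single 1 in the true-class position and `-1 / (K - 1)` elsewhere, so every row sums to zero, which is what makes the coding symmetric across classes in the Zhu et al. derivation.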
