
Commit a715a9f

Merge branch 'dfsg' into debian
* dfsg:
  Corrected macro ROC in example plot_roc
  BF: FIX OvR decision_function_shape in SVC
  Fix fit_transform, stability issue and scale issue in PLS
  FIX class_weight in LogisticRegression and LogisticRegressionCV
  FIX MaxAbsScaler on sparse matrices with 1 row
  Deprecate residues_ in LinearRegression
  Addressed comments on PR scikit-learn#5451
  MAINT Removed deprecated stuff.
  MAINT disable circle ci on 0.17.X
  MAINT: deprecation warns from StandardScaler std_
  FIX: remove shuffling in LabelKFold
  FIX skip LDA deprecation test on python3.3 that has no reload.
  Fix broken examples using RandomTreeEmbeddings
  MAINT Use the full listing of the rackspace wheelhouse for appveyor
2 parents: 918005f + e87f203


41 files changed: +407, -454 lines

circle.yml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+general:
+  # Restric the build to the branch master only
+  branches:
+    only:
+      - master

continuous_integration/appveyor/requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 # Those wheels were collected from http://www.lfd.uci.edu/~gohlke/pythonlibs/
 # This is a temporary solution. As soon as numpy and scipy provide official
 # wheel for windows we ca delete this --find-links line.
---find-links http://28daf2247a33ed269873-7b1aad3fab3cc330e1fd9d109892382a.r6.cf2.rackcdn.com/index.html
+--find-links http://28daf2247a33ed269873-7b1aad3fab3cc330e1fd9d109892382a.r6.cf2.rackcdn.com/
 
 # fix the versions of numpy to force the use of numpy and scipy to use the whl
 # of the rackspace folder instead of trying to install from more recent

doc/whats_new.rst

Lines changed: 10 additions & 0 deletions
@@ -64,6 +64,10 @@ New features
   shuffling step in the ``cd`` solver.
   By `Tom Dupre la Tour`_ and `Mathieu Blondel`_.
 
+- **IndexError** bug `#5495
+  <https://github.com/scikit-learn/scikit-learn/issues/5495>`_ when
+  doing OVR(SVC(decision_function_shape="ovr")). Fixed by `Elvis Dohmatob`_.
+
 Enhancements
 ............
 - :class:`manifold.TSNE` now supports approximate optimization via the
@@ -280,6 +284,10 @@ Bug fixes
   <https://github.com/scikit-learn/scikit-learn/pull/4478>`_)
   By `Andreas Müller`_, `Loic Esteve`_ and `Giorgio Patrini`_.
 
+- Fixed bug in :class:`cross_decomposition.PLS` that yielded unstable and
+  platform dependent output, and failed on `fit_transform`.
+  By `Arthur Mensch`_.
+
 API changes summary
 -------------------
 - Attribute `data_min`, `data_max` and `data_range` in
@@ -3766,3 +3774,5 @@ David Huard, Dave Morrill, Ed Schofield, Travis Oliphant, Pearu Peterson.
 .. _Jean Kossaifi: https://github.com/JeanKossaifi
 .. _Andrew Lamb: https://github.com/andylamb
 .. _Graham Clenaghan: https://github.com/gclenaghan
+.. _Giorgio Patrini: https://github.com/giorgiop
+.. _Elvis Dohmatob: https://github.com/dohmatob
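
For context on the SVC entry, a minimal sketch of the call pattern that issue #5495 describes; the iris data and parameter choices below are illustrative assumptions, not part of the commit:

from sklearn import datasets
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

iris = datasets.load_iris()
# Before this fix, wrapping an SVC whose own decision function is
# already one-vs-rest inside OneVsRestClassifier raised an IndexError.
clf = OneVsRestClassifier(SVC(decision_function_shape="ovr"))
clf.fit(iris.data, iris.target)
scores = clf.decision_function(iris.data)  # shape (n_samples, n_classes)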

examples/ensemble/plot_feature_transformation.py

Lines changed: 7 additions & 7 deletions
@@ -34,10 +34,10 @@
 from sklearn.linear_model import LogisticRegression
 from sklearn.ensemble import (RandomTreesEmbedding, RandomForestClassifier,
                               GradientBoostingClassifier)
-from sklearn.feature_selection import SelectFromModel
 from sklearn.preprocessing import OneHotEncoder
 from sklearn.cross_validation import train_test_split
 from sklearn.metrics import roc_curve
+from sklearn.pipeline import make_pipeline
 
 n_estimator = 10
 X, y = make_classification(n_samples=80000)
@@ -51,13 +51,13 @@
                                                             test_size=0.5)
 
 # Unsupervised transformation based on totally random trees
-rt = RandomTreesEmbedding(max_depth=3, n_estimators=n_estimator)
-rt_lm = LogisticRegression()
-rt.fit(X_train, y_train)
-rt_lm.fit(SelectFromModel(rt, prefit=True).transform(X_train_lr), y_train_lr)
+rt = RandomTreesEmbedding(max_depth=3, n_estimators=n_estimator,
+                          random_state=0)
 
-y_pred_rt = rt_lm.predict_proba(
-    SelectFromModel(rt, prefit=True).transform(X_test))[:, 1]
+rt_lm = LogisticRegression()
+pipeline = make_pipeline(rt, rt_lm)
+pipeline.fit(X_train, y_train)
+y_pred_rt = pipeline.predict_proba(X_test)[:, 1]
 fpr_rt_lm, tpr_rt_lm, _ = roc_curve(y_test, y_pred_rt)
 
 # Supervised transformation based on random forests
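
Condensed, the pattern the rewritten example relies on is the following minimal sketch (the dataset size is an illustrative assumption); make_pipeline chains the embedding's fit_transform with the classifier's fit, replacing the broken SelectFromModel round-trip:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomTreesEmbedding
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=1000, random_state=0)
# The embedding one-hot encodes each sample by the leaves it reaches;
# the logistic regression is then fit on that sparse representation.
pipe = make_pipeline(
    RandomTreesEmbedding(n_estimators=10, max_depth=3, random_state=0),
    LogisticRegression())
pipe.fit(X, y)
proba = pipe.predict_proba(X)[:, 1]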

examples/ensemble/plot_random_forest_embedding.py

Lines changed: 1 addition & 4 deletions
@@ -30,17 +30,14 @@
 from sklearn.datasets import make_circles
 from sklearn.ensemble import RandomTreesEmbedding, ExtraTreesClassifier
 from sklearn.decomposition import TruncatedSVD
-from sklearn.feature_selection import SelectFromModel
 from sklearn.naive_bayes import BernoulliNB
 
 # make a synthetic dataset
 X, y = make_circles(factor=0.5, random_state=0, noise=0.05)
 
 # use RandomTreesEmbedding to transform data
 hasher = RandomTreesEmbedding(n_estimators=10, random_state=0, max_depth=3)
-hasher.fit(X)
-model = SelectFromModel(hasher, prefit=True)
-X_transformed = model.transform(X)
+X_transformed = hasher.fit_transform(X)
 
 # Visualize result using PCA
 pca = TruncatedSVD(n_components=2)
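
As a sketch of what that single fit_transform call returns (sizes here follow the example's own defaults; the print line is illustrative):

from sklearn.datasets import make_circles
from sklearn.ensemble import RandomTreesEmbedding

X, y = make_circles(factor=0.5, random_state=0, noise=0.05)
hasher = RandomTreesEmbedding(n_estimators=10, random_state=0, max_depth=3)
# fit_transform returns a sparse binary matrix with one column per tree
# leaf; each row has exactly n_estimators non-zero entries.
X_transformed = hasher.fit_transform(X)
print(X_transformed.shape)  # (100, total number of leaves across trees)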

examples/model_selection/plot_roc.py

Lines changed: 16 additions & 2 deletions
@@ -44,6 +44,7 @@
 from sklearn.cross_validation import train_test_split
 from sklearn.preprocessing import label_binarize
 from sklearn.multiclass import OneVsRestClassifier
+from scipy import interp
 
 # Import some data to play with
 iris = datasets.load_iris()
@@ -99,10 +100,23 @@
 # Plot ROC curves for the multiclass problem
 
 # Compute macro-average ROC curve and ROC area
-fpr["macro"] = np.mean([fpr[i] for i in range(n_classes)], axis=0)
-tpr["macro"] = np.mean([tpr[i] for i in range(n_classes)], axis=0)
+
+# First aggregate all false positive rates
+all_fpr = np.unique(np.concatenate([fpr[i] for i in range(n_classes)]))
+
+# Then interpolate all ROC curves at this points
+mean_tpr = np.zeros_like(all_fpr)
+for i in range(n_classes):
+    mean_tpr += interp(all_fpr, fpr[i], tpr[i])
+
+# Finally average it and compute AUC
+mean_tpr /= n_classes
+
+fpr["macro"] = all_fpr
+tpr["macro"] = mean_tpr
 roc_auc["macro"] = auc(fpr["macro"], tpr["macro"])
 
+# Plot all ROC curves
 plt.figure()
 plt.plot(fpr["micro"], tpr["micro"],
          label='micro-average ROC curve (area = {0:0.2f})'
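
The old macro average took np.mean over the per-class fpr/tpr arrays, which is only meaningful when every class's curve is sampled at the same points; in general the arrays have different lengths and grids. The fix interpolates each curve onto the union of all fpr values first. A standalone toy sketch of that step (the two curves are made up; scipy's interp is an alias of np.interp):

import numpy as np

fpr = {0: np.array([0.0, 0.5, 1.0]), 1: np.array([0.0, 0.25, 1.0])}
tpr = {0: np.array([0.0, 0.8, 1.0]), 1: np.array([0.0, 0.6, 1.0])}

# Union of all fpr grid points, then evaluate every curve on it.
all_fpr = np.unique(np.concatenate([fpr[i] for i in range(2)]))
mean_tpr = np.zeros_like(all_fpr)
for i in range(2):
    mean_tpr += np.interp(all_fpr, fpr[i], tpr[i])
mean_tpr /= 2  # macro-average TPR at each common fpr point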

sklearn/cluster/_feature_agglomeration.py

Lines changed: 2 additions & 16 deletions
@@ -11,9 +11,6 @@
 from ..utils import check_array
 from ..utils.validation import check_is_fitted
 
-import warnings
-
-
 ###############################################################################
 # Mixin class for feature agglomeration.
 
@@ -24,7 +21,7 @@ class AgglomerationTransform(TransformerMixin):
 
     pooling_func = np.mean
 
-    def transform(self, X, pooling_func=None):
+    def transform(self, X):
         """
         Transform a new matrix using the built clustering
 
@@ -34,25 +31,14 @@ def transform(self, X, pooling_func=None):
             A M by N array of M observations in N dimensions or a length
             M array of M one-dimensional observations.
 
-        pooling_func : callable, default=np.mean
-            This combines the values of agglomerated features into a single
-            value, and should accept an array of shape [M, N] and the keyword
-            argument `axis=1`, and reduce it to an array of size [M].
-
         Returns
         -------
         Y : array, shape = [n_samples, n_clusters] or [n_clusters]
             The pooled values for each feature cluster.
         """
         check_is_fitted(self, "labels_")
 
-        if pooling_func is not None:
-            warnings.warn("The pooling_func parameter is deprecated since 0.15 "
-                          "and will be removed in 0.18. "
-                          "Pass it to the constructor instead.",
-                          DeprecationWarning)
-        else:
-            pooling_func = self.pooling_func
+        pooling_func = self.pooling_func
         X = check_array(X)
         nX = []
         if len(self.labels_) != X.shape[1]:
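
With the deprecated transform(X, pooling_func=...) path removed, the pooling function is set on the estimator itself. A minimal sketch of the surviving API (the random data and np.median choice are illustrative assumptions):

import numpy as np
from sklearn.cluster import FeatureAgglomeration

X = np.random.RandomState(0).rand(20, 8)
# pooling_func now goes to the constructor, not to transform().
agglo = FeatureAgglomeration(n_clusters=3, pooling_func=np.median)
X_reduced = agglo.fit_transform(X)  # shape (20, 3)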

sklearn/cluster/dbscan_.py

Lines changed: 2 additions & 18 deletions
@@ -9,8 +9,6 @@
 #
 # License: BSD 3 clause
 
-import warnings
-
 import numpy as np
 from scipy import sparse
 
@@ -24,8 +22,7 @@
 
 
 def dbscan(X, eps=0.5, min_samples=5, metric='minkowski',
-           algorithm='auto', leaf_size=30, p=2, sample_weight=None,
-           random_state=None):
+           algorithm='auto', leaf_size=30, p=2, sample_weight=None):
     """Perform DBSCAN clustering from vector array or distance matrix.
 
     Read more in the :ref:`User Guide <dbscan>`.
@@ -75,10 +72,6 @@ def dbscan(X, eps=0.5, min_samples=5, metric='minkowski',
         weight may inhibit its eps-neighbor from being core.
         Note that weights are absolute, and default to 1.
 
-    random_state: numpy.RandomState, optional
-        Deprecated and ignored as of version 0.16, will be removed in version
-        0.18. DBSCAN does not use random initialization.
-
     Returns
     -------
     core_samples : array [n_core_samples]
@@ -109,11 +102,6 @@ def dbscan(X, eps=0.5, min_samples=5, metric='minkowski',
     """
     if not eps > 0.0:
         raise ValueError("eps must be positive.")
-    if random_state is not None:
-        warnings.warn("The parameter random_state is deprecated in 0.16 "
-                      "and will be removed in version 0.18. "
-                      "DBSCAN is deterministic except for rare border cases.",
-                      category=DeprecationWarning)
 
     X = check_array(X, accept_sparse='csr')
     if sample_weight is not None:
@@ -195,9 +183,6 @@ class DBSCAN(BaseEstimator, ClusterMixin):
         of the construction and query, as well as the memory required
         to store the tree. The optimal value depends
        on the nature of the problem.
-    random_state: numpy.RandomState, optional
-        Deprecated and ignored as of version 0.16, will be removed in version
-        0.18. DBSCAN does not use random initialization.
 
     Attributes
     ----------
@@ -233,14 +218,13 @@ class DBSCAN(BaseEstimator, ClusterMixin):
     """
 
     def __init__(self, eps=0.5, min_samples=5, metric='euclidean',
-                 algorithm='auto', leaf_size=30, p=None, random_state=None):
+                 algorithm='auto', leaf_size=30, p=None):
         self.eps = eps
         self.min_samples = min_samples
         self.metric = metric
         self.algorithm = algorithm
         self.leaf_size = leaf_size
         self.p = p
-        self.random_state = random_state
 
     def fit(self, X, y=None, sample_weight=None):
         """Perform DBSCAN clustering from features or distance matrix.

sklearn/cluster/hierarchical.py

Lines changed: 3 additions & 41 deletions
@@ -86,8 +86,7 @@ def _fix_connectivity(X, connectivity, n_components=None,
 ###############################################################################
 # Hierarchical tree building functions
 
-def ward_tree(X, connectivity=None, n_components=None, n_clusters=None,
-              return_distance=False):
+def ward_tree(X, connectivity=None, n_clusters=None, return_distance=False):
     """Ward clustering based on a Feature matrix.
 
     Recursively merges the pair of clusters that minimally increases
@@ -111,12 +110,6 @@ def ward_tree(X, connectivity=None, n_components=None, n_clusters=None,
         be symmetric and only the upper triangular half is used.
         Default is None, i.e, the Ward algorithm is unstructured.
 
-    n_components : int (optional)
-        Number of connected components. If None the number of connected
-        components is estimated from the connectivity matrix.
-        NOTE: This parameter is now directly determined directly
-        from the connectivity matrix and will be removed in 0.18
-
     n_clusters : int (optional)
         Stop early the construction of the tree at n_clusters. This is
         useful to decrease computation time if the number of clusters is
@@ -199,11 +192,6 @@ def ward_tree(X, connectivity=None, n_components=None, n_clusters=None,
     else:
         return children_, 1, n_samples, None
 
-    if n_components is not None:
-        warnings.warn(
-            "n_components is now directly calculated from the connectivity "
-            "matrix and will be removed in 0.18",
-            DeprecationWarning)
     connectivity, n_components = _fix_connectivity(X, connectivity)
     if n_clusters is None:
         n_nodes = 2 * n_samples - 1
@@ -326,12 +314,6 @@ def linkage_tree(X, connectivity=None, n_components=None,
         be symmetric and only the upper triangular half is used.
         Default is None, i.e, the Ward algorithm is unstructured.
 
-    n_components : int (optional)
-        Number of connected components. If None the number of connected
-        components is estimated from the connectivity matrix.
-        NOTE: This parameter is now directly determined directly
-        from the connectivity matrix and will be removed in 0.18
-
     n_clusters : int (optional)
         Stop early the construction of the tree at n_clusters. This is
         useful to decrease computation time if the number of clusters is
@@ -435,11 +417,6 @@ def linkage_tree(X, connectivity=None, n_components=None,
             return children_, 1, n_samples, None, distances
         return children_, 1, n_samples, None
 
-    if n_components is not None:
-        warnings.warn(
-            "n_components is now directly calculated from the connectivity "
-            "matrix and will be removed in 0.18",
-            DeprecationWarning)
     connectivity, n_components = _fix_connectivity(X, connectivity)
 
     connectivity = connectivity.tocoo()
@@ -636,12 +613,6 @@ class AgglomerativeClustering(BaseEstimator, ClusterMixin):
         By default, no caching is done. If a string is given, it is the
         path to the caching directory.
 
-    n_components : int (optional)
-        Number of connected components. If None the number of connected
-        components is estimated from the connectivity matrix.
-        NOTE: This parameter is now directly determined from the connectivity
-        matrix and will be removed in 0.18
-
     compute_full_tree : bool or 'auto' (optional)
         Stop early the construction of the tree at n_clusters. This is
         useful to decrease computation time if the number of clusters is
@@ -689,12 +660,10 @@ class AgglomerativeClustering(BaseEstimator, ClusterMixin):
 
     def __init__(self, n_clusters=2, affinity="euclidean",
                  memory=Memory(cachedir=None, verbose=0),
-                 connectivity=None, n_components=None,
-                 compute_full_tree='auto', linkage='ward',
-                 pooling_func=np.mean):
+                 connectivity=None, compute_full_tree='auto',
+                 linkage='ward', pooling_func=np.mean):
         self.n_clusters = n_clusters
         self.memory = memory
-        self.n_components = n_components
         self.connectivity = connectivity
         self.compute_full_tree = compute_full_tree
         self.linkage = linkage
@@ -760,7 +729,6 @@ def fit(self, X, y=None):
             kwargs['affinity'] = self.affinity
         self.children_, self.n_components_, self.n_leaves_, parents = \
             memory.cache(tree_builder)(X, connectivity,
-                                       n_components=self.n_components,
                                        n_clusters=n_clusters,
                                        **kwargs)
         # Cut the tree
@@ -807,12 +775,6 @@ class FeatureAgglomeration(AgglomerativeClustering, AgglomerationTransform):
         By default, no caching is done. If a string is given, it is the
         path to the caching directory.
 
-    n_components : int (optional)
-        Number of connected components. If None the number of connected
-        components is estimated from the connectivity matrix.
-        NOTE: This parameter is now directly determined from the connectivity
-        matrix and will be removed in 0.18
-
     compute_full_tree : bool or 'auto', optional, default "auto"
         Stop early the construction of the tree at n_clusters. This is
         useful to decrease computation time if the number of clusters is
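
Correspondingly, callers no longer pass n_components anywhere; it is derived from the connectivity matrix when one is given. A minimal sketch of the post-removal constructor (the random data is an illustrative assumption):

import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.random.RandomState(0).rand(30, 4)
# n_components is gone from the constructor; with connectivity=None
# the tree is built unstructured, exactly as before.
model = AgglomerativeClustering(n_clusters=3, linkage='ward')
labels = model.fit(X).labels_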

sklearn/cluster/mean_shift_.py

Lines changed: 1 addition & 7 deletions
@@ -93,7 +93,7 @@ def _mean_shift_single_seed(my_mean, X, nbrs, max_iter):
 
 def mean_shift(X, bandwidth=None, seeds=None, bin_seeding=False,
                min_bin_freq=1, cluster_all=True, max_iter=300,
-               max_iterations=None, n_jobs=1):
+               n_jobs=1):
     """Perform mean shift clustering of data using a flat kernel.
 
     Read more in the :ref:`User Guide <mean_shift>`.
@@ -161,12 +161,6 @@ def mean_shift(X, bandwidth=None, seeds=None, bin_seeding=False,
     See examples/cluster/plot_meanshift.py for an example.
 
     """
-    # FIXME To be removed in 0.18
-    if max_iterations is not None:
-        warnings.warn("The `max_iterations` parameter has been renamed to "
-                      "`max_iter` from version 0.16. The `max_iterations` "
-                      "parameter will be removed in 0.18", DeprecationWarning)
-        max_iter = max_iterations
 
     if bandwidth is None:
         bandwidth = estimate_bandwidth(X)
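
Callers of the function form now use max_iter only; the renamed max_iterations alias is gone. A minimal sketch (bandwidth value and data are illustrative assumptions):

import numpy as np
from sklearn.cluster import mean_shift

X = np.random.RandomState(0).rand(50, 2)
# max_iter is the only supported name for the iteration cap now.
centers, labels = mean_shift(X, bandwidth=0.4, max_iter=300)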
