Commit e504ea7

vachanda and glemaitre authored and committed
DOC fix FeatureAgglomeration and MiniBatchKMeans docstring following sklearn guideline (scikit-learn#15809)
1 parent 64ac463 commit e504ea7

2 files changed: 39 additions and 34 deletions

sklearn/cluster/_hierarchical.py

Lines changed: 23 additions & 21 deletions
@@ -716,8 +716,8 @@ class AgglomerativeClustering(ClusterMixin, BaseEstimator):
 the full tree. It must be ``True`` if ``distance_threshold`` is not
 ``None``. By default `compute_full_tree` is "auto", which is equivalent
 to `True` when `distance_threshold` is not `None` or that `n_clusters`
-is inferior to 100 or `0.02 * n_samples`. Otherwise, "auto" is
-equivalent to `False`.
+is inferior to the maximum between 100 or `0.02 * n_samples`.
+Otherwise, "auto" is equivalent to `False`.

 linkage : {"ward", "complete", "average", "single"}, default="ward"
 Which linkage criterion to use. The linkage criterion determines which
@@ -924,39 +924,41 @@ class FeatureAgglomeration(AgglomerativeClustering, AgglomerationTransform):

 Parameters
 ----------
-n_clusters : int or None, optional (default=2)
+n_clusters : int, default=2
 The number of clusters to find. It must be ``None`` if
 ``distance_threshold`` is not ``None``.

-affinity : string or callable, default "euclidean"
+affinity : str or callable, default='euclidean'
 Metric used to compute the linkage. Can be "euclidean", "l1", "l2",
 "manhattan", "cosine", or 'precomputed'.
 If linkage is "ward", only "euclidean" is accepted.

-memory : None, str or object with the joblib.Memory interface, optional
+memory : str or object with the joblib.Memory interface, default=None
 Used to cache the output of the computation of the tree.
 By default, no caching is done. If a string is given, it is the
 path to the caching directory.

-connectivity : array-like or callable, optional
+connectivity : array-like or callable, default=None
 Connectivity matrix. Defines for each feature the neighboring
 features following a given structure of the data.
 This can be a connectivity matrix itself or a callable that transforms
 the data into a connectivity matrix, such as derived from
 kneighbors_graph. Default is None, i.e, the
 hierarchical clustering algorithm is unstructured.

-compute_full_tree : bool or 'auto', optional, default "auto"
-Stop early the construction of the tree at n_clusters. This is
-useful to decrease computation time if the number of clusters is
-not small compared to the number of features. This option is
-useful only when specifying a connectivity matrix. Note also that
-when varying the number of clusters and using caching, it may
-be advantageous to compute the full tree. It must be ``True`` if
-``distance_threshold`` is not ``None``.
+compute_full_tree : 'auto' or bool, optional, default='auto'
+Stop early the construction of the tree at n_clusters. This is useful
+to decrease computation time if the number of clusters is not small
+compared to the number of features. This option is useful only when
+specifying a connectivity matrix. Note also that when varying the
+number of clusters and using caching, it may be advantageous to compute
+the full tree. It must be ``True`` if ``distance_threshold`` is not
+``None``. By default `compute_full_tree` is "auto", which is equivalent
+to `True` when `distance_threshold` is not `None` or that `n_clusters`
+is inferior to the maximum between 100 or `0.02 * n_samples`.
+Otherwise, "auto" is equivalent to `False`.

-linkage : {"ward", "complete", "average", "single"}, optional\
-(default="ward")
+linkage : {'ward', 'complete', 'average', 'single'}, default='ward'
 Which linkage criterion to use. The linkage criterion determines which
 distance to use between sets of features. The algorithm will merge
 the pairs of cluster that minimize this criterion.
@@ -969,12 +971,12 @@ class FeatureAgglomeration(AgglomerativeClustering, AgglomerationTransform):
 - single uses the minimum of the distances between all observations
 of the two sets.

-pooling_func : callable, default np.mean
+pooling_func : callable, default=np.mean
 This combines the values of agglomerated features into a single
 value, and should accept an array of shape [M, N] and the keyword
 argument `axis=1`, and reduce it to an array of size [M].

-distance_threshold : float, optional (default=None)
+distance_threshold : float, default=None
 The linkage distance threshold above which, clusters will not be
 merged. If not ``None``, ``n_clusters`` must be ``None`` and
 ``compute_full_tree`` must be ``True``.
@@ -988,7 +990,7 @@ class FeatureAgglomeration(AgglomerativeClustering, AgglomerationTransform):
 ``distance_threshold=None``, it will be equal to the given
 ``n_clusters``.

-labels_ : array-like, (n_features,)
+labels_ : array-like of (n_features,)
 cluster labels for each feature.

 n_leaves_ : int
@@ -997,15 +999,15 @@ class FeatureAgglomeration(AgglomerativeClustering, AgglomerationTransform):
 n_connected_components_ : int
 The estimated number of connected components in the graph.

-children_ : array-like, shape (n_nodes-1, 2)
+children_ : array-like of shape (n_nodes-1, 2)
 The children of each non-leaf node. Values less than `n_features`
 correspond to leaves of the tree which are the original samples.
 A node `i` greater than or equal to `n_features` is a non-leaf
 node and has children `children_[i - n_features]`. Alternatively
 at the i-th iteration, children[i][0] and children[i][1]
 are merged to form node `n_features + i`

-distances_ : array-like, shape (n_nodes-1,)
+distances_ : array-like of shape (n_nodes-1,)
 Distances between nodes in the corresponding place in `children_`.
 Only computed if distance_threshold is not None.

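
The reworded `compute_full_tree` description can be read as a small decision rule. The sketch below is based only on the docstring text in this diff; `resolve_compute_full_tree` is a hypothetical helper name used for illustration, not a scikit-learn function, and `n_samples` is taken at face value from the docstring.

```python
def resolve_compute_full_tree(compute_full_tree, n_clusters, n_samples,
                              distance_threshold=None):
    """Hypothetical helper mirroring the documented "auto" rule."""
    # An explicit True/False is used as given.
    if compute_full_tree != "auto":
        return bool(compute_full_tree)
    # With a distance threshold the full tree is always required
    # (n_clusters is None in that case, so it is not compared).
    if distance_threshold is not None:
        return True
    # Otherwise compare n_clusters with the larger of 100 and 2% of n_samples.
    return n_clusters < max(100, 0.02 * n_samples)


small = resolve_compute_full_tree("auto", n_clusters=2, n_samples=1000)
large = resolve_compute_full_tree("auto", n_clusters=150, n_samples=1000)
thresholded = resolve_compute_full_tree(
    "auto", n_clusters=None, n_samples=1000, distance_threshold=1.5
)
```

Under this reading, `n_clusters=150` with 1000 samples stops early (the bound is 100), while the same `n_clusters` with 10000 samples builds the full tree (the bound becomes 200).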

sklearn/cluster/_k_means.py

Lines changed: 16 additions & 13 deletions
@@ -1336,12 +1336,13 @@ class MiniBatchKMeans(KMeans):
 Parameters
 ----------

-n_clusters : int, optional, default: 8
+n_clusters : int, default=8
 The number of clusters to form as well as the number of
 centroids to generate.

-init : {'k-means++', 'random' or an ndarray}, default: 'k-means++'
-Method for initialization, defaults to 'k-means++':
+init : {'k-means++', 'random'} or ndarray of shape \
+(n_clusters, n_features), default='k-means++'
+Method for initialization

 'k-means++' : selects initial cluster centers for k-mean
 clustering in a smart way to speed up convergence. See section
@@ -1353,26 +1354,26 @@ class MiniBatchKMeans(KMeans):
 If an ndarray is passed, it should be of shape (n_clusters, n_features)
 and gives the initial centers.

-max_iter : int, optional
+max_iter : int, default=100
 Maximum number of iterations over the complete dataset before
 stopping independently of any early stopping criterion heuristics.

-batch_size : int, optional, default: 100
+batch_size : int, default=100
 Size of the mini batches.

-verbose : bool, optional
+verbose : int, default=0
 Verbosity mode.

 compute_labels : bool, default=True
 Compute label assignment and inertia for the complete dataset
 once the minibatch optimization has converged in fit.

-random_state : int, RandomState instance or None (default)
+random_state : int, RandomState instance, default=None
 Determines random number generation for centroid initialization and
 random reassignment. Use an int to make the randomness deterministic.
 See :term:`Glossary <random_state>`.

-tol : float, default: 0.0
+tol : float, default=0.0
 Control early stopping based on the relative center changes as
 measured by a smoothed, variance-normalized of the mean center
 squared position changes. This early stopping heuristics is
@@ -1383,25 +1384,27 @@ class MiniBatchKMeans(KMeans):
 To disable convergence detection based on normalized center
 change, set tol to 0.0 (default).

-max_no_improvement : int, default: 10
+max_no_improvement : int, default=10
 Control early stopping based on the consecutive number of mini
 batches that does not yield an improvement on the smoothed inertia.

 To disable convergence detection based on inertia, set
 max_no_improvement to None.

-init_size : int, optional, default: 3 * batch_size
+init_size : int, default=None
 Number of samples to randomly sample for speeding up the
 initialization (sometimes at the expense of accuracy): the
 only algorithm is initialized by running a batch KMeans on a
 random subset of the data. This needs to be larger than n_clusters.

+If `None`, `init_size= 3 * batch_size`.
+
 n_init : int, default=3
 Number of random initializations that are tried.
 In contrast to KMeans, the algorithm is only run once, using the
 best of the ``n_init`` initializations as measured by inertia.

-reassignment_ratio : float, default: 0.01
+reassignment_ratio : float, default=0.01
 Control the fraction of the maximum number of counts for a
 center to be reassigned. A higher value means that low count
 centers are more easily reassigned, which means that the
@@ -1411,10 +1414,10 @@ class MiniBatchKMeans(KMeans):
 Attributes
 ----------

-cluster_centers_ : array, [n_clusters, n_features]
+cluster_centers_ : ndarray of shape (n_clusters, n_features)
 Coordinates of cluster centers

-labels_ :
+labels_ : int
 Labels of each point (if compute_labels is set to True).

 inertia_ : float
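
The new `init_size` wording (`default=None`, falling back to `3 * batch_size`) can be illustrated with a short sketch. `resolve_init_size` is a hypothetical helper, not part of scikit-learn. The diff does not say how the library reacts when the value is not larger than `n_clusters`, so raising `ValueError` here is an assumption made purely for illustration.

```python
def resolve_init_size(init_size=None, batch_size=100, n_clusters=8):
    """Hypothetical helper mirroring the documented `init_size` default."""
    # Documented fallback: None means three mini batches' worth of samples.
    if init_size is None:
        init_size = 3 * batch_size
    # The docstring requires init_size to be larger than n_clusters.
    # Raising is an assumption; the real behaviour is not shown in this diff.
    if init_size <= n_clusters:
        raise ValueError(
            f"init_size={init_size} must be larger than n_clusters={n_clusters}"
        )
    return init_size


default_size = resolve_init_size()                # uses documented defaults
explicit_size = resolve_init_size(init_size=50)   # explicit value is kept
```

With the documented defaults (`batch_size=100`, `n_clusters=8`), the fallback gives 300 samples for initialization.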
