Commit 67130a5

jnothman authored and glemaitre committed
FIX IndexError due to imprecision in KMeans++ (scikit-learn#11756)
1 parent dc9955b commit 67130a5

1 file changed: 3 additions, 0 deletions

sklearn/cluster/k_means_.py

Lines changed: 3 additions & 0 deletions
@@ -107,6 +107,9 @@ def _k_init(X, n_clusters, x_squared_norms, random_state, n_local_trials=None):
         rand_vals = random_state.random_sample(n_local_trials) * current_pot
         candidate_ids = np.searchsorted(stable_cumsum(closest_dist_sq),
                                         rand_vals)
+        # XXX: numerical imprecision can result in a candidate_id out of range
+        np.clip(candidate_ids, None, closest_dist_sq.size - 1,
+                out=candidate_ids)
 
         # Compute distances to center candidates
         distance_to_candidates = euclidean_distances(
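
The failure mode this change guards against can be sketched in isolation. The snippet below is an illustration, not the scikit-learn code: it uses `np.cumsum` as a stand-in for `stable_cumsum`, the distance values are made up, and the out-of-range draw is simulated with `np.nextafter` instead of relying on an actual rounding difference between the total potential and the cumulative sum.

```python
import numpy as np

# Hypothetical squared distances; the values are for illustration only.
closest_dist_sq = np.array([0.1, 0.2, 0.3, 0.4])

# np.cumsum stands in for sklearn's stable_cumsum here (an assumption).
cumulative = np.cumsum(closest_dist_sq)

# Simulate one ordinary draw and one draw that lands just above the last
# cumulative value, as could happen when the total and the cumulative sum
# round differently.
rand_vals = np.array([0.05, np.nextafter(cumulative[-1], np.inf)])

candidate_ids = np.searchsorted(cumulative, rand_vals)
unclipped = candidate_ids.copy()  # the second entry equals closest_dist_sq.size

# Same clamp as in the commit: cap indices at the last valid position, in place.
np.clip(candidate_ids, None, closest_dist_sq.size - 1, out=candidate_ids)

print(unclipped, candidate_ids)
```

Without the clamp, using the second index to select a row would raise an `IndexError`; with it, the draw falls back to the last sample.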
