IVF Hierarchical KMeans Flush & Merge #128675

Merged: 54 commits merged on Jun 10, 2025
54 commits
206022a
added classes related to running hierarchical kmeans as a clustering …
john-wagster May 30, 2025
85e4d8f
[CI] Auto commit changes from spotless
elasticsearchmachine May 30, 2025
6578e87
Merge branch 'main' into ivf_hkmeans
john-wagster May 30, 2025
4280682
Merge branch 'main' into ivf_hkmeans
john-wagster Jun 2, 2025
58c5991
iter
john-wagster Jun 2, 2025
651efdf
[CI] Auto commit changes from spotless
elasticsearchmachine Jun 2, 2025
5743d59
bringing back some interfaces
john-wagster Jun 2, 2025
47e5d8e
Merge branch 'main' into ivf_hkmeans
john-wagster Jun 2, 2025
786e4f1
Merge branch 'main' into ivf_hkmeans
john-wagster Jun 2, 2025
5ca53d3
accidentally remove suppressforbidden
john-wagster Jun 2, 2025
b1f9ae4
migrated from short to int and fixed IOUtils copy/paste errors
john-wagster Jun 2, 2025
075e2ce
no longer allocating larger arrays for slices that are the entire set…
john-wagster Jun 2, 2025
5fb98ff
[CI] Auto commit changes from spotless
elasticsearchmachine Jun 2, 2025
523c2ca
iter on fvvs
john-wagster Jun 2, 2025
bb4531b
Merge branch 'ivf_hkmeans' of github.com:john-wagster/elasticsearch i…
john-wagster Jun 2, 2025
44b0aa9
iter on fvvs
john-wagster Jun 2, 2025
f5f0538
fixing comment
john-wagster Jun 2, 2025
3893098
switched to reservoir sampling
john-wagster Jun 3, 2025
1f2d053
switched to reservoir sampling
john-wagster Jun 3, 2025
6cda6a6
switched to reservoir sampling
john-wagster Jun 3, 2025
4cd94cf
missed a few short to int in tests
john-wagster Jun 3, 2025
b6d61fa
removed sorting on writeCentroids
john-wagster Jun 3, 2025
c82d719
migrated CentroidAssignments to a class to hide default constructor, …
john-wagster Jun 3, 2025
4bd2c9c
only getting the vector value on sampling when necessary
john-wagster Jun 3, 2025
1d61944
* stepLloyd now passes nextCentroids to prevent creating and rec…
john-wagster Jun 4, 2025
f05a541
Merge branch 'main' into ivf_hkmeans
john-wagster Jun 4, 2025
26698d7
[CI] Auto commit changes from spotless
elasticsearchmachine Jun 4, 2025
5112408
bug fixes around printing cluster metrics; still refactoring this
john-wagster Jun 4, 2025
dd61ba5
split kmeansresult into two classes, updated centroid assignments int…
john-wagster Jun 5, 2025
762839e
combined kmeans and kmeanslocal classes into one class, and fixed vis…
john-wagster Jun 5, 2025
e5746a1
Merge branch 'main' into ivf_hkmeans
john-wagster Jun 5, 2025
44d0f24
Merge branch 'main' into ivf_hkmeans
john-wagster Jun 6, 2025
e82af9c
Merge branch 'main' into ivf_hkmeans
john-wagster Jun 6, 2025
cf7c6b3
Merge branch 'main' into ivf_hkmeans
john-wagster Jun 6, 2025
2c96c82
added trimtosize and fixed a spot where we should be returning KMeans…
john-wagster Jun 6, 2025
cc5570a
Merge branch 'main' into ivf_hkmeans
john-wagster Jun 6, 2025
3fec326
Merge branch 'main' into ivf_hkmeans
john-wagster Jun 6, 2025
5dffeea
Merge branch 'main' into ivf_hkmeans
john-wagster Jun 6, 2025
904f52d
Merge branch 'main' into ivf_hkmeans
john-wagster Jun 7, 2025
aad4b3b
minor test fixes and edge cases
john-wagster Jun 8, 2025
1f93921
Merge branch 'main' into ivf_hkmeans
john-wagster Jun 8, 2025
968f539
Merge branch 'main' into ivf_hkmeans
john-wagster Jun 9, 2025
1048f7f
Merge branch 'main' into ivf_hkmeans
john-wagster Jun 9, 2025
490946f
Merge remote-tracking branch 'upstream/main' into ivf_hkmeans
benwtrent Jun 9, 2025
12a1207
fixing bugs
benwtrent Jun 9, 2025
8ca12bc
Merge branch 'main' into ivf_hkmeans
john-wagster Jun 9, 2025
69a0b4e
merge
john-wagster Jun 9, 2025
ab5a61c
removed unnecessary int[]
john-wagster Jun 10, 2025
93ca452
Merge branch 'main' into ivf_hkmeans
john-wagster Jun 10, 2025
f935144
removed null checking for ffvslice for now because it's extra cruft; …
john-wagster Jun 10, 2025
fc41d7d
Merge branch 'main' into ivf_hkmeans
john-wagster Jun 10, 2025
ff0fad4
making constructor private to reduce confusion
john-wagster Jun 10, 2025
b48bfcf
Merge branch 'main' into ivf_hkmeans
john-wagster Jun 10, 2025
e05ac74
Merge branch 'main' into ivf_hkmeans
john-wagster Jun 10, 2025
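Several commits above ("switched to reservoir sampling", "only getting the vector value on sampling when necessary") refer to sampling the vectors used to train the centroids. That code is not part of the captured diff excerpts, so what follows is only a minimal sketch of classic reservoir sampling (Algorithm R) over vector ordinals. The class name ReservoirSampler and its method are hypothetical; this is not the PR's implementation.

```java
import java.util.Random;

/**
 * Hypothetical sketch of reservoir sampling (Algorithm R) over vector
 * ordinals 0..numVectors-1. Not the code from this PR.
 */
final class ReservoirSampler {

    private ReservoirSampler() {}

    /**
     * Returns a uniform random sample of min(sampleSize, numVectors) distinct
     * ordinals from [0, numVectors), reading each ordinal exactly once.
     */
    static int[] sample(int numVectors, int sampleSize, Random random) {
        if (numVectors < 0 || sampleSize < 0) {
            throw new IllegalArgumentException("sizes must be non-negative");
        }
        int k = Math.min(sampleSize, numVectors);
        int[] reservoir = new int[k];
        // Fill the reservoir with the first k ordinals.
        for (int i = 0; i < k; i++) {
            reservoir[i] = i;
        }
        // Each later ordinal i replaces a uniformly chosen slot with
        // probability k / (i + 1), which keeps the sample uniform.
        for (int i = k; i < numVectors; i++) {
            int j = random.nextInt(i + 1);
            if (j < k) {
                reservoir[j] = i;
            }
        }
        return reservoir;
    }
}
```

Because the reservoir stores only ordinals, vector values need to be read only for the ordinals that survive sampling, which appears consistent with the commit about fetching vector values only when necessary.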
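Commit 1d61944 reads "stepLloyd now passes nextCentroids to prevent creating and rec…". As an illustration of that buffer-reuse idea only, here is a minimal Lloyd (k-means) iteration that writes updated centroids into a caller-supplied nextCentroids array. The class name, the signature, the squared Euclidean distance, and the empty-cluster handling are all assumptions, not the PR's code.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of one Lloyd iteration that writes the updated
 * centroids into a caller-supplied buffer instead of allocating new
 * centroid arrays each iteration. Not the signature used in this PR.
 */
final class LloydStep {

    private LloydStep() {}

    /**
     * Assigns each vector to its nearest centroid (squared Euclidean distance)
     * and writes each cluster's mean into nextCentroids. Requires at least one
     * centroid; nextCentroids must have the same shape as centroids.
     *
     * @return true if any vector changed cluster
     */
    static boolean step(float[][] vectors, float[][] centroids, float[][] nextCentroids, int[] assignments) {
        int k = centroids.length;
        int dim = centroids[0].length;
        int[] counts = new int[k];
        // Clear the reused output buffer before accumulating sums.
        for (float[] c : nextCentroids) {
            Arrays.fill(c, 0f);
        }
        boolean changed = false;
        for (int v = 0; v < vectors.length; v++) {
            float[] x = vectors[v];
            int best = 0;
            float bestDist = Float.MAX_VALUE;
            for (int c = 0; c < k; c++) {
                float d = 0f;
                for (int i = 0; i < dim; i++) {
                    float diff = x[i] - centroids[c][i];
                    d += diff * diff;
                }
                if (d < bestDist) {
                    bestDist = d;
                    best = c;
                }
            }
            if (assignments[v] != best) {
                assignments[v] = best;
                changed = true;
            }
            counts[best]++;
            for (int i = 0; i < dim; i++) {
                nextCentroids[best][i] += x[i];
            }
        }
        // Turn sums into means; an empty cluster keeps its previous centroid.
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) {
                System.arraycopy(centroids[c], 0, nextCentroids[c], 0, dim);
            } else {
                for (int i = 0; i < dim; i++) {
                    nextCentroids[c][i] /= counts[c];
                }
            }
        }
        return changed;
    }
}
```

A caller can initialize assignments to -1, swap the centroids and nextCentroids references after each step, and stop once step returns false or an iteration cap is reached, so the loop never allocates new centroid arrays.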
@@ -276,12 +276,11 @@ TopDocs doVectorQuery(byte[] vector, IndexSearcher searcher) throws IOException
 TopDocs doVectorQuery(float[] vector, IndexSearcher searcher) throws IOException {
     Query knnQuery;
     int topK = this.topK;
-    int efSearch = this.efSearch;
     if (overSamplingFactor > 1f) {
         // oversample the topK results to get more candidates for the final result
         topK = (int) Math.ceil(topK * overSamplingFactor);
-        efSearch = Math.max(topK, efSearch);
     }
+    int efSearch = Math.max(topK, this.efSearch);
     if (indexType == KnnIndexTester.IndexType.IVF) {
         knnQuery = new IVFKnnFloatVectorQuery(VECTOR_FIELD, vector, topK, efSearch, null, nProbe);
     } else {
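In the doVectorQuery hunk, the lines int efSearch = this.efSearch; and efSearch = Math.max(topK, efSearch); (the latter inside the if (overSamplingFactor > 1f) block) give way to a single int efSearch = Math.max(topK, this.efSearch); after the block, which is why the hunk shrinks from 12 to 11 lines. The effect is that efSearch is now at least topK even when no oversampling is applied. The helpers below are a hypothetical side-by-side of the two versions, written only to make that difference concrete; the class and method names are not from the PR.

```java
/**
 * Hypothetical helpers contrasting the candidate-count logic before and
 * after the change to doVectorQuery. Names are illustrative only.
 */
final class EfSearchCalc {

    private EfSearchCalc() {}

    /** topK after optional oversampling, computed the same way in both versions. */
    static int oversampledTopK(int topK, float overSamplingFactor) {
        return overSamplingFactor > 1f ? (int) Math.ceil(topK * overSamplingFactor) : topK;
    }

    /** Before: efSearch was raised to topK only inside the oversampling branch. */
    static int efSearchBefore(int topK, int efSearch, float overSamplingFactor) {
        if (overSamplingFactor > 1f) {
            return Math.max(oversampledTopK(topK, overSamplingFactor), efSearch);
        }
        return efSearch;
    }

    /** After: efSearch is always at least the (possibly oversampled) topK. */
    static int efSearchAfter(int topK, int efSearch, float overSamplingFactor) {
        return Math.max(oversampledTopK(topK, overSamplingFactor), efSearch);
    }
}
```

For example, with topK = 100, efSearch = 50 and overSamplingFactor = 1, the old logic passes 50 while the new logic passes 100; with overSamplingFactor = 1.5 both yield 150.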
@@ -0,0 +1,46 @@
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the "Elastic License
 * 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side
 * Public License v 1"; you may not use this file except in compliance with, at
 * your election, the "Elastic License 2.0", the "GNU Affero General Public
 * License v3.0 only", or the "Server Side Public License, v 1".
 */

package org.elasticsearch.index.codec.vectors;

import org.apache.lucene.internal.hppc.IntArrayList;

final class CentroidAssignments {

    private final int numCentroids;
    private final float[][] cachedCentroids;
    private final IntArrayList[] assignmentsByCluster;

    private CentroidAssignments(int numCentroids, float[][] cachedCentroids, IntArrayList[] assignmentsByCluster) {
        this.numCentroids = numCentroids;
        this.cachedCentroids = cachedCentroids;
        this.assignmentsByCluster = assignmentsByCluster;
    }

    CentroidAssignments(float[][] centroids, IntArrayList[] assignmentsByCluster) {
        this(centroids.length, centroids, assignmentsByCluster);
    }

    CentroidAssignments(int numCentroids, IntArrayList[] assignmentsByCluster) {
        this(numCentroids, null, assignmentsByCluster);
    }

    // Accessors (all fields are final; there are no setters)
    public int numCentroids() {
        return numCentroids;
    }

    public float[][] cachedCentroids() {
        return cachedCentroids;
    }

    public IntArrayList[] assignmentsByCluster() {
        return assignmentsByCluster;
    }
}
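CentroidAssignments carries either the centroids themselves (numCentroids is then centroids.length) or only a centroid count, together with one IntArrayList per cluster. Assuming assignmentsByCluster[c] holds the ordinals of the vectors assigned to centroid c (the diff does not show how it is filled), the following hypothetical sketch builds that grouping from a flat per-vector assignment array. It uses int[][] instead of Lucene's IntArrayList so that it has no dependencies; it is not code from this PR.

```java
/**
 * Hypothetical sketch: group vector ordinals by assigned centroid.
 * Uses int[][] in place of IntArrayList[]; not code from this PR.
 */
final class ClusterGrouping {

    private ClusterGrouping() {}

    /**
     * @param assignments  assignments[ord] is the centroid index of vector ord
     * @param numCentroids number of centroids
     * @return result[c] lists, in increasing order, the ordinals assigned to centroid c
     */
    static int[][] byCluster(int[] assignments, int numCentroids) {
        // First pass: count members per cluster so each array is sized exactly.
        int[] counts = new int[numCentroids];
        for (int c : assignments) {
            counts[c]++;
        }
        int[][] result = new int[numCentroids][];
        for (int c = 0; c < numCentroids; c++) {
            result[c] = new int[counts[c]];
        }
        // Second pass: fill each cluster's array; fill[c] is its next free slot.
        int[] fill = new int[numCentroids];
        for (int ord = 0; ord < assignments.length; ord++) {
            int c = assignments[ord];
            result[c][fill[c]++] = ord;
        }
        return result;
    }
}
```

For assignments {1, 0, 1, 2, 1} and three centroids this yields {1}, {0, 2, 4} and {3}. The two-pass count avoids the spare capacity a growable list carries unless it is trimmed afterwards.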
@@ -112,15 +112,6 @@ public float score(int centroidOrdinal) throws IOException {
         };
     }
 
-    @Override
-    protected FloatVectorValues getCentroids(IndexInput indexInput, int numCentroids, FieldInfo info) {
-        FieldEntry entry = fields.get(info.number);
-        if (entry == null) {
-            return null;
-        }
-        return new OffHeapCentroidFloatVectorValues(numCentroids, indexInput, info.getVectorDimension());
-    }
-
     @Override
     NeighborQueue scorePostingLists(FieldInfo fieldInfo, KnnCollector knnCollector, CentroidQueryScorer centroidQueryScorer, int nProbe)
         throws IOException {