Skip to content

[IVF] Simplify how we build CentroidAssignments #130709

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 7, 2025

Conversation

iverase
Copy link
Contributor

@iverase iverase commented Jul 7, 2025

This commit removes the flag for cache centroids when building cetroid assigments as I don't think is needed. It makes CentroidAssignments a record too.

@iverase iverase requested a review from john-wagster July 7, 2025 11:36
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Jul 7, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@iverase iverase changed the title [IVF] Simplify how we buildCentroidAssignments [IVF] Simplify how we build CentroidAssignments Jul 7, 2025
this.cachedCentroids = cachedCentroids;
this.assignmentsByCluster = assignmentsByCluster;
}
record CentroidAssignments(int numCentroids, float[][] centroids, int[][] assignmentsByCluster) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I started with originally when I did this hkmeans work but if I remember correctly @benwtrent had concerns about the centroids being in memory on merge and had also originally in the code prior to hkmeans dropped the centroids from in memory after write and then read them in one at a time with what is now the CentroidSupplier. I like the simplification but I lack the intuition or the evaluations to suggest that leaving the centroids in memory on merge is ok, which I think is happening here because we'll hold onto the CentroidAssignments object after doing the calculateAndWriteCentroids call which will keep the centroids loaded. May also be worth considering refactoring the CentroidSupplier logic if we think it's ok to keep the centroids loaded in memory. I may be misunderstanding too so please feel free to correct me here. But i'll defer to Ben here largely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my other comment, we are moving how we use CentroidAssignments during merging so it is only on heap when it is needed.

try {
centroidTemp = mergeState.segmentInfo.dir.createTempOutput(mergeState.segmentInfo.name, "civf_", IOContext.DEFAULT);
centroidTempName = centroidTemp.getName();
centroidAssignments = calculateAndWriteCentroids(
CentroidAssignments centroidAssignments = calculateAndWriteCentroids(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@john-wagster Having the CentroidAssignments here will prevent having the centroids in memory during merge

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok yes sorry I missed the subtle change there. I can get behind this. +1

Copy link
Contributor

@john-wagster john-wagster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@iverase iverase merged commit a3b6165 into elastic:main Jul 7, 2025
34 checks passed
@iverase iverase deleted the CentroidSupplier branch July 7, 2025 14:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>non-issue :Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants