-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Enable concurrent intra merge for HNSW graphs #108164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: lucene_snapshot
Are you sure you want to change the base?
Enable concurrent intra merge for HNSW graphs #108164
Conversation
Pinging @elastic/es-search (Team:Search) |
3a2da0b
to
81d8248
Compare
Lucene PR apache/lucene#12660 introduced concurrent HNSW merge and PR apache/lucene#13124 made ConcurrentMergeScheduler to provide an intra merge executor. But to enable concurrent intra merge for HNSW graphs we still need to provide numMergeWorker to the codec definition. This PR does the following: - provides numMergeWorker to the HNSW vectors codecs definions, where numMergerWorkes set as a node's property of index.merge.scheduler.max_thread_count - enables concurrent merge only for force merge operations.
0456125
to
1ac0e0d
Compare
@mayya-sharipova one of the failures is due to your change:
One of the test failures is worrisome and indicates a leak @original-brownbear :
|
// TODO: this is temporarily, remove this override and enable multithreaded merges for all kind of merges | ||
@Override | ||
public Executor getIntraMergeExecutor(MergePolicy.OneMerge merge) { | ||
// Enable multithreaded merges only for force merge operations | ||
if (merge.getStoreMergeInfo().mergeMaxNumSegments != -1) { | ||
return super.getIntraMergeExecutor(merge); | ||
} else { | ||
return null; | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think this should return null
. It should instead return a same thread executor.
@@ -838,7 +841,7 @@ private abstract static class IndexOptions implements ToXContent { | |||
this.type = type; | |||
} | |||
|
|||
abstract KnnVectorsFormat getVectorsFormat(); | |||
abstract KnnVectorsFormat getVectorsFormat(int mergeThreadCount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something about this really bugs me. The merge thread count is a dynamically updatable value. But doing this, is it really dynamic for the workers in the dense vector field mapper?
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
80233a9
to
b527f59
Compare
apache/lucene#12660 introduced concurrent HNSW merge,
and apache/lucene#13124 made ConcurrentMergeScheduler
to provide an intra merge executor.
But to enable concurrent intra merge for HNSW graphs we still need to provide numMergeWorker to the codec definition.
This PR does the following: