[IVF] Improve the format of the tmp file written during merging #129828

iverase · 2025-06-23T06:32:07Z

During merging, we need random access to the vectors in order to build the clusters. To achieve that, we write the vectors and docIds together to a temporary file. While testing on a memory-constrained node, I noticed in the flamegraph that we were spending a lot of time reading docIds.

Looking at this process I noticed we can do much better because:

If the segment is dense, i.e. all documents have a vector, we don't need to write the docIds, as the docId is the ordinal of the vector.
If the segment is not dense, we can write the docIds to a separate file, as they are accessed independently of the vectors.

This commit adds the logic above, which improved performance on memory-constrained nodes.
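The two cases above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class `TmpVectorLayout` and its methods are hypothetical, in-memory `ByteBuffer`s stand in for the temporary files, and "dense" is approximated by checking that every docId equals its ordinal.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of the tmp layout; not the Elasticsearch implementation. */
final class TmpVectorLayout {
    private final int dims;
    private final ByteBuffer vectors; // stand-in for the temporary vector file
    private final ByteBuffer docIds;  // stand-in for the separate docId file; null when dense

    private TmpVectorLayout(int dims, ByteBuffer vectors, ByteBuffer docIds) {
        this.dims = dims;
        this.vectors = vectors;
        this.docIds = docIds;
    }

    /** Writes vectors contiguously; writes docIds separately, and only if the segment is not dense. */
    static TmpVectorLayout write(float[][] vecs, int[] ids) {
        int dims = vecs.length == 0 ? 0 : vecs[0].length;
        ByteBuffer vectors = ByteBuffer.allocate(vecs.length * dims * Float.BYTES);
        for (float[] v : vecs) {
            for (float f : v) {
                vectors.putFloat(f);
            }
        }
        // Assumption: the segment is dense when docId == ordinal for every vector.
        boolean dense = true;
        for (int ord = 0; ord < ids.length; ord++) {
            if (ids[ord] != ord) {
                dense = false;
                break;
            }
        }
        ByteBuffer docIds = null;
        if (!dense) {
            docIds = ByteBuffer.allocate(ids.length * Integer.BYTES);
            for (int id : ids) {
                docIds.putInt(id);
            }
        }
        return new TmpVectorLayout(dims, vectors, docIds);
    }

    boolean isDense() {
        return docIds == null;
    }

    /** Random access to a vector by ordinal; never touches the docIds. */
    float[] vector(int ord) {
        float[] out = new float[dims];
        int base = ord * dims * Float.BYTES;
        for (int i = 0; i < dims; i++) {
            out[i] = vectors.getFloat(base + i * Float.BYTES);
        }
        return out;
    }

    /** For a dense segment the docId is the ordinal itself; otherwise read it from the docId buffer. */
    int docId(int ord) {
        return docIds == null ? ord : docIds.getInt(ord * Integer.BYTES);
    }
}
```

Because vectors have a fixed width and are no longer interleaved with docIds, the offset of a vector depends only on its ordinal, and reading docIds never pulls vector bytes into memory.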

elasticsearchmachine · 2025-06-23T06:32:31Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

benwtrent

I think separating out the doc ids vs the vectors is great!

…tic#129828) This commit separates vectors and docIds in the tmp file.

[IVF] Improve the format of the tmp file written during merging

906868c

iverase requested review from benwtrent and john-wagster June 23, 2025 06:32

iverase added the >non-issue, :Search Relevance/Vectors, and v9.1.0 labels Jun 23, 2025

elasticsearchmachine added the Team:Search Relevance label Jun 23, 2025

lowercase

a2ad388

benwtrent approved these changes Jun 23, 2025

iverase merged commit 72b488c into elastic:main Jun 23, 2025
27 checks passed

iverase deleted the ivfwriter_tmpfile branch June 23, 2025 12:44

kderusso pushed a commit to kderusso/elasticsearch that referenced this pull request Jun 23, 2025

[IVF] Improve the format of the tmp file written during merging (elas…

774eec4

…tic#129828) This commit separates vectors and docIds in the tmp file.

mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jun 25, 2025

[IVF] Improve the format of the tmp file written during merging (elas…

340bfba

…tic#129828) This commit separates vectors and docIds in the tmp file.
