Investigate hierarchical centroid storage for ivf format #129071

@benwtrent

Description

After #128675, we will be using a form of hierarchical k-means for clustering.

I wonder if we can keep one or two layers of that hierarchy (or possibly rebuild them) when storing the centroids.

The idea is that we could score centroids a block at a time, but at different layers.
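To make the idea concrete, here is a minimal sketch of two-layer centroid scoring. It is not the format's implementation. Everything in it is an assumption for illustration: the helper names (`build_two_layer_index`, `search_centroids`), the parameters (`block_size`, `n_blocks`, `n_probe`), squared L2 as the distance, and in particular the assumption that leaf centroids are stored so that siblings in the hierarchy are contiguous, which lets a block be a simple run of indices with the block mean as its parent.

```python
import heapq


def l2_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def build_two_layer_index(centroids, block_size):
    """Group leaf centroids into contiguous blocks (hypothetical layout).

    Returns a list of (parent_centroid, leaf_ids) pairs, where the parent
    is the mean of the leaf centroids in that block.
    """
    blocks = []
    for start in range(0, len(centroids), block_size):
        ids = list(range(start, min(start + block_size, len(centroids))))
        parent = mean([centroids[i] for i in ids])
        blocks.append((parent, ids))
    return blocks


def search_centroids(query, centroids, blocks, n_blocks, n_probe):
    """Score the upper layer first, then only the leaves of the best blocks."""
    # Upper layer: one distance computation per block.
    best_blocks = heapq.nsmallest(
        n_blocks, blocks, key=lambda b: l2_squared(query, b[0])
    )
    # Lower layer: score leaf centroids only inside the selected blocks.
    candidates = [i for _, ids in best_blocks for i in ids]
    return heapq.nsmallest(
        n_probe, candidates, key=lambda i: l2_squared(query, centroids[i])
    )


# Toy data: 4 groups of 4 leaf centroids, stored group by group.
centroids = [
    [bx + dx, by + dy]
    for bx, by in [(0, 0), (10, 0), (0, 10), (10, 10)]
    for dx, dy in [(0, 0), (1, 0), (0, 1), (1, 1)]
]
blocks = build_two_layer_index(centroids, block_size=4)
nearest = search_centroids([10.9, 10.8], centroids, blocks, n_blocks=1, n_probe=2)
```

With this toy data the query scores 4 parents plus 4 leaves instead of all 16 leaf centroids. The trade-off is the usual one: with a small `n_blocks`, a leaf centroid near a block boundary can be missed, so recall depends on how many blocks are visited.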

I am not fully convinced we actually need a bottom-layer graph like SPTAG (or SPFresh), mainly because segments (our partitions) are generally much smaller and we want fairly large posting lists.

I am not ruling out the benefit of a bottom-layer graph with an upper-layer tree. I just like the idea of starting simple first :).
