-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Add dense vector off-heap stats to Node stats and Index stats APIs #126704
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
Hi @ChrisHegarty, I've created a changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only looked at the reflection bits, and have a thought.
server/src/main/java/org/elasticsearch/index/codec/vectors/reflect/OffHeapReflectionUtils.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/codec/vectors/reflect/OffHeapReflectionUtils.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/shard/DenseVectorStats.java
Outdated
Show resolved
Hide resolved
static { | ||
try { | ||
// Lucene99ScalarQuantizedVectorsReader | ||
var cls = Class.forName("org.apache.lucene.codecs.lucene99.Lucene99ScalarQuantizedVectorsReader$FieldEntry"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize it would be even more complicated, but there are all the bwc indices from Lucene (Lucene95, 92, etc.). Do we want to include those? Its probable that folks have data in those old indices.
Its likely fine to do a "good enough" job here and indicate that stats are only available on newer formats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's add them, it's not difficult and will improve usability. When Lucene 10.3 arrives it will have support there too, and we can delete all this reflection code.
@elasticmachine update branch |
@elasticmachine update branch |
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused by delegate thing, maybe I am misunderstanding.
Do we want to add the other older readers later?
server/src/main/java/org/elasticsearch/action/admin/indices/stats/IndicesStatsResponse.java
Outdated
Show resolved
Hide resolved
The reason I've not added them yet is that they're hard to test. My latest thought is to generate an old index and stuff it into a unit test. Longer term we won't need this, since the Lucene readers will already be tested in Lucene itself, but for now we'd need some minimal sanity on the reflection code. |
@elasticmachine update branch |
merge conflict between base and head |
I added support for the old codecs, see 4638377. I tested these locally, since it's not straightforward to create these old indices. |
rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/cluster.stats/10_basic.yml
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused by default behavior and if we always want to return these stats or if per-field stats are always returned.
This change enhances the
dense_vector
section of the Nodes stats and Index stats APIs so that they report the desired size of off-heap memory for all indexed vectors. Thedense_vector
section of the Custer stats API remains unchanged.The retrieval mechanism and structure of the new stats is the same across the various three stats APIs, but more fine-grained information is disclosed as when moving from Cluster -> Node -> Index API.
For Node stats, we aggregate the total byte sizes for all vectors, categorised by the data type. For example:
Index stats: same as Node stats with included field break down . For example:
The implementation accesses the actual statistics through reflection. This will be completely removed when Lucene exposes this, which is expected in Lucene 10.3, see apache/lucene#14426.