Leverage optimized native float32 vector scorers. #130541
The microbenchmarks all show an approximately 2x performance improvement in scorer operations on all platforms, for example on an Apple Mac M2 (AArch64).
I've concentrated mainly on the native (C) part and its interaction with NativeAccess/VectorLibrary, and that looks good to me.
I have looked at the benchmarks and tests and they look sensible, but I think it's better to have another pair of eyes on the other parts (VectorScorer/Lucene interaction).
The benchmarks look good, and they are a testament to the quality of Panama and its usage in Lucene! Hopefully OpenJDK will get rid of the bug "soon" :)
…ions (#130635) This commit adds low-level optimized Neon, AVX2, and AVX-512 float32 vector operations: cosine, dot product, and square distance. The changes in this PR give an approximately 2x performance increase for float32 vector operations across Linux/Mac AArch64 and Linux x64 (both AVX2 and AVX-512). The performance increase comes mostly from being able to score the vectors off-heap, rather than copying them on-heap before scoring. The low-level native scorer implementations show only an approximately 3-5% improvement over the existing Panama Vector implementation. However, the native scorers allow scoring off-heap. Using Panama Vector with MemorySegments runs into a performance bug in HotSpot, where the bounds check is not optimally hoisted out of the hot loop (this has been reported to and acknowledged by OpenJDK). These vector ops will be used by higher-level vector scorers in #130541.
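For reference, the three operations named in that commit message can be written as plain scalar Java. This is only a sketch for readers of the thread: the class and method names (`ScalarVectorOps`, `dotProduct`, `squareDistance`, `cosine`) are hypothetical and are not the names used by the native library or by Lucene, and the code works on on-heap `float[]` arrays rather than off-heap memory.

```java
// Hypothetical scalar reference for the three float32 operations.
// Not the PR's implementation; names are illustrative only.
public final class ScalarVectorOps {
    private ScalarVectorOps() {}

    // Sum of element-wise products.
    static float dotProduct(float[] a, float[] b) {
        checkDims(a, b);
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Sum of squared element-wise differences (no square root taken).
    static float squareDistance(float[] a, float[] b) {
        checkDims(a, b);
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Dot product divided by the product of the two Euclidean norms.
    static float cosine(float[] a, float[] b) {
        checkDims(a, b);
        float dot = 0f, normA = 0f, normB = 0f;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (float) (dot / Math.sqrt((double) normA * normB));
    }

    private static void checkDims(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                "dimension mismatch: " + a.length + " != " + b.length);
        }
    }
}
```

A scalar version like this is mainly useful as a baseline to compare optimized implementations against; it says nothing about how the Neon or AVX code is structured.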
Pinging @elastic/es-search-relevance (Team:Search Relevance)
// Minimal copy of Lucene99HnswVectorsFormat in order to provide an optimized scorer,
// which returns identical scores to that of the default flat vector scorer.
public class ES819HnswVectorsFormat extends KnnVectorsFormat {
I'm considering whether it is worth backporting this to 8.19.x.
The changes in this PR give an approximately 2x performance increase for float32 vector operations across Linux/Mac AArch64 and Linux x64 (both AVX2 and AVX-512).
The vector scorers leverage the native vector operations added by #130635.
The tests verify that the native scorers return values similar to those of the Lucene scorers.
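One common reason to compare with a tolerance rather than exact equality is that a vectorized implementation may accumulate partial sums in several lanes, which changes the floating-point summation order. The sketch below illustrates that idea only; it is not the PR's test code, and `ScorerParityCheck`, `dotSequential`, `dotFourLanes`, and `closeEnough` are hypothetical names. The four-accumulator loop merely imitates lane-wise accumulation in plain Java.

```java
// Hypothetical illustration of a tolerance-based parity check.
// Not taken from the PR; all names here are assumptions.
public final class ScorerParityCheck {
    private ScorerParityCheck() {}

    // Straightforward left-to-right accumulation.
    static float dotSequential(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Four independent accumulators, combined at the end, with a scalar tail.
    // This mimics the summation order of a 4-lane SIMD loop.
    static float dotFourLanes(float[] a, float[] b) {
        float s0 = 0f, s1 = 0f, s2 = 0f, s3 = 0f;
        int bound = a.length & ~3; // largest multiple of 4 <= length
        int i = 0;
        for (; i < bound; i += 4) {
            s0 += a[i] * b[i];
            s1 += a[i + 1] * b[i + 1];
            s2 += a[i + 2] * b[i + 2];
            s3 += a[i + 3] * b[i + 3];
        }
        float sum = (s0 + s1) + (s2 + s3);
        for (; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Relative comparison, falling back to absolute for values near zero.
    static boolean closeEnough(float expected, float actual, float relTol) {
        float scale = Math.max(1f, Math.abs(expected));
        return Math.abs(expected - actual) <= relTol * scale;
    }
}
```

A parity test along these lines would score the same vector pairs with both implementations and assert `closeEnough(reference, optimized, tolerance)`; the right tolerance depends on the vector dimension and value range and is not specified in this thread.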
TODO: feature flag the new format so that we only create indices with that format when enabled.