optimize OptimizedScalarQuantizer#scalarQuantize when destination can be an integer array #129874


Merged: 13 commits into elastic:main on Jul 2, 2025

Conversation

@iverase (Contributor) commented Jun 23, 2025

It is possible to optimize this method when the destination array is an integer array. In that case it is easy to panamize (vectorize with the Panama Vector API) the following loop:

float nSteps = (1 << bits) - 1;
float step = (upperInterval - lowInterval) / nSteps;
int sumQuery = 0;
for (int h = 0; h < vector.length; h++) {
    float xi = Math.min(Math.max(vector[h], lowInterval), upperInterval);
    int assignment = Math.round((xi - lowInterval) / step);
    sumQuery += assignment;
    destination[h] = assignment;
}
return sumQuery;
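For reference, here is a minimal sketch of what a panamized version of this loop could look like with jdk.incubator.vector. This is an illustration, not the PR's actual implementation: the class and method names are made up, and rounding is approximated by adding 0.5f before the float-to-int conversion, which matches Math.round for the nonnegative values produced after clamping.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical sketch of the vectorized quantization loop; not the PR's code.
public class QuantizeSketch {
    private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    static int quantizeToInts(float[] vector, int[] destination, byte bits,
                              float lowInterval, float upperInterval) {
        float nSteps = (1 << bits) - 1;
        float step = (upperInterval - lowInterval) / nSteps;
        float invStep = 1f / step;
        int sumQuery = 0;
        int i = 0;
        int bound = SPECIES.loopBound(vector.length);
        // Vectorized main loop: clamp, scale, round, store, and accumulate per lane.
        for (; i < bound; i += SPECIES.length()) {
            FloatVector v = FloatVector.fromArray(SPECIES, vector, i)
                    .max(lowInterval)   // clamp below
                    .min(upperInterval) // clamp above
                    .sub(lowInterval)
                    .mul(invStep)
                    .add(0.5f);         // round-to-nearest via truncation (values are >= 0 here)
            IntVector assignment = (IntVector) v.convert(VectorOperators.F2I, 0);
            assignment.intoArray(destination, i);
            sumQuery += assignment.reduceLanes(VectorOperators.ADD);
        }
        // Scalar tail for the remaining elements.
        for (; i < vector.length; i++) {
            float xi = Math.min(Math.max(vector[i], lowInterval), upperInterval);
            int assignment = Math.round((xi - lowInterval) * invStep);
            sumQuery += assignment;
            destination[i] = assignment;
        }
        return sumQuery;
    }
}
```

Note that this needs to be compiled and run with `--add-modules=jdk.incubator.vector`, and that the per-iteration `intoArray` store and `reduceLanes` reduction are exactly the operations discussed below as potential bottlenecks.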

@elasticsearchmachine elasticsearchmachine added v9.1.0 needs:triage Requires assignment of a team area label labels Jun 23, 2025
@iverase iverase added >non-issue :Search Relevance/Search Catch all for Search Relevance and removed needs:triage Requires assignment of a team area label labels Jun 23, 2025
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Jun 23, 2025
@@ -141,6 +141,36 @@ public QuantizationResult scalarQuantize(float[] vector, byte[] destination, byt
);
}

public QuantizationResult scalarQuantizeToInts(float[] vector, int[] destination, byte bits, float[] centroid) {
Member

If we are confident in the speed improvements, I would argue that we shouldn't bother with this new method and should simply adjust the APIs.

The on-disk format is unchanged, so any older formats that rely on the scalarQuantize signature can be adjusted.

@benwtrent (Member)

I ran this on my MacBook; there was no significant performance improvement.

You saw an improvement on AVX256?

    @Benchmark
    @Fork(jvmArgsPrepend = { "--add-modules=jdk.incubator.vector" })
    public int[] quantizeIntervalVector() {
        osq.scalarQuantizeToInts(vector, intDestination, bits, centroid);
        return intDestination;
    }

    @Benchmark
    @Fork(jvmArgsPrepend = { "--add-modules=jdk.incubator.vector" })
    public byte[] quantizeIntervalScalar() {
        osq.scalarQuantize(vector, destination, bits, centroid);
        return destination;
    }
./gradlew -p benchmarks run --args 'OptimizedScalarQuantizerBenchmark.quantizeInterval*'
Benchmark                                                 (bits)  (dims)   Mode  Cnt    Score    Error   Units
OptimizedScalarQuantizerBenchmark.quantizeIntervalScalar       1     768  thrpt   15  223.539 ± 21.878  ops/ms
OptimizedScalarQuantizerBenchmark.quantizeIntervalScalar       4     768  thrpt   15  216.348 ± 15.598  ops/ms
OptimizedScalarQuantizerBenchmark.quantizeIntervalScalar       7     768  thrpt   15  249.156 ± 19.012  ops/ms
OptimizedScalarQuantizerBenchmark.quantizeIntervalVector       1     768  thrpt   15  285.095 ± 39.949  ops/ms
OptimizedScalarQuantizerBenchmark.quantizeIntervalVector       4     768  thrpt   15  264.729 ± 56.267  ops/ms
OptimizedScalarQuantizerBenchmark.quantizeIntervalVector       7     768  thrpt   15  223.167 ± 96.211  ops/ms

@iverase (Author) commented Jun 23, 2025

I saw the same running locally. I will see if I can speed it up on the Mac tomorrow.

@iverase (Author) commented Jun 24, 2025

Running the benchmarks on AVX512 shows a nice improvement:

Benchmark                                      (bits)  (dims)   Mode  Cnt    Score    Error   Units
OptimizedScalarQuantizerBenchmark.vector            1     384  thrpt   15  171.379 ± 13.438  ops/ms
OptimizedScalarQuantizerBenchmark.vector            1     702  thrpt   15   86.122 ± 12.200  ops/ms
OptimizedScalarQuantizerBenchmark.vector            1    1024  thrpt   15   66.933 ±  5.120  ops/ms
OptimizedScalarQuantizerBenchmark.vector            4     384  thrpt   15  164.831 ± 13.994  ops/ms
OptimizedScalarQuantizerBenchmark.vector            4     702  thrpt   15   77.198 ±  3.074  ops/ms
OptimizedScalarQuantizerBenchmark.vector            4    1024  thrpt   15   60.467 ±  2.358  ops/ms
OptimizedScalarQuantizerBenchmark.vector            7     384  thrpt   15  170.618 ± 10.339  ops/ms
OptimizedScalarQuantizerBenchmark.vector            7     702  thrpt   15   88.564 ±  5.674  ops/ms
OptimizedScalarQuantizerBenchmark.vector            7    1024  thrpt   15   65.541 ±  3.965  ops/ms
OptimizedScalarQuantizerBenchmark.vectorToInt       1     384  thrpt   15  375.294 ± 64.289  ops/ms
OptimizedScalarQuantizerBenchmark.vectorToInt       1     702  thrpt   15  172.473 ± 34.064  ops/ms
OptimizedScalarQuantizerBenchmark.vectorToInt       1    1024  thrpt   15  141.787 ± 10.931  ops/ms
OptimizedScalarQuantizerBenchmark.vectorToInt       4     384  thrpt   15  345.374 ± 54.182  ops/ms
OptimizedScalarQuantizerBenchmark.vectorToInt       4     702  thrpt   15  162.475 ± 32.259  ops/ms
OptimizedScalarQuantizerBenchmark.vectorToInt       4    1024  thrpt   15  141.852 ± 21.408  ops/ms
OptimizedScalarQuantizerBenchmark.vectorToInt       7     384  thrpt   15  389.065 ± 38.826  ops/ms
OptimizedScalarQuantizerBenchmark.vectorToInt       7     702  thrpt   15  168.443 ± 13.488  ops/ms
OptimizedScalarQuantizerBenchmark.vectorToInt       7    1024  thrpt   15  153.517 ± 23.609  ops/ms

Running the benchmarks on AVX2 still shows an improvement, though a bit smaller than on AVX512:

Benchmark                                      (bits)  (dims)   Mode  Cnt    Score    Error   Units
OptimizedScalarQuantizerBenchmark.vector            1     384  thrpt   15  321.188 ± 44.707  ops/ms
OptimizedScalarQuantizerBenchmark.vector            1     702  thrpt   15  182.116 ± 25.225  ops/ms
OptimizedScalarQuantizerBenchmark.vector            1    1024  thrpt   15  120.325 ± 10.075  ops/ms
OptimizedScalarQuantizerBenchmark.vector            4     384  thrpt   15  302.188 ± 30.618  ops/ms
OptimizedScalarQuantizerBenchmark.vector            4     702  thrpt   15  163.192 ± 15.973  ops/ms
OptimizedScalarQuantizerBenchmark.vector            4    1024  thrpt   15  112.773 ±  7.515  ops/ms
OptimizedScalarQuantizerBenchmark.vector            7     384  thrpt   15  338.509 ± 36.895  ops/ms
OptimizedScalarQuantizerBenchmark.vector            7     702  thrpt   15  170.923 ±  9.042  ops/ms
OptimizedScalarQuantizerBenchmark.vector            7    1024  thrpt   15  126.368 ± 10.726  ops/ms
OptimizedScalarQuantizerBenchmark.vectorToInt       1     384  thrpt   15  457.541 ± 72.812  ops/ms
OptimizedScalarQuantizerBenchmark.vectorToInt       1     702  thrpt   15  240.875 ± 28.577  ops/ms
OptimizedScalarQuantizerBenchmark.vectorToInt       1    1024  thrpt   15  172.829 ± 20.372  ops/ms
OptimizedScalarQuantizerBenchmark.vectorToInt       4     384  thrpt   15  457.395 ± 65.854  ops/ms
OptimizedScalarQuantizerBenchmark.vectorToInt       4     702  thrpt   15  185.427 ± 55.006  ops/ms
OptimizedScalarQuantizerBenchmark.vectorToInt       4    1024  thrpt   15  155.515 ± 17.608  ops/ms
OptimizedScalarQuantizerBenchmark.vectorToInt       7     384  thrpt   15  455.067 ± 49.820  ops/ms
OptimizedScalarQuantizerBenchmark.vectorToInt       7     702  thrpt   15  280.439 ± 34.467  ops/ms
OptimizedScalarQuantizerBenchmark.vectorToInt       7    1024  thrpt   15  162.441 ±  7.979  ops/ms

I suspect that the expensive part of the algorithm is the intoArray call. The larger the vector register size, the fewer calls we need to make to that method, so the faster it goes. On the Mac, with a 128-bit register size, we just don't see any real improvement.

@benwtrent (Member)

@iverase Do you still want to make this change? I can review when you are ready.

@iverase (Author) commented Jul 1, 2025

I couldn't make it any faster on the Mac, so we are not getting a speedup there, but it is clearly faster on AVX2 and AVX512, so I would say it is a net win. I am good to push it as it is.

@benwtrent (Member) left a comment

It's good to me! Being faster on 256/512 is a good win and it's no slower on NEON-128.

iverase added 3 commits July 2, 2025 07:29
# Conflicts:
#	server/src/main/java/org/elasticsearch/index/codec/vectors/DefaultIVFVectorsWriter.java
@iverase iverase merged commit f81d355 into elastic:main Jul 2, 2025
32 checks passed
@iverase iverase deleted the quantizeVectorWithIntervals branch July 2, 2025 13:58
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jul 3, 2025
optimize OptimizedScalarQuantizer#scalarQuantize when destination can be an integer array
Labels
>non-issue :Search Relevance/Search Catch all for Search Relevance Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.2.0