Use a competitive iterator in FiltersAggregator #98360


Merged (20 commits, Sep 5, 2023)

Conversation

kkrik-es
Contributor

The iterator is used to combine filtering with querying in leaf
collection. Its benefit is that ranges of docs that are filtered out
by all filters are skipped during doc collection.

The competitive iterator is restricted to FiltersAggregator; it is not used in
FilterByFilterAggregator, which is already optimized. It only applies to
top-level filter aggregations with no "other" bucket defined, since the latter
leads to collecting all docs, so there's no point in skipping doc ranges.

Fixes #97544
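
As a rough illustration of the mechanism described above, here is a plain-Java sketch with hypothetical names (no Lucene types): each filter exposes a sorted stream of matching doc IDs, and the competitive view is their union, so doc ranges that no filter matches are never visited by collection.

```java
// Hypothetical, stdlib-only sketch of the idea: the competitive view of
// N filters is the union of their sorted doc-ID streams, so doc ranges
// matched by no filter are skipped entirely during collection.
class CompetitiveSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Stand-in for a per-filter doc-ID iterator.
    static class FilterCursor {
        final int[] docs; int pos = 0;
        FilterCursor(int... docs) { this.docs = docs; }
        int docID() { return pos < docs.length ? docs[pos] : NO_MORE_DOCS; }
        int advanceTo(int target) { while (docID() < target) pos++; return docID(); }
    }

    final FilterCursor[] filters;
    CompetitiveSketch(FilterCursor... filters) { this.filters = filters; }

    // Smallest doc ID >= target that at least one filter matches; the
    // main query can jump straight there, skipping everything in between.
    int advance(int target) {
        int min = NO_MORE_DOCS;
        for (FilterCursor f : filters) min = Math.min(min, f.advanceTo(target));
        return min;
    }
}
```

With filters matching {3, 7, 20} and {5, 7, 30}, advancing to 8 jumps directly to 20, skipping docs 8..19 entirely.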

@kkrik-es
Contributor Author

@elasticsearchmachine run elasticsearch-ci/part-2

@kkrik-es marked this pull request as ready for review (Aug 11, 2023 14:44)
@kkrik-es requested a review from martijnvg (Aug 11, 2023 14:45)
@elasticsearchmachine added the needs:triage label (Aug 11, 2023)
@kkrik-es added the :Analytics/Aggregations, Team:Analytics, and >feature labels (Aug 11, 2023)
@elasticsearchmachine removed the needs:triage label (Aug 11, 2023)
@elasticsearchmachine
Collaborator

Hi @kkrik-es, I've created a changelog YAML for you.

@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@kkrik-es added the needs:triage label (Aug 11, 2023)
@elasticsearchmachine removed the needs:triage label (Aug 11, 2023)
@martijnvg (Member) left a comment

I left a few questions. This looks like it's heading in the right direction.
Did you already check what the speedup of this change is?
I think the filter-aggs challenge in the noaa Rally track can give us this information.

@@ -107,7 +107,6 @@ public void testLookbackOnly() throws Exception {
startDatafeed(datafeedConfig.getId(), 0L, now);
assertBusy(() -> {
DataCounts dataCounts = getDataCounts(job.getId());
assertThat(dataCounts.getProcessedRecordCount(), equalTo(numDocs + numDocs2));
Member

Why are these changes to ML tests needed?

Contributor Author

I think the competitive iterator kicks in sometimes and changes this metadata. We probably need someone who knows more about these tests to review the changes and, more importantly, assert that the actual output doesn't change.

Member

Yes, this needs double checking.

Member

I don't know whether filters aggs get used by datafeeds, but even if they are being used, the counts that are returned shouldn't change. How did these assertions fail?

final int maxDocId;

// Tracks next doc matches as they get returned from the per-filter iterators.
final PriorityQueue<Integer> nextDocs;
Member

Maybe use Lucene's PriorityQueue here instead, containing DocIdSetIterator instances?
(PriorityQueue#lessThan() would compare DocIdSetIterator#docID(), and each time DocIdSetIterator#advance(...) is invoked, PriorityQueue#updateTop(...) should be invoked.)
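
A hedged sketch of this suggestion, with hypothetical names: the queue orders per-filter iterators by their current doc ID, and after advancing the top iterator in place, updateTop() restores heap order without a remove/re-insert. The MiniQueue below only mimics the top()/updateTop()/lessThan() slice of Lucene's org.apache.lucene.util.PriorityQueue API; it is not the real class.

```java
// Sketch: Lucene-style priority queue of per-filter cursors ordered by
// current doc ID; the top cursor is advanced in place, then updateTop()
// sifts it back down — cheaper than poll()/offer() on every advance.
class PqPattern {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    static class Cursor {
        final int[] docs; int pos = 0;
        Cursor(int... docs) { this.docs = docs; }
        int docID() { return pos < docs.length ? docs[pos] : NO_MORE_DOCS; }
        int advanceTo(int target) { while (docID() < target) pos++; return docID(); }
    }

    // Minimal 1-indexed binary heap exposing top()/updateTop().
    static class MiniQueue {
        final Cursor[] heap; int size = 0;
        MiniQueue(int maxSize) { heap = new Cursor[maxSize + 1]; }
        boolean lessThan(Cursor a, Cursor b) { return a.docID() < b.docID(); }
        void add(Cursor c) {
            heap[++size] = c;
            for (int i = size; i > 1 && lessThan(heap[i], heap[i / 2]); i /= 2) swap(i, i / 2);
        }
        Cursor top() { return heap[1]; }
        Cursor updateTop() {  // re-establish heap order after top was mutated
            int i = 1;
            while (true) {
                int l = 2 * i, r = l + 1, s = i;
                if (l <= size && lessThan(heap[l], heap[s])) s = l;
                if (r <= size && lessThan(heap[r], heap[s])) s = r;
                if (s == i) return heap[1];
                swap(i, s); i = s;
            }
        }
        void swap(int i, int j) { Cursor t = heap[i]; heap[i] = heap[j]; heap[j] = t; }
    }

    final MiniQueue pq;
    PqPattern(Cursor... cursors) {
        pq = new MiniQueue(cursors.length);
        for (Cursor c : cursors) pq.add(c);
    }

    // Next doc >= target matched by any filter.
    int advance(int target) {
        while (pq.top().docID() < target) {
            pq.top().advanceTo(target);
            pq.updateTop();
        }
        return pq.top().docID();
    }
}
```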

Member

(this pattern is used in other places as well)

Contributor Author

Hm that may be somewhat tricky to use as we want to keep the last doc per filter to update the corresponding bucket. Let's discuss this offline in our sync.

Member

I think we can extract the last docid from this priority queue and use it in several places. But let's chat more about this in the sync.


final DocIdSetIterator[] filteredDocIdSetIterators;
final int[] lastMatches;
final DocId currentDocID;
Member

No need for the DocId class? This can just be a mutable int field?

Member

Wait I see, this used in collector as well...

@elasticsearchmachine
Collaborator

Hi @kkrik-es, I've updated the changelog YAML for you.

@kkrik-es
Contributor Author

kkrik-es commented Aug 16, 2023

Rally reports a decent improvement, up to an 18% win in 50th-percentile latency:

|                                                        Metric |                                      Task |         Baseline |        Contender |         Diff |   Unit |   Diff % |
|--------------------------------------------------------------:|------------------------------------------:|-----------------:|-----------------:|-------------:|-------:|---------:|
|                    Cumulative indexing time of primary shards |                                           |     18.9263      |     19.8058      |      0.87942 |    min |   +4.65% |
|             Min cumulative indexing time across primary shard |                                           |     18.9263      |     19.8058      |      0.87942 |    min |   +4.65% |
|          Median cumulative indexing time across primary shard |                                           |     18.9263      |     19.8058      |      0.87942 |    min |   +4.65% |
|             Max cumulative indexing time across primary shard |                                           |     18.9263      |     19.8058      |      0.87942 |    min |   +4.65% |
|           Cumulative indexing throttle time of primary shards |                                           |      0           |      0           |      0       |    min |    0.00% |
|    Min cumulative indexing throttle time across primary shard |                                           |      0           |      0           |      0       |    min |    0.00% |
| Median cumulative indexing throttle time across primary shard |                                           |      0           |      0           |      0       |    min |    0.00% |
|    Max cumulative indexing throttle time across primary shard |                                           |      0           |      0           |      0       |    min |    0.00% |
|                       Cumulative merge time of primary shards |                                           |      5.92372     |      6.43607     |      0.51235 |    min |   +8.65% |
|                      Cumulative merge count of primary shards |                                           |     41           |     40           |     -1       |        |   -2.44% |
|                Min cumulative merge time across primary shard |                                           |      5.92372     |      6.43607     |      0.51235 |    min |   +8.65% |
|             Median cumulative merge time across primary shard |                                           |      5.92372     |      6.43607     |      0.51235 |    min |   +8.65% |
|                Max cumulative merge time across primary shard |                                           |      5.92372     |      6.43607     |      0.51235 |    min |   +8.65% |
|              Cumulative merge throttle time of primary shards |                                           |      0.545783    |      0.620317    |      0.07453 |    min |  +13.66% |
|       Min cumulative merge throttle time across primary shard |                                           |      0.545783    |      0.620317    |      0.07453 |    min |  +13.66% |
|    Median cumulative merge throttle time across primary shard |                                           |      0.545783    |      0.620317    |      0.07453 |    min |  +13.66% |
|       Max cumulative merge throttle time across primary shard |                                           |      0.545783    |      0.620317    |      0.07453 |    min |  +13.66% |
|                     Cumulative refresh time of primary shards |                                           |      0.177633    |      0.154333    |     -0.0233  |    min |  -13.12% |
|                    Cumulative refresh count of primary shards |                                           |     22           |     21           |     -1       |        |   -4.55% |
|              Min cumulative refresh time across primary shard |                                           |      0.177633    |      0.154333    |     -0.0233  |    min |  -13.12% |
|           Median cumulative refresh time across primary shard |                                           |      0.177633    |      0.154333    |     -0.0233  |    min |  -13.12% |
|              Max cumulative refresh time across primary shard |                                           |      0.177633    |      0.154333    |     -0.0233  |    min |  -13.12% |
|                       Cumulative flush time of primary shards |                                           |      0.171267    |      0.133933    |     -0.03733 |    min |  -21.80% |
|                      Cumulative flush count of primary shards |                                           |      4           |      3           |     -1       |        |  -25.00% |
|                Min cumulative flush time across primary shard |                                           |      0.171267    |      0.133933    |     -0.03733 |    min |  -21.80% |
|             Median cumulative flush time across primary shard |                                           |      0.171267    |      0.133933    |     -0.03733 |    min |  -21.80% |
|                Max cumulative flush time across primary shard |                                           |      0.171267    |      0.133933    |     -0.03733 |    min |  -21.80% |
|                                       Total Young Gen GC time |                                           |      8.439       |      8.482       |      0.043   |      s |   +0.51% |
|                                      Total Young Gen GC count |                                           |   2271           |   2282           |     11       |        |   +0.48% |
|                                         Total Old Gen GC time |                                           |      0           |      0           |      0       |      s |    0.00% |
|                                        Total Old Gen GC count |                                           |      0           |      0           |      0       |        |    0.00% |
|                                                    Store size |                                           |      8.31853     |      8.37377     |      0.05524 |     GB |   +0.66% |
|                                                 Translog size |                                           |      5.12227e-08 |      5.12227e-08 |      0       |     GB |    0.00% |
|                                        Heap used for segments |                                           |      0           |      0           |      0       |     MB |    0.00% |
|                                      Heap used for doc values |                                           |      0           |      0           |      0       |     MB |    0.00% |
|                                           Heap used for terms |                                           |      0           |      0           |      0       |     MB |    0.00% |
|                                           Heap used for norms |                                           |      0           |      0           |      0       |     MB |    0.00% |
|                                          Heap used for points |                                           |      0           |      0           |      0       |     MB |    0.00% |
|                                   Heap used for stored fields |                                           |      0           |      0           |      0       |     MB |    0.00% |
|                                                 Segment count |                                           |     30           |     26           |     -4       |        |  -13.33% |
|                                   Total Ingest Pipeline count |                                           |      0           |      0           |      0       |        |    0.00% |
|                                    Total Ingest Pipeline time |                                           |      0           |      0           |      0       |     ms |    0.00% |
|                                  Total Ingest Pipeline failed |                                           |      0           |      0           |      0       |        |    0.00% |
|                                                Min Throughput |                                     index | 104997           | 110246           |   5248.8     | docs/s |   +5.00% |
|                                               Mean Throughput |                                     index | 182787           | 176019           |  -6767.43    | docs/s |   -3.70% |
|                                             Median Throughput |                                     index | 187914           | 179756           |  -8157.83    | docs/s |   -4.34% |
|                                                Max Throughput |                                     index | 192355           | 182198           | -10157       | docs/s |   -5.28% |
|                                       50th percentile latency |                                     index |    164.601       |    172.243       |      7.64185 |     ms |   +4.64% |
|                                       90th percentile latency |                                     index |    210.395       |    219.667       |      9.27198 |     ms |   +4.41% |
|                                       99th percentile latency |                                     index |   1012.29        |   1067.79        |     55.498   |     ms |   +5.48% |
|                                     99.9th percentile latency |                                     index |   1248.07        |   1264.29        |     16.2213  |     ms |   +1.30% |
|                                      100th percentile latency |                                     index |   1615.45        |   2107.48        |    492.022   |     ms |  +30.46% |
|                                  50th percentile service time |                                     index |    164.601       |    172.243       |      7.64185 |     ms |   +4.64% |
|                                  90th percentile service time |                                     index |    210.395       |    219.667       |      9.27198 |     ms |   +4.41% |
|                                  99th percentile service time |                                     index |   1012.29        |   1067.79        |     55.498   |     ms |   +5.48% |
|                                99.9th percentile service time |                                     index |   1248.07        |   1264.29        |     16.2213  |     ms |   +1.30% |
|                                 100th percentile service time |                                     index |   1615.45        |   2107.48        |    492.022   |     ms |  +30.46% |
|                                                    error rate |                                     index |      0           |      0           |      0       |      % |    0.00% |
|                                                Min Throughput |  filters-agg-with-low-cardinality-filters |    144.834       |    156.926       |     12.0922  |  ops/s |   +8.35% |
|                                               Mean Throughput |  filters-agg-with-low-cardinality-filters |    144.834       |    156.926       |     12.0922  |  ops/s |   +8.35% |
|                                             Median Throughput |  filters-agg-with-low-cardinality-filters |    144.834       |    156.926       |     12.0922  |  ops/s |   +8.35% |
|                                                Max Throughput |  filters-agg-with-low-cardinality-filters |    144.834       |    156.926       |     12.0922  |  ops/s |   +8.35% |
|                                       50th percentile latency |  filters-agg-with-low-cardinality-filters |      2.52713     |      2.07235     |     -0.45477 |     ms |  -18.00% |
|                                       90th percentile latency |  filters-agg-with-low-cardinality-filters |      2.93371     |      2.45436     |     -0.47935 |     ms |  -16.34% |
|                                      100th percentile latency |  filters-agg-with-low-cardinality-filters |      4.88392     |      4.55642     |     -0.3275  |     ms |   -6.71% |
|                                  50th percentile service time |  filters-agg-with-low-cardinality-filters |      2.52713     |      2.07235     |     -0.45477 |     ms |  -18.00% |
|                                  90th percentile service time |  filters-agg-with-low-cardinality-filters |      2.93371     |      2.45436     |     -0.47935 |     ms |  -16.34% |
|                                 100th percentile service time |  filters-agg-with-low-cardinality-filters |      4.88392     |      4.55642     |     -0.3275  |     ms |   -6.71% |
|                                                    error rate |  filters-agg-with-low-cardinality-filters |      0           |      0           |      0       |      % |    0.00% |
|                                                Min Throughput | filters-agg-with-high-cardinality-filters |    396.708       |    401.998       |      5.28949 |  ops/s |   +1.33% |
|                                               Mean Throughput | filters-agg-with-high-cardinality-filters |    396.708       |    401.998       |      5.28949 |  ops/s |   +1.33% |
|                                             Median Throughput | filters-agg-with-high-cardinality-filters |    396.708       |    401.998       |      5.28949 |  ops/s |   +1.33% |
|                                                Max Throughput | filters-agg-with-high-cardinality-filters |    396.708       |    401.998       |      5.28949 |  ops/s |   +1.33% |
|                                       50th percentile latency | filters-agg-with-high-cardinality-filters |      1.5521      |      1.61546     |      0.06335 |     ms |   +4.08% |
|                                       90th percentile latency | filters-agg-with-high-cardinality-filters |      1.9342      |      1.97363     |      0.03943 |     ms |   +2.04% |
|                                      100th percentile latency | filters-agg-with-high-cardinality-filters |      2.48071     |      2.3245      |     -0.15621 |     ms |   -6.30% |
|                                  50th percentile service time | filters-agg-with-high-cardinality-filters |      1.5521      |      1.61546     |      0.06335 |     ms |   +4.08% |
|                                  90th percentile service time | filters-agg-with-high-cardinality-filters |      1.9342      |      1.97363     |      0.03943 |     ms |   +2.04% |
|                                 100th percentile service time | filters-agg-with-high-cardinality-filters |      2.48071     |      2.3245      |     -0.15621 |     ms |   -6.30% |
|                                                    error rate | filters-agg-with-high-cardinality-filters |      0           |      0           |      0       |      % |    0.00% |
|                                                Min Throughput |           filters-per-10-degrees-interval |      1.83235     |      1.99487     |      0.16252 |  ops/s |   +8.87% |
|                                               Mean Throughput |           filters-per-10-degrees-interval |      2.03136     |      2.12661     |      0.09525 |  ops/s |   +4.69% |
|                                             Median Throughput |           filters-per-10-degrees-interval |      2.05562     |      2.14339     |      0.08776 |  ops/s |   +4.27% |
|                                                Max Throughput |           filters-per-10-degrees-interval |      2.1094      |      2.17495     |      0.06555 |  ops/s |   +3.11% |
|                                       50th percentile latency |           filters-per-10-degrees-interval |    456.609       |    448.627       |     -7.9825  |     ms |   -1.75% |
|                                       90th percentile latency |           filters-per-10-degrees-interval |    461.71        |    451.149       |    -10.5617  |     ms |   -2.29% |
|                                      100th percentile latency |           filters-per-10-degrees-interval |    489.581       |    455.286       |    -34.2948  |     ms |   -7.00% |
|                                  50th percentile service time |           filters-per-10-degrees-interval |    456.609       |    448.627       |     -7.9825  |     ms |   -1.75% |
|                                  90th percentile service time |           filters-per-10-degrees-interval |    461.71        |    451.149       |    -10.5617  |     ms |   -2.29% |
|                                 100th percentile service time |           filters-per-10-degrees-interval |    489.581       |    455.286       |    -34.2948  |     ms |   -7.00% |
|                                                    error rate |           filters-per-10-degrees-interval |      0           |      0           |      0       |      % |    0.00% |
|                                                Min Throughput |  filters-agg-with-high-cardinality-filter |    529.173       |    606.764       |     77.5907  |  ops/s |  +14.66% |
|                                               Mean Throughput |  filters-agg-with-high-cardinality-filter |    529.173       |    606.764       |     77.5907  |  ops/s |  +14.66% |
|                                             Median Throughput |  filters-agg-with-high-cardinality-filter |    529.173       |    606.764       |     77.5907  |  ops/s |  +14.66% |
|                                                Max Throughput |  filters-agg-with-high-cardinality-filter |    529.173       |    606.764       |     77.5907  |  ops/s |  +14.66% |
|                                       50th percentile latency |  filters-agg-with-high-cardinality-filter |      1.26246     |      1.12433     |     -0.13813 |     ms |  -10.94% |
|                                       90th percentile latency |  filters-agg-with-high-cardinality-filter |      1.6089      |      1.41743     |     -0.19147 |     ms |  -11.90% |
|                                      100th percentile latency |  filters-agg-with-high-cardinality-filter |      1.87308     |      1.66788     |     -0.20521 |     ms |  -10.96% |
|                                  50th percentile service time |  filters-agg-with-high-cardinality-filter |      1.26246     |      1.12433     |     -0.13813 |     ms |  -10.94% |
|                                  90th percentile service time |  filters-agg-with-high-cardinality-filter |      1.6089      |      1.41743     |     -0.19147 |     ms |  -11.90% |
|                                 100th percentile service time |  filters-agg-with-high-cardinality-filter |      1.87308     |      1.66788     |     -0.20521 |     ms |  -10.96% |
|                                                    error rate |  filters-agg-with-high-cardinality-filter |      0           |      0           |      0       |      % |    0.00% |
|                                                Min Throughput |   filter-agg-with-high-cardinality-filter |    485.843       |    595.453       |    109.611   |  ops/s |  +22.56% |
|                                               Mean Throughput |   filter-agg-with-high-cardinality-filter |    485.843       |    595.453       |    109.611   |  ops/s |  +22.56% |
|                                             Median Throughput |   filter-agg-with-high-cardinality-filter |    485.843       |    595.453       |    109.611   |  ops/s |  +22.56% |
|                                                Max Throughput |   filter-agg-with-high-cardinality-filter |    485.843       |    595.453       |    109.611   |  ops/s |  +22.56% |
|                                       50th percentile latency |   filter-agg-with-high-cardinality-filter |      1.14133     |      1.01967     |     -0.12167 |     ms |  -10.66% |
|                                       90th percentile latency |   filter-agg-with-high-cardinality-filter |      1.68486     |      1.37135     |     -0.31351 |     ms |  -18.61% |
|                                      100th percentile latency |   filter-agg-with-high-cardinality-filter |      7.11062     |      5.19767     |     -1.91296 |     ms |  -26.90% |
|                                  50th percentile service time |   filter-agg-with-high-cardinality-filter |      1.14133     |      1.01967     |     -0.12167 |     ms |  -10.66% |
|                                  90th percentile service time |   filter-agg-with-high-cardinality-filter |      1.68486     |      1.37135     |     -0.31351 |     ms |  -18.61% |
|                                 100th percentile service time |   filter-agg-with-high-cardinality-filter |      7.11062     |      5.19767     |     -1.91296 |     ms |  -26.90% |
|                                                    error rate |   filter-agg-with-high-cardinality-filter |      0           |      0           |      0       |      % |    0.00% |
|                                                Min Throughput |            filter-per-10-degrees-interval |      2.15795     |      2.17317     |      0.01522 |  ops/s |   +0.71% |
|                                               Mean Throughput |            filter-per-10-degrees-interval |      2.18321     |      2.20445     |      0.02123 |  ops/s |   +0.97% |
|                                             Median Throughput |            filter-per-10-degrees-interval |      2.18675     |      2.20842     |      0.02166 |  ops/s |   +0.99% |
|                                                Max Throughput |            filter-per-10-degrees-interval |      2.19188     |      2.21463     |      0.02275 |  ops/s |   +1.04% |
|                                       50th percentile latency |            filter-per-10-degrees-interval |    454.153       |    449.334       |     -4.81954 |     ms |   -1.06% |
|                                       90th percentile latency |            filter-per-10-degrees-interval |    456.357       |    451.217       |     -5.13981 |     ms |   -1.13% |
|                                      100th percentile latency |            filter-per-10-degrees-interval |    459.647       |    456.502       |     -3.14542 |     ms |   -0.68% |
|                                  50th percentile service time |            filter-per-10-degrees-interval |    454.153       |    449.334       |     -4.81954 |     ms |   -1.06% |
|                                  90th percentile service time |            filter-per-10-degrees-interval |    456.357       |    451.217       |     -5.13981 |     ms |   -1.13% |
|                                 100th percentile service time |            filter-per-10-degrees-interval |    459.647       |    456.502       |     -3.14542 |     ms |   -0.68% |
|                                                    error rate |            filter-per-10-degrees-interval |      0           |      0           |      0       |      % |    0.00% |

@quux00 quux00 added v8.11.0 and removed v8.10.0 labels Aug 16, 2023
@kkrik-es kkrik-es requested a review from jpountz September 4, 2023 10:30
@kkrik-es
Contributor Author

kkrik-es commented Sep 4, 2023

Adding @jpountz to take a look at the search side; please feel free to reassign if you're too busy.

final TwoPhaseIterator twoPhase = scorer.twoPhaseIterator();
final DocIdSetIterator iterator = (twoPhase != null) ? twoPhase.approximation() : scorer.iterator();

return new DocIdSetIterator() {
Contributor

FYI Lucene already has this via TwoPhaseIterator#asDocIdSetIterator. That said, I don't think we should use it because it defeats the point of using a TwoPhaseIterator? For instance, if the main query matches doc IDs 3 and 10 only, and we call advance(3) on the filter of the filter aggregation: if 3 doesn't match, it will then try out 4 then 5... until it finds one that matches, but we don't care about them since they don't match the query?

Contributor

Ideally, the competitive aggregator would ignore the TwoPhaseIterator and just look at the approximation part, so that TwoPhaseIterator#matches would only be called when both the main query and the filter match.
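
A minimal sketch of that idea, using hypothetical stand-in types rather than Lucene's actual Scorer/TwoPhaseIterator classes: the filter contributes only its over-approximating cursor to the competitive intersection, and the second-phase matches() check runs only for docs where the main query and the approximation already agree.

```java
import java.util.function.IntPredicate;

// Hypothetical stand-ins to illustrate the point: the competitive
// intersection uses only the filter's approximation, and the expensive
// matches() check runs solely on docs both cursors already agree on.
class TwoPhaseSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Sorted doc-ID cursor (stand-in for an approximation iterator).
    static class Cursor {
        final int[] docs; int pos = 0;
        Cursor(int... docs) { this.docs = docs; }
        int docID() { return pos < docs.length ? docs[pos] : NO_MORE_DOCS; }
        int advanceTo(int target) { while (docID() < target) pos++; return docID(); }
    }

    static int countMatches(Cursor query, Cursor filterApprox, IntPredicate filterMatches) {
        int hits = 0;
        int doc = query.docID();
        while (doc != NO_MORE_DOCS) {
            int f = filterApprox.advanceTo(doc);
            if (f == doc) {
                // Both cursors agree: only now pay for the second phase.
                if (filterMatches.test(doc)) hits++;
                doc = query.advanceTo(doc + 1);
            } else {
                doc = query.advanceTo(f);  // jump competitively past non-candidates
            }
        }
        return hits;
    }
}
```

If matches() were hidden behind a plain DocIdSetIterator instead, advancing past a non-matching doc would evaluate the second phase on docs the main query never matches, which is the overhead described above.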

Contributor Author

Thanks Adrien. I updated FiltersAggregator, is this what you had in mind? If so, I'll go ahead and revert changes to Lucene.java.

Contributor

Not exactly. I pushed a small change that shows what I have in mind at https://github.com/elastic/elasticsearch/compare/main...jpountz:elasticsearch:filter_competitive_iterator_poc?expand=1. Beware it's not tested and has at least one bug, but hopefully it helps?

Contributor Author

Nice, this is much cleaner. PTAL.

@martijnvg (Member) left a comment

I think this looks good! The only remaining issue is the changes to the ML datafeed integration tests.

@@ -107,7 +107,6 @@ public void testLookbackOnly() throws Exception {
startDatafeed(datafeedConfig.getId(), 0L, now);
assertBusy(() -> {
DataCounts dataCounts = getDataCounts(job.getId());
assertThat(dataCounts.getProcessedRecordCount(), equalTo(numDocs + numDocs2));
Member

I don't know whether filters aggs get used by data feeds, but even if they are being used, the counts that are returned shouldn't change. How did these assertions fail?

Contributor
@jpountz jpountz left a comment

I left minor suggestions but this looks good to me. Is my understanding correct that this change also optimizes the filter aggregation, in addition to the filters aggregation? I'd be interested in looking at updated benchmark numbers since the logic changed since you first looked into performance numbers.

IntPredicate[] docFilters = new IntPredicate[filters().size()];
for (int filterOrd = 0; filterOrd < filters().size(); filterOrd++) {
docFilters[filterOrd] = filters().get(filterOrd).matchingDocIds(aggCtx.getLeafReaderContext());
final boolean useCompetitiveIterator = (parent == null && otherBucketKey == null);
Contributor

I wonder if we should also not use the competitive iterator if the sum of the costs (scorer.iterator().cost()) of the filters is greater than or equal to maxDoc, which suggests that most documents would match at least one filter, ie. configuring a competitive iterator would add a bit of overhead but would be unlikely to save evaluating any documents.
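A hedged sketch of this heuristic (the method name, parameters, and thresholds here are illustrative stand-ins, not the actual Elasticsearch code):

```java
// Sketch: only enable the competitive iterator when it can plausibly skip
// doc ranges. cost() is an upper bound on the number of matching docs, so
// if the summed filter costs reach maxDoc, most docs likely match a filter
// and the competitive iterator would add overhead without saving work.
class CostHeuristic {
    static boolean useCompetitiveIterator(long[] filterCosts, long maxDoc,
                                          boolean topLevel, boolean hasOtherBucket) {
        if (!topLevel || hasOtherBucket) {
            return false;   // "other" bucket or sub-aggregation: must visit all docs
        }
        long sum = 0;
        for (long c : filterCosts) {
            sum += c;       // per-filter iterator cost estimate
        }
        return sum < maxDoc;
    }

    public static void main(String[] args) {
        // sparse filters: worth skipping ranges
        System.out.println(useCompetitiveIterator(new long[] {100, 200}, 1000, true, false));
        // dense filters: overhead likely outweighs any skipping
        System.out.println(useCompetitiveIterator(new long[] {600, 600}, 1000, true, false));
    }
}
```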

@martijnvg (Member)

Is my understanding correct that this change also optimizes the filter aggregation, in addition to the filters aggregation?

Yes, under the hood the FilterAggregator adapts to the FiltersAggregator.

public DocIdSetIterator competitiveIterator() throws IOException {
if (useCompetitiveIterator) {
// A DocIdSetIterator view of the filterIterators heap
return new DisjunctionDISIApproximation(filterIterators);
Contributor

Looking at the code of DisjunctionDISIApproximation, it will not handle the case where filterIterators is empty. Maybe check for emptiness and return a DocIdSetIterator.empty() in that case? It should be possible to write a test case by using filters on terms that do not exist.
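A minimal sketch of the suggested guard, using tiny stand-ins for Lucene's DocIdSetIterator and DisjunctionDISIApproximation (the real disjunction merges its sub-iterators via a heap; here it is stubbed, and the class names are modeled, not imported):

```java
import java.util.List;

class CompetitiveIteratorGuard {
    interface DocIdSetIterator { int nextDoc(); }                 // minimal stand-in
    static final DocIdSetIterator EMPTY = () -> Integer.MAX_VALUE; // NO_MORE_DOCS

    // stand-in for new DisjunctionDISIApproximation(filterIterators),
    // which requires at least one sub-iterator
    static DocIdSetIterator disjunction(List<DocIdSetIterator> subs) {
        if (subs.isEmpty()) throw new IllegalStateException("needs >= 1 sub-iterator");
        return subs.get(0);  // real class merges all subs via a heap
    }

    static DocIdSetIterator competitiveIterator(List<DocIdSetIterator> filterIterators) {
        if (filterIterators.isEmpty()) {
            return EMPTY;    // e.g. all filters target terms absent from this segment
        }
        return disjunction(filterIterators);
    }

    public static void main(String[] args) {
        // empty filter list: no doc is competitive, iterator is exhausted immediately
        System.out.println(competitiveIterator(List.of()).nextDoc());
    }
}
```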

@kkrik-es kkrik-es (Contributor Author), Sep 5, 2023

Right, yeah that's why I was returning a scorer with an empty DISI instead of null. I updated the code to address this.

Note that, if filterIterators is empty, we won't be using the competitive iterator in the latest version.

kkrik-es (Contributor Author) commented Sep 5, 2023

The numbers I got from my latest run are more of a mixed bag. Granted, I'm using a different (better) laptop (M2 vs M1, 32GB vs 16GB), but I'd hope there would be a clearer benefit across the board. Still, there's a win in tail latency of up to 40%:

|                                                        Metric |                                      Task |         Baseline |        Contender |        Diff |   Unit |   Diff % |
|--------------------------------------------------------------:|------------------------------------------:|-----------------:|-----------------:|------------:|-------:|---------:|
|                    Cumulative indexing time of primary shards |                                           |     15.5604      |     15.3616      |    -0.19883 |    min |   -1.28% |
|             Min cumulative indexing time across primary shard |                                           |     15.5604      |     15.3616      |    -0.19883 |    min |   -1.28% |
|          Median cumulative indexing time across primary shard |                                           |     15.5604      |     15.3616      |    -0.19883 |    min |   -1.28% |
|             Max cumulative indexing time across primary shard |                                           |     15.5604      |     15.3616      |    -0.19883 |    min |   -1.28% |
|           Cumulative indexing throttle time of primary shards |                                           |      0           |      0           |     0       |    min |    0.00% |
|    Min cumulative indexing throttle time across primary shard |                                           |      0           |      0           |     0       |    min |    0.00% |
| Median cumulative indexing throttle time across primary shard |                                           |      0           |      0           |     0       |    min |    0.00% |
|    Max cumulative indexing throttle time across primary shard |                                           |      0           |      0           |     0       |    min |    0.00% |
|                       Cumulative merge time of primary shards |                                           |      5.44072     |      5.5067      |     0.06598 |    min |   +1.21% |
|                      Cumulative merge count of primary shards |                                           |     42           |     41           |    -1       |        |   -2.38% |
|                Min cumulative merge time across primary shard |                                           |      5.44072     |      5.5067      |     0.06598 |    min |   +1.21% |
|             Median cumulative merge time across primary shard |                                           |      5.44072     |      5.5067      |     0.06598 |    min |   +1.21% |
|                Max cumulative merge time across primary shard |                                           |      5.44072     |      5.5067      |     0.06598 |    min |   +1.21% |
|              Cumulative merge throttle time of primary shards |                                           |      0.743767    |      0.569167    |    -0.1746  |    min |  -23.48% |
|       Min cumulative merge throttle time across primary shard |                                           |      0.743767    |      0.569167    |    -0.1746  |    min |  -23.48% |
|    Median cumulative merge throttle time across primary shard |                                           |      0.743767    |      0.569167    |    -0.1746  |    min |  -23.48% |
|       Max cumulative merge throttle time across primary shard |                                           |      0.743767    |      0.569167    |    -0.1746  |    min |  -23.48% |
|                     Cumulative refresh time of primary shards |                                           |      0.1464      |      0.154667    |     0.00827 |    min |   +5.65% |
|                    Cumulative refresh count of primary shards |                                           |     23           |     24           |     1       |        |   +4.35% |
|              Min cumulative refresh time across primary shard |                                           |      0.1464      |      0.154667    |     0.00827 |    min |   +5.65% |
|           Median cumulative refresh time across primary shard |                                           |      0.1464      |      0.154667    |     0.00827 |    min |   +5.65% |
|              Max cumulative refresh time across primary shard |                                           |      0.1464      |      0.154667    |     0.00827 |    min |   +5.65% |
|                       Cumulative flush time of primary shards |                                           |      0.124033    |      0.140567    |     0.01653 |    min |  +13.33% |
|                      Cumulative flush count of primary shards |                                           |      4           |      4           |     0       |        |    0.00% |
|                Min cumulative flush time across primary shard |                                           |      0.124033    |      0.140567    |     0.01653 |    min |  +13.33% |
|             Median cumulative flush time across primary shard |                                           |      0.124033    |      0.140567    |     0.01653 |    min |  +13.33% |
|                Max cumulative flush time across primary shard |                                           |      0.124033    |      0.140567    |     0.01653 |    min |  +13.33% |
|                                       Total Young Gen GC time |                                           |      6.696       |      6.83        |     0.134   |      s |   +2.00% |
|                                      Total Young Gen GC count |                                           |   2292           |   2317           |    25       |        |   +1.09% |
|                                         Total Old Gen GC time |                                           |      0           |      0           |     0       |      s |    0.00% |
|                                        Total Old Gen GC count |                                           |      0           |      0           |     0       |        |    0.00% |
|                                                    Store size |                                           |      8.30888     |      7.98571     |    -0.32317 |     GB |   -3.89% |
|                                                 Translog size |                                           |      5.12227e-08 |      5.12227e-08 |     0       |     GB |    0.00% |
|                                        Heap used for segments |                                           |      0           |      0           |     0       |     MB |    0.00% |
|                                      Heap used for doc values |                                           |      0           |      0           |     0       |     MB |    0.00% |
|                                           Heap used for terms |                                           |      0           |      0           |     0       |     MB |    0.00% |
|                                           Heap used for norms |                                           |      0           |      0           |     0       |     MB |    0.00% |
|                                          Heap used for points |                                           |      0           |      0           |     0       |     MB |    0.00% |
|                                   Heap used for stored fields |                                           |      0           |      0           |     0       |     MB |    0.00% |
|                                                 Segment count |                                           |     28           |     30           |     2       |        |   +7.14% |
|                                   Total Ingest Pipeline count |                                           |      0           |      0           |     0       |        |    0.00% |
|                                    Total Ingest Pipeline time |                                           |      0           |      0           |     0       |     ms |    0.00% |
|                                  Total Ingest Pipeline failed |                                           |      0           |      0           |     0       |        |    0.00% |
|                                                Min Throughput |                                     index | 147040           | 144539           | -2501.53    | docs/s |   -1.70% |
|                                               Mean Throughput |                                     index | 215127           | 218379           |  3252.05    | docs/s |   +1.51% |
|                                             Median Throughput |                                     index | 220766           | 224646           |  3880.11    | docs/s |   +1.76% |
|                                                Max Throughput |                                     index | 222988           | 229950           |  6962.6     | docs/s |   +3.12% |
|                                       50th percentile latency |                                     index |    134.935       |    132.967       |    -1.96869 |     ms |   -1.46% |
|                                       90th percentile latency |                                     index |    173.228       |    169.454       |    -3.7734  |     ms |   -2.18% |
|                                       99th percentile latency |                                     index |    849.205       |    860.867       |    11.6618  |     ms |   +1.37% |
|                                     99.9th percentile latency |                                     index |    970.257       |   1030.91        |    60.6566  |     ms |   +6.25% |
|                                      100th percentile latency |                                     index |   1030.54        |   1620.65        |   590.11    |     ms |  +57.26% |
|                                  50th percentile service time |                                     index |    134.935       |    132.967       |    -1.96869 |     ms |   -1.46% |
|                                  90th percentile service time |                                     index |    173.228       |    169.454       |    -3.7734  |     ms |   -2.18% |
|                                  99th percentile service time |                                     index |    849.205       |    860.867       |    11.6618  |     ms |   +1.37% |
|                                99.9th percentile service time |                                     index |    970.257       |   1030.91        |    60.6566  |     ms |   +6.25% |
|                                 100th percentile service time |                                     index |   1030.54        |   1620.65        |   590.11    |     ms |  +57.26% |
|                                                    error rate |                                     index |      0           |      0           |     0       |      % |    0.00% |
|                                                Min Throughput |  filters-agg-with-low-cardinality-filters |    197.418       |    197.752       |     0.33477 |  ops/s |   +0.17% |
|                                               Mean Throughput |  filters-agg-with-low-cardinality-filters |    197.418       |    197.752       |     0.33477 |  ops/s |   +0.17% |
|                                             Median Throughput |  filters-agg-with-low-cardinality-filters |    197.418       |    197.752       |     0.33477 |  ops/s |   +0.17% |
|                                                Max Throughput |  filters-agg-with-low-cardinality-filters |    197.418       |    197.752       |     0.33477 |  ops/s |   +0.17% |
|                                       50th percentile latency |  filters-agg-with-low-cardinality-filters |      1.62896     |      1.58442     |    -0.04454 |     ms |   -2.73% |
|                                       90th percentile latency |  filters-agg-with-low-cardinality-filters |      1.88333     |      2.16011     |     0.27678 |     ms |  +14.70% |
|                                      100th percentile latency |  filters-agg-with-low-cardinality-filters |      4.00417     |      7.05975     |     3.05558 |     ms |  +76.31% |
|                                  50th percentile service time |  filters-agg-with-low-cardinality-filters |      1.62896     |      1.58442     |    -0.04454 |     ms |   -2.73% |
|                                  90th percentile service time |  filters-agg-with-low-cardinality-filters |      1.88333     |      2.16011     |     0.27678 |     ms |  +14.70% |
|                                 100th percentile service time |  filters-agg-with-low-cardinality-filters |      4.00417     |      7.05975     |     3.05558 |     ms |  +76.31% |
|                                                    error rate |  filters-agg-with-low-cardinality-filters |      0           |      0           |     0       |      % |    0.00% |
|                                                Min Throughput | filters-agg-with-high-cardinality-filters |    497.959       |    507.121       |     9.16147 |  ops/s |   +1.84% |
|                                               Mean Throughput | filters-agg-with-high-cardinality-filters |    497.959       |    507.121       |     9.16147 |  ops/s |   +1.84% |
|                                             Median Throughput | filters-agg-with-high-cardinality-filters |    497.959       |    507.121       |     9.16147 |  ops/s |   +1.84% |
|                                                Max Throughput | filters-agg-with-high-cardinality-filters |    497.959       |    507.121       |     9.16147 |  ops/s |   +1.84% |
|                                       50th percentile latency | filters-agg-with-high-cardinality-filters |      1.26246     |      1.27048     |     0.00802 |     ms |   +0.64% |
|                                       90th percentile latency | filters-agg-with-high-cardinality-filters |      1.55798     |      1.68032     |     0.12235 |     ms |   +7.85% |
|                                      100th percentile latency | filters-agg-with-high-cardinality-filters |      1.77612     |      1.93946     |     0.16333 |     ms |   +9.20% |
|                                  50th percentile service time | filters-agg-with-high-cardinality-filters |      1.26246     |      1.27048     |     0.00802 |     ms |   +0.64% |
|                                  90th percentile service time | filters-agg-with-high-cardinality-filters |      1.55798     |      1.68032     |     0.12235 |     ms |   +7.85% |
|                                 100th percentile service time | filters-agg-with-high-cardinality-filters |      1.77612     |      1.93946     |     0.16333 |     ms |   +9.20% |
|                                                    error rate | filters-agg-with-high-cardinality-filters |      0           |      0           |     0       |      % |    0.00% |
|                                                Min Throughput |           filters-per-10-degrees-interval |      2.52592     |      2.55281     |     0.0269  |  ops/s |   +1.06% |
|                                               Mean Throughput |           filters-per-10-degrees-interval |      2.68119     |      2.71924     |     0.03805 |  ops/s |   +1.42% |
|                                             Median Throughput |           filters-per-10-degrees-interval |      2.70415     |      2.73906     |     0.03491 |  ops/s |   +1.29% |
|                                                Max Throughput |           filters-per-10-degrees-interval |      2.74167     |      2.79418     |     0.05251 |  ops/s |   +1.92% |
|                                       50th percentile latency |           filters-per-10-degrees-interval |    357.206       |    349.096       |    -8.11021 |     ms |   -2.27% |
|                                       90th percentile latency |           filters-per-10-degrees-interval |    358.384       |    350.317       |    -8.06631 |     ms |   -2.25% |
|                                      100th percentile latency |           filters-per-10-degrees-interval |    360.014       |    353.38        |    -6.63342 |     ms |   -1.84% |
|                                  50th percentile service time |           filters-per-10-degrees-interval |    357.206       |    349.096       |    -8.11021 |     ms |   -2.27% |
|                                  90th percentile service time |           filters-per-10-degrees-interval |    358.384       |    350.317       |    -8.06631 |     ms |   -2.25% |
|                                 100th percentile service time |           filters-per-10-degrees-interval |    360.014       |    353.38        |    -6.63342 |     ms |   -1.84% |
|                                                    error rate |           filters-per-10-degrees-interval |      0           |      0           |     0       |      % |    0.00% |
|                                                Min Throughput |  filters-agg-with-high-cardinality-filter |    743.456       |    729.694       |   -13.7619  |  ops/s |   -1.85% |
|                                               Mean Throughput |  filters-agg-with-high-cardinality-filter |    743.456       |    729.694       |   -13.7619  |  ops/s |   -1.85% |
|                                             Median Throughput |  filters-agg-with-high-cardinality-filter |    743.456       |    729.694       |   -13.7619  |  ops/s |   -1.85% |
|                                                Max Throughput |  filters-agg-with-high-cardinality-filter |    743.456       |    729.694       |   -13.7619  |  ops/s |   -1.85% |
|                                       50th percentile latency |  filters-agg-with-high-cardinality-filter |      0.854437    |      0.827646    |    -0.02679 |     ms |   -3.14% |
|                                       90th percentile latency |  filters-agg-with-high-cardinality-filter |      1.02233     |      1.07289     |     0.05056 |     ms |   +4.95% |
|                                      100th percentile latency |  filters-agg-with-high-cardinality-filter |      1.64175     |      1.21754     |    -0.42421 |     ms |  -25.84% |
|                                  50th percentile service time |  filters-agg-with-high-cardinality-filter |      0.854437    |      0.827646    |    -0.02679 |     ms |   -3.14% |
|                                  90th percentile service time |  filters-agg-with-high-cardinality-filter |      1.02233     |      1.07289     |     0.05056 |     ms |   +4.95% |
|                                 100th percentile service time |  filters-agg-with-high-cardinality-filter |      1.64175     |      1.21754     |    -0.42421 |     ms |  -25.84% |
|                                                    error rate |  filters-agg-with-high-cardinality-filter |      0           |      0           |     0       |      % |    0.00% |
|                                                Min Throughput |   filter-agg-with-high-cardinality-filter |    710.018       |    760.931       |    50.9128  |  ops/s |   +7.17% |
|                                               Mean Throughput |   filter-agg-with-high-cardinality-filter |    710.018       |    760.931       |    50.9128  |  ops/s |   +7.17% |
|                                             Median Throughput |   filter-agg-with-high-cardinality-filter |    710.018       |    760.931       |    50.9128  |  ops/s |   +7.17% |
|                                                Max Throughput |   filter-agg-with-high-cardinality-filter |    710.018       |    760.931       |    50.9128  |  ops/s |   +7.17% |
|                                       50th percentile latency |   filter-agg-with-high-cardinality-filter |      0.80975     |      0.749291    |    -0.06046 |     ms |   -7.47% |
|                                       90th percentile latency |   filter-agg-with-high-cardinality-filter |      1.19683     |      1.04217     |    -0.15466 |     ms |  -12.92% |
|                                      100th percentile latency |   filter-agg-with-high-cardinality-filter |      2.49263     |      1.48458     |    -1.00804 |     ms |  -40.44% |
|                                  50th percentile service time |   filter-agg-with-high-cardinality-filter |      0.80975     |      0.749291    |    -0.06046 |     ms |   -7.47% |
|                                  90th percentile service time |   filter-agg-with-high-cardinality-filter |      1.19683     |      1.04217     |    -0.15466 |     ms |  -12.92% |
|                                 100th percentile service time |   filter-agg-with-high-cardinality-filter |      2.49263     |      1.48458     |    -1.00804 |     ms |  -40.44% |
|                                                    error rate |   filter-agg-with-high-cardinality-filter |      0           |      0           |     0       |      % |    0.00% |
|                                                Min Throughput |            filter-per-10-degrees-interval |      2.79406     |      2.85966     |     0.06561 |  ops/s |   +2.35% |
|                                               Mean Throughput |            filter-per-10-degrees-interval |      2.81185     |      2.88326     |     0.07141 |  ops/s |   +2.54% |
|                                             Median Throughput |            filter-per-10-degrees-interval |      2.81407     |      2.88606     |     0.07199 |  ops/s |   +2.56% |
|                                                Max Throughput |            filter-per-10-degrees-interval |      2.82022     |      2.89194     |     0.07172 |  ops/s |   +2.54% |
|                                       50th percentile latency |            filter-per-10-degrees-interval |    353.514       |    344.275       |    -9.23923 |     ms |   -2.61% |
|                                       90th percentile latency |            filter-per-10-degrees-interval |    354.635       |    345.801       |    -8.834   |     ms |   -2.49% |
|                                      100th percentile latency |            filter-per-10-degrees-interval |    355.348       |    350.131       |    -5.21658 |     ms |   -1.47% |
|                                  50th percentile service time |            filter-per-10-degrees-interval |    353.514       |    344.275       |    -9.23923 |     ms |   -2.61% |
|                                  90th percentile service time |            filter-per-10-degrees-interval |    354.635       |    345.801       |    -8.834   |     ms |   -2.49% |
|                                 100th percentile service time |            filter-per-10-degrees-interval |    355.348       |    350.131       |    -5.21658 |     ms |   -1.47% |
|                                                    error rate |            filter-per-10-degrees-interval |      0           |      0           |     0       |      % |    0.00% |

@kkrik-es kkrik-es merged commit 8af9b4a into elastic:main Sep 5, 2023
jpountz (Contributor) commented Sep 5, 2023

A potential idea for speeding this up further would be to specialize for the case when there is a single filter, which I assume is quite common. In this case we can save the overhead of the heap in collect() and in the competitiveIterator().
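A rough sketch of that single-filter specialization, with simplified stand-ins for Lucene's iterator types (the stateless `advance` here ignores the real forward-only contract, and all names are illustrative):

```java
import java.util.Arrays;
import java.util.List;

class SingleFilterSpecialization {
    interface DISI { int advance(int target); }      // minimal iterator stand-in

    // iterator over a sorted array of matching doc IDs
    static DISI fromSorted(int[] docs) {
        return target -> {
            int i = Arrays.binarySearch(docs, target);
            if (i < 0) i = -i - 1;                   // insertion point
            return i < docs.length ? docs[i] : Integer.MAX_VALUE;
        };
    }

    static DISI competitive(List<DISI> filters) {
        if (filters.size() == 1) {
            return filters.get(0);                   // single filter: skip the heap entirely
        }
        // naive disjunction stand-in for the general heap-based case
        return target -> filters.stream()
                .mapToInt(f -> f.advance(target))
                .min().orElse(Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        DISI single = competitive(List.of(fromSorted(new int[] {5, 42, 99})));
        System.out.println(single.advance(10));      // next competitive doc at or after 10
    }
}
```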

@martijnvg (Member)

A potential idea for speeding this up further would be to specialize for the case when there is a single filter, which I assume is quite common. In this case we can save the overhead of the heap in collect() and in the competitiveIterator().

👍 Let's create a followup issue for this.

jpountz (Contributor) commented Sep 5, 2023

I opened #99202.

Labels
:Analytics/Aggregations Aggregations >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.11.0
Successfully merging this pull request may close these issues.

Implement competitiveIterator() on FilterAggregator
5 participants