Use a competitive iterator in FiltersAggregator #98360


Merged (20 commits, Sep 5, 2023)

Conversation

kkrik-es
Contributor

The iterator is used to combine filtering with querying in leaf
collection. Its benefit is that ranges of docs that are filtered out
by all filters are skipped during doc collection.

The competitive iterator is restricted to FiltersAggregator; it is not used in
FilterByFilterAggregator, which is already optimized. It only applies to
top-level filter aggregations with no "other" bucket defined, since the latter
leads to collecting all docs, so there's no point in skipping doc ranges.

Fixes #97544
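
As a rough illustration of the mechanism described above, here is a plain-Java sketch with hypothetical names (no Lucene types): each filter exposes a sorted stream of matching doc IDs, and the competitive view is their union, so doc ranges that no filter matches are never visited by collection.

```java
// Hypothetical, stdlib-only sketch of the idea: the competitive view of
// N filters is the union of their sorted doc-ID streams, so doc ranges
// matched by no filter are skipped entirely during collection.
class CompetitiveSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Stand-in for a per-filter doc-ID iterator.
    static class FilterCursor {
        final int[] docs; int pos = 0;
        FilterCursor(int... docs) { this.docs = docs; }
        int docID() { return pos < docs.length ? docs[pos] : NO_MORE_DOCS; }
        int advanceTo(int target) { while (docID() < target) pos++; return docID(); }
    }

    final FilterCursor[] filters;
    CompetitiveSketch(FilterCursor... filters) { this.filters = filters; }

    // Smallest doc ID >= target that at least one filter matches; the
    // main query can jump straight there, skipping everything in between.
    int advance(int target) {
        int min = NO_MORE_DOCS;
        for (FilterCursor f : filters) min = Math.min(min, f.advanceTo(target));
        return min;
    }
}
```

With filters matching {3, 7, 20} and {5, 7, 30}, advancing to 8 jumps directly to 20, skipping docs 8..19 entirely.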

@kkrik-es
Contributor Author

@elasticsearchmachine run elasticsearch-ci/part-2

@kkrik-es marked this pull request as ready for review (Aug 11, 2023 14:44)
@kkrik-es requested a review from martijnvg (Aug 11, 2023 14:45)
@elasticsearchmachine added the needs:triage label (Aug 11, 2023)
@kkrik-es added the :Analytics/Aggregations, Team:Analytics, and >feature labels (Aug 11, 2023)
@elasticsearchmachine removed the needs:triage label (Aug 11, 2023)
@elasticsearchmachine
Collaborator

Hi @kkrik-es, I've created a changelog YAML for you.

@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@kkrik-es added the needs:triage label (Aug 11, 2023)
@elasticsearchmachine removed the needs:triage label (Aug 11, 2023)
@martijnvg (Member) left a comment

I left a few questions. This looks like it's heading in the right direction.
Did you already check what the speedup of this change is?
I think the filter-aggs challenge in the noaa Rally track can give us this information.

@@ -107,7 +107,6 @@ public void testLookbackOnly() throws Exception {
startDatafeed(datafeedConfig.getId(), 0L, now);
assertBusy(() -> {
DataCounts dataCounts = getDataCounts(job.getId());
assertThat(dataCounts.getProcessedRecordCount(), equalTo(numDocs + numDocs2));
Member

Why are these changes to ML tests needed?

Contributor Author

I think the competitive iterator kicks in sometimes and changes this metadata. We probably need someone who knows more about these tests to review the changes and, more importantly, assert that the actual output doesn't change.

Member

Yes, this needs double checking.

Member

I don't know whether filters aggs get used by datafeeds, but even if they are being used, the counts that are returned shouldn't change. How did these assertions fail?

final int maxDocId;

// Tracks next doc matches as they get returned from the per-filter iterators.
final PriorityQueue<Integer> nextDocs;
Member

Maybe use Lucene's PriorityQueue here instead, containing DocIdSetIterator instances?
(PriorityQueue#lessThan() would compare DocIdSetIterator#docID(), and each time DocIdSetIterator#advance(...) is invoked, PriorityQueue#updateTop(...) should be invoked.)
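
A hedged sketch of this suggestion, with hypothetical names: the queue orders per-filter iterators by their current doc ID, and after advancing the top iterator in place, updateTop() restores heap order without a remove/re-insert. The MiniQueue below only mimics the top()/updateTop()/lessThan() slice of Lucene's org.apache.lucene.util.PriorityQueue API; it is not the real class.

```java
// Sketch: Lucene-style priority queue of per-filter cursors ordered by
// current doc ID; the top cursor is advanced in place, then updateTop()
// sifts it back down — cheaper than poll()/offer() on every advance.
class PqPattern {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    static class Cursor {
        final int[] docs; int pos = 0;
        Cursor(int... docs) { this.docs = docs; }
        int docID() { return pos < docs.length ? docs[pos] : NO_MORE_DOCS; }
        int advanceTo(int target) { while (docID() < target) pos++; return docID(); }
    }

    // Minimal 1-indexed binary heap exposing top()/updateTop().
    static class MiniQueue {
        final Cursor[] heap; int size = 0;
        MiniQueue(int maxSize) { heap = new Cursor[maxSize + 1]; }
        boolean lessThan(Cursor a, Cursor b) { return a.docID() < b.docID(); }
        void add(Cursor c) {
            heap[++size] = c;
            for (int i = size; i > 1 && lessThan(heap[i], heap[i / 2]); i /= 2) swap(i, i / 2);
        }
        Cursor top() { return heap[1]; }
        Cursor updateTop() {  // re-establish heap order after top was mutated
            int i = 1;
            while (true) {
                int l = 2 * i, r = l + 1, s = i;
                if (l <= size && lessThan(heap[l], heap[s])) s = l;
                if (r <= size && lessThan(heap[r], heap[s])) s = r;
                if (s == i) return heap[1];
                swap(i, s); i = s;
            }
        }
        void swap(int i, int j) { Cursor t = heap[i]; heap[i] = heap[j]; heap[j] = t; }
    }

    final MiniQueue pq;
    PqPattern(Cursor... cursors) {
        pq = new MiniQueue(cursors.length);
        for (Cursor c : cursors) pq.add(c);
    }

    // Next doc >= target matched by any filter.
    int advance(int target) {
        while (pq.top().docID() < target) {
            pq.top().advanceTo(target);
            pq.updateTop();
        }
        return pq.top().docID();
    }
}
```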

Member

(this pattern is used in other places as well)

Contributor Author

Hm that may be somewhat tricky to use as we want to keep the last doc per filter to update the corresponding bucket. Let's discuss this offline in our sync.

Member

I think we can extract the last docid from this priority queue and use it in several places. But let's chat more about this in the sync.


final DocIdSetIterator[] filteredDocIdSetIterators;
final int[] lastMatches;
final DocId currentDocID;
Member

No need for the DocId class? This can just be a mutable int field?

Member

Wait I see, this used in collector as well...

@elasticsearchmachine
Collaborator

Hi @kkrik-es, I've updated the changelog YAML for you.

@kkrik-es
Contributor Author

kkrik-es commented Aug 16, 2023

Rally reports a decent improvement, up to an 18% win in 50th-percentile latency:

|                                                        Metric |                                      Task |         Baseline |        Contender |         Diff |   Unit |   Diff % |
|--------------------------------------------------------------:|------------------------------------------:|-----------------:|-----------------:|-------------:|-------:|---------:|
|                    Cumulative indexing time of primary shards |                                           |     18.9263      |     19.8058      |      0.87942 |    min |   +4.65% |
|             Min cumulative indexing time across primary shard |                                           |     18.9263      |     19.8058      |      0.87942 |    min |   +4.65% |
|          Median cumulative indexing time across primary shard |                                           |     18.9263      |     19.8058      |      0.87942 |    min |   +4.65% |
|             Max cumulative indexing time across primary shard |                                           |     18.9263      |     19.8058      |      0.87942 |    min |   +4.65% |
|           Cumulative indexing throttle time of primary shards |                                           |      0           |      0           |      0       |    min |    0.00% |
|    Min cumulative indexing throttle time across primary shard |                                           |      0           |      0           |      0       |    min |    0.00% |
| Median cumulative indexing throttle time across primary shard |                                           |      0           |      0           |      0       |    min |    0.00% |
|    Max cumulative indexing throttle time across primary shard |                                           |      0           |      0           |      0       |    min |    0.00% |
|                       Cumulative merge time of primary shards |                                           |      5.92372     |      6.43607     |      0.51235 |    min |   +8.65% |
|                      Cumulative merge count of primary shards |                                           |     41           |     40           |     -1       |        |   -2.44% |
|                Min cumulative merge time across primary shard |                                           |      5.92372     |      6.43607     |      0.51235 |    min |   +8.65% |
|             Median cumulative merge time across primary shard |                                           |      5.92372     |      6.43607     |      0.51235 |    min |   +8.65% |
|                Max cumulative merge time across primary shard |                                           |      5.92372     |      6.43607     |      0.51235 |    min |   +8.65% |
|              Cumulative merge throttle time of primary shards |                                           |      0.545783    |      0.620317    |      0.07453 |    min |  +13.66% |
|       Min cumulative merge throttle time across primary shard |                                           |      0.545783    |      0.620317    |      0.07453 |    min |  +13.66% |
|    Median cumulative merge throttle time across primary shard |                                           |      0.545783    |      0.620317    |      0.07453 |    min |  +13.66% |
|       Max cumulative merge throttle time across primary shard |                                           |      0.545783    |      0.620317    |      0.07453 |    min |  +13.66% |
|                     Cumulative refresh time of primary shards |                                           |      0.177633    |      0.154333    |     -0.0233  |    min |  -13.12% |
|                    Cumulative refresh count of primary shards |                                           |     22           |     21           |     -1       |        |   -4.55% |
|              Min cumulative refresh time across primary shard |                                           |      0.177633    |      0.154333    |     -0.0233  |    min |  -13.12% |
|           Median cumulative refresh time across primary shard |                                           |      0.177633    |      0.154333    |     -0.0233  |    min |  -13.12% |
|              Max cumulative refresh time across primary shard |                                           |      0.177633    |      0.154333    |     -0.0233  |    min |  -13.12% |
|                       Cumulative flush time of primary shards |                                           |      0.171267    |      0.133933    |     -0.03733 |    min |  -21.80% |
|                      Cumulative flush count of primary shards |                                           |      4           |      3           |     -1       |        |  -25.00% |
|                Min cumulative flush time across primary shard |                                           |      0.171267    |      0.133933    |     -0.03733 |    min |  -21.80% |
|             Median cumulative flush time across primary shard |                                           |      0.171267    |      0.133933    |     -0.03733 |    min |  -21.80% |
|                Max cumulative flush time across primary shard |                                           |      0.171267    |      0.133933    |     -0.03733 |    min |  -21.80% |
|                                       Total Young Gen GC time |                                           |      8.439       |      8.482       |      0.043   |      s |   +0.51% |
|                                      Total Young Gen GC count |                                           |   2271           |   2282           |     11       |        |   +0.48% |
|                                         Total Old Gen GC time |                                           |      0           |      0           |      0       |      s |    0.00% |
|                                        Total Old Gen GC count |                                           |      0           |      0           |      0       |        |    0.00% |
|                                                    Store size |                                           |      8.31853     |      8.37377     |      0.05524 |     GB |   +0.66% |
|                                                 Translog size |                                           |      5.12227e-08 |      5.12227e-08 |      0       |     GB |    0.00% |
|                                        Heap used for segments |                                           |      0           |      0           |      0       |     MB |    0.00% |
|                                      Heap used for doc values |                                           |      0           |      0           |      0       |     MB |    0.00% |
|                                           Heap used for terms |                                           |      0           |      0           |      0       |     MB |    0.00% |
|                                           Heap used for norms |                                           |      0           |      0           |      0       |     MB |    0.00% |
|                                          Heap used for points |                                           |      0           |      0           |      0       |     MB |    0.00% |
|                                   Heap used for stored fields |                                           |      0           |      0           |      0       |     MB |    0.00% |
|                                                 Segment count |                                           |     30           |     26           |     -4       |        |  -13.33% |
|                                   Total Ingest Pipeline count |                                           |      0           |      0           |      0       |        |    0.00% |
|                                    Total Ingest Pipeline time |                                           |      0           |      0           |      0       |     ms |    0.00% |
|                                  Total Ingest Pipeline failed |                                           |      0           |      0           |      0       |        |    0.00% |
|                                                Min Throughput |                                     index | 104997           | 110246           |   5248.8     | docs/s |   +5.00% |
|                                               Mean Throughput |                                     index | 182787           | 176019           |  -6767.43    | docs/s |   -3.70% |
|                                             Median Throughput |                                     index | 187914           | 179756           |  -8157.83    | docs/s |   -4.34% |
|                                                Max Throughput |                                     index | 192355           | 182198           | -10157       | docs/s |   -5.28% |
|                                       50th percentile latency |                                     index |    164.601       |    172.243       |      7.64185 |     ms |   +4.64% |
|                                       90th percentile latency |                                     index |    210.395       |    219.667       |      9.27198 |     ms |   +4.41% |
|                                       99th percentile latency |                                     index |   1012.29        |   1067.79        |     55.498   |     ms |   +5.48% |
|                                     99.9th percentile latency |                                     index |   1248.07        |   1264.29        |     16.2213  |     ms |   +1.30% |
|                                      100th percentile latency |                                     index |   1615.45        |   2107.48        |    492.022   |     ms |  +30.46% |
|                                  50th percentile service time |                                     index |    164.601       |    172.243       |      7.64185 |     ms |   +4.64% |
|                                  90th percentile service time |                                     index |    210.395       |    219.667       |      9.27198 |     ms |   +4.41% |
|                                  99th percentile service time |                                     index |   1012.29        |   1067.79        |     55.498   |     ms |   +5.48% |
|                                99.9th percentile service time |                                     index |   1248.07        |   1264.29        |     16.2213  |     ms |   +1.30% |
|                                 100th percentile service time |                                     index |   1615.45        |   2107.48        |    492.022   |     ms |  +30.46% |
|                                                    error rate |                                     index |      0           |      0           |      0       |      % |    0.00% |
|                                                Min Throughput |  filters-agg-with-low-cardinality-filters |    144.834       |    156.926       |     12.0922  |  ops/s |   +8.35% |
|                                               Mean Throughput |  filters-agg-with-low-cardinality-filters |    144.834       |    156.926       |     12.0922  |  ops/s |   +8.35% |
|                                             Median Throughput |  filters-agg-with-low-cardinality-filters |    144.834       |    156.926       |     12.0922  |  ops/s |   +8.35% |
|                                                Max Throughput |  filters-agg-with-low-cardinality-filters |    144.834       |    156.926       |     12.0922  |  ops/s |   +8.35% |
|                                       50th percentile latency |  filters-agg-with-low-cardinality-filters |      2.52713     |      2.07235     |     -0.45477 |     ms |  -18.00% |
|                                       90th percentile latency |  filters-agg-with-low-cardinality-filters |      2.93371     |      2.45436     |     -0.47935 |     ms |  -16.34% |
|                                      100th percentile latency |  filters-agg-with-low-cardinality-filters |      4.88392     |      4.55642     |     -0.3275  |     ms |   -6.71% |
|                                  50th percentile service time |  filters-agg-with-low-cardinality-filters |      2.52713     |      2.07235     |     -0.45477 |     ms |  -18.00% |
|                                  90th percentile service time |  filters-agg-with-low-cardinality-filters |      2.93371     |      2.45436     |     -0.47935 |     ms |  -16.34% |
|                                 100th percentile service time |  filters-agg-with-low-cardinality-filters |      4.88392     |      4.55642     |     -0.3275  |     ms |   -6.71% |
|                                                    error rate |  filters-agg-with-low-cardinality-filters |      0           |      0           |      0       |      % |    0.00% |
|                                                Min Throughput | filters-agg-with-high-cardinality-filters |    396.708       |    401.998       |      5.28949 |  ops/s |   +1.33% |
|                                               Mean Throughput | filters-agg-with-high-cardinality-filters |    396.708       |    401.998       |      5.28949 |  ops/s |   +1.33% |
|                                             Median Throughput | filters-agg-with-high-cardinality-filters |    396.708       |    401.998       |      5.28949 |  ops/s |   +1.33% |
|                                                Max Throughput | filters-agg-with-high-cardinality-filters |    396.708       |    401.998       |      5.28949 |  ops/s |   +1.33% |
|                                       50th percentile latency | filters-agg-with-high-cardinality-filters |      1.5521      |      1.61546     |      0.06335 |     ms |   +4.08% |
|                                       90th percentile latency | filters-agg-with-high-cardinality-filters |      1.9342      |      1.97363     |      0.03943 |     ms |   +2.04% |
|                                      100th percentile latency | filters-agg-with-high-cardinality-filters |      2.48071     |      2.3245      |     -0.15621 |     ms |   -6.30% |
|                                  50th percentile service time | filters-agg-with-high-cardinality-filters |      1.5521      |      1.61546     |      0.06335 |     ms |   +4.08% |
|                                  90th percentile service time | filters-agg-with-high-cardinality-filters |      1.9342      |      1.97363     |      0.03943 |     ms |   +2.04% |
|                                 100th percentile service time | filters-agg-with-high-cardinality-filters |      2.48071     |      2.3245      |     -0.15621 |     ms |   -6.30% |
|                                                    error rate | filters-agg-with-high-cardinality-filters |      0           |      0           |      0       |      % |    0.00% |
|                                                Min Throughput |           filters-per-10-degrees-interval |      1.83235     |      1.99487     |      0.16252 |  ops/s |   +8.87% |
|                                               Mean Throughput |           filters-per-10-degrees-interval |      2.03136     |      2.12661     |      0.09525 |  ops/s |   +4.69% |
|                                             Median Throughput |           filters-per-10-degrees-interval |      2.05562     |      2.14339     |      0.08776 |  ops/s |   +4.27% |
|                                                Max Throughput |           filters-per-10-degrees-interval |      2.1094      |      2.17495     |      0.06555 |  ops/s |   +3.11% |
|                                       50th percentile latency |           filters-per-10-degrees-interval |    456.609       |    448.627       |     -7.9825  |     ms |   -1.75% |
|                                       90th percentile latency |           filters-per-10-degrees-interval |    461.71        |    451.149       |    -10.5617  |     ms |   -2.29% |
|                                      100th percentile latency |           filters-per-10-degrees-interval |    489.581       |    455.286       |    -34.2948  |     ms |   -7.00% |
|                                  50th percentile service time |           filters-per-10-degrees-interval |    456.609       |    448.627       |     -7.9825  |     ms |   -1.75% |
|                                  90th percentile service time |           filters-per-10-degrees-interval |    461.71        |    451.149       |    -10.5617  |     ms |   -2.29% |
|                                 100th percentile service time |           filters-per-10-degrees-interval |    489.581       |    455.286       |    -34.2948  |     ms |   -7.00% |
|                                                    error rate |           filters-per-10-degrees-interval |      0           |      0           |      0       |      % |    0.00% |
|                                                Min Throughput |  filters-agg-with-high-cardinality-filter |    529.173       |    606.764       |     77.5907  |  ops/s |  +14.66% |
|                                               Mean Throughput |  filters-agg-with-high-cardinality-filter |    529.173       |    606.764       |     77.5907  |  ops/s |  +14.66% |
|                                             Median Throughput |  filters-agg-with-high-cardinality-filter |    529.173       |    606.764       |     77.5907  |  ops/s |  +14.66% |
|                                                Max Throughput |  filters-agg-with-high-cardinality-filter |    529.173       |    606.764       |     77.5907  |  ops/s |  +14.66% |
|                                       50th percentile latency |  filters-agg-with-high-cardinality-filter |      1.26246     |      1.12433     |     -0.13813 |     ms |  -10.94% |
|                                       90th percentile latency |  filters-agg-with-high-cardinality-filter |      1.6089      |      1.41743     |     -0.19147 |     ms |  -11.90% |
|                                      100th percentile latency |  filters-agg-with-high-cardinality-filter |      1.87308     |      1.66788     |     -0.20521 |     ms |  -10.96% |
|                                  50th percentile service time |  filters-agg-with-high-cardinality-filter |      1.26246     |      1.12433     |     -0.13813 |     ms |  -10.94% |
|                                  90th percentile service time |  filters-agg-with-high-cardinality-filter |      1.6089      |      1.41743     |     -0.19147 |     ms |  -11.90% |
|                                 100th percentile service time |  filters-agg-with-high-cardinality-filter |      1.87308     |      1.66788     |     -0.20521 |     ms |  -10.96% |
|                                                    error rate |  filters-agg-with-high-cardinality-filter |      0           |      0           |      0       |      % |    0.00% |
|                                                Min Throughput |   filter-agg-with-high-cardinality-filter |    485.843       |    595.453       |    109.611   |  ops/s |  +22.56% |
|                                               Mean Throughput |   filter-agg-with-high-cardinality-filter |    485.843       |    595.453       |    109.611   |  ops/s |  +22.56% |
|                                             Median Throughput |   filter-agg-with-high-cardinality-filter |    485.843       |    595.453       |    109.611   |  ops/s |  +22.56% |
|                                                Max Throughput |   filter-agg-with-high-cardinality-filter |    485.843       |    595.453       |    109.611   |  ops/s |  +22.56% |
|                                       50th percentile latency |   filter-agg-with-high-cardinality-filter |      1.14133     |      1.01967     |     -0.12167 |     ms |  -10.66% |
|                                       90th percentile latency |   filter-agg-with-high-cardinality-filter |      1.68486     |      1.37135     |     -0.31351 |     ms |  -18.61% |
|                                      100th percentile latency |   filter-agg-with-high-cardinality-filter |      7.11062     |      5.19767     |     -1.91296 |     ms |  -26.90% |
|                                  50th percentile service time |   filter-agg-with-high-cardinality-filter |      1.14133     |      1.01967     |     -0.12167 |     ms |  -10.66% |
|                                  90th percentile service time |   filter-agg-with-high-cardinality-filter |      1.68486     |      1.37135     |     -0.31351 |     ms |  -18.61% |
|                                 100th percentile service time |   filter-agg-with-high-cardinality-filter |      7.11062     |      5.19767     |     -1.91296 |     ms |  -26.90% |
|                                                    error rate |   filter-agg-with-high-cardinality-filter |      0           |      0           |      0       |      % |    0.00% |
|                                                Min Throughput |            filter-per-10-degrees-interval |      2.15795     |      2.17317     |      0.01522 |  ops/s |   +0.71% |
|                                               Mean Throughput |            filter-per-10-degrees-interval |      2.18321     |      2.20445     |      0.02123 |  ops/s |   +0.97% |
|                                             Median Throughput |            filter-per-10-degrees-interval |      2.18675     |      2.20842     |      0.02166 |  ops/s |   +0.99% |
|                                                Max Throughput |            filter-per-10-degrees-interval |      2.19188     |      2.21463     |      0.02275 |  ops/s |   +1.04% |
|                                       50th percentile latency |            filter-per-10-degrees-interval |    454.153       |    449.334       |     -4.81954 |     ms |   -1.06% |
|                                       90th percentile latency |            filter-per-10-degrees-interval |    456.357       |    451.217       |     -5.13981 |     ms |   -1.13% |
|                                      100th percentile latency |            filter-per-10-degrees-interval |    459.647       |    456.502       |     -3.14542 |     ms |   -0.68% |
|                                  50th percentile service time |            filter-per-10-degrees-interval |    454.153       |    449.334       |     -4.81954 |     ms |   -1.06% |
|                                  90th percentile service time |            filter-per-10-degrees-interval |    456.357       |    451.217       |     -5.13981 |     ms |   -1.13% |
|                                 100th percentile service time |            filter-per-10-degrees-interval |    459.647       |    456.502       |     -3.14542 |     ms |   -0.68% |
|                                                    error rate |            filter-per-10-degrees-interval |      0           |      0           |      0       |      % |    0.00% |

@quux00 quux00 added v8.11.0 and removed v8.10.0 labels Aug 16, 2023
@kkrik-es kkrik-es requested a review from jpountz September 4, 2023 10:30
@kkrik-es
Contributor Author

kkrik-es commented Sep 4, 2023

Adding @jpountz to take a look at the search side; please feel free to reassign if you're too busy.

final TwoPhaseIterator twoPhase = scorer.twoPhaseIterator();
final DocIdSetIterator iterator = (twoPhase != null) ? twoPhase.approximation() : scorer.iterator();

return new DocIdSetIterator() {
Contributor

FYI Lucene already has this via TwoPhaseIterator#asDocIdSetIterator. That said, I don't think we should use it because it defeats the point of using a TwoPhaseIterator? For instance, if the main query matches doc IDs 3 and 10 only, and we call advance(3) on the filter of the filter aggregation: if 3 doesn't match, it will then try out 4 then 5... until it finds one that matches, but we don't care about them since they don't match the query?

Contributor

Ideally, the competitive aggregator would ignore the TwoPhaseIterator and just look at the approximation part, so that TwoPhaseIterator#matches would only be called when both the main query and the filter match.
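
A minimal sketch of that idea, using hypothetical stand-in types rather than Lucene's actual Scorer/TwoPhaseIterator classes: the filter contributes only its over-approximating cursor to the competitive intersection, and the second-phase matches() check runs only for docs where the main query and the approximation already agree.

```java
import java.util.function.IntPredicate;

// Hypothetical stand-ins to illustrate the point: the competitive
// intersection uses only the filter's approximation, and the expensive
// matches() check runs solely on docs both cursors already agree on.
class TwoPhaseSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Sorted doc-ID cursor (stand-in for an approximation iterator).
    static class Cursor {
        final int[] docs; int pos = 0;
        Cursor(int... docs) { this.docs = docs; }
        int docID() { return pos < docs.length ? docs[pos] : NO_MORE_DOCS; }
        int advanceTo(int target) { while (docID() < target) pos++; return docID(); }
    }

    static int countMatches(Cursor query, Cursor filterApprox, IntPredicate filterMatches) {
        int hits = 0;
        int doc = query.docID();
        while (doc != NO_MORE_DOCS) {
            int f = filterApprox.advanceTo(doc);
            if (f == doc) {
                // Both cursors agree: only now pay for the second phase.
                if (filterMatches.test(doc)) hits++;
                doc = query.advanceTo(doc + 1);
            } else {
                doc = query.advanceTo(f);  // jump competitively past non-candidates
            }
        }
        return hits;
    }
}
```

If matches() were hidden behind a plain DocIdSetIterator instead, advancing past a non-matching doc would evaluate the second phase on docs the main query never matches, which is the overhead described above.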

Contributor Author

Thanks Adrien. I updated FiltersAggregator, is this what you had in mind? If so, I'll go ahead and revert changes to Lucene.java.

Contributor

Not exactly. I pushed a small change that shows what I have in mind at https://github.com/elastic/elasticsearch/compare/main...jpountz:elasticsearch:filter_competitive_iterator_poc?expand=1. Beware it's not tested and has at least one bug, but hopefully it helps?

Contributor Author

Nice, this is much cleaner. PTAL.

@martijnvg (Member) left a comment

I think this looks good! The only remaining issue is the changes to the ML datafeed integration tests.

@@ -107,7 +107,6 @@ public void testLookbackOnly() throws Exception {
startDatafeed(datafeedConfig.getId(), 0L, now);
assertBusy(() -> {
DataCounts dataCounts = getDataCounts(job.getId());
assertThat(dataCounts.getProcessedRecordCount(), equalTo(numDocs + numDocs2));
Member

I don't know whether filters aggs get used by data feeds, but even if they are being used, the counts that are returned shouldn't change. How did these assertions fail?

Contributor
@jpountz jpountz left a comment

I left minor suggestions but this looks good to me. Is my understanding correct that this change also optimizes the filter aggregation, in addition to the filters aggregation? I'd be interested in looking at updated benchmark numbers since the logic changed since you first looked into performance numbers.

IntPredicate[] docFilters = new IntPredicate[filters().size()];
for (int filterOrd = 0; filterOrd < filters().size(); filterOrd++) {
docFilters[filterOrd] = filters().get(filterOrd).matchingDocIds(aggCtx.getLeafReaderContext());
final boolean useCompetitiveIterator = (parent == null && otherBucketKey == null);
Contributor

I wonder if we should also not use the competitive iterator if the sum of the costs (scorer.iterator().cost()) of the filters is greater than or equal to maxDoc, which suggests that most documents would match at least one filter, ie. configuring a competitive iterator would add a bit of overhead but would be unlikely to save evaluating any documents.
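A hedged sketch of this heuristic (the method name, parameters, and thresholds here are illustrative stand-ins, not the actual Elasticsearch code):

```java
// Sketch: only enable the competitive iterator when it can plausibly skip
// doc ranges. cost() is an upper bound on the number of matching docs, so
// if the summed filter costs reach maxDoc, most docs likely match a filter
// and the competitive iterator would add overhead without saving work.
class CostHeuristic {
    static boolean useCompetitiveIterator(long[] filterCosts, long maxDoc,
                                          boolean topLevel, boolean hasOtherBucket) {
        if (!topLevel || hasOtherBucket) {
            return false;   // "other" bucket or sub-aggregation: must visit all docs
        }
        long sum = 0;
        for (long c : filterCosts) {
            sum += c;       // per-filter iterator cost estimate
        }
        return sum < maxDoc;
    }

    public static void main(String[] args) {
        // sparse filters: worth skipping ranges
        System.out.println(useCompetitiveIterator(new long[] {100, 200}, 1000, true, false));
        // dense filters: overhead likely outweighs any skipping
        System.out.println(useCompetitiveIterator(new long[] {600, 600}, 1000, true, false));
    }
}
```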

@martijnvg (Member)

Is my understanding correct that this change also optimizes the filter aggregation, in addition to the filters aggregation?

Yes, under the hood the FilterAggregator adapts to the FiltersAggregator.

public DocIdSetIterator competitiveIterator() throws IOException {
if (useCompetitiveIterator) {
// A DocIdSetIterator view of the filterIterators heap
return new DisjunctionDISIApproximation(filterIterators);
Contributor

Looking at the code of DisjunctionDISIApproximation, it will not handle the case where filterIterators is empty. Maybe check for emptiness and return a DocIdSetIterator.empty() in that case? It should be possible to write a test case by using filters on terms that do not exist.
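A minimal sketch of the suggested guard, using tiny stand-ins for Lucene's DocIdSetIterator and DisjunctionDISIApproximation (the real disjunction merges its sub-iterators via a heap; here it is stubbed, and the class names are modeled, not imported):

```java
import java.util.List;

class CompetitiveIteratorGuard {
    interface DocIdSetIterator { int nextDoc(); }                 // minimal stand-in
    static final DocIdSetIterator EMPTY = () -> Integer.MAX_VALUE; // NO_MORE_DOCS

    // stand-in for new DisjunctionDISIApproximation(filterIterators),
    // which requires at least one sub-iterator
    static DocIdSetIterator disjunction(List<DocIdSetIterator> subs) {
        if (subs.isEmpty()) throw new IllegalStateException("needs >= 1 sub-iterator");
        return subs.get(0);  // real class merges all subs via a heap
    }

    static DocIdSetIterator competitiveIterator(List<DocIdSetIterator> filterIterators) {
        if (filterIterators.isEmpty()) {
            return EMPTY;    // e.g. all filters target terms absent from this segment
        }
        return disjunction(filterIterators);
    }

    public static void main(String[] args) {
        // empty filter list: no doc is competitive, iterator is exhausted immediately
        System.out.println(competitiveIterator(List.of()).nextDoc());
    }
}
```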

@kkrik-es kkrik-es (Contributor Author), Sep 5, 2023

Right, yeah that's why I was returning a scorer with an empty DISI instead of null. I updated the code to address this.

Note that, if filterIterators is empty, we won't be using the competitive iterator in the latest version.

kkrik-es (Contributor Author) commented Sep 5, 2023

The numbers I got from my latest run are more of a mixed bag. Granted, I'm using a different (better) laptop (M2 vs M1, 32GB vs 16GB), but I'd hope there would be a clearer benefit across the board. Still, there's a win in tail latency of up to 40%:

|                                                        Metric |                                      Task |         Baseline |        Contender |        Diff |   Unit |   Diff % |
|--------------------------------------------------------------:|------------------------------------------:|-----------------:|-----------------:|------------:|-------:|---------:|
|                    Cumulative indexing time of primary shards |                                           |     15.5604      |     15.3616      |    -0.19883 |    min |   -1.28% |
|             Min cumulative indexing time across primary shard |                                           |     15.5604      |     15.3616      |    -0.19883 |    min |   -1.28% |
|          Median cumulative indexing time across primary shard |                                           |     15.5604      |     15.3616      |    -0.19883 |    min |   -1.28% |
|             Max cumulative indexing time across primary shard |                                           |     15.5604      |     15.3616      |    -0.19883 |    min |   -1.28% |
|           Cumulative indexing throttle time of primary shards |                                           |      0           |      0           |     0       |    min |    0.00% |
|    Min cumulative indexing throttle time across primary shard |                                           |      0           |      0           |     0       |    min |    0.00% |
| Median cumulative indexing throttle time across primary shard |                                           |      0           |      0           |     0       |    min |    0.00% |
|    Max cumulative indexing throttle time across primary shard |                                           |      0           |      0           |     0       |    min |    0.00% |
|                       Cumulative merge time of primary shards |                                           |      5.44072     |      5.5067      |     0.06598 |    min |   +1.21% |
|                      Cumulative merge count of primary shards |                                           |     42           |     41           |    -1       |        |   -2.38% |
|                Min cumulative merge time across primary shard |                                           |      5.44072     |      5.5067      |     0.06598 |    min |   +1.21% |
|             Median cumulative merge time across primary shard |                                           |      5.44072     |      5.5067      |     0.06598 |    min |   +1.21% |
|                Max cumulative merge time across primary shard |                                           |      5.44072     |      5.5067      |     0.06598 |    min |   +1.21% |
|              Cumulative merge throttle time of primary shards |                                           |      0.743767    |      0.569167    |    -0.1746  |    min |  -23.48% |
|       Min cumulative merge throttle time across primary shard |                                           |      0.743767    |      0.569167    |    -0.1746  |    min |  -23.48% |
|    Median cumulative merge throttle time across primary shard |                                           |      0.743767    |      0.569167    |    -0.1746  |    min |  -23.48% |
|       Max cumulative merge throttle time across primary shard |                                           |      0.743767    |      0.569167    |    -0.1746  |    min |  -23.48% |
|                     Cumulative refresh time of primary shards |                                           |      0.1464      |      0.154667    |     0.00827 |    min |   +5.65% |
|                    Cumulative refresh count of primary shards |                                           |     23           |     24           |     1       |        |   +4.35% |
|              Min cumulative refresh time across primary shard |                                           |      0.1464      |      0.154667    |     0.00827 |    min |   +5.65% |
|           Median cumulative refresh time across primary shard |                                           |      0.1464      |      0.154667    |     0.00827 |    min |   +5.65% |
|              Max cumulative refresh time across primary shard |                                           |      0.1464      |      0.154667    |     0.00827 |    min |   +5.65% |
|                       Cumulative flush time of primary shards |                                           |      0.124033    |      0.140567    |     0.01653 |    min |  +13.33% |
|                      Cumulative flush count of primary shards |                                           |      4           |      4           |     0       |        |    0.00% |
|                Min cumulative flush time across primary shard |                                           |      0.124033    |      0.140567    |     0.01653 |    min |  +13.33% |
|             Median cumulative flush time across primary shard |                                           |      0.124033    |      0.140567    |     0.01653 |    min |  +13.33% |
|                Max cumulative flush time across primary shard |                                           |      0.124033    |      0.140567    |     0.01653 |    min |  +13.33% |
|                                       Total Young Gen GC time |                                           |      6.696       |      6.83        |     0.134   |      s |   +2.00% |
|                                      Total Young Gen GC count |                                           |   2292           |   2317           |    25       |        |   +1.09% |
|                                         Total Old Gen GC time |                                           |      0           |      0           |     0       |      s |    0.00% |
|                                        Total Old Gen GC count |                                           |      0           |      0           |     0       |        |    0.00% |
|                                                    Store size |                                           |      8.30888     |      7.98571     |    -0.32317 |     GB |   -3.89% |
|                                                 Translog size |                                           |      5.12227e-08 |      5.12227e-08 |     0       |     GB |    0.00% |
|                                        Heap used for segments |                                           |      0           |      0           |     0       |     MB |    0.00% |
|                                      Heap used for doc values |                                           |      0           |      0           |     0       |     MB |    0.00% |
|                                           Heap used for terms |                                           |      0           |      0           |     0       |     MB |    0.00% |
|                                           Heap used for norms |                                           |      0           |      0           |     0       |     MB |    0.00% |
|                                          Heap used for points |                                           |      0           |      0           |     0       |     MB |    0.00% |
|                                   Heap used for stored fields |                                           |      0           |      0           |     0       |     MB |    0.00% |
|                                                 Segment count |                                           |     28           |     30           |     2       |        |   +7.14% |
|                                   Total Ingest Pipeline count |                                           |      0           |      0           |     0       |        |    0.00% |
|                                    Total Ingest Pipeline time |                                           |      0           |      0           |     0       |     ms |    0.00% |
|                                  Total Ingest Pipeline failed |                                           |      0           |      0           |     0       |        |    0.00% |
|                                                Min Throughput |                                     index | 147040           | 144539           | -2501.53    | docs/s |   -1.70% |
|                                               Mean Throughput |                                     index | 215127           | 218379           |  3252.05    | docs/s |   +1.51% |
|                                             Median Throughput |                                     index | 220766           | 224646           |  3880.11    | docs/s |   +1.76% |
|                                                Max Throughput |                                     index | 222988           | 229950           |  6962.6     | docs/s |   +3.12% |
|                                       50th percentile latency |                                     index |    134.935       |    132.967       |    -1.96869 |     ms |   -1.46% |
|                                       90th percentile latency |                                     index |    173.228       |    169.454       |    -3.7734  |     ms |   -2.18% |
|                                       99th percentile latency |                                     index |    849.205       |    860.867       |    11.6618  |     ms |   +1.37% |
|                                     99.9th percentile latency |                                     index |    970.257       |   1030.91        |    60.6566  |     ms |   +6.25% |
|                                      100th percentile latency |                                     index |   1030.54        |   1620.65        |   590.11    |     ms |  +57.26% |
|                                  50th percentile service time |                                     index |    134.935       |    132.967       |    -1.96869 |     ms |   -1.46% |
|                                  90th percentile service time |                                     index |    173.228       |    169.454       |    -3.7734  |     ms |   -2.18% |
|                                  99th percentile service time |                                     index |    849.205       |    860.867       |    11.6618  |     ms |   +1.37% |
|                                99.9th percentile service time |                                     index |    970.257       |   1030.91        |    60.6566  |     ms |   +6.25% |
|                                 100th percentile service time |                                     index |   1030.54        |   1620.65        |   590.11    |     ms |  +57.26% |
|                                                    error rate |                                     index |      0           |      0           |     0       |      % |    0.00% |
|                                                Min Throughput |  filters-agg-with-low-cardinality-filters |    197.418       |    197.752       |     0.33477 |  ops/s |   +0.17% |
|                                               Mean Throughput |  filters-agg-with-low-cardinality-filters |    197.418       |    197.752       |     0.33477 |  ops/s |   +0.17% |
|                                             Median Throughput |  filters-agg-with-low-cardinality-filters |    197.418       |    197.752       |     0.33477 |  ops/s |   +0.17% |
|                                                Max Throughput |  filters-agg-with-low-cardinality-filters |    197.418       |    197.752       |     0.33477 |  ops/s |   +0.17% |
|                                       50th percentile latency |  filters-agg-with-low-cardinality-filters |      1.62896     |      1.58442     |    -0.04454 |     ms |   -2.73% |
|                                       90th percentile latency |  filters-agg-with-low-cardinality-filters |      1.88333     |      2.16011     |     0.27678 |     ms |  +14.70% |
|                                      100th percentile latency |  filters-agg-with-low-cardinality-filters |      4.00417     |      7.05975     |     3.05558 |     ms |  +76.31% |
|                                  50th percentile service time |  filters-agg-with-low-cardinality-filters |      1.62896     |      1.58442     |    -0.04454 |     ms |   -2.73% |
|                                  90th percentile service time |  filters-agg-with-low-cardinality-filters |      1.88333     |      2.16011     |     0.27678 |     ms |  +14.70% |
|                                 100th percentile service time |  filters-agg-with-low-cardinality-filters |      4.00417     |      7.05975     |     3.05558 |     ms |  +76.31% |
|                                                    error rate |  filters-agg-with-low-cardinality-filters |      0           |      0           |     0       |      % |    0.00% |
|                                                Min Throughput | filters-agg-with-high-cardinality-filters |    497.959       |    507.121       |     9.16147 |  ops/s |   +1.84% |
|                                               Mean Throughput | filters-agg-with-high-cardinality-filters |    497.959       |    507.121       |     9.16147 |  ops/s |   +1.84% |
|                                             Median Throughput | filters-agg-with-high-cardinality-filters |    497.959       |    507.121       |     9.16147 |  ops/s |   +1.84% |
|                                                Max Throughput | filters-agg-with-high-cardinality-filters |    497.959       |    507.121       |     9.16147 |  ops/s |   +1.84% |
|                                       50th percentile latency | filters-agg-with-high-cardinality-filters |      1.26246     |      1.27048     |     0.00802 |     ms |   +0.64% |
|                                       90th percentile latency | filters-agg-with-high-cardinality-filters |      1.55798     |      1.68032     |     0.12235 |     ms |   +7.85% |
|                                      100th percentile latency | filters-agg-with-high-cardinality-filters |      1.77612     |      1.93946     |     0.16333 |     ms |   +9.20% |
|                                  50th percentile service time | filters-agg-with-high-cardinality-filters |      1.26246     |      1.27048     |     0.00802 |     ms |   +0.64% |
|                                  90th percentile service time | filters-agg-with-high-cardinality-filters |      1.55798     |      1.68032     |     0.12235 |     ms |   +7.85% |
|                                 100th percentile service time | filters-agg-with-high-cardinality-filters |      1.77612     |      1.93946     |     0.16333 |     ms |   +9.20% |
|                                                    error rate | filters-agg-with-high-cardinality-filters |      0           |      0           |     0       |      % |    0.00% |
|                                                Min Throughput |           filters-per-10-degrees-interval |      2.52592     |      2.55281     |     0.0269  |  ops/s |   +1.06% |
|                                               Mean Throughput |           filters-per-10-degrees-interval |      2.68119     |      2.71924     |     0.03805 |  ops/s |   +1.42% |
|                                             Median Throughput |           filters-per-10-degrees-interval |      2.70415     |      2.73906     |     0.03491 |  ops/s |   +1.29% |
|                                                Max Throughput |           filters-per-10-degrees-interval |      2.74167     |      2.79418     |     0.05251 |  ops/s |   +1.92% |
|                                       50th percentile latency |           filters-per-10-degrees-interval |    357.206       |    349.096       |    -8.11021 |     ms |   -2.27% |
|                                       90th percentile latency |           filters-per-10-degrees-interval |    358.384       |    350.317       |    -8.06631 |     ms |   -2.25% |
|                                      100th percentile latency |           filters-per-10-degrees-interval |    360.014       |    353.38        |    -6.63342 |     ms |   -1.84% |
|                                  50th percentile service time |           filters-per-10-degrees-interval |    357.206       |    349.096       |    -8.11021 |     ms |   -2.27% |
|                                  90th percentile service time |           filters-per-10-degrees-interval |    358.384       |    350.317       |    -8.06631 |     ms |   -2.25% |
|                                 100th percentile service time |           filters-per-10-degrees-interval |    360.014       |    353.38        |    -6.63342 |     ms |   -1.84% |
|                                                    error rate |           filters-per-10-degrees-interval |      0           |      0           |     0       |      % |    0.00% |
|                                                Min Throughput |  filters-agg-with-high-cardinality-filter |    743.456       |    729.694       |   -13.7619  |  ops/s |   -1.85% |
|                                               Mean Throughput |  filters-agg-with-high-cardinality-filter |    743.456       |    729.694       |   -13.7619  |  ops/s |   -1.85% |
|                                             Median Throughput |  filters-agg-with-high-cardinality-filter |    743.456       |    729.694       |   -13.7619  |  ops/s |   -1.85% |
|                                                Max Throughput |  filters-agg-with-high-cardinality-filter |    743.456       |    729.694       |   -13.7619  |  ops/s |   -1.85% |
|                                       50th percentile latency |  filters-agg-with-high-cardinality-filter |      0.854437    |      0.827646    |    -0.02679 |     ms |   -3.14% |
|                                       90th percentile latency |  filters-agg-with-high-cardinality-filter |      1.02233     |      1.07289     |     0.05056 |     ms |   +4.95% |
|                                      100th percentile latency |  filters-agg-with-high-cardinality-filter |      1.64175     |      1.21754     |    -0.42421 |     ms |  -25.84% |
|                                  50th percentile service time |  filters-agg-with-high-cardinality-filter |      0.854437    |      0.827646    |    -0.02679 |     ms |   -3.14% |
|                                  90th percentile service time |  filters-agg-with-high-cardinality-filter |      1.02233     |      1.07289     |     0.05056 |     ms |   +4.95% |
|                                 100th percentile service time |  filters-agg-with-high-cardinality-filter |      1.64175     |      1.21754     |    -0.42421 |     ms |  -25.84% |
|                                                    error rate |  filters-agg-with-high-cardinality-filter |      0           |      0           |     0       |      % |    0.00% |
|                                                Min Throughput |   filter-agg-with-high-cardinality-filter |    710.018       |    760.931       |    50.9128  |  ops/s |   +7.17% |
|                                               Mean Throughput |   filter-agg-with-high-cardinality-filter |    710.018       |    760.931       |    50.9128  |  ops/s |   +7.17% |
|                                             Median Throughput |   filter-agg-with-high-cardinality-filter |    710.018       |    760.931       |    50.9128  |  ops/s |   +7.17% |
|                                                Max Throughput |   filter-agg-with-high-cardinality-filter |    710.018       |    760.931       |    50.9128  |  ops/s |   +7.17% |
|                                       50th percentile latency |   filter-agg-with-high-cardinality-filter |      0.80975     |      0.749291    |    -0.06046 |     ms |   -7.47% |
|                                       90th percentile latency |   filter-agg-with-high-cardinality-filter |      1.19683     |      1.04217     |    -0.15466 |     ms |  -12.92% |
|                                      100th percentile latency |   filter-agg-with-high-cardinality-filter |      2.49263     |      1.48458     |    -1.00804 |     ms |  -40.44% |
|                                  50th percentile service time |   filter-agg-with-high-cardinality-filter |      0.80975     |      0.749291    |    -0.06046 |     ms |   -7.47% |
|                                  90th percentile service time |   filter-agg-with-high-cardinality-filter |      1.19683     |      1.04217     |    -0.15466 |     ms |  -12.92% |
|                                 100th percentile service time |   filter-agg-with-high-cardinality-filter |      2.49263     |      1.48458     |    -1.00804 |     ms |  -40.44% |
|                                                    error rate |   filter-agg-with-high-cardinality-filter |      0           |      0           |     0       |      % |    0.00% |
|                                                Min Throughput |            filter-per-10-degrees-interval |      2.79406     |      2.85966     |     0.06561 |  ops/s |   +2.35% |
|                                               Mean Throughput |            filter-per-10-degrees-interval |      2.81185     |      2.88326     |     0.07141 |  ops/s |   +2.54% |
|                                             Median Throughput |            filter-per-10-degrees-interval |      2.81407     |      2.88606     |     0.07199 |  ops/s |   +2.56% |
|                                                Max Throughput |            filter-per-10-degrees-interval |      2.82022     |      2.89194     |     0.07172 |  ops/s |   +2.54% |
|                                       50th percentile latency |            filter-per-10-degrees-interval |    353.514       |    344.275       |    -9.23923 |     ms |   -2.61% |
|                                       90th percentile latency |            filter-per-10-degrees-interval |    354.635       |    345.801       |    -8.834   |     ms |   -2.49% |
|                                      100th percentile latency |            filter-per-10-degrees-interval |    355.348       |    350.131       |    -5.21658 |     ms |   -1.47% |
|                                  50th percentile service time |            filter-per-10-degrees-interval |    353.514       |    344.275       |    -9.23923 |     ms |   -2.61% |
|                                  90th percentile service time |            filter-per-10-degrees-interval |    354.635       |    345.801       |    -8.834   |     ms |   -2.49% |
|                                 100th percentile service time |            filter-per-10-degrees-interval |    355.348       |    350.131       |    -5.21658 |     ms |   -1.47% |
|                                                    error rate |            filter-per-10-degrees-interval |      0           |      0           |     0       |      % |    0.00% |

@kkrik-es kkrik-es merged commit 8af9b4a into elastic:main Sep 5, 2023
jpountz (Contributor) commented Sep 5, 2023

A potential idea for speeding this up further would be to specialize for the case when there is a single filter, which I assume is quite common. In this case we can save the overhead of the heap in collect() and in the competitiveIterator().
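A rough sketch of that single-filter specialization, with simplified stand-ins for Lucene's iterator types (the stateless `advance` here ignores the real forward-only contract, and all names are illustrative):

```java
import java.util.Arrays;
import java.util.List;

class SingleFilterSpecialization {
    interface DISI { int advance(int target); }      // minimal iterator stand-in

    // iterator over a sorted array of matching doc IDs
    static DISI fromSorted(int[] docs) {
        return target -> {
            int i = Arrays.binarySearch(docs, target);
            if (i < 0) i = -i - 1;                   // insertion point
            return i < docs.length ? docs[i] : Integer.MAX_VALUE;
        };
    }

    static DISI competitive(List<DISI> filters) {
        if (filters.size() == 1) {
            return filters.get(0);                   // single filter: skip the heap entirely
        }
        // naive disjunction stand-in for the general heap-based case
        return target -> filters.stream()
                .mapToInt(f -> f.advance(target))
                .min().orElse(Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        DISI single = competitive(List.of(fromSorted(new int[] {5, 42, 99})));
        System.out.println(single.advance(10));      // next competitive doc at or after 10
    }
}
```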

@martijnvg (Member)

A potential idea for speeding this up further would be to specialize for the case when there is a single filter, which I assume is quite common. In this case we can save the overhead of the heap in collect() and in the competitiveIterator().

👍 Let's create a followup issue for this.

jpountz (Contributor) commented Sep 5, 2023

I opened #99202.

Labels
:Analytics/Aggregations Aggregations >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.11.0
Successfully merging this pull request may close these issues.

Implement competitiveIterator() on FilterAggregator
5 participants