
Optimize filters aggregation with a single filter #99202


Closed
jpountz opened this issue Sep 5, 2023 · 1 comment · Fixed by #99215
Labels
:Analytics/Aggregations Aggregations >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@jpountz
Contributor

jpountz commented Sep 5, 2023

Description

Follow-up to #98360: when a filters aggregation has a single filter, the collector could skip the priority queue entirely (in both collect() and competitiveIterator()) and save its overhead, which would likely result in a speedup.
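
To illustrate the idea, here is a minimal, self-contained sketch. It is not the code from #98360 or from the eventual fix: `SingleFilterSketch`, `DocIterator`, `ArrayDocIterator` and `UnionIterator` are hypothetical names, `DocIterator` is a simplified stand-in for Lucene's doc ID iterators, and `java.util.PriorityQueue` stands in for the priority queue used by the aggregator. With several filters the competitive iterator has to merge them through a heap; with exactly one filter the heap would only ever hold that one iterator, so it can be returned directly.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class SingleFilterSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical, simplified stand-in for a Lucene doc ID iterator. */
    interface DocIterator {
        int docID();
        int advance(int target); // first doc >= target, or NO_MORE_DOCS
    }

    /** Iterates over a sorted array of doc IDs matched by one filter. */
    static final class ArrayDocIterator implements DocIterator {
        private final int[] docs;
        private int idx = -1;
        ArrayDocIterator(int... docs) { this.docs = docs; }
        public int docID() {
            if (idx < 0) return -1;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }
        public int advance(int target) {
            if (idx < 0) idx = 0;
            while (idx < docs.length && docs[idx] < target) idx++;
            return docID();
        }
    }

    /** General path: union of several filters, ordered through a heap. */
    static final class UnionIterator implements DocIterator {
        private final PriorityQueue<DocIterator> heap =
            new PriorityQueue<>(Comparator.comparingInt(DocIterator::docID));
        private int doc = -1;
        UnionIterator(List<DocIterator> filters) { heap.addAll(filters); }
        public int docID() { return doc; }
        public int advance(int target) {
            // Advance every sub-iterator that is behind target; each move costs
            // a poll plus a re-insert, which is the overhead a single filter avoids.
            while (!heap.isEmpty() && heap.peek().docID() < target) {
                DocIterator top = heap.poll();
                if (top.advance(target) != NO_MORE_DOCS) heap.add(top);
            }
            doc = heap.isEmpty() ? NO_MORE_DOCS : heap.peek().docID();
            return doc;
        }
    }

    /** Single filter: bypass the heap and use the filter's iterator as-is. */
    static DocIterator competitiveIterator(List<DocIterator> filters) {
        return filters.size() == 1 ? filters.get(0) : new UnionIterator(filters);
    }

    /** Collects all doc IDs produced by an iterator. */
    static List<Integer> drain(DocIterator it) {
        List<Integer> out = new ArrayList<>();
        for (int d = it.advance(0); d != NO_MORE_DOCS; d = it.advance(d + 1)) out.add(d);
        return out;
    }

    public static void main(String[] args) {
        // One filter: no heap involved. Prints [3, 7, 42]
        System.out.println(drain(competitiveIterator(
            List.of(new ArrayDocIterator(3, 7, 42)))));
        // Two filters: merged through the heap. Prints [3, 7, 42, 50]
        System.out.println(drain(competitiveIterator(
            List.of(new ArrayDocIterator(3, 42), new ArrayDocIterator(7, 42, 50)))));
    }
}
```

Both paths yield the same doc IDs for a single filter; the shortcut only removes the per-advance heap bookkeeping.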

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Sep 5, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

kkrik-es added a commit to kkrik-es/elasticsearch that referenced this issue Sep 6, 2023
When FiltersAggregator has a single filter, there is no benefit in using
a DisiPriorityQueue as the heap will only contain values from a single
iterator. In such a case, it's preferable to use the filtering
approximation iterator directly as competitive iterator.

Fixes elastic#99202
kkrik-es added a commit that referenced this issue Sep 6, 2023
* Use a competitive iterator in FiltersAggregator.

The iterator is used to combine filtering with querying in leaf
collection. Its benefit is that ranges of docs that are filtered out
by all filters are skipped during doc collection.

The competitive iterator is restricted to FiltersAggregator; it is not
used in FilterByFilterAggregator, which is already optimized. It only
applies to top-level filter aggregations with no "other" bucket defined;
the latter leads to collecting all docs so there's no point in skipping
doc ranges.

Fixes #97544

* Fix function name.

* Advance iterator on two-phase mismatch.

* Restore docid tracking.

* Fix failing tests.

* Fix failing test.

* Fix more tests.

* Update docs/changelog/98360.yaml

* More test fixes.

* Update docs/changelog/98360.yaml

* Skip checking useCompetitiveIterator in collect

* Find approximate matches in CompetitiveIterator

* Use DisiPriorityQueue to simplify FiltersAggregator

* Skip competitive iterator when all docs match.

* Check for empty priority queue.

* Skip DisiPriorityQueue on single filter agg.

When FiltersAggregator has a single filter, there is no benefit in using
a DisiPriorityQueue as the heap will only contain values from a single
iterator. In such a case, it's preferable to use the filtering
approximation iterator directly as competitive iterator.

Fixes #99202

* Update docs/changelog/99215.yaml

* Use FilterMatchingDisiWrapper in leaf collectors.