-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Implement competitiveIterator() on FilterAggregator #97544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Labels
:Analytics/Aggregations
Aggregations
>enhancement
Team:Analytics
Meta label for analytical engine team (ESQL/Aggs/Geo)
Comments
Pinging @elastic/es-analytics-geo (Team:Analytics) |
kkrik-es
added a commit
to kkrik-es/elasticsearch
that referenced
this issue
Aug 10, 2023
The iterator is used to combine filtering with querying in leaf collection. Its benefit is that rangers with docs that are filtered out by all filters are skipped from doc collection. The competitive iterator is restricted to FiltersAggregator, not used in FilterByFilterAggregator that's already optimized. It only applies to top-level filter aggregations with no "other" bucket defined; the latter leads to collecting all docs so there's no point in skipping doc ranges. Fixes elastic#97544
kkrik-es
added a commit
that referenced
this issue
Sep 5, 2023
* Use a competitive iterator in FiltersAggregator. The iterator is used to combine filtering with querying in leaf collection. Its benefit is that rangers with docs that are filtered out by all filters are skipped from doc collection. The competitive iterator is restricted to FiltersAggregator, not used in FilterByFilterAggregator that's already optimized. It only applies to top-level filter aggregations with no "other" bucket defined; the latter leads to collecting all docs so there's no point in skipping doc ranges. Fixes #97544 * Fix function name. * Advance iterator on two-phase mismatch. * Restore docid tracking. * Fix failing tests. * Fix failing test. * Fix more tests. * Update docs/changelog/98360.yaml * More test fixes. * Update docs/changelog/98360.yaml * Skip checking useCompetitiveIterator in collect * Find approximate matches in CompetitiveIterator * Use DisiPriorityQueue to simplify FiltersAggregator * Skip competitive iterator when all docs match. * Check for empty priority queue.
kkrik-es
added a commit
that referenced
this issue
Sep 6, 2023
* Use a competitive iterator in FiltersAggregator. The iterator is used to combine filtering with querying in leaf collection. Its benefit is that rangers with docs that are filtered out by all filters are skipped from doc collection. The competitive iterator is restricted to FiltersAggregator, not used in FilterByFilterAggregator that's already optimized. It only applies to top-level filter aggregations with no "other" bucket defined; the latter leads to collecting all docs so there's no point in skipping doc ranges. Fixes #97544 * Fix function name. * Advance iterator on two-phase mismatch. * Restore docid tracking. * Fix failing tests. * Fix failing test. * Fix more tests. * Update docs/changelog/98360.yaml * More test fixes. * Update docs/changelog/98360.yaml * Skip checking useCompetitiveIterator in collect * Find approximate matches in CompetitiveIterator * Use DisiPriorityQueue to simplify FiltersAggregator * Skip competitive iterator when all docs match. * Check for empty priority queue. * Skip DisiPriorityQueue on single filter agg. When FiltersAggregator has a single filter, there is no benefit in using a DisiPriorityQueue as the heap will only contain values from a single iterator. In such a case, it's preferable to use the filtering approximation iterator directly as competitive iterator. Fixes #99202 * Update docs/changelog/99215.yaml * Use FilterMatchingDisiWrapper in leaf collectors.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
:Analytics/Aggregations
Aggregations
>enhancement
Team:Analytics
Meta label for analytical engine team (ESQL/Aggs/Geo)
Description
We have had this longstanding issue that running a top-level
filter
aggregation is often (not always, e.g. not if you need the total hit count too) a performance bug, as Elasticsearch would iterate over docs that match thequery
and would then check each of these docs against the filter of thefilter
aggregation. It's usually a better idea to put the filter directly into the query, where it can be used to drive iteration if appropriate.We could do this automatically by implementing the
LeafCollector#competitiveIterator
API that was introduced for dynamic pruning, in order to automatically start intersecting the query with the filter of thefilter
aggregator once enough hits have been counted.The text was updated successfully, but these errors were encountered: