ESQL: Compute engine support for tagged queries (#128521) #128638

Merged
merged 4 commits into elastic:8.19
Jun 11, 2025

nik9000
@nik9000 nik9000 commented May 29, 2025

Begins adding support for running "tagged queries" to the compute engine. Here, it's just the LuceneSourceOperator because that's useful and contained.

Example time! Say you are running:

FROM foo
| STATS MAX(v) BY ROUND_TO(g, 0, 100, 1000, 100000)

It's often faster to run this as four queries:

  • The docs that round to 0
  • The docs that round to 100
  • The docs that round to 1000
  • The docs that round to 100000

This creates an ESQL operator that can run these queries one after the other and attach the matching tag to the docs each query finds.
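To make the idea concrete, here is a minimal sketch in plain Java. It is not the code in this PR and does not touch Lucene: `TaggedQuery`, `Doc`, `roundToQueries`, and `maxByTag` are hypothetical names invented for illustration, and the in-memory list stands in for an index. It also assumes the fixed points are sorted ascending and that values below the first point round to the first point.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongPredicate;

public class TaggedQuerySketch {
    /** Hypothetical: a query paired with the tag attached to every doc it matches. */
    record TaggedQuery(LongPredicate query, long tag) {}

    /** Hypothetical toy document holding only the two fields from the example. */
    record Doc(long g, long v) {}

    /**
     * Builds one range query per fixed point. Assumes points are sorted
     * ascending and that values below the first point round to it.
     */
    static List<TaggedQuery> roundToQueries(long... points) {
        List<TaggedQuery> queries = new ArrayList<>();
        for (int i = 0; i < points.length; i++) {
            final boolean last = i == points.length - 1;
            final long lower = i == 0 ? Long.MIN_VALUE : points[i];
            final long upper = last ? Long.MAX_VALUE : points[i + 1];
            queries.add(new TaggedQuery(g -> g >= lower && (last || g < upper), points[i]));
        }
        return queries;
    }

    /** Runs the queries one after the other; each match is grouped by the query's tag. */
    static Map<Long, Long> maxByTag(List<Doc> docs, List<TaggedQuery> queries) {
        Map<Long, Long> max = new LinkedHashMap<>();
        for (TaggedQuery q : queries) {
            for (Doc d : docs) {
                if (q.query().test(d.g())) {
                    // The tag replaces evaluating ROUND_TO per document.
                    max.merge(q.tag(), d.v(), Long::max);
                }
            }
        }
        return max;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
            new Doc(5, 10), new Doc(150, 7), new Doc(99, 42),
            new Doc(2000, 3), new Doc(500000, 8));
        // prints {0=42, 100=7, 1000=3, 100000=8}
        System.out.println(maxByTag(docs, roundToQueries(0, 100, 1000, 100000)));
    }
}
```

In the sketch the grouping key never has to be computed per document; it comes for free from whichever query matched.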

Aggs uses this trick. It's way faster when it can push down count queries, and it's still faster when the pushed queries have to load docs. This implementation in LuceneSourceOperator is quite similar to the doc loading version in _search.

I don't have performance measurements yet because I haven't plugged this into the language. In _search we call this filter-by-filter and enable it when each group averages more than 5000 documents and there isn't a _doc_count field. Outside those cases it's faster not to push. I expect we'll be pretty similar.
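As a rough illustration of that rule of thumb, the check could look like the sketch below. The class, method, and constant names are hypothetical, and this is not the actual _search code; it only encodes the two conditions described above.

```java
public class FilterByFilterHeuristic {
    /** Threshold quoted above for _search; whether ESQL ends up with the same number is not yet known. */
    static final long MIN_AVG_DOCS_PER_GROUP = 5000;

    /**
     * Hypothetical check: run tagged queries only when each group averages
     * more than 5000 docs and the index has no _doc_count field.
     */
    static boolean shouldRunTagged(long totalDocs, int groupCount, boolean hasDocCountField) {
        if (groupCount <= 0 || hasDocCountField) {
            return false;
        }
        // Multiply instead of dividing so integer truncation can't hide a fractional average.
        return totalDocs > MIN_AVG_DOCS_PER_GROUP * groupCount;
    }

    public static void main(String[] args) {
        System.out.println(shouldRunTagged(40_000, 4, false)); // true: 10000 docs per group
        System.out.println(shouldRunTagged(20_000, 4, false)); // false: exactly 5000 is not "more than"
        System.out.println(shouldRunTagged(40_000, 4, true));  // false: _doc_count present
    }
}
```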

@nik9000 nik9000 merged commit 0b29de3 into elastic:8.19 Jun 11, 2025
15 checks passed