ESQL: Compute engine support for tagged queries (#128521) #128638

nik9000 · 2025-05-29T19:49:35Z

Begins adding support for running "tagged queries" to the compute engine. Here, it's just the LuceneSourceOperator because that's useful and contained.

Example time! Say you are running:

FROM foo
| STATS MAX(v) BY ROUND_TO(g, 0, 100, 1000, 100000)

It's often faster to run this as four queries:

The docs that round to 0
The docs that round to 100
The docs that round to 1000
The docs that round to 100000

This creates an ESQL operator that can run these queries one after the other and attach each query's tag to the docs it matches.
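To make the rewrite concrete, here is a minimal sketch in plain Python, not the Java in this PR. It assumes ROUND_TO rounds down to the nearest listed point and sends values below the smallest point to the smallest point. `TaggedQuery`, `round_to_queries`, and `max_by_tag` are hypothetical names for illustration, docs are `(g, v)` tuples, and null values are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaggedQuery:
    """Hypothetical stand-in for one range query plus the tag it emits."""
    lower: Optional[int]  # inclusive lower bound, None means unbounded
    upper: Optional[int]  # exclusive upper bound, None means unbounded
    tag: int              # the value ROUND_TO would have produced

    def matches(self, g: int) -> bool:
        above = self.lower is None or g >= self.lower
        below = self.upper is None or g < self.upper
        return above and below


def round_to_queries(points):
    """Build one tagged range query per rounding point."""
    points = sorted(points)
    queries = []
    for i, point in enumerate(points):
        # Assumption: values below the smallest point still round to it,
        # so the first range has no lower bound.
        lower = None if i == 0 else point
        upper = points[i + 1] if i + 1 < len(points) else None
        queries.append(TaggedQuery(lower, upper, point))
    return queries


def max_by_tag(docs, queries):
    """Run the queries one after the other and compute MAX(v) per tag."""
    result = {}
    for query in queries:
        for g, v in docs:
            if query.matches(g):
                # The tag replaces evaluating ROUND_TO(g, ...) per document.
                result[query.tag] = max(v, result.get(query.tag, v))
    return result


docs = [(5, 10), (99, 7), (100, 3), (250, 8), (1000, 1), (500000, 42)]
print(max_by_tag(docs, round_to_queries([0, 100, 1000, 100000])))
# prints {0: 10, 100: 8, 1000: 1, 100000: 42}
```

The point of the sketch is that the grouping key never has to be computed per document: each doc inherits the tag of the query that found it.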

Aggs uses this trick. It's way faster when it can push down count queries, and it's still faster when it has to push down doc loading. This implementation in LuceneSourceOperator is quite similar to the doc loading version in _search.

I don't have performance measurements yet because I haven't plugged this into the language. In _search we call this filter-by-filter and enable it when each group averages more than 5000 documents and there isn't a _doc_count field. Outside those cases it's faster not to push. I expect we'll be pretty similar.
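The enablement rule described above can be sketched as a small predicate. This is plain Python with hypothetical names, not the actual _search or ESQL code. Only the 5000 threshold and the _doc_count condition come from the description, and how the per-group average would be estimated is left out.

```python
def should_run_filter_by_filter(total_docs, group_count, has_doc_count_field,
                                min_avg_docs_per_group=5000):
    """Decide whether to run one tagged query per group.

    Hypothetical helper: the 5000 default mirrors the threshold quoted
    for _search; the ESQL threshold has not been measured yet.
    """
    if has_doc_count_field or group_count == 0:
        # A _doc_count field (or nothing to group) rules the rewrite out.
        return False
    # Only push when each group averages more than the threshold.
    return total_docs / group_count > min_avg_docs_per_group


print(should_run_filter_by_filter(40_000, 4, has_doc_count_field=False))
print(should_run_filter_by_filter(10_000, 4, has_doc_count_field=False))
```

The first call averages 10000 docs per group and would push; the second averages 2500 and would not.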


nik9000 added backport :Analytics/ES|QL AKA ESQL v8.19.0 labels May 29, 2025

nik9000 mentioned this pull request May 29, 2025

ESQL: Compute engine support for tagged queries #128521

Merged

nik9000 added 3 commits May 29, 2025 16:06

Merge branch '8.19' into load_many_8_19

9282f6e

Merge branch '8.19' into load_many_8_19

b9bee77

Merge branch '8.19' into load_many_8_19

c292379

nik9000 merged commit 0b29de3 into elastic:8.19 Jun 11, 2025
15 checks passed
