
ESQL: Compute engine support for tagged queries #128521


Merged: 6 commits into elastic:main on May 29, 2025

Conversation

nik9000 (Member) commented May 27, 2025

Begins adding support for running "tagged queries" to the compute engine. Here, it's just the `LuceneSourceOperator` because that's useful and contained.

Example time! Say you are running:

```
FROM foo
| STATS MAX(v) BY ROUND_TO(g, 0, 100, 1000, 100000)
```

It's *often* faster to run this as four queries:

* The docs that round to `0`
* The docs that round to `100`
* The docs that round to `1000`
* The docs that round to `100000`

This creates an ESQL operator that can run these queries, one after the other, and attach those tags to the rows each query returns.

Aggs use this trick, and it's *way* faster when they can push down count queries, but it's still faster even when they only push the doc-loading parts. This implementation in `LuceneSourceOperator` is quite similar to the doc-loading version in `_search`.

I don't have performance measurements yet because I haven't plugged this into the language. In `_search` we call this `filter-by-filter` and enable it when each group averages more than 5000 documents and when there isn't a `_doc_count` field; outside those cases it's faster not to push. I expect we'll land somewhere pretty similar.
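To make the mechanics concrete, here is a minimal, self-contained sketch of the idea in plain Java. It is *not* the operator added in this PR and it doesn't touch Lucene; `TaggedQuery`, `Row`, and the bucket predicates are hypothetical stand-ins, chosen only to show queries running one after the other with their tags attached to every row they emit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

public class TaggedQueriesSketch {
    // Hypothetical stand-in for a query plus the tag values it contributes.
    record TaggedQuery(LongPredicate matches, List<Object> tags) {}

    // Hypothetical stand-in for a row flowing to the downstream STATS.
    record Row(long v, List<Object> tags) {}

    public static void main(String[] args) {
        long[] g = { 3, 120, 950, 4_200, 250_000 }; // grouping field
        long[] v = { 10, 20, 30, 40, 50 };          // value that MAX(v) aggregates

        // One query per ROUND_TO(g, 0, 100, 1000, 100000) bucket, each carrying its tag.
        List<TaggedQuery> queries = List.of(
            new TaggedQuery(x -> x < 100, List.<Object>of(0L)),
            new TaggedQuery(x -> x >= 100 && x < 1_000, List.<Object>of(100L)),
            new TaggedQuery(x -> x >= 1_000 && x < 100_000, List.<Object>of(1_000L)),
            new TaggedQuery(x -> x >= 100_000, List.<Object>of(100_000L))
        );

        // Run the queries one after the other, attaching each query's tag to its rows,
        // so STATS can group by the tag instead of evaluating ROUND_TO per document.
        List<Row> rows = new ArrayList<>();
        for (TaggedQuery query : queries) {
            for (int i = 0; i < g.length; i++) {
                if (query.matches().test(g[i])) {
                    rows.add(new Row(v[i], query.tags()));
                }
            }
        }
        rows.forEach(System.out::println);
    }
}
```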

nik9000 added the >non-issue, auto-backport, :Analytics/ES|QL, v8.19.0, and v9.1.0 labels on May 27, 2025
elasticsearchmachine (Collaborator)

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine added the Team:Analytics label on May 27, 2025
nik9000 (Member Author) commented May 27, 2025

This should also work well for things like:

```
FROM foo
| STATS MAX(v) BY a > 10
```

With an extension to this PR that enables this behavior for MAX and COUNT and friends, we could push really simple queries like the one above all the way to Lucene. The trick is to figure out exactly what that should look like from an execution standpoint. This PR was "easier" to model.
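As a hedged illustration of what that could start from (assuming `a` is indexed as a Lucene long point and ignoring the null group for docs missing `a`; the class and method names here are illustrative, not this PR's API), the boolean grouping splits into two tagged Lucene queries, one per group:

```java
import java.util.List;
import java.util.Map;

import org.apache.lucene.document.LongPoint;
import org.apache.lucene.search.Query;

public class TaggedBooleanGroups {
    // Hypothetical helper: one Lucene query per group of `STATS ... BY a > 10`,
    // keyed by the tag row that would be attached to its results.
    public static Map<List<Object>, Query> taggedQueries() {
        Query gt10 = LongPoint.newRangeQuery("a", 11L, Long.MAX_VALUE); // a > 10
        Query le10 = LongPoint.newRangeQuery("a", Long.MIN_VALUE, 10L); // a <= 10
        return Map.of(
            List.<Object>of(Boolean.TRUE), gt10,
            List.<Object>of(Boolean.FALSE), le10
        );
    }
}
```

With queries like these, `MAX(v)` or `COUNT(*)` per group becomes a per-query problem that Lucene can often answer without visiting every document.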

nik9000 (Member Author) commented May 27, 2025

The test failure was quite real - it came from attempting to reuse the scorer from one slice with a different query. I'll push a fix.

dnhatn self-requested a review on May 27, 2025 at 22:11
```diff
-if (currentScorer == null || currentScorer.leafReaderContext() != leaf) {
+if (currentScorer == null // First time
+    || currentScorer.leafReaderContext() != leaf // Moved to a new leaf
+    || currentScorer.weight != currentSlice.weight() // Moved to a new query
```
Member Author

It took most of a day to figure out that I needed this last bit of the if statement, but the tests caught it.

```java
/**
 * Tags to add to the data returned by this query.
 */
List<Object> tags() {
```
Member Author

I'm not entirely sure `Object` is the right thing. It works, but we might want `Supplier<Block>` or something more specific. But for now this is good enough.

Member

Yes, the object list can provide a better debugging message, but the block supplier might be better; otherwise, we would need to provide the exact boxed type for numeric values.

Member Author

Yeah, getting the boxed type perfect could be tricky. Suppliers are quite explicit. Let's keep it as is for now and rework when we find a rough edge.
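For what it's worth, here is a tiny plain-Java sketch of the boxed-type concern above (this is not the compute engine's `Block` API): a numeric tag stored as `Object` has to be boxed as exactly the type the consumer expects.

```java
import java.util.List;

public class BoxedTagTypes {
    public static void main(String[] args) {
        List<Object> intTag = List.<Object>of(100);   // the int literal 100 boxes to Integer
        List<Object> longTag = List.<Object>of(100L); // 100L boxes to Long

        // A consumer building a long-valued column accepts only Long tags,
        // so the Integer-boxed tag would fail a (Long) cast at runtime.
        System.out.println(intTag.get(0) instanceof Long);  // false
        System.out.println(longTag.get(0) instanceof Long); // true
    }
}
```

A `Supplier<Block>` would sidestep this by handing over a ready-made block instead of a value whose boxing has to be exact.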

dnhatn (Member) left a comment

I thought it was big, so I delayed the review until the end of my day, but it's just the first part. Sorry about that. LGTM! Thanks, Nik.

```diff
@@ -121,6 +120,9 @@ protected Page getCheckedOutput() throws IOException {
         if (scorer == null) {
             remainingDocs = 0;
         } else {
+            if (scorer.tags().isEmpty() == false) {
```
Member

I think we can leverage this and min/max later too.

Member Author

++
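As a rough sketch of what leveraging counts per tagged query could look like (plain Lucene, not this PR's operator; `taggedQueries` is a hypothetical map from a tag row to the Lucene query for that group), each group's count can be answered with `IndexSearcher#count` without loading any documents:

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class TaggedCountSketch {
    // Hypothetical: answer COUNT per tagged query instead of loading documents.
    static Map<List<Object>, Long> countPerTag(
        IndexSearcher searcher,
        Map<List<Object>, Query> taggedQueries
    ) throws IOException {
        Map<List<Object>, Long> counts = new LinkedHashMap<>();
        for (Map.Entry<List<Object>, Query> entry : taggedQueries.entrySet()) {
            counts.put(entry.getKey(), (long) searcher.count(entry.getValue()));
        }
        return counts;
    }
}
```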

nik9000 (Member Author) commented May 29, 2025

> I thought it was big, so I delayed the review until the end of my day, but it's just the first part. Sorry about that. LGTM! Thanks, Nik.

Right! I tried to do the next bit and it got big so I put that down.

nik9000 merged commit 1b151ed into elastic:main on May 29, 2025
18 checks passed
elasticsearchmachine (Collaborator)

💔 Backport failed

Branch 8.19: Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 128521`.

nik9000 added a commit to nik9000/elasticsearch that referenced this pull request May 29, 2025
nik9000 (Member Author) commented May 29, 2025

backport: #128638

nik9000 (Member Author) commented Jun 3, 2025

Backported with #128638

joshua-adams-1 pushed a commit to joshua-adams-1/elasticsearch that referenced this pull request Jun 3, 2025
Samiul-TheSoccerFan pushed a commit to Samiul-TheSoccerFan/elasticsearch that referenced this pull request Jun 5, 2025
nik9000 added a commit that referenced this pull request Jun 11, 2025