Push down field extraction to time-series source #127445

dnhatn · 2025-04-27T23:40:57Z

This change pushes down field extractions to the time-series source operator, which provides these advantages:

- Avoids building a DocVector and its forward/backward maps.
- Leverages the DocValues cache (i.e., blocks that are already decompressed/decoded) when loading values; this cache can be lost when reading blocks with the ValuesSourceReaderOperator.
- Eliminates the need to rebuild blocks with backward mappings after reading values.
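The first and third points can be made concrete with a small plain-Java model. It assumes that each segment already stores documents in time-series order (tsid ascending, then timestamp descending) and that the source merges segments with a priority queue, so one output page interleaves documents from different segments. `TsReadSketch`, `Doc`, `DocRef`, `mergeAndReadInline` and `readWithMaps` are hypothetical names, not Elasticsearch classes. The sketch only mimics the shape of the forward/backward maps; it is not the real DocVector.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical model for illustration only; not the Elasticsearch DocVector API.
final class TsReadSketch {
    /** A document: its series id, timestamp and one numeric field value. */
    record Doc(String tsid, long timestamp, long value) {}

    /** One output row: the segment and the doc id within that segment. */
    record DocRef(int segment, int doc) {}

    // Assumed time-series order: tsid ascending, then timestamp descending.
    static final Comparator<Doc> TS_ORDER = Comparator.comparing(Doc::tsid)
        .thenComparing(Comparator.comparingLong(Doc::timestamp).reversed());

    /** Merges segments (each already in TS_ORDER) and reads the field as each row is emitted. */
    static long[] mergeAndReadInline(List<List<Doc>> segments, List<DocRef> emitted) {
        PriorityQueue<DocRef> queue = new PriorityQueue<>(
            Comparator.comparing((DocRef r) -> segments.get(r.segment()).get(r.doc()), TS_ORDER));
        for (int s = 0; s < segments.size(); s++) {
            if (!segments.get(s).isEmpty()) {
                queue.add(new DocRef(s, 0));
            }
        }
        List<Long> values = new ArrayList<>();
        while (!queue.isEmpty()) {
            DocRef top = queue.poll();
            emitted.add(top);
            // The value is read while the row is produced: no maps, no second pass.
            values.add(segments.get(top.segment()).get(top.doc()).value());
            if (top.doc() + 1 < segments.get(top.segment()).size()) {
                queue.add(new DocRef(top.segment(), top.doc() + 1));
            }
        }
        return values.stream().mapToLong(Long::longValue).toArray();
    }

    /** Reads the field after the fact, grouping rows per segment like a generic reader would. */
    static long[] readWithMaps(List<List<Doc>> segments, List<DocRef> rows) {
        int n = rows.size();
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Forward map: row positions sorted by (segment, doc) so reads are grouped per segment.
        Arrays.sort(order, Comparator.comparingInt((Integer i) -> rows.get(i).segment())
            .thenComparingInt(i -> rows.get(i).doc()));
        int[] forward = new int[n];
        int[] backward = new int[n];
        for (int k = 0; k < n; k++) {
            forward[k] = order[k];
            backward[order[k]] = k;
        }
        long[] grouped = new long[n];
        for (int k = 0; k < n; k++) {
            DocRef r = rows.get(forward[k]);
            grouped[k] = segments.get(r.segment()).get(r.doc()).value();
        }
        // Backward map: rebuild the block in the original row order.
        long[] result = new long[n];
        for (int i = 0; i < n; i++) {
            result[i] = grouped[backward[i]];
        }
        return result;
    }
}
```

Both paths return the same values in this toy model. The difference is that `readWithMaps` needs a sort, two index arrays and a temporary block, which is the kind of overhead that reading inside the source avoids.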

The following query against the TSDB track previously took 19 seconds; with this change it takes 13 seconds:

TS tsdb 
| STATS sum(rate(kubernetes.container.memory.pagefaults)) by bucket(@timestamp, 5minute)

Note that with this change:

TS tsdb 
| STATS sum(rate(kubernetes.container.memory.pagefaults)) by bucket(@timestamp, 5minute)

now performs as well as:

FROM tsdb 
| STATS sum(last_over_time(kubernetes.container.memory.pagefaults)) by bucket(@timestamp, 5minute)

when using shard-level data partitioning. This means the TS command now performs comparably to the FROM command, except that it does not yet support segment-level or document-level concurrency. I will try to add support for segment-level concurrency; document-level partitioning is not useful when iterating over documents in order.

elasticsearchmachine · 2025-04-28T15:08:56Z

Pinging @elastic/es-storage-engine (Team:StorageEngine)

dnhatn · 2025-04-28T15:10:15Z

...rch/xpack/esql/optimizer/rules/physical/local/PushDownFieldExtractionToTimeSeriesSource.java

+        if (plan.anyMatch(p -> p instanceof EsQueryExec q && q.indexMode() == IndexMode.TIME_SERIES) == false) {
+            return plan;
+        }
+        final List<FieldExtractExec> pushDownExtracts = new ArrayList<>();


This is the main planning change, where field extractions are pushed down to the time-series source.
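The shape of such a rule can be sketched on a toy plan tree. This is a minimal sketch, assuming hypothetical node types (`Source`, `FieldExtract`, `Filter`, `Aggregate`) in place of the real ESQL physical-plan classes such as EsQueryExec and FieldExtractExec. It keeps the early exit from the snippet above, then merges only the field extracts that sit directly on a time-series source into that source; an extract above a filter (the `counter` field in `TS index | WHERE host = 'a' | STATS max(rate(counter)) ...`) stays where it is.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, heavily simplified plan nodes. The real rule operates on ESQL
// physical plan classes such as EsQueryExec and FieldExtractExec.
final class PushDownSketch {
    sealed interface Plan permits Source, FieldExtract, Filter, Aggregate {
        List<Plan> children();

        default boolean anyMatch(Predicate<Plan> predicate) {
            if (predicate.test(this)) {
                return true;
            }
            for (Plan child : children()) {
                if (child.anyMatch(predicate)) {
                    return true;
                }
            }
            return false;
        }
    }

    record Source(boolean timeSeries, List<String> extracted) implements Plan {
        public List<Plan> children() { return List.of(); }
    }

    record FieldExtract(Plan child, List<String> fields) implements Plan {
        public List<Plan> children() { return List.of(child); }
    }

    record Filter(Plan child, String field) implements Plan {
        public List<Plan> children() { return List.of(child); }
    }

    record Aggregate(Plan child, String field) implements Plan {
        public List<Plan> children() { return List.of(child); }
    }

    static Plan apply(Plan plan) {
        // Early exit: plans without a time-series source are returned untouched.
        if (plan.anyMatch(p -> p instanceof Source s && s.timeSeries()) == false) {
            return plan;
        }
        return rewrite(plan);
    }

    private static Plan rewrite(Plan plan) {
        if (plan instanceof FieldExtract extract) {
            // Collect the run of extracts that sits directly on top of the source.
            List<String> pushDownFields = new ArrayList<>(extract.fields());
            Plan below = extract.child();
            while (below instanceof FieldExtract inner) {
                pushDownFields.addAll(inner.fields());
                below = inner.child();
            }
            if (below instanceof Source source && source.timeSeries()) {
                List<String> merged = new ArrayList<>(source.extracted());
                merged.addAll(pushDownFields);
                return new Source(true, List.copyOf(merged));
            }
            return new FieldExtract(rewrite(extract.child()), extract.fields());
        }
        if (plan instanceof Filter filter) {
            return new Filter(rewrite(filter.child()), filter.field());
        }
        if (plan instanceof Aggregate agg) {
            return new Aggregate(rewrite(agg.child()), agg.field());
        }
        return plan;
    }
}
```

Field names such as `_tsid` and `@timestamp` below are plain strings chosen for the example; the real rule works on attributes, not names.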

dnhatn · 2025-04-28T15:13:05Z

...te/src/main/java/org/elasticsearch/compute/lucene/TimeSeriesSortedSourceOperatorFactory.java

+        }
+    }
+
+    static final class ShardLevelFieldsReader implements Releasable {


Unfortunately, we cannot reuse the ValuesSourceReaderOperator and must duplicate some logic here.
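Part of the reason for a dedicated reader is the DocValues cache point from the description: a reader that lives as long as the source keeps its per-segment state, including the last decoded block, across pages. The sketch below is a hypothetical model of that effect, not the real ShardLevelFieldsReader. It assumes a fixed block size of 128 values, models "decoding" as copying a block, and uses `java.lang.AutoCloseable` in place of Elasticsearch's `Releasable`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a long-lived, per-shard reader. It does not reproduce
// the real ShardLevelFieldsReader; it only shows why keeping per-segment readers
// alive across pages lets already-decoded blocks be reused.
final class ShardFieldsReaderSketch implements AutoCloseable {
    static final int BLOCK_SIZE = 128; // assumed doc-values block size

    private final long[][] segments; // raw values per segment, indexed by doc id
    private final Map<Integer, SegmentReader> readers = new HashMap<>();
    private int decodes;

    ShardFieldsReaderSketch(long[][] segments) {
        this.segments = segments;
    }

    /** Per-segment reader that remembers the most recently decoded block. */
    private final class SegmentReader {
        private final long[] raw;
        private int cachedBlock = -1;
        private long[] decoded;

        SegmentReader(long[] raw) {
            this.raw = raw;
        }

        long read(int doc) {
            int block = doc / BLOCK_SIZE;
            if (block != cachedBlock) {
                // Stand-in for decompressing/decoding one block of doc values.
                int from = block * BLOCK_SIZE;
                decoded = Arrays.copyOfRange(raw, from, Math.min(from + BLOCK_SIZE, raw.length));
                cachedBlock = block;
                decodes++;
            }
            return decoded[doc - block * BLOCK_SIZE];
        }
    }

    long read(int segment, int doc) {
        return readers.computeIfAbsent(segment, s -> new SegmentReader(segments[s])).read(doc);
    }

    int decodes() {
        return decodes;
    }

    @Override
    public void close() {
        readers.clear();
    }
}
```

Reading four pages of 32 consecutive docs through one shared reader decodes the block once; opening a fresh reader per page decodes the same block four times.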

kkrik-es · 2025-04-28T15:51:14Z

...te/src/main/java/org/elasticsearch/compute/lucene/TimeSeriesSortedSourceOperatorFactory.java

-                    timestampsBuilder = blockFactory.newLongVectorBuilder(Math.min(remainingDocs, maxPageSize));
-                    tsids = tsHashesBuilder.build();
+                    int blockIndex = 0;
+                    if (emitDocIds) {


if (docCollector != null) {

for consistency?
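The suggestion amounts to letting the nullable collector be the single source of truth for whether doc ids are emitted, instead of a separate boolean that could drift out of sync with it. A minimal sketch, assuming a hypothetical `DocCollector` interface and assuming (from the `int blockIndex = 0;` line in the diff) that the doc block, when present, occupies the first slot of the page:

```java
// Hypothetical sketch of the reviewer's suggestion; DocCollector and emit are
// illustrative names, not the code in TimeSeriesSortedSourceOperatorFactory.
final class PageEmitterSketch {
    interface DocCollector {
        void collect(int doc);
    }

    private final DocCollector docCollector; // null when doc ids are not emitted

    PageEmitterSketch(DocCollector docCollector) {
        this.docCollector = docCollector;
    }

    /** Returns the index at which field blocks start in the output page. */
    int emit(int[] docs) {
        int blockIndex = 0;
        if (docCollector != null) {
            for (int doc : docs) {
                docCollector.collect(doc);
            }
            blockIndex++; // assumed: the doc block occupies the first slot
        }
        return blockIndex;
    }
}
```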

kkrik-es · 2025-04-28T17:49:19Z

...rch/xpack/esql/optimizer/rules/physical/local/PushDownFieldExtractionToTimeSeriesSource.java

+import java.util.Set;
+
+/**
+ * A rule that pushes down field extractions to occur before filter/limit/topN in the time-series source plan.


Don't we want filtering to happen before field extraction? Or combine them, at least? Maybe I misunderstood this comment.

For the query TS index | WHERE host = 'a' | STATS max(rate(counter)) BY host, bucket(1minute), we should extract only the host field (plus tsid and timestamp) in the time-series source command. The counter field should be extracted later by the ValuesSourceReaderOperator because we don't know how many rows will be filtered out.

Correct. What if we have:

TS index | WHERE host = 'a' AND TRANGE(1hour) | STATS max(rate(counter)) BY host, bucket(1minute)

I'd think we want to push down the filters on host and @timestamp to Lucene, to run before extracting the fields. If this is the case, maybe clarify in the comment that filters on the extracted fields are still pushed down.

I expanded the javadoc: 164d788
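The reasoning about extracting `host` at the source and deferring `counter` until after the filter can be shown with a toy cost count. This is a hypothetical sketch, not Elasticsearch code: loading is modeled as one unit per field value, and a plain sum stands in for `max(rate(counter))`.

```java
import java.util.List;

// Hypothetical cost sketch (not Elasticsearch code): counts how many field values
// are loaded when `counter` is extracted at the source versus after the filter.
final class ExtractPlacementSketch {
    record Row(String host, long counter) {}

    /** Sum of counter over matching rows, plus the number of values loaded. */
    record Result(long counterSum, int loads) {}

    /** Extracts host and counter for every row at the source, then filters. */
    static Result eager(List<Row> rows, String wantedHost) {
        long sum = 0;
        int loads = 0;
        for (Row row : rows) {
            String host = row.host();
            long counter = row.counter();
            loads += 2; // both fields are loaded before the filter runs
            if (host.equals(wantedHost)) {
                sum += counter;
            }
        }
        return new Result(sum, loads);
    }

    /** Extracts only host at the source; loads counter for surviving rows only. */
    static Result deferred(List<Row> rows, String wantedHost) {
        long sum = 0;
        int loads = 0;
        for (Row row : rows) {
            loads++; // only host is loaded at the source
            if (row.host().equals(wantedHost)) {
                loads++; // counter is loaded only for rows that pass the filter
                sum += row.counter();
            }
        }
        return new Result(sum, loads);
    }
}
```

Both placements produce the same aggregate; the deferred one loads fewer values, and the saving grows with the fraction of rows the filter removes, which is unknown at planning time.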

kkrik-es · 2025-04-28T17:57:44Z

...te/src/main/java/org/elasticsearch/compute/lucene/TimeSeriesSortedSourceOperatorFactory.java

            } else {
                sourceLoader = null;
            }
            this.storedFieldsSpec = storedFieldsSpec;
+            ;


Nit: remove.

sorry, removed 144f13d

kkrik-es

I'd be lying if I claimed I fully understand this change. It'd be great if Mark could also take a look, or maybe Nik. That said, we'll be iterating on this code, and I expect it to be extended and updated iteratively.

dnhatn · 2025-04-28T21:41:09Z

Thanks Kostas!

elasticsearchmachine added the v9.1.0 label Apr 27, 2025

dnhatn force-pushed the time-series-field-extraction branch from cda5378 to f68492b April 27, 2025 23:55

dnhatn changed the title from "Push field extraction to time-series source" to "Push down field extraction to time-series source" Apr 27, 2025

dnhatn force-pushed the time-series-field-extraction branch 4 times, most recently from db193c6 to e48b3e2 April 28, 2025 04:44

Push down field extraction to time-series source

aa331a5

dnhatn force-pushed the time-series-field-extraction branch from e48b3e2 to aa331a5 April 28, 2025 15:05

dnhatn added the :StorageEngine/TSDB and >non-issue labels Apr 28, 2025

dnhatn marked this pull request as ready for review April 28, 2025 15:08

dnhatn requested a review from kkrik-es April 28, 2025 15:08

elasticsearchmachine added the Team:StorageEngine label Apr 28, 2025

dnhatn requested a review from not-napoleon April 28, 2025 15:08

dnhatn commented Apr 28, 2025

kkrik-es reviewed Apr 28, 2025

...te/src/main/java/org/elasticsearch/compute/lucene/TimeSeriesSortedSourceOperatorFactory.java

kkrik-es reviewed Apr 28, 2025

...te/src/main/java/org/elasticsearch/compute/lucene/TimeSeriesSortedSourceOperatorFactory.java

naming

7e510ae

kkrik-es reviewed Apr 28, 2025

...pute/src/test/java/org/elasticsearch/compute/lucene/TimeSeriesSortedSourceOperatorTests.java (outdated)

dnhatn added 3 commits April 28, 2025 10:46

consistency

845c2d6

merge with stored-fields from loaders

77af752

leftover

c034b2f

kkrik-es reviewed Apr 28, 2025

dnhatn requested a review from kkrik-es April 28, 2025 17:55

kkrik-es reviewed Apr 28, 2025

oops

144f13d

kkrik-es approved these changes Apr 28, 2025

dnhatn added 2 commits April 28, 2025 12:44

javadoc

164d788

Merge remote-tracking branch 'elastic/main' into time-series-field-extraction

558f5de

dnhatn merged commit d65f34d into elastic:main Apr 28, 2025
16 of 17 checks passed

dnhatn deleted the time-series-field-extraction branch April 28, 2025 21:41

dnhatn mentioned this pull request Apr 28, 2025

Speed up time-series aggregation #127444 (open, 15 tasks)
