Tsdb doc values inline building jump table #126499

martijnvg · 2025-04-09T05:56:45Z

Build the jump table (DISI) while iterating over SortedNumericDocValues to encode the values, instead of iterating over SortedNumericDocValues a second time just to build the jump table.
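Here is a minimal sketch of the single-pass idea. It assumes a simplified model of Lucene's IndexedDISI block layout: doc IDs are grouped into 65536-doc blocks, and a block is SPARSE when it holds at most 4095 docs, ALL when every doc is present, and DENSE otherwise. The names `InlineJumpTableSketch`, `BlockSummaryAccumulator` and `encodeAndBuild` are hypothetical. The value "encoder" just copies values, and the sketch only computes the per-block summary a jump table would be derived from, not the on-disk format.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: feed the jump-table accumulator from the same loop that
// encodes the values, instead of making a second pass over the doc values.
final class InlineJumpTableSketch {
    static final int BLOCK_SIZE = 1 << 16;             // docs per block (65536)
    static final int MAX_ARRAY_LENGTH = (1 << 12) - 1; // 4095: largest SPARSE block

    enum BlockType { SPARSE, DENSE, ALL }

    /** Summary of one finished block: which block it is and how it would be encoded. */
    static final class Block {
        final int blockIndex;
        final int cardinality;
        final BlockType type;

        Block(int blockIndex, int cardinality, BlockType type) {
            this.blockIndex = blockIndex;
            this.cardinality = cardinality;
            this.type = type;
        }
    }

    /** Collects strictly increasing doc IDs and classifies each block once it is complete. */
    static final class BlockSummaryAccumulator {
        private final List<Block> blocks = new ArrayList<>();
        private int currentBlock = -1;
        private int cardinality = 0;
        private int lastDoc = -1;

        void addDocId(int doc) {
            if (doc <= lastDoc) {
                throw new IllegalArgumentException("doc IDs must be strictly increasing: " + doc);
            }
            int block = doc >>> 16; // index of the 65536-doc block this doc falls into
            if (block != currentBlock) {
                flushBlock();
                currentBlock = block;
            }
            cardinality++;
            lastDoc = doc;
        }

        private void flushBlock() {
            if (cardinality == 0) {
                return;
            }
            BlockType type;
            if (cardinality == BLOCK_SIZE) {
                type = BlockType.ALL;
            } else if (cardinality > MAX_ARRAY_LENGTH) {
                type = BlockType.DENSE;
            } else {
                type = BlockType.SPARSE;
            }
            blocks.add(new Block(currentBlock, cardinality, type));
            cardinality = 0;
        }

        List<Block> finish() {
            flushBlock();
            return blocks;
        }
    }

    /** Single pass: "encode" each value and feed its doc ID to the accumulator. */
    static List<Block> encodeAndBuild(int[] docs, long[] values, List<Long> encodedOut) {
        BlockSummaryAccumulator acc = new BlockSummaryAccumulator();
        for (int i = 0; i < docs.length; i++) {
            encodedOut.add(values[i]); // stand-in for the real value encoder
            acc.addDocId(docs[i]);     // jump-table input gathered in the same loop
        }
        return acc.finish();
    }
}
```

The accumulator only needs doc IDs in increasing order, so it can piggyback on the encoding loop. The trade-off is that the jump table has to be buffered, for example in a temporary output, until the values have been written.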

When index sorting is active, this requires an additional merge sort while merging segments.
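The extra merge sort can be pictured as a k-way merge. Each source segment yields documents in increasing order of their new (mapped) doc ID, but with index sorting the segments interleave. The documents therefore have to be merged by mapped doc ID before they can be fed to a jump-table builder that expects increasing doc IDs. This is a simplified, hypothetical illustration: `SortedMergeSketch` and `SegmentCursor` are not Elasticsearch or Lucene classes, although the approach is conceptually similar to how Lucene's `DocIDMerger` handles sorted indices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: k-way merge of per-segment doc-value streams by mapped
// (post-merge) doc ID. Concatenating segments is not enough once they interleave.
final class SortedMergeSketch {
    static final class SegmentCursor {
        final int[] mappedDocs; // new doc IDs after applying the segment's doc map
        final long[] values;
        int pos = 0;

        SegmentCursor(int[] mappedDocs, long[] values) {
            this.mappedDocs = mappedDocs;
            this.values = values;
        }

        int doc() { return mappedDocs[pos]; }

        boolean exhausted() { return pos >= mappedDocs.length; }
    }

    /** Returns mapped doc IDs in increasing order and appends the matching values to valuesOut. */
    static List<Integer> merge(List<SegmentCursor> segments, List<Long> valuesOut) {
        PriorityQueue<SegmentCursor> pq =
            new PriorityQueue<>((a, b) -> Integer.compare(a.doc(), b.doc()));
        for (SegmentCursor s : segments) {
            if (!s.exhausted()) {
                pq.add(s);
            }
        }
        List<Integer> docsOut = new ArrayList<>();
        while (!pq.isEmpty()) {
            SegmentCursor top = pq.poll(); // segment holding the smallest mapped doc ID
            docsOut.add(top.doc());
            valuesOut.add(top.values[top.pos]);
            top.pos++;                     // advance only after removing it from the heap
            if (!top.exhausted()) {
                pq.add(top);               // re-insert keyed by its next mapped doc ID
            }
        }
        return docsOut;
    }
}
```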

Follow up from #125403


server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesConsumer.java

martijnvg · 2025-04-11T19:11:53Z

The latest microbenchmark results on top of this PR:

Benchmark                                                          (deltaTime)   (nDocs)  (seed)  Mode  Cnt      Score   Error  Units
TSDBDocValuesMergeBenchmark.forceMergeDenseWithOptimizedMerge             1000  20431204      42    ss       12322.078          ms/op
TSDBDocValuesMergeBenchmark.forceMergeDenseWithoutOptimizedMerge          1000  20431204      42    ss       18732.804          ms/op
TSDBDocValuesMergeBenchmark.forceMergeSparseWithOptimizedMerge            1000  20431204      42    ss       10461.210          ms/op
TSDBDocValuesMergeBenchmark.forceMergeSparseWithoutOptimizedMerge         1000  20431204      42    ss       13807.956          ms/op

Two new benchmark methods were added to test the sparse case; the dense benchmark methods are unchanged. The benchmark mode was changed to Mode.SingleShotTime, which makes more sense given that only one force merge is executed per benchmark method. The optimized merge is faster in both the sparse and the dense case: roughly 1.52x for dense (12322 ms vs. 18733 ms) and 1.32x for sparse (10461 ms vs. 13808 ms).

dnhatn

I am good with this approach. Thanks Martijn!

dnhatn · 2025-04-15T19:06:43Z

server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesConsumer.java

-                                encoder.encode(buffer, data);
+        IndexOutput disiTempOutput = null;
+        String skipListTempFileName = null;
+        IndexedDISIBuilder docIdSetBuilder = null;


Should we make IndexedDISIBuilder (or maybe Accumulator is a better name?) Closeable, pass a Directory in its constructor, and pass the IndexOutput to the build (or flush) method to copy the temporary output to the data output? This way, we only need a single reference here, making the code more manageable.

This really made the code more manageable! I also forked the original IndexedDISI tests and adapted them for DISIAccumulator.
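The refactoring discussed in this thread can be sketched with plain `java.nio` files standing in for Lucene's `Directory` and `IndexOutput`. That substitution is an assumption made only to keep the example self-contained. One `Closeable` object owns the temporary output, `build` copies the temporary bytes to the caller's data output, and `close` always deletes the temp file, so the caller needs a single reference in a try-with-resources block. `TempOutputAccumulator` and its methods are hypothetical, and I/O errors are wrapped in `UncheckedIOException` for brevity.

```java
import java.io.Closeable;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stdlib analogue: java.nio files stand in for Lucene's Directory/IndexOutput.
final class TempOutputAccumulator implements Closeable {
    private final Path tempFile;
    private final DataOutputStream tempOut;
    private boolean tempOutClosed;

    TempOutputAccumulator(Path dir) {
        try {
            tempFile = Files.createTempFile(dir, "disi", ".tmp");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            tempOut = new DataOutputStream(Files.newOutputStream(tempFile));
        } catch (IOException e) {
            tempFile.toFile().delete(); // don't leak the file if opening it fails
            throw new UncheckedIOException(e);
        }
    }

    void addDocId(int doc) {
        try {
            tempOut.writeInt(doc); // stand-in for the real jump-table encoding
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Closes the temp output and copies its bytes to {@code data}; returns the byte count. */
    long build(OutputStream data) {
        try {
            closeTempOut();
            return Files.copy(tempFile, data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    Path tempFile() {
        return tempFile;
    }

    private void closeTempOut() throws IOException {
        if (!tempOutClosed) {
            tempOutClosed = true;
            tempOut.close();
        }
    }

    @Override
    public void close() {
        try {
            closeTempOut();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            tempFile.toFile().delete(); // temp file never outlives the accumulator
        }
    }
}
```

With try-with-resources, the temp file is removed even if encoding fails partway through.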

dnhatn · 2025-04-15T19:07:27Z

server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesConsumer.java

+                    values = valuesProducer.getSortedNumeric(field);
+                    final int bitsPerOrd = maxOrd >= 0 ? PackedInts.bitsRequired(maxOrd - 1) : -1;
+                    if (enableOptimizedMerge && numDocsWithValue < maxDoc) {
+                        // TODO: which IOContext should be used here?


I think we should use MERGE for this IOContext?

I don't see a MERGE constant on IOContext, so I did the following instead (use the IOContext from SegmentWriteState): 45308e4

I think this way we always get the most appropriate IOContext? In the case of a merge, it will be an IOContext for merging?

elasticsearchmachine · 2025-04-16T08:00:34Z

Pinging @elastic/es-storage-engine (Team:StorageEngine)

dnhatn

Looks great. Thanks Martijn!

dnhatn · 2025-04-17T05:47:24Z

server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/DISIAccumulator.java

+        this.denseRankPower = denseRankPower;
+
+        this.origo = disiTempOutput.getFilePointer(); // All jumps are relative to the origo
+        if ((denseRankPower < 7 || denseRankPower > 15) && denseRankPower != -1) {


nit: Move this check above dir.createTempOutput to avoid leaks if the check fails.
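A sketch of this nit follows. `ValidateFirstSketch` is a hypothetical class, and a `java.nio` temp file stands in for `dir.createTempOutput`. The cheap argument check runs before any resource is acquired, so an invalid `denseRankPower` throws without leaving a temp file behind. The accepted values (-1, or 7 to 15) mirror the check in the diff; reading -1 as "no rank" is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: validate arguments first, acquire resources second.
final class ValidateFirstSketch {
    final int denseRankPower;
    final Path tempFile;

    ValidateFirstSketch(Path dir, int denseRankPower) {
        // 1. Cheap check first; same condition as in the diff (-1 assumed to mean "no rank").
        if ((denseRankPower < 7 || denseRankPower > 15) && denseRankPower != -1) {
            throw new IllegalArgumentException(
                "denseRankPower must be -1 or within [7, 15], was " + denseRankPower);
        }
        this.denseRankPower = denseRankPower;
        // 2. Only now create the temp file, so a failed check cannot leak it.
        try {
            this.tempFile = Files.createTempFile(dir, "disi", ".tmp");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```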

martijnvg · 2025-04-17T08:41:14Z

A recent run on top of the latest commit of this PR:

Benchmark                                                          (deltaTime)   (nDocs)  (seed)  Mode  Cnt      Score   Error  Units
TSDBDocValuesMergeBenchmark.forceMergeDenseWithOptimizedMerge             1000  20431204      42    ss        8507.824          ms/op
TSDBDocValuesMergeBenchmark.forceMergeDenseWithoutOptimizedMerge          1000  20431204      42    ss       12509.560          ms/op
TSDBDocValuesMergeBenchmark.forceMergeSparseWithOptimizedMerge            1000  20431204      42    ss       10777.702          ms/op
TSDBDocValuesMergeBenchmark.forceMergeSparseWithoutOptimizedMerge         1000  20431204      42    ss       17307.381          ms/op
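The speedups implied by the two benchmark runs posted in this thread are the ratio of the unoptimized to the optimized single-shot score (both in ms/op). This is plain arithmetic on the posted numbers, and `MergeSpeedups` is only an illustrative helper.

```java
import java.util.Locale;

// Speedup = time without the optimized merge / time with it (> 1 means faster).
final class MergeSpeedups {
    static double speedup(double withoutOptimizedMs, double withOptimizedMs) {
        return withoutOptimizedMs / withOptimizedMs;
    }

    public static void main(String[] args) {
        // First run (Apr 11)
        System.out.printf(Locale.ROOT, "dense  run 1: %.2fx%n", speedup(18732.804, 12322.078)); // 1.52x
        System.out.printf(Locale.ROOT, "sparse run 1: %.2fx%n", speedup(13807.956, 10461.210)); // 1.32x
        // Second run (Apr 17)
        System.out.printf(Locale.ROOT, "dense  run 2: %.2fx%n", speedup(12509.560, 8507.824));  // 1.47x
        System.out.printf(Locale.ROOT, "sparse run 2: %.2fx%n", speedup(17307.381, 10777.702)); // 1.61x
    }
}
```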

Build the jump table (DISI) while iterating over SortedNumericDocValues to encode the values, instead of separately iterating over SortedNumericDocValues just to build the jump table. When index sorting is active, this requires an additional merge sort. Follow up from elastic#125403

elasticsearchmachine · 2025-04-17T10:10:19Z

💚 Backport successful

Status	Branch	Result
✅	8.x


Tsdb doc values inline building jump table

946d793

Build the jump table (DISI) when iterating over SortedNumericDocValues, instead of separately iterating over SortedNumericDocValues. When index sorting is active, this requires an additional merge sort. Follow up from elastic#125403

martijnvg added the :StorageEngine/Codec label Apr 9, 2025

elasticsearchmachine added the v9.1.0 label Apr 9, 2025

martijnvg mentioned this pull request Apr 9, 2025

Optimize segment merging in the tsdb doc value codec #126111

Closed


martijnvg added 3 commits April 10, 2025 21:42

Merge remote-tracking branch 'es/main' into merge_tsdb_dv_disi

55b54cd

fix checkstyle

0c644a2

Merge remote-tracking branch 'es/main' into merge_tsdb_dv_disi

5547721

martijnvg commented Apr 11, 2025


server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesConsumer.java (outdated, resolved)

martijnvg added 2 commits April 11, 2025 18:50

Merge remote-tracking branch 'es/main' into merge_tsdb_dv_disi

a814b76

fix benchmark

406202c

dnhatn self-requested a review April 11, 2025 20:53

dnhatn reviewed Apr 15, 2025


dnhatn self-requested a review April 15, 2025 19:09

martijnvg added 3 commits April 16, 2025 08:57

Merge remote-tracking branch 'es/main' into merge_tsdb_dv_disi

91bcfee

refactor

79dc490

added unit tests for DISIAccumulator

75972ff

martijnvg added >non-issue v8.19.0 labels Apr 16, 2025

martijnvg marked this pull request as ready for review April 16, 2025 08:00

elasticsearchmachine added the Team:StorageEngine label Apr 16, 2025

martijnvg and others added 2 commits April 16, 2025 10:11

Use IOContext from SegmentWriteState

45308e4

[CI] Auto commit changes from spotless

13ca444

martijnvg mentioned this pull request Apr 16, 2025

Coalesce getSortedNumeric calls for ES819 doc values merging #126732

Merged

fixed license header

2b7a323

dnhatn approved these changes Apr 17, 2025


martijnvg added 2 commits April 17, 2025 09:36

Merge remote-tracking branch 'es/main' into merge_tsdb_dv_disi

feb408d

iter

df2705d

martijnvg added the auto-backport Automatically create backport pull requests when merged label Apr 17, 2025

martijnvg enabled auto-merge (squash) April 17, 2025 07:45

martijnvg merged commit 0d41e9a into elastic:main Apr 17, 2025
16 of 17 checks passed

martijnvg mentioned this pull request Apr 17, 2025

[8.x] Tsdb doc values inline building jump table (#126499) #126985

Merged

jordan-powers mentioned this pull request Apr 23, 2025

Apply TSDB jump table and offset construction optimizations to binary doc values #127278

Merged

jordan-powers added a commit that referenced this pull request Apr 24, 2025

Apply recent TSDB codec merge optimizations to binary values (#127278)

69c2eda

Applies the merge optimizations from #126499 and #126732 to binary field types for the ES819 codec.

elasticsearchmachine pushed a commit that referenced this pull request Apr 24, 2025

Apply recent TSDB codec merge optimizations to binary values (#127278) (#127346)

1d1e85d

Applies the merge optimizations from #126499 and #126732 to binary field types for the ES819 codec.
