Apply TSDB jump table and offset construction optimizations to binary doc values #127278

jordan-powers · 2025-04-23T18:40:00Z

Applies the merge optimizations from #126499 and #126732 to binary field types for the ES819 codec.

Relates to #126111
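For readers less familiar with the layout: variable-length binary doc values are commonly stored as the concatenated value bytes plus a monotonic array of start addresses, with one entry per document that has a value and a final end offset. The sketch below is a minimal illustration that assumes the values are already in memory as byte arrays. It builds that address array as prefix sums over the value lengths. `BinaryAddresses` and its methods are hypothetical names. This is not the ES819 consumer's code, and it does not show what #126732 changes.

```java
import java.util.List;

/**
 * Hypothetical illustration of binary doc values address construction:
 * addresses[i] is the start offset of value i in the concatenated data,
 * and addresses[n] is the total data length. Not the ES819 codec's code.
 */
final class BinaryAddresses {

    private BinaryAddresses() {}

    /** Returns n + 1 monotonic start offsets for n values (prefix sums of lengths). */
    static long[] build(List<byte[]> values) {
        long[] addresses = new long[values.size() + 1];
        long offset = 0;
        for (int i = 0; i < values.size(); i++) {
            addresses[i] = offset;          // value i starts here
            offset += values.get(i).length; // advance past value i
        }
        addresses[values.size()] = offset;  // end of the last value
        return addresses;
    }

    /** Length of value i, recovered from two consecutive addresses. */
    static int lengthOf(long[] addresses, int i) {
        return Math.toIntExact(addresses[i + 1] - addresses[i]);
    }
}
```

Storing start offsets rather than raw lengths lets a reader locate value `i` with two lookups. Because the array never decreases, it also suits a monotonic encoding.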


elasticsearchmachine · 2025-04-23T18:40:25Z

Pinging @elastic/es-storage-engine (Team:StorageEngine)


martijnvg

LGTM 👍

martijnvg · 2025-04-24T11:12:09Z

server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesConsumer.java

+                    meta.writeLong(offset); // docsWithFieldOffset
+                    final short jumpTableEntryCount;
+                    if (disiAccumulator != null) {
+                        jumpTableEntryCount = disiAccumulator.build(data);


I think at this point disiAccumulator should always be non-null? (if numDocsWithField is neither -1 nor equal to max doc, and valueProducer supports optimized merge)

True, I'll remove the check
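Some context for the jump table discussed above: in Lucene's IndexedDISI, documents are grouped into blocks of 65536 doc IDs, and the jump table lets a reader skip directly to the block containing a target doc. The following is a simplified, hypothetical sketch. `JumpTableSketch` is not a real API. It accumulates doc IDs in increasing order and computes, for each block, how many documents precede it. It assumes strictly increasing doc IDs and leaves out the byte offsets and on-disk encoding that the real structure also stores.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified, hypothetical jump-table builder in the spirit of IndexedDISI:
 * doc IDs are split into blocks of 2^16, and each entry records how many
 * accumulated docs precede that block. Offsets and encoding are omitted.
 */
final class JumpTableSketch {
    static final int BLOCK_SHIFT = 16; // 65536 doc IDs per block

    private final List<Integer> docs = new ArrayList<>();
    private int lastDoc = -1;

    /** Accumulates a doc ID; IDs must arrive in strictly increasing order. */
    void addDoc(int doc) {
        if (doc <= lastDoc) {
            throw new IllegalArgumentException("doc IDs must be strictly increasing");
        }
        docs.add(doc);
        lastDoc = doc;
    }

    /** Returns index[b] = number of accumulated docs in blocks before block b. */
    int[] build() {
        if (docs.isEmpty()) {
            return new int[0];
        }
        int blockCount = (lastDoc >>> BLOCK_SHIFT) + 1; // entries up to the last used block
        int[] perBlock = new int[blockCount];
        for (int doc : docs) {
            perBlock[doc >>> BLOCK_SHIFT]++;
        }
        int[] index = new int[blockCount];
        int running = 0;
        for (int b = 0; b < blockCount; b++) {
            index[b] = running; // docs seen before block b
            running += perBlock[b];
        }
        return index;
    }
}
```

A reader that needs the ordinal of doc 70000 can start from `index[70000 >>> 16]` and scan only within that block instead of walking every earlier block.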

martijnvg · 2025-04-24T11:13:34Z

server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/XDocValuesConsumer.java

@@ -152,6 +153,102 @@ public long longValue() throws IOException {
        };
    }

+    /** Tracks state of one binary sub-reader that we are merging */
+    private static class BinaryDocValuesSub extends DocIDMerger.Sub {


This is copied from Lucene's DocValuesConsumer?

Yes, except that it returns an anonymous subclass of TsdbDocValuesProducer instead of EmptyDocValuesProducer so that it can support merge stats.
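The `DocIDMerger.Sub` pattern mentioned here tracks one segment during a merge and maps its local doc IDs to doc IDs in the merged segment, skipping deleted documents. Below is a Lucene-free sketch of that idea under simplifying assumptions. Each hypothetical `Sub` holds its doc IDs and values in arrays, and a deleted doc maps to -1. Subs are merged with a priority queue ordered by mapped doc ID. This is not Lucene's `DocIDMerger` API, and it does not model the merge-stats producer described in the reply.

```java
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.IntUnaryOperator;

/** Hypothetical illustration of merging per-segment binary values; not Lucene's API. */
final class SubMergeSketch {

    /** Per-segment state: local doc IDs, their values, and a local-to-merged doc map. */
    static final class Sub {
        final int[] docs;              // local doc IDs that have a value, ascending
        final byte[][] values;         // values[i] belongs to docs[i]
        final IntUnaryOperator docMap; // local doc ID -> merged doc ID, or -1 if deleted
        int pos = -1;
        int mappedDocID = -1;

        Sub(int[] docs, byte[][] values, IntUnaryOperator docMap) {
            this.docs = docs;
            this.values = values;
            this.docMap = docMap;
        }

        /** Advances to the next live doc; returns false once this sub is exhausted. */
        boolean next() {
            while (++pos < docs.length) {
                mappedDocID = docMap.applyAsInt(docs[pos]);
                if (mappedDocID != -1) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Emits live docs from all subs in increasing merged doc ID order. */
    static void merge(List<Sub> subs, List<Integer> outDocs, List<byte[]> outValues) {
        PriorityQueue<Sub> queue = new PriorityQueue<>(Comparator.comparingInt((Sub s) -> s.mappedDocID));
        for (Sub sub : subs) {
            if (sub.next()) {
                queue.add(sub);
            }
        }
        while (!queue.isEmpty()) {
            Sub top = queue.poll();
            outDocs.add(top.mappedDocID);
            outValues.add(top.values[top.pos]);
            if (top.next()) {
                queue.add(top); // re-insert only after its key has changed
            }
        }
    }
}
```

For an unsorted index the mapped ID ranges of consecutive subs do not overlap, so the queue effectively degenerates to concatenation. The heap only matters when index sorting interleaves the subs.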

martijnvg · 2025-04-24T11:15:42Z

...er/src/test/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesFormatTests.java

                }

+                d.add(new BinaryDocValuesField("bytes_1", new BytesRef(tags[i % tags.length])));


maybe rename bytes_1 to tags_as_bytes?


elasticsearchmachine · 2025-04-24T17:13:55Z

💚 Backport successful

Status	Branch	Result
✅	8.x


jordan-powers added 4 commits April 23, 2025 10:40

Add binary field type to ES819 merge tests

d267af9

Add MergeStats support for binary doc values

4a23760

Apply address offset calculation optimization to binary doc values merges

78a56de

Use DISIAccumulator for binary doc values merges

fa2e1d5

jordan-powers added >non-issue auto-backport :StorageEngine/Codec v8.19.0 v9.1.0 labels Apr 23, 2025

jordan-powers requested a review from martijnvg April 23, 2025 18:40

jordan-powers self-assigned this Apr 23, 2025

elasticsearchmachine added the Team:StorageEngine label Apr 23, 2025

Remove extra whitespace

e850ec4

jordan-powers mentioned this pull request Apr 23, 2025

Optimize segment merging in the tsdb doc value codec #126111

Closed


Merge remote-tracking branch 'upstream/main' into optimize-merge-binary-fields

6e46aee

martijnvg approved these changes Apr 24, 2025


jordan-powers added 3 commits April 24, 2025 08:54

Remove redundant null check

5ab5937

Rename bytes_1 to tags_as_bytes

f15a9d3

Merge remote-tracking branch 'upstream/main' into optimize-merge-binary-fields

aa84cb4

jordan-powers enabled auto-merge (squash) April 24, 2025 16:27

jordan-powers merged commit 69c2eda into elastic:main Apr 24, 2025
15 of 17 checks passed

jordan-powers mentioned this pull request Apr 24, 2025

[8.x] Apply recent TSDB codec merge optimizations to binary values (#127278) #127346

Merged

elasticsearchmachine pushed a commit that referenced this pull request Apr 24, 2025

Apply recent TSDB codec merge optimizations to binary values (#127278) (#127346)

Applies the merge optimizations from #126499 and #126732 to binary field types for the ES819 codec.

1d1e85d
