Skip to content

Apply TSDB jump table and offset construction optimizations to binary doc values #127278

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jordan-powers
Copy link
Contributor

Applies the merge optimizations from #126499 and #126732 to binary field types for the ES819 codec.

Relates to #126111

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

meta.writeLong(offset); // docsWithFieldOffset
final short jumpTableEntryCount;
if (disiAccumulator != null) {
jumpTableEntryCount = disiAccumulator.build(data);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at the place disiAccumulator should always be not null? (if numDocsWithField is not -1 or equal to max doc and valueProducer support optimized merge)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I'll remove the check

@@ -152,6 +153,102 @@ public long longValue() throws IOException {
};
}

/** Tracks state of one binary sub-reader that we are merging */
private static class BinaryDocValuesSub extends DocIDMerger.Sub {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copied from Lucene's DocValuesConsumer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, except that it returns an anonymous subclass of TsdbDocValuesProducer instead of EmptyDocValuesProducer so that it can support merge stats.

}

d.add(new BinaryDocValuesField("bytes_1", new BytesRef(tags[i % tags.length])));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe rename bytes_1 to tags_as_bytes?

@jordan-powers jordan-powers enabled auto-merge (squash) April 24, 2025 16:27
@jordan-powers jordan-powers merged commit 69c2eda into elastic:main Apr 24, 2025
15 of 17 checks passed
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x

elasticsearchmachine pushed a commit that referenced this pull request Apr 24, 2025
#127346)

Applies the merge optimizations from #126499 and #126732 to binary field
types for the ES819 codec.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants