-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Tsdb doc values inline building jump table #126499
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Build jump table (disi) when iterating over SortedNumericDocValues, instead of separately iterating over SortedNumericDocValues. In case when indexing sorting is active, this requires an additional merge sort. Follow up from elastic#125403
server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesConsumer.java
Outdated
Show resolved
Hide resolved
The latest micro benchmark result on top of this PR.
Two new benchmark methods were added that test the sparse case. The dense benchmark methods were the same as before. The benchmark method changed to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am good with this approach. Thanks Martijn!
encoder.encode(buffer, data); | ||
IndexOutput disiTempOutput = null; | ||
String skipListTempFileName = null; | ||
IndexedDISIBuilder docIdSetBuilder = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make IndexedDISIBuilder (or maybe Accumulator is a better name?) Closable, pass a Directory in its constructor, and pass the IndexOutput in the build (or flush) method to copy the temporary output to the data output? This way, we only need a single reference here, making the code more manageable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This really made the code more manageable! I also forked the original IndexedDISI tests and adapted that for DISIAccumulator
.
values = valuesProducer.getSortedNumeric(field); | ||
final int bitsPerOrd = maxOrd >= 0 ? PackedInts.bitsRequired(maxOrd - 1) : -1; | ||
if (enableOptimizedMerge && numDocsWithValue < maxDoc) { | ||
// TODO: which IOContext should be used here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use MERGE for this IOContext?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a MERGE
constant for io context. But I did the following instead: 45308e4
I think this way we always get most appropriate io context? In case of merge it will be an io context for merging?
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Thanks Martijn!
this.denseRankPower = denseRankPower; | ||
|
||
this.origo = disiTempOutput.getFilePointer(); // All jumps are relative to the origo | ||
if ((denseRankPower < 7 || denseRankPower > 15) && denseRankPower != -1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Move this check above dir.createTempOutput to avoid leaks if the check fails.
Recent run on top of latest commit of this pr:
|
Build jump table (disi) while iterating over SortedNumericDocValues for encoding the values, instead of separately iterating over SortedNumericDocValues just to build the jump table. In case when indexing sorting is active, this requires an additional merge sort. Follow up from elastic#125403
💚 Backport successful
|
Build jump table (disi) while iterating over SortedNumericDocValues for encoding the values, instead of separately iterating over SortedNumericDocValues just to build the jump table. In case when indexing sorting is active, this requires an additional merge sort. Follow up from #125403
Build jump table (disi) while iterating over SortedNumericDocValues for encoding the values, instead of separately iterating over SortedNumericDocValues just to build the jump table.
In case when indexing sorting is active, this requires an additional sorting of segments while merging.
Follow up from #125403