Move to Lucene 9.12's new PostingsFormat. #115021
Labels: blocker, stateful, :StorageEngine/Codec, Team:StorageEngine
A while back, Lucene changed its doc ID encoding from PFOR-delta to FOR-delta, which is a bit faster but less space-efficient. To avoid space-efficiency regressions (especially on dense postings lists, which are common in logging datasets), @iverase moved Elasticsearch to a copy of the Lucene postings format that still uses PFOR-delta for compression (#103601).
But Lucene 9.12 introduced a new postings format that has better skipping logic (in general). It would be nice to take advantage of it. I would suggest the following plan:
- […] `Lucene912PostingsFormat` but with a more space-efficient encoding of doc deltas. @dnhatn and I played with it earlier this year; there is room for significant improvement by storing exceptions (the P in PFOR stands for "patched") more efficiently and allowing more exceptions per block.
- […] `ES812PostingsFormat`.
- […] `ES812PostingsFormat` on new indexes.
- […] `ES812PostingsFormat` to the test folder.
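To make the FOR vs. PFOR trade-off above concrete, here is a minimal sketch of a size estimate for one block of doc deltas. It is not the Lucene or Elasticsearch implementation: the class `PforSketch`, its method names, and the flat per-exception cost `exceptionCostBits` are assumptions made up for illustration.

```java
import java.util.Arrays;

// Illustrative size model only; names and the cost model are hypothetical.
class PforSketch {

    // Number of bits needed to represent a non-negative value (0 needs 0 bits).
    static int bitsRequired(long v) {
        return v == 0 ? 0 : 64 - Long.numberOfLeadingZeros(v);
    }

    // FOR: every delta in the block is packed with the width of the largest delta.
    static long forBits(long[] deltas) {
        int width = 0;
        for (long d : deltas) {
            width = Math.max(width, bitsRequired(d));
        }
        return (long) width * deltas.length;
    }

    // PFOR: the e largest deltas are treated as exceptions ("patches"), so the
    // packed width only has to fit the remaining values. Each exception is
    // assumed to cost a flat exceptionCostBits (an assumption of this sketch).
    // Returns the smallest estimated size over e = 0..maxExceptions.
    static long pforBits(long[] deltas, int maxExceptions, int exceptionCostBits) {
        long[] sorted = deltas.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        long best = Long.MAX_VALUE;
        for (int e = 0; e <= Math.min(maxExceptions, n); e++) {
            int width = (e == n) ? 0 : bitsRequired(sorted[n - 1 - e]);
            long cost = (long) width * n + (long) e * exceptionCostBits;
            best = Math.min(best, cost);
        }
        return best;
    }

    public static void main(String[] args) {
        // A dense block: mostly deltas of 1, with three large gaps.
        long[] block = new long[128];
        Arrays.fill(block, 1L);
        block[5] = 1000;
        block[60] = 1000;
        block[100] = 1000;

        System.out.println("FOR bits:                 " + forBits(block));
        System.out.println("PFOR bits (2 exceptions): " + pforBits(block, 2, 16));
        System.out.println("PFOR bits (3 exceptions): " + pforBits(block, 3, 16));
    }
}
```

In this model, a dense block with a few large gaps only shrinks once the exception budget covers every outlier, which is one way to read the suggestion to allow more exceptions per block and to store them more cheaply.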