LUCENE-10140: Non-heuristic intervals sub-matches caching #341
Description
Minimizing interval iterators (perhaps only ORDERED and AT_LEAST, though I'm not sure) can move their sub-iterators to positions inside the match window that are not sub-match positions. However, the CachingMatchesIterator logic relies on the heuristic that any position inside the matching interval is a sub-match.
For example, ORDERED("a", "b", "a") over "a b a" highlights (reports sub-matches for) only "a b a", and ORDERED("a", "b", "a", "b", "a") highlights only "a b a b a".
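To make the failure concrete, here is a simplified, hypothetical model of a greedy minimizing ORDERED iterator. It is not Lucene's implementation: every sub-interval has width 1, a "sub-iterator" is just the position list of one term, and the class and method names are invented for illustration. After reporting a match, the iterator advances the first sub-iterator to look for a shorter match. So by the time the caller sees the interval, the sub-iterators may already sit elsewhere inside the window. In this model the true sub-matches are positions 0, 1, 2 (and 0 through 4 for the longer query), but the live positions a caching heuristic would read are different.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

/** Simplified, hypothetical model of a minimizing ORDERED iterator (not Lucene's code). */
class MinimizingOrderedDemo {
  static final int NO_MORE = Integer.MAX_VALUE;

  /** Walks the positions of one term; every interval has width 1, -1 means unpositioned. */
  static final class TermIter {
    private final int[] positions;
    private int idx = -1;

    TermIter(String[] tokens, String term) {
      positions = IntStream.range(0, tokens.length).filter(p -> tokens[p].equals(term)).toArray();
    }

    int start() { return idx < 0 ? -1 : idx < positions.length ? positions[idx] : NO_MORE; }
    int end() { return start(); }
    int nextInterval() { if (idx < positions.length) idx++; return start(); }
  }

  /** Returns {start, end} of the first match; sub-iterators stay wherever minimization left them. */
  static int[] firstMatch(TermIter[] subs) {
    subs[0].nextInterval();
    int start = NO_MORE, end = NO_MORE, lastStart = NO_MORE, i = 1;
    boolean minimizing = false;
    while (true) {
      // Place subs[i] strictly after subs[i - 1]; stop once the window can no longer shrink.
      while (true) {
        if (subs[i - 1].end() >= lastStart) return new int[] {start, end};
        if (i == subs.length || (minimizing && subs[i].start() > subs[i - 1].end())) break;
        do {
          if (subs[i].end() >= lastStart || subs[i].nextInterval() == NO_MORE) return new int[] {start, end};
        } while (subs[i].start() <= subs[i - 1].end());
        i++;
      }
      start = subs[0].start();
      end = subs[subs.length - 1].end();
      // At this point every sub-iterator sits on its sub-match.
      lastStart = subs[subs.length - 1].start();
      i = 1;
      // Minimization: advance the first sub-iterator and look for a shorter match.
      if (subs[0].nextInterval() == NO_MORE) return new int[] {start, end};
      minimizing = true;
    }
  }

  /** Live sub-iterator positions, i.e. what a caching heuristic would read after the match is reported. */
  static int[] positionsAfterMatch(String text, String... terms) {
    String[] tokens = text.split(" ");
    TermIter[] subs = Arrays.stream(terms).map(t -> new TermIter(tokens, t)).toArray(TermIter[]::new);
    int[] match = firstMatch(subs);
    int[] live = Arrays.stream(subs).mapToInt(TermIter::start).toArray();
    System.out.println(text + ": match " + Arrays.toString(match) + ", sub-iterators at " + Arrays.toString(live));
    return live;
  }

  public static void main(String[] args) {
    positionsAfterMatch("a b a", "a", "b", "a");               // match [0, 2], sub-iterators at [2, 1, 2]
    positionsAfterMatch("a b a b a", "a", "b", "a", "b", "a"); // match [0, 4], sub-iterators at [2, 3, 4, 3, 4]
  }
}
```

Every live position lies inside the reported window, so the "inside the window means sub-match" heuristic accepts them, yet position 0 (and, for the longer query, position 1) is never seen.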
https://issues.apache.org/jira/browse/LUCENE-10140
Solution
It looks like there is no way to determine the right moment to cache from the caching iterator's perspective, so I propose adding an interface that allows minimizing IntervalIterators to notify their sub-sources while those sub-sources are positioned at sub-match positions.
There is a distinct pointcut for such a notification: the slop calculation inside nextInterval().
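A minimal sketch of the notification idea, assuming width-1 sub-intervals and assuming slop is the window length minus the summed sub-interval widths. `MatchCallback`, `CachingSubIterator`, `slopAndNotify` and `SlopNotificationDemo` are hypothetical names, not the patch's actual API. The parent computes slop while all sub-iterators still sit on their sub-matches, notifies them at that moment, and only then runs the minimization step. The cached position therefore survives even after the live position has moved.

```java
import java.util.List;

/** Hypothetical callback a minimizing parent invokes while its sub-iterators sit on a sub-match. */
interface MatchCallback {
  void onMatch();
}

/** Hypothetical sub-iterator wrapper that caches its position only when notified. */
final class CachingSubIterator implements MatchCallback {
  private final int[] positions; // positions of one term; each interval has width 1
  private int idx = 0;
  private int cached = -1;

  CachingSubIterator(int... positions) { this.positions = positions; }

  int position() { return positions[idx]; }
  void advance() { idx++; } // this sketch does not guard against running past the last position
  int cached() { return cached; }

  @Override
  public void onMatch() { cached = position(); } // snapshot the sub-match, not a later position
}

class SlopNotificationDemo {
  /** Computes slop for width-1 sub-intervals and notifies every sub-iterator at that point. */
  static int slopAndNotify(List<CachingSubIterator> subs) {
    int start = subs.get(0).position();
    int end = subs.get(subs.size() - 1).position();
    int slop = end - start + 1 - subs.size(); // window length minus summed widths (all 1 here)
    subs.forEach(MatchCallback::onMatch);
    return slop;
  }

  public static void main(String[] args) {
    // ORDERED("a", "b", "a") over "a b a": sub-iterators positioned on 0, 1, 2.
    CachingSubIterator first = new CachingSubIterator(0, 2);
    CachingSubIterator b = new CachingSubIterator(1);
    CachingSubIterator last = new CachingSubIterator(0, 2);
    last.advance(); // the parent moved it past position 0 so that it follows "b"

    int slop = slopAndNotify(List.of(first, b, last));
    first.advance(); // minimization step: the first "a" moves to 2, off its sub-match
    System.out.println("slop=" + slop + ", live=" + first.position() + ", cached=" + first.cached());
  }
}
```

Tying the notification to the slop calculation means the callback fires exactly once per candidate match, before any sub-iterator is moved again.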
Also, I think MinimumShouldMatchIntervalsSource deserves some refactoring after this, and I'm not sure that BLOCK is actually a minimizing interval source.
This patch also resolves LUCENE-10075 (#270), because it removes the extra endPosition() call after the last interval has been reached.
Tests
Added a test for LUCENE-10075 and a new test for highlighting "a b a b a".
Checklist
Please review the following and check all that apply:
- main branch
- ./gradlew check