Remove empty results before merging #126770
Conversation
We addressed the empty top docs issue with elastic#126385 specifically for scenarios where empty top docs don't go over the wire. Yet they may be serialized from the data node back to the coordinating node, in which case they will no longer be equal to Lucene#EMPTY_TOP_DOCS. This commit expands the existing filtering of empty top docs to also cover those that went through serialization. Closes elastic#126742
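For context, here is a minimal sketch of why an identity-based filter such as `td != Lucene.EMPTY_TOP_DOCS` stops matching once an empty result has been round-tripped over the wire. This is not code from this PR; the constant is a stand-in for `org.elasticsearch.common.lucene.Lucene#EMPTY_TOP_DOCS`.

```java
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.TotalHits;

// Illustrative only: a stand-in for Lucene#EMPTY_TOP_DOCS.
final class EmptyTopDocsIdentity {
    static final TopDocs EMPTY_TOP_DOCS =
        new TopDocs(new TotalHits(0, TotalHits.Relation.EQUAL_TO), new ScoreDoc[0]);

    public static void main(String[] args) {
        TopDocs local = EMPTY_TOP_DOCS;
        // roughly what deserialization produces: an equal-looking but distinct instance
        TopDocs overTheWire = new TopDocs(local.totalHits, local.scoreDocs);

        System.out.println(local == EMPTY_TOP_DOCS);       // true: caught by an identity filter
        System.out.println(overTheWire == EMPTY_TOP_DOCS); // false: slips past the filter
    }
}
```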
Pinging @elastic/es-search-foundations (Team:Search Foundations)
Hi @javanna, I've created a changelog YAML for you.
LGTM, thanks!
@@ -150,11 +150,12 @@ static TopDocs mergeTopDocs(List<TopDocs> results, int topN, int from) {
return topDocs;
} else if (topDocs instanceof TopFieldGroups firstTopDocs) {
final Sort sort = new Sort(firstTopDocs.fields);
final TopFieldGroups[] shardTopDocs = results.stream().filter(td -> td != Lucene.EMPTY_TOP_DOCS).toArray(TopFieldGroups[]::new);
assert results.stream().noneMatch(topDoc -> topDoc == Lucene.EMPTY_TOP_DOCS);
final TopFieldGroups[] shardTopDocs = results.toArray(TopFieldGroups[]::new);
@original-brownbear can you keep me honest here? The filtering broke the field collapsing tests, so I removed it entirely here (see TopFieldGroups#merge). We basically can't deal with empty top docs in there, so we can only assert that that scenario does not arise. Is that the case in practice?
I'm not entirely sure we can do that. Can't we deal with this case by simply returning `null` here like we did before, if the resulting `shardTopDocs` after filtering are an empty array? (I think there are cases where this can legitimately be empty, looking at org.elasticsearch.lucene.grouping.SinglePassGroupingCollector#getTopGroups for example?)
All of that said, I'm really starting to think I simply did a bit of a bad job here upstream. If we simply never pass the empty top docs into this method (like we used to without data node side batching), then we would return `null` out of the box and behavior would be unchanged. But for a quick fix, returning `null` should do the trick?
I tried the null handling one more time, and it seems to work this time. I had to add a bunch of odd null checks, but at least we can now distinguish between proper empty top docs and our own placeholder.
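A rough sketch of the distinction described here, assuming `null` is used purely as the coordinator-side placeholder; class and method names are illustrative, not the PR's actual code:

```java
import java.util.List;
import java.util.Objects;
import org.apache.lucene.search.TopDocs;

// Illustrative sketch: `null` means "nothing to merge yet", while a real TopDocs
// with zero hits coming from a shard still participates in the merge as usual.
final class PlaceholderAwareMerge {
    static TopDocs merge(List<TopDocs> partialResults, int topN) {
        List<TopDocs> topDocsList = partialResults.stream().filter(Objects::nonNull).toList();
        if (topDocsList.isEmpty()) {
            return null; // only placeholders were seen: keep propagating null
        }
        // an empty-but-real TopDocs is merged like any other shard result
        return TopDocs.merge(topN, topDocsList.toArray(new TopDocs[0]));
    }
}
```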
@@ -722,7 +721,7 @@ private static final class QueryPerNodeState {

private static final QueryPhaseResultConsumer.MergeResult EMPTY_PARTIAL_MERGE_RESULT = new QueryPhaseResultConsumer.MergeResult(
List.of(),
Lucene.EMPTY_TOP_DOCS,
null,
I am restoring using null, which is what we did before batched execution.
@@ -358,7 +364,7 @@ private MergeResult partialReduce(
}
}
// we have to merge here in the same way we collect on a shard
newTopDocs = topDocsList == null ? Lucene.EMPTY_TOP_DOCS : mergeTopDocs(topDocsList, topNSize, 0);
newTopDocs = topDocsList == null ? null : mergeTopDocs(topDocsList, topNSize, 0);
I am restoring using null, which is what we did before batched execution.
@@ -140,24 +141,26 @@ static SortedTopDocs sortDocs(
}

static TopDocs mergeTopDocs(List<TopDocs> results, int topN, int from) {
if (results.isEmpty()) {
List<TopDocs> topDocsList = results.stream().filter(Objects::nonNull).toList();
I am still unclear how `null` worked before without the filtering. Stuff breaks pretty badly if you let `null` go through here, but I don't understand how that did not happen before we introduced batched execution, given that we did allow `null`. I guess we were never calling mergeTopDocs against null results, while we do so now.
Exactly, not letting the `null`s get here was kinda my point initially. I think the way this PR does it now should be fine, let me take a look :)
We never added `null` to this list because we didn't even queue empty query shard responses for merging. This sort of changed now because suggest+profiling aren't (yet) batched, since we have no existing partial merge logic for those.
We can clean this up in a follow-up I had planned which leaves this logic on the transport thread (discussed in the previous PR on this); for now I think doing the filtering down here is fine :)
Sounds good, thanks. Hopefully this fixes the test failures once and for all at least.
LGTM unless the BwC tests complain :) thanks!
@@ -384,7 +384,9 @@ public static void writeTotalHits(StreamOutput out, TotalHits totalHits) throws
* by shard for sorting purposes.
*/
public static void writeTopDocsIncludingShardIndex(StreamOutput out, TopDocs topDocs) throws IOException {
if (topDocs instanceof TopFieldGroups topFieldGroups) {
if (topDocs == null) {
I wonder if we should add BwC for this? I guess there's no need if we merge to 8.19 right away, since those tests didn't break in the first place and therefore we know that this branch is never taken :)
I was assuming that we are good because we are behind a feature flag. There has been no rollout with the feature enabled, hence no backwards compatibility guarantees?
We test against the 8.19 snapshot (and, it being a snapshot, it has the flag flipped), and it seems like we did indeed fail the run against it?
Oh boy, I see. It seems like a bit of an artificial problem around testing that will go away once we backport the fix to 8.x, but I guess it may end up breaking the world in between. I wonder if it makes sense to disable batching temporarily before merging this on both main and 8.19. Otherwise we need a transport version indeed, which we cannot remove later?
I think you can, by the same logic, drop the other batched execution version constant once you're done backporting. That's actually fewer PRs, I think? You just add the constant here, then backport, then adjust the version usage in main and remove the existing batched exec version constant in one PR.
-> 3 steps vs. 2 turn-it-off PRs + 1 merge and then two more for flipping the feature on again.
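For readers following the backport discussion, the usual Elasticsearch pattern for gating a wire-format change like the null marker on a transport version looks roughly like the sketch below. The constant `NULLABLE_TOP_DOCS` and the `writeTopDocs` helper are hypothetical stand-ins, not what this PR actually adds.

```java
import java.io.IOException;
import org.apache.lucene.search.TopDocs;
import org.elasticsearch.TransportVersion;
import org.elasticsearch.common.io.stream.StreamOutput;

// Illustrative only: the version constant and helper are placeholders.
final class NullAwareTopDocsWriter {
    static final TransportVersion NULLABLE_TOP_DOCS = TransportVersion.current(); // placeholder value

    static void write(StreamOutput out, TopDocs topDocs) throws IOException {
        if (out.getTransportVersion().onOrAfter(NULLABLE_TOP_DOCS)) {
            out.writeBoolean(topDocs != null); // newer nodes understand an explicit "no results" marker
            if (topDocs != null) {
                writeTopDocs(out, topDocs);
            }
        } else {
            writeTopDocs(out, topDocs); // older nodes expect a non-null value
        }
    }

    private static void writeTopDocs(StreamOutput out, TopDocs topDocs) throws IOException {
        // stand-in for the existing TopDocs serialization
    }
}
```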
Will do so, thanks.
mergedTopDocs = TopDocs.merge(sort, from, topN, shardTopDocs);
} else {
final TopDocs[] shardTopDocs = results.toArray(new TopDocs[numShards]);
final TopDocs[] shardTopDocs = topDocsList.toArray(new TopDocs[numShards]);
Nit: here and in the other array creation spots, if we change the line anyway, let's make it a clean and faster zero-size (`new Type[0]`) initialization for all of them, like we had it before at some point? :)
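To spell out the nit, a small sketch of the zero-sized-array form of `toArray`; the class and method names are illustrative only:

```java
import java.util.List;
import org.apache.lucene.search.TopDocs;

final class ToArrayNit {
    static TopDocs[] toShardTopDocs(List<TopDocs> topDocsList) {
        // zero-sized array: the list allocates a correctly sized array itself,
        // which is the cleaner idiom (and typically no slower on modern JVMs)
        return topDocsList.toArray(new TopDocs[0]);
        // rather than: topDocsList.toArray(new TopDocs[numShards]);
    }
}
```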
@original-brownbear it seems like I have made it, finally. I am going to merge this.
💔 Backport failed
You can use sqren/backport to manually backport by running
This commit forward ports the transport version added with elastic#126770 in 8.x, and adjusts the corresponding version conditionals.