
Remove empty results before merging #126770


Merged: javanna merged 11 commits into elastic:main from fix/merge_remove_empty_results on Apr 17, 2025

Conversation

@javanna (Member) commented Apr 14, 2025

We addressed the empty top docs issue with #126385, specifically for scenarios where empty top docs don't go over the wire. Yet they may be serialized from the data nodes back to the coordinating node, in which case they will no longer be reference-equal to Lucene#EMPTY_TOP_DOCS.

This commit expands the existing filtering of empty top docs to also cover those that went through serialization.

Closes #126742
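
For illustration, a minimal sketch of why the earlier identity-based filtering stops working once an empty result crosses the wire (simplified; not the actual serialization code):

    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.search.TotalHits;
    import org.elasticsearch.common.lucene.Lucene;

    // The sentinel is a shared instance, so identity filtering works locally.
    // A deserialized empty TopDocs is a fresh object with the same content,
    // which the identity check no longer catches.
    static void identityCheckGap() {
        TopDocs local = Lucene.EMPTY_TOP_DOCS;
        TopDocs overTheWire = new TopDocs(new TotalHits(0, TotalHits.Relation.EQUAL_TO), new ScoreDoc[0]);

        assert local == Lucene.EMPTY_TOP_DOCS;        // filtered by td != Lucene.EMPTY_TOP_DOCS
        assert overTheWire != Lucene.EMPTY_TOP_DOCS;  // slips through the identity filter
    }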

@javanna added the >bug, auto-backport (Automatically create backport pull requests when merged), :Search Foundations/Search (Catch all for Search Foundations), v8.19.0, and v9.1.0 labels on Apr 14, 2025
@elasticsearchmachine added the Team:Search Foundations (Meta label for the Search Foundations team in Elasticsearch) label on Apr 14, 2025
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@elasticsearchmachine (Collaborator)

Hi @javanna, I've created a changelog YAML for you.

@original-brownbear (Member) left a comment

LGTM, thanks!

@@ -150,11 +150,12 @@ static TopDocs mergeTopDocs(List<TopDocs> results, int topN, int from) {
             return topDocs;
         } else if (topDocs instanceof TopFieldGroups firstTopDocs) {
             final Sort sort = new Sort(firstTopDocs.fields);
-            final TopFieldGroups[] shardTopDocs = results.stream().filter(td -> td != Lucene.EMPTY_TOP_DOCS).toArray(TopFieldGroups[]::new);
+            assert results.stream().noneMatch(topDoc -> topDoc == Lucene.EMPTY_TOP_DOCS);
+            final TopFieldGroups[] shardTopDocs = results.toArray(TopFieldGroups[]::new);
@javanna (Member, Author)

@original-brownbear can you keep me honest here? The filtering broke the field collapsing tests, so I removed it entirely here (see TopFieldGroups#merge). We basically can't deal with empty top docs in there, so we can only assert that that scenario does not occur. Is that the case in practice?

@original-brownbear (Member) commented Apr 14, 2025

I'm not entirely sure we can do that. Can't we deal with this case by simply returning null here, like we did before, if the resulting shardTopDocs after filtering is an empty array? (I think there are cases where this can legitimately be empty, looking at org.elasticsearch.lucene.grouping.SinglePassGroupingCollector#getTopGroups for example.)

All of that said, I'm really starting to think I simply did a bit of a bad job here upstream. If we simply never passed the empty top docs into this method (like we used to without data-node-side batching), then we would return null out of the box and behavior would be unchanged. But for a quick fix, returning null should do the trick?
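
A sketch of the quick fix being suggested here (shape assumed from the surrounding diff, not the final code):

    // Quick-fix sketch: filter the placeholders out, and bail with null
    // when nothing real remains to merge.
    final TopFieldGroups[] shardTopDocs = results.stream()
        .filter(td -> td != Lucene.EMPTY_TOP_DOCS)
        .toArray(TopFieldGroups[]::new);
    if (shardTopDocs.length == 0) {
        return null;  // behaves like the pre-batched-execution code path
    }
    // ... otherwise proceed with TopFieldGroups#merge as before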

@javanna (Member, Author)

I tried the null handling one more time, and it seems to work this time. I had to add a bunch of odd null checks, but at least we can now distinguish between proper empty top docs and our own placeholder.
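
Roughly, the distinction the null checks now make possible (an illustrative sketch, not the exact code):

    import java.util.List;
    import java.util.Objects;

    // null             -> our own placeholder: this partial merge produced no top docs at all
    // zero-hit TopDocs -> a genuine empty result that downstream merging must still handle
    List<TopDocs> toMerge = results.stream()
        .filter(Objects::nonNull)  // drop only the placeholders
        .toList();                 // real (possibly empty) TopDocs merge normally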

@@ -722,7 +721,7 @@ private static final class QueryPerNodeState {

         private static final QueryPhaseResultConsumer.MergeResult EMPTY_PARTIAL_MERGE_RESULT = new QueryPhaseResultConsumer.MergeResult(
             List.of(),
-            Lucene.EMPTY_TOP_DOCS,
+            null,
@javanna (Member, Author)

I am restoring the use of null, which is what we did before batched execution.

@@ -358,7 +364,7 @@ private MergeResult partialReduce(
                 }
             }
             // we have to merge here in the same way we collect on a shard
-            newTopDocs = topDocsList == null ? Lucene.EMPTY_TOP_DOCS : mergeTopDocs(topDocsList, topNSize, 0);
+            newTopDocs = topDocsList == null ? null : mergeTopDocs(topDocsList, topNSize, 0);
@javanna (Member, Author)

I am restoring the use of null here as well, which is what we did before batched execution.

@@ -140,24 +141,26 @@ static SortedTopDocs sortDocs(
     }

     static TopDocs mergeTopDocs(List<TopDocs> results, int topN, int from) {
-        if (results.isEmpty()) {
+        List<TopDocs> topDocsList = results.stream().filter(Objects::nonNull).toList();
@javanna (Member, Author)

I am still unclear how null worked before without the filtering. Stuff breaks pretty badly if you let null go through here, and I don't understand how that did not happen before we introduced batched execution, given that we did allow null. I guess we were never calling mergeTopDocs against null results, while we do so now.
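
A tiny illustration of why an unfiltered null breaks the merge (using Lucene's static TopDocs.merge; illustrative only):

    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.search.TotalHits;

    // Lucene's merge dereferences every element of the array,
    // so a null placeholder fails fast.
    static TopDocs mergeWithNull() {
        TopDocs real = new TopDocs(
            new TotalHits(1, TotalHits.Relation.EQUAL_TO),
            new ScoreDoc[] { new ScoreDoc(0, 1.0f) }
        );
        return TopDocs.merge(10, new TopDocs[] { real, null });  // throws NullPointerException
    }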

@original-brownbear (Member)

Exactly, not letting the nulls get here was kind of my point initially. I think the way this PR does it now should be fine; let me take a look :)

@original-brownbear (Member)

We never added null to this list because we didn't even queue empty query shard responses for merging. This sort of changed now because suggest and profiling aren't (yet) batched, since we have no existing partial merge logic for those.
We can clean this up in a follow-up I had planned which leaves this logic on the transport thread (discussed in the previous PR on this); for now I think doing the filtering down here is fine :)

@javanna (Member, Author)

Sounds good, thanks. Hopefully this fixes the test failures once and for all.

@original-brownbear (Member) left a comment

LGTM unless the BwC tests complain :) thanks!

@@ -384,7 +384,9 @@ public static void writeTotalHits(StreamOutput out, TotalHits totalHits) throws
      * by shard for sorting purposes.
      */
     public static void writeTopDocsIncludingShardIndex(StreamOutput out, TopDocs topDocs) throws IOException {
-        if (topDocs instanceof TopFieldGroups topFieldGroups) {
+        if (topDocs == null) {
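
The change above makes the writer handle the null placeholder explicitly. A minimal sketch of the general null-marker pattern (method name hypothetical; not the actual Elasticsearch implementation):

    import java.io.IOException;
    import org.apache.lucene.search.TopDocs;
    import org.elasticsearch.common.io.stream.StreamOutput;

    // A discriminator byte tells the reading side whether a TopDocs payload
    // follows; the reader mirrors this with a readByte() check.
    public static void writeOptionalTopDocs(StreamOutput out, TopDocs topDocs) throws IOException {
        if (topDocs == null) {
            out.writeByte((byte) 0);  // marker: no top docs serialized
            return;
        }
        out.writeByte((byte) 1);      // marker: TopDocs payload follows
        // ... write the concrete TopDocs fields, as the existing code does
    }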
@original-brownbear (Member)

I wonder if we should add BwC coverage for this? I guess there's no need if we merge to 8.19 right away, since those tests didn't break in the first place and therefore we know that this branch is never taken :)

@javanna (Member, Author)

I was assuming that we are good because we are behind a feature flag. There has been no rollout with the feature enabled, hence no backwards compatibility guarantees?

@original-brownbear (Member) commented Apr 16, 2025

We test against the 8.19 snapshot (and, it being a snapshot, it has the flag flipped), and it seems we did indeed fail the run against it?

@javanna (Member, Author)

Oh boy, I see. It seems like a bit of an artificial testing problem that will go away once we backport the fix to 8.x, but I guess it may end up breaking the world in between. I wonder if it makes sense to disable batching temporarily before merging this on both main and 8.19. Otherwise we do need a transport version, which we cannot remove later?

@original-brownbear (Member)

I think, by the same logic, you can drop the other batched execution version constant once you're done backporting. That's actually fewer PRs, I think? You just add the constant here, then backport, then adjust the version usage in main and remove the existing batched execution version constant in one PR.
That's 3 steps, versus 2 turn-it-off PRs + 1 merge and then two more for flipping the feature back on.
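
In code, the gating would look roughly like this (the constant name below is hypothetical; the real constant was added alongside the backport):

    // Hypothetical sketch: gate the new wire format on a transport version so
    // nodes that predate the fix never receive the null marker.
    if (out.getTransportVersion().onOrAfter(TransportVersions.NULL_TOP_DOCS_MARKER /* assumed name */)) {
        writeOptionalTopDocs(out, topDocs);             // new format: explicit null marker
    } else {
        writeTopDocsIncludingShardIndex(out, topDocs);  // legacy format: expects non-null
    }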

@javanna (Member, Author)

Will do so, thanks.

             mergedTopDocs = TopDocs.merge(sort, from, topN, shardTopDocs);
         } else {
-            final TopDocs[] shardTopDocs = results.toArray(new TopDocs[numShards]);
+            final TopDocs[] shardTopDocs = topDocsList.toArray(new TopDocs[numShards]);
@original-brownbear (Member)

Nit: here and in the other array creation spots, if we're changing the line anyway, let's use the clean and faster zero-size (new Type[0]) initialization for all of them, like we had at some point before :)
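
For reference, the zero-size form being suggested; modern JVMs allocate the right-sized array internally, so this is both cleaner and typically faster:

    // Preferred idiom: pass a zero-length array and let toArray size it.
    final TopDocs[] shardTopDocs = topDocsList.toArray(new TopDocs[0]);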

@javanna (Member, Author) commented Apr 17, 2025

@original-brownbear it seems like I finally made it. I am going to merge this.

@javanna merged commit f274ab7 into elastic:main on Apr 17, 2025 (16 of 17 checks passed)
@javanna deleted the fix/merge_remove_empty_results branch on April 17, 2025 at 15:36
@elasticsearchmachine (Collaborator)

💔 Backport failed

Branch 8.x: Commit could not be cherry-picked due to conflicts

You can use sqren/backport to manually backport by running: backport --upstream elastic/elasticsearch --pr 126770

elasticsearchmachine pushed a commit that referenced this pull request Apr 17, 2025
javanna added a commit to javanna/elasticsearch that referenced this pull request on Apr 18, 2025:
This commit forward ports the transport version added with elastic#126770 in 8.x, and adjusts the corresponding version conditionals.

javanna added a commit that referenced this pull request on Apr 21, 2025:
This commit forward ports the transport version added with #126770 in 8.x, and adjusts the corresponding version conditionals.
Development

Successfully merging this pull request may close these issues.

[CI] GeoDistanceIT testDistanceSortingWithUnmappedField failing