-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Support partial results in CCS in ES|QL #122708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
2c5e548
to
ba177e9
Compare
60e6cb6
to
5420891
Compare
5420891
to
b88b208
Compare
Hi @dnhatn, I've created a changelog YAML for you. |
Pinging @elastic/es-analytical-engine (Team:Analytics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, Nhat. Really nice work. Left nits and questions.
...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/DataNodeComputeHandler.java
Outdated
Show resolved
Hide resolved
...sterTest/java/org/elasticsearch/xpack/esql/action/CrossClusterQueryWithPartialResultsIT.java
Outdated
Show resolved
Hide resolved
...sterTest/java/org/elasticsearch/xpack/esql/action/CrossClusterQueryWithPartialResultsIT.java
Outdated
Show resolved
Hide resolved
executionInfo.isSkipUnavailable(clusterAlias) == false, | ||
() -> {}, | ||
failFast, | ||
pagesFetched::incrementAndGet, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we're only using pagesFetched
as a boolean flag, and in fact only every use onPageFetched
to essentially trigger this flag. Is deeper usage planned in the future, or if not maybe it can be simplified somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a boolean is good enough. I pushed 9283823. Thanks!
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/ClusterComputeHandler.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/ClusterComputeHandler.java
Outdated
Show resolved
Hide resolved
@@ -211,7 +214,8 @@ public void execute( | |||
computeListener.acquireCompute().delegateFailure((l, profiles) -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, here I am not 100% sure of the intended logic - for a remote, it looks like if configuration.allowPartialResults()
is true and any result is delivered, any error is converted to partial. Is that also the case for local results? If yes, I don't see what checks it in the code. If no, then it looks like local and remote behaves differently with regard to partial failure logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the search_shards can fail when a node shuts down, and I've pushed d3be47f to handle this case. We also need to handle partial results for the final pipeline, but that's for a follow-up.
assertThat(id, is(in(allIds))); | ||
} | ||
if (request.includeCCSMetadata()) { | ||
EsqlExecutionInfo.Cluster localInfo = resp.getExecutionInfo().getCluster(LOCAL_CLUSTER); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably should be a common method, but we can refactor it in a later patch.
...sterTest/java/org/elasticsearch/xpack/esql/action/CrossClusterQueryWithPartialResultsIT.java
Outdated
Show resolved
Hide resolved
assertThat(localInfo.getStatus(), equalTo(EsqlExecutionInfo.Cluster.Status.SUCCESSFUL)); | ||
|
||
EsqlExecutionInfo.Cluster remoteInfo = resp.getExecutionInfo().getCluster(REMOTE_CLUSTER_1); | ||
assertThat(remoteInfo.getStatus(), equalTo(EsqlExecutionInfo.Cluster.Status.PARTIAL)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: if we fail to open an exchange, we couldn't have received any data from the remote. Doesn't it mean it should be SKIPPED, not PARTIAL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was deciding between FAILED and PARTIAL and chose PARTIAL for simplicity. We can revisit it later when writing the documentation for this.
} | ||
} | ||
|
||
public void testFailToStartRequestOnRemoteCluster() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to supercede testCloseSkipUnavailable
in CrossClusterCancellationIT
so we may not need that one anymore.
@smalyshev @quux00 Thanks for reviewing. |
💚 Backport successful
|
A follow-up to #121942 that adds support for partial results in CCS in ES|QL.
Relates #121942