-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Support partial results in ES|QL #121942
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support partial results in ES|QL #121942
Conversation
fe6619b
to
3c28c6f
Compare
3c28c6f
to
9af5872
Compare
9af5872
to
e2a3bf1
Compare
Hi @dnhatn, I've created a changelog YAML for you. |
Pinging @elastic/es-analytical-engine (Team:Analytics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good! Minor nits.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/DataNodeRequestSender.java
Outdated
Show resolved
Hide resolved
@@ -92,7 +92,8 @@ public ActionListener<List<DriverProfile>> getClusterRequestListener() { | |||
return clusterRequestListener; | |||
} | |||
|
|||
private CancellableTask createGroupTask(Task parentTask, Supplier<String> description) { | |||
public static CancellableTask createGroupTask(TransportService transportService, Task parentTask, Supplier<String> description) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why not pass in the TaskManager object rather than the TransportService to get the TaskManager. Seems like an unnecessary layer of indirection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need TransportService to get the local node id.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/Configuration.java
Outdated
Show resolved
Hide resolved
if (supportPartialResults(out.getTransportVersion())) { | ||
out.writeBoolean(allowPartialResults); | ||
} else if (allowPartialResults) { | ||
throw new IllegalArgumentException("allow_partial_result is not supported in this version"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this what we want to do in this case? If I send a CCS request with allow partials set and one of my clusters is old and does not support this option, I think I'd rather prefer it to just work without allowing partials than fail immediately. I mean, this whole thing is aimed at allowing queries to work in presence of some problems, so I think it makes sense to be more lenient here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The general principle in Elasticsearch is to reject queries that cannot be properly handled. Users expect queries with allow_partial_results=false
to skip failures (e.g., missing shards) and return partial results. However, this won't be fulfilled if remote clusters or nodes in a mixed cluster are running older versions. Therefore, we should reject these queries. Users need to upgrade to leverage this feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, I'm not sure I got this right. In this case we're talking about allow_partial_results=true
(since it's under if (allowPartialResults)
) and this says "skip missing shards". It is true that on older clusters it's not possible, but isn't total query failure even worse? I would prefer an option to have at least some failures skipped to never being able to use it at all until all clusters everywhere (which may not even be under my control in CCS case, could be entirely different setup by other team) upgrade. I'd be in a situation that if I query even one single old cluster, even one that never fails, I am unable to use allow_partial_results=true
for any of my own clusters, even upgraded ones. I think this is not the optimal result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand your point, but this is how we introduce new features in Elasticsearch. We should reject requests with allow_partial_results=true that are explicitly set by users if the clusters are not ready to handle them properly.
However, I've allowed this serialization to pass through so we can switch the default for allow_partial_results to true in 8.19 and 9.1.
...ql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/EsqlActionBreakerIT.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlExecutionInfo.java
Outdated
Show resolved
Hide resolved
@quux00 @smalyshev Thanks for reviewing. I think it's ready again. Can you take another look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I am approving it but I think we need to have further discussion on older cluster versions, maybe ask PM opinion for that, because I feel failing the partials=true query just for the presence of the old cluster is wrong (unless it leads to some technical breakage which I don't see now of course).
@smalyshev @quux00 Thanks for reviewing. |
This change introduces partial results in ES|QL. To minimize the scope of the changes, this PR is just the first step toward full support for partial results. The following follow-up tasks are required: - Support partial results across clusters - Return shard-level failures (currently, we only return the `is_partial` flag) - Add documentation - Allow partial results during resolution
This change introduces partial results in ES|QL. To minimize the scope of the changes, this PR is just the first step toward full support for partial results.