Skip to content

Support partial results in ES|QL #121942

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Feb 15, 2025
Merged

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Feb 6, 2025

This change introduces partial results in ES|QL. To minimize the scope of the changes, this PR is just the first step toward full support for partial results.

@dnhatn dnhatn force-pushed the allow-partial-results branch 11 times, most recently from fe6619b to 3c28c6f Compare February 13, 2025 18:13
@dnhatn dnhatn changed the title WIP Allow partial results in ES|QL Feb 13, 2025
@dnhatn dnhatn force-pushed the allow-partial-results branch from 3c28c6f to 9af5872 Compare February 13, 2025 19:02
@elasticsearchmachine
Copy link
Collaborator

Hi @dnhatn, I've created a changelog YAML for you.

@dnhatn dnhatn marked this pull request as ready for review February 13, 2025 21:34
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Feb 13, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@dnhatn dnhatn changed the title Allow partial results in ES|QL Support partial results in ES|QL Feb 13, 2025
Copy link
Contributor

@quux00 quux00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good! Minor nits.

@@ -92,7 +92,8 @@ public ActionListener<List<DriverProfile>> getClusterRequestListener() {
return clusterRequestListener;
}

private CancellableTask createGroupTask(Task parentTask, Supplier<String> description) {
public static CancellableTask createGroupTask(TransportService transportService, Task parentTask, Supplier<String> description) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why not pass in the TaskManager object rather than the TransportService to get the TaskManager. Seems like an unnecessary layer of indirection?

Copy link
Member Author

@dnhatn dnhatn Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need TransportService to get the local node id.

if (supportPartialResults(out.getTransportVersion())) {
out.writeBoolean(allowPartialResults);
} else if (allowPartialResults) {
throw new IllegalArgumentException("allow_partial_result is not supported in this version");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this what we want to do in this case? If I send a CCS request with allow partials set and one of my clusters is old and does not support this option, I think I'd rather prefer it to just work without allowing partials than fail immediately. I mean, this whole thing is aimed at allowing queries to work in presence of some problems, so I think it makes sense to be more lenient here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general principle in Elasticsearch is to reject queries that cannot be properly handled. Users expect queries with allow_partial_results=false to skip failures (e.g., missing shards) and return partial results. However, this won't be fulfilled if remote clusters or nodes in a mixed cluster are running older versions. Therefore, we should reject these queries. Users need to upgrade to leverage this feature.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, I'm not sure I got this right. In this case we're talking about allow_partial_results=true (since it's under if (allowPartialResults)) and this says "skip missing shards". It is true that on older clusters it's not possible, but isn't total query failure even worse? I would prefer an option to have at least some failures skipped to never being able to use it at all until all clusters everywhere (which may not even be under my control in CCS case, could be entirely different setup by other team) upgrade. I'd be in a situation that if I query even one single old cluster, even one that never fails, I am unable to use allow_partial_results=true for any of my own clusters, even upgraded ones. I think this is not the optimal result.

Copy link
Member Author

@dnhatn dnhatn Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your point, but this is how we introduce new features in Elasticsearch. We should reject requests with allow_partial_results=true that are explicitly set by users if the clusters are not ready to handle them properly.

However, I've allowed this serialization to pass through so we can switch the default for allow_partial_results to true in 8.19 and 9.1.

@dnhatn dnhatn requested review from smalyshev and quux00 February 14, 2025 21:26
@dnhatn
Copy link
Member Author

dnhatn commented Feb 14, 2025

@quux00 @smalyshev Thanks for reviewing. I think it's ready again. Can you take another look?

Copy link
Contributor

@smalyshev smalyshev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I am approving it but I think we need to have further discussion on older cluster versions, maybe ask PM opinion for that, because I feel failing the partials=true query just for the presence of the old cluster is wrong (unless it leads to some technical breakage which I don't see now of course).

@dnhatn
Copy link
Member Author

dnhatn commented Feb 15, 2025

@smalyshev @quux00 Thanks for reviewing.

@dnhatn dnhatn merged commit b07ba89 into elastic:main Feb 15, 2025
17 checks passed
@dnhatn dnhatn deleted the allow-partial-results branch February 15, 2025 06:57
dnhatn added a commit that referenced this pull request Feb 15, 2025
This change introduces partial results in ES|QL. To minimize the scope of the
changes, this PR is just the first step toward full support for partial
results. The following follow-up tasks are required:

- Support partial results across clusters

- Return shard-level failures (currently, we only return the `is_partial` flag)

- Add documentation

- Allow partial results during resolution
dnhatn added a commit that referenced this pull request Feb 20, 2025
A follow-up to #121942 that adds support for partial results in CCS in ES|QL.

Relates #121942
elasticsearchmachine pushed a commit that referenced this pull request Feb 20, 2025
A follow-up to #121942 that adds support for partial results in CCS in ES|QL.

Relates #121942
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.19.0 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants