Skip to content

Cancel expired async search task when a remote returns its results #126583

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

javanna
Copy link
Member

@javanna javanna commented Apr 10, 2025

A while ago we enabled using ccs_minimize_roundtrips in async search. This makes it possible for users of async search to send a single search request per remote cluster, and minimize the impact of network latency.

With non minimized roundtrips, we have pretty recurring cancellation checks: as part of the execution, we detect that a task expired whenever each shard comes back with its results.

In a scenario where the coord node does not hold data, or only remote data is targeted by an async search, we have much less chance of detecting cancellation if roundtrips are minimized. The local coordinator would do nothing other than waiting for the minimized results from each remote cluster. One scenario where we can check for cancellation is when each cluster comes back with its full set of results. This commit adds such check, plus some testing for async search cancellation with minimized roundtrips.

A while ago we enabled using ccs_minimize_roundtrips in async search.
This makes it possible for users of async search to send a single search
request per remote cluster, and minimize the impact of network latency.

With non minimized roundtrips, we have pretty recurring cancellation checks:
as part of the execution, we detect that a task expired whenever each shard comes
back with its results.

In a scenario where the coord node does not hold data, or only remote data is
targeted by an async search, we have much less chance of detecting cancellation
if roundtrips are minimized. The local coordinator would do nothing other than
waiting for the minimized results from each remote cluster.
One scenario where we can check for cancellation is when each cluster comes
back with its full set of results. This commit adds such check, plus some testing
for async search cancellation with minimized roundtrips.
@@ -519,6 +519,7 @@ public void onFinalReduce(List<SearchShard> shards, TotalHits totalHits, Interna
*/
@Override
public void onClusterResponseMinimizeRoundtrips(String clusterAlias, SearchResponse clusterResponse) {
checkCancellation();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the proposed change, all the rest in this PR is testing :)

@elasticsearchmachine
Copy link
Collaborator

Hi @javanna, I've created a changelog YAML for you.

@elasticsearchmachine elasticsearchmachine added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Apr 10, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

latch = remoteQueryLatch.get();
} else if (searchContext.indexShard().shardId().getIndexName().equals("local")) {
startedLocalLatch.get().countDown();
latch = localQueryLatch.get();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By introducing a distinction between local and remote here, I can block local or remote query phase selectively.

assertNotNull(e.getCause());
Throwable t = ExceptionsHelper.unwrap(e, ResourceNotFoundException.class);
assertNotNull("after deletion, getAsyncSearch should throw an Exception with ResourceNotFoundException in the causal chain", t);
expectThrows(ResourceNotFoundException.class, () -> getAsyncSearch(response.getId()));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switching from get to actionGet simplifies some of the expect throws that I added, plus this one that we already had.

@javanna javanna requested a review from smalyshev April 14, 2025 07:16
@javanna
Copy link
Member Author

javanna commented Apr 15, 2025

I opened #126833 for a longer term fix of this issue.

Copy link
Contributor

@quux00 quux00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Nice job rewriting the test framework for this!

@javanna javanna added the auto-backport Automatically create backport pull requests when merged label Apr 16, 2025
@javanna javanna merged commit df83e88 into elastic:main Apr 16, 2025
17 checks passed
@javanna javanna deleted the fix/cancel_task_async_search_minimize_roundtrips branch April 16, 2025 12:22
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
9.0
8.18
8.x
8.17 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 126583

javanna added a commit to javanna/elasticsearch that referenced this pull request Apr 16, 2025
…lastic#126583)

A while ago we enabled using ccs_minimize_roundtrips in async search.
This makes it possible for users of async search to send a single search
request per remote cluster, and minimize the impact of network latency.

With non minimized roundtrips, we have pretty recurring cancellation checks:
as part of the execution, we detect that a task expired whenever each shard comes
back with its results.

In a scenario where the coord node does not hold data, or only remote data is
targeted by an async search, we have much less chance of detecting cancellation
if roundtrips are minimized. The local coordinator would do nothing other than
waiting for the minimized results from each remote cluster.
One scenario where we can check for cancellation is when each cluster comes
back with its full set of results. This commit adds such check, plus some testing
for async search cancellation with minimized roundtrips.
javanna added a commit to javanna/elasticsearch that referenced this pull request Apr 16, 2025
…lastic#126583)

A while ago we enabled using ccs_minimize_roundtrips in async search.
This makes it possible for users of async search to send a single search
request per remote cluster, and minimize the impact of network latency.

With non minimized roundtrips, we have pretty recurring cancellation checks:
as part of the execution, we detect that a task expired whenever each shard comes
back with its results.

In a scenario where the coord node does not hold data, or only remote data is
targeted by an async search, we have much less chance of detecting cancellation
if roundtrips are minimized. The local coordinator would do nothing other than
waiting for the minimized results from each remote cluster.
One scenario where we can check for cancellation is when each cluster comes
back with its full set of results. This commit adds such check, plus some testing
for async search cancellation with minimized roundtrips.
elasticsearchmachine pushed a commit that referenced this pull request Apr 16, 2025
…126583) (#126915)

A while ago we enabled using ccs_minimize_roundtrips in async search.
This makes it possible for users of async search to send a single search
request per remote cluster, and minimize the impact of network latency.

With non minimized roundtrips, we have pretty recurring cancellation checks:
as part of the execution, we detect that a task expired whenever each shard comes
back with its results.

In a scenario where the coord node does not hold data, or only remote data is
targeted by an async search, we have much less chance of detecting cancellation
if roundtrips are minimized. The local coordinator would do nothing other than
waiting for the minimized results from each remote cluster.
One scenario where we can check for cancellation is when each cluster comes
back with its full set of results. This commit adds such check, plus some testing
for async search cancellation with minimized roundtrips.
javanna added a commit to javanna/elasticsearch that referenced this pull request Apr 16, 2025
…lastic#126583)

A while ago we enabled using ccs_minimize_roundtrips in async search.
This makes it possible for users of async search to send a single search
request per remote cluster, and minimize the impact of network latency.

With non minimized roundtrips, we have pretty recurring cancellation checks:
as part of the execution, we detect that a task expired whenever each shard comes
back with its results.

In a scenario where the coord node does not hold data, or only remote data is
targeted by an async search, we have much less chance of detecting cancellation
if roundtrips are minimized. The local coordinator would do nothing other than
waiting for the minimized results from each remote cluster.
One scenario where we can check for cancellation is when each cluster comes
back with its full set of results. This commit adds such check, plus some testing
for async search cancellation with minimized roundtrips.
elasticsearchmachine pushed a commit that referenced this pull request Apr 16, 2025
…126583) (#126931)

A while ago we enabled using ccs_minimize_roundtrips in async search.
This makes it possible for users of async search to send a single search
request per remote cluster, and minimize the impact of network latency.

With non minimized roundtrips, we have pretty recurring cancellation checks:
as part of the execution, we detect that a task expired whenever each shard comes
back with its results.

In a scenario where the coord node does not hold data, or only remote data is
targeted by an async search, we have much less chance of detecting cancellation
if roundtrips are minimized. The local coordinator would do nothing other than
waiting for the minimized results from each remote cluster.
One scenario where we can check for cancellation is when each cluster comes
back with its full set of results. This commit adds such check, plus some testing
for async search cancellation with minimized roundtrips.
elasticsearchmachine pushed a commit that referenced this pull request Apr 17, 2025
…lts (#126583) (#126917)

* Cancel expired async search task when a remote returns its results (#126583)

A while ago we enabled using ccs_minimize_roundtrips in async search.
This makes it possible for users of async search to send a single search
request per remote cluster, and minimize the impact of network latency.

With non minimized roundtrips, we have pretty recurring cancellation checks:
as part of the execution, we detect that a task expired whenever each shard comes
back with its results.

In a scenario where the coord node does not hold data, or only remote data is
targeted by an async search, we have much less chance of detecting cancellation
if roundtrips are minimized. The local coordinator would do nothing other than
waiting for the minimized results from each remote cluster.
One scenario where we can check for cancellation is when each cluster comes
back with its full set of results. This commit adds such check, plus some testing
for async search cancellation with minimized roundtrips.

* compile error
elasticsearchmachine pushed a commit that referenced this pull request Apr 17, 2025
…ults (#126583) (#126916)

* Cancel expired async search task when a remote returns its results (#126583)

A while ago we enabled using ccs_minimize_roundtrips in async search.
This makes it possible for users of async search to send a single search
request per remote cluster, and minimize the impact of network latency.

With non minimized roundtrips, we have pretty recurring cancellation checks:
as part of the execution, we detect that a task expired whenever each shard comes
back with its results.

In a scenario where the coord node does not hold data, or only remote data is
targeted by an async search, we have much less chance of detecting cancellation
if roundtrips are minimized. The local coordinator would do nothing other than
waiting for the minimized results from each remote cluster.
One scenario where we can check for cancellation is when each cluster comes
back with its full set of results. This commit adds such check, plus some testing
for async search cancellation with minimized roundtrips.

* compile error
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged >bug :Search Foundations/CCS Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch v8.17.6 v8.18.1 v8.19.0 v9.0.0 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants