Async search expiration to be independent from external signals #126833
Labels
>bug
priority:high
:Search Foundations/Search
Team:Search Foundations
Submit async search allows users to provide a keep_alive. Get async search and get async status allow users to update the keep_alive while retrieving incremental results or status for a specific async search.
When an async search expires, its running tasks should be cancelled so that they stop doing useless work and release resources for other, non-expired searches. The current cancellation mechanism is based on listener callbacks that actively check for cancellation (and cancel the task when needed) whenever new shard results arrive at the coordinating node or a partial reduction happens. This is not ideal: if all shards take a long time to report back to the coordinating node, no callback fires in the meantime, so the search can expire without the coordinating node cancelling it promptly.
There is also a discrepancy in that get async search includes a check for cancellation as well, while get async status does not.
In a cross-cluster scenario this gets worse: when minimizing roundtrips, each cluster only reports back with its full results, so callbacks are too infrequent to rely on for detecting and cancelling expired tasks.
We should redesign the cancellation mechanism for async search so that it does not depend on external signals: when a new async search starts, schedule a runnable that cancels it at its expiration time. When a keep_alive gets extended, cancel the previously scheduled runnable and schedule a new one. This makes cancellation independent of both the progress made by the search and which API the user calls to poll for status.
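The proposal above could be sketched roughly as follows. This is only an illustration built on the JDK's ScheduledExecutorService, not on Elasticsearch's own thread pool or task management APIs. The class ExpirationScheduler, its methods, and the cancelTask callback are hypothetical names introduced for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not an Elasticsearch class): cancels a search when its keep_alive elapses. */
public class ExpirationScheduler implements AutoCloseable {

    /** A scheduled expiration plus a generation number used to detect superseded runnables. */
    private static final class Scheduled {
        final long generation;
        final ScheduledFuture<?> future;

        Scheduled(long generation, ScheduledFuture<?> future) {
            this.generation = generation;
            this.future = future;
        }
    }

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final Map<String, Scheduled> pending = new HashMap<>();
    private long nextGeneration = 0;

    /**
     * Called when a search is submitted and again whenever its keep_alive is extended.
     * Any previously scheduled expiration for the same search is cancelled first.
     * cancelTask is an assumed callback that cancels the running search task.
     */
    public synchronized void scheduleOrExtend(String searchId, long keepAliveMillis, Runnable cancelTask) {
        Scheduled previous = pending.get(searchId);
        if (previous != null) {
            previous.future.cancel(false); // do not interrupt a runnable that has already started
        }
        long generation = nextGeneration++;
        ScheduledFuture<?> future = scheduler.schedule(
            () -> expire(searchId, generation, cancelTask), keepAliveMillis, TimeUnit.MILLISECONDS);
        pending.put(searchId, new Scheduled(generation, future));
    }

    /** Drops the scheduled expiration, for example once the search task is no longer running. */
    public synchronized void unschedule(String searchId) {
        Scheduled previous = pending.remove(searchId);
        if (previous != null) {
            previous.future.cancel(false);
        }
    }

    private void expire(String searchId, long generation, Runnable cancelTask) {
        synchronized (this) {
            Scheduled current = pending.get(searchId);
            if (current == null || current.generation != generation) {
                return; // superseded by a later keep_alive extension, or already unscheduled
            }
            pending.remove(searchId);
        }
        cancelTask.run(); // runs on the scheduler thread, independent of shard callbacks or polling
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

The generation check covers the race where a keep_alive extension arrives just as the old runnable starts executing: the stale runnable sees a newer generation and returns without cancelling the search.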