Description
Submit async search allows users to provide a keep_alive. Get async search and get async status allow users to update the keep_alive while retrieving incremental results or status for a specific async search.
When an async search expires, its running tasks should be cancelled to stop useless work and release resources for other, non-expired searches. The current cancellation mechanism is based on listener callbacks that actively check for cancellation (and cancel the task when needed) whenever new shard results arrive at the coordinating node or a partial reduction happens. This is not ideal: if all shards take a long time to report back to the coordinating node, the search is likely to expire without the coordinating node cancelling it promptly.
There is also a discrepancy in that get async search includes a check for cancellation as well, while get async status does not.
In a cross-cluster scenario this gets worse: when minimizing roundtrips, each cluster only comes back with its full results, so callbacks are not frequent enough to check for cancellation and cancel expired tasks.
We should redesign the cancellation mechanism for async search so that it does not depend on external signals: when a new async search starts, schedule a runnable that cancels it at its expiration. When a keep_alive gets extended, cancel the previously scheduled runnable and schedule a new one. This makes cancellation independent of both the progress made by the search and the API the user calls to poll for status.
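
To make the proposal concrete, here is a minimal sketch of the schedule-and-reschedule idea using a plain java.util.concurrent scheduler. The class ExpiringSearchTask and its methods are hypothetical names chosen for illustration; they are not the actual Elasticsearch classes, and a real implementation would presumably use the node's thread pool and cancel the task through the task manager rather than flipping a flag.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for an async search task that cancels itself at expiration. */
class ExpiringSearchTask {
    private final ScheduledExecutorService scheduler;
    private ScheduledFuture<?> pendingCancellation; // guarded by "this"
    private long generation;                        // identifies the latest scheduled runnable
    private boolean cancelled;

    ExpiringSearchTask(ScheduledExecutorService scheduler, long keepAliveMillis) {
        this.scheduler = scheduler;
        scheduleCancellation(keepAliveMillis);
    }

    /** Replaces the scheduled cancellation; returns false if the task already expired. */
    synchronized boolean extendKeepAlive(long keepAliveMillis) {
        if (cancelled) {
            return false;
        }
        pendingCancellation.cancel(false); // drop the previously scheduled runnable
        scheduleCancellation(keepAliveMillis);
        return true;
    }

    private synchronized void scheduleCancellation(long keepAliveMillis) {
        final long scheduledGeneration = ++generation;
        pendingCancellation = scheduler.schedule(
            () -> cancelIfCurrent(scheduledGeneration), keepAliveMillis, TimeUnit.MILLISECONDS);
    }

    // A stale runnable that raced with extendKeepAlive sees an older generation and does nothing.
    private synchronized void cancelIfCurrent(long scheduledGeneration) {
        if (scheduledGeneration == generation) {
            cancelled = true; // assumption: the real code would cancel the search task here
        }
    }

    synchronized boolean isCancelled() {
        return cancelled;
    }
}
```

In this sketch, expiration fires regardless of whether shard results arrive or which API is polled. Get async search and get async status would both call extendKeepAlive when the user supplies a new keep_alive.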