Skip to content

ES|QL: Collect profile information for FORK and fix race condition with markEndQuery #127328

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 24, 2025

Conversation

ioanatia
Copy link
Contributor

@ioanatia ioanatia commented Apr 24, 2025

fix for the following test failures:
#127326
#127063

We would expect that the final listener for the main coordinator plan is called after all the final listener for the sub plans have executed.
This is the final listener for the main coordinator plan:

ComputeListener localListener = new ComputeListener(
transportService.getThreadPool(),
cancelQueryOnFailure,
finalListener.map(profiles -> {
execInfo.markEndQuery();
return new Result(finalMainPlan.output(), collectedPages, profiles, execInfo);
})
)

This is the final listener for the sub plans:

listener.map(completionInfo -> {
execInfo.markEndQuery(); // TODO: revisit this time recording model as part of INLINESTATS improvements
return new Result(outputAttributes, collectedPages, completionInfo, execInfo);
})
)

Currently the listeners for the sub plans can be called after we call the main plan one.
This can easily tested with a debugger by adding a Thread.sleep in the sub plan listener.
This caused issues with how we report took time (see #127063 (comment)).

Another issue that we currently have with FORK is that we don't collect proper profile information.

The root cause is that we don't use the same ComputeListener for both the main coordinator plan and sub plans.
We only use it for the main coordinator plan - see how we call localListener.acquireCompute().

try (
ComputeListener localListener = new ComputeListener(
transportService.getThreadPool(),
cancelQueryOnFailure,
finalListener.map(profiles -> {
execInfo.markEndQuery();
return new Result(finalMainPlan.output(), collectedPages, profiles, execInfo);
})
)
) {
runCompute(rootTask, computeContext, finalMainPlan, localListener.acquireCompute());
}

To fix this I made sure that for each sub plan we also use localListener.acquireCompute().
The ComputeListener receives an action listener that will be called after each task that's created with #acquireCompute is finished:

/**
* A variant of {@link RefCountingListener} with the following differences:
* 1. Automatically cancels sub tasks on failure (via runOnTaskFailure)
* 2. Collects driver profiles from sub tasks.
* 3. Collects response headers from sub tasks, specifically warnings emitted during compute
* 4. Collects failures and returns the most appropriate exception to the caller.
*/
final class ComputeListener implements Releasable {
private final DriverCompletionInfo.AtomicAccumulator completionInfoAccumulator = new DriverCompletionInfo.AtomicAccumulator();
private final EsqlRefCountingListener refs;
private final ResponseHeadersCollector responseHeaders;
private final Runnable runOnFailure;
ComputeListener(ThreadPool threadPool, Runnable runOnFailure, ActionListener<DriverCompletionInfo> delegate) {
this.runOnFailure = runOnFailure;
this.responseHeaders = new ResponseHeadersCollector(threadPool.getThreadContext());
// listener that executes after all the sub-listeners refs (created via acquireCompute) have completed
this.refs = new EsqlRefCountingListener(delegate.delegateFailure((l, ignored) -> {
responseHeaders.finish();
delegate.onResponse(completionInfoAccumulator.finish());
}));
}

This change also ensures that we are now collecting driver information from sub plans.
As a follow up I'd like to see if we can label the driver information from the profile response so it's more obvious to which sub plan it belongs to.

@ioanatia ioanatia added >non-issue Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch :Search Relevance/Search Catch all for Search Relevance labels Apr 24, 2025
@ioanatia ioanatia marked this pull request as ready for review April 24, 2025 15:39
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

Copy link
Contributor

@ChrisHegarty ChrisHegarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>non-issue :Search Relevance/Search Catch all for Search Relevance Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants