ES|QL: Collect profile information for FORK and fix race condition with markEndQuery #127328

ioanatia · 2025-04-24T14:01:01Z

Fixes the following test failures:
#127326
#127063

We would expect the final listener for the main coordinator plan to be called only after all the final listeners for the sub plans have executed.
This is the final listener for the main coordinator plan:

elasticsearch/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/ComputeService.java

Lines 237 to 244 in 8f9c96b

    
    ComputeListener localListener = new ComputeListener(
        transportService.getThreadPool(),
        cancelQueryOnFailure,
        finalListener.map(profiles -> {
            execInfo.markEndQuery();
            return new Result(finalMainPlan.output(), collectedPages, profiles, execInfo);
        })
    )

This is the final listener for the sub plans:

elasticsearch/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/ComputeService.java

Lines 360 to 364 in 8f9c96b

    
        listener.map(completionInfo -> {
            execInfo.markEndQuery();  // TODO: revisit this time recording model as part of INLINESTATS improvements
            return new Result(outputAttributes, collectedPages, completionInfo, execInfo);
        })
    )

Currently, the listeners for the sub plans can be called after the listener for the main plan.
This can easily be tested with a debugger by adding a Thread.sleep in the sub plan listener.
This caused issues with how we report took time (see #127063 (comment)).
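To illustrate the ordering problem, here is a minimal, single-threaded sketch. It is only a model under stated assumptions: FakeExecInfo, the manual clock, and simulate are hypothetical names that do not exist in Elasticsearch, and the real listeners run asynchronously. It shows that when a sub plan listener runs after the main plan listener, the end time is marked again after the main listener has already read it.

```java
/** Simplified model of the listener ordering issue; none of these types are Elasticsearch classes. */
final class MarkEndQueryRaceSketch {

    /** Hypothetical stand-in for the execution info: the query starts at time 0. */
    static final class FakeExecInfo {
        private long tookNanos = -1;

        void markEndQuery(long nowNanos) {
            tookNanos = nowNanos; // every call overwrites the previously recorded end time
        }

        long tookNanos() {
            return tookNanos;
        }
    }

    /** Returns {took time seen by the main listener, took time after all listeners ran}. */
    static long[] simulate(boolean subPlanListenerRunsLast) {
        FakeExecInfo execInfo = new FakeExecInfo();
        long[] clock = { 0 };            // manual clock, advanced by each listener
        long[] tookSeenByMain = { -1 };

        Runnable subPlanListener = () -> {
            clock[0] += 10;
            execInfo.markEndQuery(clock[0]);
        };
        Runnable mainListener = () -> {
            clock[0] += 10;
            execInfo.markEndQuery(clock[0]);
            tookSeenByMain[0] = execInfo.tookNanos(); // the value the main result would be built with
        };

        if (subPlanListenerRunsLast) {
            mainListener.run();
            subPlanListener.run(); // marks the end again after the main listener has finished
        } else {
            subPlanListener.run();
            mainListener.run();
        }
        return new long[] { tookSeenByMain[0], execInfo.tookNanos() };
    }
}
```

In this model, simulate(true) returns two different values (the main listener saw an earlier end time than the one finally recorded), while simulate(false) returns the same value twice.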

Another issue that we currently have with FORK is that we don't collect proper profile information.

The root cause is that we don't use the same ComputeListener for both the main coordinator plan and sub plans.
We only use it for the main coordinator plan - see how we call localListener.acquireCompute().

elasticsearch/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/ComputeService.java

Lines 236 to 247 in 8f9c96b

    
    try (
        ComputeListener localListener = new ComputeListener(
            transportService.getThreadPool(),
            cancelQueryOnFailure,
            finalListener.map(profiles -> {
                execInfo.markEndQuery();
                return new Result(finalMainPlan.output(), collectedPages, profiles, execInfo);
            })
        )
    ) {
        runCompute(rootTask, computeContext, finalMainPlan, localListener.acquireCompute());
    }

To fix this, I made sure that each sub plan also uses localListener.acquireCompute().
The ComputeListener receives an action listener that is called once every task created with #acquireCompute has finished:

elasticsearch/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/ComputeListener.java

Lines 18 to 39 in 49a9137

    
    /**
     * A variant of {@link RefCountingListener} with the following differences:
     * 1. Automatically cancels sub tasks on failure (via runOnTaskFailure)
     * 2. Collects driver profiles from sub tasks.
     * 3. Collects response headers from sub tasks, specifically warnings emitted during compute
     * 4. Collects failures and returns the most appropriate exception to the caller.
     */
    final class ComputeListener implements Releasable {
        private final DriverCompletionInfo.AtomicAccumulator completionInfoAccumulator = new DriverCompletionInfo.AtomicAccumulator();
        private final EsqlRefCountingListener refs;
        private final ResponseHeadersCollector responseHeaders;
        private final Runnable runOnFailure;

        ComputeListener(ThreadPool threadPool, Runnable runOnFailure, ActionListener<DriverCompletionInfo> delegate) {
            this.runOnFailure = runOnFailure;
            this.responseHeaders = new ResponseHeadersCollector(threadPool.getThreadContext());
            // listener that executes after all the sub-listeners refs (created via acquireCompute) have completed
            this.refs = new EsqlRefCountingListener(delegate.delegateFailure((l, ignored) -> {
                responseHeaders.finish();
                delegate.onResponse(completionInfoAccumulator.finish());
            }));
        }
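The ref-counting idea can be sketched in isolation as follows. This is a simplified model, not the actual ComputeListener or EsqlRefCountingListener: RefCountingSketch, its string "profiles", and demo are hypothetical, and failure handling and response headers are left out. It assumes the final delegate should fire exactly once, after the listener has been closed and every callback handed out by acquireCompute has completed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Simplified, hypothetical model of a ref-counting listener; not an Elasticsearch class. */
final class RefCountingSketch implements AutoCloseable {
    private final AtomicInteger refs = new AtomicInteger(1); // initial reference, released by close()
    private final List<String> profiles = new ArrayList<>();
    private final Consumer<List<String>> delegate;

    RefCountingSketch(Consumer<List<String>> delegate) {
        this.delegate = delegate;
    }

    /** Acquires a reference; the returned callback records a profile and releases that reference. */
    Consumer<String> acquireCompute() {
        refs.incrementAndGet();
        return profile -> {
            synchronized (profiles) {
                profiles.add(profile);
            }
            release();
        };
    }

    private void release() {
        // Only the release that drops the count to zero notifies the delegate.
        if (refs.decrementAndGet() == 0) {
            List<String> snapshot;
            synchronized (profiles) {
                snapshot = List.copyOf(profiles);
            }
            delegate.accept(snapshot);
        }
    }

    @Override
    public void close() {
        release();
    }

    /** One main plan and two sub plans share the same listener and complete after it is closed. */
    static List<String> demo() {
        List<String> events = new ArrayList<>();
        List<Consumer<String>> pending = new ArrayList<>();
        try (RefCountingSketch listener = new RefCountingSketch(p -> events.add("final:" + p))) {
            pending.add(listener.acquireCompute()); // main coordinator plan
            pending.add(listener.acquireCompute()); // sub plan 1
            pending.add(listener.acquireCompute()); // sub plan 2
        }
        pending.get(0).accept("main");
        events.add("main done");
        pending.get(1).accept("sub1");
        events.add("sub1 done");
        pending.get(2).accept("sub2"); // last release triggers the final delegate
        return events;
    }
}
```

In this model the final delegate runs only after the last sub plan completes, and it receives the profiles of the main plan and both sub plans, which mirrors the two effects described in this PR.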

This change also ensures that we are now collecting driver information from sub plans.
As a follow-up, I'd like to see if we can label the driver information in the profile response so it's more obvious which sub plan it belongs to.

elasticsearchmachine · 2025-04-24T15:39:59Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

ChrisHegarty

LGTM

Collect main plan result only after all subplan listeners have completed

d812fb4

ioanatia added the >non-issue, Team:Search Relevance and :Search Relevance/Search labels Apr 24, 2025

elasticsearchmachine added the v9.1.0 label Apr 24, 2025

ioanatia requested review from ChrisHegarty, nik9000 and dnhatn April 24, 2025 15:39

ioanatia marked this pull request as ready for review April 24, 2025 15:39

nik9000 approved these changes Apr 24, 2025


ChrisHegarty approved these changes Apr 24, 2025


ioanatia merged commit db53722 into elastic:main Apr 24, 2025
17 checks passed

This was referenced Apr 28, 2025

ES|QL: Unmute fixed RRF and FORK tests #127455

Merged

[CI] EsqlSpecIT test {fork.ForkWithWhereSortDescAndLimit SYNC} failing #127326

Closed

[CI] EsqlSpecIT test {rrf.SimpleRrf ASYNC} failing #127063

Closed
