Skip to content

ESQL: List/get query API #124832

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 24 commits into from
Apr 8, 2025
Merged

Conversation

GalLalouche
Copy link
Contributor

@GalLalouche GalLalouche commented Mar 13, 2025

This PR adds two new REST endpoints, for listing queries and getting information on a current query.

Changes from the API specified in the above issues:

  • The get API is pretty initial, as we don't have a way of fetching the memory used or number of rows processed.

List queries response:

GET /_query/queries
// returns for each of the running queries
// query_id, start_time, running_time, query

{ "queries" : {
 "abc:1234": {
  "node": "abc",
  "id": 1234,
  "start_time_millis": 14585858875292,
  "running_time_nanos": 762794,
  "query": "FROM logs* | STATS BY hostname"
  },
 "abd:1729": {
  "node": "abd",
  "id":1729,
  "start_time_millis": 14585858823573,
  "running_time_nanos": 90231,
  "query": "FROM orders | LOOKUP country_code ON country"
  }
 } 
}

Get query response:

GET /_query/queries/abc:1234

{
  "id": 1234,
  "node": "abc",
  "start_time_millis": 14585858875292,
  "running_time_nanos": 762794,
  "query": "FROM logs* | STATS BY hostname"
  "coordinating_node": "oTUltX4IQMOUUVeiohTt8A"
  "data_nodes" : [ "DwrYwfytxthse49X4", "i5msnbUyWlpe86e7"]
}

@GalLalouche GalLalouche force-pushed the feature/list_query_api branch from 58ccb18 to be048dc Compare March 18, 2025 13:55
@GalLalouche GalLalouche force-pushed the feature/list_query_api branch from be048dc to 8ad4d6d Compare March 18, 2025 14:35
@GalLalouche GalLalouche requested review from costin and nik9000 March 18, 2025 16:12
@GalLalouche GalLalouche added >feature Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL labels Mar 18, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @GalLalouche, I've created a changelog YAML for you.

@GalLalouche GalLalouche marked this pull request as ready for review March 19, 2025 09:33
@GalLalouche GalLalouche force-pushed the feature/list_query_api branch from 87d1040 to cb4ec47 Compare March 19, 2025 09:33
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)


@Override
public List<Route> routes() {
return List.of(new Route(GET, "/_query/queries/{id}"), new Route(GET, "/_query/queries"));
Copy link
Contributor

@idegtiarenko idegtiarenko Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The repetition in the endpoint looks surprising to me. Why not /_queries or may be /_cluster/queries?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the time being we're looking to keep the ESQL APIs under the ESQL prefix(_query) and avoid potential clashes with other rest APIs.
The decision hasn't been finalized though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's sort this one out after this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can move this - we haven't finalized this API.

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good first PR.
Add a yaml integration rest test with that checks the output for each endpoint, including working and failing tasks

Moving forward, please create a meta PR for this for follow-up issues:

  • query cancellation
  • filtering (my doc didn't include any, let's use the task API ones by filtering over nodes and id)
  • have a subtask to integrate the user in
  • double check that X-Opaque-Id /trace.id information is sent over (and displayed if available) - could be used for filtering also
  • follow-up item for memory tracking
  • follow-up item for bytes / records read

The meta issue will better reflect the overall status including tasks that are blocking and incoming feedback and keep the code get merged in.


public class EsqlGetQueryAction extends ActionType<EsqlGetQueryResponse> {
public static final EsqlGetQueryAction INSTANCE = new EsqlGetQueryAction();
public static final String NAME = "cluster:monitor/xpack/esql/get_queries";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a follow-up, please reach out to the data store / security team to double check what type of security is needed (similar to cat/tasks API) both for listing and cancelling queries.
There might a need to add another permission or reuse an existing one.

Comment on lines 48 to 59
return id != null
? (channel -> client.execute(
EsqlGetQueryAction.INSTANCE,
new EsqlGetQueryRequest(new TaskId(id)),
new RestToXContentListener<>(channel)
))
: (channel -> client.execute(
EsqlListQueriesAction.INSTANCE,
new EsqlListQueriesRequest(),
new RestToXContentListener<>(channel)
));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative is to move the conditional when creating the parameters and have only one consumer creation:

        ActionType<?> action = id != null ? RestEsqlGetQueryAction.INSTANCE : RestEsqlListQueriesAction.INSTANCE;
        Object request = id != null ? new EsqlGetQueryRequest(new TaskId(id)) : new EsqlListQueriesRequest();

        return channel -> client.execute(
                action,
                request,
                new RestToXContentListener<>(channel)
        );

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


@Override
public List<Route> routes() {
return List.of(new Route(GET, "/_query/queries/{id}"), new Route(GET, "/_query/queries"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the time being we're looking to keep the ESQL APIs under the ESQL prefix(_query) and avoid potential clashes with other rest APIs.
The decision hasn't been finalized though.


import java.io.IOException;

public class EsqlListQueriesRequest extends ActionRequest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should rely on org.elasticsearch.transport.EmptyRequest instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like that's a test class... is it used in production?

/**
* A type-agnostic way of comparing integer values, not caring if it's a long or an integer.
*/
public abstract sealed class IntegerMatcher<T> extends BaseMatcher<T> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe IntOrLongMatcher.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


public void testListApi_runningQuery_returnsQueriesObject() throws Exception {
bulkLoadTestData(1);
String query = fromIndex() + " | keep keyword, integer | where delay(10s) | limit 100 ";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move this one to the PausableFieldPlugin stuff?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -55,7 +55,7 @@ public void ensureExchangesAreReleased() throws Exception {
ExchangeService exchangeService = esqlQueryAction.exchangeService();
assertBusy(() -> {
if (exchangeService.lifecycleState() == Lifecycle.State.STARTED) {
assertTrue("Leftover exchanges " + exchangeService + " on node " + node, exchangeService.isEmpty());
assertTrue("Leftover exchanges " + exchangeService + " on taskId " + node, exchangeService.isEmpty());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is on the node, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. At some point I renamed a field called node and intellij was kind enough to replace all "node" words with "taskId" in the entire repo. I thought I caught all of them but I guess a few slipped through.

String id = null;
try (var initialResponse = sendAsyncQuery()) {
id = initialResponse.asyncExecutionId().get();
// FIXME(gal, NOCOMMIT) more copy paste
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but honest it's not worth the refactor in this case.

}
}

// FIXME(gal, NOCOMMIT) copy paste
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More leftover?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but honest it's not worth the refactor in this case.

return EsqlQueryRequestBuilder.newAsyncEsqlQueryRequestBuilder(client()).query(QUERY).execute().actionGet(60, TimeUnit.SECONDS);
}

// FIXME(gal, NOCOMMIT) copy pasted from another place
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover? I guess we can move this to RestTestCase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's what I did :)


public record Query(org.elasticsearch.tasks.TaskId taskId, long startTimeMillis, long runningTimeNanos, String query)
implements
ToXContentObject {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is a ToXContentFragment because it's not a valid value - it's a key and a value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


@Override
protected void doExecute(Task task, EsqlListQueriesRequest request, ActionListener<EsqlListQueriesResponse> listener) {
// FIXME(gal, NOCOMMIT) The + [a] hack needs a better solution.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can make the [a] dude a constant wherever it's used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@GalLalouche
Copy link
Contributor Author

Good first PR. Add a yaml integration rest test with that checks the output for each endpoint, including working and failing tasks

Moving forward, please create a meta PR for this for follow-up issues:

  • query cancellation
  • filtering (my doc didn't include any, let's use the task API ones by filtering over nodes and id)
  • have a subtask to integrate the user in
  • double check that X-Opaque-Id /trace.id information is sent over (and displayed if available) - could be used for filtering also
  • follow-up item for memory tracking
  • follow-up item for bytes / records read

The meta issue will better reflect the overall status including tasks that are blocking and incoming feedback and keep the code get merged in.

Done: #125662

@GalLalouche GalLalouche force-pushed the feature/list_query_api branch from 9a159a4 to 6686702 Compare March 27, 2025 18:33
@GalLalouche GalLalouche requested review from costin and nik9000 March 30, 2025 09:25
Copy link
Contributor

@jfreden jfreden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review request and sorry for the delay!

My understanding is that there are two new actions EsqlGetQueryAction and EsqlListQueriesAction that use the TransportListTasksAction under the hood and filters either on id or on the type of query (esql). From a least privilege principle point of view I think the existing monitor privilege is too wide, since we get more than we need with it. Adding a monitor_esql would probably make sense, in the same way we did for these features.

I think that would require a new esql origin and then register it here, to execute the actual TransportListTasksAction without the API caller having the monitor privilege.

WDYT?

@GalLalouche GalLalouche requested a review from a team as a code owner April 1, 2025 13:25
@GalLalouche
Copy link
Contributor Author

Thanks @jfreden! I've added the necessary privileges by following the other ones in the classes you mentioned. Could you PTAL?

new ListTasksRequestBuilder(nodeClient).setActions(
EsqlQueryAction.NAME,
EsqlQueryAction.NAME + AsyncTaskManagementService.ASYNC_ACTION_SUFFIX
).setDetailed(true).execute(new ActionListener<>() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the caller only has monitor_esql I think this would fail because it doesn't have cluster:monitor/tasks/lists (or cluster:monitor*) that's required here. To work around that the executeAsyncWithOrigin can be used. It would be nice with an integration test that covers this. Maybe modify existing to only have the monitor_esql privilege?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! I fixed it and added a couple of integration tests.

@jfreden
Copy link
Contributor

jfreden commented Apr 1, 2025

@@ -761,6 +764,36 @@ public void testFromLookupIndexForbidden() throws Exception {
assertThat(resp.getResponse().getStatusLine().getStatusCode(), equalTo(HttpStatus.SC_BAD_REQUEST));
}

public void testListQueryAllowed() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

nodeClient,
ESQL_ORIGIN,
TransportGetTaskAction.TYPE,
new GetTaskRequest().setTaskId(request.id()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to validate that this is for EsqlQueryAction or an async EsqlQueryAction? Otherwise you might be able to get any task here? Or maybe I'm missing additional validation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not properly, since we would need said task to have started but not yet finished by the time we ask for it. There are other tests that check that it works, but they use a special hack to inject a semaphore in the middle of executing the query. IMO, combining both checks (semaphore + custom privileges) would be too much of a hassle.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could check the type of the returned task a few lines down. So if you ask for a non-ESQL task it'd reject it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@GalLalouche GalLalouche requested a review from jfreden April 3, 2025 11:49
Copy link
Contributor

@jfreden jfreden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for implementing the suggested changes! LGTM!


@Override
public List<Route> routes() {
return List.of(new Route(GET, "/_query/queries/{id}"), new Route(GET, "/_query/queries"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's sort this one out after this PR.


@Override
public List<Route> routes() {
return List.of(new Route(GET, "/_query/queries/{id}"), new Route(GET, "/_query/queries"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can move this - we haven't finalized this API.

nodeClient,
ESQL_ORIGIN,
TransportGetTaskAction.TYPE,
new GetTaskRequest().setTaskId(request.id()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could check the type of the returned task a few lines down. So if you ask for a non-ESQL task it'd reject it.

@GalLalouche GalLalouche removed the request for review from costin April 8, 2025 17:35
@GalLalouche GalLalouche merged commit 953b9fb into elastic:main Apr 8, 2025
16 of 17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >feature Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ESQL: API for listing running queries
6 participants