Skip to content

ESQL: Retain aggregate when grouping #126598

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

kanoshiou
Copy link
Contributor

@kanoshiou kanoshiou commented Apr 10, 2025

This PR resolves #126026.

@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v9.1.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Apr 10, 2025
@limotova limotova added :Analytics/ES|QL AKA ESQL and removed needs:triage Requires assignment of a team area label labels Apr 16, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Apr 16, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@luigidellaquila
Copy link
Contributor

buildkite test this

Copy link
Contributor

@luigidellaquila luigidellaquila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kanoshiou, I had a first look a the PR and I added a couple of comments

@@ -84,7 +86,8 @@ public LogicalPlan apply(LogicalPlan plan) {
);
} else {
// Aggs cannot produce pages with 0 columns, so retain one grouping.
remaining = List.of(Expressions.attribute(aggregate.groupings().get(0)));
Attribute attribute = Expressions.attribute(aggregate.groupings().getFirst());
remaining = aggregate.aggregates().stream().map(v -> new Alias(v.source(), v.name(), attribute)).toList();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specific reason why we are creating a list the same size as the original one?
Couldn't it just be

NamedExpression firstAggregate = aggregate.aggregates().getFirst();
remaining = List.of(new Alias(firstAggregate.source(), firstAggregate.name(), attribute));

It won't make a big difference at execution time, but a high number of attributes makes a difference at planning time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I did't think that far.

@luigidellaquila luigidellaquila self-assigned this Apr 18, 2025
@luigidellaquila
Copy link
Contributor

buildkite test this

@luigidellaquila
Copy link
Contributor

Apparently the CI is not happy, LogicalPlanOptimizerTests.testEmptyProjectInStatWithGroupAndEval needs to be fixed.

Also, probably we should preserve the NameID of the Alias, something like

remaining = List.of(new Alias(firstAggregate.source(), firstAggregate.name(), attribute, firstAggregate.id()));

It won't make a huge difference probably, as that alias is not further referenced in the plan, but at least we avoid to create a new NameID

@kanoshiou
Copy link
Contributor Author

Thank you for reviewing @luigidellaquila! Fix is on the way😊.

@luigidellaquila
Copy link
Contributor

buildkite test this

@luigidellaquila luigidellaquila added auto-backport Automatically create backport pull requests when merged v8.19.0 labels Apr 18, 2025
Copy link
Contributor

@luigidellaquila luigidellaquila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @kanoshiou!
It will only need a merge with main to resolve conflicts

# Conflicts:
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/eval.csv-spec
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java
@kanoshiou
Copy link
Contributor Author

Thank you @luigidellaquila! Conflicts have now been resolved.

@luigidellaquila
Copy link
Contributor

buildkite test this

@kanoshiou
Copy link
Contributor Author

This is weird. I have iterated the test over 100 times and can't reproduce it.

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kanoshiou , very nice fix!

Comment on lines 680 to 682
| stats avg = avg(salary) by gender
| keep avg
| eval avg = 12
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpicky suggestion: the added tests 1. keep one or several aggregates and then 2. override them. We could also add tests that keep groupings in step 1, and consider aliased groupings for good measure.

E.g.

from employees
| stats avg(salary) by g = gender
| keep g
| eval g = 12

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the helpful suggestion. I hadn’t considered keeping the aliased groupings, but I’ll update those tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kanoshiou ! I think a test case that doesn't keep any of the aggregates would be nice, too, like the one I suggested above - but this PR is already in a very good spot and can happily be merged as-is (once the small test regression is fixed that @luigidellaquila mentioned).

@luigidellaquila
Copy link
Contributor

buildkite test this

@luigidellaquila luigidellaquila enabled auto-merge (squash) April 23, 2025 07:53
@luigidellaquila
Copy link
Contributor

Thanks @kanoshiou, the PR is almost ready to merge; it just needs a little fix to a regression introduced in evalAfterAvgGroupingUsingSameName test with latest commit (wrong column name in the result, avg vs max)

@kanoshiou
Copy link
Contributor Author

wrong column name in the result, avg vs max

oops🥺
I don't why my test command unexpectedly skipped the tests requiring a capability, but now it has been fixed.

@luigidellaquila
Copy link
Contributor

sorry, needs another merge with main due to conflicts 😓

@kanoshiou
Copy link
Contributor Author

sorry, needs another merge with main due to conflicts 😓

No worries, let me handle it!😊

# Conflicts:
#	x-pack/plugin/esql/qa/server/src/main/java/org/elasticsearch/xpack/esql/qa/rest/generative/GenerativeRestTest.java
@luigidellaquila
Copy link
Contributor

buildkite test this

Copy link
Contributor

@luigidellaquila luigidellaquila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kanoshiou, LGTM

I'm merging it now

@luigidellaquila luigidellaquila merged commit 00654c8 into elastic:main Apr 23, 2025
18 of 19 checks passed
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 126598

@luigidellaquila
Copy link
Contributor

This will need a manual backport to 8.x branch.
I'm doing it now

@kanoshiou
Copy link
Contributor Author

Thank you @luigidellaquila @alex-spies !

luigidellaquila pushed a commit to luigidellaquila/elasticsearch that referenced this pull request Apr 23, 2025
@luigidellaquila
Copy link
Contributor

#127253

elasticsearchmachine pushed a commit that referenced this pull request Apr 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged backport pending >bug external-contributor Pull request authored by a developer outside the Elasticsearch team Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.19.0 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ES|QL: columns not projected away despite KEEP
5 participants