Skip to content

[ES|QL] Resolve groupings in aggregate before resolving references to groupings in the aggregations #127524

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

fang-xing-esql
Copy link
Member

Resolve: #116781

There are references to groupings in the aggregations in an Aggregate, trying to resolve the references to the groupings before the groupings are resolved lead to unexpected complexity to resolve an Aggregate, and it is error prone.

This PR changes resolveAggregate in Analyzer to resolve the references after the groupings are resolved successfully, and it also moves the ImplicitCasting after ResolveRefs, as the reason for having ImplicitCasting before ResolveRefs does not exist anymore with the change in resolveAggregate.

@elasticsearchmachine
Copy link
Collaborator

Hi @fang-xing-esql, I've created a changelog YAML for you.

@fang-xing-esql fang-xing-esql marked this pull request as ready for review April 29, 2025 22:23
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Apr 29, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments otherwise LGTM

Comment on lines 177 to 182
new ResolveRefs(),
/*
* ImplicitCasting must be before ResolveRefs. Because a reference is created for a Bucket in Aggregate's aggregates,
* resolving this reference before implicit casting may cause this reference to have customMessage=true, it prevents further
* attempts to resolve this reference.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. why does the order matter now vs before.
  2. the comment no longer hold true (looks like it was a harbinger of things to come).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason that ImplicitCasting was put before ResolveRefs is exactly because that resolveAggregate was attempting to resolve the references to the groupings, before the groupings are resolved, and it causes the references to the groupings marked unresolvable too early. This comment is outdated and removed. Thanks for the reminder!

boolean groupingResolved = Resolvables.resolved(groupings);
int size = groupingResolved ? aggregates.size() : aggregates.size() - groupings.size();
for (int i = 0; i < aggregates.size(); i++) {
NamedExpression ag = aggregates.get(i);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use better naming than agg and ag (e.g. maybeResolvedAgg).

Comment on lines 600 to 614
if (groupingResolved) {
if (maybeResolved != null) {
changed.set(true);
ne = maybeResolved;
}
} else {
// An item in aggregations can reference to groupings explicitly, if groupings are not resolved yet and
// maybeResolved is not resolved, return the original UnresolvedAttribute, so that it has a another chance
// to get resolved in the next iteration.
// For example STATS c = count(emp_no), x = d::int + 1 BY d = (date == "2025-01-01")
if (maybeResolved instanceof Unresolvable == false) {
changed.set(true);
ne = maybeResolved;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybeResolveAttribute never returns null
Furthermore the if/else can be simplified, e.g.

if (groupingResolved || maybeResolved.resolved() == false) {
   changed.set(true);
  ne = maybeResolved;
}

Comment on lines 617 to 620
newAggregates.add(agg);
} else {
newAggregates.add(ag); // Groupings are not resolved
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here - instead of doing an if/else perform the condition only on updating agg and always update it similar to the previous code style:

var agg = ...
if (groupingResolved) {
   agg = 
}

newAggregates.add(agg);

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fang-xing-esql ! Should we also add the csv test from the issue, your original reproducer?

from sample_data
| EVAL date = "2024-01-01"::datetime
| stats max = MAX(@timestamp) BY c = (date == "2024-01-01")

@fang-xing-esql
Copy link
Member Author

Thank you for reviewing @costin @alex-spies ! All your comments have been addressed.

@fang-xing-esql fang-xing-esql added the auto-backport Automatically create backport pull requests when merged label May 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.19.0 v9.1.0
Projects
None yet
4 participants