-
Notifications
You must be signed in to change notification settings - Fork 25.2k
[ES|QL] Resolve groupings in aggregate before resolving references to groupings in the aggregations #127524
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[ES|QL] Resolve groupings in aggregate before resolving references to groupings in the aggregations #127524
Conversation
Hi @fang-xing-esql, I've created a changelog YAML for you. |
Pinging @elastic/es-analytical-engine (Team:Analytics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments otherwise LGTM
new ResolveRefs(), | ||
/* | ||
* ImplicitCasting must be before ResolveRefs. Because a reference is created for a Bucket in Aggregate's aggregates, | ||
* resolving this reference before implicit casting may cause this reference to have customMessage=true, it prevents further | ||
* attempts to resolve this reference. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- why does the order matter now vs before.
- the comment no longer hold true (looks like it was a harbinger of things to come).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason that ImplicitCasting
was put before ResolveRefs
is exactly because that resolveAggregate
was attempting to resolve the references to the groupings, before the groupings are resolved, and it causes the references to the groupings marked unresolvable too early. This comment is outdated and removed. Thanks for the reminder!
boolean groupingResolved = Resolvables.resolved(groupings); | ||
int size = groupingResolved ? aggregates.size() : aggregates.size() - groupings.size(); | ||
for (int i = 0; i < aggregates.size(); i++) { | ||
NamedExpression ag = aggregates.get(i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use better naming than agg and ag (e.g. maybeResolvedAgg
).
if (groupingResolved) { | ||
if (maybeResolved != null) { | ||
changed.set(true); | ||
ne = maybeResolved; | ||
} | ||
} else { | ||
// An item in aggregations can reference to groupings explicitly, if groupings are not resolved yet and | ||
// maybeResolved is not resolved, return the original UnresolvedAttribute, so that it has a another chance | ||
// to get resolved in the next iteration. | ||
// For example STATS c = count(emp_no), x = d::int + 1 BY d = (date == "2025-01-01") | ||
if (maybeResolved instanceof Unresolvable == false) { | ||
changed.set(true); | ||
ne = maybeResolved; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybeResolveAttribute never returns null
Furthermore the if/else can be simplified, e.g.
if (groupingResolved || maybeResolved.resolved() == false) {
changed.set(true);
ne = maybeResolved;
}
newAggregates.add(agg); | ||
} else { | ||
newAggregates.add(ag); // Groupings are not resolved | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here - instead of doing an if/else perform the condition only on updating agg and always update it similar to the previous code style:
var agg = ...
if (groupingResolved) {
agg =
}
newAggregates.add(agg);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @fang-xing-esql ! Should we also add the csv test from the issue, your original reproducer?
from sample_data
| EVAL date = "2024-01-01"::datetime
| stats max = MAX(@timestamp) BY c = (date == "2024-01-01")
Thank you for reviewing @costin @alex-spies ! All your comments have been addressed. |
Resolve: #116781
There are references to
groupings
in theaggregations
in anAggregate
, trying to resolve the references to the groupings before the groupings are resolved lead to unexpected complexity to resolve anAggregate
, and it is error prone.This PR changes
resolveAggregate
inAnalyzer
to resolve the references after the groupings are resolved successfully, and it also moves theImplicitCasting
afterResolveRefs
, as the reason for havingImplicitCasting
beforeResolveRefs
does not exist anymore with the change inresolveAggregate
.