-
Notifications
You must be signed in to change notification settings - Fork 25.2k
ESQL: Split grouping functions based on their EVAL-ability #126597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This splits the grouping functions in two groups: those that can be evaluated independently through the EVAL operator (BUCKET) and those that don't (like those that that are evaluated through an agg operator, CATEGORIZE).
Pinging @elastic/es-analytical-engine (Team:Analytics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 10 out of 11 changed files in this pull request and generated 1 comment.
Files not reviewed (1)
- x-pack/plugin/esql/qa/testFixtures/src/main/resources/categorize.csv-spec: Language not supported
Comments suppressed due to low confidence (1)
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/grouping/Categorize.java:39
- [nitpick] Switching Categorize to extend NonEvaluatableGroupingFunction clarifies its intended usage. Please ensure that both documentation and error messages consistently reflect this change.
public class Categorize extends GroupingFunction.NonEvaluatableGroupingFunction {
...gin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java
Show resolved
Hide resolved
groupingChanged = true; | ||
var fieldAs = new Alias(as.source(), as.name(), cat.field(), null, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nested expression alias name shouldn't be the same as the grouping function alias name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙏
var fieldAs = new Alias(as.source(), as.name(), cat.field(), null, true); | ||
var fieldAttr = fieldAs.toAttribute(); | ||
evals.add(fieldAs); | ||
evalNames.put(fieldAs.name(), fieldAttr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be needed, as the expression provided to the grouping function can't show up stand-alone / aliased in the aggs.
AttributeMap.Builder<Expression> aliasesBuilder = AttributeMap.builder(); | ||
aggregate.forEachExpressionUp(Alias.class, a -> aliasesBuilder.put(a.toAttribute(), a.child())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This iteration can be combined with the one below, so much more as they do complementary things on each iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for doing this refactor!
@@ -45,4 +41,20 @@ public BiConsumer<LogicalPlan, Failures> postAnalysisPlanVerification() { | |||
}; | |||
} | |||
|
|||
public abstract static non-sealed class NonEvaluatableGroupingFunction extends GroupingFunction { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Should this add a final foldable()=false
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, but not sure. We can't fold these functions through EvaluatorMapper / with EVAL's operator, but we might maybe with another agg operator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. With the "evaluated outside of an aggregation" docs you added, it's also clear what this means specifically.
I still feel like a foldable()=true could lead to some rule doing something nasty, as "foldable()" doesn't specify "inside an eval" in any way, it's just foldable or not. But I can't think of a grouping causing trouble here, so maybe when it appears, we can do whatever we must
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still feel like a foldable()=true could lead to some rule doing something nasty
Indeed. The foldable()
is false
by default so one would have to actively switch it to true
, which will indeed become immediately problematic.
groupingChanged = true; | ||
var fieldAs = new Alias(as.source(), as.name(), cat.field(), null, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙏
GroupingFunction.NonEvaluatableGroupingFunction gf, | ||
List<Alias> evals | ||
) { | ||
int[] counter = new int[] { 0 }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need this wrapped here (?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, 🙏 .
List<Expression> newChildren = new ArrayList<>(gf.children().size()); | ||
|
||
for (Expression ex : gf.children()) { | ||
if (ex instanceof Attribute == false) { // TODO: foldables shouldn't require eval'ing either |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this TODO about? Are NonEvaluatableGroupingFunction
expected to be potentially foldable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we shouldn't need to have to transform a ... BY CATEGORIZE("value")
into EVAL ref = "value" | ... BY CATEGORIZE(ref)
// TODO: rm me | ||
// af.children().forEach(e -> e.forEachDown(AggregateFunction.class, unused -> foundNestedAggs.set(Boolean.TRUE))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙏
@@ -45,4 +41,20 @@ public BiConsumer<LogicalPlan, Failures> postAnalysisPlanVerification() { | |||
}; | |||
} | |||
|
|||
public abstract static non-sealed class NonEvaluatableGroupingFunction extends GroupingFunction { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add some javadocs here? Maybe mentioning that "it's (currently?) used for stateful grouping functions, which make them effectively non-evaluatable before the agg, so its value can't be used inside the aggs".
Even if it has some high level examples, I think it will be useful for whoever has to reuse them in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, added some docs.
Thanks Ivan! |
💔 Backport failed
You can use sqren/backport to manually backport by running |
This splits the grouping functions in two: those that can be evaluated independently through the EVAL operator (
BUCKET
) and those that don't (like those that that are evaluated through an agg operator,CATEGORIZE
).Closes #124608