Skip to content

ESQL: Split grouping functions based on their EVAL-ability #126597

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Apr 11, 2025

Conversation

bpintea
Copy link
Contributor

@bpintea bpintea commented Apr 10, 2025

This splits the grouping functions in two: those that can be evaluated independently through the EVAL operator (BUCKET) and those that don't (like those that that are evaluated through an agg operator, CATEGORIZE).

Closes #124608

This splits the grouping functions in two groups: those that can be
evaluated independently through the EVAL operator (BUCKET) and those
that don't (like those that that are evaluated through an agg operator,
CATEGORIZE).
@bpintea bpintea added >refactoring auto-backport Automatically create backport pull requests when merged :Analytics/ES|QL AKA ESQL v8.19.0 v9.0.1 labels Apr 10, 2025
@bpintea bpintea requested review from costin, ivancea and Copilot April 10, 2025 12:16
@elasticsearchmachine elasticsearchmachine added v9.1.0 Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Apr 10, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 10 out of 11 changed files in this pull request and generated 1 comment.

Files not reviewed (1)
  • x-pack/plugin/esql/qa/testFixtures/src/main/resources/categorize.csv-spec: Language not supported
Comments suppressed due to low confidence (1)

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/grouping/Categorize.java:39

  • [nitpick] Switching Categorize to extend NonEvaluatableGroupingFunction clarifies its intended usage. Please ensure that both documentation and error messages consistently reflect this change.
public class Categorize extends GroupingFunction.NonEvaluatableGroupingFunction {

groupingChanged = true;
var fieldAs = new Alias(as.source(), as.name(), cat.field(), null, true);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nested expression alias name shouldn't be the same as the grouping function alias name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙏

var fieldAs = new Alias(as.source(), as.name(), cat.field(), null, true);
var fieldAttr = fieldAs.toAttribute();
evals.add(fieldAs);
evalNames.put(fieldAs.name(), fieldAttr);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be needed, as the expression provided to the grouping function can't show up stand-alone / aliased in the aggs.

AttributeMap.Builder<Expression> aliasesBuilder = AttributeMap.builder();
aggregate.forEachExpressionUp(Alias.class, a -> aliasesBuilder.put(a.toAttribute(), a.child()));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This iteration can be combined with the one below, so much more as they do complementary things on each iteration.

Copy link
Contributor

@ivancea ivancea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for doing this refactor!

@@ -45,4 +41,20 @@ public BiConsumer<LogicalPlan, Failures> postAnalysisPlanVerification() {
};
}

public abstract static non-sealed class NonEvaluatableGroupingFunction extends GroupingFunction {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Should this add a final foldable()=false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, but not sure. We can't fold these functions through EvaluatorMapper / with EVAL's operator, but we might maybe with another agg operator?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. With the "evaluated outside of an aggregation" docs you added, it's also clear what this means specifically.

I still feel like a foldable()=true could lead to some rule doing something nasty, as "foldable()" doesn't specify "inside an eval" in any way, it's just foldable or not. But I can't think of a grouping causing trouble here, so maybe when it appears, we can do whatever we must

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still feel like a foldable()=true could lead to some rule doing something nasty

Indeed. The foldable() is false by default so one would have to actively switch it to true, which will indeed become immediately problematic.

groupingChanged = true;
var fieldAs = new Alias(as.source(), as.name(), cat.field(), null, true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙏

GroupingFunction.NonEvaluatableGroupingFunction gf,
List<Alias> evals
) {
int[] counter = new int[] { 0 };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need this wrapped here (?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, 🙏 .

List<Expression> newChildren = new ArrayList<>(gf.children().size());

for (Expression ex : gf.children()) {
if (ex instanceof Attribute == false) { // TODO: foldables shouldn't require eval'ing either
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this TODO about? Are NonEvaluatableGroupingFunction expected to be potentially foldable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we shouldn't need to have to transform a ... BY CATEGORIZE("value") into EVAL ref = "value" | ... BY CATEGORIZE(ref)

Comment on lines 161 to 162
// TODO: rm me
// af.children().forEach(e -> e.forEachDown(AggregateFunction.class, unused -> foundNestedAggs.set(Boolean.TRUE)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👀

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙏

@@ -45,4 +41,20 @@ public BiConsumer<LogicalPlan, Failures> postAnalysisPlanVerification() {
};
}

public abstract static non-sealed class NonEvaluatableGroupingFunction extends GroupingFunction {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add some javadocs here? Maybe mentioning that "it's (currently?) used for stateful grouping functions, which make them effectively non-evaluatable before the agg, so its value can't be used inside the aggs".

Even if it has some high level examples, I think it will be useful for whoever has to reuse them in the future

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, added some docs.

@bpintea bpintea removed the v9.0.1 label Apr 11, 2025
@bpintea
Copy link
Contributor Author

bpintea commented Apr 11, 2025

Thanks Ivan!

@bpintea bpintea merged commit 9784e0e into elastic:main Apr 11, 2025
17 checks passed
@bpintea bpintea deleted the refactor/grouping_funcs branch April 11, 2025 14:19
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 126597

elasticsearchmachine pushed a commit that referenced this pull request Apr 11, 2025
…126715)

This splits the grouping functions in two: those that can be evaluated independently through the EVAL operator (`BUCKET`) and those that don't (like those that that are evaluated through an agg operator, `CATEGORIZE`).

Closes #124608

(cherry picked from commit 9784e0e)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >refactoring Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.19.0 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ESQL: Encapsulate Categorize instanceof checks inside aggs & rules
3 participants