Skip to content

ESQL: Preserve single aggregate when all attributes are pruned #126397

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

kanoshiou
Copy link
Contributor

In PruneColumns, Aggregate might be optimized to produce a LocalRelation output with an EmptyAttribute. However, the subsequent eval execution does not filter out this EmptyAttribute, which could lead to an additional null column with an empty name appearing in the result.

Closes #126392

@elasticsearchmachine elasticsearchmachine added external-contributor Pull request authored by a developer outside the Elasticsearch team v9.1.0 needs:triage Requires assignment of a team area label labels Apr 7, 2025
@kanoshiou kanoshiou marked this pull request as draft April 7, 2025 11:55
@kanoshiou
Copy link
Contributor Author

AVG appears to function differently compared to other stats functions. Since AVG applies a SubstituteSurrogates rule, it may result in the insertion of a Project, which prevents the issue from being reproducible using the query provided below.

ROW foo = 10
| STATS x = AVG(foo) 
| EVAL x = 13
// before
Limit[1000[INTEGER],false]
\_Eval[[13[INTEGER] AS x]]
  \_Aggregate[[],[AVG(foo{r}#812,true[BOOLEAN]) AS x]]
    \_Row[[10[INTEGER] AS foo]]

// after
Limit[1000[INTEGER],false]
\_Eval[[13[INTEGER] AS x]]
  \_Project[[x{r}#815]]
    \_Eval[[$$SUM$x$0{r$}#818 / $$COUNT$x$1{r$}#819 AS x]]
      \_Aggregate[[],[SUM(foo{r}#812,true[BOOLEAN]) AS $$SUM$x$0, COUNT(foo{r}#812,true[BOOLEAN]) AS $$COUNT$x$1]]
        \_Row[[10[INTEGER] AS foo]]

@kanoshiou kanoshiou marked this pull request as ready for review April 8, 2025 05:48
@bpintea bpintea added :Analytics/ES|QL AKA ESQL and removed external-contributor Pull request authored by a developer outside the Elasticsearch team labels Apr 15, 2025
@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) and removed needs:triage Requires assignment of a team area label labels Apr 15, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@bpintea bpintea added the external-contributor Pull request authored by a developer outside the Elasticsearch team label Apr 15, 2025
@fang-xing-esql fang-xing-esql self-assigned this Apr 15, 2025
@fang-xing-esql
Copy link
Member

buildkite test this

@fang-xing-esql
Copy link
Member

buildkite test this

Copy link
Member

@fang-xing-esql fang-xing-esql left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for catching this bug @kanoshiou!

I wonder if you have considered the option to limit the scope of change inside the PruneColumns rule, if it is the rule that creates an EmptyAttribute that causes trouble in NamedExpressions.mergeOutputExpressions?

Changing NamedExpressions.mergeOutputExpressions to skip EmptyAttribute fixes the bug, however this method is kind of a low level API that is called by parser, the output() method of LogicalPlans and another rule which might be irrelevant to this bug, it looks to me that the scope of impact of this change might be quite larger than we had expected.

An alternative way that I can think of is limiting the change to the PruneColumns itself, when creating a LocalRelation, avoiding using EmptyAttribute, so that it won't cause NamedExpressions.mergeOutputExpressions to add a null field to the output.

@kanoshiou
Copy link
Contributor Author

Thank you for your review @fang-xing-esql !

Your concern about NamedExpressions.mergeOutputExpressions is right. Introducing such changes might lead to unexpected bugs in the future due to the wide scope of impact.

I've updated the branch. Could you please review it again?

@fang-xing-esql fang-xing-esql added the auto-backport Automatically create backport pull requests when merged label Apr 17, 2025
Copy link
Member

@fang-xing-esql fang-xing-esql left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @kanoshiou! LGTM

@@ -77,7 +77,7 @@ public LogicalPlan apply(LogicalPlan plan) {
if (aggregate.groupings().isEmpty()) {
p = new LocalRelation(
aggregate.source(),
List.of(new EmptyAttribute(aggregate.source())),
List.of(Expressions.attribute(aggregate.aggregates().getFirst())),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about something like List.of(aggregate.output().get(0)), your way does a similar job too.

@fang-xing-esql
Copy link
Member

buildkite test this

@kanoshiou kanoshiou changed the title ESQL: Filter EmptyAttribute in merging output expressions ESQL: Preserve single aggregate when all attributes are pruned Apr 18, 2025
@fang-xing-esql
Copy link
Member

buildkite test this

@fang-xing-esql fang-xing-esql merged commit 65c350c into elastic:main Apr 21, 2025
19 checks passed
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x

elasticsearchmachine pushed a commit that referenced this pull request Apr 21, 2025
…126397) (#127126)

* ESQL: Preserve single aggregate when all attributes are pruned (#126397)

* Avoid using `EmptyAttribute`

* fix compile error

---------

Co-authored-by: kanoshiou <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >bug external-contributor Pull request authored by a developer outside the Elasticsearch team Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.19.0 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ESQL: EVAL after STATS produces an empty column
4 participants