Skip to content

ESQL: Retain aggregate when grouping #126598

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Show file tree
Hide file tree
Changes from 11 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions docs/changelog/126598.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
pr: 126598
summary: "ESQL: Retain aggregate when grouping"
area: ES|QL
type: bug
issues:
- 126026
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,6 @@ public abstract class GenerativeRestTest extends ESRestTestCase {
"Plan \\[ProjectExec\\[\\[<no-fields>.* optimized incorrectly due to missing references", // https://github.com/elastic/elasticsearch/issues/125866
"only supports KEYWORD or TEXT values, found expression", // https://github.com/elastic/elasticsearch/issues/126017
"token recognition error at: '``", // https://github.com/elastic/elasticsearch/issues/125870
"Unknown column \\[.*\\]", // https://github.com/elastic/elasticsearch/issues/126026
"optimized incorrectly due to missing references", // https://github.com/elastic/elasticsearch/issues/116781
"No matches found for pattern", // https://github.com/elastic/elasticsearch/issues/126418
"The incoming YAML document exceeds the limit:" // still to investigate, but it seems to be specific to the test framework
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -636,4 +636,67 @@ row foo = "Bar" | where foo rlike "(?i)Ba.*"
foo:keyword
;

evalAfterAvgGroupingUsingSameName
required_capability: retain_aggregate_when_grouping
from employees
| stats avg = avg(salary) by gender
| keep avg
| eval avg = 12
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpicky suggestion: the added tests 1. keep one or several aggregates and then 2. override them. We could also add tests that keep groupings in step 1, and consider aliased groupings for good measure.

E.g.

from employees
| stats avg(salary) by g = gender
| keep g
| eval g = 12

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the helpful suggestion. I hadn’t considered keeping the aliased groupings, but I’ll update those tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kanoshiou ! I think a test case that doesn't keep any of the aggregates would be nice, too, like the one I suggested above - but this PR is already in a very good spot and can happily be merged as-is (once the small test regression is fixed that @luigidellaquila mentioned).

;

avg:integer
12
12
12
;

evalAfterGroupingUsingSameName
required_capability: retain_aggregate_when_grouping
row foo = [10,11,9], bar = [1,2,3]
| mv_expand foo
| mv_expand bar
| stats this = max(foo) by bar
| keep this
| eval this = 12
;

this:integer
12
12
12
;

evalAfterGroupingUsingSameName2
required_capability: retain_aggregate_when_grouping
from employees
| stats count = count(emp_no) by gender, is_rehired
| keep count
| rename count as x
| keep x
| eval x = 12
;

x:integer
12
12
12
12
12
12
12
12
12
;

evalAfterGroupingUsingSameName3
required_capability: retain_aggregate_when_grouping
row foo = 10
| stats field_1 = max(foo), this = count(*), field_2 = max(foo) by foo
| rename this AS field_2
| keep field_1, field_2
| eval field_2 = 1, field_1 = 1
;

field_2:integer | field_1:integer
1 | 1
;
Original file line number Diff line number Diff line change
Expand Up @@ -413,6 +413,12 @@ public enum Cap {
*/
RENAME_SEQUENTIAL_PROCESSING,

/**
* Support for retain aggregate when grouping.
* See <a href="https://github.com/elastic/elasticsearch/issues/126026"> ES|QL: columns not projected away despite KEEP #126026 </a>
*/
RETAIN_AGGREGATE_WHEN_GROUPING,

/**
* Fix for union-types when some indexes are missing the required field. Done in #111932.
*/
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,8 @@
import org.elasticsearch.compute.data.Block;
import org.elasticsearch.compute.data.BlockUtils;
import org.elasticsearch.index.IndexMode;
import org.elasticsearch.xpack.esql.core.expression.Alias;
import org.elasticsearch.xpack.esql.core.expression.Attribute;
import org.elasticsearch.xpack.esql.core.expression.AttributeSet;
import org.elasticsearch.xpack.esql.core.expression.EmptyAttribute;
import org.elasticsearch.xpack.esql.core.expression.Expressions;
Expand Down Expand Up @@ -84,7 +86,11 @@ public LogicalPlan apply(LogicalPlan plan) {
);
} else {
// Aggs cannot produce pages with 0 columns, so retain one grouping.
remaining = List.of(Expressions.attribute(aggregate.groupings().get(0)));
Attribute attribute = Expressions.attribute(aggregate.groupings().getFirst());
NamedExpression firstAggregate = aggregate.aggregates().getFirst();
remaining = List.of(
new Alias(firstAggregate.source(), firstAggregate.name(), attribute, firstAggregate.id())
);
p = aggregate.with(aggregate.groupings(), remaining);
}
} else {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -423,7 +423,7 @@ public void testEmptyProjectInStatWithGroupAndEval() {
var relation = as(filter.child(), EsRelation.class);

assertThat(Expressions.names(agg.groupings()), contains("emp_no"));
assertThat(Expressions.names(agg.aggregates()), contains("emp_no"));
assertThat(Expressions.names(agg.aggregates()), contains("c"));

var exprs = eval.fields();
assertThat(exprs.size(), equalTo(1));
Expand Down Expand Up @@ -2784,6 +2784,27 @@ private static List<String> orderNames(TopN topN) {
return topN.order().stream().map(o -> as(o.child(), NamedExpression.class).name()).toList();
}

/**
* Expects
* Eval[[2[INTEGER] AS x]]
* \_Limit[1000[INTEGER],false]
* \_Aggregate[[foo{r}#3],[foo{r}#3 AS x]]
* \_LocalRelation[[foo{r}#3],[IntVectorBlock[vector=ConstantIntVector[positions=1, value=1]]]]
*/
public void testEvalAfterGroupBy() {
var plan = optimizedPlan("""
ROW foo = 1
| STATS x = max(foo) by foo
| KEEP x
| EVAL x = 2
""");
var eval = as(plan, Eval.class);
var limit = as(eval.child(), Limit.class);
var aggregate = as(limit.child(), Aggregate.class);
var localRelation = as(aggregate.child(), LocalRelation.class);
assertThat(Expressions.names(eval.output()), contains("x"));
}

public void testCombineLimitWithOrderByThroughFilterAndEval() {
LogicalPlan plan = optimizedPlan("""
from test
Expand Down
Loading