Skip to content

ESQL: Specialize aggs AddInput for each block type #127582

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

ivancea
Copy link
Contributor

@ivancea ivancea commented Apr 30, 2025

Context

While benchmarking the "skipping unused groups on aggs" WIP improvement, we found this profile:
image
Note the big itable access taking 30% of the time.

It came from the multiple IntBlock method calls we do when passing the groups to the aggregators. For example:
image
There are multiple groups.<method>() there, within the positions loop, and once for each multi-value too.

This PR

As a general improvement, and as a step towards the original improvement too, I'm replacing that itable access in that loop with a pattern matching switch on each IntBlock sub-type, and a custom call/logic for each. This change fully removed the itable access in the original feature profiling.

The bad part, is that there will be some repeated code. The good part, is that most of it is auto-generated, and it should improve performance. On the worst cases, it should be identical.

Microbenchmarks

AggregatorBenchmark ran on longs, with sum, count and count_distinct as different cases, and with all different filters cases as they are relevant too.

Benchmark                    (blockType)        (filter)  (grouping)   (op)  Mode  Cnt   Score   Error  -> Score  Error Units
AggregatorBenchmark.run     vector_longs            none       longs  count  avgt    4   8.027   1.518     7.644  0.364 ns/op
AggregatorBenchmark.run     vector_longs   constant_true       longs  count  avgt    4  16.670   1.058    11.261  1.829 ns/op
AggregatorBenchmark.run     vector_longs        all_true       longs  count  avgt    4  27.077   3.382    14.829  2.199 ns/op
AggregatorBenchmark.run     vector_longs       half_true       longs  count  avgt    4  26.575   1.860    13.400  0.679 ns/op
AggregatorBenchmark.run     vector_longs  constant_false       longs  count  avgt    4  23.083   5.042     5.823  1.269 ns/op
AggregatorBenchmark.run  half_null_longs            none       longs  count  avgt    4  20.862   1.751    20.433  2.633 ns/op
AggregatorBenchmark.run  half_null_longs   constant_true       longs  count  avgt    4  48.153   5.325    19.425  0.896 ns/op
AggregatorBenchmark.run  half_null_longs        all_true       longs  count  avgt    4  54.481  14.401    27.827  3.408 ns/op
AggregatorBenchmark.run  half_null_longs       half_true       longs  count  avgt    4  56.866   1.582    30.780  0.310 ns/op
AggregatorBenchmark.run  half_null_longs  constant_false       longs  count  avgt    4  43.879   4.710    10.911  1.149 ns/op

Notes:

  • With no filters, there's no change, as this only affects the IntBlocks created by the HashBlocks, and only the filtered aggs create nulls there now
  • Filtered aggs, however, improved quite a lot, with times reduced by an average of 50% in those benchmarks

@ivancea ivancea added >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL v9.1.0 labels Apr 30, 2025
@ivancea ivancea requested a review from nik9000 April 30, 2025 17:32
@elasticsearchmachine
Copy link
Collaborator

Hi @ivancea, I've created a changelog YAML for you.

Comment on lines +64 to +79
default void add(int positionOffset, IntBlock groupIds) {
switch (groupIds) {
case ConstantNullBlock ignored:
// No-op
break;
case IntVectorBlock b:
add(positionOffset, b.asVector());
break;
case IntArrayBlock b:
add(positionOffset, b);
break;
case IntBigArrayBlock b:
add(positionOffset, b);
break;
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class and this method specifically are the core of the PR.

For this method, I would have liked to make it final, as to avoid overrides disabling the enhancement. But it's an interface, and making it a class is probably (maybe?) not worth it (Insert more question marks here)

As for the implementation, luckily, I could simply ignore the ConstantNullBlock impl, and delegate the IntVectorBlock one to the add(IntVector) existing method. So it looks better than I expected

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could make it static maybe, but that'd involve changing a lot of call sites. I'd probably change it for a follow up you can call "100% mechanical"

@ivancea ivancea marked this pull request as ready for review April 30, 2025 17:39
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@ivancea ivancea changed the title ESQL: Specialize ags AddInput for each block type ESQL: Specialize aggs AddInput for each block type Apr 30, 2025
Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I know we have coverage with the testAllFiltered, testNoneFiltered, testSomeFiltered. Could you verify that we have coverage with the big array blocks? I imagine a world where you typo something in that branch and we don't catch it until we do some funky transform that lands us in the BigArray branch. Maybe doctor testSomeFiltered to land us in that branch.

One thing I was curious about is - is it worth having an optimal path for where there's just nulls and no multivalued inputs? That's going to be fairly common. I don't think it's a thing for now, but it might be a thing for later.

Comment on lines +64 to +79
default void add(int positionOffset, IntBlock groupIds) {
switch (groupIds) {
case ConstantNullBlock ignored:
// No-op
break;
case IntVectorBlock b:
add(positionOffset, b.asVector());
break;
case IntArrayBlock b:
add(positionOffset, b);
break;
case IntBigArrayBlock b:
add(positionOffset, b);
break;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could make it static maybe, but that'd involve changing a lot of call sites. I'd probably change it for a follow up you can call "100% mechanical"

}
end();
public void add(int positionOffset, IntArrayBlock groupIds) {
startAggEndHash();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@nik9000
Copy link
Member

nik9000 commented May 1, 2025

I'm a bit surprised we see any performance difference on constant_true and constant_false. We should probably detect that case and short circuit out. I thought I did that. But maybe I didn't.

@ivancea
Copy link
Contributor Author

ivancea commented May 5, 2025

I'm a bit surprised we see any performance difference on constant_true and constant_false. We should probably detect that case and short circuit out.

@nik9000 Shortcircuit at which stage?

When the Filtered agg does its keepMask() on the groups (https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/FilteredGroupingAggregatorFunction.java#L61), when constant_false, it creates a ConstantNullBlock. As it's short-circuited in the AddInput in this PR, it's nearly instant as no aggs are called at all.
However, a constant_true needs to call the aggs as if there was no filter. So it takes more time. It looks ok to me at least

@nik9000
Copy link
Member

nik9000 commented May 5, 2025

Ok. Got it! It's fast - but we're still grouping which is why it's not fully instant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants