ESQL: Fix count optimization with pushable union types #127225

Merged

alex-spies
Contributor

Fix #127200

@elasticsearchmachine
Collaborator

Hi @alex-spies, I've created a changelog YAML for you.

alex-spies force-pushed the fix-count-with-pushable-union-types branch from 8e4867a to c90cacf on April 25, 2025 16:26
alex-spies added the auto-backport label on Apr 25, 2025
alex-spies marked this pull request as ready for review on April 25, 2025 16:49
elasticsearchmachine added the Team:Analytics label on Apr 25, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

Ensure that the code path with pushdown is actually covered.
Contributor

@craigtaverner craigtaverner left a comment

Interesting! I did not know the bug was about Lucene pushdown of STATS.

 // This means that an EsRelation[field1, field2, field3] where field1 and field 3 are missing will be replaced by
 // Project[field1, field2, field3] <- keeps the ordering intact
 // \_Eval[field1 = null, field3 = null]
-//   \_EsRelation[field2]
+//   \_EsRelation[field1, field2, field3]
Contributor

If the field is missing, would it be in the EsRelation at all?

Contributor Author

Yes! Because we're in the local optimizer, the field does exist overall, and is thus put into the EsRelation after the field caps call on the coordinator. But on the local node it's missing! (Or, to be more precise, it's missing in the search stats that this optimization run uses.) This optimizer rule applies exclusively to such fields.
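
The rewrite described in this thread can be sketched with plain strings. This is only an illustration under stated assumptions: `MissingFieldSketch`, `rewrite`, and the `presentOnShard` set (standing in for the local search stats) are hypothetical names, not the optimizer rule's real API.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration; the real rule operates on logical plan nodes, not strings.
final class MissingFieldSketch {
    // Renders the rewritten plan as text. The EsRelation keeps all of its fields;
    // an Eval assigns null to the fields missing locally, and a Project on top
    // keeps the original ordering.
    static String rewrite(List<String> relationFields, Set<String> presentOnShard) {
        List<String> missing = relationFields.stream()
            .filter(f -> !presentOnShard.contains(f))
            .toList();
        String relation = "EsRelation" + relationFields;
        if (missing.isEmpty()) {
            return relation; // nothing is missing, so the plan is left alone
        }
        String nulls = missing.stream()
            .map(f -> f + " = null")
            .collect(Collectors.joining(", "));
        return "Project" + relationFields + "\n"
            + "\\_Eval[" + nulls + "]\n"
            + "  \\_" + relation;
    }

    public static void main(String[] args) {
        System.out.println(rewrite(List.of("field1", "field2", "field3"), Set.of("field2")));
    }
}
```

Running `main` prints a `Project` over an `Eval[field1 = null, field3 = null]` over the unchanged `EsRelation[field1, field2, field3]`.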

@@ -94,15 +94,19 @@ private Tuple<List<Attribute>, List<EsStatsQueryExec.Stat>> pushableStats(
             // check if regular field
             else {
                 if (target instanceof FieldAttribute fa) {
-                    var fName = fa.name();
+                    var fName = fa.fieldName();
Contributor

Wow! Is this the fix? I imagine this could have impacts in many places, so could this fix other bugs we've not noticed?

Contributor Author

Yep, this is the fix. It was a simple oversight: we did not use the correct field name. In the past, the field attribute name and the field name were the same, but union types had to break with this pattern.

I don't see other situations that this might fix, because this specific stats-pushdown optimization only triggers for a narrow slice of queries anyway.
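
To make the distinction concrete, here is a minimal, self-contained sketch. It does not use the real Elasticsearch classes: `FieldAttr`, `existsQueryJson`, and the sample field `client_ip` are hypothetical stand-ins, and the synthetic name follows the `$$field$converted_to$type` pattern mentioned in the commit message.

```java
// Hypothetical illustration; none of these types are the real ES|QL classes.
final class CountPushdownSketch {
    // `name` is what the query plan calls the attribute,
    // `fieldName` is the field as it exists in the index.
    record FieldAttr(String name, String fieldName) {
        // Plain field: both names coincide.
        static FieldAttr plain(String field) {
            return new FieldAttr(field, field);
        }

        // Union-typed field after a conversion such as client_ip::ip:
        // the attribute carries a synthetic name.
        static FieldAttr converted(String field, String type) {
            return new FieldAttr("$$" + field + "$converted_to$" + type, field);
        }
    }

    // Builds the body of an exists query as a string (illustrative only).
    static String existsQueryJson(String field) {
        return "{\"exists\":{\"field\":\"" + field + "\"}}";
    }

    // Uses the attribute name, which is wrong for converted union-typed fields.
    static String buggyQuery(FieldAttr fa) {
        return existsQueryJson(fa.name());
    }

    // Uses the real index field name.
    static String fixedQuery(FieldAttr fa) {
        return existsQueryJson(fa.fieldName());
    }

    public static void main(String[] args) {
        FieldAttr fa = FieldAttr.converted("client_ip", "ip");
        System.out.println(buggyQuery(fa)); // {"exists":{"field":"$$client_ip$converted_to$ip"}}
        System.out.println(fixedQuery(fa)); // {"exists":{"field":"client_ip"}}
    }
}
```

For a plain field both variants produce the same query, which is consistent with the oversight going unnoticed until union types introduced synthetic attribute names.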

Contributor

What I'm wondering is if we could make this mistake again and if we could prevent it.
Also, should we insert filters here using #fieldName()?

if (field.foldable() == false && field instanceof FieldAttribute fa && stats.isIndexed(fa.name())) {

query = QueryBuilders.existsQuery(fieldName);
}
}
}
if (fieldName != null) {
if (count.hasFilter()) {
// Note: currently, it seems like we never actually perform stats pushdown if we reach here.
Contributor

It seems the question of what exactly gets pushed down and why is subtle. I wonder if we want some documentation on this?

Contributor Author

When we touch this rule again, I think we should add unit tests that demonstrate exactly what is pushed down and how.

I also found that union type filters don't seem to be pushed to Lucene, yet - maybe we could improve the documentation as part of that work?

If you agree, I can open up an issue for this.

Contributor

// Note: currently, it seems like we never actually perform stats pushdown if we reach here.

We evolved to this. Before the aggregation filters were extracted into an upstream filter and pushed down, this code was "alive".

Contributor Author

Yep yep, and if we manage to evolve the optimization to multiple stats, this code could come alive again.

There's an argument to be made that it should be deleted as long as it's dead, but properly double-checking that this code path really is never used at the moment was beyond the scope that I could allocate to this bug fix.

So leaving a comment was the next best thing for the time being, I think :)

alex-spies merged commit 9e0a5af into elastic:main on Apr 28, 2025
17 checks passed
alex-spies deleted the fix-count-with-pushable-union-types branch on April 28, 2025 11:50
alex-spies added a commit to alex-spies/elasticsearch that referenced this pull request Apr 28, 2025
When pushing down STATS count(field::type) to Lucene for a union-typed field, use the correct field name in the Lucene query and not the synthetic attribute name $$field$converted_to$type.
@elasticsearchmachine
Collaborator

💚 Backport successful

Branches: 8.18, 8.19, 9.0, 8.17

Comment on lines +69 to +70
// For any missing field, place an Eval right after the EsRelation to assign null values to that attribute (using the same name
// id!), thus avoiding that InsertFieldExtrations inserts a field extraction later.
Contributor

👍

benchaplin pushed a commit to benchaplin/elasticsearch that referenced this pull request Apr 28, 2025
When pushing down STATS count(field::type) to Lucene for a union-typed field, use the correct field name in the Lucene query and not the synthetic attribute name $$field$converted_to$type.
elasticsearchmachine pushed a commit that referenced this pull request Apr 28, 2025
…7460)

When pushing down STATS count(field::type) to Lucene for a union-typed field, use the correct field name in the Lucene query and not the synthetic attribute name $$field$converted_to$type.
elasticsearchmachine pushed a commit that referenced this pull request Apr 28, 2025
…7462)

When pushing down STATS count(field::type) to Lucene for a union-typed field, use the correct field name in the Lucene query and not the synthetic attribute name $$field$converted_to$type.
@bpintea
Contributor

bpintea commented Apr 28, 2025

Oh, too late.

Also, should we insert filters here using #fieldName()?

I will follow up on this.

elasticsearchmachine pushed a commit that referenced this pull request Apr 28, 2025
…7461)

When pushing down STATS count(field::type) to Lucene for a union-typed field, use the correct field name in the Lucene query and not the synthetic attribute name $$field$converted_to$type.
alex-spies added a commit that referenced this pull request Apr 29, 2025
…7459)

When pushing down STATS count(field::type) to Lucene for a union-typed field, use the correct field name in the Lucene query and not the synthetic attribute name $$field$converted_to$type.
@alex-spies
Contributor Author

@bpintea, your comment #127225 (comment) is an excellent find. Dang, we missed another usage of FieldAttribute#name where we really meant #fieldName.

Thank you for following up on this! Maybe let's also add a javadoc comment to NamedExpression#name, or more specifically FieldAttribute#name, to warn against using the attribute name (which can be synthetic).

To remove the sharp edge, we could consider refactoring methods that use the actual field name to take an EsField object (each FieldAttribute has one!) rather than a String, ensuring at compile time that we never hand them a bogus name accidentally. More specifically, SearchStats' methods should maybe not use String everywhere, but rather EsField or a dedicated String wrapper record that we could call FieldName or so, just to leverage the compiler.

I think it's worth tracking this in an issue. Opened #127521
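
A minimal sketch of the wrapper idea, assuming a hypothetical `FieldName` record and a toy `Stats` class standing in for `SearchStats`. Neither reflects the actual Elasticsearch API; the point is only that a dedicated type lets the compiler reject a raw attribute name.

```java
import java.util.Set;

// Hypothetical illustration; these are not the real Elasticsearch types.
final class TypedFieldNameSketch {
    // Wrapper for the name of a field as it exists in the index.
    record FieldName(String string) {}

    // The attribute name stays a plain String; the index field name gets its own type.
    record FieldAttr(String name, FieldName fieldName) {}

    // Toy stand-in for SearchStats: lookups only accept a FieldName.
    static final class Stats {
        private final Set<FieldName> indexed;

        Stats(Set<FieldName> indexed) {
            this.indexed = Set.copyOf(indexed);
        }

        boolean isIndexed(FieldName field) {
            return indexed.contains(field);
        }
    }

    public static void main(String[] args) {
        FieldName clientIp = new FieldName("client_ip");
        Stats stats = new Stats(Set.of(clientIp));
        FieldAttr fa = new FieldAttr("$$client_ip$converted_to$ip", clientIp);

        System.out.println(stats.isIndexed(fa.fieldName())); // true
        // stats.isIndexed(fa.name()); // would not compile: String is not a FieldName
    }
}
```

The trade-off is some extra wrapping at call sites in exchange for turning this class of mistake into a compile error.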

Labels
:Analytics/ES|QL, auto-backport, >bug, Team:Analytics, v8.17.6, v8.18.1, v8.19.0, v9.0.1, v9.1.0
Development

Successfully merging this pull request may close these issues.

[ES|QL] Aggregation on conversion functions may return wrong results
5 participants