Make OptimizerExpressionRule conditional #127500

Open
wants to merge 9 commits into
base: main
from

Conversation

idegtiarenko
Contributor

@idegtiarenko idegtiarenko commented Apr 29, 2025

This makes OptimizerExpressionRule conditional.
This should allow skipping expression traversal (which can be quite expensive, especially with many attributes) when we know beforehand that a certain plan type is not applicable to the rule or cannot be changed by it.

According to QueryPlanningBenchmark.run, this change makes it roughly twice as fast to parse the query:

before        avgt   10  5.254 ± 0.079  ms/op
after         avgt   10  2.752 ± 0.085  ms/op

Closes: #124288

Contributor

@alex-spies alex-spies left a comment

This is looking good. We need to double check all affected rules before merging, but this should work and I'm hoping for a nice perf boost on queries that have tons of field attributes in relations/projections (like FROM logs-*).

Comment on lines 69 to 70
case EsRelation esr -> false;
case Project p -> false;// this covers both keep and project
Contributor

Suggestion: IMO we should document in the javadoc that relation + projection are getting skipped per default and that one should override shouldVisit to change that.

Contributor Author

Fair point. I opened a draft at this point to see if this breaks a lot of tests (it did not 🎉), so now I will focus on documenting and testing it.
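
To illustrate the suggestion above, here is a minimal sketch of a rule whose default `shouldVisit` skips relation and projection nodes, plus a subclass that overrides it. The classes `Plan`, `Relation`, `Project`, `Eval`, `ExpressionRule` and `VisitsProjections` are hypothetical stand-ins invented for this sketch, not the actual Elasticsearch types, and the real method may differ in signature and visibility.

```java
import java.util.List;

// Hypothetical, simplified stand-ins; NOT the real Elasticsearch ES|QL classes.
final class ShouldVisitSketch {

    interface Plan {}

    record Relation(List<String> fields) implements Plan {}

    record Project(List<String> fields) implements Plan {}

    record Eval(String expression) implements Plan {}

    /** Base rule: relation and projection nodes are skipped by default. */
    static class ExpressionRule {
        /**
         * Defines whether the rule should visit the expressions of the given node.
         * By default Relation and Project are skipped; override to change that.
         */
        public boolean shouldVisit(Plan node) {
            return (node instanceof Relation || node instanceof Project) == false;
        }
    }

    /** A rule that opts back in to projections while keeping the other defaults. */
    static class VisitsProjections extends ExpressionRule {
        @Override
        public boolean shouldVisit(Plan node) {
            return node instanceof Project || super.shouldVisit(node);
        }
    }
}
```

The point of documenting the default is that a rule author who needs to rewrite expressions inside a projection has to know that an override like `VisitsProjections` is required.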

@@ -194,6 +193,12 @@ public <E extends T> T transformDown(Class<E> typeToken, Function<E, ? extends T
return transformDown((t) -> (typeToken.isInstance(t) ? rule.apply((E) t) : t));
}

@SuppressWarnings("unchecked")
public <E extends T> T transformDown(Predicate<Node<?>> tokenPredicate, Function<E, ? extends T> rule) {
Contributor

nit:

Suggested change
public <E extends T> T transformDown(Predicate<Node<?>> tokenPredicate, Function<E, ? extends T> rule) {
public <E extends T> T transformDown(Predicate<Node<?>> nodePredicate, Function<E, ? extends T> rule) {

we don't have class tokens here.
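
For readers unfamiliar with the predicate-based overload, the following is a rough sketch of what a `transformDown` driven by a node predicate does. The `Node` record here is a hypothetical minimal tree made up for illustration; the real `Node` class is generic and more involved, and whether it descends into the children of a rejected node is an assumption of this sketch.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical minimal tree; NOT the real Elasticsearch Node class.
final class TransformDownSketch {

    record Node(String label, List<Node> children) {

        /** Applies rule to every node accepted by nodePredicate, parents before children. */
        Node transformDown(Predicate<Node> nodePredicate, Function<Node, Node> rule) {
            // Rejected nodes are left untouched, but (in this sketch) their children are still walked.
            Node current = nodePredicate.test(this) ? rule.apply(this) : this;
            List<Node> newChildren = current.children()
                .stream()
                .map(child -> child.transformDown(nodePredicate, rule))
                .toList();
            return new Node(current.label(), newChildren);
        }
    }

    /** Upper-cases every label except those starting with "skip". */
    static Node upperCaseUnlessSkipped(Node root) {
        return root.transformDown(
            n -> n.label().startsWith("skip") == false,
            n -> new Node(n.label().toUpperCase(Locale.ROOT), n.children())
        );
    }
}
```

The parameter is named `nodePredicate` here because it tests node instances rather than class tokens, which is the reason behind the rename suggested above.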

@idegtiarenko idegtiarenko added >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL labels Apr 30, 2025
@idegtiarenko idegtiarenko marked this pull request as ready for review April 30, 2025 14:12
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

/**
* Defines if a node should be visited or not.
* Allows to skip nodes that are not applicable for the rule even if they contain expressions.
* By default that skips FROM, LIMIT, PROJECT, KEEP and DROP but this list could be extended or replaced in subclasses.
Contributor

How did you come up with this list of commands/node types?

Contributor Author

Those are commands that are either common but have no expressions (LIMIT), or common and contain a list of attributes but do not require any optimization or rearranging (such as KEEP/PROJECT/DROP). If properly constructed, they contain only a plain list of the attributes used.

Contributor

For more context: @idegtiarenko found out that the simple FROM logs-* | LIMIT 10, which we want to make faster, seemed to spend a significant amount of time just traversing plan nodes/expressions when the index pattern matched a ton of fields. Just not traversing the expressions of EsRelation every time, needlessly, should cut down on the planning/optimizing time for such queries.
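
The effect described above can be sketched with a toy model that counts how many expressions a single rule touches on a FROM | EVAL | LIMIT plan. Everything here (`Plan`, `samplePlan`, `visitedExpressions`, `defaultShouldVisit`, and the skip list itself) is a hypothetical simplification for illustration and says nothing about the real optimizer's timings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical toy model of a linear plan; NOT the real ES|QL plan classes.
final class TraversalCostSketch {

    /** A plan node with a command name, the expressions it holds, and at most one child. */
    record Plan(String name, List<String> expressions, Plan child) {}

    /** Assumed skip list for this sketch, loosely mirroring the commands discussed in the PR. */
    static boolean defaultShouldVisit(Plan plan) {
        return Set.of("FROM", "LIMIT", "PROJECT").contains(plan.name()) == false;
    }

    /** Counts the expressions one rule would touch, ignoring nodes rejected by shouldVisit. */
    static int visitedExpressions(Plan root, Predicate<Plan> shouldVisit) {
        int visited = 0;
        for (Plan p = root; p != null; p = p.child()) {
            if (shouldVisit.test(p)) {
                visited += p.expressions().size();
            }
        }
        return visited;
    }

    /** Builds LIMIT -> EVAL -> FROM, where FROM carries one attribute per matched field. */
    static Plan samplePlan(int fieldCount) {
        List<String> fields = new ArrayList<>();
        for (int i = 0; i < fieldCount; i++) {
            fields.add("f" + i);
        }
        Plan from = new Plan("FROM", fields, null);
        Plan eval = new Plan("EVAL", List.of("x = f1 + 1"), from);
        return new Plan("LIMIT", List.of("10"), eval);
    }
}
```

With 1000 fields, `visitedExpressions(samplePlan(1000), p -> true)` counts 1002 expressions, while passing `TraversalCostSketch::defaultShouldVisit` counts only the single EVAL expression. Since every expression rule repeats this walk, the saving in the toy model scales with the number of rules.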

Member

@costin costin left a comment

We might be setting a trap for ourselves in the future due to the implicit filtering; however, the PR is contained (thanks for that), the tests are passing, and the perf numbers look great.
So much so that backporting this to 8.19 makes sense.

Member

Why remove the previous methods?

Contributor Author

They are unused at the moment. Please let me know if we should keep them anyway.

@@ -7,9 +7,14 @@
package org.elasticsearch.xpack.esql.optimizer.rules.logical;
Member

Isn't there a physical plan equivalent?

Contributor Author

Yes, org.elasticsearch.xpack.esql.optimizer.PhysicalOptimizerRules; however, it does not seem to have any rules for expressions.

Contributor

@alex-spies alex-spies left a comment

LGTM. Nice one, @idegtiarenko !

};

rule.apply(
new EsqlParser().createStatement("FROM index | EVAL x=f1+1 | KEEP x, f2 | LIMIT 1"),
Contributor

nit: could also add a query with DROP.

Contributor Author

@idegtiarenko idegtiarenko May 5, 2025

Please note that this test is executed with only a single test rule. As a result, DROP is not converted to a project (as suggested here), and the test would flag the rule's execution on it.
I am not sure it is worth configuring attribute resolution here to check this aspect.

Contributor

@astefan astefan left a comment

LGTM
I've checked some of the optimization rules that might interfere with the plan types excluded from the tree traversal in this PR, and things seem to work (STATS sometimes transforms itself into a project, and STATS does support expressions). Since I don't think we will introduce LIMIT 1+2 any time soon, it does look good to me.

@idegtiarenko idegtiarenko added auto-backport Automatically create backport pull requests when merged v8.19.0 labels May 5, 2025
Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.19.0 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OptimizerExpressionRule should be conditional
5 participants