Make OptimizerExpressionRule conditional #127500
base: main
This is looking good. We need to double-check all affected rules before merging, but this should work, and I'm hoping for a nice perf boost on queries that have tons of field attributes in relations/projections (like FROM logs-*).
case EsRelation esr -> false;
case Project p -> false; // this covers both keep and project
Suggestion: IMO we should document in the javadoc that relations and projections are skipped by default and that one should override shouldVisit to change that.
Fair point. I opened a draft at this point to see if this breaks a lot of tests (it did not 🎉), so now I will focus on documenting and testing it.
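A minimal sketch of the behavior being documented, assuming toy stand-ins (Plan, Relation, Eval, Project, Limit, ExpressionRule) rather than the real ES|QL classes: the default shouldVisit skips relations, projections and limits with a Java 21 pattern-matching switch, and a subclass can override it to widen the set of visited nodes. The traversal and the visitedExpressions counter are illustrative assumptions, not the PR's implementation.

```java
import java.util.List;

/** Hypothetical toy model of a conditional expression rule; not the real ES|QL classes. */
final class ConditionalRuleSketch {

    sealed interface Plan permits Relation, Eval, Project, Limit {}

    record Relation(List<String> fields) implements Plan {}

    record Eval(Plan child, List<String> expressions) implements Plan {}

    /** Stands in for KEEP (and DROP once it has been resolved into a projection). */
    record Project(Plan child, List<String> attributes) implements Plan {}

    record Limit(Plan child, int n) implements Plan {}

    abstract static class ExpressionRule {
        /** Number of expressions handed to rewrite(); used to observe what was skipped. */
        int visitedExpressions = 0;

        /** Skips relations, projections and limits by default; override to change that. */
        protected boolean shouldVisit(Plan plan) {
            return switch (plan) {
                case Relation r -> false;
                case Project p -> false;
                case Limit l -> false;
                default -> true;
            };
        }

        /** The expression rewrite performed by a concrete rule. */
        protected abstract String rewrite(String expression);

        /** Top-down traversal: rewrite this node's expressions if visited, then recurse into the child. */
        final Plan apply(Plan plan) {
            Plan current = shouldVisit(plan) ? rewriteExpressions(plan) : plan;
            return switch (current) {
                case Relation r -> r;
                case Eval e -> new Eval(apply(e.child()), e.expressions());
                case Project p -> new Project(apply(p.child()), p.attributes());
                case Limit l -> new Limit(apply(l.child()), l.n());
            };
        }

        private Plan rewriteExpressions(Plan plan) {
            return switch (plan) {
                case Relation r -> new Relation(rewriteAll(r.fields()));
                case Eval e -> new Eval(e.child(), rewriteAll(e.expressions()));
                case Project p -> new Project(p.child(), rewriteAll(p.attributes()));
                case Limit l -> l; // no expressions in this toy model
            };
        }

        private List<String> rewriteAll(List<String> expressions) {
            visitedExpressions += expressions.size();
            return expressions.stream().map(this::rewrite).toList();
        }
    }
}
```

With the default, a plan shaped like FROM index | EVAL x=f1+1 | KEEP x, f2 | LIMIT 1 only has its EVAL expression rewritten; a subclass returning true for Project as well would also touch the KEEP attributes.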
@@ -194,6 +193,12 @@ public <E extends T> T transformDown(Class<E> typeToken, Function<E, ? extends T
    return transformDown((t) -> (typeToken.isInstance(t) ? rule.apply((E) t) : t));
}

@SuppressWarnings("unchecked")
public <E extends T> T transformDown(Predicate<Node<?>> tokenPredicate, Function<E, ? extends T> rule) {
nit (suggested change):
- public <E extends T> T transformDown(Predicate<Node<?>> tokenPredicate, Function<E, ? extends T> rule) {
+ public <E extends T> T transformDown(Predicate<Node<?>> nodePredicate, Function<E, ? extends T> rule) {
We don't have class tokens here.
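One plausible shape for the predicate overload, sketched under the assumption that it wraps the rule the same way the class-token overload in the diff does: nodes rejected by the predicate are returned unchanged, but their children are still traversed. Node here is a hypothetical, non-generic toy tree, so the unchecked cast from the real signature is not needed.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical sketch of a predicate-based transformDown; not the Elasticsearch Node class. */
final class TransformDownSketch {

    /** Minimal immutable tree node. */
    record Node(String label, List<Node> children) {

        Node(String label, Node... children) {
            this(label, List.of(children));
        }

        /** Pre-order rewrite: apply the rule to this node, then recurse into the result's children. */
        Node transformDown(Function<Node, Node> rule) {
            Node self = rule.apply(this);
            List<Node> newChildren = self.children().stream()
                .map(child -> child.transformDown(rule))
                .toList();
            return new Node(self.label(), newChildren);
        }

        /** Applies the rule only where nodePredicate matches; other nodes are kept but still descended into. */
        Node transformDown(Predicate<Node> nodePredicate, Function<Node, Node> rule) {
            return transformDown(node -> nodePredicate.test(node) ? rule.apply(node) : node);
        }
    }
}
```

Naming the parameter nodePredicate fits this shape: the predicate inspects whole nodes, so there is no class token involved.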
Pinging @elastic/es-analytical-engine (Team:Analytics)
/**
 * Defines if a node should be visited or not.
 * Allows to skip nodes that are not applicable for the rule even if they contain expressions.
 * By default that skips FROM, LIMIT, PROJECT, KEEP and DROP but this list could be extended or replaced in subclasses.
How did you come up with this list of commands/node types?
Those are commands that are either common but have no expressions (LIMIT), or common and contain a list of attributes but do not require any optimization or rearranging (such as KEEP/PROJECT/DROP): if properly constructed, they only contain a plain list of the attributes used.
For more context: @idegtiarenko found that the simple FROM logs-* | LIMIT 10, which we want to make faster, seemed to spend a significant amount of time just traversing plan nodes/expressions when the index pattern matched a ton of fields. Simply not traversing the expressions of EsRelation needlessly every time should cut down on the planning/optimizing time for such queries.
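To put rough numbers on this, here is a back-of-the-envelope sketch (toy PlanNode type, hypothetical field names) counting how many expressions a single expression rule touches for FROM logs-* | LIMIT 10 when the relation carries many field attributes, with and without skipping EsRelation and Limit. LIMIT is modelled without expressions, as described earlier in the thread; this is an illustration of the scaling argument, not a measurement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy model for estimating traversal cost; not the real ES|QL plan classes. */
final class TraversalCostSketch {

    /** A linear plan node: its own expressions plus an optional child (null for leaves). */
    record PlanNode(String kind, List<String> expressions, PlanNode child) {}

    /** Builds FROM logs-* | LIMIT 10 where the pattern resolves to fieldCount hypothetical fields. */
    static PlanNode fromLogsLimit10(int fieldCount) {
        List<String> fields = new ArrayList<>(fieldCount);
        for (int i = 0; i < fieldCount; i++) {
            fields.add("field_" + i);
        }
        PlanNode relation = new PlanNode("EsRelation", fields, null);
        // LIMIT is modelled without expressions, matching the review discussion.
        return new PlanNode("Limit", List.of(), relation);
    }

    /** Counts the expressions one rule visits, skipping nodes rejected by shouldVisit. */
    static int visitedExpressions(PlanNode plan, Predicate<PlanNode> shouldVisit) {
        int count = 0;
        for (PlanNode node = plan; node != null; node = node.child()) {
            if (shouldVisit.test(node)) {
                count += node.expressions().size();
            }
        }
        return count;
    }

    /** Unconditional traversal: every node's expressions are visited. */
    static boolean visitAll(PlanNode node) {
        return true;
    }

    /** Conditional traversal: skip the relation and the limit, as the new default does. */
    static boolean skipRelationAndLimit(PlanNode node) {
        return !node.kind().equals("EsRelation") && !node.kind().equals("Limit");
    }
}
```

In this model the unconditional traversal repeats the full field walk for every expression rule that runs, so the cost grows with rules × fields; with the default skip list, this query shape touches no expressions at all.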
We might be setting a trap for ourselves in the future due to the implicit filtering; however, the PR is contained (thanks for that), the tests are passing, and the perf numbers look great.
So much so that backporting this to 8.19 makes sense.
Why remove the previous methods?
They are unused at the moment. Please let me know if we should keep them anyway.
@@ -7,9 +7,14 @@
package org.elasticsearch.xpack.esql.optimizer.rules.logical;
Isn't there a physical plan equivalent?
Yes, org.elasticsearch.xpack.esql.optimizer.PhysicalOptimizerRules; however, it does not seem to have any rules for expressions.
LGTM. Nice one, @idegtiarenko !
.../esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/OptimizerRules.java
};

rule.apply(
    new EsqlParser().createStatement("FROM index | EVAL x=f1+1 | KEEP x, f2 | LIMIT 1"),
nit: could also add a query with DROP.
Please note that this test is only executed with a single test rule. As a result, DROP is not converted to a project (as suggested here), and the test would highlight rule execution on it.
I am not sure it is worth configuring attribute resolution here to check this aspect.
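A minimal sketch of the interaction described here, with hypothetical toy types (Relation, Drop, Project) and a hypothetical resolveDrop helper standing in for attribute resolution: an unresolved DROP is not matched by the default skip on Project, so a rule would still visit it; once DROP is rewritten into a Project of the remaining fields, it is skipped.

```java
import java.util.List;

/** Hypothetical toy model; not the real ES|QL Drop/Project classes or analyzer. */
final class DropVisitSketch {

    sealed interface Plan permits Relation, Drop, Project {}

    record Relation(List<String> fields) implements Plan {}

    record Drop(Plan child, List<String> removed) implements Plan {}

    record Project(Plan child, List<String> kept) implements Plan {}

    /** Mirrors the default: relations and projections are skipped, everything else is visited. */
    static boolean shouldVisit(Plan plan) {
        return switch (plan) {
            case Relation r -> false;
            case Project p -> false;
            case Drop d -> true; // not skipped: only Project is matched by the default
        };
    }

    /** Hypothetical resolution step: rewrite DROP into a Project of the remaining input fields. */
    static Project resolveDrop(Drop drop, List<String> inputFields) {
        List<String> kept = inputFields.stream()
            .filter(field -> !drop.removed().contains(field))
            .toList();
        return new Project(drop.child(), kept);
    }
}
```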
LGTM
I've checked some of the optimization rules that might interfere with the plan types excluded from the tree traversal in this PR, and things seem to work (stats sometimes transforms itself into a project, and stats does support expressions). Since I don't think we will introduce limit 1+2 any time soon, it does look good to me.
This makes OptimizerExpressionRule conditional.
This should allow skipping expression traversal (which can be quite expensive, especially with many attributes) when we know beforehand that a certain plan type is not applicable to the rule or cannot be changed by it.
According to QueryPlanningBenchmark.run, this change makes parsing the query twice as fast.
Closes: #124288