ESQL: planning perf improvements over many fields #124395
Labels
:Analytics/ES|QL
AKA ESQL
>enhancement
Team:Analytics
Meta label for analytical engine team (ESQL/Aggs/Geo)
Comments
costin
added a commit
to costin/elasticsearch
that referenced
this issue
Mar 8, 2025
A set of optimization for tree traversal: 1. perform lazy copying during children transform 2. use long hashing to avoid object creation 3. perform type check first before collection checking Relates elastic#124395
Pinging @elastic/es-analytical-engine (Team:Analytics) |
costin
added a commit
that referenced
this issue
Mar 10, 2025
* ESQL: Lazy collection copying during node transform A set of optimization for tree traversal: 1. perform lazy copying during children transform 2. use long hashing to avoid object creation 3. perform type check first before collection checking Relates #124395
costin
added a commit
to costin/elasticsearch
that referenced
this issue
Mar 10, 2025
* ESQL: Lazy collection copying during node transform A set of optimization for tree traversal: 1. perform lazy copying during children transform 2. use long hashing to avoid object creation 3. perform type check first before collection checking Relates elastic#124395
costin
added a commit
to costin/elasticsearch
that referenced
this issue
Mar 10, 2025
* ESQL: Lazy collection copying during node transform A set of optimization for tree traversal: 1. perform lazy copying during children transform 2. use long hashing to avoid object creation 3. perform type check first before collection checking Relates elastic#124395
costin
added a commit
to costin/elasticsearch
that referenced
this issue
Mar 10, 2025
* ESQL: Lazy collection copying during node transform A set of optimization for tree traversal: 1. perform lazy copying during children transform 2. use long hashing to avoid object creation 3. perform type check first before collection checking Relates elastic#124395
elasticsearchmachine
pushed a commit
that referenced
this issue
Mar 11, 2025
* ESQL: Lazy collection copying during node transform A set of optimization for tree traversal: 1. perform lazy copying during children transform 2. use long hashing to avoid object creation 3. perform type check first before collection checking Relates #124395
elasticsearchmachine
pushed a commit
that referenced
this issue
Mar 11, 2025
* ESQL: Lazy collection copying during node transform A set of optimization for tree traversal: 1. perform lazy copying during children transform 2. use long hashing to avoid object creation 3. perform type check first before collection checking Relates #124395
elasticsearchmachine
pushed a commit
that referenced
this issue
Mar 11, 2025
* ESQL: Lazy collection copying during node transform A set of optimization for tree traversal: 1. perform lazy copying during children transform 2. use long hashing to avoid object creation 3. perform type check first before collection checking Relates #124395
georgewallace
pushed a commit
to georgewallace/elasticsearch
that referenced
this issue
Mar 11, 2025
* ESQL: Lazy collection copying during node transform A set of optimization for tree traversal: 1. perform lazy copying during children transform 2. use long hashing to avoid object creation 3. perform type check first before collection checking Relates elastic#124395
costin
added a commit
to costin/elasticsearch
that referenced
this issue
Mar 12, 2025
Avoid creating outputSet between nodes that passthrough their input Relates elastic#124395
costin
added a commit
to costin/elasticsearch
that referenced
this issue
Mar 12, 2025
(re)make these two collections immutable so they can be shared without restrictions through-out the plan. This is especially useful for reusing one nodes output as another nodes input. Relates elastic#124395
albertzaharovits
pushed a commit
to albertzaharovits/elasticsearch
that referenced
this issue
Mar 13, 2025
* ESQL: Lazy collection copying during node transform A set of optimization for tree traversal: 1. perform lazy copying during children transform 2. use long hashing to avoid object creation 3. perform type check first before collection checking Relates elastic#124395
jfreden
pushed a commit
to jfreden/elasticsearch
that referenced
this issue
Mar 13, 2025
* ESQL: Lazy collection copying during node transform A set of optimization for tree traversal: 1. perform lazy copying during children transform 2. use long hashing to avoid object creation 3. perform type check first before collection checking Relates elastic#124395
costin
added a commit
that referenced
this issue
Mar 20, 2025
Avoid creating outputSet between nodes that passthrough their input Relates #124395
costin
added a commit
to costin/elasticsearch
that referenced
this issue
Mar 20, 2025
…24611) Avoid creating outputSet between nodes that passthrough their input Relates elastic#124395
costin
added a commit
to costin/elasticsearch
that referenced
this issue
Mar 20, 2025
…24611) Avoid creating outputSet between nodes that passthrough their input Relates elastic#124395
costin
added a commit
to costin/elasticsearch
that referenced
this issue
Mar 20, 2025
…24611) Avoid creating outputSet between nodes that passthrough their input Relates elastic#124395
elasticsearchmachine
pushed a commit
that referenced
this issue
Mar 20, 2025
elasticsearchmachine
pushed a commit
that referenced
this issue
Mar 20, 2025
costin
added a commit
that referenced
this issue
Mar 20, 2025
smalyshev
pushed a commit
to smalyshev/elasticsearch
that referenced
this issue
Mar 21, 2025
…24611) Avoid creating outputSet between nodes that passthrough their input Relates elastic#124395
omricohenn
pushed a commit
to omricohenn/elasticsearch
that referenced
this issue
Mar 28, 2025
…24611) Avoid creating outputSet between nodes that passthrough their input Relates elastic#124395
bpintea
added a commit
that referenced
this issue
Apr 2, 2025
This will allow reusing them in the plan analysis and skip recreating them in UnaryPlan/UnaryExec when not needed. Introduce/adjust builders for them, which are now the only way to use a modifiable map/set. Related #124395
bpintea
added a commit
to bpintea/elasticsearch
that referenced
this issue
Apr 3, 2025
This will allow reusing them in the plan analysis and skip recreating them in UnaryPlan/UnaryExec when not needed. Introduce/adjust builders for them, which are now the only way to use a modifiable map/set. Related elastic#124395 (cherry picked from commit 2b512bc)
elasticsearchmachine
pushed a commit
that referenced
this issue
Apr 3, 2025
#126207) * ESQL: make `AttributeMap` and `AttributeSet` immutable (#125938) This will allow reusing them in the plan analysis and skip recreating them in UnaryPlan/UnaryExec when not needed. Introduce/adjust builders for them, which are now the only way to use a modifiable map/set. Related #124395 (cherry picked from commit 2b512bc) * Update 9.0-specific AttributeSet usage
bpintea
added a commit
to bpintea/elasticsearch
that referenced
this issue
Apr 3, 2025
This will allow reusing them in the plan analysis and skip recreating them in UnaryPlan/UnaryExec when not needed. Introduce/adjust builders for them, which are now the only way to use a modifiable map/set. Related elastic#124395 (cherry picked from commit 2b512bc)
andreidan
pushed a commit
to andreidan/elasticsearch
that referenced
this issue
Apr 9, 2025
This will allow reusing them in the plan analysis and skip recreating them in UnaryPlan/UnaryExec when not needed. Introduce/adjust builders for them, which are now the only way to use a modifiable map/set. Related elastic#124395
bpintea
added a commit
that referenced
this issue
Apr 10, 2025
Currently, each plan node iteration in ProjectAwayColumns creates 3 AttributeSet/AttributeMap instances. This can be dropped to just one by using builders. Related: #124395
elasticsearchmachine
pushed a commit
that referenced
this issue
Apr 10, 2025
Description
We've noticed that planning shows up in the profiler when dealing with huge mappings (10k-100k+ fields).
Overall, the goal is to add guard conditions that keep code from executing over such a large number of objects, avoiding the iteration in the first place.
This meta issue lists (potential) improvements for this scenario, broken down into two main buckets:
Avoiding execution
rules working on expressions should perform a basic check to decide whether the logic has to be applied at all, such as checking the size of the collection or of the attributes.
avoid creating new AttributeMap/Set instances by making the collections immutable again so they can be safely passed around
(wrap the add/removeIf/delete methods in a utility class so they are available only for newly created
objects).
ESQL: make `AttributeMap` and `AttributeSet` immutable #125938
avoid collection copying unless needed
Node#transformChildren and QueryPlan#doTransformExpression always create a new array sized to the number of
children. This should be done lazily and potentially differently (clone()).
ESQL: Lazy collection copying during node transform #124424
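A minimal sketch of the lazy-copy idea, assuming a rule that returns the same instance when it leaves a child unchanged. `LazyTransform` and its method are hypothetical names for illustration, not the actual Node#transformChildren code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical helper, not the real ES|QL implementation.
final class LazyTransform {
    private LazyTransform() {}

    // Applies the rule to every child. The input list is returned as-is when
    // nothing changed; a copy is allocated only on the first actual change.
    static <T> List<T> transformChildren(List<T> children, UnaryOperator<T> rule) {
        List<T> copy = null; // allocated lazily
        for (int i = 0; i < children.size(); i++) {
            T child = children.get(i);
            T transformed = rule.apply(child);
            // Assumption: an unchanged child comes back as the same instance.
            if (transformed != child) {
                if (copy == null) {
                    copy = new ArrayList<>(children);
                }
                copy.set(i, transformed);
            }
        }
        return copy == null ? children : copy;
    }
}
```

With tens of thousands of children and rules that rarely match, most calls then return the original list without allocating anything.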
double check array sorting - Analyzer#278/279
ESQL: high count fields sorting removal #125417
ProjectAwayColumns always creates an output set clone - 44/83/84
ESQL: optimise ProjectAwayColumns handling of AttributeSet/Map #126610
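Several items in this bucket come down to sharing one immutable collection and confining mutation to a builder. The sketch below illustrates that pattern with a set of plain strings; `ImmutableNames`, its `Builder` and `PassThroughNode` are hypothetical stand-ins and do not reflect the actual AttributeSet API:

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical stand-in for an immutable attribute set.
final class ImmutableNames {
    private final Set<String> names;

    private ImmutableNames(Set<String> names) {
        this.names = Collections.unmodifiableSet(names);
    }

    boolean contains(String name) { return names.contains(name); }

    int size() { return names.size(); }

    static Builder builder() { return new Builder(); }

    // The only place where mutation is possible; single use.
    static final class Builder {
        private Set<String> pending = new LinkedHashSet<>();

        Builder add(String name) { pending().add(name); return this; }

        Builder removeIf(Predicate<String> filter) { pending().removeIf(filter); return this; }

        ImmutableNames build() {
            Set<String> built = pending();
            pending = null; // give up ownership so the built set can never change
            return new ImmutableNames(built);
        }

        private Set<String> pending() {
            if (pending == null) {
                throw new IllegalStateException("builder already used");
            }
            return pending;
        }
    }
}

// A node that passes its input through can expose the very same set as its output.
final class PassThroughNode {
    private final ImmutableNames input;

    PassThroughNode(ImmutableNames input) { this.input = input; }

    ImmutableNames outputSet() { return input; } // no copy needed
}
```

Because the built set cannot change, handing the same instance from one node to the next is safe, which is what removes the per-node copy.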
Optimized execution of existing code
LogicalVerifier#verify
PruneColumns
PropagateUnmappedFields
PropagateEvalFoldables
stop using super inside TypedAttribute/NamedExpression/Attribute/FieldAttribute equals
Currently the equals methods delegate to their parent, which helps with code reuse but also causes suboptimal equality checks since the children of the node are compared before the attributes. It is better to compare all the node properties first and delegate to the collection as a last resort.
use collection hashing before performing attributes equality
to avoid comparing large collections, use a hash comparison first before iterating over the collection
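The two equality items above can be sketched together: compare the cheap scalar properties first, then a cached collection hash, and walk the collection only when everything else matches. `Attr` and its fields are hypothetical and only illustrate the ordering; they are not the ES|QL attribute classes:

```java
import java.util.List;

// Hypothetical attribute-like node used only to show the comparison order.
final class Attr {
    private final String name;
    private final String dataType;
    private final List<Attr> children;
    private final int childrenHash; // cached once; valid because children is immutable

    Attr(String name, String dataType, List<Attr> children) {
        this.name = name;
        this.dataType = dataType;
        this.children = List.copyOf(children);
        this.childrenHash = this.children.hashCode();
    }

    @Override
    public int hashCode() {
        return 31 * (31 * name.hashCode() + dataType.hashCode()) + childrenHash;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if ((obj instanceof Attr) == false) return false;
        Attr other = (Attr) obj;
        // Cheapest checks first. A hash mismatch proves inequality in O(1);
        // a hash match still requires the full collection comparison.
        return name.equals(other.name)
            && dataType.equals(other.dataType)
            && childrenHash == other.childrenHash
            && children.equals(other.children);
    }
}
```

Caching the hash is only safe because the child list is immutable, so this approach depends on the immutable-collection work listed in the first bucket.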
optimize Node#forEachProperty
prop != children && children.contains(prop) == false && typeToken.isInstance(prop)
-->prop != children && typeToken.isInstance(prop) && children.contains(prop) == false
ESQL: Lazy collection copying during node transform #124424
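The reordering works because `&&` short-circuits: the constant-time type check filters out most properties before the linear `children.contains(prop)` scan runs. A simplified sketch, where `PropertyFilter` and its signature are hypothetical and not the real Node#forEachProperty:

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, simplified version of the property iteration.
final class PropertyFilter {
    private PropertyFilter() {}

    static <T> void forEachProperty(List<?> props, List<?> children, Class<T> typeToken, Consumer<T> action) {
        for (Object prop : props) {
            // Reference check, then O(1) type check, then the O(n) contains scan.
            if (prop != children && typeToken.isInstance(prop) && children.contains(prop) == false) {
                action.accept(typeToken.cast(prop));
            }
        }
    }
}
```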
look into removing/replacing children.contains(prop) inside Node#forEachProperty
A set (LinkedHashSet) would work better and preserve order; however, it would prevent a child from appearing more than once. This can be an issue in projections with duplicate fields (keep a,a,a).
optimize NameId#hashCode to avoid array boxing (use Long.hashCode(id) instead)
ESQL: Lazy collection copying during node transform #124424
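To illustrate the boxing cost, the sketch below assumes the previous implementation used something like `Objects.hash(id)`, which is a guess rather than something stated in this issue. `Id` is a hypothetical class, not the real NameId:

```java
import java.util.Objects;

// Hypothetical id holder used to contrast the two hashing styles.
final class Id {
    private final long id;

    Id(long id) { this.id = id; }

    // Assumed "before": the varargs call boxes the long into a Long and
    // allocates an Object[] on every invocation.
    int boxedHash() { return Objects.hash(id); }

    // "After": pure arithmetic on the primitive, no allocation.
    @Override
    public int hashCode() { return Long.hashCode(id); }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof Id && ((Id) obj).id == id;
    }
}
```

The two methods return different values for the same id, which is fine as long as a class uses one of them consistently with its equals.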
replace Java stream api with regular for-loop
Though brief, stream(), collect(), reduce() & co. are slower than their equivalent for-each loops and pollute the stack trace. They have an edge in parallel processing which, depending on the data size, could yield better results; however, that's not the case here.
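As an illustration of the kind of rewrite meant here, the following sketch shows the same filter written both ways. `NameFilter` and the prefix filter are hypothetical and not taken from the ES|QL code base:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example contrasting a stream pipeline with a plain loop.
final class NameFilter {
    private NameFilter() {}

    // Stream version: concise, but builds a pipeline and adds lambda frames to stack traces.
    static List<String> withStream(List<String> names, String prefix) {
        return names.stream().filter(n -> n.startsWith(prefix)).collect(Collectors.toList());
    }

    // Loop version: same result with a single pass and no pipeline objects.
    static List<String> withLoop(List<String> names, String prefix) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (name.startsWith(prefix)) {
                result.add(name);
            }
        }
        return result;
    }
}
```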