
ESQL: planning perf improvements over many fields #124395


Open
5 of 15 tasks
costin opened this issue Mar 8, 2025 · 1 comment
Labels
:Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@costin
Member

costin commented Mar 8, 2025

Description

We've noticed that planning shows up in the profiler when dealing with huge mappings (10k-100k+ fields).
Overall, the goal is to add conditions that prevent code from running over such a large number of objects, avoiding the iteration in the first place.
This meta issue lists (potential) improvements for this scenario, broken down into two main buckets:

Avoiding execution

Optimized execution of existing code

  • LogicalVerifier#verify

  • PruneColumns

  • PropagateUnmappedFields

  • PropagateEvalFoldables

  • stop using super inside TypedAttribute/NamedExpression/Attribute/FieldAttribute equals

Currently the equals methods delegate to their parent, which helps with code reuse but also makes equality checks suboptimal, since the children of the node are compared before the attributes. It is better to compare all the node properties first and delegate to the collection as a last resort.
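A minimal sketch of that ordering, using a hypothetical `FieldNode` class rather than the real `TypedAttribute`/`NamedExpression`/`Attribute`/`FieldAttribute` hierarchy: the cheap scalar properties are compared first, and the (potentially large) children list is only walked when everything else already matches.

```java
import java.util.List;

// Hypothetical stand-in for an attribute-like node; not the actual ES|QL classes.
final class FieldNode {
    private final String name;
    private final long id;
    private final List<FieldNode> children; // may be very large

    FieldNode(String name, long id, List<FieldNode> children) {
        this.name = name;
        this.id = id;
        this.children = List.copyOf(children);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj == null || getClass() != obj.getClass()) {
            return false;
        }
        FieldNode other = (FieldNode) obj;
        // Cheap checks first: a mismatch here avoids walking the children.
        return id == other.id
            && name.equals(other.name)
            // Expensive collection comparison as the last resort.
            && children.equals(other.children);
    }

    @Override
    public int hashCode() {
        return 31 * Long.hashCode(id) + name.hashCode();
    }

    public static void main(String[] args) {
        FieldNode leaf = new FieldNode("a", 1L, List.of());
        FieldNode x = new FieldNode("p", 2L, List.of(leaf));
        FieldNode y = new FieldNode("p", 3L, List.of(leaf));
        System.out.println(x.equals(y)); // false, decided by the id check alone
    }
}
```

With `&&` short-circuiting, two nodes with different ids never reach `children.equals(...)`.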

  • use collection hashing before performing attributes equality
    To avoid comparing large collections element by element, compare their hashes first and iterate over the collection only when the hashes match.
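One way this could look, as a sketch only: `HashedList` below is a hypothetical wrapper, not an existing ES|QL type. It assumes the collection is immutable, so the hash can be computed once and cached. Without caching, computing the hash would itself iterate the collection and the shortcut would gain nothing.

```java
import java.util.List;

// Hypothetical immutable wrapper that caches its hash; not an ES|QL class.
final class HashedList<T> {
    private final List<T> items;
    private final int hash; // computed once, valid because items never change

    HashedList(List<T> items) {
        this.items = List.copyOf(items);
        this.hash = this.items.hashCode();
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof HashedList)) {
            return false;
        }
        HashedList<?> other = (HashedList<?>) obj;
        // Different hashes or sizes prove inequality without iterating.
        if (hash != other.hash || items.size() != other.items.size()) {
            return false;
        }
        // Equal hashes can still collide, so the full comparison is required.
        return items.equals(other.items);
    }

    public static void main(String[] args) {
        HashedList<String> a = new HashedList<>(List.of("x", "y"));
        HashedList<String> b = new HashedList<>(List.of("x", "z"));
        System.out.println(a.equals(b)); // false
    }
}
```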

  • optimize Node#forEachProperty
    prop != children && children.contains(prop) == false && typeToken.isInstance(prop) -->
    prop != children && typeToken.isInstance(prop) && children.contains(prop) == false
    ESQL: Lazy collection copying during node transform #124424
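The reordering matters because `Class#isInstance` is a constant-time check while `List#contains` is linear in the number of children. A self-contained sketch of the reordered condition, where `PropertyWalker` and its method signature are illustrative assumptions and not the real `Node#forEachProperty`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative helper; the signature is an assumption, not the real Node API.
final class PropertyWalker {
    // Visits every property of type T that is neither the children list nor one of the children.
    static <T> void forEachProperty(List<Object> props, List<?> children, Class<T> typeToken, Consumer<T> action) {
        for (Object prop : props) {
            // Cheap identity and type checks first; the linear contains() scan runs last.
            if (prop != children && typeToken.isInstance(prop) && children.contains(prop) == false) {
                action.accept(typeToken.cast(prop));
            }
        }
    }

    public static void main(String[] args) {
        List<Object> children = List.of("child");
        List<Object> props = List.of("child", "name", 42, children);
        List<String> names = new ArrayList<>();
        forEachProperty(props, children, String.class, names::add);
        System.out.println(names); // [name]
    }
}
```

Properties of the wrong type, such as `42` here, are rejected before the children list is ever scanned.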

  • look into removing/replacing children.contains(prop) inside Node#forEachProperty
    A set (e.g. a LinkedHashSet) would make the lookup cheaper and preserve order; however, it would prevent a child from appearing more than once. This can be an issue in projections with duplicate fields (keep a,a,a).
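The trade-off can be seen with plain JDK collections. The helper name below is made up for illustration: a `LinkedHashSet` keeps first-insertion order and offers constant-time `contains`, but repeated children collapse into one entry.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustration only; the helper name is hypothetical.
final class DuplicateChildren {
    // Copies the children into an insertion-ordered set, dropping repeats.
    static <T> Set<T> asOrderedSet(List<T> children) {
        return new LinkedHashSet<>(children);
    }

    public static void main(String[] args) {
        List<String> projection = List.of("a", "a", "a"); // as in: keep a,a,a
        Set<String> asSet = asOrderedSet(projection);
        System.out.println(projection.size()); // 3
        System.out.println(asSet.size());      // 1, the duplicates are lost
    }
}
```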

  • optimize NameId#hashCode to avoid array boxing (use Long.hashCode(id) instead) ESQL: Lazy collection copying during node transform #124424
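For illustration, and assuming the earlier implementation used something like `Objects.hash(id)` (the issue does not show it): `Objects.hash` takes varargs, so each call boxes the `long` into a `Long` and allocates an `Object[]`, whereas `Long.hashCode(long)` works on the primitive directly. `IdHashing` is a hypothetical class, not `NameId` itself.

```java
import java.util.Objects;

// Hypothetical comparison of the two hashing styles; not the actual NameId code.
final class IdHashing {
    // Varargs call: boxes the long and allocates an Object[] on every invocation.
    static int boxedHash(long id) {
        return Objects.hash(id);
    }

    // Works on the primitive directly, with no allocation.
    static int primitiveHash(long id) {
        return Long.hashCode(id);
    }

    public static void main(String[] args) {
        System.out.println(boxedHash(42L));     // 73, i.e. 31 * 1 + Long.hashCode(42L)
        System.out.println(primitiveHash(42L)); // 42
    }
}
```

The two methods return different values for the same id, so switching between them changes hash codes. That is harmless as long as the hashes are never persisted or compared across the two implementations.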

  • replace Java stream api with regular for-loop

Though concise, stream(), collect(), reduce() & co are slower than their equivalent for-each loops and pollute the stack trace. They have an edge in parallel processing which, depending on the data size, could yield better results; however, that's not the case here.
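As a small illustration of the kind of rewrite meant here, the two hypothetical methods below produce the same result. The loop version avoids the stream pipeline objects and lambda frames, which keeps stack traces shorter. No timings are claimed, and the actual gain would need to be measured in the planner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example; the names are not taken from the ES|QL code base.
final class NameCollector {
    // Stream-based version: concise, but allocates pipeline stages and lambdas.
    static List<String> withPrefixStream(List<String> names, String prefix) {
        return names.stream().filter(n -> n.startsWith(prefix)).collect(Collectors.toList());
    }

    // Plain loop with the same result and a flat stack trace.
    static List<String> withPrefixLoop(List<String> names, String prefix) {
        List<String> result = new ArrayList<>();
        for (String n : names) {
            if (n.startsWith(prefix)) {
                result.add(n);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> fields = List.of("host.name", "host.ip", "user.id");
        System.out.println(withPrefixLoop(fields, "host.")); // [host.name, host.ip]
    }
}
```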

costin added the >enhancement and needs:triage labels Mar 8, 2025
costin added a commit to costin/elasticsearch that referenced this issue Mar 8, 2025
A set of optimization for tree traversal:
1. perform lazy copying during children transform
2. use long hashing to avoid object creation
3. perform type check first before collection checking

Relates elastic#124395
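The first item in that commit message, lazy copying during children transform, might be sketched as follows. `LazyTransform` and its signature are assumptions for illustration, not the code from the referenced commit: a new list is allocated only once a child actually changes, and the original list instance is returned otherwise.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical helper illustrating lazy copying; not the actual ES|QL implementation.
final class LazyTransform {
    // Applies rule to every child; returns the same list instance when nothing changed.
    static <T> List<T> transformChildren(List<T> children, UnaryOperator<T> rule) {
        List<T> copy = null; // allocated only on the first actual change
        for (int i = 0; i < children.size(); i++) {
            T child = children.get(i);
            T transformed = rule.apply(child);
            if (copy == null && transformed != child) {
                // First change: start the copy with the unchanged prefix seen so far.
                copy = new ArrayList<>(children.size());
                copy.addAll(children.subList(0, i));
            }
            if (copy != null) {
                copy.add(transformed);
            }
        }
        return copy == null ? children : copy;
    }

    public static void main(String[] args) {
        List<String> in = List.of("a", "b", "c");
        System.out.println(transformChildren(in, s -> s) == in);                    // true
        System.out.println(transformChildren(in, s -> s.equals("b") ? "B" : s)); // [a, B, c]
    }
}
```

Returning the same instance also lets callers detect "nothing changed" with a reference comparison instead of a deep equality check.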
pxsalehi added the :Analytics/ES|QL label and removed the needs:triage label Mar 10, 2025
elasticsearchmachine added the Team:Analytics label Mar 10, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

costin added a commit that referenced this issue Mar 10, 2025
* ESQL: Lazy collection copying during node transform

A set of optimization for tree traversal:
1. perform lazy copying during children transform
2. use long hashing to avoid object creation
3. perform type check first before collection checking

Relates #124395
costin added a commit to costin/elasticsearch that referenced this issue Mar 10, 2025
* ESQL: Lazy collection copying during node transform

A set of optimization for tree traversal:
1. perform lazy copying during children transform
2. use long hashing to avoid object creation
3. perform type check first before collection checking

Relates elastic#124395
elasticsearchmachine pushed a commit that referenced this issue Mar 11, 2025
* ESQL: Lazy collection copying during node transform

A set of optimization for tree traversal:
1. perform lazy copying during children transform
2. use long hashing to avoid object creation
3. perform type check first before collection checking

Relates #124395
georgewallace pushed a commit to georgewallace/elasticsearch that referenced this issue Mar 11, 2025
* ESQL: Lazy collection copying during node transform

A set of optimization for tree traversal:
1. perform lazy copying during children transform
2. use long hashing to avoid object creation
3. perform type check first before collection checking

Relates elastic#124395
costin added a commit to costin/elasticsearch that referenced this issue Mar 12, 2025
Avoid creating outputSet between nodes that passthrough their input

Relates elastic#124395
costin added a commit to costin/elasticsearch that referenced this issue Mar 12, 2025
(re)make these two collections immutable so they can be shared without
 restrictions throughout the plan. This is especially useful for
 reusing one node's output as another node's input.

Relates elastic#124395
albertzaharovits pushed a commit to albertzaharovits/elasticsearch that referenced this issue Mar 13, 2025
* ESQL: Lazy collection copying during node transform

A set of optimization for tree traversal:
1. perform lazy copying during children transform
2. use long hashing to avoid object creation
3. perform type check first before collection checking

Relates elastic#124395
jfreden pushed a commit to jfreden/elasticsearch that referenced this issue Mar 13, 2025
* ESQL: Lazy collection copying during node transform

A set of optimization for tree traversal:
1. perform lazy copying during children transform
2. use long hashing to avoid object creation
3. perform type check first before collection checking

Relates elastic#124395
costin added a commit that referenced this issue Mar 20, 2025
Avoid creating outputSet between nodes that passthrough their input

Relates #124395
costin added a commit to costin/elasticsearch that referenced this issue Mar 20, 2025
…24611)

Avoid creating outputSet between nodes that passthrough their input

Relates elastic#124395
elasticsearchmachine pushed a commit that referenced this issue Mar 20, 2025
…125275)

Avoid creating outputSet between nodes that passthrough their input

Relates #124395
elasticsearchmachine pushed a commit that referenced this issue Mar 20, 2025
…125273)

Avoid creating outputSet between nodes that passthrough their input

Relates #124395
costin added a commit that referenced this issue Mar 20, 2025
…125274)

Avoid creating outputSet between nodes that passthrough their input

Relates #124395
smalyshev pushed a commit to smalyshev/elasticsearch that referenced this issue Mar 21, 2025
…24611)

Avoid creating outputSet between nodes that passthrough their input

Relates elastic#124395
omricohenn pushed a commit to omricohenn/elasticsearch that referenced this issue Mar 28, 2025
…24611)

Avoid creating outputSet between nodes that passthrough their input

Relates elastic#124395
bpintea added a commit that referenced this issue Apr 2, 2025
This will allow reusing them in the plan analysis and skip recreating them in UnaryPlan/UnaryExec when not needed.
Introduce/adjust builders for them, which are now the only way to use a modifiable map/set.

Related #124395
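The immutable-collection-plus-builder arrangement described in this commit message can be illustrated with a deliberately simplified analogue. `ImmutableNames` below is hypothetical and does not mirror the real `AttributeSet`/`AttributeMap` API: built instances are read-only and safe to share between plan nodes, and all mutation goes through the builder.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical simplified analogue; not the real AttributeSet/AttributeMap API.
final class ImmutableNames {
    private final Set<String> names;

    private ImmutableNames(Set<String> names) {
        this.names = names;
    }

    boolean contains(String name) {
        return names.contains(name);
    }

    int size() {
        return names.size();
    }

    static Builder builder() {
        return new Builder();
    }

    // The only place where mutation is possible.
    static final class Builder {
        private final Set<String> names = new LinkedHashSet<>();

        Builder add(String name) {
            names.add(name);
            return this;
        }

        ImmutableNames build() {
            // Defensive copy so later builder changes cannot leak into built instances.
            return new ImmutableNames(Collections.unmodifiableSet(new LinkedHashSet<>(names)));
        }
    }

    public static void main(String[] args) {
        ImmutableNames output = ImmutableNames.builder().add("a").add("b").build();
        System.out.println(output.size()); // 2
    }
}
```

Because a built instance can never change, one node's output can be handed to another node as its input without copying.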
bpintea added a commit to bpintea/elasticsearch that referenced this issue Apr 3, 2025
This will allow reusing them in the plan analysis and skip recreating them in UnaryPlan/UnaryExec when not needed.
Introduce/adjust builders for them, which are now the only way to use a modifiable map/set.

Related elastic#124395

(cherry picked from commit 2b512bc)
elasticsearchmachine pushed a commit that referenced this issue Apr 3, 2025
#126207)

* ESQL: make `AttributeMap` and `AttributeSet` immutable (#125938)

This will allow reusing them in the plan analysis and skip recreating them in UnaryPlan/UnaryExec when not needed.
Introduce/adjust builders for them, which are now the only way to use a modifiable map/set.

Related #124395

(cherry picked from commit 2b512bc)

* Update 9.0-specific AttributeSet usage
elasticsearchmachine pushed a commit that referenced this issue Apr 3, 2025
…6226)

This will allow reusing them in the plan analysis and skip recreating them in UnaryPlan/UnaryExec when not needed.
Introduce/adjust builders for them, which are now the only way to use a modifiable map/set.

Related #124395

(cherry picked from commit 2b512bc)
andreidan pushed a commit to andreidan/elasticsearch that referenced this issue Apr 9, 2025
This will allow reusing them in the plan analysis and skip recreating them in UnaryPlan/UnaryExec when not needed.
Introduce/adjust builders for them, which are now the only way to use a modifiable map/set.

Related elastic#124395
bpintea added a commit that referenced this issue Apr 10, 2025
Currently, each plan node iteration in ProjectAwayColumns creates 3
AttributeSet/Map_s. This can be dropped to just one by using builders.

Related: #124395
elasticsearchmachine pushed a commit that referenced this issue Apr 10, 2025
…126615)

Currently, each plan node iteration in ProjectAwayColumns creates 3
AttributeSet/Map_s. This can be dropped to just one by using builders.

Related: #124395

3 participants