-
Notifications
You must be signed in to change notification settings - Fork 25.2k
ESQL: Have CombineProjections propagate references upwards #127264
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL: Have CombineProjections propagate references upwards #127264
Conversation
This will have CombineProjections allow references from the "under" plan be kept when merging stacked Projections. This is to prevent field attributes that are dropped by ReplaceMissingFieldWithNull be used in the resulting plan.
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Hi @bpintea, I've created a changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice find.
💔 Backport failed
You can use sqren/backport to manually backport by running |
Heya, I don't understand what problem this is solving. @bpintea , could you elaborate why it's bad if dropped field attributes get referenced upstream? They're not really dropped, they're actually shadowed by a reference attribute with the same name id obtained by injecting an Because, this behavior is not entirely eradicated, as can be seen in your added test (thanks!), where the exchange still thinks it's handling a field attribute This is an inherent wonkiness in |
Hey @alex-spies,
Indeed, you're technically correct. :)
That happens on the coordinator, right? Which should be correct, as that's the plan output there. The shadowing with
Merging the stack plans and selecting the field attribute - which is in that case shadowed - is simply incorrect and was so just because the rule only considered aliases for the AttributeMap. The fact that it worked (so far) doesn't make it correct. Laterally, I got it by revisiting the rules, not a failed query. |
I agree with you that this should be considered incorrect that we're referencing a field attribute although we're actually using a reference attribute obtained from an Unfortunately, it's been working like this since Fixing this properly would require updating the expressions of all downstream plans after the If we really want to properly fix this, I think it should be fixed by acknowledging that |
Correct (as per the PR's title :) ); it's a small fix.
True; that would be a larger, potentially different change: I guess subsequently in some of those cases we could/should fold the |
@@ -93,7 +93,8 @@ protected NodeInfo<Alias> info() { | |||
} | |||
|
|||
public Alias replaceChild(Expression child) { | |||
return new Alias(source(), name(), child, id(), synthetic()); | |||
// these "nop" replacements can occur on attribute map resolutions having the default return same as the lookup key | |||
return child == this.child ? this : new Alias(source(), name(), child, id(), synthetic()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is an optimization, unrelated to this PR, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. It's supposed to avoid creating objects while planning, if not needed. The change is small and not sure if a stand-alone PR would make sense, but happy to revert and split, if needed.
} else if (ne instanceof ReferenceAttribute ra) { | ||
namedExpressionsBuilder.put(ra, ra); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This captures aliases but now also References - why do we care about them and what about other types of named expression (e.g. FieldAttribute)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that this change doesn't affect queries and their computation, it just opportunistically replaces e.g. emp_no{f}
by emp_no{r}
(with same name id) in some projections if there's something like
FROM employees | KEEP emp_no
but emp_no
is missing and ReplaceMissingFieldsWithNull
injects an EVAL emp_no = null
after the FROM
.
I don't think this makes a difference, really, as the attribute type mismatch between emp_no{f}
and emp_no{r}
remains in most other commands in the query (but this is, currently, just a "cosmetic" issue in the planning).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it just opportunistically replaces e.g. emp_no{f} by emp_no{r} (with same name id) in some projections if there's something like
It's not just then. There'll always be two projections to combine in case of a locally missing field, due to the effects of ProjectAwayColumns
and ReplaceMissingFieldWithNull
both creating Project
nodes.
I don't think this makes a difference, really, as the attribute type mismatch between emp_no{f} and emp_no{r} remains in most other commands in the query (but this is, currently, just a "cosmetic" issue in the planning).
IMO the change the PR proposes is a correction, not just "cosmetic". But it's true that it currently makes no practical difference.
But to exemplify why this works now: FROM ... | EVAL x = missing_field + 1
gets planned as Eval
- Limit
- EsRelation
. We're pushing Eval
down as much as possible, but we push Limit
"harder". With the effect that the LimitExec
"breaks" the plan and the local fragment lacks the Eval
, which makes the extraction work (i.e. actually skip the field).
But ideally this evaluation would be distributed and one day Eval
pushed into the fragment as well. The InsertFieldExtraction
only looks at FieldAttribute
s (and MetadataAttribute
s), so we we'd want to have the merge generate a ReferenceAttribute
instead of a field one.
This captures aliases but now also References - why do we care about them and what about other types of named expression (e.g. FieldAttribute)?
I hope the above clarifies. But happy to give further details, if needed.
But I'm also happy to revert the change and add a comment instead. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InsertFieldExtraction only looks at FieldAttributes (and MetadataAttributes), so we we'd want to have the merge generate a ReferenceAttribute instead of a field one.
It's true that replacing the field attribute by a ref attr downstream in the plan will make InsertFieldExtrac
gloss over the missing attribute altogether.
But InsertFieldExtract
's existing mechanism already compares with the upstream output by name id (we use attribute sets), so this is already taken care of. Or should be, anyway.
This will have
CombineProjections
allow references from the "under" plan be kept when merging stackedProjection
s.This is to prevent field attributes that are dropped by
ReplaceMissingFieldWithNull
be used in the resulting plan.