-
Notifications
You must be signed in to change notification settings - Fork 25.2k
ESQL: Fix count optimization with pushable union types #127225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
7e46eae
c90cacf
3cc50a8
50c8a2a
becc26c
e8251f6
a7e7faa
a08b148
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
pr: 127225 | ||
summary: Fix count optimization with pushable union types | ||
area: ES|QL | ||
type: bug | ||
issues: | ||
- 127200 |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -66,12 +66,12 @@ public LogicalPlan apply(LogicalPlan plan, LocalLogicalOptimizerContext localLog | |
|
||
private LogicalPlan missingToNull(LogicalPlan plan, Predicate<FieldAttribute> shouldBeRetained) { | ||
if (plan instanceof EsRelation relation) { | ||
// Remove missing fields from the EsRelation because this is not where we will obtain them from; replace them by an Eval right | ||
// after, instead. This allows us to safely re-use the attribute ids of the corresponding FieldAttributes. | ||
// For any missing field, place an Eval right after the EsRelation to assign null values to that attribute (using the same name | ||
// id!), thus avoiding that InsertFieldExtrations inserts a field extraction later. | ||
// This means that an EsRelation[field1, field2, field3] where field1 and field 3 are missing will be replaced by | ||
// Project[field1, field2, field3] <- keeps the ordering intact | ||
// \_Eval[field1 = null, field3 = null] | ||
// \_EsRelation[field2] | ||
// \_EsRelation[field1, field2, field3] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If the field is missing, would it be in the EsRelation at all? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes! Because we're in the local optimizer, the field does exist overall, and thus is put into the EsRelation after the field caps call on the coordinator. But on the local node it's missing! (Or in the search stats that this optimization run uses, to be more precise) This optimizer rule applies exclusively to such fields. |
||
List<Attribute> relationOutput = relation.output(); | ||
Map<DataType, Alias> nullLiterals = Maps.newLinkedHashMapWithExpectedSize(DataType.types().size()); | ||
List<NamedExpression> newProjections = new ArrayList<>(relationOutput.size()); | ||
|
Original file line number | Diff line number | Diff line change | ||
---|---|---|---|---|
|
@@ -94,15 +94,19 @@ private Tuple<List<Attribute>, List<EsStatsQueryExec.Stat>> pushableStats( | |||
// check if regular field | ||||
else { | ||||
if (target instanceof FieldAttribute fa) { | ||||
var fName = fa.name(); | ||||
var fName = fa.fieldName(); | ||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Wow! Is this the fix? I imagine this could have impacts in many places, so could this fix other bugs we've not noticed? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, this is the fix. Simple oversight to use the correct field name - in the past, the field attribute name and the field name were the same, but union types had to break with this pattern. I do not see other situations that this may fix, too, because the specific stats-pushdown optimization only triggers in a narrow slice of queries, anyway. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What I'm wondering is if we could make this mistake again and if we could prevent it. Line 58 in 9e0a5af
|
||||
if (context.searchStats().isSingleValue(fName)) { | ||||
fieldName = fa.name(); | ||||
fieldName = fName; | ||||
query = QueryBuilders.existsQuery(fieldName); | ||||
} | ||||
} | ||||
} | ||||
if (fieldName != null) { | ||||
if (count.hasFilter()) { | ||||
// Note: currently, it seems like we never actually perform stats pushdown if we reach here. | ||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It seems the question of what exactly gets pushed down and why is subtle. I wonder if we want some documentation on this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. When we touch this rule again, I think we should add unit tests that demonstrate exactly what is pushed down and how. I also found that union type filters don't seem to be pushed to Lucene, yet - maybe we could improve the documentation as part of that work? If you agree, I can open up an issue for this. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
We evolved to this. Before the aggs filter were extracted into an upstream filter and pushed down, this code was "alive". There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep yep, and if we manage to evolve the optimization to multiple stats, this code could come alive again. There's an argument to be made that it should be deleted as long as it's dead - but properly double checking that this code path really is never ever used atm was beyond the scope that I could allocate to this bug fix. So leaving a comment was the next best thing for the time being, I think :) |
||||
// That's because stats pushdown only works for 1 agg function (without BY); but in that case, filters | ||||
// are extracted into a separate filter node upstream from the aggregation (and hopefully pushed into | ||||
// the EsQueryExec separately). | ||||
if (canPushToSource(count.filter()) == false) { | ||||
return null; // can't push down | ||||
} | ||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍