-
Notifications
You must be signed in to change notification settings - Fork 25.2k
[ES|QL] Date nanos implicit casting in union types #123678
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[ES|QL] Date nanos implicit casting in union types #123678
Conversation
Hi @fang-xing-esql, I've created a changelog YAML for you. |
@elasticmachine update branch |
Fang, can you please double check the status of this PR? Is it ready for review or still working on it? |
I'm trying to make enrich work with union typed fields, and this is the last piece that I want to figure out before making this ready for review. |
for (Expression e : newProjections) { // Add groupings | ||
newAggs.add(Expressions.attribute(e)); | ||
} | ||
return agg.with(evalForInvalidMappedField(agg.source(), agg.child(), aliases), newProjections, newAggs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we inject an EVAL
here, there's a chance that we'll accidentally inject EVAL
s multiple times. Which could lead to broken plans, when an upstream EVAL
already replaced the multi-typed field attribute by a reference attribute of the same name. Or, put the other way around, since we're replacing a field by another one with the same name, downstream plans may have invalid references now.
This could happen in case that the reference resolution manages to resolve the whole query plan, with multiple references to multi-typed fields, before we start adding Evals.
List<? extends NamedExpression> origAggs = agg.aggregates(); | ||
List<NamedExpression> newAggs = new ArrayList<>(origAggs.size()); | ||
for (int i = 0; i < origAggs.size() - newProjections.size(); i++) { // Add aggregate functions | ||
newAggs.add(origAggs.get(i)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally, the way to update the expressions of a plan is to transform the expressions and use AttributeMap.resolve
with an AttributeMap<Attribute>
that contains which attribute should be replaced by which other attribute.
A date in the valid date-nanos range should be acceptable to parse. This issue was showing up in a difference in behaviour between Elasticsearch ingest and the CsvTests. This fix makes CsvTests behave the same as Elasticearch itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ran out of time, so I'll post the comments so far.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
Outdated
Show resolved
Hide resolved
...sql/qa/server/src/main/java/org/elasticsearch/xpack/esql/qa/rest/FieldExtractorTestCase.java
Outdated
Show resolved
Hide resolved
...sql/qa/server/src/main/java/org/elasticsearch/xpack/esql/qa/rest/FieldExtractorTestCase.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/data/employees_incompatible.csv
Show resolved
Hide resolved
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java
Outdated
Show resolved
Hide resolved
Thanks for reviewing and brainstorming @craigtaverner @alex-spies ! Most of the comments have been addressed, here is a summary of the list of things that have been discussed:
There are some comments that haven't been addressed yet:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks much nicer to me! I have a few comments and questions, but otherwise I think it is good.
@@ -180,7 +187,9 @@ public class Analyzer extends ParameterizedRuleExecutor<LogicalPlan, AnalyzerCon | |||
*/ | |||
new ImplicitCasting(), | |||
new ResolveRefs(), | |||
new ResolveUnionTypes() // Must be after ResolveRefs, so union types can be found | |||
new ResolveUnionTypes(), // Must be after ResolveRefs, so union types can be found | |||
// Must be after ResolveUnionTypes, if there is explicit casting on the union typed fields, implicit casting won't be added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice solution!
// MultiTypeEsField, and don't be double cast here. | ||
Map<FieldAttribute, Alias> invalidMappedFieldCasted = new HashMap<>(); | ||
LogicalPlan transformedPlan = plan.transformUp(LogicalPlan.class, p -> { | ||
if (p instanceof UnaryPlan == false && p instanceof LookupJoin == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we allow through UnaryPlan and LookupJoin, but later we do nothing for LookupJoin. Is this a placeholder for a further update supporting LookupJoin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a comment here for LookupJoin
, and remove the instanceof
check, I was attempting to make date_nanos
fields work as join keys, however it didn't workout easily for me. Thank you for creating a separate issue for it!
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Show resolved
Hide resolved
private static Expression castInvalidMappedField(DataType targetType, FieldAttribute fa) { | ||
Source source = fa.source(); | ||
return switch (targetType) { | ||
case DATETIME -> new ToDatetime(source, fa); // in case we decided to use DATE as a common type instead of DATE_NANOS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice as an example, but is effectively dead code. I wonder what are the chances we'll change our mind on this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like we're making a decision prioritizing the LogsDb plans. But could we be causing trouble for other users? Or is the thinking that other users need to add an explicit cast? That seems a reasonable assumption.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Outdated
Show resolved
Hide resolved
// if so add a project with eval, so that a not null value can be returned for a union typed field | ||
if (logicalPlan.resolved()) { | ||
List<Attribute> output = logicalPlan.output(); | ||
Map<FieldAttribute, Alias> newAliases = Maps.newHashMapWithExpectedSize(output.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is very similar to code above, but that also took into account previous/existing field mappings. Why is that not needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal of castInvalidMappedFieldInFinalOutput
is to add implicit casting to mixed date
/date_nanos
fields, that are not referenced by the query(there was no chance to do implicit casting for them if they are not referenced in any commands), however they are in the returned resultsets. For example, from index*
. It only checks the top level plan's output, it doesn't traverse the entire tree, so it doesn't have a map to keep track of the casted fields map across all plans/subplans.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/Eval.java
Outdated
Show resolved
Hide resolved
Discussed offline with @fang-xing-esql and @craigtaverner. I think the proposed solution should work, it's just a little complex given that it needs to replicate a) some logic present in InsertFieldExtraction (inserting EVALs for implicitly cast fields just before they're used) and b) for passing on fields that users never explicitly dropped or kept into the final output correctly. I suggested one more approach which we agreed to try before going forward with the Eval-injection approach. The alternative would turn invalid mapped fields of date/nanos type into multi typed fields per default, before any resolution step (multi typed fields in field attributes trigger automatic conversion on field extraction). I threw together a draft as a commit on top of this PR, which seems to still run most tests fine (all existing union type tests + most tests from this PR). This approach still needs to special case for the situation where users specifically don't want to convert into nanos, but dates, i.e. when the query has |
Resolves #110009
The goal of this PR is to support union typed fields that reference both
date
anddate_nanos
fields without explicit casting. Union typed(mixed withdate
anddate_nanos
) fields are casted to a common data type during Analyzer phase automatically.numeric
orstring
typed fields are not within the scope of this PR.Today, queries that reference union(mixed with
date
anddate_nanos
) typed fields like below either return nulls or fail, because they reference union typed fields without explicit casting. After this PR, they should not fail or return null.A field can be referenced in nearly all the processing commands, and this PR should cover all the possible places a (union typed)field can appear: