Skip to content

[ES|QL] Add suggested_cast #127139

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Apr 28, 2025
Merged

[ES|QL] Add suggested_cast #127139

merged 9 commits into from
Apr 28, 2025

Conversation

limotova
Copy link
Contributor

@limotova limotova commented Apr 22, 2025

Adds a suggested_cast for unsupported fields based on original_types
for specific cases:

  • when a field is mapped to datetime and date_nanos, suggests date_nanos
  • when a field is mapped to aggregate_metric_double and other numerics, suggests aggregate_metric_double
  • when a field is unsupported anywhere, suggests nothing
  • suggests keyword in all other cases

For example, a field that is mapped to aggregate_metric_double in one index, but double in another will have a response like:

    {
      "name" : "agg_metric",
      "type" : "unsupported",
      "original_types" : [
        "aggregate_metric_double",
        "double"
      ],
      "suggested_cast" : "aggregate_metric_double"
    }

A field that is completely unsupported will have a response like:

    {
      "name" : "binary_field",
      "type" : "unsupported",
      "original_types" : [
        "binary"
      ]
    }

@limotova limotova force-pushed the recommended-cast branch 2 times, most recently from fd0b69d to b2e2546 Compare April 23, 2025 17:38
@limotova limotova changed the title recommended cast [ES|QL] Add suggested_cast Apr 23, 2025
@limotova limotova marked this pull request as ready for review April 23, 2025 21:08
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Apr 23, 2025
@limotova limotova added >enhancement :Analytics/ES|QL AKA ESQL and removed needs:triage Requires assignment of a team area label labels Apr 23, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Apr 23, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @limotova, I've created a changelog YAML for you.

@limotova
Copy link
Contributor Author

limotova commented Apr 23, 2025

Would it make sense to also add a suggestion to cast in the error message?
Like here so we have a message that reads more like:
Cannot use field [agg_metric] due to ambiguities being mapped as [2] incompatible types: [aggregate_metric_double] in [my-index], [double] in [my-index2]. Consider casting field to [aggregate_metric_double] ?

@nik9000 nik9000 added the ES|QL-ui Impacts ES|QL UI label Apr 23, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/kibana-esql (ES|QL-ui)

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add an example response in the PR description? It helps folks scanning these things.

@nik9000
Copy link
Member

nik9000 commented Apr 23, 2025

I was wondering to myself "should this return a list?" and I think the answer is "no". At this point there's always one best choice. keyword is almost always an option though. Maybe it'd be nice to say ["date", "keyword"] for dates? Like, you have two choices? I don't think so.... But maybe? It's nice to know for the truly unsupported cases that you don't have any choices.

return null;
}
if (originalTypes.contains(DATE_NANOS) && originalTypes.contains(DATETIME) && originalTypes.size() == 2) {
return DATETIME;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not DATE_NANOS? This is orthogonal to aggregate_metric_double, but I thought we want to auto-cast (not even suggest) to nanos and error out if the conversion is not possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's still some up in the air about this one. DATE_NANOS would leave a lot of stuff out of range. I feel like it's pretty low commitment to put this here though.

}
}
if (allNumeric) {
return AGGREGATE_METRIC_DOUBLE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it an option to use one of the underlying metrics (e.g. max) in a mixed query? Or, we'd have to convert the e.g. double to aggregate_metric_double first, and then use the max from all of them (original and converted)? That'd be one expensive operation to read all values - it may be fine, if there's no better way to mix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. Casting is only something we can do with one value at a time but these underlying metrics are things you'd get from an agg. I think. Right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the underlying metrics would be accessed in specific aggregations.
Maybe when we implement implicit casting of aggregate_metric_double in aggregations instead of going double -> aggregate_metric_double -> submetric (double/int), we can do aggregate_metric_double -> submetric (double/int), and have no conversion for doubles/ints (or if the other mapping is a long or something convert that to the submetric type)?
(The count submetric is an int but the others are doubles)

@stratoula
Copy link

How can I cast to aggregate_metric_double ? Do we have a function for that already??
In case we want to utilize it in autocomplete which is the logic?

Is going to be

TO_xxx where xxx the suggested_cast value?

@nik9000
Copy link
Member

nik9000 commented Apr 24, 2025

Would it make sense to also add a suggestion to cast in the error message?

I'm cool with it. I wish we generated that error message lazily, but otherwise it's cool.

@alex-spies
Copy link
Contributor

when a field is mapped to datetime and date_nanos, suggests datetime

Sorry for the drive by, but wanted to let you know: there's ongoing work (#123678) that will implicitly cast to date_nanos in this case, contrary to the suggestion, which may also become obsolete.

@nik9000
Copy link
Member

nik9000 commented Apr 24, 2025

Is going to be

TO_xxx where xxx the suggested_cast value?

Yeah, that exists. I figure folks will do <field>::AGGREGATE_METRIC_DOUBLE though.

@limotova, what do you think of an integration test that sets up indices with the conflicting types, does a FROM a,b and gets this back and then does a FROM a,b,c | EVAL f = f::<UPPER_CASE_SUGGESTED_TYPE>. That's how we expect folks to use this and I know it works now, but this integration test would catch if we break it.

Not sure we could do it for x - but maybe. Parameterized testing would make it easy to write. It probably takes a half second per combo... We have about 30 types. That'd be 900 cases.... so 450 seconds which is way too slow. But maybe I'm off? Still worth doing for some known combos I think. Ooh - it's probably half that because we don't care about order! Still 225 seconds is a lot.

@stratoula
Copy link

I figure folks will do ::AGGREGATE_METRIC_DOUBLE though

Sure this works for me too! As long as the cast type is valid!

@kkrik-es
Copy link
Contributor

lgtm.. delegating to Nik to approve, since this is really ES|QL territory.

@limotova
Copy link
Contributor Author

limotova commented Apr 24, 2025

there's ongoing work (#123678) that will implicitly cast to date_nanos in this case, contrary to the suggestion, which may also become obsolete.

It sounds like I should change the datetime + date_nanos combo to suggest date_nanos instead?

integration test that sets up indices with the conflicting types
...
We have about 30 types. That'd be 900 cases

I can set up the integration test but I think it might be worth mentioning too that we might have 3+ types in a single request, so if we really wanted to go crazy there could be a lot more cases...
But I can set up a test that does all possible pairs

return null;
}
if (originalTypes.contains(DATE_NANOS) && originalTypes.contains(DATETIME) && originalTypes.size() == 2) {
return DATETIME;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's still some up in the air about this one. DATE_NANOS would leave a lot of stuff out of range. I feel like it's pretty low commitment to put this here though.

@limotova limotova merged commit 1296258 into elastic:main Apr 28, 2025
17 checks passed
@limotova limotova deleted the recommended-cast branch April 28, 2025 16:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >enhancement ES|QL-ui Impacts ES|QL UI Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants