Skip to content

Fix behavior for _index LIKE for ESQL #130019

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 61 commits into
base: main
Choose a base branch
from

Conversation

julian-elastic
Copy link
Contributor

@julian-elastic julian-elastic commented Jun 25, 2025

Fixes _index LIKE <pattern> to always have normal text matching semantics.

Implement a generic ExpressionQuery and ExpressionQueryBuilder that can be serialized to the data node. Then the ExpressionQueryBuilder can build an Automaton using TranslationAware.asLuceneQuery() and execute it in Lucine.

Introduces a breaking change for LIKE on _index fields. The old like behavior is not correct and does not have normal like semantics from ESQL. Customers upgrading from old build to new build might see a regression, where the data changes due to the like filters on clustering produces different results, but the new results are correct.

Behavior for ESQL
New CCS to New => New behavior everywhere
Old CCS to New => Old behavior everywhere (the isForESQL flag is not passed in from old)
New CCS to Old => New behavior for new, old behavior for old (the isForESQL cannot be passed, old does not know about it).
Old CCS to Old => Old behavior everywhere

Closes #129511

@julian-elastic
Copy link
Contributor Author

@nik9000 This is the draft PR

@elasticsearchmachine
Copy link
Collaborator

Hi @julian-elastic, I've updated the changelog YAML for you.

@@ -148,7 +148,7 @@ public Query asQuery(LucenePushdownPredicates pushdownPredicates, TranslatorHand
// TODO: Get the real FoldContext here
var wildcardQuery = QueryParser.escape(BytesRefs.toString(prefix.fold(FoldContext.small()))) + "*";

return new WildcardQuery(source(), fieldName, wildcardQuery);
return new WildcardQuery(source(), fieldName, wildcardQuery, false, false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be false, true? So we force a string match in case folks are running START_WITH against _index?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for ENDS_WITH I suppose

Copy link
Contributor Author

@julian-elastic julian-elastic Jul 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the change you requested to false, true. Then I tried adding UTs to cover the ENDS_WITH/BEGINS_WITH on index. It seems like ENDS_WITH/BEGINS_WITH is broken on an index, similar to LIKE before the change, as they all used WildcardQuery. It is fixed in some cases with my change, broken with others. If we want comprehensive fix, I can work on it, but it will delay the LIKE PR and it is getting pretty big already.
Do we want to stick with the current broken behavior and handle ENDS_WITH/BEGINS_WITH in a separate PR? I want to get this one checked in first if possible.

To make matters worse QueryParser.escape() in the line right above is the wrong escape sequence for our PR. If you call it on "remote-index" it will escape the dash and change the wildcardQuery to "remote\-index" which is wrong I think and will produce no data. QueryParser.escape() does regular expression escaping, not wildcard escaping. It seems the correct escaping is in WildcardFieldMapper.escapeWildcardSyntax(). But changing it here, might break more cases/cause different data if we CCS to an older server that uses the old WildcardQuery workflow with the new escaping. So it is not trivial to change the false to true. I think keeping it as false should preserve the old wrong behavior, so no regressions and buys us time to investigate properly and handle all the cases. I think we need to discuss...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked about this and will fork it into an open issue.

Copy link
Contributor

github-actions bot commented Jul 7, 2025

@julian-elastic julian-elastic marked this pull request as draft July 7, 2025 21:31
@julian-elastic julian-elastic removed request for a team July 7, 2025 21:32
@@ -148,7 +148,7 @@ public Query asQuery(LucenePushdownPredicates pushdownPredicates, TranslatorHand
// TODO: Get the real FoldContext here
var wildcardQuery = QueryParser.escape(BytesRefs.toString(prefix.fold(FoldContext.small()))) + "*";

return new WildcardQuery(source(), fieldName, wildcardQuery);
return new WildcardQuery(source(), fieldName, wildcardQuery, false, false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked about this and will fork it into an open issue.

@nik9000
Copy link
Member

nik9000 commented Jul 8, 2025

Let's get this in.

Once it's landed I can explain the backport process. It's even more TransportVersion fun.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.19.0 v9.1.0 v9.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ESQL: Filtering on _index is ignored
3 participants