ES\QL: Fix scoring for non full text functions #124540

carlosdelest · 2025-03-11T09:23:59Z

When using METADATA _score, there are some inconsistencies with scoring:

Non full text functions that are pushed down to Lucene contribute to scoring. WHERE id > 7 OR ratings > 4.5 would add 1.0 to scores for results that match any of those conditions.
Default score when not using full text functions is 1.0, as match all provides 1.0 as default score.
Query DSL filters in an ES|QL request contribute to the scoring instead of just filtering (see ESQL: Use a must boolean statement when pushing down to Lucene when scoring is also needed #124001)

This PR addressed the above by adding boosting in the Query class:

Defines a scorable() method that indicates whether a query should be considered for scoring. Full text functions redefine this to be true, all other queries have false as default.
Creates a new toQueryBuilder() method that uses the above information for adding a boost of 0 to queries that are not scorable. This method uses internally the previous asBuilder() method that is the one used by queries to provide the corresponding query builder.

Changes were done to tests so they take into account the new boosting for queries that do not contribute to score.

Another option I explored was to add queries that do not contribute to score to a filter clause instead. This proved more difficult as combining queries was not straightforward, and some resulting boolean queries were hard to simplify (think filters clauses with boolean queries that have filters themselves).

…eating boolean queries

elasticsearchmachine · 2025-03-11T09:24:24Z

Hi @carlosdelest, I've created a changelog YAML for you.

…ll-text-functions' into bugfix/esql-fix-scoring-full-text-functions # Conflicts: # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizerTests.java

Copilot

PR Overview

This PR fixes scoring inconsistencies for full text functions in ES|QL by introducing a scorable flag and adjusting query boosting accordingly. Key changes include updated query classes with protected asBuilder methods, integration tests validating scoring behavior, and modifications in DSL construction for proper boost application.

Reviewed Changes

File	Description
x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/plugin/ScoringIT.java	Adds integration tests for scoring behavior
docs/changelog/124540.yaml	Adds changelog entry for the fix
x-pack/plugin/esql-core/.../Query.java	Refactors asBuilder to apply boost based on scorable()
x-pack/plugin/esql-core/.../(various query files)	Changes asBuilder visibility to protected and, where applicable, adjusts scorable implementations
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/physical/local/PushFiltersToSource.java	Uses toQueryBuilder() for query creation and simplifies clause selection
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/fulltext/Term.java	Updates TermQuery construction to include explicit boolean values for case sensitivity and scoring

Copilot reviewed 38 out of 38 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (2)

x-pack/plugin/esql-core/src/main/java/org/elasticsearch/xpack/esql/core/querydsl/query/TermQuery.java:78

Consider using the '==' operator for comparing the primitive boolean 'scorable' in the equals method to improve clarity and performance.

&& Objects.equals(scorable, other.scorable);

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/fulltext/Term.java:144

[nitpick] Consider adding inline comments or replacing the boolean literals with named constants to clarify the meaning of the 'false' (caseInsensitive) and 'true' (scorable) arguments.

return new TermQuery(source(), ((FieldAttribute) field()).name(), queryAsObject(), false, true);

carlosdelest · 2025-03-11T11:41:47Z

...k/plugin/esql-core/src/main/java/org/elasticsearch/xpack/esql/core/querydsl/query/Query.java

     */
-    public abstract QueryBuilder asBuilder();
+    public final QueryBuilder toQueryBuilder() {


This is the method changed to include boosting information according to the query being scorable. asBuilder() is refactored to a protected method that must be implemented by subclasses.

carlosdelest · 2025-03-11T11:42:25Z

...ugin/esql-core/src/main/java/org/elasticsearch/xpack/esql/core/querydsl/query/BoolQuery.java

        BoolQueryBuilder boolQuery = boolQuery();
        for (Query query : queries) {
+            QueryBuilder queryBuilder = query.toQueryBuilder();


asBuilder() is replaced by toQueryBuilder() - see below

carlosdelest · 2025-03-11T12:22:28Z

...in/java/org/elasticsearch/xpack/esql/optimizer/rules/physical/local/PushFiltersToSource.java

-            Queries.Clause combiningQueryClauseType = queryExec.hasScoring() ? Queries.Clause.MUST : Queries.Clause.FILTER;
-            var query = Queries.combine(combiningQueryClauseType, asList(queryExec.query(), planQuery));
+            QueryBuilder planQuery = queryDSL.toQueryBuilder();
+            var query = Queries.combine(Queries.Clause.MUST, asList(queryExec.query(), planQuery));


We go back to use MUST to combine queries, as the queries themselves have boosting to enable or disable scoring

After thinking about it more, I'm wondering what is the implication here that we always use MUST query here for all pushdown queries. Is a MUST query more expensive than a FILTER query? If this change affects queries that don't have full text functions, and there is potential performance impact, if we can have some performance data before and after this change that will be great.

My reasoning here is that it doesn't matter, as a clause with boosting 0.0 gets rewritten into a ConstantScoreQuery. This means that score mode does not apply, and that it is equivalent to using a filter (which rewrites to ConstantScoreQuery as well).

Also, the first query in the queryExec could be executed as a MUST query independently of the combination - the Queries.combine() method just adds the new query as a FILTER clause, but does nothing with the previous existing query AFAIU.

In any case, it makes sense to keep FILTER when scoring is not needed - I will restore the previous code as I'd prefer to deal with query performance as a separate issue.

I have checked with the team, and filters differ in that they can be cached. So I reverted this change in e641e38.

Thanks for raising this!

carlosdelest · 2025-03-11T12:24:17Z

...ugin/esql/src/main/java/org/elasticsearch/xpack/esql/querydsl/query/SpatialRelatesQuery.java

@@ -94,6 +94,8 @@ public ShapeRelation shapeRelation() {
     */
    public abstract class ShapeQueryBuilder implements QueryBuilder {

+        private float boost = 0.0f;


This change is needed to allow boosting this query, as now all non scoring queries are explicitly set to a boost score of 0

ioanatia

This looks good, I did not get the time to test it yet, just left some questions

ioanatia · 2025-03-11T22:19:32Z

...ugin/esql/src/main/java/org/elasticsearch/xpack/esql/querydsl/query/SpatialRelatesQuery.java

        }

        @Override
        public QueryBuilder boost(float boost) {
-            throw new UnsupportedOperationException("Unimplemented: float");
+            this.boost = boost;


do we ever expect this to be called with a value different from 0? should we add an assertion if we always expect the boost to be 0?

Right now there's no other expectation, but I think it's better to honor the QueryBuilder interface correctly and check that the boosting is done as expected on the tests.

ioanatia · 2025-03-11T22:27:03Z

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/physical/EsStatsQueryExec.java

@@ -38,7 +38,7 @@ public enum StatsType {

    public record Stat(String name, StatsType type, QueryBuilder query) {
        public QueryBuilder filter(QueryBuilder sourceQuery) {
-            return query == null ? sourceQuery : Queries.combine(Queries.Clause.FILTER, asList(sourceQuery, query));
+            return query == null ? sourceQuery : Queries.combine(Queries.Clause.FILTER, asList(sourceQuery, query)).boost(0.0f);


is it necessary to put an explicit boost given that we combine the queries into a boolean query filter?

The combine method does not necessarily transforms both queries into a filter. When the first query is a boolean query, it will be used as the boolean query to use as a first step for the combination - which means that the first query is unchanged to a filter, and thus can have scoring.

ioanatia · 2025-03-11T22:46:24Z

...in/esql/src/main/java/org/elasticsearch/xpack/esql/planner/EsPhysicalOperationProviders.java

@@ -173,7 +173,7 @@ private static class DefaultShardContextForUnmappedField extends DefaultShardCon
    }

    public Function<org.elasticsearch.compute.lucene.ShardContext, Query> querySupplier(QueryBuilder builder) {
-        QueryBuilder qb = builder == null ? QueryBuilders.matchAllQuery() : builder;
+        QueryBuilder qb = builder == null ? QueryBuilders.matchAllQuery().boost(0.0f) : builder;


I wonder if there is a follow up optimisation we could do here.
In the case where no QueryBuilder is available and we are using match_all, when we are adding the score block:

we would no longer need to retrieve the scores from Lucene (we know they should all be zero), so we no longer need to set the score mode to COMPLETE

we could be using ConstantDoubleVector to set the score block, which I guess would be using a constant amount of memory, no matter how many docs we retrieve.

we would no longer need to retrieve the scores from Lucene (we know they should all be zero), so we no longer need to set the score mode to COMPLETE

I believe a similar optimization is done by the MatchAllDocsQuery internally - it uses a ConstantScoreScorer which I don't think uses the COMPLETE score mode.

we could be using ConstantDoubleVector to set the score block, which I guess would be using a constant amount of memory, no matter how many docs we retrieve.

Yes, that's doable. Right now the Lucene operators do not take that into account. We can do it as a follow up.

ioanatia

tested this locally too - with profile enabled - looks good.

fang-xing-esql

Thank you @carlosdelest! I tried a few queries with _score that I can think of, and I didn't find anything questionable or obviously wrong. I added a couple of suggestions.

There is one thing that I can think of, it could be a potential candidate for future improvements, it is related to NOT. For example, if I have a query like below, the NOT resets the _score of each returned row to 0.

from books metadata _score 
| where not (match(name, "Space") or length(name) < 0)

Is my understanding correct, that the match query will still collect scores, and the NOT will discard them? If we can recognize the _score will be reset to 0 before pushing down the match query, we don't have to ask the match query to return scores back.

fang-xing-esql · 2025-03-13T23:27:56Z

...k/plugin/esql-core/src/main/java/org/elasticsearch/xpack/esql/core/querydsl/query/Query.java

+    public final QueryBuilder toQueryBuilder() {
+        QueryBuilder builder = asBuilder();
+        if (scorable() == false) {
+            builder.boost(0.0f);


A minor suggestion, we can define a constant variable for 0.0f, call it something like ESQL_NON_FTF_DEFAULT_BOOST, 0.0f is also used in several places in the tests like PhysicalPlanOptimizerTests they can all reference to the same constant variable.

Done in afab198

fang-xing-esql · 2025-03-13T23:29:24Z

...ql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizerTests.java

-        var range = wrapWithSingleQuery(query, QueryBuilders.rangeQuery("salary").gt(1000), "salary", source);
-        var expected = QueryBuilders.boolQuery().must(range).must(exists);
-        assertThat(expected.toString(), is(esStatsQuery.query().toString()));
+        var range = wrapWithSingleQuery(query, QueryBuilders.rangeQuery("salary").gt(1000).boost(0.0f), "salary", source);


If there is a utility method to add boost(0.0f), it will be handy, it is repeated in quite a few places in the tests. Having it managed in a central place is less error prone.

That makes sense. I added a Query.unscore() method in afab198 that takes into account both of your suggestions here. LMKWYT!

… tests

carlosdelest · 2025-03-14T08:22:59Z

Is my understanding correct, that the match query will still collect scores, and the NOT will discard them? If we can recognize the _score will be reset to 0 before pushing down the match query, we don't have to ask the match query to return scores back.

That is correct. We can do that optimization as a follow up 👍

…ng-full-text-functions # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java

fang-xing-esql

Thank you @carlosdelest! I just have one more comment in PushFiltersToSource, the rest looks good to me.

fang-xing-esql · 2025-03-14T17:33:18Z

...in/java/org/elasticsearch/xpack/esql/optimizer/rules/physical/local/PushFiltersToSource.java

-            Queries.Clause combiningQueryClauseType = queryExec.hasScoring() ? Queries.Clause.MUST : Queries.Clause.FILTER;
-            var query = Queries.combine(combiningQueryClauseType, asList(queryExec.query(), planQuery));
+            QueryBuilder planQuery = queryDSL.toQueryBuilder();
+            var query = Queries.combine(Queries.Clause.MUST, asList(queryExec.query(), planQuery));


After thinking about it more, I'm wondering what is the implication here that we always use MUST query here for all pushdown queries. Is a MUST query more expensive than a FILTER query? If this change affects queries that don't have full text functions, and there is potential performance impact, if we can have some performance data before and after this change that will be great.

…ng-full-text-functions # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java

fang-xing-esql

Thank you @carlosdelest , LGTM!

elasticsearchmachine · 2025-03-18T13:38:55Z

💔 Backport failed

Status	Branch	Result
❌	9.0	Commit could not be cherrypicked due to conflicts
❌	8.x	Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 124540

(cherry picked from commit 7950751) # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/QueryBuilderResolver.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/expression/function/fulltext/QueryStringTests.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizerTests.java

carlosdelest · 2025-03-18T14:51:33Z

💚 All backports created successfully

Status	Branch	Result
✅	8.x
✅	9.0

Questions ?

Please refer to the Backport tool documentation

(cherry picked from commit 7950751) # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/fulltext/QueryBuilderResolver.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/expression/function/fulltext/QueryStringTests.java

(cherry picked from commit 7950751) # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/QueryBuilderResolver.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/expression/function/fulltext/QueryStringTests.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizerTests.java

(cherry picked from commit 7950751) # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/fulltext/QueryBuilderResolver.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/expression/function/fulltext/QueryStringTests.java

carlosdelest added 4 commits March 7, 2025 13:07

Score for match all is 0.0 now

338e82b

Add scorable() method to Query class, and take it into account for cr…

af11e3b

…eating boolean queries

Add integration tests

0a2bb77

Use boost() in queries according to whether they're scorable or not

3fdce04

Update docs/changelog/124540.yaml

ddd2efb

elasticsearchmachine and others added 10 commits March 11, 2025 09:33

[CI] Auto commit changes from spotless

82b5432

Fix tests

1a2f653

Fix tests

4f7a308

Fix tests

86e7563

Fix tests

da0701a

Fix tests

30a052e

Spotless

bd7f4f6

Merge remote-tracking branch 'carlosdelest/bugfix/esql-fix-scoring-fu…

65c863c

…ll-text-functions' into bugfix/esql-fix-scoring-full-text-functions # Conflicts: # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizerTests.java

Fix changelog

48b4495

Add all full text functions to scoring tests

7fb6d10

carlosdelest requested a review from Copilot March 11, 2025 11:35

Copilot AI reviewed Mar 11, 2025

View reviewed changes

carlosdelest added 2 commits March 11, 2025 12:40

Minor changes

bf330f0

Fix scores in tests, add capability for bwc

0c9f80a

carlosdelest commented Mar 11, 2025

View reviewed changes

carlosdelest requested review from svilen-mihaylov-elastic and afoucret March 11, 2025 16:44

carlosdelest changed the title ~~ES\QL: Fix scoring for full text functions~~ ES\QL: Fix scoring for non full text functions Mar 11, 2025

ioanatia reviewed Mar 11, 2025

View reviewed changes

This was referenced Mar 12, 2025

ES|QL MATCH roadmap #116261

Open

Scoring support in ES|QL #116599

Open

ioanatia approved these changes Mar 12, 2025

View reviewed changes

fang-xing-esql reviewed Mar 13, 2025

View reviewed changes

Add Query.unscore() to remove scores from a QueryBuilder, use it from…

afab198

… tests

carlosdelest added 2 commits March 14, 2025 09:24

Merge remote-tracking branch 'origin/main' into bugfix/esql-fix-scori…

c7cd78c

…ng-full-text-functions # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java

Fix bwc test capability

2a618e1

fang-xing-esql reviewed Mar 14, 2025

View reviewed changes

carlosdelest added 2 commits March 17, 2025 16:19

Merge remote-tracking branch 'origin/main' into bugfix/esql-fix-scori…

66935a0

…ng-full-text-functions # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java

Use filter again in PushFiltersToSource when no scoring is needed

e641e38

carlosdelest requested a review from fang-xing-esql March 18, 2025 08:07

fang-xing-esql approved these changes Mar 18, 2025

View reviewed changes

carlosdelest removed the v8.18.1 label Mar 18, 2025

carlosdelest merged commit 7950751 into elastic:main Mar 18, 2025
17 checks passed

elasticsearchmachine added the backport pending label Mar 18, 2025

carlosdelest mentioned this pull request Mar 18, 2025

[8.x] ES\QL: Fix scoring for non full text functions (#124540) #125104

Merged

carlosdelest mentioned this pull request Mar 18, 2025

[9.0] ES\QL: Fix scoring for non full text functions (#124540) #125105

Merged

carlosdelest mentioned this pull request Mar 21, 2025

ES|QL QSTR roadmap #116930

Open

omricohenn pushed a commit to omricohenn/elasticsearch that referenced this pull request Mar 28, 2025

ES\QL: Fix scoring for non full text functions (elastic#124540)

8766b77

ES\QL: Fix scoring for non full text functions #124540

ES\QL: Fix scoring for non full text functions #124540

Uh oh!

Conversation

carlosdelest commented Mar 11, 2025

Uh oh!

elasticsearchmachine commented Mar 11, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

PR Overview

Reviewed Changes

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ioanatia left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ioanatia left a comment

Choose a reason for hiding this comment

Uh oh!

fang-xing-esql left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

carlosdelest commented Mar 14, 2025

Uh oh!

fang-xing-esql left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fang-xing-esql left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

elasticsearchmachine commented Mar 18, 2025

💔 Backport failed

Uh oh!

carlosdelest commented Mar 18, 2025

💚 All backports created successfully

Questions ?

Uh oh!

Uh oh!