
Adding MinScore support to Linear Retriever #126238


Draft: wants to merge 56 commits into main

56 commits
cb4b8ff
First steps to add minscore in RankDocsQueryBuilder and RankDocsQuery
mridula-s109 Apr 2, 2025
b43694c
Cleaned up the previous edits and also changed rankDocsQuerytests, at…
mridula-s109 Apr 2, 2025
07fc703
Added transport version changes
mridula-s109 Apr 2, 2025
3a79399
cleaned up transport versioning changes
mridula-s109 Apr 2, 2025
5dbf8b6
Modified rankdocsquery to include filtering
mridula-s109 Apr 2, 2025
6a1d1d7
Added minscore to linear retriever
mridula-s109 Apr 2, 2025
7ec22aa
Modified minscore in builder to avoid parser issue
mridula-s109 Apr 3, 2025
d9f26c3
Modified minMaxScoreNormaliser to support the minscore implementation…
mridula-s109 Apr 3, 2025
239375d
Fixed the pagination issue and handled default scores during minscore…
mridula-s109 Apr 3, 2025
ac3f121
Introducing new integration tests for minscore in LinearRetrieverIT
mridula-s109 Apr 3, 2025
715de7a
Improved testing in LinearRetrieverIT
mridula-s109 Apr 3, 2025
8f6c137
[CI] Auto commit changes from spotless
elasticsearchmachine Apr 3, 2025
ef154bf
Update docs/changelog/126238.yaml
mridula-s109 Apr 3, 2025
978d808
Merged remote
mridula-s109 Apr 4, 2025
4dc8eee
PIT issue in LinearRankWindowSize Integration test is fixed
mridula-s109 Apr 3, 2025
d615455
Fixed empty lines
mridula-s109 Apr 4, 2025
dfaad75
Modified the tests to reflect current behaviour
mridula-s109 Apr 4, 2025
00a6fb1
[CI] Auto commit changes from spotless
elasticsearchmachine Apr 4, 2025
bd34261
The LinearRetrieverItTest is verified and made changes:
mridula-s109 Apr 4, 2025
0e5a44c
Trying to fix totalhits propagation
mridula-s109 Apr 4, 2025
74e834b
Modified added yaml file
mridula-s109 Apr 4, 2025
e95aed8
Added markdown
mridula-s109 Apr 4, 2025
641a3c3
Fixed file that was modified during merge
mridula-s109 Apr 4, 2025
b93a5ac
Merge branch 'main' into add_min_score_linear_retriever
mridula-s109 Apr 4, 2025
396b738
Edited the yaml file to match the negative score:"
mridula-s109 Apr 4, 2025
cd061b2
Merge branch 'main' into add_min_score_linear_retriever
mridula-s109 Apr 4, 2025
5ef28a8
[CI] Auto commit changes from spotless
elasticsearchmachine Apr 4, 2025
4d3c5b5
Merge branch 'main' into add_min_score_linear_retriever
mridula-s109 Apr 8, 2025
6561b72
Debugging minscore failure
mridula-s109 Apr 8, 2025
7adae60
[CI] Auto commit changes from spotless
elasticsearchmachine Apr 10, 2025
1063872
Commit modified retriever and linear test files
mridula-s109 Apr 15, 2025
7b1bef0
merged main
mridula-s109 Apr 28, 2025
a7b7907
Merge branch 'main' into add_min_score_linear_retriever
mridula-s109 Apr 28, 2025
c9bdb0c
[CI] Auto commit changes from spotless
elasticsearchmachine Apr 28, 2025
4b1f912
Resolving bugs
mridula-s109 Apr 29, 2025
932c221
tests pass but a bug needs to be fixed
mridula-s109 Apr 29, 2025
0ba7721
[CI] Auto commit changes from spotless
elasticsearchmachine Apr 29, 2025
79a138a
Merged main and cleaned up retrievers.md
mridula-s109 May 1, 2025
d6a4928
cleaned up rankdocsretrieverbuilder
mridula-s109 May 1, 2025
74da88c
cleaned up linearretrieverbuilder
mridula-s109 May 1, 2025
633771b
editing bugs but cleaning it up
mridula-s109 May 1, 2025
bf69705
[CI] Auto commit changes from spotless
elasticsearchmachine May 1, 2025
13617b6
Merge branch 'main' into add_min_score_linear_retriever
mridula-s109 May 1, 2025
ce039a7
making changes to linear retriever
mridula-s109 May 1, 2025
bcc33b9
Fixing Linear Retriever Builder
mridula-s109 May 1, 2025
d58f25e
Fixed failing yaml change log
mridula-s109 May 1, 2025
cd2fe03
Modified linear retriever and the rankdocsquerybuildeR
mridula-s109 May 1, 2025
c18034b
Minscore is made sure it works at the lower level
mridula-s109 May 1, 2025
1f4d80b
Linear retriever modified to work as intended
mridula-s109 May 2, 2025
b49c77d
Introduced new unit tests to verify minscore working
mridula-s109 May 2, 2025
20b2820
Merge branch 'main' into add_min_score_linear_retriever
mridula-s109 May 2, 2025
288222a
[CI] Auto commit changes from spotless
elasticsearchmachine May 2, 2025
67fc2c2
Resolving the checkstyle issue
mridula-s109 May 2, 2025
16bd2d2
parking the progress in a commit
mridula-s109 May 2, 2025
bbe5496
UNit tests modified
mridula-s109 May 2, 2025
4924288
[CI] Auto commit changes from spotless
elasticsearchmachine May 2, 2025
docs/changelog/126238.yaml (5 additions, 0 deletions)
@@ -0,0 +1,5 @@
pr: 126238
summary: Adding `MinScore` support to Linear Retriever
area: Search
type: enhancement
issues: []
docs/reference/elasticsearch/rest-apis/retrievers.md (57 additions, 0 deletions)
@@ -293,12 +293,69 @@ See also [this hybrid search example](docs-content://solutions/search/retrievers

This value determines the size of the individual result sets per query. A higher value will improve result relevance at the cost of performance. The final ranked result set is pruned down to the search request’s [size](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search#search-size-param). `rank_window_size` must be greater than or equal to `size` and greater than or equal to `1`. Defaults to the `size` parameter.

`min_score`
: (Optional, float)

Minimum score threshold for documents to be included in the final result set. Documents whose combined score falls below this threshold are filtered out. Must be greater than or equal to 0 if explicitly set. If not set, defaults to `Float.MIN_VALUE` (the smallest positive float value), which is treated as "no threshold", so no documents are filtered based on score.

`filter`
: (Optional, [query object or list of query objects](/reference/query-languages/querydsl.md))

Applies the specified [boolean query filter](/reference/query-languages/query-dsl/query-dsl-bool-query.md) to all of the specified sub-retrievers, according to each retriever’s specifications.

```console
GET /restaurants/_search
{
"retriever": {
"linear": { <1>
"retrievers": [ <2>
{
"retriever": { <3>
"standard": {
"query": {
"multi_match": {
"query": "Italian cuisine",
"fields": [
"description",
"cuisine"
]
}
}
}
},
"weight": 2.0, <4>
"normalizer": "minmax" <5>
},
{
"retriever": { <6>
"knn": {
"field": "vector",
"query_vector": [10, 22, 77],
"k": 10,
"num_candidates": 10
}
},
"weight": 1.0, <7>
"normalizer": "minmax" <8>
}
],
"rank_window_size": 50, <9>
"min_score": 1.5 <10>
}
}
}
```

1. Defines a retriever tree using the `linear` retriever type.
2. The array of retrievers to be combined.
3. A `standard` retriever used for traditional full-text search.
4. Weight applied to the score from the `standard` retriever.
5. Normalization method (`minmax`) applied to the `standard` retriever score.
6. A `knn` retriever used for vector-based similarity search.
7. Weight applied to the score from the `knn` retriever.
8. Normalization method (`minmax`) applied to the `knn` retriever score.
9. The number of top documents considered for scoring in the linear combination.
10. Minimum score threshold for the final result set. Documents whose combined score is below this value are excluded.
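As a rough illustration of how `min_score` interacts with the weights in the example above, the following Java sketch combines two score lists the way the callouts describe: each retriever's scores are min-max normalized, multiplied by that retriever's weight, and summed, and documents whose combined score is below `min_score` are dropped. This is not Elasticsearch code. The document IDs and raw scores are invented, and two details are assumptions rather than documented behavior: a document missing from one retriever contributes 0 for that retriever, and a retriever whose scores are all equal normalizes them to 0.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of a weighted linear combination with minmax normalization and a min_score cut-off.
// Not Elasticsearch code; doc IDs and raw scores in main() are invented for illustration.
class LinearMinScoreSketch {

    // Min-max normalize to [0, 1]. Assumption: if all scores are equal, every doc maps to 0.
    static Map<String, Double> minMax(Map<String, Double> scores) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores.values()) {
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        double range = max - min;
        Map<String, Double> normalized = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            normalized.put(e.getKey(), range == 0 ? 0.0 : (e.getValue() - min) / range);
        }
        return normalized;
    }

    // Weighted sum of normalized scores. Assumption: a doc absent from one retriever contributes 0 there.
    static Map<String, Double> combine(Map<String, Double> a, double weightA, Map<String, Double> b, double weightB) {
        Map<String, Double> combined = new LinkedHashMap<>();
        a.forEach((doc, s) -> combined.merge(doc, weightA * s, Double::sum));
        b.forEach((doc, s) -> combined.merge(doc, weightB * s, Double::sum));
        return combined;
    }

    // Keep only docs whose combined score is >= minScore.
    static List<String> applyMinScore(Map<String, Double> combined, double minScore) {
        List<String> kept = new ArrayList<>();
        combined.forEach((doc, s) -> {
            if (s >= minScore) {
                kept.add(doc);
            }
        });
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Double> text = new LinkedHashMap<>(); // hypothetical "standard" retriever scores
        text.put("doc1", 12.0);
        text.put("doc2", 8.0);
        text.put("doc3", 4.0);
        Map<String, Double> knn = new LinkedHashMap<>();  // hypothetical "knn" retriever scores
        knn.put("doc1", 0.9);
        knn.put("doc3", 0.5);
        knn.put("doc4", 0.7);
        Map<String, Double> combined = combine(minMax(text), 2.0, minMax(knn), 1.0);
        System.out.println(applyMinScore(combined, 1.5)); // [doc1]
    }
}
```

With weights of 2.0 and 1.0 applied to minmax-normalized inputs in [0, 1], the combined score ranges from 0 to 3.0, so `min_score: 1.5` sits at the midpoint of that range.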


## RRF Retriever [rrf-retriever]
@@ -236,6 +236,7 @@ static TransportVersion def(int id) {
public static final TransportVersion PINNED_RETRIEVER = def(9_068_0_00);
public static final TransportVersion ML_INFERENCE_SAGEMAKER = def(9_069_0_00);
public static final TransportVersion WRITE_LOAD_INCLUDES_BUFFER_WRITES = def(9_070_00_0);
public static final TransportVersion RANK_DOCS_QUERY_MIN_SCORE = def(9_071_0_00);

/*
* STOP! READ THIS FIRST! No, really,
@@ -28,15 +28,22 @@
public class RankDocsQueryBuilder extends AbstractQueryBuilder<RankDocsQueryBuilder> {

public static final String NAME = "rank_docs_query";
public static final float DEFAULT_MIN_SCORE = Float.MIN_VALUE;

private final RankDoc[] rankDocs;
private final QueryBuilder[] queryBuilders;
private final boolean onlyRankDocs;
private final float minScore;

public RankDocsQueryBuilder(RankDoc[] rankDocs, QueryBuilder[] queryBuilders, boolean onlyRankDocs) {
this(rankDocs, queryBuilders, onlyRankDocs, DEFAULT_MIN_SCORE);
}

public RankDocsQueryBuilder(RankDoc[] rankDocs, QueryBuilder[] queryBuilders, boolean onlyRankDocs, float minScore) {
this.rankDocs = rankDocs;
this.queryBuilders = queryBuilders;
this.onlyRankDocs = onlyRankDocs;
this.minScore = minScore;
}

public RankDocsQueryBuilder(StreamInput in) throws IOException {
@@ -45,9 +52,13 @@ public RankDocsQueryBuilder(StreamInput in) throws IOException {
if (in.getTransportVersion().onOrAfter(TransportVersions.V_8_16_0)) {
this.queryBuilders = in.readOptionalArray(c -> c.readNamedWriteable(QueryBuilder.class), QueryBuilder[]::new);
this.onlyRankDocs = in.readBoolean();
this.minScore = in.getTransportVersion().onOrAfter(TransportVersions.RANK_DOCS_QUERY_MIN_SCORE)
? in.readFloat()
: DEFAULT_MIN_SCORE;
} else {
this.queryBuilders = null;
this.onlyRankDocs = false;
this.minScore = DEFAULT_MIN_SCORE;
}
}

@@ -70,7 +81,7 @@ protected QueryBuilder doRewrite(QueryRewriteContext queryRewriteContext) throws
changed |= newQueryBuilders[i] != queryBuilders[i];
}
if (changed) {
RankDocsQueryBuilder clone = new RankDocsQueryBuilder(rankDocs, newQueryBuilders, onlyRankDocs);
RankDocsQueryBuilder clone = new RankDocsQueryBuilder(rankDocs, newQueryBuilders, onlyRankDocs, minScore);
clone.queryName(queryName());
return clone;
}
@@ -88,6 +99,9 @@ protected void doWriteTo(StreamOutput out) throws IOException {
if (out.getTransportVersion().onOrAfter(TransportVersions.V_8_16_0)) {
out.writeOptionalArray(StreamOutput::writeNamedWriteable, queryBuilders);
out.writeBoolean(onlyRankDocs);
if (out.getTransportVersion().onOrAfter(TransportVersions.RANK_DOCS_QUERY_MIN_SCORE)) {
out.writeFloat(minScore);
}
}
}
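The stream constructor and `doWriteTo` above follow a common pattern for adding a field to a wire format: `minScore` is written and read only when the negotiated transport version is on or after `RANK_DOCS_QUERY_MIN_SCORE`, and when talking to an older node the reader falls back to `DEFAULT_MIN_SCORE`. The sketch below reproduces that pattern with plain `java.io` streams. It is not Elasticsearch's `StreamInput`/`StreamOutput` API, and the integer version numbers are hypothetical stand-ins for `TransportVersion` constants.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch of version-gated serialization with plain java.io streams (not Elasticsearch's
// StreamInput/StreamOutput). Version numbers are hypothetical stand-ins for TransportVersion constants.
class VersionGatedMinScoreSketch {
    static final int MIN_SCORE_VERSION = 9_071; // hypothetical: first version that carries minScore
    static final float DEFAULT_MIN_SCORE = Float.MIN_VALUE;

    // wireVersion is the version negotiated for the connection; writer and reader use the same value.
    static byte[] write(boolean onlyRankDocs, float minScore, int wireVersion) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeBoolean(onlyRankDocs);
            if (wireVersion >= MIN_SCORE_VERSION) {
                out.writeFloat(minScore); // omitted entirely for older peers
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // in-memory streams are not expected to fail
        }
        return bytes.toByteArray();
    }

    static float readMinScore(byte[] data, int wireVersion) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            in.readBoolean(); // onlyRankDocs, unused in this sketch
            return wireVersion >= MIN_SCORE_VERSION ? in.readFloat() : DEFAULT_MIN_SCORE;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(write(true, 1.5f, 9_071).length);               // 5 bytes: boolean + float
        System.out.println(write(true, 1.5f, 9_070).length);               // 1 byte: boolean only
        System.out.println(readMinScore(write(true, 1.5f, 9_071), 9_071)); // 1.5
        System.out.println(readMinScore(write(true, 1.5f, 9_070), 9_070) == DEFAULT_MIN_SCORE); // true
    }
}
```

Because both sides gate on the same negotiated version, an older node never receives bytes it does not know how to read, and a newer node never waits for a field that was not sent.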

@@ -115,7 +129,7 @@ protected Query doToQuery(SearchExecutionContext context) throws IOException {
queries = new Query[0];
queryNames = Strings.EMPTY_ARRAY;
}
return new RankDocsQuery(reader, shardRankDocs, queries, queryNames, onlyRankDocs);
return new RankDocsQuery(reader, shardRankDocs, queries, queryNames, onlyRankDocs, minScore);
}

@Override
@@ -135,12 +149,13 @@ protected void doXContent(XContentBuilder builder, Params params) throws IOExcep
protected boolean doEquals(RankDocsQueryBuilder other) {
return Arrays.equals(rankDocs, other.rankDocs)
&& Arrays.equals(queryBuilders, other.queryBuilders)
&& onlyRankDocs == other.onlyRankDocs;
&& onlyRankDocs == other.onlyRankDocs
&& minScore == other.minScore;
}

@Override
protected int doHashCode() {
return Objects.hash(Arrays.hashCode(rankDocs), Arrays.hashCode(queryBuilders), onlyRankDocs);
return Objects.hash(Arrays.hashCode(rankDocs), Arrays.hashCode(queryBuilders), onlyRankDocs, minScore);
}

@Override
@@ -119,6 +119,8 @@ public final RetrieverBuilder rewrite(QueryRewriteContext ctx) throws IOExceptio
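`doEquals` above compares `minScore` with `==`, while `doHashCode` passes it to `Objects.hash`, which boxes it and relies on `Float.hashCode`. For ordinary values these agree, but `0.0f == -0.0f` is `true` while the two hash codes differ, and `NaN == NaN` is `false`. The sketch below shows one defensive option: comparing with `Float.compare`, which is consistent with `Float.hashCode`. `MinScoreKey` is a hypothetical class, not part of the PR, and whether these edge values can actually reach `RankDocsQueryBuilder` depends on validation not shown in this diff.

```java
import java.util.Objects;

// Hypothetical value class (not an Elasticsearch type) holding a float threshold,
// with equals/hashCode kept consistent by comparing the float via Float.compare.
final class MinScoreKey {
    final boolean onlyRankDocs;
    final float minScore;

    MinScoreKey(boolean onlyRankDocs, float minScore) {
        this.onlyRankDocs = onlyRankDocs;
        this.minScore = minScore;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof MinScoreKey)) {
            return false;
        }
        MinScoreKey other = (MinScoreKey) o;
        // Float.compare treats -0.0f as less than 0.0f and NaN as equal to itself,
        // which matches how Float.hashCode (used by Objects.hash) distinguishes values.
        return onlyRankDocs == other.onlyRankDocs && Float.compare(minScore, other.minScore) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(onlyRankDocs, minScore);
    }

    public static void main(String[] args) {
        System.out.println(0.0f == -0.0f);                                 // true
        System.out.println(Float.hashCode(0.0f) == Float.hashCode(-0.0f)); // false
        System.out.println(new MinScoreKey(true, 0.0f).equals(new MinScoreKey(true, -0.0f)));           // false
        System.out.println(new MinScoreKey(true, Float.NaN).equals(new MinScoreKey(true, Float.NaN))); // true
    }
}
```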
if (entry.retriever.isCompound() && false == preFilterQueryBuilders.isEmpty()) {
entry.retriever.getPreFilterQueryBuilders().addAll(preFilterQueryBuilders);
}
// Propagate the minScore down to the child retriever
entry.retriever.minScore(this.minScore);
RetrieverBuilder newRetriever = entry.retriever.rewrite(ctx);
if (newRetriever != entry.retriever) {
newRetrievers.add(new RetrieverSource(newRetriever, null));
@@ -198,6 +200,7 @@ public void onFailure(Exception e) {
results::get
);
rankDocsRetrieverBuilder.retrieverName(retrieverName());
rankDocsRetrieverBuilder.minScore(this.minScore);
return rankDocsRetrieverBuilder;
}

@@ -105,6 +105,8 @@ public void extractToSearchSourceBuilder(SearchSourceBuilder searchSourceBuilder
// if we have aggregations we need to compute them based on all doc matches, not just the top hits
// similarly, for profile and explain we re-run all parent queries to get all needed information
RankDoc[] rankDocResults = rankDocs.get();
float effectiveMinScore = this.minScore() != null ? this.minScore() : RankDocsQueryBuilder.DEFAULT_MIN_SCORE;

if (hasAggregations(searchSourceBuilder)
|| isExplainRequest(searchSourceBuilder)
|| isProfileRequest(searchSourceBuilder)
@@ -122,18 +124,29 @@ public void extractToSearchSourceBuilder(SearchSourceBuilder searchSourceBuilder
false
);
}
// Set top-level minScore only when not in onlyRankDocs mode
if (effectiveMinScore != RankDocsQueryBuilder.DEFAULT_MIN_SCORE) {
searchSourceBuilder.minScore(effectiveMinScore);
}
} else {
rankQuery = new RankDocsQueryBuilder(rankDocResults, null, false);
// Pass minScore down to RankDocsQueryBuilder and set onlyRankDocs = true to ensure pre-computed scores are used.
// Filter the results upfront if minScore is set
RankDoc[] finalRankDocs;
if (effectiveMinScore != RankDocsQueryBuilder.DEFAULT_MIN_SCORE) {
finalRankDocs = Arrays.stream(rankDocResults).filter(doc -> doc.score >= effectiveMinScore).toArray(RankDoc[]::new);
} else {
finalRankDocs = rankDocResults;
}
// Now pass the potentially filtered array and the original minScore
rankQuery = new RankDocsQueryBuilder(finalRankDocs, null, true, effectiveMinScore);
// Do NOT set top-level minScore here, filtering is done above, and RankDocsQuery handles score propagation.
}
rankQuery.queryName(retrieverName());
// ignore prefilters of this level, they were already propagated to children
searchSourceBuilder.query(rankQuery);
if (searchSourceBuilder.size() < 0) {
searchSourceBuilder.size(rankWindowSize);
}
if (sourceHasMinScore()) {
searchSourceBuilder.minScore(this.minScore() == null ? Float.MIN_VALUE : this.minScore());
}
if (searchSourceBuilder.size() + searchSourceBuilder.from() > rankDocResults.length) {
searchSourceBuilder.size(Math.max(0, rankDocResults.length - searchSourceBuilder.from()));
}
@@ -24,6 +24,7 @@
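In the `onlyRankDocs` branch above, rank docs below the threshold are removed before the `RankDocsQueryBuilder` is built, and the page size is later clamped so that `from + size` does not exceed the number of available docs. The sketch below isolates those two steps; `RankedDoc` is a hypothetical stand-in for `RankDoc`. Note that the hunk clamps against `rankDocResults.length`, the unfiltered array, while this sketch clamps against the filtered array, which is an assumption about the intended behavior.

```java
import java.util.Arrays;

// Sketch of pre-filtering rank docs by minScore and clamping the page size.
// RankedDoc is a hypothetical stand-in for Elasticsearch's RankDoc.
class RankDocPrefilterSketch {
    static final float DEFAULT_MIN_SCORE = Float.MIN_VALUE; // sentinel meaning "no threshold"

    record RankedDoc(int doc, float score) {}

    static RankedDoc[] filter(RankedDoc[] docs, float minScore) {
        if (minScore == DEFAULT_MIN_SCORE) {
            return docs; // threshold unset: return the input array unchanged
        }
        return Arrays.stream(docs).filter(d -> d.score() >= minScore).toArray(RankedDoc[]::new);
    }

    // Shrink size so that from + size never exceeds the number of docs that can be returned.
    static int clampedSize(int from, int size, int available) {
        return from + size > available ? Math.max(0, available - from) : size;
    }

    public static void main(String[] args) {
        RankedDoc[] docs = { new RankedDoc(1, 2.8f), new RankedDoc(2, 1.6f), new RankedDoc(3, 0.4f) };
        RankedDoc[] kept = filter(docs, 1.5f);
        System.out.println(kept.length);                     // 2
        System.out.println(clampedSize(0, 10, kept.length)); // 2 when clamped against the filtered array
        System.out.println(clampedSize(0, 10, docs.length)); // 3 when clamped against the unfiltered array
    }
}
```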
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.ScorerSupplier;
import org.apache.lucene.search.Weight;
import org.elasticsearch.common.lucene.search.function.MinScoreScorer;
import org.elasticsearch.search.rank.RankDoc;

import java.io.IOException;
@@ -32,6 +33,7 @@
import java.util.Objects;

import static org.apache.lucene.search.DocIdSetIterator.NO_MORE_DOCS;
import static org.elasticsearch.index.query.RankDocsQueryBuilder.DEFAULT_MIN_SCORE;

/**
* A {@code RankDocsQuery} returns the top k documents in the order specified by the global doc IDs.
@@ -169,7 +171,7 @@ public float score() {
// so here we want to differentiate between this and all the tailQuery matches
// that would also produce a 0 score due to filtering, by setting the score to `Float.MIN_VALUE` instead for
// RankDoc matches.
return Math.max(docs[upTo].score, Float.MIN_VALUE);
return Math.max(docs[upTo].score, DEFAULT_MIN_SCORE);
}

@Override
@@ -234,6 +236,7 @@ public int hashCode() {
// RankDocs provided. This query does not contribute to scoring, as it is set as filter when creating the weight
private final Query tailQuery;
private final boolean onlyRankDocs;
private final float minScore;

/**
* Creates a {@code RankDocsQuery} based on the provided docs.
@@ -242,8 +245,16 @@ public int hashCode() {
* @param sources The original queries that were used to compute the top documents
* @param queryNames The names (if present) of the original retrievers
* @param onlyRankDocs Whether the query should only match the provided rank docs
* @param minScore The minimum score threshold for documents to be included in total hits
*/
public RankDocsQuery(IndexReader reader, RankDoc[] rankDocs, Query[] sources, String[] queryNames, boolean onlyRankDocs) {
public RankDocsQuery(
IndexReader reader,
RankDoc[] rankDocs,
Query[] sources,
String[] queryNames,
boolean onlyRankDocs,
float minScore
) {
assert sources.length == queryNames.length;
// clone to avoid side-effect after sorting
this.docs = rankDocs.clone();
@@ -260,13 +271,15 @@ public RankDocsQuery(IndexReader reader, RankDoc[] rankDocs, Query[] sources, St
this.tailQuery = null;
}
this.onlyRankDocs = onlyRankDocs;
this.minScore = minScore;
}

private RankDocsQuery(RankDoc[] docs, Query topQuery, Query tailQuery, boolean onlyRankDocs) {
private RankDocsQuery(RankDoc[] docs, Query topQuery, Query tailQuery, boolean onlyRankDocs, float minScore) {
this.docs = docs;
this.topQuery = topQuery;
this.tailQuery = tailQuery;
this.onlyRankDocs = onlyRankDocs;
this.minScore = minScore;
}

private static int binarySearch(RankDoc[] docs, int fromIndex, int toIndex, int key) {
@@ -299,7 +312,11 @@ public RankDoc[] rankDocs() {
@Override
public Query rewrite(IndexSearcher searcher) throws IOException {
if (tailQuery == null) {
return topQuery;
var topRewrite = topQuery.rewrite(searcher);
if (topRewrite != topQuery) {
return new RankDocsQuery(this.docs, topRewrite, null, this.onlyRankDocs, this.minScore);
}
return this;
}
boolean hasChanged = false;
var topRewrite = topQuery.rewrite(searcher);
@@ -310,22 +327,33 @@ public Query rewrite(IndexSearcher searcher) throws IOException {
if (tailRewrite != tailQuery) {
hasChanged = true;
}
return hasChanged ? new RankDocsQuery(docs, topRewrite, tailRewrite, onlyRankDocs) : this;
return hasChanged ? new RankDocsQuery(this.docs, topRewrite, tailRewrite, this.onlyRankDocs, this.minScore) : this;
}

@Override
public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost) throws IOException {
if (tailQuery == null) {
throw new IllegalArgumentException("[tailQuery] should not be null; maybe missing a rewrite?");
Query combinedQuery;
if (onlyRankDocs) {
combinedQuery = topQuery;
} else {
if (tailQuery == null) {
combinedQuery = topQuery;
} else {
var combined = new BooleanQuery.Builder().add(topQuery, BooleanClause.Occur.SHOULD)
.add(tailQuery, BooleanClause.Occur.FILTER)
.build();
combinedQuery = combined;
}
}
var combined = new BooleanQuery.Builder().add(topQuery, onlyRankDocs ? BooleanClause.Occur.MUST : BooleanClause.Occur.SHOULD)
.add(tailQuery, BooleanClause.Occur.FILTER)
.build();

var topWeight = topQuery.createWeight(searcher, scoreMode, boost);
var combinedWeight = searcher.rewrite(combined).createWeight(searcher, scoreMode, boost);
var combinedWeight = searcher.rewrite(combinedQuery).createWeight(searcher, scoreMode, boost);
return new Weight(this) {
@Override
public int count(LeafReaderContext context) throws IOException {
if (onlyRankDocs) {
return topWeight.count(context);
}
return combinedWeight.count(context);
}

@@ -346,7 +374,23 @@ public Matches matches(LeafReaderContext context, int doc) throws IOException {

@Override
public ScorerSupplier scorerSupplier(LeafReaderContext context) throws IOException {
return combinedWeight.scorerSupplier(context);
ScorerSupplier baseSupplier = onlyRankDocs ? topWeight.scorerSupplier(context) : combinedWeight.scorerSupplier(context);

if (minScore != DEFAULT_MIN_SCORE && baseSupplier != null) {
return new ScorerSupplier() {
@Override
public Scorer get(long leadCost) throws IOException {
Scorer scorer = baseSupplier.get(leadCost);
return scorer == null ? null : new MinScoreScorer(scorer, minScore);
}

@Override
public long cost() {
return baseSupplier.cost();
}
};
}
return baseSupplier;
}
};
}
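When `minScore` differs from `DEFAULT_MIN_SCORE`, `scorerSupplier` above wraps the base scorer in Elasticsearch's `MinScoreScorer`, so documents scoring below the threshold are skipped during collection. The following Lucene-free sketch shows the idea using a hypothetical `DocScorer` interface in place of Lucene's `Scorer`: the wrapper keeps advancing the underlying iterator until it finds a document whose score is at least `minScore`. It illustrates the concept only and does not reproduce `MinScoreScorer`'s actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, Lucene-free sketch of filtering a scorer by a minimum score.
// DocScorer, ArrayDocScorer and MinScoreDocScorer are hypothetical stand-ins, not Lucene/Elasticsearch types.
class MinScoreWrapperSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE; // same sentinel value Lucene uses

    interface DocScorer {
        int nextDoc();  // advance to the next matching doc, or return NO_MORE_DOCS
        float score();  // score of the current doc
    }

    // Underlying scorer backed by parallel arrays of ascending doc ids and their scores.
    static final class ArrayDocScorer implements DocScorer {
        private final int[] docs;
        private final float[] scores;
        private int upTo = -1;

        ArrayDocScorer(int[] docs, float[] scores) {
            this.docs = docs;
            this.scores = scores;
        }

        @Override
        public int nextDoc() {
            upTo++;
            return upTo < docs.length ? docs[upTo] : NO_MORE_DOCS;
        }

        @Override
        public float score() {
            return scores[upTo];
        }
    }

    // Wrapper that only exposes docs whose score is >= minScore.
    static final class MinScoreDocScorer implements DocScorer {
        private final DocScorer in;
        private final float minScore;

        MinScoreDocScorer(DocScorer in, float minScore) {
            this.in = in;
            this.minScore = minScore;
        }

        @Override
        public int nextDoc() {
            int doc;
            // Keep advancing until a doc clears the threshold or the iterator is exhausted.
            while ((doc = in.nextDoc()) != NO_MORE_DOCS) {
                if (in.score() >= minScore) {
                    return doc;
                }
            }
            return NO_MORE_DOCS;
        }

        @Override
        public float score() {
            return in.score();
        }
    }

    static List<Integer> collect(DocScorer scorer) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = scorer.nextDoc(); doc != NO_MORE_DOCS; doc = scorer.nextDoc()) {
            hits.add(doc);
        }
        return hits;
    }

    public static void main(String[] args) {
        DocScorer base = new ArrayDocScorer(new int[] {3, 7, 9, 12}, new float[] {2.0f, 0.5f, 1.5f, 1.0f});
        System.out.println(collect(new MinScoreDocScorer(base, 1.5f))); // [3, 9]
    }
}
```

Because the threshold is checked with `>=`, a document scoring exactly `minScore` (doc 9 here) is kept, which matches the `doc.score >= effectiveMinScore` filter used in the retriever builder.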