Add index version for match_only_text stored field in binary format #130363

Merged

Conversation

jordan-powers
Contributor

Follow-up to #130049 to gate using the binary format for the stored field in match_only_text fields behind an index version (as we should have from the start).
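
For readers unfamiliar with index-version gating, here is a minimal sketch of the idea. It assumes a simplified stand-in for Elasticsearch's `IndexVersion` that models only the numeric id; the nested class and the `storedFieldInBinaryFormat` helper are hypothetical illustrations, not the code from this PR. The two ids are the ones visible in the `IndexVersions` diff.

```java
public final class IndexVersionGateSketch {

    /** Simplified stand-in for Elasticsearch's IndexVersion; only the numeric id is modelled. */
    static final class IndexVersion {
        final int id;

        IndexVersion(int id) {
            this.id = id;
        }

        boolean onOrAfter(IndexVersion other) {
            return id >= other.id;
        }
    }

    // Id copied from the IndexVersions diff in this PR.
    static final IndexVersion MATCH_ONLY_TEXT_STORED_AS_BYTES = new IndexVersion(9_033_0_00);

    /** Hypothetical helper: pick the stored-field format from the version the index was created with. */
    static boolean storedFieldInBinaryFormat(IndexVersion indexCreatedVersion) {
        return indexCreatedVersion.onOrAfter(MATCH_ONLY_TEXT_STORED_AS_BYTES);
    }

    public static void main(String[] args) {
        // 9_032_0_00 is the id of the version defined just before the new one in the diff.
        IndexVersion older = new IndexVersion(9_032_0_00);
        IndexVersion current = new IndexVersion(9_033_0_00);
        System.out.println("older index uses binary format: " + storedFieldInBinaryFormat(older));
        System.out.println("new index uses binary format: " + storedFieldInBinaryFormat(current));
    }
}
```

Because the decision depends only on the creation version, an existing index keeps writing the format it started with, and only newly created indices pick up the binary format.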

@jordan-powers jordan-powers requested a review from martijnvg June 30, 2025 21:54
@jordan-powers jordan-powers self-assigned this Jun 30, 2025
@jordan-powers jordan-powers added >non-issue auto-backport Automatically create backport pull requests when merged :StorageEngine/Mapping The storage related side of mappings v9.2.0 v9.1.1 v8.19.1 labels Jun 30, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Comment on lines +576 to +581
if (storedFieldInBinaryFormat) {
    final var bytesRef = new BytesRef(utfBytes.bytes(), utfBytes.offset(), utfBytes.length());
    context.doc().add(new StoredField(fieldType().storedFieldNameForSyntheticSource(), bytesRef));
} else {
    context.doc().add(new StoredField(fieldType().storedFieldNameForSyntheticSource(), value.string()));
}
Contributor Author

Not 100% sure about this. Maybe we want to continue writing in binary format even for older indices? Since we already updated the mapper to handle mixed String and BytesRef values, we may as well take advantage of the throughput benefits?
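
To illustrate what handling mixed values means on the read side, here is a rough sketch. It is not the mapper code from #130049, which is not shown in this conversation; it assumes a plain `byte[]` as a stand-in for Lucene's `BytesRef`, and the `asString` helper is hypothetical.

```java
import java.nio.charset.StandardCharsets;

public final class MixedStoredValueSketch {

    /**
     * Hypothetical reader: accepts a stored value written in either format
     * (String for older documents, UTF-8 bytes for newer ones) and returns text.
     */
    static String asString(Object storedValue) {
        if (storedValue instanceof String) {
            return (String) storedValue;
        }
        if (storedValue instanceof byte[]) {
            // byte[] stands in for Lucene's BytesRef in this sketch.
            return new String((byte[]) storedValue, StandardCharsets.UTF_8);
        }
        String type = storedValue == null ? "null" : storedValue.getClass().getName();
        throw new IllegalArgumentException("unexpected stored value type: " + type);
    }

    public static void main(String[] args) {
        Object fromOldSegment = "quick brown fox";
        Object fromNewSegment = "quick brown fox".getBytes(StandardCharsets.UTF_8);
        System.out.println(asString(fromOldSegment));
        System.out.println(asString(fromNewSegment));
    }
}
```

A reader like this is what would make writing binary values into older indices technically possible, which is the trade-off raised in the comment above.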

Member

I think this way the logic is simpler? So maybe keep it like this? Indices will roll over at some point, and then the binary format will be used.

Member

@martijnvg martijnvg left a comment

One minor comment, LGTM otherwise.

@@ -178,6 +178,7 @@ private static Version parseUnchecked(String version) {
public static final IndexVersion UPGRADE_TO_LUCENE_10_2_2 = def(9_030_0_00, Version.LUCENE_10_2_2);
public static final IndexVersion SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_SUPPORT = def(9_031_0_00, Version.LUCENE_10_2_2);
public static final IndexVersion DEFAULT_DENSE_VECTOR_TO_BBQ_HNSW = def(9_032_0_00, Version.LUCENE_10_2_2);
public static final IndexVersion MATCH_ONLY_TEXT_STORED_AS_BYTES = def(9_033_0_00, Version.LUCENE_10_2_2);
Member

I think we should already add the 8.19 version (including logic) to this PR, which should make backporting easier.
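
One way the combined check could look is sketched below. This is an illustration under stated assumptions, not the check that was merged: the `IndexVersion` class is a simplified stand-in, and the `_8_19` constant, the `FIRST_9_X_VERSION` boundary, and both of their ids are hypothetical. Only `9_033_0_00` comes from the diff.

```java
public final class BackportVersionGateSketch {

    /** Simplified stand-in for Elasticsearch's IndexVersion; only the numeric id is modelled. */
    static final class IndexVersion {
        final int id;

        IndexVersion(int id) {
            this.id = id;
        }

        boolean onOrAfter(IndexVersion other) {
            return id >= other.id;
        }

        /** True when lowerInclusive <= this < upperExclusive. */
        boolean between(IndexVersion lowerInclusive, IndexVersion upperExclusive) {
            return id >= lowerInclusive.id && id < upperExclusive.id;
        }
    }

    // Id copied from the IndexVersions diff in this PR.
    static final IndexVersion MATCH_ONLY_TEXT_STORED_AS_BYTES = new IndexVersion(9_033_0_00);
    // Hypothetical: id of the backported 8.19 version (the real id is not shown in this conversation).
    static final IndexVersion MATCH_ONLY_TEXT_STORED_AS_BYTES_8_19 = new IndexVersion(8_500_0_00);
    // Hypothetical: first index version on the 9.x line, used as the exclusive upper bound.
    static final IndexVersion FIRST_9_X_VERSION = new IndexVersion(9_000_0_00);

    /** Binary format applies on main from the new version on, or on 8.x from the backport version on. */
    static boolean storedFieldInBinaryFormat(IndexVersion indexCreatedVersion) {
        return indexCreatedVersion.onOrAfter(MATCH_ONLY_TEXT_STORED_AS_BYTES)
            || indexCreatedVersion.between(MATCH_ONLY_TEXT_STORED_AS_BYTES_8_19, FIRST_9_X_VERSION);
    }

    public static void main(String[] args) {
        int[] ids = { 8_499_0_00, 8_500_0_00, 9_000_0_00, 9_032_0_00, 9_033_0_00 };
        for (int id : ids) {
            System.out.println(id + " -> " + storedFieldInBinaryFormat(new IndexVersion(id)));
        }
    }
}
```

The range check matters because a 9.x index created before the new version has a higher id than the 8.19 backport version but never wrote the binary format, so a plain `onOrAfter` on the backport constant would misclassify it.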

@jordan-powers jordan-powers merged commit a69c484 into elastic:main Jul 2, 2025
32 checks passed
jordan-powers added a commit to jordan-powers/elasticsearch that referenced this pull request Jul 2, 2025
…lastic#130363)

Follow-up to elastic#130049 to gate using the binary format for the stored field
in match_only_text fields behind an index version.
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
9.1
8.19 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 130363

@jordan-powers
Contributor Author

💚 All backports created successfully

Status Branch Result
8.19

Questions ?

Please refer to the Backport tool documentation

elasticsearchmachine pushed a commit that referenced this pull request Jul 2, 2025
…ormat (#130363) (#130416)

* Add index version for match_only_text stored field in binary format (#130363)

Follow-up to #130049 to gate using the binary format for the stored field
in match_only_text fields behind an index version.

(cherry picked from commit a69c484)

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/IndexVersions.java

* Fix IndexVersion in testLoadSyntheticSourceFromStringOrBytesRef

* Trigger build for auto-merge
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jul 2, 2025
…lastic#130363)

Follow-up to elastic#130049 to gate using the binary format for the stored field
in match_only_text fields behind an index version.
elasticsearchmachine pushed a commit that referenced this pull request Jul 2, 2025
…rmat (#130363) (#130414)

* Add index version for match_only_text stored field in binary format (#130363)

Follow-up to #130049 to gate using the binary format for the stored field
in match_only_text fields behind an index version.

* Fix index version check
jordan-powers added a commit that referenced this pull request Jul 2, 2025
Accidentally used the wrong backport index version in #130363.
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jul 3, 2025
…lastic#130363)

Follow-up to elastic#130049 to gate using the binary format for the stored field
in match_only_text fields behind an index version.
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jul 3, 2025
Accidentally used the wrong backport index version in elastic#130363.
Labels
auto-backport Automatically create backport pull requests when merged >non-issue :StorageEngine/Mapping The storage related side of mappings Team:StorageEngine v8.19.1 v9.1.1 v9.2.0
3 participants