Fix default index options when dimensions are unset for legacy indices #130540

Merged
merged 4 commits into from
Jul 3, 2025

jimczi
Contributor

@jimczi jimczi commented Jul 3, 2025

In #129825, we modified the dense_vector field type to delay setting index options until the field's dimensions are known. However, this introduced a discrepancy for indices created before that change, which would previously default to int8_hnsw even when dimensions were not set.

This discrepancy leads to an assertion failure in mixed-version clusters, where the serialized mappings differ between nodes:

[2025-07-02T20:37:29,852][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [v9.0.4-2] fatal error in thread [elasticsearch[v9.0.4-2][clusterApplierService#updateTask][T#1]], exiting java.lang.AssertionError: provided source [{"_doc":{"properties":{"vector":{"type":"dense_vector","index":true,"similarity":"cosine"}}}}] differs from mapping [{"_doc":{"properties":{"vector":{"type":"dense_vector","index":true,"similarity":"cosine","index_options":{"type":"int8_hnsw","m":16,"ef_construction":100}}}}}]

This commit resolves the issue by ensuring that indices created before the change continue to default to int8_hnsw index options, even if dimensions remain unset.
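As a rough illustration of the version-gated default described above, the sketch below models only the "dimensions unset" case. The class name, the method names, and the integer `DELAYED_DEFAULTS_VERSION` marker are hypothetical stand-ins, not the actual Elasticsearch code; the serialized shapes are copied from the assertion message.

```java
import java.util.Optional;

class LegacyDenseVectorDefaults {
    // Hypothetical marker: the first index version that delays index options
    // until dimensions are known (the behaviour introduced in #129825).
    static final int DELAYED_DEFAULTS_VERSION = 2;

    // Default index_options type for a dense_vector field whose dims are still unset.
    static Optional<String> defaultTypeWhenDimsUnset(int indexCreatedVersion) {
        if (indexCreatedVersion < DELAYED_DEFAULTS_VERSION) {
            // Legacy index: keep the old default so every node serializes the same mapping.
            return Optional.of("int8_hnsw");
        }
        // Newer index: defer the choice until dimensions are known.
        return Optional.empty();
    }

    // Serializes the field the way it appears in the assertion message.
    static String serializeVectorField(int indexCreatedVersion) {
        StringBuilder sb = new StringBuilder(
            "{\"type\":\"dense_vector\",\"index\":true,\"similarity\":\"cosine\"");
        defaultTypeWhenDimsUnset(indexCreatedVersion).ifPresent(type ->
            sb.append(",\"index_options\":{\"type\":\"").append(type)
              .append("\",\"m\":16,\"ef_construction\":100}"));
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        // Legacy index: index_options is present even though dims are unset.
        System.out.println(serializeVectorField(DELAYED_DEFAULTS_VERSION - 1));
        // Newer index: index_options is omitted until dims are known.
        System.out.println(serializeVectorField(DELAYED_DEFAULTS_VERSION));
    }
}
```

Gating on the version the index was created with, rather than on the version of the node, is what keeps the serialized mapping identical across nodes in a mixed-version cluster.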

Closes #130085

@jimczi jimczi requested a review from pmpailis July 3, 2025 09:57
@jimczi jimczi added >test Issues or PRs that are addressing/adding tests :Search Relevance/Vectors Vector search v9.1.0 v9.2.0 labels Jul 3, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Jul 3, 2025
Contributor

@pmpailis pmpailis left a comment

Nice catch, thanks @jimczi !

@jimczi jimczi added the auto-backport Automatically create backport pull requests when merged label Jul 3, 2025
Member

@kderusso kderusso left a comment

Nice catch!

@jimczi jimczi merged commit f91124a into elastic:main Jul 3, 2025
32 checks passed
@jimczi jimczi deleted the default_dense_vector_dims branch July 3, 2025 13:13
@elasticsearchmachine
Collaborator

💔 Backport failed

| Branch | Result |
| --- | --- |
| 9.1 | Commit could not be cherry-picked due to conflicts |

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 130540`

@jimczi jimczi removed the auto-backport Automatically create backport pull requests when merged label Jul 3, 2025
jimczi added a commit to jimczi/elasticsearch that referenced this pull request Jul 3, 2025
Labels
backport pending :Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch >test Issues or PRs that are addressing/adding tests v9.1.0 v9.2.0
Development

Successfully merging this pull request may close these issues.

[CI] MixedClusterEsqlSpecIT failling on main->9.0 bwc tests
4 participants