-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Update sparse_vector field mapping to include default setting for token pruning #126739
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Update sparse_vector field mapping to include default setting for token pruning #126739
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice progress! I've left a few comments.
@@ -17,7 +17,14 @@ PUT my-index | |||
"mappings": { | |||
"properties": { | |||
"text.tokens": { | |||
"type": "sparse_vector" | |||
"type": "sparse_vector", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we decide not to have multiple examples here?
@@ -242,6 +242,7 @@ static TransportVersion def(int id) { | |||
public static final TransportVersion WRITE_LOAD_INCLUDES_BUFFER_WRITES = def(9_070_00_0); | |||
public static final TransportVersion INTRODUCE_FAILURES_DEFAULT_RETENTION = def(9_071_0_00); | |||
public static final TransportVersion FILE_SETTINGS_HEALTH_INFO = def(9_072_0_00); | |||
public static final TransportVersion SPARSE_VECTOR_FIELD_PRUNING_OPTIONS = def(9_073_0_00); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to proactively add an 8.x transport version here as well, for an easier backport to 8.19. (Lessons we're learning now...)
} else if (numberObject instanceof Double doubleValue) { | ||
return ((Double) numberObject).floatValue(); | ||
} | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this throw if it's not a number instead of doing multiple validation checks on null?
b.endObject(); | ||
} | ||
|
||
protected void mappingWithIndexOptionsPruningConfig(XContentBuilder b) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're still not adding prune:true
here?
assertTrue(freq1 < freq2); | ||
} | ||
|
||
public void testWithIndexOptionsPruningConfigOnly() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really think this should fail, because without sending in prune:true
to the query we will not prune.
|
||
- match: { status: 400 } | ||
- match: { error.type: "mapper_parsing_exception" } | ||
- match: { error.reason: "Failed to parse mapping: [index_options] field [pruning_config] should not be set if [prune] is false" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shortcut tip: Instead of catch: bad_request
you can shortcut this to something like catch: Failed to parse mapping: \[index_options\] field \[pruning_config\] should not be set if \[prune\] is false
- then you could still keep the status check but the other error checks are already taken care of in the catch.
- match: { error.reason: "Failed to parse mapping: [pruning_config] field [tokens_weight_threshold] field should be a number between 0.0 and 1.0" } | ||
|
||
--- | ||
"Check sparse_vector token pruning index_options in query": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we please also add tests that override the default behavior?
For example, explicitly sending in a pruning config on queries where pruning is disabled in the mapping, and also explicitly sending in prune:false where pruning is set in the mapping?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should pretty much be a copy of the multi_cluster file, didn't review this file in this round
@@ -89,6 +89,29 @@ setup: | |||
model_id: text_expansion_model | |||
wait_for: started | |||
|
|||
--- | |||
teardown: | |||
- requires: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have requires in the tear down? Isn't it enough to remove indices and ignore 404s?
- match: { error.reason: "Failed to parse mapping: [pruning_config] field [tokens_weight_threshold] field should be a number between 0.0 and 1.0" } | ||
|
||
--- | ||
"Check sparse_vector token pruning index_options in query": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this test is flakey because of shard counts?
Updates the
SparseVectorFieldMapper
type to include index options for pruning tokens and associated configuration values.Before this update, token pruning for sparse vector types is only available via the query (see parameters for the sparse vector query ).
With this PR, by default, any new indices with a
sparse_vector
field type will by default have token pruning turned on.Example: