-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Set num_candidates as an option parameter #125001
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
* main: (95 commits) Mute org.elasticsearch.datastreams.lifecycle.DataStreamLifecycleServiceIT testLifecycleAppliedToFailureStore elastic#124999 Merge template mappings properly during validation (elastic#124784) [Build] Rework internal build plugin plugin to work with Isolated Projects (elastic#123461) [Build] Require reason for usesDefaultDistribution (elastic#124707) Mute org.elasticsearch.packaging.test.DockerTests test011SecurityEnabledStatus elastic#124990 Mute org.elasticsearch.xpack.ilm.TimeSeriesDataStreamsIT testRolloverAction elastic#124987 Mute org.elasticsearch.packaging.test.BootstrapCheckTests test10Install elastic#124957 Mute org.elasticsearch.integration.DataStreamLifecycleServiceRuntimeSecurityIT testRolloverLifecycleAndForceMergeAuthorized elastic#124978 Mute org.elasticsearch.xpack.esql.action.CrossClusterAsyncQueryStopIT testStopQuery elastic#124977 Mute org.elasticsearch.xpack.esql.action.CrossClusterAsyncQueryStopIT testStopQueryLocal elastic#121672 Mention zero-window state in networking docs (elastic#124967) Remove remoteAddress field from TransportResponse (elastic#120016) Include failures in partial response (elastic#124929) Prevent work starvation bug if using scaling EsThreadPoolExecutor with core pool size = 0 (elastic#124732) Re-enable analysis stemmer test (elastic#124961) Mute org.elasticsearch.xpack.esql.action.CrossClusterAsyncQueryStopIT testStopQueryLocalNoRemotes elastic#124959 ESQL: Catch parsing exception (elastic#124958) ESQL: Improve error message for ( and [ (elastic#124177) Mute org.elasticsearch.xpack.esql.qa.single_node.EsqlSpecIT test {lookup-join.MvJoinKeyFromRow SYNC} elastic#124951 Mute org.elasticsearch.datastreams.lifecycle.DataStreamLifecycleServiceIT testErrorRecordingOnRetention elastic#124950 ... # Conflicts: # server/src/main/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldMapper.java # server/src/test/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldTypeTests.java
To me, this falls within the "too many clauses" type of error. We make this limit to prevent dramatic cluster performance issues as certain scratch data is eagerly initialized. If this is indeed something that is desired (e.g. get more than 10k candidates), I wonder if we are better suited for a index wide setting that places the upper limit. I am not a huge fan of tacking on more parameters to an already complicated set of index options. |
yeah, I wanted to implement it in index settings, but I found it a little hard, and easy implement in the mapping. |
I can understand the difficulty as this will require moving validation logic and such around. Additionally, it should probably be a "dynamic", which then means you have a value that gets updated on setting updates. But this is the way it should go, instead of adding more configuration points to the mapped field. |
I find that the check can be add in the DfsPhase, it can get the index setting, and check the num_candidates value. |
Hi, @benwtrent , I add a new PR #125065 |
Now the num_candidates must be lower then 10000, In some case, use want to get more vectors.
I think the num_candidates can be a parameter like max_window_size, The use can change the default vaule.