Skip to content

Set num_candidates as an option parameter #125001

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

weizijun
Copy link
Contributor

Now the num_candidates must be lower then 10000, In some case, use want to get more vectors.
I think the num_candidates can be a parameter like max_window_size, The use can change the default vaule.

@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v9.1.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Mar 17, 2025
@henningandersen henningandersen added :Search Relevance/Vectors Vector search and removed needs:triage Requires assignment of a team area label labels Mar 17, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Mar 17, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

* main: (95 commits)
  Mute org.elasticsearch.datastreams.lifecycle.DataStreamLifecycleServiceIT testLifecycleAppliedToFailureStore elastic#124999
  Merge template mappings properly during validation (elastic#124784)
  [Build] Rework internal build plugin plugin to work with Isolated Projects (elastic#123461)
  [Build] Require reason for usesDefaultDistribution (elastic#124707)
  Mute org.elasticsearch.packaging.test.DockerTests test011SecurityEnabledStatus elastic#124990
  Mute org.elasticsearch.xpack.ilm.TimeSeriesDataStreamsIT testRolloverAction elastic#124987
  Mute org.elasticsearch.packaging.test.BootstrapCheckTests test10Install elastic#124957
  Mute org.elasticsearch.integration.DataStreamLifecycleServiceRuntimeSecurityIT testRolloverLifecycleAndForceMergeAuthorized elastic#124978
  Mute org.elasticsearch.xpack.esql.action.CrossClusterAsyncQueryStopIT testStopQuery elastic#124977
  Mute org.elasticsearch.xpack.esql.action.CrossClusterAsyncQueryStopIT testStopQueryLocal elastic#121672
  Mention zero-window state in networking docs (elastic#124967)
  Remove remoteAddress field from TransportResponse (elastic#120016)
  Include failures in partial response (elastic#124929)
  Prevent work starvation bug if using scaling EsThreadPoolExecutor with core pool size = 0  (elastic#124732)
  Re-enable analysis stemmer test (elastic#124961)
  Mute org.elasticsearch.xpack.esql.action.CrossClusterAsyncQueryStopIT testStopQueryLocalNoRemotes elastic#124959
  ESQL: Catch parsing exception (elastic#124958)
  ESQL: Improve error message for ( and [ (elastic#124177)
  Mute org.elasticsearch.xpack.esql.qa.single_node.EsqlSpecIT test {lookup-join.MvJoinKeyFromRow SYNC} elastic#124951
  Mute org.elasticsearch.datastreams.lifecycle.DataStreamLifecycleServiceIT testErrorRecordingOnRetention elastic#124950
  ...

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldMapper.java
#	server/src/test/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldTypeTests.java
@benwtrent
Copy link
Member

Now the num_candidates must be lower then 10000, In some case, use want to get more vectors.

To me, this falls within the "too many clauses" type of error. We make this limit to prevent dramatic cluster performance issues as certain scratch data is eagerly initialized.

If this is indeed something that is desired (e.g. get more than 10k candidates), I wonder if we are better suited for a index wide setting that places the upper limit.

I am not a huge fan of tacking on more parameters to an already complicated set of index options.

@weizijun
Copy link
Contributor Author

I am not a huge fan of tacking on more parameters to an already complicated set of index options.

yeah, I wanted to implement it in index settings, but I found it a little hard, and easy implement in the mapping.
Maybe we can try to implement in the setting.

@benwtrent
Copy link
Member

yeah, I wanted to implement it in index settings, but I found it a little hard, and easy implement in the mapping.
Maybe we can try to implement in the setting.

I can understand the difficulty as this will require moving validation logic and such around. Additionally, it should probably be a "dynamic", which then means you have a value that gets updated on setting updates.

But this is the way it should go, instead of adding more configuration points to the mapped field.

@weizijun
Copy link
Contributor Author

But this is the way it should go, instead of adding more configuration points to the mapped field.

I find that the check can be add in the DfsPhase, it can get the index setting, and check the num_candidates value.
Is this the right way?

@weizijun
Copy link
Contributor Author

Hi, @benwtrent , I add a new PR #125065
It implement the feature in the index settings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
external-contributor Pull request authored by a developer outside the Elasticsearch team :Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants