
Fleet search using wait_for_checkpoints can fail if the node executing the search is recovering #130555

Open
@fcofdez

Description


Currently, searches can be executed on an INITIALIZING shard:

```java
private ShardIterator shardRoutings(
    IndexShardRoutingTable indexShard,
    @Nullable ResponseCollectorService collectorService,
    @Nullable Map<String, Long> nodeCounts
) {
    // Both branches iterate over active and initializing copies of the shard,
    // so a search can be routed to a copy that is still recovering.
    if (useAdaptiveReplicaSelection) {
        return indexShard.activeInitializingShardsRankedIt(collectorService, nodeCounts);
    } else {
        return indexShard.activeInitializingShardsRandomIt();
    }
}
```
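To make the difference in candidate sets concrete, here is a minimal sketch of a routing filter that keeps today's behaviour by default but skips INITIALIZING copies when wait_for_checkpoints is set. It assumes a simplified model: CopyState, ShardCopy and CheckpointAwareRouting are hypothetical names, not Elasticsearch classes, and ranking (adaptive replica selection) is ignored.

```java
import java.util.List;

// Hypothetical, simplified model of shard copies; these are not Elasticsearch classes.
enum CopyState { INITIALIZING, STARTED, RELOCATING }

record ShardCopy(String nodeId, CopyState state) {
    // Treats STARTED and RELOCATING copies as "active"; INITIALIZING copies
    // may still be recovering and lag behind the primary.
    boolean active() {
        return state == CopyState.STARTED || state == CopyState.RELOCATING;
    }
}

final class CheckpointAwareRouting {
    private CheckpointAwareRouting() {}

    // Returns the copies a search may be routed to. Without wait_for_checkpoints
    // every copy is a candidate (active and initializing, as in shardRoutings);
    // with it, INITIALIZING copies are dropped.
    static List<ShardCopy> candidates(List<ShardCopy> copies, boolean waitForCheckpoints) {
        if (!waitForCheckpoints) {
            return copies;
        }
        return copies.stream().filter(ShardCopy::active).toList();
    }
}
```

If every copy is INITIALIZING, the filtered list is empty, so a caller using this approach would still need to decide whether to wait or fail the request.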

This means the search may hit a shard copy that is still recovering. In that case the checkpoint requested through wait_for_checkpoints can be greater than the shard's current maximum issued sequence number, and the request fails with the following exception:

```
Cannot wait for unissued seqNo checkpoint [wait_for_checkpoint=1299, max_issued_seqNo=0]
```
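The condition behind this error can be sketched as a simple precondition: a shard can only wait for a sequence number it has already issued. The class and method names below are hypothetical, and the exception type is an assumption; this is an illustration of the check, not the Elasticsearch source. With the values from the message, a recovering copy whose max_issued_seqNo is still 0 rejects a request for checkpoint 1299.

```java
// Hypothetical sketch of the precondition behind the error above; not Elasticsearch source.
final class CheckpointPrecondition {
    private CheckpointPrecondition() {}

    // A search may wait for a checkpoint only if the shard has already issued that
    // sequence number; a recovering copy may not have issued it yet.
    static void ensureIssued(long waitForCheckpoint, long maxIssuedSeqNo) {
        if (waitForCheckpoint > maxIssuedSeqNo) {
            throw new IllegalArgumentException(
                "Cannot wait for unissued seqNo checkpoint [wait_for_checkpoint="
                    + waitForCheckpoint + ", max_issued_seqNo=" + maxIssuedSeqNo + "]");
        }
    }
}
```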

If the shard that executes the search request is INITIALIZING, we should either wait until it moves to STARTED, or consider not routing search requests with wait_for_checkpoints to such shards at all.
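The first option, waiting for the copy to reach STARTED, could look roughly like the listener-based sketch below. ShardStartGate, runWhenStarted and markStarted are hypothetical names; a real implementation would also need a timeout and would have to handle the copy failing instead of starting.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical gate for one shard copy; not an Elasticsearch class.
final class ShardStartGate {
    enum State { INITIALIZING, STARTED }

    private State state = State.INITIALIZING;
    private final List<Runnable> pending = new ArrayList<>();

    synchronized State state() {
        return state;
    }

    // Runs the action now if the copy is STARTED; otherwise queues it until
    // markStarted() is called. The action itself runs outside the lock.
    void runWhenStarted(Runnable action) {
        synchronized (this) {
            if (state != State.STARTED) {
                pending.add(action);
                return;
            }
        }
        action.run();
    }

    // Called when recovery completes; releases every queued action exactly once.
    void markStarted() {
        List<Runnable> toRun;
        synchronized (this) {
            if (state == State.STARTED) {
                return;
            }
            state = State.STARTED;
            toRun = new ArrayList<>(pending);
            pending.clear();
        }
        toRun.forEach(Runnable::run);
    }
}
```

Deferring the work instead of blocking keeps search threads free while recovery finishes. Compared with skipping INITIALIZING copies, this trades extra latency for keeping every copy eligible.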
