Redis Cluster Node Enabled - Failed to read from master when replica is being replaced #3231
Labels
status: feedback-reminder
We've sent a reminder that we need additional information before we can continue
status: waiting-for-feedback
We need additional information before we can continue
Bug Report
We use the Lettuce client to connect to AWS ElastiCache (Redis) with cluster mode enabled.
We have 5 shards with 3 nodes each (one of the 3 is the master). A replica node in shard 1 had degraded performance, so AWS triggered a replacement for it, which took 7 minutes. During this window we were not able to read from the primary, even though the master node itself was not impacted.
Current Behavior
Reads and writes against the master node fail while one of the replicas in the shard is being replaced.
// your stack trace here;
Java Application
Input Code
// your code here;
Expected: no disruption to reads or writes against the master node.
Environment
Possible Solution
Additional context
07:51 AM PST - Primary redis-0001-003 became unhealthy. We had some issues reading from it, which is expected behavior from Lettuce.
07:55 AM PST - Master node redis-0001-003 continued to provide a degraded experience.
07:56 AM PST - AWS performed a failover of the master node; redis-0001-002 became the new master (no impact during this time).
07:56 AM PST to 08:31 AM PST - redis-0001-003 was not available in the shard, but the other 2 nodes in the shard were active.
08:31 AM PST - AWS triggered a replacement of redis-0001-003 (now a replica) because it was still in a degraded state. During this window, the application was not able to read from or write to the master node.
08:38 AM PST - Complete application recovery. redis-0001-002 continued to be the primary, and we were able to read and write from the client.
Also, during the failure window (08:31 to 08:38 AM PST), we see log entries from ConnectionWatchdog trying to reconnect to redis-0001-003.
We need to understand why reads from the master node failed while a replica was being replaced.
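For reference when triaging, here is a minimal sketch of the kind of client setup that is relevant to this scenario. It is not our actual code (none was attached above) and it is not a confirmed fix. It assumes Lettuce 6.x; the class name `ClusterClientSketch`, the helper names, the `clusterUri` parameter, and the 30-second refresh interval are all hypothetical. It shows how cluster topology refresh and the read preference are typically configured, since both determine whether the client keeps routing commands to a node that has left the shard.

```java
import io.lettuce.core.ReadFrom;
import io.lettuce.core.RedisURI;
import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;
import io.lettuce.core.cluster.RedisClusterClient;
import io.lettuce.core.cluster.api.StatefulRedisClusterConnection;

import java.time.Duration;

public class ClusterClientSketch {

    // Hypothetical helper: builds client options that refresh the cluster
    // topology on a fixed period and on Lettuce's adaptive triggers
    // (for example MOVED/ASK redirects and persistent reconnect attempts).
    static ClusterClientOptions buildOptions() {
        ClusterTopologyRefreshOptions refresh = ClusterTopologyRefreshOptions.builder()
                .enablePeriodicRefresh(Duration.ofSeconds(30)) // assumed interval, not a recommendation
                .enableAllAdaptiveRefreshTriggers()
                .build();

        return ClusterClientOptions.builder()
                .topologyRefreshOptions(refresh)
                .build();
    }

    // Hypothetical helper: clusterUri is a placeholder for the cluster
    // configuration endpoint, e.g. a "redis://host:port" style URI.
    static StatefulRedisClusterConnection<String, String> connect(String clusterUri) {
        RedisClusterClient client = RedisClusterClient.create(RedisURI.create(clusterUri));
        client.setOptions(buildOptions());

        StatefulRedisClusterConnection<String, String> connection = client.connect();
        // Route reads to the master (upstream) node of each shard only.
        connection.setReadFrom(ReadFrom.UPSTREAM);
        return connection;
    }
}
```

Knowing whether the affected application uses settings like these, or the defaults (periodic and adaptive refresh are disabled unless enabled explicitly), may help narrow down why commands to the healthy master failed between 08:31 and 08:38.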