Description
Please answer some short questions which should help us understand your problem / question better:
- Which image of the operator are you using? registry.opensource.zalan.do/acid/postgres-operator:v1.8.2
- Where do you run it - cloud or metal? Kubernetes or OpenShift? Bare Metal K8s
- Are you running Postgres Operator in production? yes
- Type of issue? Bug report
Context: This problem is closely related to #1686. In our scenario, we have a 3-node cluster with one synchronous replica and one asynchronous replica.
Symptom: The problem arises when we perform a rolling upgrade of our K8s cluster, which drains and upgrades the nodes one by one. Draining the node where the PostgreSQL master is running is blocked (as expected) by the disruption budget defined by postgres-operator, and the drain waits indefinitely.
We can observe these logs in the operator (they loop until we fix the situation by performing a manual failover):
time="2022-07-29T08:30:18Z" level=info msg="starting process to migrate master pod \"foo-auth/foo-postgres-0\"" cluster-name=foo-auth/foo-postgres pkg=cluster
time="2022-07-29T08:30:18Z" level=debug msg="Waiting for any replica pod to become ready" cluster-name=foo-auth/foo-postgres pkg=cluster
time="2022-07-29T08:30:18Z" level=debug msg="Found 2 running replica pods" cluster-name=foo-auth/foo-postgres pkg=cluster
time="2022-07-29T08:30:18Z" level=info msg="check failed: pod \"foo-auth/foo-postgres-1\" is already on a live node" cluster-name=foo-auth/foo-postgres pkg=cluster
time="2022-07-29T08:30:18Z" level=debug msg="switching over from \"foo-postgres-0\" to \"foo-auth/foo-postgres-1\"" cluster-name=foo-auth/foo-postgres pkg=cluster
time="2022-07-29T08:30:18Z" level=debug msg="subscribing to pod \"foo-auth/foo-postgres-1\"" cluster-name=foo-auth/foo-postgres pkg=cluster
time="2022-07-29T08:30:18Z" level=debug msg="making POST http request: http://10.233.247.14:8008/failover" cluster-name=foo-auth/foo-postgres pkg=cluster
time="2022-07-29T08:30:19Z" level=debug msg="unsubscribing from pod \"foo-auth/foo-postgres-1\" events" cluster-name=foo-auth/foo-postgres pkg=cluster
time="2022-07-29T08:30:19Z" level=error msg="could not failover to pod \"foo-auth/foo-postgres-1\": could not switch over from \"foo-postgres-0\" to \"foo-auth/foo-postgres-1\": patroni returned 'Failover failed'" cluster-name=foo-auth/foo-postgres pkg=cluster
We get these corresponding logs in the master pod:
2022-07-29 08:30:18,147 INFO: received failover request with leader=foo-postgres-0 candidate=foo-postgres-1 scheduled_at=None
2022-07-29 08:30:18,157 INFO: Got response from foo-postgres-1 http://10.233.84.175:8008/patroni: {"state": "running", "postmaster_start_time": "2022-07-29 08:30:03.401290+00:00", "role": "replica", "server_version": 140004, "xlog": {"received_location": 218124904, "replayed_location": 218124904, "replayed_timestamp": "2022-07-29 08:24:01.646829+00:00", "paused": false}, "timeline": 1, "dcs_last_seen": 1659083415, "database_system_identifier": "7125413201024114757", "patroni": {"version": "2.1.4", "scope": "foo-postgres"}}
2022-07-29 08:30:18,297 INFO: Lock owner: foo-postgres-0; I am foo-postgres-0
2022-07-29 08:30:18,343 WARNING: Failover candidate=foo-postgres-1 does not match with sync_standbys=foo-postgres-2
2022-07-29 08:30:18,343 WARNING: manual failover: members list is empty
2022-07-29 08:30:18,343 WARNING: manual failover: no healthy members found, failover is not possible
2022-07-29 08:30:18,343 INFO: Cleaning up failover key
The Patroni view of the cluster is:
root@foo-postgres-0:/home/postgres# patronictl topology
+ Cluster: foo-postgres (7125413201024114757) ---------+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+----------------------+----------------+--------------+---------+----+-----------+
| foo-postgres-0 | 10.233.247.14 | Leader | running | 1 | |
| + foo-postgres-1 | 10.233.84.175 | Replica | running | 1 | 0 |
| + foo-postgres-2 | 10.233.132.118 | Sync Standby | running | 1 | 0 |
+----------------------+----------------+--------------+---------+----+-----------+
Attempting a failover to the asynchronous replica fails (as expected):
root@foo-postgres-0:/home/postgres# patronictl failover foo-postgres --master foo-postgres-0 --candidate foo-postgres-1
When should the switchover take place (e.g. 2022-07-29T09:42 ) [now]:
Current cluster topology
+ Cluster: foo-postgres (7125413201024114757) -------+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+--------------------+----------------+--------------+---------+----+-----------+
| foo-postgres-0 | 10.233.247.14 | Leader | running | 1 | |
| foo-postgres-1 | 10.233.84.175 | Replica | running | 1 | 0 |
| foo-postgres-2 | 10.233.132.118 | Sync Standby | running | 1 | 0 |
+--------------------+----------------+--------------+---------+----+-----------+
Are you sure you want to switchover cluster foo-postgres, demoting current master foo-postgres-0? [y/N]: y
Switchover failed, details: 412, candidate name does not match with sync_standby
Failover to the synchronous replica does work (as expected):
root@foo-postgres-0:/home/postgres# patronictl failover foo-postgres --master foo-postgres-0 --candidate foo-postgres-2
When should the switchover take place (e.g. 2022-07-29T09:43 ) [now]:
Current cluster topology
+ Cluster: foo-postgres (7125413201024114757) -------+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+--------------------+----------------+--------------+---------+----+-----------+
| foo-postgres-0 | 10.233.247.14 | Leader | running | 1 | |
| foo-postgres-1 | 10.233.84.175 | Replica | running | 1 | 0 |
| foo-postgres-2 | 10.233.132.118 | Sync Standby | running | 1 | 0 |
+--------------------+----------------+--------------+---------+----+-----------+
Are you sure you want to switchover cluster foo-postgres, demoting current master foo-postgres-0? [y/N]: y
2022-07-29 08:43:30.57467 Successfully switched over to "foo-postgres-2"
+ Cluster: foo-postgres (7125413201024114757) --+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+--------------------+----------------+---------+---------+----+-----------+
| foo-postgres-0 | 10.233.247.14 | Replica | stopped | | unknown |
| foo-postgres-1 | 10.233.84.175 | Replica | running | 1 | 0 |
| foo-postgres-2 | 10.233.132.118 | Leader | running | 1 | |
+--------------------+----------------+---------+---------+----+-----------+
Analysis:
From our understanding, the fix for #1686, introduced in #1700, was only applied to the logic used when performing rolling updates of the StatefulSet.
In our situation, the disruption implied by the drain is not accompanied by an additional "free" node able to receive the PostgreSQL master. postgres-operator must then attempt to migrate the master to one of the existing replicas, using the Cluster.MigrateMasterPod method. Unfortunately, this method still relies on the "outdated" logic for selecting a candidate replica (Cluster.masterCandidate), which does not take the Patroni roles into account (namely, sync_standby). Ideally, MigrateMasterPod would use getSwitchOverCandidate (introduced in #1700) instead of masterCandidate, so that it benefits from the same logic (unless we missed some corner cases).
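To make the difference concrete, here is a small, self-contained Go sketch of the role-aware selection we would expect; it is only a toy model of what getSwitchOverCandidate does, not the operator's actual code, and the member names and roles are taken from the patronictl output above:

```go
package main

import (
	"errors"
	"fmt"
)

// Toy model of a Patroni cluster member as the operator sees it.
type member struct {
	Name string
	Role string // "leader", "sync_standby", or "replica"
}

// pickSwitchoverCandidate prefers the sync standby when one exists,
// because Patroni refuses a manual failover to any other member while
// synchronous replication is enabled; otherwise it falls back to any
// running replica (the purely asynchronous case).
func pickSwitchoverCandidate(members []member) (string, error) {
	var fallback string
	for _, m := range members {
		switch m.Role {
		case "sync_standby":
			return m.Name, nil
		case "replica":
			fallback = m.Name
		}
	}
	if fallback != "" {
		return fallback, nil // no sync standby registered
	}
	return "", errors.New("no switchover candidate found")
}

func main() {
	cluster := []member{
		{Name: "foo-postgres-0", Role: "leader"},
		{Name: "foo-postgres-1", Role: "replica"},
		{Name: "foo-postgres-2", Role: "sync_standby"},
	}
	candidate, err := pickSwitchoverCandidate(cluster)
	if err != nil {
		panic(err)
	}
	fmt.Println("switchover candidate:", candidate) // foo-postgres-2
}
```

With a selection like this, the drain scenario above would target foo-postgres-2 directly instead of looping on a failover request that Patroni can only reject.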