Commit 658923d

Password rotation in secrets (zalando#1749)
* password rotation in K8s secrets
* add db connection to syncSecrets
* add user retention
* add e2e test
* cleanup on username mismatch if rotation was switched off
* add unit test for syncSecrets + new updateSecret func
1 parent 95301c1 commit 658923d

24 files changed: +675 -63 lines changed

charts/postgres-operator/crds/operatorconfigurations.yaml

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -122,6 +122,15 @@ spec:
122122
users:
123123
type: object
124124
properties:
125+
enable_password_rotation:
126+
type: boolean
127+
default: false
128+
password_rotation_interval:
129+
type: integer
130+
default: 90
131+
password_rotation_user_retention:
132+
type: integer
133+
default: 180
125134
replication_username:
126135
type: string
127136
default: standby

charts/postgres-operator/crds/postgresqls.yaml

Lines changed: 10 additions & 0 deletions
@@ -551,6 +551,16 @@ spec:
551551
- SUPERUSER
552552
- nosuperuser
553553
- NOSUPERUSER
554+
usersWithPasswordRotation:
555+
type: array
556+
nullable: true
557+
items:
558+
type: string
559+
usersWithInPlacePasswordRotation:
560+
type: array
561+
nullable: true
562+
items:
563+
type: string
554564
volume:
555565
type: object
556566
required:

docs/administrator.md

Lines changed: 78 additions & 0 deletions
@@ -293,6 +293,84 @@ that are aggregated into the K8s [default roles](https://kubernetes.io/docs/refe
293293

294294
For Helm deployments setting `rbac.createAggregateClusterRoles: true` adds these clusterroles to the deployment.
295295

296+
## Password rotation in K8s secrets
297+
298+
The operator regularly updates credentials in the K8s secrets if the
299+
`enable_password_rotation` option is set to `true` in the configuration.
300+
It happens only for `LOGIN` roles with an associated secret (manifest roles,
301+
default users from `preparedDatabases`). Furthermore, there are the following
302+
exceptions:
303+
304+
1. Infrastructure role secrets, since their rotation should be handled by the infrastructure.
305+
2. Team API roles that connect via OAuth2 and JWT token (there are no secrets for these roles anyway).
306+
3. Database owners, since ownership of database objects cannot be inherited.
307+
4. System users such as the `postgres`, `standby` and `pooler` users.
308+
309+
The interval of days can be set with `password_rotation_interval` (default
310+
`90` = 90 days, minimum 1). On each rotation the user name and password values
311+
are replaced in the K8s secret. They belong to a newly created user named after
312+
the original role plus rotation date in YYMMDD format. All privileges are
313+
inherited meaning that migration scripts should still grant and revoke rights
314+
against the original role. The timestamp of the next rotation is written to the
315+
secret as well. Note, if the rotation interval is decreased it is reflected in
316+
the secrets only if the next rotation date is more days away than the new
317+
length of the interval.
318+
319+
Pods still using the previous secret values which they keep in memory continue
320+
to connect to the database since the password of the corresponding user is not
321+
replaced. However, a retention policy can be configured for users created by
322+
the password rotation feature with `password_rotation_user_retention`. The
323+
operator will ensure that this period is at least twice as long as the
324+
configured rotation interval, hence the default of `180` = 180 days. When
325+
the creation date of a rotated user is older than the retention period it
326+
might not get removed immediately. Only on the next user rotation it is checked
327+
if users can get removed. Therefore, you might want to configure the retention
328+
to be a multiple of the rotation interval.
329+
330+
### Password rotation for single users
331+
332+
From the configuration, password rotation is enabled for all secrets with the
333+
mentioned exceptions. If you wish to first test rotation for a single user (or
334+
just have it enabled only for a few secrets) you can specify it in the cluster
335+
manifest. The rotation and retention intervals can only be configured globally.
336+
337+
```
338+
spec:
339+
usersWithSecretRotation:
340+
- foo_user
341+
- bar_reader_user
342+
```
343+
344+
### Password replacement without extra users
345+
346+
For some use cases where the secret is only used rarely - think of a `flyway`
347+
user running a migration script on pod start - we do not need to create extra
348+
database users but can replace only the password in the K8s secret. This type
349+
of rotation cannot be configured globally but specified in the cluster
350+
manifest:
351+
352+
```
353+
spec:
354+
usersWithInPlaceSecretRotation:
355+
- flyway
356+
- bar_owner_user
357+
```
358+
359+
This would be the recommended option to enable rotation in secrets of database
360+
owners, but only if they are not used as application users for regular read
361+
and write operations.
362+
363+
### Turning off password rotation
364+
365+
When password rotation is turned off again the operator will check if the
366+
`username` value in the secret differs from the original username and, if so, replace it
367+
with the latter. A new password is assigned and the `nextRotation` field is
368+
cleared. A final lookup for child (rotation) users to be removed is done but
369+
they will only be dropped if the retention policy allows for it. This is to
370+
avoid sudden connection issues in pods which still use credentials of these
371+
users in memory. You have to remove these child users manually or re-enable
372+
password rotation with a smaller interval so they get cleaned up.
373+
296374
## Use taints and tolerations for dedicated PostgreSQL nodes
297375
298376
To ensure Postgres pods are running on nodes without any other application pods,
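Note on the naming scheme described in the administrator docs above: a minimal Python sketch of how a rotation user name and the next rotation date could be derived, assuming the `%y%m%d` suffix used by the e2e test in this commit. The function names `rotation_username` and `next_rotation_date` are hypothetical and are not part of the operator, whose own implementation is not shown in this excerpt.

```python
from datetime import date, timedelta


def rotation_username(role: str, rotation_date: date) -> str:
    """Original role name plus the rotation date in YYMMDD format."""
    return role + rotation_date.strftime("%y%m%d")


def next_rotation_date(rotation_date: date, interval_days: int) -> date:
    """Date of the next rotation, interval_days after the current one.

    Only the date part is modeled here; the time of day written to the
    secret is not specified in the docs above.
    """
    return rotation_date + timedelta(days=interval_days)
```

With the default interval of 90 days, a rotation of `foo_user` on 2021-10-31 would, under these assumptions, yield the user `foo_user211031` and a next rotation on 2022-01-29.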

docs/reference/cluster_manifest.md

Lines changed: 16 additions & 0 deletions
@@ -115,6 +115,22 @@ These parameters are grouped directly under the `spec` key in the manifest.
115115
create the K8s secret in that namespace. The part after the first `.` is
116116
considered to be the user name. Optional.
117117

118+
* **usersWithSecretRotation**
119+
list of users to enable credential rotation in K8s secrets. The rotation
120+
interval can only be configured globally. On each rotation a new user will
121+
be added in the database replacing the `username` value in the secret of
122+
the listed user. Although rotation users inherit all rights from the
123+
original role, keep in mind that ownership is not transferred. See more
124+
details in the [administrator docs](https://github.com/zalando/postgres-operator/blob/master/docs/administrator.md#password-rotation-in-k8s-secrets).
125+
126+
* **usersWithInPlaceSecretRotation**
127+
list of users to enable in-place password rotation in K8s secrets. The
128+
rotation interval can only be configured globally. On each rotation the
129+
password value will be replaced in the secrets which the operator reflects
130+
in the database, too. List only users here that rarely connect to the
131+
database, like a flyway user running a migration on Pod start. See more
132+
details in the [administrator docs](https://github.com/zalando/postgres-operator/blob/master/docs/administrator.md#password-replacement-without-extra-users).
133+
118134
* **databases**
119135
a map of database names to database owners for the databases that should be
120136
created by the operator. The owner users should already exist on the cluster

docs/reference/operator_parameters.md

Lines changed: 22 additions & 0 deletions
@@ -174,6 +174,28 @@ under the `users` key.
174174
Postgres username used for replication between instances. The default is
175175
`standby`.
176176

177+
* **enable_password_rotation**
178+
For all `LOGIN` roles that are not database owners the operator can rotate
179+
credentials in the corresponding K8s secrets by replacing the username and
180+
password. This means, new users will be added on each rotation inheriting
181+
all privileges from the original roles. The rotation date (in YYMMDD format)
182+
is appended to the names of the new users. The timestamp of the next rotation
183+
is written to the secret. The default is `false`.
184+
185+
* **password_rotation_interval**
186+
If password rotation is enabled (either from config or cluster manifest) the
187+
interval can be configured with this parameter. The measure is in days which
188+
means daily rotation (`1`) is the most frequent interval possible.
189+
Default is `90`.
190+
191+
* **password_rotation_user_retention**
192+
To avoid an ever growing amount of new users due to password rotation the
193+
operator will remove the created users again after a certain amount of days
194+
has passed. The number can be configured with this parameter. However, the
195+
operator will check that the retention policy is at least twice as long as
196+
the rotation interval and update to this minimum in case it is not.
197+
Default is `180`.
198+
177199
## Major version upgrades
178200

179201
Parameters configuring automatic major version upgrades. In a
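Note on `password_rotation_user_retention` as documented above: the following Python sketch illustrates the "at least twice the rotation interval" rule and a possible expiry check for rotation users. All function names are hypothetical, and the strict "older than" comparison is an assumption; the operator's actual logic is not part of this excerpt.

```python
from datetime import date, datetime, timedelta
from typing import Optional


def effective_retention(interval_days: int, retention_days: int) -> int:
    """Raise the retention to the documented minimum of 2 * interval."""
    return max(retention_days, 2 * interval_days)


def rotation_date_of(user: str, base_role: str) -> Optional[date]:
    """Parse the YYMMDD suffix of a rotation user, or None if absent."""
    if not user.startswith(base_role):
        return None
    suffix = user[len(base_role):]
    if len(suffix) != 6 or not suffix.isdigit():
        return None
    try:
        return datetime.strptime(suffix, "%y%m%d").date()
    except ValueError:
        # six digits that do not form a valid date
        return None


def is_expired(user: str, base_role: str, today: date, retention_days: int) -> bool:
    """True if the rotation user was created before today - retention."""
    created = rotation_date_of(user, base_role)
    if created is None:
        # the original role itself, or an unrelated name, is never dropped
        return False
    return created < today - timedelta(days=retention_days)
```

This matches the scenario in the e2e test of this commit: with an interval of 30 days, a configured retention of 30 is raised to 60, so a user created 40 days ago survives until the retention is lowered to 30.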

e2e/tests/k8s_api.py

Lines changed: 3 additions & 0 deletions
@@ -321,6 +321,9 @@ def get_cluster_leader_pod(self, labels='application=spilo,cluster-name=acid-min
321321
def get_cluster_replica_pod(self, labels='application=spilo,cluster-name=acid-minimal-cluster', namespace='default'):
322322
return self.get_cluster_pod('replica', labels, namespace)
323323

324+
def get_secret_data(self, username, clustername='acid-minimal-cluster', namespace='default'):
325+
return self.api.core_v1.read_namespaced_secret(
326+
"{}.{}.credentials.postgresql.acid.zalan.do".format(username.replace("_","-"), clustername), namespace).data
324327

325328
class K8sBase:
326329
'''
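Note on `get_secret_data` above: the secret name is built from the user name (with underscores replaced by hyphens) and the cluster name. The standalone sketch below isolates just that string construction so it can be checked without a cluster; `credentials_secret_name` is a hypothetical helper name, and the fixed suffix is the one hardcoded in the test helper of this diff.

```python
def credentials_secret_name(username: str,
                            clustername: str = "acid-minimal-cluster") -> str:
    """Build the secret name the e2e helper looks up for a given user."""
    # K8s object names cannot contain underscores, hence the replacement
    return "{}.{}.credentials.postgresql.acid.zalan.do".format(
        username.replace("_", "-"), clustername)
```

For `foo_user` this produces the same name that the e2e test patches directly via `patch_namespaced_secret`.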

e2e/tests/test_e2e.py

Lines changed: 119 additions & 3 deletions
@@ -4,8 +4,9 @@
44
import timeout_decorator
55
import os
66
import yaml
7+
import base64
78

8-
from datetime import datetime
9+
from datetime import datetime, date, timedelta
910
from kubernetes import client
1011

1112
from tests.k8s_api import K8s
@@ -579,6 +580,7 @@ def verify_role():
579580
"Parameters": None,
580581
"AdminRole": "",
581582
"Origin": 2,
583+
"IsDbOwner": False,
582584
"Deleted": False
583585
})
584586
return True
@@ -600,7 +602,6 @@ def test_lazy_spilo_upgrade(self):
600602
but lets pods run with the old image until they are recreated for
601603
reasons other than operator's activity. That works because the operator
602604
configures stateful sets to use "onDelete" pod update policy.
603-
604605
The test covers:
605606
1) enabling lazy upgrade in existing operator deployment
606607
2) forcing the normal rolling upgrade by changing the operator
@@ -695,7 +696,6 @@ def test_logical_backup_cron_job(self):
695696
Ensure we can (a) create the cron job at user request for a specific PG cluster
696697
(b) update the cluster-wide image for the logical backup pod
697698
(c) delete the job at user request
698-
699699
Limitations:
700700
(a) Does not run the actual batch job because there is no S3 mock to upload backups to
701701
(b) Assumes 'acid-minimal-cluster' exists as defined in setUp
@@ -1074,6 +1074,122 @@ def test_overwrite_pooler_deployment(self):
10741074
self.eventuallyEqual(lambda: k8s.count_running_pods("connection-pooler=acid-minimal-cluster-pooler"),
10751075
0, "Pooler pods not scaled down")
10761076

1077+
@timeout_decorator.timeout(TEST_TIMEOUT_SEC)
1078+
def test_password_rotation(self):
1079+
'''
1080+
Test password rotation and removal of users due to retention policy
1081+
'''
1082+
k8s = self.k8s
1083+
leader = k8s.get_cluster_leader_pod()
1084+
today = date.today()
1085+
1086+
# enable password rotation for owner of foo database
1087+
pg_patch_inplace_rotation_for_owner = {
1088+
"spec": {
1089+
"usersWithInPlaceSecretRotation": [
1090+
"zalando"
1091+
]
1092+
}
1093+
}
1094+
k8s.api.custom_objects_api.patch_namespaced_custom_object(
1095+
"acid.zalan.do", "v1", "default", "postgresqls", "acid-minimal-cluster", pg_patch_inplace_rotation_for_owner)
1096+
self.eventuallyEqual(lambda: k8s.get_operator_state(), {"0": "idle"}, "Operator does not get in sync")
1097+
1098+
# check if next rotation date was set in secret
1099+
secret_data = k8s.get_secret_data("zalando")
1100+
next_rotation_timestamp = datetime.fromisoformat(str(base64.b64decode(secret_data["nextRotation"]), 'utf-8'))
1101+
today90days = today+timedelta(days=90)
1102+
self.assertEqual(today90days, next_rotation_timestamp.date(),
1103+
"Unexpected rotation date in secret of zalando user: expected {}, got {}".format(today90days, next_rotation_timestamp.date()))
1104+
1105+
# create fake rotation users that should be removed by operator
1106+
# but have one that would still fit into the retention period
1107+
create_fake_rotation_user = """
1108+
CREATE ROLE foo_user201031 IN ROLE foo_user;
1109+
CREATE ROLE foo_user211031 IN ROLE foo_user;
1110+
CREATE ROLE foo_user"""+(today-timedelta(days=40)).strftime("%y%m%d")+""" IN ROLE foo_user;
1111+
"""
1112+
self.query_database(leader.metadata.name, "postgres", create_fake_rotation_user)
1113+
1114+
# patch foo_user secret with outdated rotation date
1115+
fake_rotation_date = today.isoformat() + ' 00:00:00'
1116+
fake_rotation_date_encoded = base64.b64encode(fake_rotation_date.encode('utf-8'))
1117+
secret_fake_rotation = {
1118+
"data": {
1119+
"nextRotation": str(fake_rotation_date_encoded, 'utf-8'),
1120+
},
1121+
}
1122+
k8s.api.core_v1.patch_namespaced_secret(
1123+
name="foo-user.acid-minimal-cluster.credentials.postgresql.acid.zalan.do",
1124+
namespace="default",
1125+
body=secret_fake_rotation)
1126+
1127+
# enable password rotation for all other users (foo_user)
1128+
# this will force a sync of secrets for further assertions
1129+
enable_password_rotation = {
1130+
"data": {
1131+
"enable_password_rotation": "true",
1132+
"password_rotation_interval": "30",
1133+
"password_rotation_user_retention": "30", # should be set to 60
1134+
},
1135+
}
1136+
k8s.update_config(enable_password_rotation)
1137+
self.eventuallyEqual(lambda: k8s.get_operator_state(), {"0": "idle"},
1138+
"Operator does not get in sync")
1139+
1140+
# check if next rotation date and username have been replaced
1141+
secret_data = k8s.get_secret_data("foo_user")
1142+
secret_username = str(base64.b64decode(secret_data["username"]), 'utf-8')
1143+
next_rotation_timestamp = datetime.fromisoformat(str(base64.b64decode(secret_data["nextRotation"]), 'utf-8'))
1144+
rotation_user = "foo_user"+today.strftime("%y%m%d")
1145+
today30days = today+timedelta(days=30)
1146+
1147+
self.assertEqual(rotation_user, secret_username,
1148+
"Unexpected username in secret of foo_user: expected {}, got {}".format(rotation_user, secret_username))
1149+
self.assertEqual(today30days, next_rotation_timestamp.date(),
1150+
"Unexpected rotation date in secret of foo_user: expected {}, got {}".format(today30days, next_rotation_timestamp.date()))
1151+
1152+
# check if oldest fake rotation users were deleted
1153+
# there should only be foo_user, foo_user+today and foo_user+today-40days
1154+
user_query = """
1155+
SELECT rolname
1156+
FROM pg_catalog.pg_roles
1157+
WHERE rolname LIKE 'foo_user%';
1158+
"""
1159+
self.eventuallyEqual(lambda: len(self.query_database(leader.metadata.name, "postgres", user_query)), 3,
1160+
"Found incorrect number of rotation users", 10, 5)
1161+
1162+
# disable password rotation for all other users (foo_user)
1163+
# and pick smaller intervals to see if the third fake rotation user is dropped
1164+
enable_password_rotation = {
1165+
"data": {
1166+
"enable_password_rotation": "false",
1167+
"password_rotation_interval": "15",
1168+
"password_rotation_user_retention": "30", # 2 * rotation interval
1169+
},
1170+
}
1171+
k8s.update_config(enable_password_rotation)
1172+
self.eventuallyEqual(lambda: k8s.get_operator_state(), {"0": "idle"},
1173+
"Operator does not get in sync")
1174+
1175+
# check if username in foo_user secret is reset
1176+
secret_data = k8s.get_secret_data("foo_user")
1177+
secret_username = str(base64.b64decode(secret_data["username"]), 'utf-8')
1178+
next_rotation_timestamp = str(base64.b64decode(secret_data["nextRotation"]), 'utf-8')
1179+
self.assertEqual("foo_user", secret_username,
1180+
"Unexpected username in secret of foo_user: expected {}, got {}".format("foo_user", secret_username))
1181+
self.assertEqual('', next_rotation_timestamp,
1182+
"Unexpected rotation date in secret of foo_user: expected empty string, got {}".format(next_rotation_timestamp))
1183+
1184+
# check roles again, there should only be foo_user and foo_user+today
1185+
user_query = """
1186+
SELECT rolname
1187+
FROM pg_catalog.pg_roles
1188+
WHERE rolname LIKE 'foo_user%';
1189+
"""
1190+
self.eventuallyEqual(lambda: len(self.query_database(leader.metadata.name, "postgres", user_query)), 2,
1191+
"Found incorrect number of rotation users", 10, 5)
1192+
10771193
@timeout_decorator.timeout(TEST_TIMEOUT_SEC)
10781194
def test_patroni_config_update(self):
10791195
'''
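Note on the secret handling in `test_password_rotation` above: values in a K8s secret's `data` map are base64-encoded, which is why the test decodes `username` and `nextRotation` before asserting on them. The sketch below shows the same round trip on a plain dictionary, without a cluster. The helper names `encode_field` and `decode_field` are hypothetical, and the timestamp format is the one the test itself writes (`YYYY-MM-DD 00:00:00`).

```python
import base64
from datetime import datetime


def encode_field(value: str) -> str:
    """Encode a plain string the way it is stored in secret data."""
    return base64.b64encode(value.encode("utf-8")).decode("utf-8")


def decode_field(secret_data: dict, key: str) -> str:
    """Decode one base64-encoded field of a secret's data map."""
    return base64.b64decode(secret_data[key]).decode("utf-8")


# stand-in for the dict returned by get_secret_data in the test
secret_data = {
    "username": encode_field("foo_user211031"),
    "nextRotation": encode_field("2021-11-30 00:00:00"),
}

username = decode_field(secret_data, "username")
next_rotation = datetime.fromisoformat(decode_field(secret_data, "nextRotation"))
```

After rotation is switched off, the test expects `nextRotation` to decode to an empty string, so a caller should check for that before passing the value to `datetime.fromisoformat`.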

manifests/complete-postgres-manifest.yaml

Lines changed: 3 additions & 0 deletions
@@ -16,6 +16,9 @@ spec:
1616
zalando:
1717
- superuser
1818
- createdb
19+
foo_user: []
20+
# usersWithSecretRotation: "foo_user"
21+
# usersWithInPlaceSecretRotation: "flyway,bar_owner_user"
1922
enableMasterLoadBalancer: false
2023
enableReplicaLoadBalancer: false
2124
enableConnectionPooler: false # enable/disable connection pooler deployment

manifests/configmap.yaml

Lines changed: 3 additions & 0 deletions
@@ -44,6 +44,7 @@ data:
4444
# enable_init_containers: "true"
4545
# enable_lazy_spilo_upgrade: "false"
4646
enable_master_load_balancer: "false"
47+
enable_password_rotation: "false"
4748
enable_pgversion_env_var: "true"
4849
# enable_pod_antiaffinity: "false"
4950
# enable_pod_disruption_budget: "true"
@@ -92,6 +93,8 @@ data:
9293
# pam_configuration: |
9394
# https://info.example.com/oauth2/tokeninfo?access_token= uid realm=/employees
9495
# pam_role_name: zalandos
96+
# password_rotation_interval: "90"
97+
# password_rotation_user_retention: "180"
9598
pdb_name_format: "postgres-{cluster}-pdb"
9699
# pod_antiaffinity_topology_key: "kubernetes.io/hostname"
97100
pod_deletion_wait_timeout: 10m

manifests/operatorconfiguration.crd.yaml

Lines changed: 9 additions & 0 deletions
@@ -120,6 +120,15 @@ spec:
120120
users:
121121
type: object
122122
properties:
123+
enable_password_rotation:
124+
type: boolean
125+
default: false
126+
password_rotation_interval:
127+
type: integer
128+
default: 90
129+
password_rotation_user_retention:
130+
type: integer
131+
default: 180
123132
replication_username:
124133
type: string
125134
default: standby
