Zero downtime password rotations #390

Open
wants to merge 12 commits into
base: main

levkk
Contributor

@levkk levkk commented Mar 30, 2023

Rebase of #389

Problem

Sometimes passwords leak, and sometimes security teams ask infrastructure teams to rotate passwords. With Postgres, that's currently impossible without taking down the application or using third-party tools like Vault. If one were to change the password today, all new connections from clients still using the old password would be denied, causing a production incident.

Solution

This PR introduces the ability to use multiple passwords (called secrets) to connect to PgCat, so one secret can be deprecated while another replaces it. Each database <--> user <--> secret triplet gets its own connection pool (before, pools were keyed only by database <--> user, like PgBouncer).

Creating separate pools is a good idea because it allows us to:

  1. Separate clients using the old password from clients using the new password in the admin database, so we can track the progress of the password rotation.
  2. Forcibly disconnect clients that are still using an old password by shutting down their pool.
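
To make the two points above concrete, here is a minimal sketch of pools keyed by the database/user/secret triplet. It is written in Python for brevity and is not PgCat's actual implementation; PoolKey, PoolRegistry, and the client and secret names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PoolKey:
    """Hypothetical pool identifier: one pool per database/user/secret."""
    database: str
    user: str
    secret: Optional[str]  # None models a "<no secret>" pool


@dataclass
class Pool:
    clients: List[str] = field(default_factory=list)


class PoolRegistry:
    def __init__(self) -> None:
        self.pools: Dict[PoolKey, Pool] = {}

    def connect(self, client_id: str, database: str, user: str,
                secret: Optional[str]) -> PoolKey:
        # Clients presenting different secrets land in different pools.
        key = PoolKey(database, user, secret)
        self.pools.setdefault(key, Pool()).clients.append(client_id)
        return key

    def clients_per_secret(self, database: str, user: str) -> Dict[Optional[str], int]:
        # Point 1: rotation progress is the client count per secret.
        return {
            key.secret: len(pool.clients)
            for key, pool in self.pools.items()
            if key.database == database and key.user == user
        }

    def shutdown(self, key: PoolKey) -> List[str]:
        # Point 2: dropping one pool disconnects only that secret's clients.
        pool = self.pools.pop(key, None)
        return pool.clients if pool else []


registry = PoolRegistry()
registry.connect("app-1", "sharded_db", "sharding_user", "secret_one")
registry.connect("app-2", "sharded_db", "sharding_user", "secret_one")
registry.connect("app-3", "sharded_db", "sharding_user", "secret_two")

progress = registry.clients_per_secret("sharded_db", "sharding_user")
dropped = registry.shutdown(PoolKey("sharded_db", "sharding_user", "secret_one"))
```

Because the secret is part of the key, clients on the new secret are untouched when the old secret's pool is shut down.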

Implementation caveats

All Postgres password authentication mechanisms except plain text obfuscate the secret (password) being used, so without knowing more, we would need to test every configured password against the client's response. Additionally, we can't (I think) derive a unique pool identifier from a hashed password: the identifier would have to be deterministic, but the hashed responses deliberately are not (e.g. with md5, the client's response is different for every connection because the server sends a new random salt each time).
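
As an illustration of the md5 case: in Postgres md5 authentication the client sends "md5" followed by md5(md5(password + username) + salt), where the 4-byte salt is chosen by the server for each connection. The sketch below (Python; the helper names and the example secrets are hypothetical) shows that the same password yields different responses under different salts, so the only way to learn which secret was used is to recompute the response for every configured secret.

```python
import hashlib
from typing import Optional, Sequence


def md5_response(user: str, password: str, salt: bytes) -> str:
    """Client response for Postgres md5 authentication."""
    inner = hashlib.md5((password + user).encode()).hexdigest()
    outer = hashlib.md5(inner.encode() + salt).hexdigest()
    return "md5" + outer


def find_matching_secret(user: str, secrets: Sequence[str], salt: bytes,
                         response: str) -> Optional[str]:
    # The response does not reveal the secret, so try each configured one.
    for secret in secrets:
        if md5_response(user, secret, salt) == response:
            return secret
    return None


secrets = ["one", "two", "three"]
salt_a = b"\x01\x02\x03\x04"  # fixed here for the example; random in practice
salt_b = b"\x05\x06\x07\x08"

resp_a = md5_response("sharding_user", "two", salt_a)
resp_b = md5_response("sharding_user", "two", salt_b)
matched = find_matching_secret("sharding_user", secrets, salt_a, resp_a)
```

Here resp_a and resp_b differ even though the password is the same, which is why the response itself cannot serve as a stable pool identifier.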

So, for this feature to work, we need to use plain text authentication. Of course, that will set off alarm bells for most people, since this method is not secure by itself (neither is MD5, but that's out of scope at the moment). So, we only allow this mechanism to work if PgCat is configured to use TLS connections. Sending plain text passwords over TLS is safe and is how most password logins on the Internet work today. If it's good enough for the banks, it's good enough for us.
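
The TLS requirement can be sketched as a guard in front of the secret lookup. This is an illustrative Python sketch under assumed names (match_plaintext_secret and tls_active are hypothetical), not PgCat's code; it also compares secrets in constant time, which is a choice made for this example rather than something the PR states.

```python
import hmac
from typing import Optional, Sequence


def match_plaintext_secret(presented: str, secrets: Sequence[str],
                           tls_active: bool) -> Optional[str]:
    """Return the configured secret the client presented, or None."""
    if not tls_active:
        # Plain text secrets are only accepted over an encrypted connection.
        raise PermissionError("plain text secrets require a TLS connection")
    matched = None
    for secret in secrets:
        # compare_digest avoids leaking the match position through timing.
        if hmac.compare_digest(presented.encode(), secret.encode()):
            matched = secret
    return matched


configured = ["one", "two", "three"]
ok = match_plaintext_secret("two", configured, tls_active=True)
unknown = match_plaintext_secret("four", configured, tls_active=True)
```

With plain text, the matched secret is known directly, so it can be used to pick the pool without trying every secret against a salted response.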

Postgres docs on plain auth: https://www.postgresql.org/docs/15/auth-password.html

Changes

pgcat.toml

An additional secrets = [ "one", "two", "three" ] option is added to the [users] section. It configures multiple passwords (and pools) for the user that clients can use to connect to PgCat. The existing password option is still used by PgCat to connect to Postgres.

admin db

An additional secret column (with redacted values) is added to the output of show users and show pools to differentiate pool statistics.

pgcat=> show users;
     name      |  pool_mode  |   secret    
---------------+-------------+-------------
 simple_user   | session     | <no secret>
 sharding_user | transaction | ****_one
 sharding_user | transaction | ****_two
 sharding_user | transaction | <no secret>
 other_user    | transaction | <no secret>
(5 rows)
pgcat=> show pools;
  database  |     user      |   secret    |  pool_mode  | cl_idle | cl_active | cl_waiting | cl_cancel_req | sv_active | sv_idle | sv_used | sv_tested | sv_login | maxwait | maxwait_us 
------------+---------------+-------------+-------------+---------+-----------+------------+---------------+-----------+---------+---------+-----------+----------+---------+------------
 sharded_db | sharding_user | ****_two    | transaction |       0 |         0 |          0 |             0 |         0 |       6 |       0 |         0 |        0 |       0 |          0
 sharded_db | sharding_user | <no secret> | transaction |       0 |         0 |          0 |             0 |         0 |       6 |       0 |         0 |        0 |       0 |          0
 sharded_db | sharding_user | ****_one    | transaction |       0 |         0 |          0 |             0 |         0 |       6 |       0 |         0 |        0 |       0 |          0
 sharded_db | other_user    | <no secret> | transaction |       0 |         0 |          0 |             0 |         0 |       6 |       0 |         0 |        0 |       0 |          0
 simple_db  | simple_user   | <no secret> | session     |       0 |         0 |          0 |             0 |         0 |       2 |       0 |         0 |        0 |       0 |          0
(5 rows)

Ops

To use this feature:

  1. Add the new secret to secrets for the user and reload the config.
  2. Change the password in all apps and redeploy.
  3. Wait for the deploy to finish, remove the old secret from secrets, and reload the config.
  4. In quick succession: a) run ALTER ROLE ... in Postgres to change the password, b) change password in the config and reload.

Step 4 can be done with zero errors if min_size for the pool is set to max_size, which opens all connections in advance. This ensures no new connection to Postgres is made during step 4. Existing connections that authenticated with the old password are not affected by ALTER ROLE.
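
The four steps can be walked through with a toy model that tracks which client secrets the proxy accepts and whether the configured password still matches the Postgres role. This is a Python sketch for illustration only; RotationModel, rotate, and the values "one", "two", "pg_old", and "pg_new" are hypothetical.

```python
class RotationModel:
    """Toy model of the proxy config and the Postgres role password."""

    def __init__(self, client_secrets, pg_password):
        self.client_secrets = list(client_secrets)  # accepted from clients
        self.config_password = pg_password          # used to reach Postgres
        self.postgres_password = pg_password        # set on the role itself

    def client_accepted(self, secret):
        return secret in self.client_secrets

    def can_open_server_connection(self):
        return self.config_password == self.postgres_password


def rotate(model, old_secret, new_secret, new_pg_password):
    timeline = []

    def snapshot(step):
        timeline.append((
            step,
            model.client_accepted(old_secret),
            model.client_accepted(new_secret),
            model.can_open_server_connection(),
        ))

    model.client_secrets.append(new_secret)
    snapshot("1: add new secret")
    snapshot("2: redeploy apps")  # no config change, both secrets accepted
    model.client_secrets.remove(old_secret)
    snapshot("3: remove old secret")
    model.postgres_password = new_pg_password
    snapshot("4a: ALTER ROLE")    # new server connections would fail here
    model.config_password = new_pg_password
    snapshot("4b: update config")
    return timeline


timeline = rotate(RotationModel(["one"], "pg_old"), "one", "two", "pg_new")
```

The only window in which anything can fail is between 4a and 4b, and only for new connections to Postgres, which is why opening all server connections in advance avoids errors.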

@levkk levkk changed the title from "Levkk auth mod rebased" to "Zero downtime password rotations" Mar 31, 2023
@levkk levkk marked this pull request as ready for review March 31, 2023 00:42
@JelteF

JelteF commented May 10, 2023

In Postgres, zero-downtime password rotations can be implemented by using two users that are both members of the same group:

  1. Clients use user 1
  2. Clients start using only user 2
  3. Password of user 1 is changed
  4. Clients start using only user 1
  5. Password of user 2 is changed

It's not very user-friendly, but it's quite possible.

@hi019
Contributor

hi019 commented Aug 29, 2023

Hey, is this still being worked on?
