Zero downtime password rotation #389

Closed
wants to merge 7 commits into from

@levkk levkk commented Mar 30, 2023

Problem

Sometimes passwords leak, and sometimes security teams ask infrastructure teams to rotate them. With Postgres, that is currently impossible without taking down the application or using third-party tools like Vault. If someone changed the password today, all new connections using the old password would be denied, causing a production incident.

Solution

This PR introduces the ability to use multiple passwords (called secrets) to connect to PgCat while one secret is being deprecated and replaced with another. Each database <--> user <--> secret triplet gets its own connection pool (previously, pools were keyed only by database <--> user, as in PgBouncer).

Creating separate pools is a good idea because it allows us to:

  1. Separate clients using the old password from clients using the new password in the admin db, so we can track the progress of the password rotation.
  2. Forcibly disconnect clients that are still using an old password by shutting down their pool.
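
The per-triplet pooling described above can be pictured as a map keyed by (database, user, secret). The following is a minimal Python sketch of that idea, not PgCat's actual Rust code; PoolKey, Pool, and PoolRegistry are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PoolKey:
    """Hypothetical pool identifier: one pool per (database, user, secret)."""
    database: str
    user: str
    secret: Optional[str]  # None models a pool with no secret


@dataclass
class Pool:
    clients: List[str] = field(default_factory=list)


class PoolRegistry:
    def __init__(self) -> None:
        self.pools: Dict[PoolKey, Pool] = {}

    def connect(self, key: PoolKey, client_id: str) -> None:
        # Each distinct triplet lazily gets its own pool.
        self.pools.setdefault(key, Pool()).clients.append(client_id)

    def client_counts(self) -> Dict[PoolKey, int]:
        # Per-secret counts are what make rotation progress visible.
        return {key: len(pool.clients) for key, pool in self.pools.items()}

    def shutdown(self, key: PoolKey) -> List[str]:
        # Dropping the pool disconnects every client that used this secret.
        pool = self.pools.pop(key, None)
        return pool.clients if pool else []


registry = PoolRegistry()
old = PoolKey("sharded_db", "sharding_user", "one")
new = PoolKey("sharded_db", "sharding_user", "two")
registry.connect(old, "client-a")
registry.connect(new, "client-b")
registry.connect(new, "client-c")
counts = registry.client_counts()
dropped = registry.shutdown(old)
```

Because the secret is part of the key, counting clients per key shows how many are still on the old password, and shutting down one key leaves the other pools untouched.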

Implementation caveats

All Postgres authentication mechanisms except plain text obfuscate the secret (password) being used, so without knowing which secret the client picked, we would need to test all configured passwords. Additionally, we can't (I think) come up with a unique pool identifier using a hashed password: the identifier would have to be deterministic, which defeats the purpose of salted hashing (e.g. with md5 the client's response is different on every connection because the server issues a random salt).
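
To make the caveat concrete, here is a small Python sketch of the Postgres md5 response, which is "md5" followed by md5(md5(password + username) + salt). It only illustrates why the response is not a stable identifier and why every configured secret would have to be tried; find_secret is a hypothetical helper, and the fixed salts stand in for the random 4-byte salt a real server sends per connection.

```python
import hashlib
from typing import Optional, Sequence


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def md5_response(user: str, password: str, salt: bytes) -> str:
    # Postgres md5 auth: "md5" + md5(md5(password + user) + salt)
    inner = md5_hex(password.encode() + user.encode())
    return "md5" + md5_hex(inner.encode() + salt)


def find_secret(user: str, secrets: Sequence[str], salt: bytes,
                response: str) -> Optional[str]:
    # The response does not say which secret produced it, so try each one.
    for secret in secrets:
        if md5_response(user, secret, salt) == response:
            return secret
    return None


# Fixed salts for reproducibility; a real server picks a random salt
# for every connection.
salt_a = b"\x01\x02\x03\x04"
salt_b = b"\x05\x06\x07\x08"
resp_a = md5_response("sharding_user", "one", salt_a)
resp_b = md5_response("sharding_user", "one", salt_b)
matched = find_secret("sharding_user", ["one", "two", "three"], salt_a, resp_a)
```

The same password yields two different responses under the two salts, so the response cannot serve as a pool key, and the only way to find the matching secret is to recompute the response for each candidate.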

So, for this feature to work, we need to use plain text authentication. Of course, that will set off alarm bells for most people, since this method is not secure by itself (neither is MD5, but that's out of scope at the moment). So, we only allow this mechanism when PgCat is configured to use TLS connections. Sending plain text passwords over TLS is safe and widely used across the Internet today. If it's good enough for the banks, it's good enough for us.
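
As a rough illustration of that rule, the sketch below matches a plain text password against the configured secrets only when the connection is encrypted. It is an assumption about the shape of the check, not the PR's code; match_plaintext_secret and tls_active are hypothetical names.

```python
import hmac
from typing import Optional, Sequence


def match_plaintext_secret(presented: str, secrets: Sequence[str],
                           tls_active: bool) -> Optional[int]:
    """Return the index of the matching secret, or None if rejected."""
    # Refuse plain text passwords on unencrypted connections.
    if not tls_active:
        return None
    found = None
    for index, secret in enumerate(secrets):
        # compare_digest avoids leaking how much of the secret matched.
        if hmac.compare_digest(presented.encode(), secret.encode()):
            found = index
    return found
```

For example, match_plaintext_secret("two", ["one", "two", "three"], tls_active=True) identifies the second secret, while the same call with tls_active=False is rejected outright.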

Postgres docs on plain auth: https://www.postgresql.org/docs/15/auth-password.html

Changes

pgcat.toml

An additional secrets = [ "one", "two", "three" ] option is added to the [users] section. It configures multiple passwords (and pools) for the user. The password option is the one used to connect to Postgres.

admin db

An additional secret column, with redacted values, is added to differentiate pool statistics.

pgcat=> show users;
     name      |  pool_mode  |   secret    
---------------+-------------+-------------
 simple_user   | session     | <no secret>
 sharding_user | transaction | ****_one
 sharding_user | transaction | ****_two
 sharding_user | transaction | <no secret>
 other_user    | transaction | <no secret>
(5 rows)
pgcat=> show pools;
  database  |     user      |   secret    |  pool_mode  | cl_idle | cl_active | cl_waiting | cl_cancel_req | sv_active | sv_idle | sv_used | sv_tested | sv_login | maxwait | maxwait_us 
------------+---------------+-------------+-------------+---------+-----------+------------+---------------+-----------+---------+---------+-----------+----------+---------+------------
 sharded_db | sharding_user | ****_two    | transaction |       0 |         0 |          0 |             0 |         0 |       6 |       0 |         0 |        0 |       0 |          0
 sharded_db | sharding_user | <no secret> | transaction |       0 |         0 |          0 |             0 |         0 |       6 |       0 |         0 |        0 |       0 |          0
 sharded_db | sharding_user | ****_one    | transaction |       0 |         0 |          0 |             0 |         0 |       6 |       0 |         0 |        0 |       0 |          0
 sharded_db | other_user    | <no secret> | transaction |       0 |         0 |          0 |             0 |         0 |       6 |       0 |         0 |        0 |       0 |          0
 simple_db  | simple_user   | <no secret> | session     |       0 |         0 |          0 |             0 |         0 |       2 |       0 |         0 |        0 |       0 |          0
(5 rows)
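
The redacted values above look like a fixed mask followed by the last four characters of the secret. That rule is only inferred from the sample output; the PR does not state it, and the input "secret_one" below is a made-up value. A Python sketch under that assumption:

```python
from typing import Optional


def redact(secret: Optional[str], visible: int = 4) -> str:
    """Hypothetical redaction rule inferred from the admin output."""
    if secret is None:
        return "<no secret>"
    if len(secret) <= visible:
        # Too short to show a suffix without revealing the whole secret.
        return "****"
    return "****" + secret[-visible:]


labels = [redact("secret_one"), redact("secret_two"), redact(None)]
```

Under this assumed rule the three labels are ****_one, ****_two, and <no secret>, matching the shape of the rows shown for sharding_user.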

Ops

To use this feature:

  1. Add the new secret to secrets for the user and reload the config.
  2. Change the password in all apps and redeploy.
  3. Wait for the deploy to finish, remove the old secret from secrets, and reload the config.
  4. In quick succession: a) run ALTER ROLE ... in Postgres to change the password, b) change password in the config and reload.

Step 4 can be done with zero errors if min_size for the pool is set to max_size, which opens all connections in advance. This ensures no new connection to Postgres is made during step 4. Existing connections opened with the old password are not affected by ALTER ROLE.
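
The four steps can be walked through with a toy model to check that the apps are never locked out. This Python sketch is a simplification under stated assumptions: UserConfig is a hypothetical stand-in for the config, clients are checked only against secrets (the <no secret> pool is not modeled), and the gap between steps 4a and 4b is collapsed into one step.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class UserConfig:
    password: str  # used by the pooler to log in to Postgres
    secrets: List[str] = field(default_factory=list)  # accepted from clients


def client_accepted(config: UserConfig, presented: str) -> bool:
    return presented in config.secrets


def server_login_ok(config: UserConfig, role_password: str) -> bool:
    return config.password == role_password


timeline: List[Tuple[str, bool, bool]] = []


def record(step: str, config: UserConfig, app_password: str,
           role_password: str) -> None:
    timeline.append((step,
                     client_accepted(config, app_password),
                     server_login_ok(config, role_password)))


role_password = "old"   # password stored in Postgres for the role
app_password = "old"    # password the apps currently send
config = UserConfig(password="old", secrets=["old"])
record("start", config, app_password, role_password)

config.secrets.append("new")          # step 1: add new secret, reload
record("step 1", config, app_password, role_password)

app_password = "new"                  # step 2: redeploy apps
record("step 2", config, app_password, role_password)

config.secrets.remove("old")          # step 3: drop old secret, reload
record("step 3", config, app_password, role_password)

role_password = "new"                 # step 4a: ALTER ROLE ...
config.password = "new"               # step 4b: update config, reload
record("step 4", config, app_password, role_password)
```

Every entry in timeline has both checks passing. In a real deployment the only risky moment is between 4a and 4b, which is why pre-opening all server connections matters.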

@levkk levkk closed this Mar 31, 2023
@levkk levkk deleted the levkk-auth-mod branch March 31, 2023 00:43

levkk commented Mar 31, 2023

#390
