@abc3 abc3 commented Jul 31, 2024

In this PR, Supavisor has been adapted for Erlang 27 and Elixir 1.17. The metrics backend has been changed from PromEx to Peep, and the most significant change is the introduction of ProxyHandler for handling transaction mode.

The ProxyHandler first authenticates the incoming connection through auth_query. If there is no connection pool, it creates one and also launches a Ranch on a free port, saving the port and host in the corresponding Syn record.
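
Roughly, the listener setup looks like the sketch below (module name, Syn scope, and metadata shape are illustrative here, not the exact implementation):

```elixir
# Sketch only: module name, Syn scope, and metadata shape are hypothetical.
defmodule ProxyListenerSketch do
  @doc """
  Starts a Ranch TCP listener on an OS-assigned free port (port 0) and
  registers the {host, port} pair in Syn so client processes on any node
  can find it.
  """
  def start_for_pool(tenant_id, protocol_handler) do
    # Port 0 asks the OS for any free port.
    {:ok, _listener} =
      :ranch.start_listener(
        {:proxy, tenant_id},
        :ranch_tcp,
        %{socket_opts: [port: 0]},
        protocol_handler,
        []
      )

    port = :ranch.get_port({:proxy, tenant_id})
    host = node() |> Atom.to_string() |> String.split("@") |> List.last()

    # Store the endpoint in the pool's Syn metadata so remote client
    # processes can reach it over plain TCP instead of erldist.
    :ok = :syn.register(:tenants, {:pool_proxy, tenant_id}, self(), %{host: host, port: port})

    {:ok, {host, port}}
  end
end
```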

If the pool is on the same node as the client connection, ownership of the TCP socket for the direct database connection is transferred locally to the client process for each transaction. After the transaction is completed, the client process returns ownership of the TCP socket to the DbHandler.
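
The per-transaction ownership handoff is essentially the standard `:gen_tcp.controlling_process/2` exchange; a minimal sketch with illustrative names:

```elixir
# Sketch only: process roles and function names are hypothetical.
defmodule SocketHandoffSketch do
  # In DbHandler: hand the DB socket to the client process for one transaction.
  def checkout(db_socket, client_pid) do
    # Make sure no active-mode messages arrive mid-transfer.
    :ok = :inet.setopts(db_socket, active: false)
    :ok = :gen_tcp.controlling_process(db_socket, client_pid)
    send(client_pid, {:db_socket, db_socket})
  end

  # In the client process: after the transaction, give the socket back.
  def checkin(db_socket, db_handler_pid) do
    :ok = :gen_tcp.controlling_process(db_socket, db_handler_pid)
    send(db_handler_pid, {:checkin, db_socket})
  end
end
```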

If the pool is on a different node, the client process connects to the Ranch of the corresponding pool on the other node, switches to session mode, and forwards everything (it works like a proxy). The local Ranch operates the same way as when the client connection is on the same node as the connection pool.

With this handler, erldist is used only for synchronizing Syn; forwarding query data between nodes is done over plain TCP sockets (Ranch).
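
The forwarding itself boils down to a plain byte relay between the client socket and a TCP connection to the remote Ranch port. A simplified sketch (not the actual ProxyHandler code):

```elixir
# Sketch only: a bare-bones bidirectional relay, not Supavisor's ProxyHandler.
defmodule TcpRelaySketch do
  @doc """
  Connects to the remote pool's Ranch port (looked up from Syn) and relays
  raw bytes between the client socket and the remote socket.
  """
  def start(client_socket, remote_host, remote_port) do
    {:ok, remote_socket} =
      :gen_tcp.connect(String.to_charlist(remote_host), remote_port,
        mode: :binary,
        active: true,
        nodelay: true
      )

    :ok = :inet.setopts(client_socket, active: true)
    loop(client_socket, remote_socket)
  end

  defp loop(client_socket, remote_socket) do
    receive do
      {:tcp, ^client_socket, data} ->
        :ok = :gen_tcp.send(remote_socket, data)
        loop(client_socket, remote_socket)

      {:tcp, ^remote_socket, data} ->
        :ok = :gen_tcp.send(client_socket, data)
        loop(client_socket, remote_socket)

      {:tcp_closed, _socket} ->
        :gen_tcp.close(client_socket)
        :gen_tcp.close(remote_socket)
        :ok
    end
  end
end
```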

@abc3 abc3 changed the title from Supavisor V2 to [WIP] Supavisor V2 Jul 31, 2024

abc3 commented Jul 31, 2024

Hey, @supabase/dashbit, we have prepared improvements for Supavisor and would love to hear your feedback. There are a lot of changes, so feel free to review whatever you like. We are particularly interested in your opinion on the ProxyHandler. The corresponding implementation is here:

@abc3 abc3 requested a review from a team July 31, 2024 19:36
@josevalim

Exciting work! Do you have any numbers or results on the performance of this new version?

> After the transaction is completed, the client process returns ownership of the TCP socket to the DbHandler.

I haven't reviewed the code yet, so I will try to verify this as well, but remember that, if the client crashes for some reason, you most likely can't assert for certain the state of the socket: for example, was something only partially written to it? So you may need to drop the connection altogether or do some sort of cleanup (if possible), like make sure to abort transactions and other things. I am not sure how much state can be carried around.

@josevalim

Two additional questions for now:

  1. I also remember we discussed improvements to poolboy or your fork of poolboy. I don't remember exactly what they were, though. I assume those may already be in place?

  2. Is it correct that this pull request still preserves the old code for talking to the database? Is the plan to remove it in the future or are there issues in unifying everything through the proxy handler?

@josevalim

Two more:

> The ProxyHandler first authenticates the incoming connection through auth_query. If there is no connection pool, it creates one and also launches a Ranch on a free port, saving the port and host in the corresponding Syn record.

  1. Is the maximum number of ports (65k) a potential concern here? If so, could we use a single Ranch port but pass some other information (such as a preamble when connecting via gen_tcp) to choose which pool to use?

> If the pool is on a different node, the client process connects to the Ranch of the corresponding pool on the other node, switches to session mode, and forwards everything (it works like a proxy).

  2. How much do you hand off to the other node? From looking at the code, if it is on a different node, you are doing both auth and ssl handshakes. Does it mean you are doing them twice? Once in the current node and then again once you have to proxy? I wonder if it is worth doing a "special handshake" between those nodes, so when you join this special Ranch port (which should not be exposed), you can assume auth and ssl have been handled on the first node and you don't have to redo them. Maybe that's already the case, but I wanted to double-check.


abc3 commented Aug 1, 2024

> Exciting work!

😊😊😊

> Do you have any numbers or results on the performance of this new version?

We are still measuring performance for different scenarios. When the pooler is on the same machine as the database, Supavisor outperforms or performs the same as PgBouncer. However, it is slightly slower when a Supavisor cluster of two nodes is in the same AWS zone as the database.

> I haven't reviewed the code yet, so I will try to verify this as well, but remember that, if the client crashes for some reason, you most likely can't assert for certain the state of the socket: for example, was something only partially written to it? So you may need to drop the connection altogether or do some sort of cleanup (if possible), like make sure to abort transactions and other things. I am not sure how much state can be carried around.

We keep a linked DbHandler which originally created the socket, and if the client goes down, DbHandler will be respawned with a new direct connection.

> I also remember we discussed improvements to poolboy or your fork of poolboy. I don't remember exactly what they were, though. I assume those may already be in place?

I spent some time finding bottlenecks, and the biggest ones were metrics collection (shout out to @hauleth, who fixed that) and the synchronous calls between ClientHandler and DbHandler used to avoid overwhelming the mailboxes. So far, we haven't added anything new to Poolboy, except our previous changes for idle timeout.

> Is it correct that this pull request still preserves the old code for talking to the database? Is the plan to remove it in the future or are there issues in unifying everything through the proxy handler?

DbHandler is still responsible for establishing direct connections in transaction mode and for reconnecting in the idle state, and it has been extended with a handle_call to change ownership of the TCP socket. However, the connection logic for talking to the database during transactions is located in ProxyDb. In the future, we will remove the old module.

> Is the maximum number of ports (65k) a potential concern here? If so, could we use a single Ranch port but pass some other information (such as a preamble when connecting via gen_tcp) to choose which pool to use?

Not at the moment, but it could potentially become a problem. The update is already quite complex, so I don't want to add further complications. We can always add this feature later.

> How much do you hand off to the other node? From looking at the code, if it is on a different node, you are doing both auth and ssl handshakes. Does it mean you are doing them twice? Once in the current node and then again once you have to proxy? I wonder if it is worth doing a "special handshake" between those nodes, so when you join this special Ranch port (which should not be exposed), you can assume auth and ssl have been handled on the first node and you don't have to redo them. Maybe that's already the case, but I wanted to double-check.

Hmm, it should not use SSL for proxying to another node. The main reason for authentication was to provide an additional security guard. It's not very resource-intensive but can prevent accidental connections if the client has some issues with authentication.

@josevalim

Fantastic. I have already reviewed the code and I will do another pass later. My only suggestion so far would be to have a custom TCP handshake for the proxy, so you have only a single special port instead of a pool of them and there is no need for additional auth. For security, you can put a 20-40 byte random token on Syn instead of the Ranch port, and the new handshake uses a key plus this token for connecting. But as you said, I don't think it is a necessary change at this moment.
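
For illustration only, a rough sketch of how that handshake could work (module, Syn scope, and frame layout are assumptions, not a concrete proposal):

```elixir
# Sketch only: hypothetical handshake for a single internal proxy port.
defmodule ProxyHandshakeSketch do
  @token_bytes 32

  # When registering the pool, publish a random token in Syn instead of a per-pool port.
  def register_pool(tenant_id, pool_pid) do
    token = :crypto.strong_rand_bytes(@token_bytes)
    :ok = :syn.register(:tenants, {:pool_proxy, tenant_id}, pool_pid, %{token: token})
    token
  end

  # On the single internal Ranch port: the first frame must carry the tenant id
  # and its token; otherwise the connection is dropped.
  # Assumes the whole frame arrives in one recv; real code needs proper framing.
  def accept(socket) do
    with {:ok, <<len::16, tenant_id::binary-size(len), token::binary-size(@token_bytes)>>} <-
           :gen_tcp.recv(socket, 0, 5_000),
         {_pid, %{token: ^token}} <- :syn.lookup(:tenants, {:pool_proxy, tenant_id}) do
      {:ok, tenant_id}
    else
      _ ->
        :gen_tcp.close(socket)
        {:error, :bad_handshake}
    end
  end
end
```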

> We are still measuring performance for different scenarios. When the pooler is on the same machine as the database, Supavisor outperforms or performs the same as PgBouncer. However, it is slightly slower when a Supavisor cluster of two nodes is in the same AWS zone as the database.

This is great! As far as I know, no other "bouncer" offers clustering anyway, and slightly slower would certainly be expected.


abc3 commented Aug 1, 2024

> My only suggestion so far would be to have a custom TCP handshake for the proxy, so you have only a single special port instead of a pool of them and there is no need for additional auth. For security, you can put a 20-40 byte random token on Syn instead of the Ranch port, and the new handshake uses a key plus this token for connecting.

A good call, thanks!


@josevalim josevalim left a comment


I did another review of the proxy modules and I have dropped only two additional comments. :)

@abc3 abc3 marked this pull request as ready for review August 30, 2024 13:43
@abc3 abc3 requested a review from a team as a code owner August 30, 2024 13:43
@abc3 abc3 changed the title from [WIP] Supavisor V2 to Supavisor V2 Aug 30, 2024
@hauleth hauleth merged commit 7494cf9 into main Jan 14, 2025
@hauleth hauleth deleted the v2 branch January 14, 2025 19:09

hauleth commented Jan 14, 2025

Moved v2 to main.
