Intermittent SocketError("Error reading message code from socket - Error Kind(UnexpectedEof)") when accessing pgcat through AWS NLB in EKS cluster #400

Closed
Cluas opened this issue Apr 11, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@Cluas
Contributor

Cluas commented Apr 11, 2023

Description:

I have deployed pgcat in an EKS cluster, and the client runs in the same EKS cluster. When the client pod accesses pgcat through an AWS NLB, the pgcat logs intermittently, but frequently, show the following warning:

[2023-04-11T07:15:39.839836Z WARN pgcat] Client disconnected with error SocketError("Error reading message code from socket - Error Kind(UnexpectedEof)")

Here's an example of the log entries:

2023-04-11 15:15:39
[2023-04-11T07:15:39.839836Z WARN  pgcat] Client disconnected with error SocketError("Error reading message code from socket - Error Kind(UnexpectedEof)")
...
2023-04-11 14:55:07
[2023-04-11T06:55:07.206480Z WARN  pgcat] Client disconnected with error SocketError("Error reading message code from socket - Error Kind(UnexpectedEof)")

Environment:

  • pgcat v1
  • pgcat deployed on EKS
  • Client pod deployed on the same EKS cluster
  • pgcat accessed via AWS NLB

I am looking for a solution to resolve these intermittent errors. Any help or guidance would be appreciated.

@levkk
Contributor

levkk commented Apr 11, 2023

It's the NLB health check opening and then closing a TCP connection. We should add an option to ignore these and not log anything (PgBouncer has a similar option).
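
For readers who want to see why such a probe produces an `UnexpectedEof`, here is a minimal sketch in Python. It is not pgcat's code: the function name `read_after_probe` and the loopback listener are hypothetical stand-ins, and the "probe" is just a client that connects and closes without sending anything, which is how the comment above describes the health check. The server side tries to read the first byte of a message and gets an empty read, i.e. end of stream.

```python
import socket
import threading


def read_after_probe():
    """Return what a server reads from a client that connects and closes at once."""
    # Hypothetical stand-in for a pooler's listener, bound to a free loopback port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    received = []

    def serve():
        conn, _ = listener.accept()
        with conn:
            conn.settimeout(5)
            # Try to read the one-byte message code, as a protocol parser would.
            received.append(conn.recv(1))

    worker = threading.Thread(target=serve)
    worker.start()

    # The "health check": open a TCP connection and close it without sending data.
    probe = socket.create_connection(("127.0.0.1", port), timeout=5)
    probe.close()

    worker.join(timeout=10)
    listener.close()
    return received[0]


if __name__ == "__main__":
    data = read_after_probe()
    print("bytes read from probe connection:", data)
```

An empty read on the very first byte is the condition a server can use to tell a bare connect-and-close apart from a client that disconnected mid-session, which is what an "ignore and don't log" option would need to detect.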

levkk added the enhancement (New feature or request) label Apr 11, 2023

@Patrick0308

@levkk This looks like the NLB resetting the connection because no data has passed through it for longer than the NLB idle timeout.
See the details at https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout

For each TCP request that a client makes through a Network Load Balancer, the state of that connection is tracked. If no data is sent through the connection by either the client or target for longer than the idle timeout, the connection is closed. If a client or target sends data after the idle timeout period elapses, it receives a TCP RST packet to indicate that the connection is no longer valid.

We set the idle timeout value for TCP flows to 350 seconds. You can't modify this value. Clients or targets can use TCP keepalive packets to reset the idle timeout. Keepalive packets sent to maintain TLS connections can't contain data or payload.

As the documentation recommends, TCP keepalive should be enabled on the pgcat listener so that long-lived connections stay stable.
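
As a rough illustration of that suggestion, the sketch below enables TCP keepalive on a socket in Python, with the first probe scheduled well inside the 350-second idle timeout quoted above. It is not pgcat configuration or code: the helper `enable_keepalive` and the 60/10/3 values are assumptions chosen for the example, and the `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` options are platform-dependent, so they are only set where the `socket` module exposes them.

```python
import socket

# Idle timeout for TCP flows, taken from the AWS documentation quoted above.
NLB_IDLE_TIMEOUT_SECS = 350


def enable_keepalive(sock, idle_secs=60, interval_secs=10, probes=3):
    """Turn on TCP keepalive so an idle connection still sends periodic probes.

    The default values are illustrative assumptions, not recommendations
    from pgcat or AWS; the only constraint used here is idle_secs < 350.
    """
    if idle_secs >= NLB_IDLE_TIMEOUT_SECS:
        raise ValueError("first keepalive probe must be sent before the idle timeout")

    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Fine-grained timers are not available on every platform.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_secs)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_secs)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print("keepalive enabled:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

The same socket options can be applied on either end of the connection; the quoted documentation says that keepalive packets from clients or targets reset the idle timeout.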
