Intermittent SocketError("Error reading message code from socket - Error Kind(UnexpectedEof)") when accessing pgcat through AWS NLB in EKS cluster #400

Cluas · 2023-04-11T07:33:25Z

Description:

I have deployed pgcat in an EKS cluster, and the client runs in the same EKS cluster. When the client pod accesses pgcat through an AWS NLB, the pgcat logs frequently (though intermittently) show the following warning:

[2023-04-11T07:15:39.839836Z WARN  pgcat] Client disconnected with error SocketError("Error reading message code from socket - Error Kind(UnexpectedEof)")

Here's an example of the log entries:

2023-04-11 15:15:39
[2023-04-11T07:15:39.839836Z WARN  pgcat] Client disconnected with error SocketError("Error reading message code from socket - Error Kind(UnexpectedEof)")
...
2023-04-11 14:55:07
[2023-04-11T06:55:07.206480Z WARN  pgcat] Client disconnected with error SocketError("Error reading message code from socket - Error Kind(UnexpectedEof)")

Environment:

pgcat v1
pgcat deployed on EKS
Client pod deployed on the same EKS cluster
pgcat accessed via AWS NLB

I am looking for a solution to resolve these intermittent errors. Any help or guidance would be appreciated.


levkk · 2023-04-11T07:50:44Z

It's the NLB health check opening and then closing a TCP connection. We should add an option to ignore these disconnects and not log anything (PgBouncer has a similar option).
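
To illustrate why a connect-then-close probe shows up as an unexpected EOF, here is a minimal Python sketch. It is not pgcat's code: `read_message_code` and `simulate_health_check` are hypothetical helpers, and the sketch only assumes that the server tries to read the first byte of a message from a peer that has already closed the connection.

```python
import socket


def read_message_code(conn):
    """Hypothetical helper: read the first byte of the next message."""
    data = conn.recv(1)
    if data == b"":
        # recv() returning no bytes means the peer closed the connection.
        raise EOFError("peer closed the connection before sending a message code")
    return data


def simulate_health_check():
    """Open a TCP connection and close it without sending anything."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    probe = socket.create_connection(("127.0.0.1", port))
    conn, _ = server.accept()
    conn.settimeout(5)  # avoid hanging forever if something goes wrong
    probe.close()  # the probe sends no data, just like a TCP health check

    try:
        read_message_code(conn)
        return "message received"
    except EOFError as exc:
        return f"unexpected EOF: {exc}"
    finally:
        conn.close()
        server.close()


if __name__ == "__main__":
    print(simulate_health_check())
```

From the server's point of view the probe is indistinguishable from a client that disconnected early, which is why an option to suppress this warning would be useful.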

Patrick0308 · 2023-04-11T08:08:28Z

@levkk This issue looks like the NLB resetting the connection because no data has passed through it for longer than the NLB idle timeout.
See the details at https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout

For each TCP request that a client makes through a Network Load Balancer, the state of that connection is tracked. If no data is sent through the connection by either the client or target for longer than the idle timeout, the connection is closed. If a client or target sends data after the idle timeout period elapses, it receives a TCP RST packet to indicate that the connection is no longer valid.

We set the idle timeout value for TCP flows to 350 seconds. You can't modify this value. Clients or targets can use TCP keepalive packets to reset the idle timeout. Keepalive packets sent to maintain TLS connections can't contain data or payload.

As recommended, TCP keepalive should be enabled on the pgcat listener so that long-lived connections stay stable.
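
As a rough illustration of what enabling keepalive means at the socket level, here is a minimal Python sketch. It is not pgcat's implementation: the helper name `enable_keepalive` and the default values (60 s idle, 10 s interval, 3 probes) are assumptions chosen only so that probes are sent well before the 350-second idle timeout quoted above. The `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT` options are not exposed on every platform, so the sketch sets them only when the `socket` module provides them.

```python
import socket

# Idle timeout for TCP flows, taken from the AWS documentation quoted above.
NLB_IDLE_TIMEOUT_S = 350


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=3):
    """Hypothetical helper: turn on TCP keepalive for an existing socket."""
    if idle_s >= NLB_IDLE_TIMEOUT_S:
        raise ValueError("keepalive idle time must be shorter than the NLB idle timeout")

    # Ask the kernel to send keepalive probes on an otherwise idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Fine-tune the timing where the platform exposes these options.
    tuning = (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # failed probes before giving up
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

Either side of the connection can send the probes, so the same idea applies to the client if changing the pgcat side is not an option.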

levkk added the enhancement New feature or request label Apr 11, 2023

Cluas mentioned this issue Apr 12, 2023

feat: set keepalive for pgcat server itself #402

Merged

Cluas closed this as completed Apr 13, 2023
