-
Notifications
You must be signed in to change notification settings - Fork 71
"FATAL: invalid frontend message type 137" #117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Well, it should be safe to share postgresql-simple sockets between threads without locking... but there could well be bugs. Anyway, keep me informed when you find out more info. |
Another idea, is you could run a packet capture on a postgres connection until this issue occurs, and then go back and examine the capture afterwards. Wireshark has a postgres protocol dissector, which would be quite useful for analyzing the capture. |
Hmm, what version of postgresql-simple are you running? This didn't happen to start with postgresql-simple-0.4.3.0, did it? Because looking at your #postgresql conversation, the resource vanished exception message from |
I'm not sure when it's started, but I've certainly seen the |
I have managed to get a minimal Wireshark capture. Here's the packet dump for just the connection that fails: https://ocharles.org.uk/tmp/invalid-frontend-message.pcapng Packet 3944 is the response "Invalid frontend message type 137" You should be able to import this into Wireshark. I also have a much larger dump of the entire application, if you want to see what the other connections were doing. I'll do some reading on this dump now and see if I can spot anything suspect, but I wanted to get this shared as soon as possible (and also to ensure I don't lose it!) |
It appears that "we" are somehow sending the following bytes:
I have no idea where this is coming from, but 89 is hexadecimal for 137 and that packet occurs immediately before PostgreSQL gets confused and closes the connection - so it would suggest this is the problem. |
Well, I don't know of anything in postgresql-simple that would be violating the libpq abstraction and be sending that sort of data. So either it's a "bug" in libpq, or perhaps you have some C code somewhere that's sending some junk on the wrong file descriptor somehow? (Just spitballing here.) I suppose it's possible to send junk on the file descriptor from your application without dropping down to C, via the Actually, connecting via SSL should be a decent way of diagnosing whether or not this string is being sent from libpq, or by another bit of code that should not be touching that file descriptor in that way. If that string is originating from something other than libpq, then this should result in an SSL error, and is good evidence that the data is not being sent by On the other hand, if you still get the invalid frontend message after turning on SSL, then this is strong evidence that that string is passing through (I do bring this up because, for a variety of reasons, I've thought about implementing the frontend/backend protocol in Haskell off and on since before the first release of postgresql-simple... but I really don't want to put the effort into supporting all the connection and authentication options. So I've wondered about using libpq to connect and authenticate and then basically take over the connection myself. Thanks in part to @rjmac's contributions to #25, I know how to do this now, and have written some (very preliminary and mostly unusable) code to that effect already.) |
It seems that it's probably My guess is the "ping" thread in Thanks for teaching me about the pgsql protocol in wireshark though - I don't think I could have solved it without that! |
Do you think that the same websockets-snap pinger issue could cause
I'm seeing lots of these in the logs of our application which uses postgresql-simple and snaplet-postgresql-simple for PG connection pools. websockets-snap are also used. It looks almost as if the resource-pool inside postgresql-simple provides connections in non-OK status (http://doxygen.postgresql.org/fe-exec_8c.html#af9a6a3980b0142764b437eabc6da1a3c) |
@dzhus, are you aware of this issue? mightybyte/snaplet-postgresql-simple#6 If you suspect something like what ocharles observed, I would try to take a packet capture of database connections until you catch this error in action. |
@lpsmith Thanks for pointing this out, it seems to be at least tangential to the issue we're having. |
We're seeing this almost daily in our server logs, but unfortunately I have no idea to how to really reproduce it. It seems that under higher load, it seems to happen more frequently. #postgresql suggest it would be a concurrency problem, and our application does indeed have quite a lot of concurrency - I don't believe we are sharing connections between threads, but I'm going to verify that claim now.
The text was updated successfully, but these errors were encountered: