Resource vanished (failed to fetch file descriptor) problem #192
Comments
Well, the issue is that in the first case, libpq is receiving some junk on the socket it's connecting to and terminating the connection, whereas in the second case the other side of the connection is terminating it. (Perhaps because it received some "junk"?) Are you sure you are connecting to a PostgreSQL server, and not to something else? I'd quickly examine the connection strings that are reaching If you don't spot something obvious rather immediately on that count, then it's definitely time to perform a packet capture (make sure SSL is disabled, if at all possible) and view it using Wireshark.
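As a rough aid for that first check, here is a minimal sketch in Python (not from this thread) of a hypothetical helper, summarize_conninfo, that splits a libpq-style key/value connection string so the host, port and sslmode can be eyeballed before a capture. It assumes the simple space-separated key=value form; connection URIs and the finer libpq quoting rules are not handled.

```python
import shlex


def summarize_conninfo(conninfo):
    """Split a libpq-style "key=value ..." string into a dict (simplified).

    Hypothetical helper for eyeballing settings; not part of any library.
    """
    params = {}
    for token in shlex.split(conninfo):
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"not a key=value pair: {token!r}")
        params[key] = value
    # Mask the password so the summary is safe to paste into an issue.
    if "password" in params:
        params["password"] = "***"
    return params


def capture_warnings(params):
    """Return notes about settings that matter for a packet capture."""
    notes = []
    # libpq defaults to sslmode=prefer, which may encrypt the traffic.
    if params.get("sslmode", "prefer") != "disable":
        notes.append("sslmode is not 'disable': captured traffic may be encrypted")
    # 5432 is the default PostgreSQL port.
    if params.get("port", "5432") != "5432":
        notes.append("non-default port: confirm a PostgreSQL server listens there")
    return notes


if __name__ == "__main__":
    info = summarize_conninfo("host=localhost port=5432 dbname=test sslmode=disable")
    print(info)
    print(capture_warnings(info))
```

An empty list of notes only means nothing looked suspicious in the string itself; it says nothing about what is actually listening on the other end.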
Anyway, thank you for your answer. I think it's nearly impossible for you to help me further, so I'm closing the issue.
Oh, ok, so this is always a message from the backend. There was a similar problem in #117, which is a use-after-close file descriptor fault that's never been completely resolved, to my knowledge. This certainly seems like it could be another use-after-close fault in some other Haskell package. I'd take a closer look at the messages leading up to this. It also seems as though this capture encompasses multiple connections, so you want to learn how to filter the traffic that's specific to this connection, and also be sure that you are not filtering out traffic that doesn't look like postgres traffic. Incidentally, #117 is what convinced me to add use-after-close fault protection to So while you are correct that it's unlikely I can help you too much further, I would appreciate being kept in the loop as you learn more about this problem. And there is actually a decent chance that can have further unexpected positive benefits.
Sure. It will be interesting to understand what's going on and why it works with
Yeah, you definitely want to adjust your filtering, because you want to see what the backend received that caused it to terminate the connection. Multithreaded apps are decidedly more vulnerable to use-after-close descriptor faults.
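To make the failure mode concrete, here is a minimal sketch in Python (an illustration, not code from this thread or from any of the packages discussed) of how a use-after-close fault can put junk on someone else's connection. It assumes a POSIX system, where a newly opened descriptor gets the lowest free number, and uses a socketpair as a stand-in for a TCP connection to the backend. The function name and the "component A / component B" framing are hypothetical.

```python
import os
import socket


def demonstrate_use_after_close():
    """Show a stale descriptor number silently pointing at a new socket."""
    # Component A owns one end of a connection.
    a_local, a_peer = socket.socketpair()
    stale_fd = a_local.fileno()   # A remembers the raw descriptor number...
    a_local.close()               # ...but the socket gets closed elsewhere.

    # Component B opens an unrelated connection. The OS hands out the lowest
    # free descriptor number, so B normally receives A's old number.
    b_local, b_peer = socket.socketpair()
    victim_fd = b_local.fileno()
    reused = victim_fd == stale_fd

    received = None
    if reused:
        # A's late write goes out on B's connection: B's peer sees junk.
        os.write(stale_fd, b"junk from A")
        received = b_peer.recv(64)

    for s in (a_peer, b_local, b_peer):
        s.close()
    return {"stale_fd": stale_fd, "victim_fd": victim_fd,
            "reused": reused, "received": received}


if __name__ == "__main__":
    print(demonstrate_use_after_close())
```

In a single thread the reuse is easy to see. With several threads the close and the reopen can interleave at arbitrary points, which is why the fault tends to show up intermittently there.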
Another differential diagnosis for use-after-close faults is enabling SSL; if you start getting SSL errors instead of this error, then you know it just about has to be use-after-close.
I'm seeing this a lot in Opaleye property tests. It occurs something like once in every ten runs. Each run creates its own independent database and connection, and does roughly 10,000 database queries, so I am seeing this one in every 100,000 queries, approximately. There's nothing obvious that I changed that I suspect triggered this, although it did start happening after I started increasing the complexity of the random queries that I generate.
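For a fault this rare, it helps to know how many passing runs are needed before a suspected fix can be trusted. The following is a small sketch in Python using the figures quoted above (a failure in roughly one run out of ten, about 10,000 queries per run). It assumes that queries and runs fail independently, which may not hold for a real descriptor fault; both function names are hypothetical.

```python
import math


def per_query_failure_rate(run_failure_rate, queries_per_run):
    """Per-query failure probability, assuming independent queries."""
    return 1.0 - (1.0 - run_failure_rate) ** (1.0 / queries_per_run)


def clean_runs_needed(run_failure_rate, max_luck_probability):
    """Consecutive passing runs needed before luck alone is this unlikely."""
    return math.ceil(
        math.log(max_luck_probability) / math.log(1.0 - run_failure_rate)
    )


if __name__ == "__main__":
    p = per_query_failure_rate(0.1, 10_000)
    print(f"per-query failure rate: about 1 in {1 / p:,.0f}")
    print("clean runs needed for 95% confidence:", clean_runs_needed(0.1, 0.05))
```

Under those assumptions, about 29 consecutive clean runs are needed before the chance of simply having been lucky drops below 5%.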
I don't use threading (as far as I know; perhaps QuickCheck does). I doubt the connection is being closed: I don't close the connection in the test code at all, I just let program exit clean it up.
Unfortunately, it doesn't seem likely that the surprisingly quick fix to tomjaguarpaw/haskell-opaleye#474 is directly relevant here: as the packet capture shows an error message being passed back to
We have been using the persistent library that's built on top of postgresql-simple for some time now without problems. However, recently we tried to switch to a newer resolver and our program stopped working.
Experimenting with different resolvers, I have found out that with nightly-2016-08-12 the program works just fine, while with nightly-2016-08-13 it produces the following error messages:
and sometimes
Repeatedly switching between the resolvers, I confirmed that it's not an accident (the process takes 3 hours and finishes normally with nightly-2016-08-12; with nightly-2016-08-13 it fails after 1 second) and something has definitely changed between these releases and introduced that problem.
Now if we look at the diff between these resolvers, there is nothing related to persistent or postgresql-simple:
https://www.stackage.org/diff/nightly-2016-08-12/nightly-2016-08-13
So I'm a bit confused: what could cause these error messages? Can you comment on them and maybe point out a direction for further investigation?
Thanks.