Skip to content

Data.Pool makes certain transactions unusable. #6

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
sopvop opened this issue Nov 7, 2012 · 29 comments
Closed

Data.Pool makes certain transactions unusable. #6

sopvop opened this issue Nov 7, 2012 · 29 comments

Comments

@sopvop
Copy link
Contributor

sopvop commented Nov 7, 2012

Using withTransaction in my code I get lots of 'NOTICE: no transaction in progress' from libpq on any error in transaction.

The problem is withResource function from Data.Pool which says in doc If the action throws an exception of any type, the resource is destroyed, and not returned to the pool. And postgresql-simple uses exceptions to report any errors.

So, on any error Connection resource gets killed and withTransaction issues 'rollback' to new connection created by pool.

So, say you do something like:

BEGIN;
-- ...  do something ...
SAVEPOINT part1ok;
-- ... do smothing else which possibly raises integrity violation or whatnot
ROLLBACK TO part1ok;
COMMIT;

This is not going to work, since connection will be closed and transaction aborted on violation in "do something else" by withResource.

Right now I don't use savepoints and rollbacks so killing connection on violation error is ok and that notice to stderr can be turned off.

Any Ideas how to add ability to handle exceptions and still be able to use resource pool safely?

@gregorycollins
Copy link

Catch the exception you're expecting inside the action you pass to withResource?

@sopvop
Copy link
Contributor Author

sopvop commented Nov 7, 2012

Every call to postgres is wrapped in withResource, so your handler will be invoked after pool kills connection.

withPG :: (HasPostgres m)
       => (P.Connection -> IO b) -> m b
withPG f = do
    s <- getPostgresState
    let pool = pgPool s
    liftIO $ withResource pool f

query :: (HasPostgres m, ToRow q, FromRow r)
      => P.Query -> q -> m [r]
query q params = withPG (\c -> P.query c q params)

Currently snaplet provides very clean interface, and It would be nice to add ability of catching exceptions without making it ugly.

@gregorycollins
Copy link

What's generating the exception? Postgres itself?

@sopvop
Copy link
Contributor Author

sopvop commented Nov 7, 2012

postgres returns error and postgresql-simple throws SqlError. In code above it is the P.query function.

@gregorycollins
Copy link

Maybe:

queryWithErrorHandler :: (HasPostgres m, ToRow q, FromRow r) =>
                         P.Query -> q -> (SqlError -> P.Connection -> IO b) -> m [r]

?

@sopvop
Copy link
Contributor Author

sopvop commented Nov 7, 2012

That will work, but won't look nice.

queryWithErrorHandler, queryWithErrorHandler_, executeWithErrorHandler, executeWithErrorHandler_, beginWithErrorHandler, commitWithErrorHandler, rollbackWithErrorHandler...

And change 'withTransaction' to never rollback, only commit.

@gregorycollins
Copy link

The current behaviour (tosses the resource on an exception) is, I think, the right thing to do by default, although I suppose I could be convinced otherwise. I would consider this other use case somewhat "niche".

@ghost
Copy link

ghost commented Nov 7, 2012

I don't think it is really a different use case, the behaviour is still "wrong" even when you can ignore the consequences without breaking anything. Given that the snaplet exports the same API that postgresql-simple does, it is pretty unintuitive that it behaves differently, especially in a way that breaks transactions that work using postgresql-simple directly. Is there a nicer way to handle this than having withPG catch SqlErrors, sneak them past withResource and then re-throw them?

@gregorycollins
Copy link

That might work here. SqlError is probably the only exception here for which receiving it shouldn't necessarily kill the connection --- all of the others I could imagine receiving are things like "ThreadKilled" or "the connection died" which you wouldn't want to recover from.

A question here -- if you call query twice in a row, are you guaranteed to get the same connection object both times? Because that doesn't look clear to me just from reading those two functions (withPG and query). It that case, the discussion is moot: there's no guarantee you can catch the SqlError outside of the "query" function and recover using "ROLLBACK TO" because you're not guaranteed to have the same connection object in your hands, the exception must therefore be caught and dealt with inside "withPG" which would mean we'd have to change the query api.

If that's true, as far as that being different from what postgresql-simple does, that's a consequence of automagical connection pooling :-/

@sopvop
Copy link
Contributor Author

sopvop commented Nov 7, 2012

Looking at the withResource source it seems what same resource is not guaranteed, but with certain settings (stripes =1, max resource = 1) it may always give same object. But may also kill resource if you don't touch it for too long.

As for the different api (possibly exported from different module) I was thinking about something like

type PgAction m a = ReaderT Connection m a

withPostgres :: (HasPostgres m) => PgAction m a -> m a
withPostgres act = do
  pool <- liftM pgPool getPostgresState
  withResource pool $ runReaderT act -- or something like that

query :: (ToRow a, FromRow b) => Query -> a -> PgAction m [b]

That way resource will at least get killed when withPostgres returns, and you are guaranteed to have same connection object.

@ghost
Copy link

ghost commented Nov 7, 2012

I assumed it held the same connection, and was only getting a new one if the one it was holding got destroyed. If the withTransaction* functions can't count on having the same connection, then you just plain can't use transactions at all right? Or else you could do begin in one connection, some queries in another, and then commit in a third.

@mightybyte
Copy link
Owner

Sorry guys, I have been roughly following this discussion. Like Greg, my initial impression is that the current behavior is reasonable. It seems to me that you can solve this problem by using withPG and then passing an action that does its own exception handling and uses query/execute/etc from postgresql-simple directly rather than snaplet-postgresql-simple's wrappers.

Assuming that solves your immediate problem, the next question is could we add convenience API to make this easier. I can see some value in sopvop's two-tiered monad approach. It seems to me that the issue here is the interaction between connection pooling and exception handling. Maybe this does suggest a different API design that takes this into account. I'm quite busy with other stuff right now. Anyone want to take a stab at this?

@ghost
Copy link

ghost commented Nov 16, 2012

The current behavior means that transactions are not reliable at all, race conditions allow data that was meant to be in an aborted transaction to end up being committed. Shouldn't the withTransaction* functions be removed until a solution can be worked out? Having them there leads people to assume they can be used safely, which is not the case.

@mightybyte
Copy link
Owner

Thanks for this pull request. It looks good. I haven't merged it yet because I'm thinking about just replacing the current API with this one. I wanted to think about it a little more and make sure everything is reasonable.

@mightybyte
Copy link
Owner

The issue here is that the HasPostgres type class is not really specific to being a snaplet. It's more of a general interface applicable to postgresql-simple. Ultimately I'd like this interface to be merged into postgresql-simple, so that's a nice incentive to try to get it right.

@mightybyte
Copy link
Owner

What do you guys think about this interface?

https://github.com/mightybyte/postgresql-simple/blob/master/src/Database/PostgreSQL/Simple/Class.hs

I think the new formulation of withPG :: (P.Connection -> m a) -> m a should fix the problem. But even though I think it solves the problem, I still removed the withTransaction functions to eliminate the MonadCatchIO dependency. Those functions should be pretty easy to add after the fact using your preferred exception library.

@ozataman
Copy link
Collaborator

Unfortunately, the rollback, beginMode, commit, and all similar "stateful" functions are still out of place in the current design. The reason is that for any real-world production use, you are still going to use Data.Pool for pooling and have your typeclass instance grab a connection from the pool to execute withPG. If you have a series of withPG calls like:

begin
...bla..blaa
commit

each such function can potentially run with a different underlying connection.

For any such command to work as intended, the entire sequence has to live inside the same withPG block. For example for transactions, as long as all queries of the same transaction occur inside the same withPG block, the current withPG design will guarantee that they work reliably, even if Data.Pool is used for connection pooling in acquiring the connection.

We should remove all these "stateful" functions and instead add a single function (like):

pgTransaction :: (MonadPostgres m, MonadCatchIO m) => (P.Connection -> IO a) -> m a

and a bunch of variants that can make different assumptions about errors, etc. The downside is that the user now has to use IO versions of the original postgresql-simple functions. I'll think a bit to see if there's a nicer option here.

One option may be to define a Transaction monad, or some type-level trick like that to make for a nicer design, but I'll have to think about it a bit to propose something concrete.

In general though, I think the MonadPostgres API pattern is a good one if we can get it working with stateful functions, as I've been using it for quite some time in my cassy Cassandra library and it has proven very comfortable.

@mightybyte
Copy link
Owner

Ahhh, yes. This interaction between the pool generalization and transactions is tricky. I'll have to think on it more.

@sopvop
Copy link
Contributor Author

sopvop commented Mar 18, 2013

There already is PostgresMonad instance for reader.

withTransaction :: (MonadPostgres m) => ReaderT Connection m a -> m a
withTransaction act = withPG (\c -> runReaderT c act)

You can even grab another connection from pool with lift, but maybe it is a bad thing.

This way snaplet should only provide MonadPostgres instance and withTransaction function. All exception handling should be done inside that Reader action. Queries are done with MonadPostgres, so you can reuse your db functions if they have generic MonadPostgres constraint.

Am I missing something here?

@mightybyte
Copy link
Owner

Just had a discussion about this with @ozataman.

One simple solution to this problem would be to decree that MonadPostgres only works with pools and abandon the idea that it should work with single connections. The downside to this is that @lpsmith might not want to include such a solution upstream in postgresql-simple (which is really where it should go) because it would incur another dependency and force a particular choice of pool library on the user.

We're thinking that an easy solution to this problem would be to use a free monad under the hood that has a separate evaluation method for running with a pool vs a single connection. I'm not big on throwing free monads at problems indiscriminately, but this seems like the type of situation that free monads are perfect for.

@lpsmith
Copy link
Contributor

lpsmith commented Aug 6, 2013

Yes, but transactions certainly are not incompatible with connection pooling. E.g. pg_bouncer can (optionally) support transactions, depending on how it's configured.

The key is that you have to have control over when the connection is taken from and returned to the pool, which bos's interface supports. Once you have that, you can rewrite your transaction code appropriately. Honestly, exporting something you don't even come close to implementing correctly is kinda bad form.

My suggestion for the moment is to simply remove the functions that don't work, and exporting withPG which isn't quite ideal from an API point of view, but at least it's usable.

Really, doing this correctly shouldn't be that hard, though last time I took a crack at this issue myself I didn't really figure things out due to extraneous issues. Perhaps I could try to implement a prototype cutting out all the snap abstractions, and see if that would be enough for you to figure out how to do this in the context of snaplet-postgresql-simple.

@mightybyte
Copy link
Owner

@lpsmith

I don't have time to work on this right now, and haven't wanted to release a new major version that I know is sub-optimal. My view is that Snap.Snaplet.PostgresqlSimple contains no snap abstractions. So if you implement an interface that is sufficient, I'll switch to it in a heartbeat.

@lpsmith
Copy link
Contributor

lpsmith commented Aug 6, 2013

Here's a quick proof of concept. I haven't tested it, just compiled it, but it should work fine.

https://github.com/lpsmith/postgresql-simple-implicit

This might not be exactly the interface you want, but we can work on that. And we could probably move some of case statements into the type system with the indexed reader monad and a bit of work. (I don't know if it would matter much, but it would eliminate some work at run time.)

@mietek
Copy link

mietek commented Apr 7, 2014

@lpsmith has helpfully restated the issue in stronger terms:

I really cannot recommend the use of snaplet-postgresql-simple at the present time, because it does not handle transactions correctly at all. (…)

The problem is that snaplet-postgresql-simple adds a connection pool of the type you'd typically want for a public-facing webapp, and (although the pool allows for it) it doesn't expose any way to reserve a connection for the duration of the transaction, and have all the intervening queries issued on that connection.

Thus, when you use a purported transaction, your BEGIN might go out on one connection, your intervening queries might go out on another, and the COMMIT or ROLLBACK might go out on a third. And other queries from concurrent requests might be issued on the connection that's actually in a transaction.

So basically, in my stripped down version of snaplet-postgresql-simple, all I have is a single withPG :: (HasPostgres m) => (Connection -> IO a) -> m a function that grabs a connection and does something with it.

It's not quite as convenient, but it is correct. Unfortunately restricting the handler to IO also means you can't really use this method to stream data from the database to web users, but I didn't need that and that avoids potential exception-handling complications.

I have a sketch of a proper solution (that also attempts to retain the full convenience factor) as postgresql-simple-implicit, it needs to be rewritten as a snaplet and fleshed out a bit, but it should work.

@gregorycollins
Copy link

Unfortunately restricting the handler to IO also means you can't really use this method to stream data from the database to web users

Snap 1.0 won't have this problem.

@mightybyte
Copy link
Owner

@lpsmith and I have worked up what we think is a fix for this issue. I'd like to get some more feedback before I release it. You can see the final state of our fix at https://github.com/mightybyte/snaplet-postgresql-simple/tree/1d32a2823cc8f2b1ea6284b7b6573c46be0ae2a1/src/Snap/Snaplet. Thoughts?

@sopvop
Copy link
Contributor Author

sopvop commented Oct 21, 2014

I would prefer something a bit more explicit (at type level) about connections, but I have no idea how to make it as easy to use as this. 👍

@sopvop
Copy link
Contributor Author

sopvop commented Nov 17, 2014

Should this be closed now when 0.6 is on hackage?

@mightybyte
Copy link
Owner

Yep, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants