PYTHON-436 Add support for a maximum number of open connections #163

reversefold · 2013-03-17T08:21:42Z

This patch adds a semaphore to keep track of sockets opened and released from the pool in order to be able to enforce a maximum number of open connections. I've also fixed a few places where sockets were being leaked from the pool (never returned) and an extra socket return in a test.

Timeout support for the semaphore is added in this patch to support connection timeouts properly.

Changes have been made to make sure that internal functions can always get a socket from the pool, even if it's at max, so it's possible for more than the configured max to be opened, but these "forced" connections are still tracked and should always be returned.

https://jira.mongodb.org/browse/PYTHON-436
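
The approach described above (a semaphore that counts checked-out sockets, a timeout on acquiring it, and "forced" checkouts that bypass the cap but are still tracked) can be sketched roughly as follows. This is a minimal illustration, not the code in this patch: `BoundedPool`, `PoolTimeout`, and the dict standing in for a socket are all hypothetical names, and it relies on the Python 3 `acquire(timeout=...)` signature, which the Python 2 versions PyMongo supported at the time did not have.

```python
import threading


class PoolTimeout(Exception):
    """Hypothetical error raised when no socket frees up in time."""


class BoundedPool(object):
    """Toy pool that caps checked-out "sockets" with a semaphore."""

    def __init__(self, max_size, wait_timeout=None):
        self._sem = threading.BoundedSemaphore(max_size)
        self._wait_timeout = wait_timeout  # seconds; None waits forever
        self._idle = []                    # sockets ready for reuse
        self._lock = threading.Lock()
        self.forced = 0                    # checkouts that bypassed the cap

    def get_socket(self, force=False):
        if force:
            # Bypass the cap, but still count the checkout so it is tracked.
            with self._lock:
                self.forced += 1
            return {"forced": True}
        if not self._sem.acquire(timeout=self._wait_timeout):
            raise PoolTimeout("timed out waiting for a socket")
        with self._lock:
            if self._idle:
                return self._idle.pop()
        # Stand-in for opening a new connection.
        return {"forced": False}

    def return_socket(self, sock):
        if sock["forced"]:
            # Forced sockets never held a semaphore slot, so only the
            # counter is adjusted and the socket is discarded.
            with self._lock:
                self.forced -= 1
            return
        with self._lock:
            self._idle.append(sock)
        self._sem.release()
```

Every successful non-forced `get_socket` must be paired with exactly one `return_socket`; a missed return leaks a semaphore slot, and a double return makes `BoundedSemaphore.release()` raise `ValueError`, which is one way to surface the kind of double-return bug mentioned above.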

behackett · 2013-03-19T19:01:15Z

Hi Justin,

Thanks for working on this. I've milestoned the related ticket for PyMongo 2.6 (2.5 will ship this week). Would you be willing to do more work on this for that release?

reversefold · 2013-03-20T04:24:44Z

I'd be happy to finish this off, but I'd like some guidance on what else is needed for the pull request to be accepted. I'm looking into the additional parameters that PYTHON-436 references but this pull request is complete in that max_open_sockets works as max_pool_size is expected to.

behackett · 2013-03-28T22:38:21Z

Here is a list of things we think should change to finish this off:

1. We've decided to just change the behavior of max_pool_size to be what people expect, instead of adding a new parameter (I'll update the related ticket). In the process we should bump the default to 100 to match other drivers and add a big warning to the docs about the behavior change. This should be OK since users generally find the current behavior very surprising.
2. Semaphore should just be backported from CPython 3.3 to thread_util.py. We also have to deal with Jython but I'm happy to do that myself.
3. The force option is only used by the replica set monitor to get a single socket to use for "ismaster" calls every 30 seconds. I'm not convinced SynchronizedCounter is necessary to deal with this. Without it the mutex acquire isn't required and neither are the new calls to maybe_return_socket (unless I'm missing something).
4. You're using connectTimeoutMS for the semaphore acquire timeout. There should be a separate config option for this called waitQueueTimeoutMS (see http://docs.mongodb.org/manual/reference/connection-string/#connection-pool-options). The idea is that, if we have more threads than the size of the pool, threads will wait for waitQueueTimeoutMS millis before an exception is raised. In comparison, connectTimeoutMS defines how long we are willing to wait for the host to respond to a connection attempt (it's a socket level setting).
5. PyMongo supports CPython back to 2.4 so you can't use things like assertIsNotNone in unittests.
6. There is a change to an assert in mongo_client.py that seems unnecessary.
7. There is some dead code, unused imports, and TODOs that should be dealt with.

Thanks again for working on this. We really appreciate it. If this is more work than you have time to handle we're happy to merge what you've already sent and make the modifications. You'll get credit in git history for this commit. Let us know.

reversefold · 2013-03-29T04:37:53Z

1. Ok, I'll just switch the parameter to max_pool_size and get a note in the docblock.
2. Ah, I didn't think it would just be python code, I'll look into grabbing it. Would you want a copy of the whole thing or just a subclass which changes the methods needed?
3. The altered and added calls for maybe_return_socket are to make sure that sockets aren't leaked. I believe if those bits aren't patched then the new max_pool_size is going to end up leaking, then blocking even though there are "unused" connections. We need to make absolutely sure we're keeping track of all connections for this to work properly. As for force, I just put it in the places that seemed to make sense. If all that is cared about is ismaster I'll try to move them there. For the counter, I put it in to be absolutely sure that there was no possibility of leaking or miscounting sockets. If we can ensure that there is no possible way that force will be used concurrently we can remove it but I'd be worried of edge cases, especially under heavy use/load.
4. When I wrote the patch I didn't realize there was the additional parameter and did what made sense to me. Personally I like the idea of having a single timeout as it means you're only ever waiting for that one time period whether or not you're getting a new connection or one from the pool. Since the caller has no way to know which will happen it makes sense for the driver to abstract it away and let the caller only have to worry about a single timeout rather than one of 2. Then again, if we're looking for parity with what else is out there, multiple parameters is slightly less complicated to implement. I worry, though, that it's possible for both timeouts to take effect in a single call, which would make planning for overall timeout harder. (waitQueueTimeoutMS to wait for the semaphore, then connectTimeoutMS to connect a new socket if the released socket was closed instead of returned for some reason.) Perhaps another venue would be better for this discussion, just let me know where.
5. I forgot about CPython 2.4. I've fixed the assertIsNotNone and with self.assertRaises. Does anything else stand out to you that isn't 2.4 compatible? (So unfortunate not to have contextmanagers to work with :-( )
6. I changed the assert as I noticed that there was a duplication of the struct.unpack() call in the test and the message, so I moved that to a variable. Totally not related directly to the pull request so I can revert if needed.
7. Unused import fixed. The one bit of dead code I see is the TODO, is there something else I'm missing?

Thanks for the feedback!
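
For readers unfamiliar with the Python 2.4 constraint discussed above: `assertIsNotNone` and the context-manager form of `assertRaises` were only added to `unittest` in Python 2.7. A minimal sketch of the older-style equivalents follows; `parse_port` is a hypothetical helper invented here purely so the tests have something to call, and is not part of PyMongo.

```python
import unittest


def parse_port(value):
    """Hypothetical helper used only to give the tests something to call."""
    port = int(value)
    if not 0 < port < 65536:
        raise ValueError("port out of range: %r" % (port,))
    return port


class TestParsePort(unittest.TestCase):
    def test_returns_value(self):
        result = parse_port("27017")
        # Instead of self.assertIsNotNone(result), which needs Python 2.7:
        self.assertTrue(result is not None)
        self.assertEqual(27017, result)

    def test_rejects_bad_port(self):
        # Instead of "with self.assertRaises(ValueError):", pass the
        # callable and its arguments directly.
        self.assertRaises(ValueError, parse_port, "0")
```

The callable form of `assertRaises` works on every Python version, so it is a safe default in a test suite that has to span old and new interpreters.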

behackett · 2013-03-29T22:07:04Z

> Would you want a copy of the whole thing or just a subclass which changes the methods needed?

Let's do a local copy in thread_util.py so we don't have to worry about implementation differences in the various different interpreters / versions PyMongo supports.

> If we can ensure that there is no possible way that force will be used concurrently we can remove...

Definitely need to make sure force is only used by the replica set monitor. It's the only valid use case.

> Perhaps another venue would be better for this discussion, just let me know where.

Sure, my email address is in the readme. May be better to explain it all through email. The short version is connectTimeoutMS is meant to keep your application from hanging forever if the host is responding very slowly or not at all (DNS failure, network partition, flaky firewall dropping packets, the machine / vm is down, etc.) and is passed to socket.settimeout. waitQueueTimeoutMS is used with waitQueueMultiple to specify how long a thread is allowed to wait on a semaphore for a socket to be available before an exception is raised. For example, let's say you have a maxPoolSize of 10 and a waitQueueMultiple of 5. In that case the first 10 threads would immediately get a socket, the next 40 would be able to wait on a semaphore for waitQueueTimeoutMS millis before an exception is raised. Any thread beyond the first 50 would immediately get an exception.

> Unused import fixed. The one bit of dead code I see is the TODO, is there something else I'm missing?

I haven't done a thorough review yet but I don't think so.

Thanks again for working on this. =)
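
The maxPoolSize / waitQueueMultiple arithmetic in the comment above can be written out as a small function. This is only a model of the numbers given in that example (10 served immediately, 40 waiting, the rest rejected); `classify_threads` is a hypothetical name, and the released driver may count waiters differently.

```python
def classify_threads(num_threads, max_pool_size, wait_queue_multiple):
    """Split concurrent requests into the three groups from the example.

    Assumes, as the comment does, that the total number of threads either
    holding a socket or waiting for one is capped at
    max_pool_size * wait_queue_multiple.
    """
    limit = max_pool_size * wait_queue_multiple
    # Threads that get a socket right away.
    immediate = min(num_threads, max_pool_size)
    # Threads that block on the semaphore for up to waitQueueTimeoutMS.
    waiting = min(max(num_threads - max_pool_size, 0), limit - max_pool_size)
    # Threads that get an exception immediately.
    rejected = max(num_threads - limit, 0)
    return immediate, waiting, rejected


# The example from the comment, extended to 60 concurrent threads.
print(classify_threads(60, max_pool_size=10, wait_queue_multiple=5))
# -> (10, 40, 10)
```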

* Copy Condition as well to allow Semaphore to support greenlet/gevent pools without patching thread.
* Patch Condition with Python 2.x timeout support.
* Rename NoopSemaphore to DummySemaphore (gevent has one named this)
* Add greenlet/gevent support to Semaphore, BoundedSemaphore, Condition, and SynchronizedCounter.

reversefold · 2013-04-11T23:19:47Z

At this point I think that the pull request is about as good as it's going to get. There is still a request socket leak issue which appears to be a race condition in Python 2.6 when the time between accessing threadlocals and thread death is too small, leading me to think that the issue occurs when there is no context switch between these two events (or possibly just when no other thread accesses the threadlocal between these events). I still believe that this is an existing failure case with pymongo and Python 2.6 and is not related to the changes in this pull request, so merging this pull request should not introduce any more potential failure cases with regards to socket leakage.

ajdavis · 2013-04-15T19:55:57Z

Justin, I think we're ready to merge this. Could you please undo the "install" line in .travis.yml, and remove the "import atexit" statement from pool.py? Then squash this whole series of commits so we can bring it in as a single commit.

Thanks!

reversefold · 2013-04-15T20:23:47Z

Great, I'll do the squash in another branch and open a new pull request then.

As for the .travis.yml change, I needed that in order to get Travis CI working for me. Do you not want this merged because it is irrelevant to this patch or will it break your internal CI? ;-)

ajdavis · 2013-04-15T20:43:47Z

Oh, weird; if the Travis change is useful go ahead and include it. We don't use Travis much, our focus is on our internal Jenkins cluster.

behackett · 2013-04-15T21:38:11Z

pymongo/mongo_client.py

@@ -273,6 +281,9 @@ def __init__(self, host=None, port=None, max_pool_size=10,

        self.__net_timeout = options.get('sockettimeoutms')
        self.__conn_timeout = options.get('connecttimeoutms')
+        self.__wait_queue_timeout = options.get('waitqueuetimeoutms')
+        self.__wait_queue_multiple = options.get('waitqueuemultiple')


You need to add these waitqueue* options to pymongo.common.VALIDATORS. waitqueuetimeoutms can use validate_timeout_or_none and waitqueuemultiple can use validate_positive_integer_or_none. You should also add tests for this. As it is now neither of these options can be passed as keyword params or in a URI. Using either of them raises ConfigurationError.

Done. Thanks.
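
The validator registration requested in the review comment above might look roughly like the sketch below. The function and option names mirror those mentioned in the comment, but the bodies here are simplified stand-ins written for illustration; they are assumptions, not PyMongo's actual implementations, and `validate` and the local `ConfigurationError` are hypothetical.

```python
class ConfigurationError(Exception):
    """Stand-in for the driver's configuration error."""


def validate_positive_integer_or_none(option, value):
    """Accept None or a positive integer (assumed behaviour)."""
    if value is None:
        return value
    try:
        number = int(value)
    except (TypeError, ValueError):
        raise ConfigurationError("%s must be an integer or None" % (option,))
    if number <= 0:
        raise ConfigurationError("%s must be greater than 0" % (option,))
    return number


def validate_timeout_or_none(option, value):
    """Accept None or positive milliseconds; return seconds (assumed)."""
    if value is None:
        return value
    try:
        millis = float(value)
    except (TypeError, ValueError):
        raise ConfigurationError("%s must be a number or None" % (option,))
    if millis <= 0:
        raise ConfigurationError("%s must be greater than 0" % (option,))
    return millis / 1000.0


# Options are keyed by lower-cased name so that keyword arguments and URI
# parameters can share one table.
VALIDATORS = {
    "waitqueuetimeoutms": validate_timeout_or_none,
    "waitqueuemultiple": validate_positive_integer_or_none,
}


def validate(option, value):
    """Look up the validator by lower-cased option name and apply it."""
    lower = option.lower()
    if lower not in VALIDATORS:
        raise ConfigurationError("Unknown option %s" % (option,))
    return lower, VALIDATORS[lower](lower, value)
```

An option missing from the table is rejected outright in this sketch, which matches the symptom described in the review: until the waitqueue* entries are registered, passing either option raises ConfigurationError.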

reversefold · 2013-04-15T22:29:50Z

Yeah, Travis was failing due to bad permissions on shared memory, so the multiprocessing tests always failed. Of course now it's failing because, for some runs, mongo isn't responding. But that's a Travis issue, I think.

reversefold · 2013-04-16T00:37:16Z

Closing this pull request, will open a new one with the squashed PYTHON-436 branch.

Justin Patrin added 12 commits March 15, 2013 15:09

PYTHON-436 Initial support for capping the number of connections in a…

c523245

… pool. Passes most tests but causes 19 errors and 7 failures.

PYTHON-436 Fix double-return of the socket in _send_message

3286702

PYTHON-436 end_request already returns the socket to the pool, re-add…

079f819

…ing it is an error

PYTHON-436 All tests (that don't skip) pass with these hacks. Now tha…

c31e09d

…t tests pass I'll fix the tests to work with a Queue instead of a set

PYTHON-436 Remove the hacks and fix the tests to work with a Queue in…

e728b26

…stead of a set, now all current tests pass without the hacks

PYTHON-436 Remove incorrect super

9b27f83

Add max_open_sockets option to MongoClient

ccef8c1

PYTHON-436 Add tests for max_open_sockets in Pool

45e48fb

Add reversefold (me) to contributors

48562ce

PYTHON-436 It turns out the switch to Queue wasn't needed so I've rev…

f87ee21

…erted it. All tests still passing

PYTHON-436 Fix potential regression which wouldn't catch socket error…

1f7f764

…s properly

Merge branch 'master' of git://github.com/mongodb/mongo-python-driver

e016d13

Justin Patrin added 2 commits March 28, 2013 21:24

PYTHON-436 Fix tests to be Python 2.4 compatible

bab46d6

PYTHON-436 Remove unused import

e732fb4

Justin Patrin added 4 commits March 28, 2013 21:42

Merge branch 'master' of git://github.com/mongodb/mongo-python-driver

5d6843d

Conflicts: doc/contributors.rst

PYTHON-436 Backport Semaphore and BoundedSemaphore from CPython 3.2

f7abb5a

Remove unused import

73b6445

PYTHON-436 Replace max_size with max_open_connections. Change default

4ba2cb6

for max_size to 100. Some tests now fail and need to be carefully cleaned up or code fixed.

Justin Patrin added 7 commits March 29, 2013 15:09

PYTHON-436 When checking the request socket, acquire the semaphore

074d579

before opening a new connection.

PYTHON-436 use assertTrue instead of assert_

88a6d66

PYTHON-436 Add warning:: and fix version #

985ca64

PYTHON-436 only use force=True for pool.get_socket when called by the…

e5eb5a3

… Monitor

PYTHON-436 remove SynchronizedCounter and the uses of self._lock

dec88be

required by it in favor of marking sock_info as forced and letting that help us keep the semaphore clean.

PYTHON-436 use gevent.thread.allocate_lock() instead of RLock to

d2d542f

emulate threading.Lock exactly.

Justin Patrin added 2 commits April 11, 2013 15:04

PYTHON-436 remove the unneeded call to the ident, fixing issues with

e948e76

Python 2.6 and on_thread_died not being called

PYTHON-436 add no-rendezvous tests which expose a race condition issue

7ac4a56

with Python 2.6, but don't fail the test for Python < 2.7 for now

Justin Patrin added 8 commits April 11, 2013 16:44

PYTHON-436 better skip error

35cb327

Fix tests on Travis CI which require mutiprocessing

a5cc7e6

THON-436 remove temporary tests that weren't meant to be committed

b07c18a

make sure that mongodb is installed and running on travis ci

a93f55b

Keep existing conf file

355f10c

forgot sudo for mv

10c9e68

Apparently mongodb is already installed, just make sure it's started

58f1cb4

Never mind, mongo just appears to be broken sometimes on Travis CI

b4926b2

behackett reviewed Apr 15, 2013

Justin Patrin added 7 commits April 15, 2013 15:56

Merge branch 'master' of git://github.com/mongodb/mongo-python-driver

4d4e578

PYTHON-436 add wait_queue options to validators

753dd28

PYTHON-436 unused import

59d7d3e

PYTHON-436 Add missing :

a354932

PYTHON-436 queries -> operations

fa0e7a0

PYTHON-436 Change max_pool_size to limit the maximum concurrent conne…

28a6952

…ctions rather than just the idle connections in the pool. Also add support for waitQueueTimeoutMS and waitQueueMultiple.

Merge branch 'PYTHON-436'

a1beeaa

reversefold closed this Apr 16, 2013

reversefold mentioned this pull request Apr 16, 2013

PYTHON-436 Change max_pool_size to limit the maximum concurrent connections #174

Closed
