Skip to content

PYTHON-436 Add support for a maximum number of open connections #163

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 74 commits into from

Conversation

reversefold
Copy link
Contributor

This patch adds a semaphore to keep track of sockets opened and released from the pool in order to be able to enforce a maximum number of open connections. I've also fixed a few places where sockets were being leaked from the pool (never returned) and an extra socket return in a test.

Timeout support for the semaphore is added in this patch to support connection timeouts properly.

Changes have been made to make sure that internal functions can always get a socket from the pool. even if it's at max, so it's possible for more than the configured max to be opened, but these "forced" connections are still tracked and should always be returned.

https://jira.mongodb.org/browse/PYTHON-436

@behackett
Copy link
Member

Hi Justin,

Thanks for working on this. I've milestoned the related ticket for PyMongo 2.6 (2.5 will ship this week). Would you be willing to do more work on this for that release?

@reversefold
Copy link
Contributor Author

I'd be happy to finish this off, but I'd like some guidance on what else is needed for the pull request to be accepted. I'm looking into the additional parameters that PYTHON-436 references but this pull request is complete in that max_open_sockets works as max_pool_size is expected to.

@behackett
Copy link
Member

Here is a list of things we think should change to finish this off:

  • We've decided to just change the behavior of max_pool_size to be what people expect, instead of adding a new parameter (I'll update the related ticket). In the process we should bump the default to 100 to match other drivers and add a big warning to the docs about the behavior change. This should be OK since users generally find the current behavior very surprising.
  • Semaphore should just be backported from CPython 3.3 to thread_util.py. We also have to deal with Jython but I'm happy to do that myself.
  • The force option is only used by the replica set monitor to get a single socket to use for "ismaster" calls every 30 seconds. I'm not convinced SynchronizedCounter is necessary to deal with this. Without it the mutex acquire isn't required and neither are the new calls to maybe_return_socket (unless I'm missing something).
  • You're using connectTimeoutMS for the semaphore acquire timeout. There should be a separate config option for this called waitQueueTimeoutMS (see http://docs.mongodb.org/manual/reference/connection-string/#connection-pool-options). The idea is that, if we have more threads than the size of the pool, threads will wait for waitQueueTimeOutMS millis before an exception is raised. In comparison, connectTimeoutMS defines how long we are willing to wait for the host to respond to a connection attempt (it's a socket level setting).
  • PyMongo supports CPython back to 2.4 so you can't use things like assertIsNotNone in unittests.
  • There is a change to an assert in mongo_client.py that seems unnecessary.
  • There is some dead code, unused imports, and TODOs that should be dealt with.

Thanks again for working on this. We really appreciate it. If this is more work than you have time to handle we're happy to merge what you've already sent and make the modifications. You'll get credit in git history for this commit. Let us know.

@reversefold
Copy link
Contributor Author

  • Ok, I'll just switch the parameter to max_pool_size and get a note in the docblock.
  • Ah, I didn't think it would just be python code, I'll look into grabbing it. Would you want a copy of the whole thing or just a subclass which changes the methods needed?
  • The altered and added calls for maybe_return_socket are to make sure that sockets aren't leaked. I believe if those bits aren't patched then the new max_pool_size is going to end up leaking, then blocking even though there are "unused" connections. We need to make absolutely sure we're keeping track of all connections for this to work properly. As for force, I just put it in the places that seemed to make sense. If all that is cared about is ismaster I'll try to move them there. For the counter, I put it in to be absolutely sure that there was no possibility of leaking or miscounting sockets. If we can ensure that there is no possible way that force will be used concurrently we can remove it but I'd be worried of edge cases, especially under heavy use/load.
  • When I wrote the patch I didn't realize there was the additional parameter and did what made sense to me. Personally I like the idea of having a single timeout as it means you're only ever waiting for that one time period whether or not you're getting a new connection or one from the pool. Since the caller has no way to know which will happen it makes sense for the driver to abstract it away and let the caller only have to worry about a single timeout rather than one of 2. Then again, if we're looking for parity with what else is out there, multiple parameters is slightly less complicated to implement. I worry, though, that it's possible for both timeouts to to take effect in a single call, which would make planning for overall timeout harder. (waitQueueTimeoutMS to wait for the semaphore, then connectTimeoutMS to connect a new socket if the released socket was closed instead of returned for some reason.) Perhaps another venue would be better for this discussion, just let me know where.
  • I forgot about CPython 2.4. I've fixed the assertIsNotNone and with self.assertRaises. Does anything else stand out to you that isn't 2.4 compatible? (So unfortunate not to have contextmanagers to work with :-( )
  • I changed the assert as I noticed that there was a duplication of the struct.unpack() call in the test and the message, so I moved that to a variable. Totally not related directly to the pull request so I can revert if needed.
  • Unused import fixed. The one bit of dead code I see is the TODO, is there something else I'm missing?

Thanks for the feedback!

@behackett
Copy link
Member

Would you want a copy of the whole thing or just a subclass which changes the methods needed?

Let's do a local copy in thread_util.py so we don't have to worry about implementation differences in the various different interpreters / versions PyMongo supports.

If we can ensure that there is no possible way that force will be used concurrently we can remove...

Definitely need to make sure force is only used by the replica set monitor. It's the only valid use case.

Perhaps another venue would be better for this discussion, just let me know where.

Sure, my email address is in the readme. May be better to explain it all through email. The short version is connectTimeoutMS is meant to keep your application from hanging forever if the host is responding very slowly or not at all (DNS failure, network partition, flaky firewall dropping packets, the machine / vm is down, etc.) and is passed to socket.settimeout. waitQueueTimeoutMS is used with waitQueueMultiple to specify how long a thread is allowed to wait on a semaphore for a socket to be available before an exception is raised. For example, let's say you have a maxPoolSize of 10 and a waitQueueMultiple of 5. In that case the first 10 threads would immediately get a socket, the next 40 would be able to wait on a semaphore for waitQueueTimeoutMS millis before an exception is raised. Any thread beyond the first 50 would immediately get an exception.

Unused import fixed. The one bit of dead code I see is the TODO, is there something else I'm missing?

I haven't done a thorough review yet but I don't think so.

Thanks again for working on this. =)

Justin Patrin added 7 commits March 29, 2013 15:09
* Copy Condition as well to allow Semaphore to support
  greenlet/gevent pools without patching thread.
* Patch Condition with Python 2.x timeout support.
* Rename NoopSemaphore to DummySemaphore (gevent has one named this)
* Add greenlet/gevent support to Semaphore, BoundedSemaphore,
  Condition, and SynchronizedCounter.
required by it in favor of marking sock_info as forced and letting
that help us keep the semaphore clean.
Justin Patrin added 2 commits April 11, 2013 15:04
Python 2.6 and on_thread_died not being called
with Python 2.6, but don't fail the test for Python < 2.7 for now
@reversefold
Copy link
Contributor Author

At this point I think that the pull request is about as good as it's going to get. There is still a request socket leak issue which appears to be a race condition in Python 2.6 when the time between accessing threadlocals and thread death is too small, leading me to think that the issue occurs when there is no context switch between these two events (or possibly just when no other thread accesses the threadlocal between these events). I still believe that this is an existing failure case with pymongo and Python 2.6 and is not related to the changes in this pull request, so merging this pull request should not introduce any more potential failure cases with regards to socket leakage.

@ajdavis
Copy link
Member

ajdavis commented Apr 15, 2013

Justin, I think we're ready to merge this. Could you please undo the "install" line in .travis.yml, and remove the "import atexit" statement from pool.py? Then squash this whole series of commits so we can bring it in as a single commit.

Thanks!

@reversefold
Copy link
Contributor Author

Great, I'll do the squash in another branch and open a new pull request then.

As for the .travis.yml change, I needed that in order to get Travis CI working for me. Do you not want this merged because it is irrelevant to this patch or will it break your internal CI? ;-)

@ajdavis
Copy link
Member

ajdavis commented Apr 15, 2013

Oh, weird; if the Travis change is useful go ahead and include it. We don't
use Travis much, our focus is on our internal Jenkins cluster.

On Mon, Apr 15, 2013 at 4:23 PM, Justin Patrin [email protected]:

Great, I'll do the squash in another branch and open a new pull request
then.

As for the .travis.yml change, I needed that in order to get Travis CI
working for me. Do you not want this merged because it is irrelevant to
this patch or will it break your internal CI? ;-)


Reply to this email directly or view it on GitHubhttps://github.com//pull/163#issuecomment-16409172
.

@@ -273,6 +281,9 @@ def __init__(self, host=None, port=None, max_pool_size=10,

self.__net_timeout = options.get('sockettimeoutms')
self.__conn_timeout = options.get('connecttimeoutms')
self.__wait_queue_timeout = options.get('waitqueuetimeoutms')
self.__wait_queue_multiple = options.get('waitqueuemultiple')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add these waitqueue* options to pymongo.common.VALIDATORS. waitqueutimeoutms can use validate_timeout_or_none and waitqueuemultiple can use validate_positive_integer_or_none. You should also add tests for this. As it is now neither of these options can be passed as keyword params or in a URI. Using either of them raises ConfigurationError.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks.

@reversefold
Copy link
Contributor Author

Yeah, Travis was failing due to bad permissions on shared memory, so the
multiprocessing tests always failed. Of course now it's failing because,
for some runs, mongo isn't responding. But that's a Travis issue, I think.

On Mon, Apr 15, 2013 at 1:43 PM, A. Jesse Jiryu Davis <
[email protected]> wrote:

Oh, weird; if the Travis change is useful go ahead and include it. We
don't
use Travis much, our focus is on our internal Jenkins cluster.

On Mon, Apr 15, 2013 at 4:23 PM, Justin Patrin [email protected]:

Great, I'll do the squash in another branch and open a new pull request
then.

As for the .travis.yml change, I needed that in order to get Travis CI
working for me. Do you not want this merged because it is irrelevant to
this patch or will it break your internal CI? ;-)


Reply to this email directly or view it on GitHub<
https://github.com/mongodb/mongo-python-driver/pull/163#issuecomment-16409172>

.


Reply to this email directly or view it on GitHubhttps://github.com//pull/163#issuecomment-16410244
.

Justin Patrin
Developer Extraordinaire

@reversefold
Copy link
Contributor Author

Closing this pull request, will open a new one with the squashed PYTHON-436 branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants