Performance fix (Python 2.7 only): Make sure a buffer is added to socket file object #277
By default httplib doesn't use buffering when converting sockets to file objects for reading.
httplib also reads in headers from the HTTP response with one read() per byte in the headers.
With httplib's default behaviour, this can mean (for example under Linux) one recvfrom() syscall per byte in the headers. That overhead can add around 1 millisecond per header byte to each request made through httplib.
This blows out the time for any Riak call over HTTP by roughly 400ms, all of it spent in Python's httplib. Riak itself sends all HTTP response headers in one TCP frame, so none of this is latency on the wire.
This is covered by Python bug http://bugs.python.org/issue4879
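The per-byte read pattern described above can be illustrated without a network. The following is a minimal sketch, not httplib itself: it uses the standard `io` module as a stand-in for the socket file object, and `CountingRaw`, `read_bytewise` and the sample header bytes are hypothetical names and data chosen only for illustration. It counts how often the underlying raw read is hit when a reader pulls one byte at a time, with and without a buffer in between.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical raw stream that counts calls to the underlying read."""

    def __init__(self, data):
        self._inner = io.BytesIO(data)
        self.calls = 0

    def readable(self):
        return True

    def readinto(self, b):
        # Each call here stands in for one recv-style syscall on a socket.
        self.calls += 1
        return self._inner.readinto(b)


def read_bytewise(stream, n):
    """Read n bytes one at a time, the way the header parser is described."""
    return b"".join(stream.read(1) for _ in range(n))


# Sample data only; not taken from a real Riak response.
headers = b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"

# Unbuffered: every one-byte read reaches the raw stream.
raw = CountingRaw(headers)
unbuffered_data = read_bytewise(raw, len(headers))
unbuffered_calls = raw.calls

# Buffered: the buffer fetches a large chunk, later reads are served from it.
raw2 = CountingRaw(headers)
buffered_data = read_bytewise(io.BufferedReader(raw2), len(headers))
buffered_calls = raw2.calls

print("unbuffered raw reads:", unbuffered_calls)
print("buffered raw reads:  ", buffered_calls)
```

In the unbuffered case the raw read count equals the number of header bytes, while the buffered reader needs far fewer raw reads for the same data. On a real socket each raw read would be a syscall, which is where the per-byte cost comes from.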
To resolve this, in Python 2.7 you can now specify 'buffering=True' to set a buffer when making a file object from the socket.
Unfortunately, on Python 2.6 or earlier this is going to fail, because httplib is hardcoded to use an unbuffered file object. From the HTTPResponse constructor in python2.6/httplib.py:330:
self.fp = sock.makefile('rb', 0)
So there isn't really a clean solution that I can see using the built-in httplib library with earlier Python versions. To at least make it faster on Python 2.7, surrounding the call with a try/except for NameError allows the code to still run on earlier versions (no slower than it is now).
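A minimal sketch of that fallback, under stated assumptions: `get_response_buffered` is a hypothetical helper name, and `conn` is assumed to be an httplib `HTTPConnection`-like object with a request already sent. The description mentions NameError; passing an unexpected keyword argument normally raises TypeError in Python, so this sketch catches TypeError instead. That choice is an assumption and may not match the actual change in this pull request.

```python
def get_response_buffered(conn):
    """Request a buffered response where supported, else fall back.

    Assumption: on Python 2.7, getresponse() accepts buffering=True;
    on versions without that keyword the call raises TypeError.
    """
    try:
        # Faster path: the response file object gets a read buffer.
        return conn.getresponse(buffering=True)
    except TypeError:
        # Keyword not supported: behave exactly as before (unbuffered).
        return conn.getresponse()
```

One caveat with this approach: a TypeError raised for some other reason inside `getresponse()` would also trigger the fallback call, so a stricter variant might check the Python version up front instead.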