Implement threaded parallel fetches. #263
Conversation
Just a comment: is it possible to avoid using the python …
```python
__all__ = ['bm', 'bmbm']

def bmbm():
```
cryptic!
@shuhaowu I'm already using …
Yeah. I was concerned about this as using …
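For context, Ruby's `Benchmark` module prints a labelled wall-clock timing for each block (`bm`), with `bmbm` adding a rehearsal pass. A minimal Python analogue of the `bm` entry point might look like the following; the signature is illustrative only, not the actual API added in this patch:

```python
import time


def bm(label, fn, *args):
    """Time a single call to fn, printing the label and elapsed
    wall-clock seconds, loosely in the spirit of Ruby's Benchmark.bm."""
    start = time.time()
    result = fn(*args)
    elapsed = time.time() - start
    print("%-20s %.6f" % (label, elapsed))
    return result
```

A rehearsal-then-measure `bmbm` variant would simply call the function once, discard the timing, and then time it again.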
""" | ||
|
||
def __init__(self, size=POOL_SIZE): | ||
self._inq = Queue() |
It appears that this Queue is unbounded. That said, given how this feature is likely to be used and the typical volume of keys involved, this would probably only bite someone in the absolute worst case.
👍
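One way to bound the queue, sketched here with an assumed `POOL_SIZE` constant and a hypothetical `backlog` parameter (neither is part of the patch as shown), is to pass `maxsize` to `Queue`, so that once the backlog fills, `put` blocks and applies backpressure to the enqueuer instead of growing memory without limit:

```python
from queue import Queue

POOL_SIZE = 4  # hypothetical; the patch derives worker count from the CPU count


class BoundedPool(object):
    def __init__(self, size=POOL_SIZE, backlog=1000):
        # maxsize caps memory use: put() blocks (backpressure) once
        # `backlog` tasks are waiting for a worker.
        self._inq = Queue(maxsize=backlog)

    def submit(self, task):
        self._inq.put(task)  # blocks while the queue is full
```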
Implement threaded parallel fetches.
The implementation is based on a static worker pool that is started when the first multiget operation is performed. Workers can be reused across multiget operations and feed their responses back to the requestor via a queue. This is similar to the `ThreadPoolExecutor` idea in Java. There are opportunities to make this pool configurable, but for the moment it is not; the CPU count is used as a measure of how many workers to start.

This also adds a little benchmark utility that mimics Ruby's `benchmark.rb`. Unfortunately, the benchmarks indicate that HTTP gets no benefit from parallel fetch; in fact, it suffers. I am unable to understand how this is possible unless the payload size is too small, such that generating the requests and parsing the responses dominates network latency. Another possibility would be to allow the pool to use `multiprocessing` instead of `threading`, which gets around the GIL but incurs the cost of crossing process boundaries.

Addresses #225.