Implement threaded parallel fetches. #263

Merged
merged 6 commits into from
Jul 23, 2013
Conversation

seancribbs

The implementation is based on a static worker pool that is started when the first multiget operation is performed. Workers are reused across multiget operations and feed their responses back to the requestor via a queue, similar to the ThreadPoolExecutor idea in Java. There are opportunities to make this pool configurable, but for the moment it simply uses the CPU count to decide how many workers to start.
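As a rough illustration of the design described above (a minimal sketch only — `MultiGetPool`, `multiget`, and the task tuple shape are hypothetical names for this example, not the PR's actual code), a static pool of worker threads can pull fetch tasks from a shared input queue and push each response onto a queue owned by the requesting call:

```python
import threading
from multiprocessing import cpu_count
from queue import Queue  # Python 3; the 2013-era code would use `from Queue import Queue`

POOL_SIZE = cpu_count()  # assumed default, mirroring the CPU-count heuristic


class MultiGetPool:
    """Sketch of a static, lazily started worker pool.

    Workers pull (fetch, key, outq) tasks from a shared input queue
    and put (key, result) pairs onto the requestor's private queue.
    """

    def __init__(self, size=POOL_SIZE):
        self._inq = Queue()
        self._threads = []
        self._size = size
        self._started = False

    def start(self):
        # Started on the first multiget operation, then reused.
        if not self._started:
            for _ in range(self._size):
                t = threading.Thread(target=self._worker)
                t.daemon = True
                t.start()
                self._threads.append(t)
            self._started = True

    def enqueue(self, task):
        self._inq.put(task)

    def _worker(self):
        while True:
            fetch, key, outq = self._inq.get()
            try:
                outq.put((key, fetch(key)))
            except Exception as err:
                # Deliver the error to the requestor rather than dying.
                outq.put((key, err))
            finally:
                self._inq.task_done()


def multiget(pool, fetch, keys):
    """Fan keys out to the pool; collect responses from a private queue."""
    pool.start()
    outq = Queue()
    for key in keys:
        pool.enqueue((fetch, key, outq))
    return dict(outq.get() for _ in keys)
```

Because the pool is module-level state, repeated multiget calls reuse the same threads instead of paying thread-startup cost per batch.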

This also adds a little benchmark utility that mimics Ruby's benchmark.rb. Unfortunately, the benchmarks indicate that HTTP gets no benefit from parallel fetches; in fact, it suffers. I am unable to understand how this is possible unless the payload size is so small that generating the requests and parsing the responses dominates network latency. Another possibility would be to allow the pool to use multiprocessing instead of threading, which gets around the GIL but incurs the cost of crossing process boundaries.

$ python -m riak.client.multiget
Benchmarking multiget:
      CPUs: 8
   Threads: 8
      Keys: 10000

             user         system       ( real         )
populate         2.460000     0.210000 (    10.640000 )


Rehearsal -------------------------------------------------
http seq        11.750000     4.160000 (    26.500000 )
http multi      21.650000    20.810000 (    31.190000 )
pbc seq          2.010000     0.180000 (     6.530000 )
pbc multi        4.080000     1.990000 (     4.840000 )
-----------------------------------------------------------

             user         system       ( real         )
http seq        13.040000     4.290000 (    28.080000 )
http multi      21.340000    20.430000 (    30.660000 )
pbc seq          2.110000     0.200000 (     6.570000 )
pbc multi        3.890000     1.920000 (     4.580000 )
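The output above follows the Ruby `benchmark.rb` convention of a throwaway "Rehearsal" pass followed by the real timed pass. A minimal sketch of such a `bmbm` helper might look like this (the signature and formatting are assumptions for illustration, not the PR's actual `riak.benchmark` code):

```python
import os
import time


def bmbm(items, label_width=12):
    """Ruby-style bmbm sketch: run every benchmark once as a rehearsal
    (warming caches and pools), then run the timed pass for real.

    `items` is a list of (label, callable) pairs.
    """
    def report(label, fn):
        cpu0, real0 = os.times(), time.time()
        fn()
        cpu1, real1 = os.times(), time.time()
        user = cpu1[0] - cpu0[0]      # user CPU time
        system = cpu1[1] - cpu0[1]    # system CPU time
        print("%-*s %12.6f %12.6f ( %12.6f )" %
              (label_width, label, user, system, real1 - real0))

    print("Rehearsal ".ljust(label_width + 47, "-"))
    for label, fn in items:
        report(label, fn)
    print("-" * (label_width + 47))
    print()
    for label, fn in items:
        report(label, fn)
```

Reporting user and system CPU time alongside wall-clock ("real") time is what makes the GIL contention visible: in the `http multi` rows above, CPU time exceeds the sequential case even though wall time barely moves.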

Addresses #225.

@shuhaowu
Contributor

Just a comment: is it possible to avoid using Python threading?

__all__ = ['bm', 'bmbm']


def bmbm():
Contributor


cryptic!

@seancribbs
Author

@shuhaowu I'm already using threading.

@ghost ghost assigned seancribbs Jun 26, 2013
@shuhaowu
Contributor

Yeah. I was concerned about this, as using threading to parallelize operations in Python usually causes performance degradation (unless the I/O wait is really long, which it shouldn't be in this case).
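The concern is the GIL: CPU-bound Python threads serialize on the interpreter lock, while I/O-bound threads can overlap their waits. A small illustrative sketch (not from the PR; the helper names are made up for this example):

```python
import threading
import time


def run_threaded(fn, n=4):
    """Run `fn` concurrently in `n` threads; return elapsed wall time."""
    threads = [threading.Thread(target=fn) for _ in range(n)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start


def io_bound():
    time.sleep(0.1)  # stands in for waiting on a socket


def cpu_bound():
    sum(i * i for i in range(200_000))  # pure-Python work holds the GIL

# Four io_bound threads overlap their sleeps: ~0.1 s total, not 0.4 s.
# Four cpu_bound threads contend for the GIL: roughly the single-thread
# time multiplied by the thread count, which matches the `http multi`
# numbers where CPU time ballooned without improving wall time.
```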

"""

def __init__(self, size=POOL_SIZE):
self._inq = Queue()

It appears that this Queue is unbounded. Although, given the efficacy of this feature combined with the typical volume of keys involved, this would probably only bite someone in the absolute worst case.
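One way to address the unbounded-queue concern (a sketch of a possible mitigation, not something the PR does; `BACKLOG` is an assumed, tunable bound) is to give the input queue a `maxsize` so producers block and apply backpressure once the workers fall behind:

```python
import queue
from queue import Queue

BACKLOG = 1024  # hypothetical bound on outstanding fetch tasks

# With a maxsize, put() blocks when the queue is full instead of letting
# the backlog grow without limit; put_nowait() raises queue.Full, letting
# the caller decide how to shed load instead.
inq = Queue(maxsize=BACKLOG)
```

The trade-off is that a full queue now stalls the enqueueing thread, so the bound should comfortably exceed the largest expected multiget batch.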

@mgodave

mgodave commented Jul 23, 2013

👍

seancribbs pushed a commit that referenced this pull request Jul 23, 2013
Implement threaded parallel fetches.
@seancribbs seancribbs merged commit f8cd2e5 into master Jul 23, 2013
@seancribbs seancribbs deleted the gh225-multi-get branch July 23, 2013 17:53
@seancribbs seancribbs removed their assignment May 8, 2015
4 participants