Speed up Gauss3.gauss() by using the imglib2 "Parallelization" class #93
Conversation
This simplifies the code, but also improves the performance, because Gauss3.gauss(sigma, source, target) will no longer create its own ExecutorService. This also means that the multi-threading behaviour of Gauss3 can now conveniently be controlled using the Parallelization class. For example: Parallelization.runSingleThreaded(() -> { Gauss3.gauss(sigma, source, target); });
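To make the example above concrete, here is a minimal, self-contained sketch of controlling the threading of Gauss3.gauss via Parallelization. It is not part of the PR; the image setup, package paths, and the exception handling are my own assumptions.

```java
import net.imglib2.algorithm.gauss3.Gauss3;
import net.imglib2.img.Img;
import net.imglib2.img.array.ArrayImgs;
import net.imglib2.parallel.Parallelization;
import net.imglib2.type.numeric.real.FloatType;
import net.imglib2.view.Views;

public class GaussParallelizationExample
{
	public static void main( String[] args ) throws Exception
	{
		// 100x100 source and target images (illustrative setup, not from the PR).
		Img< FloatType > source = ArrayImgs.floats( 100, 100 );
		Img< FloatType > target = ArrayImgs.floats( 100, 100 );
		double[] sigma = { 2.0, 2.0 };

		// Default: Gauss3 uses the current Parallelization context,
		// which by default runs on the common ForkJoinPool.
		Gauss3.gauss( sigma, Views.extendBorder( source ), target );

		// Force single-threaded execution, as in the commit message.
		Parallelization.runSingleThreaded( () -> {
			try
			{
				Gauss3.gauss( sigma, Views.extendBorder( source ), target );
			}
			catch ( Exception e )
			{
				throw new RuntimeException( e );
			}
		} );
	}
}
```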
Commits c851682 to 425b057
@@ -126,10 +127,9 @@
*/
public static < S extends NumericType< S >, T extends NumericType< T > > void gauss( final double[] sigma, final RandomAccessible< S > source, final RandomAccessibleInterval< T > target ) throws IncompatibleTypeException
{
final int numthreads = Runtime.getRuntime().availableProcessors();
Please update the javadoc:
"Computation is multi-threaded with as many threads as processors available."
should be something like
"...Computation may be multi-threaded, according to the current {@link Parallelization} context. (By default, runs on the common ForkJoinPool ...)"
Good point. I updated the javadoc.
Very nice! Could you put a bit more work into
I played a bit with it, and the speedup seems to be not only due to the creation/shutdown of the ExecutorService. It also seems to be due to using the common ForkJoinPool.
@tpietzsch Thanks for the review. I updated the javadoc as you suggested. Strange that the ForkJoinPool is overall faster than the FixedThreadPool. I have no idea why.
Is there something else to do or is the PR ready for merge?
If I understand correctly, the ForkJoinPool will start new threads to maintain its parallelism level if other threads are stalled. I think that must be it, but I don't really see where that would help the convolution.
Cool, it's merged 👍 ✨ I have a slightly different understanding of the ForkJoinPool. Usually, when using a thread pool, it goes like this: all the tasks are submitted to the pool, and the thread that submitted them waits for them to finish. During this time that thread is basically blocked. This is different when the ForkJoinPool is used together with ForkJoinTasks: here the thread submitting the tasks will not only wait for them to finish, it will also help processing them. That's the trick.
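To illustrate the behaviour described above, here is a generic fork/join sketch in plain JDK code, unrelated to the imglib2 internals: inside compute(), the thread forks one half of the work and processes the other half itself instead of just blocking, and worker threads that reach join() can pick up pending subtasks in the meantime.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Minimal divide-and-conquer sum using ForkJoinTasks.
class SumTask extends RecursiveTask< Long >
{
	private final long[] data;
	private final int from, to;

	SumTask( long[] data, int from, int to )
	{
		this.data = data;
		this.from = from;
		this.to = to;
	}

	@Override
	protected Long compute()
	{
		if ( to - from < 10_000 )
		{
			long sum = 0;
			for ( int i = from; i < to; i++ )
				sum += data[ i ];
			return sum;
		}
		int mid = ( from + to ) / 2;
		SumTask left = new SumTask( data, from, mid );
		SumTask right = new SumTask( data, mid, to );
		left.fork();                     // hand the left half to the pool
		long rightSum = right.compute(); // work on the right half ourselves instead of blocking
		long leftSum = left.join();      // wait for the left half
		return leftSum + rightSum;
	}

	public static void main( String[] args )
	{
		long[] data = new long[ 1_000_000 ];
		Arrays.fill( data, 1 );
		long sum = ForkJoinPool.commonPool().invoke( new SumTask( data, 0, data.length ) );
		System.out.println( sum ); // 1000000
	}
}
```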
I don't think that's the only thing. The waiting thread doesn't consume any CPU really, so it would be just a matter of adding more threads to the FixedThreadPool. That actually helps. On my computer, the common pool parallelism is 11, numProcessors is 12. If I use a FixedThreadPool with 24 threads, I get much closer to common pool performance than with a FixedThreadPool of 12 threads. But common pool is still a bit faster. It could be that this is due to other effects like different number of tasks, etc.
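The 11 vs. 12 observation matches the JDK default: the common pool's target parallelism is availableProcessors() - 1. A small sketch (my own, just to show where those numbers come from):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;

public class PoolSizes
{
	public static void main( String[] args )
	{
		int processors = Runtime.getRuntime().availableProcessors();
		// Defaults to processors - 1, e.g. 11 on a 12-processor machine.
		int commonParallelism = ForkJoinPool.commonPool().getParallelism();
		System.out.println( "processors = " + processors + ", common pool parallelism = " + commonParallelism );

		// An over-subscribed fixed pool, as tried in the comment above (2x processors).
		ExecutorService fixed = Executors.newFixedThreadPool( 2 * processors );
		fixed.shutdown();
	}
}
```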
On my computer the story is completely different. The CPU has 4 cores and does Hyperthreading, so it can actively run 8 threads in parallel. There's not much of a performance difference between a FixedThreadPool with 4 or 8 threads, and ForkJoinPool also shows similar performance... |
This PR changes how multi-threading is done in the class Gauss3 and the package net.imglib2.algorithm.convolution. The code now simply uses the multi-threading that's built into LoopBuilder. This noticeably increases the performance, because Gauss3.gauss(sigma, source, target) no longer creates its own ExecutorService. It instead uses imglib2's Parallelization class, which by default uses the global ForkJoinPool. This reduces the number of threads that the code creates, and thereby improves the performance.

For Labkit's pixel classification the speed difference is roughly 30%. (Labkit makes heavy use of imglib2 convolutions.)
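For readers unfamiliar with LoopBuilder's built-in multi-threading, here is a rough sketch of the pattern the PR relies on. It is not code from the PR; it assumes LoopBuilder's multiThreaded() option, which, as I understand it, defers to the current Parallelization context.

```java
import net.imglib2.img.Img;
import net.imglib2.img.array.ArrayImgs;
import net.imglib2.loops.LoopBuilder;
import net.imglib2.parallel.Parallelization;
import net.imglib2.type.numeric.real.FloatType;

public class LoopBuilderParallelizationExample
{
	public static void main( String[] args )
	{
		Img< FloatType > in = ArrayImgs.floats( 100, 100 );
		Img< FloatType > out = ArrayImgs.floats( 100, 100 );

		// multiThreaded() picks up the current Parallelization context,
		// so by default this runs on the common ForkJoinPool ...
		LoopBuilder.setImages( in, out ).multiThreaded().forEachPixel( ( i, o ) -> o.set( i ) );

		// ... and runs single-threaded when wrapped accordingly.
		Parallelization.runSingleThreaded( () ->
				LoopBuilder.setImages( in, out ).multiThreaded().forEachPixel( ( i, o ) -> o.set( i ) ) );
	}
}
```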
A simple benchmark also shows the difference. Blurring a small 100x100 pixel image with Gauss3.gauss now takes 0.22 ms on my machine. The old code required 0.58 ms.
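For reference, a rough timing sketch along these lines (not the author's actual benchmark; the warm-up and iteration counts are arbitrary choices of mine):

```java
import net.imglib2.algorithm.gauss3.Gauss3;
import net.imglib2.img.Img;
import net.imglib2.img.array.ArrayImgs;
import net.imglib2.type.numeric.real.FloatType;
import net.imglib2.view.Views;

public class GaussTimingSketch
{
	public static void main( String[] args ) throws Exception
	{
		Img< FloatType > source = ArrayImgs.floats( 100, 100 );
		Img< FloatType > target = ArrayImgs.floats( 100, 100 );
		double[] sigma = { 2.0, 2.0 };

		// Warm up the JIT before measuring.
		for ( int i = 0; i < 100; i++ )
			Gauss3.gauss( sigma, Views.extendBorder( source ), target );

		int iterations = 1000;
		long start = System.nanoTime();
		for ( int i = 0; i < iterations; i++ )
			Gauss3.gauss( sigma, Views.extendBorder( source ), target );
		long end = System.nanoTime();
		System.out.println( "mean time per blur: " + ( end - start ) / 1e6 / iterations + " ms" );
	}
}
```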