Speed up Gauss3.gauss() by using the imglib2 "Parallelization" class #93


Merged: 3 commits merged into master on Nov 9, 2021

Conversation

@maarzt (Contributor) commented Nov 5, 2021

This PR changes how multi-threading is done in the class Gauss3 and the package net.imglib2.algorithm.convolution. The code now simply uses the multi-threading that's built into LoopBuilder.

This noticeably improves performance. The reason is that Gauss3.gauss(sigma, source, target) no longer creates its own ExecutorService. It instead uses imglib2's Parallelization class, which by default uses the common ForkJoinPool. This reduces the number of threads that the code creates and thereby improves performance.

For Labkit's pixel classification the speed difference is roughly 30%. (Labkit makes heavy use of imglib2 convolutions.)
A simple benchmark also shows the difference: blurring a small 100x100 pixel image with Gauss3.gauss now takes 0.22 ms on my machine, whereas the old code required 0.58 ms.
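
For illustration, a rough timing sketch along those lines (not the actual benchmark behind these numbers; the class name, the all-zero image contents, sigma = 2, the mirror extension, and the warm-up loop are arbitrary choices):

import net.imglib2.algorithm.gauss3.Gauss3;
import net.imglib2.img.Img;
import net.imglib2.img.array.ArrayImgs;
import net.imglib2.type.numeric.real.FloatType;
import net.imglib2.view.Views;

public class GaussTimingSketch
{
	public static void main( final String... args ) throws Exception
	{
		// 100x100 pixel image, as in the description above; the pixel values (all zero) don't matter for timing.
		final Img< FloatType > source = ArrayImgs.floats( 100, 100 );
		final Img< FloatType > target = ArrayImgs.floats( 100, 100 );

		// Warm up the JIT before measuring.
		for ( int i = 0; i < 100; i++ )
			Gauss3.gauss( 2.0, Views.extendMirrorSingle( source ), target );

		final long start = System.nanoTime();
		Gauss3.gauss( 2.0, Views.extendMirrorSingle( source ), target );
		System.out.println( ( System.nanoTime() - start ) / 1e6 + " ms" );
	}
}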

This simplifies the code, but also improves the performance,
because Gauss3.gauss(sigma, source, target) will no longer create
its own ExecutorService.

This also means that the multi-threading behaviour of Gauss3
can now conveniently be controlled using the Parallelization class.
For example:

Parallelization.runSingleThreaded(() -> {
	Gauss3.gauss(sigma, source, target);
});
@@ -126,10 +127,9 @@
*/
public static < S extends NumericType< S >, T extends NumericType< T > > void gauss( final double[] sigma, final RandomAccessible< S > source, final RandomAccessibleInterval< T > target ) throws IncompatibleTypeException
{
final int numthreads = Runtime.getRuntime().availableProcessors();
@tpietzsch (Member) commented on this line:

Please update the javadoc:
"Computation is multi-threaded with as many threads as processors available."
should be something like
"...Computation may be multi-threaded, according to the current {@link Parallelization} context. (By default, runs on the common ForkJoinPool ...)"

@maarzt (Contributor, Author) replied:

Good point. I updated the javadoc.

@tpietzsch (Member) commented:

Very nice!

Could you put a bit more work into Gauss3? I would

  • deprecate all static methods that specify a number of threads or an ExecutorService, and
  • suggest in the javadoc of those methods how to get the same behaviour with Parallelization settings instead (a sketch of what that could look like follows below).
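
A sketch of what such a deprecation might look like, not the actual change that ended up in the PR; the signature mirrors the existing ExecutorService overload, while the class name, the javadoc wording, and the suggested Parallelization.runSingleThreaded replacement are assumptions based on the usage shown in this PR:

import java.util.concurrent.ExecutorService;

import net.imglib2.RandomAccessible;
import net.imglib2.RandomAccessibleInterval;
import net.imglib2.exception.IncompatibleTypeException;
import net.imglib2.type.numeric.NumericType;

public class Gauss3DeprecationSketch
{
	/**
	 * @deprecated Use {@code gauss( sigma, source, target )} and control multi-threading
	 *             via the {@code Parallelization} context instead, e.g.
	 *             {@code Parallelization.runSingleThreaded( () -> Gauss3.gauss( sigma, source, target ) )}
	 *             to run on a single thread. By default the computation runs on the common ForkJoinPool.
	 */
	@Deprecated
	public static < S extends NumericType< S >, T extends NumericType< T > > void gauss( final double[] sigma, final RandomAccessible< S > source, final RandomAccessibleInterval< T > target, final ExecutorService service ) throws IncompatibleTypeException
	{
		// ... would delegate to the non-deprecated overload, wrapped in the appropriate Parallelization context ...
	}
}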

@tpietzsch (Member) commented:

I played with it a bit, and the speedup seems to be not only due to the creation/shutdown of the ExecutorService.
There is overhead that also shows up for bigger images (e.g. 100x100x100), where I would otherwise expect the creation/shutdown overhead to become negligible relative to the actual computation.

It seems to be also due to using the common ForkJoinPool instead of a FixedThreadPool.
I can get basically the same performance by passing Parallelization.getExecutorService() to the old code.
Just passing a FixedThreadPool with the number of threads equal to the parallelism level of the ForkJoinPool does not work.
I can get closer with a FixedThreadPool that has more threads, but not as close as the common pool.
Very interesting.
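
A sketch of that kind of comparison (the class name, the 100x100x100 float image, sigma = 2, and the lack of warm-up or repeated runs are illustrative assumptions): it feeds the old ExecutorService overload once with Parallelization.getExecutorService() and once with a FixedThreadPool sized to the common pool's parallelism.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;

import net.imglib2.algorithm.gauss3.Gauss3;
import net.imglib2.img.Img;
import net.imglib2.img.array.ArrayImgs;
import net.imglib2.parallel.Parallelization;
import net.imglib2.type.numeric.real.FloatType;
import net.imglib2.view.Views;

public class PoolComparisonSketch
{
	public static void main( final String... args ) throws Exception
	{
		final Img< FloatType > source = ArrayImgs.floats( 100, 100, 100 );
		final Img< FloatType > target = ArrayImgs.floats( 100, 100, 100 );
		final double[] sigma = { 2, 2, 2 };

		// Variant 1: the old ExecutorService overload, fed with the executor of the
		// current Parallelization context (the common ForkJoinPool by default).
		long start = System.nanoTime();
		Gauss3.gauss( sigma, Views.extendMirrorSingle( source ), target, Parallelization.getExecutorService() );
		System.out.println( "Parallelization.getExecutorService(): " + ( System.nanoTime() - start ) / 1e6 + " ms" );

		// Variant 2: the same overload with a FixedThreadPool matching the common pool's parallelism.
		final ExecutorService fixed = Executors.newFixedThreadPool( ForkJoinPool.getCommonPoolParallelism() );
		start = System.nanoTime();
		Gauss3.gauss( sigma, Views.extendMirrorSingle( source ), target, fixed );
		System.out.println( "FixedThreadPool: " + ( System.nanoTime() - start ) / 1e6 + " ms" );
		fixed.shutdown();
	}
}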

@maarzt (Contributor, Author) commented Nov 9, 2021

@tpietzsch Thanks for the review. I updated the javadoc as you suggested.

Strange that the ForkJoinPool is overall faster than the FixedThreadPool. I have no idea why.

@maarzt (Contributor, Author) commented Nov 9, 2021

Is there anything else to do, or is the PR ready to be merged?

tpietzsch merged commit aae1f4d into master on Nov 9, 2021
@tpietzsch (Member) commented:

> Strange that the ForkJoinPool is overall faster than the FixedThreadPool. I have no idea why.

If I understand correctly, the ForkJoinPool will start new threads to maintain its parallelism level if other threads are stalled. I think that must be it, but I don't really see where that would help the convolution.

tpietzsch deleted the gauss3-parallelization branch on November 9, 2021, 09:22
@maarzt (Contributor, Author) commented Nov 9, 2021

Cool, it's merged 👍 ✨

I have a slightly different understanding of the ForkJoinPool. With a usual thread pool it goes like this: all the tasks are submitted to the pool, and the thread that submitted them waits for them to finish. During this time that thread is essentially blocked.

This is different when the ForkJoinPool is used together with ForkJoinTasks. Here the thread submitting the tasks will not only wait for the tasks to finish but will also help processing them. That's the trick.
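
A minimal standalone illustration of that fork/join behaviour, using only plain JDK classes (the range sum, the class name, and the thresholds are arbitrary; this is not imglib2 code): each task forks half of its work, computes the other half itself, and join() lets a pool thread help with queued tasks instead of idly waiting.

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ForkJoinHelpingSketch
{
	// Sums the integers in [from, to) by recursively splitting into ForkJoinTasks.
	static class Sum extends RecursiveTask< Long >
	{
		private final long from, to;

		Sum( final long from, final long to )
		{
			this.from = from;
			this.to = to;
		}

		@Override
		protected Long compute()
		{
			if ( to - from <= 1_000_000 )
			{
				long sum = 0;
				for ( long i = from; i < to; i++ )
					sum += i;
				return sum;
			}
			final long mid = ( from + to ) / 2;
			final Sum left = new Sum( from, mid );
			left.fork(); // queue the left half for the pool
			final long right = new Sum( mid, to ).compute(); // compute the right half ourselves
			// join() is not a plain blocking wait: a ForkJoinPool worker that joins a
			// pending task may execute other queued tasks in the meantime ("helping").
			return left.join() + right;
		}
	}

	public static void main( final String... args )
	{
		System.out.println( ForkJoinPool.commonPool().invoke( new Sum( 0, 100_000_000 ) ) );
	}
}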

@tpietzsch (Member) commented Nov 9, 2021

> This is different when the ForkJoinPool is used together with ForkJoinTasks. Here the thread submitting the tasks will not only wait for the tasks to finish but will also help processing them. That's the trick.

I don't think that's the only thing. The waiting thread doesn't really consume any CPU, so it would just be a matter of adding more threads to the FixedThreadPool. That actually helps: on my computer, the common pool parallelism is 11 and numProcessors is 12. With a FixedThreadPool of 24 threads I get much closer to common-pool performance than with a FixedThreadPool of 12 threads, but the common pool is still a bit faster. It could be that this is due to other effects, like a different number of tasks, etc.

@maarzt (Contributor, Author) commented Nov 9, 2021

On my computer the story is completely different. The CPU has 4 cores with hyper-threading, so it can actively run 8 threads in parallel. There's not much of a performance difference between a FixedThreadPool with 4 or 8 threads, and the ForkJoinPool shows similar performance as well...
