Parallel Code Library - Terrible Real-World Performance

I have some time-consuming code which I thought would be an ideal candidate for the parallel code library. The code simplifies the boundaries of zip codes, i.e. it takes out all the small zigzags. This involves a lot of perpendicular-distance-to-line calculations. There is a little memory allocation, since for each zip code I need to create a list of possibly intersecting lines.
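For readers unfamiliar with the calculation mentioned above, here is a minimal sketch of a perpendicular-distance-to-line routine. It is written in Python purely for illustration; the original code is not shown in this post, so the function name `perpendicular_distance` and the tuple-based point representation are assumptions, not the author's implementation.

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the infinite line through a and b.

    Hypothetical helper: points are (x, y) tuples. If a and b coincide,
    the distance from p to that single point is returned instead.
    """
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        # Degenerate line: fall back to point-to-point distance.
        return math.hypot(px - ax, py - ay)
    # |cross product of (b - a) and (p - a)| divided by |b - a|.
    return abs(dx * (py - ay) - dy * (px - ax)) / length
```

The arithmetic itself allocates nothing, which is consistent with the point that the per-zip-code candidate lists are the only allocations in the routine.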

A non-parallel version completed the transformation in 4 minutes 56 seconds on my i7 desktop (4 cores plus hyper-threading). The parallel implementation was simple: I used the Parallel.For loop with a stride of about 1000. To my surprise, this completed in 5 minutes 12 seconds. Yes, the parallel version is slower than the simple for loop! When I look at Task Manager, the CPU utilization of the parallel version is about 20% while the routine is running.

I then switched to ScaleMM2 as the memory manager. This version takes only 1 minute 34 seconds and CPU utilization is 100% for most of the time. 
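To put the three timings side by side, the arithmetic can be sketched as below. The helper `to_seconds` and the variable names are made up for illustration. Note that the serial baseline was measured with FastMM4; a serial run under ScaleMM2 is not reported here, so the last ratio compares against the FastMM4 serial time.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds timing to total seconds."""
    return minutes * 60 + seconds

serial_fastmm4 = to_seconds(4, 56)     # plain for loop, FastMM4
parallel_fastmm4 = to_seconds(5, 12)   # Parallel.For, FastMM4
parallel_scalemm2 = to_seconds(1, 34)  # Parallel.For, ScaleMM2

# Speedup relative to the serial FastMM4 run (values below 1 are a slowdown).
speedup_fastmm4 = serial_fastmm4 / parallel_fastmm4
speedup_scalemm2 = serial_fastmm4 / parallel_scalemm2

print(f"Parallel + FastMM4:  {speedup_fastmm4:.2f}x")
print(f"Parallel + ScaleMM2: {speedup_scalemm2:.2f}x")
```

That works out to roughly a 0.95x "speedup" with FastMM4 and a little over 3x with ScaleMM2 on a machine with 4 physical cores.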

This routine isn't all about memory allocation. It's mostly floating point maths. My conclusion is that FastMM4 renders the parallel code library virtually useless for any task which allocates memory, even in the smallest amounts. I'd really prefer not to use a third-party memory manager. FastMM4 has served us well for five years, but I think it needs an overhaul.

Comments