Recursive Parallel For Loops

- August 28, 2017

Recursive Parallel For Loops
I have an algorithm which isn't a sorting algorithm but conceptually works similar to the Quicksort algorithm i.e. it partitions a list and then works on the two partitions recursively. I'd like to utilize multi cores and have separate threads for each partition. However, everything just locks up. It seems the Parallel Programming Library creates too many thread, all waiting on one another. In my case there are approximately 100 threads created. I would have thought the PPL and the OS would have given resources to the active threads but that doesn't seem to be the case. I seem to have the same problem as in this thread (https://goo.gl/GVJ1R2).

The solutions proposed on Stack Overflow aren't helpful. They suggest forking at the root (or first depth) but I'd like to utilize 16+ cores when available.

Is there any approach I can take to solving this problem? Would the OmniThread Library offer a better solution than the PPL?

Comments

Uwe RaabeAugust 28, 2017 at 3:45 PM
Hard to say from here without seeing any code. There are too many things to keep in mind when doing multi-threading. No library will do that for you.
Divide and conquer tends to fire back when you divide too much and the overhead becomes the limiting factor.
ReplyDelete
Replies
Ришат БикчантаевAugust 28, 2017 at 5:55 PM
Charges when executing (Tparallel.For) greater than simple (For ... Loop) on small to one million values. You should study the sorting material using CUDA, and there will be more benefits.
ReplyDelete
Replies
Alex EgorovAugust 28, 2017 at 10:02 PM
Ришат Бикчантаев Any examples to use CUDA with Delphi?
ReplyDelete
Replies
Primož GabrijelčičAugust 29, 2017 at 12:42 AM
With OTL, you can attack partitioning algorithms with Fork/Join: otl.17slon.com - 6. How-to

Keep in mind, though, that fork/join has quite high overhead so you should find a sweet spot - the best subproblem size where you should start working on the problem single-threadedly.
ReplyDelete
Replies
Steve MaughanAugust 29, 2017 at 11:53 AM
Thanks for the input. I'm going to look at Primož Gabrijelčič's OTL in a bit more detail. I also came up with another solution. Give the number of available cores as parameter. After the partition only use the threaded version if there are 2 or more available cores (and divide the number of cores by two when calling the threaded version), else use the single threaded version.

Here's some pseudo code:

procedure QuickSort(low, high, cores: integer)
begin
PartitionList;
if cores > 1 then
Do_In_Parallel(
QuickSort(low, mid - 1, cores div 2);
QuickSort(mid + 1, high, cores div 2);
)
else
begin
QuickSort(low, mid - 1, cores div 2);
QuickSort(mid + 1, high, cores div 2);
end;
end;

All this requires is an initial estimate of the number of cores.
ReplyDelete
Replies
David HeffernanAugust 30, 2017 at 12:21 AM
I don't think the point is that OTL is inherently to be preferred. That's not how I read Primož Gabrijelčič's comment at all. He's pointing to an example of an algorithmic approach. That approach can be transferred onto any decent parallel library. The identity of the library is of secondary relevance.
ReplyDelete
Replies
Ришат БикчантаевAugust 30, 2017 at 12:37 AM
Alex Egorov
gist.github.com - bitonic_sort.cu
ReplyDelete
Replies
Steve MaughanAugust 30, 2017 at 3:36 AM
David Heffernan Thanks. I haven't come across the Fork/Join pattern using the PPL. From what I can see it looks to be more low level than "algorithmic". I'll do some more research, but if you can point me to anything I'd appreciate it.
ReplyDelete
Replies
Mike MargerumAugust 30, 2017 at 2:19 PM
I'm starting to drink the functional programming kool aid. Paralellizing code becomes trivial when everything is a function and the language supports immutability at its core. clojuredocs.org - pmap - clojure.core | ClojureDocs - Community-Powered Clojure Documentation and Examples
ReplyDelete
Replies

Add comment

Search This Blog

Delphi Developers Archive

Recursive Parallel For Loops

Comments

Post a Comment