I hope there are some multithreading pros around :)

- November 14, 2014

http://stackoverflow.com/questions/26936088/best-way-to-synchronize-two-threads-with-each-other-in-delphi

Comments

Asbjørn Heid, November 14, 2014 at 2:03 PM
Perhaps I'm overly dense here now but I don't get what your sample is supposed to do. Seems like a call to Pong is missing somewhere?

Asbjørn Heid, November 14, 2014 at 2:07 PM
And where's the second thread?

Stefan Glienke, November 14, 2014 at 2:09 PM
Pong is called from the thread base class Execute.

Asbjørn Heid, November 14, 2014 at 2:10 PM
Duh, yeah ok. So you're spawning one and using the main thread.

Asbjørn Heid, November 14, 2014 at 2:17 PM
In the actual application, is the idea that the threads do a bit of work in parallel, then some work serialized, then parallel again, etc.? If not, why two threads?

Stefan Glienke, November 14, 2014 at 2:21 PM
Coroutines (on platforms other than Windows, because there is no fiber API there).

Asbjørn Heid, November 14, 2014 at 3:03 PM
Stefan Glienke Yeah ok, it sounded very much like coroutines are what you wanted.

I'm pretty sure that if the tasks each thread does take relatively little time, performance will suck no matter what. Context switching overhead is normally pretty huge (up to microseconds) on desktop CPUs, though I don't have experience with ARM CPUs.

A. Bouchez, November 14, 2014 at 4:15 PM
Libevent on POSIX? See how the events are implemented in http://svn.freepascal.org/svn/fpcprojects/lnet/trunk/lib/

Stefan Glienke, November 14, 2014 at 4:31 PM
I am now spin-waiting for up to 10 ms before falling back to TEvent.WaitFor, which works kinda OK.

David Millington, November 15, 2014 at 8:10 AM
Do they have to alternate exactly (i.e. always thread 1, 2, 1, 2...) or can they just run approximately the same number of times each over a long timescale (1, 1, 1, 2, 1, 2, 2, 2, 1, 2...)?

Apart from the time spent context switching, on Windows an event is the best way to make a thread wait / pause without consuming CPU. Sharing a single event would result in the second form of behaviour, while having two events that alternate would be a decent way of doing the first. If you really want them to alternate exactly, I'm not sure I'd use a spinwait at all - just toggle two separate events. It's an approach open to sync issues itself, but careful ordering of when they are reset and waited upon would remove that.
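
The two-event idea can be sketched roughly as follows. This is only an illustration, not the code from the question: RunAlternation, Turn1, Turn2 and WaitForever are made-up names, and it assumes a Delphi version that ships System.SyncObjs and TThread.CreateAnonymousThread. Each thread waits on its own auto-reset event and signals the other one when it is done, so the order is forced to 1, 2, 1, 2...

```pascal
program TwoEventAlternation;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs;

const
  WaitForever = Cardinal($FFFFFFFF); // same value as the RTL's INFINITE

// Hypothetical demo: returns the order in which the two threads ran.
function RunAlternation(Rounds: Integer): string;
var
  Turn1, Turn2: TEvent;
  Worker: TThread;
  Log: string;
  I: Integer;
begin
  Log := '';
  // Create(EventAttributes, ManualReset, InitialState, Name):
  // both events are auto-reset, and thread 1 holds the first turn.
  Turn1 := TEvent.Create(nil, False, True, '');
  Turn2 := TEvent.Create(nil, False, False, '');
  try
    Worker := TThread.CreateAnonymousThread(
      procedure
      var
        J: Integer;
      begin
        for J := 1 to Rounds do
        begin
          Turn2.WaitFor(WaitForever); // sleep until thread 1 hands over
          Log := Log + '2';
          Turn1.SetEvent;             // hand the turn back
        end;
      end);
    Worker.FreeOnTerminate := False;  // we wait for it and free it ourselves
    Worker.Start;
    for I := 1 to Rounds do
    begin
      Turn1.WaitFor(WaitForever);
      Log := Log + '1';
      Turn2.SetEvent;
    end;
    Worker.WaitFor;
    Worker.Free;
  finally
    Turn2.Free;
    Turn1.Free;
  end;
  Result := Log;
end;

begin
  Writeln(RunAlternation(3));
end.
```

Because the events are auto-reset, there is no separate reset step whose ordering could go wrong; the cost is one kernel wait per switch, which is exactly the overhead discussed elsewhere in this thread.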

Asbjørn Heid, November 15, 2014 at 8:22 AM
FWIW SignalObjectAndWait on Windows makes a big difference when using Event objects. But of course, that doesn't matter on non-Windows platforms...

Is it always two threads, or will there be more, if so how are they scheduled? If there are more it may be worth looking into native queue solutions (epoll, kqueue afaik).

Stefan Glienke, November 15, 2014 at 10:03 AM
David Millington Yes, they have to alternate. And as I wrote in my question, the approach of just using an event / TMonitor resulted in horrible performance if the switching was performed very fast. That is why I looked into the spinwait, which performed pretty well but, as I wrote, burned CPU if the switching was not so fast. Hence the combination I am using right now.

Asbjørn Heid Well, for the coroutine implementation there are always two threads switching back and forth, but in theory you can have multiple coroutines which are all threads. I have yet to check whether everything works in that case (with fibers it does).

Lars Fosdal, November 16, 2014 at 1:34 AM
I usually only use threads if there is a significant amount of work that has to be done, and sync with messages. In what type of apps do you guys use fibers for short lived threads?

Stefan Glienke, November 16, 2014 at 3:13 AM
Lars Fosdal Read about coroutines. Then come back :)

Lars Fosdal, November 16, 2014 at 3:59 AM
Stefan Glienke - Ok, got it. It looks useful - but potentially a tad tricky.

Eric Grange, November 17, 2014 at 1:20 AM
Getting co-routines to behave with a thread-based implementation will be tricky, as the switching and signaling is going to be pretty major.
The problem is that you can't tell what the co-routine is doing: if it's waiting on I/O (such as a DB) or doing long computations, spinwaiting will gobble up CPU; if it's doing simple CPU tasks, using an event will slow everything down, as you found out.

You might have some luck with an adaptive strategy (one that switches dynamically between spinlock and event based on runtime behavior), though that adds complexity and will be vulnerable to workloads that keep changing behavior.
For instance, a co-routine fetching from a DB will alternate between I/O-bound calls (when it actually fetches a buffer from the DB) and light CPU calls (when it fetches the next record from the buffer).

There are special cases, like when the co-routine is stateless or independent of its call site, where the co-routine thread can "work ahead" and queue results. This can reduce the thread switching to a minimum and make it practical and quite efficient, but you still have to be careful, as it throws an extra thread-safe FIFO queue into the mix.
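
A rough sketch of the "work ahead" case, assuming the RTL's TThreadedQueue<T> from System.Generics.Collections is an acceptable FIFO (any thread-safe queue would do). SumOfSquares and the queue depth of 64 are arbitrary choices for illustration; the producer here depends on nothing at the call site, which is what lets it run ahead.

```pascal
program WorkAheadQueue;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Generics.Collections;

// Hypothetical demo: a producer thread works ahead on squares while the
// caller consumes them in order and adds them up.
function SumOfSquares(Count: Integer): Int64;
var
  Queue: TThreadedQueue<Integer>;
  Producer: TThread;
  I: Integer;
begin
  Result := 0;
  // Depth 64: the producer may run at most 64 results ahead, after which
  // PushItem blocks. Push and pop timeouts keep their INFINITE defaults.
  Queue := TThreadedQueue<Integer>.Create(64);
  try
    Producer := TThread.CreateAnonymousThread(
      procedure
      var
        J: Integer;
      begin
        for J := 1 to Count do
          Queue.PushItem(J * J);
      end);
    Producer.FreeOnTerminate := False; // we wait for it and free it ourselves
    Producer.Start;
    for I := 1 to Count do
      Result := Result + Queue.PopItem; // blocks only while the queue is empty
    Producer.WaitFor;
    Producer.Free;
  finally
    Queue.Free;
  end;
end;

begin
  Writeln(SumOfSquares(10));
end.
```

The consumer only pays for a thread switch when the queue runs dry, instead of once per value.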

Also debugging any of the above can quickly turn nightmarish, prepare to be cursed by whoever has to maintain your code ;-)

Stefan Glienke, November 17, 2014 at 1:47 AM
Eric Grange Thanks for your input.

I am using a combination of a spin wait and an event (I added that to the question on SO if you want to look into it). So for fast switching it does not run into the event overhead but as soon as it has to wait 10ms (default setting) it goes into the event.

I did not do much more than some number sequences yet so I cannot say how it behaves with IO operations but it should not differ.

Eric Grange, November 17, 2014 at 3:00 AM
I/O operations will just be "very slow or very fast with a whole lot of variance" :-)

10 ms is quite a long time for a spinwait. It means that if your co-routines switch at a rate of a hundred or so per second, you'll basically be tying down a whole CPU all the time.

More reasonable might be a matter of thousands of CPU cycles, i.e. in the 10-microsecond ballpark.

For the exact cut-off, you could time how fast an event-based switch is on the target platform. That timing is complicated by modern CPUs having varying clock frequencies, but you should be able to get an appropriate order of magnitude.

Stefan Glienke, November 17, 2014 at 3:13 AM
Eric Grange Yes I noticed that with fast switches it burns much CPU. So I guess instead of using GetTickCount I should just use a counter for how often I do the TThread.SpinWait?

Eric Grange, November 17, 2014 at 3:32 AM
Yes, use a counter, and do not use TThread.SpinWait (ever).
Just do your own spin wait.

  n := nbIterations;
  while n > 0 do
  begin
    if conditionReached then
      Break;
    Dec(n);
  end;

where "conditionReached" is a read of a field you'll use as a flag (you may want to read it through a pointer to that field, to guarantee the read won't ever be optimized away; Delphi 32/64 won't optimize it, but LLVM might).

Your other thread will just look like

...do work...
conditionReached := True;

The CPU guarantees consistency, so conditionReached won't be true until the work is done.
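
Combining the counted spin with an event fallback might look roughly like this. It is a sketch under assumptions, not the code from the question: TSpinThenWaitSignal, RunDemo and WaitForever are hypothetical names, and the flag is accessed through TInterlocked rather than as a plain field, because plain writes are only guaranteed to stay ordered on strongly ordered CPUs such as x86, while weakly ordered ones such as ARM may reorder them. It supports one signalling thread and one waiting thread.

```pascal
program SpinThenWaitDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs;

const
  WaitForever = Cardinal($FFFFFFFF); // same value as the RTL's INFINITE

type
  // Hypothetical helper (not an RTL class): one signaller, one waiter.
  TSpinThenWaitSignal = class
  private
    FFlag: Integer; // 1 while a signal is pending, otherwise 0
    FEvent: TEvent; // auto-reset fallback once spinning gives up
    function TryConsume: Boolean;
  public
    constructor Create;
    destructor Destroy; override;
    procedure Signal;
    procedure Wait(SpinIterations: Integer);
  end;

constructor TSpinThenWaitSignal.Create;
begin
  inherited Create;
  FEvent := TEvent.Create(nil, False, False, '');
end;

destructor TSpinThenWaitSignal.Destroy;
begin
  FEvent.Free;
  inherited;
end;

function TSpinThenWaitSignal.TryConsume: Boolean;
begin
  // Atomically flip a pending 1 back to 0 (also acts as a memory barrier).
  Result := TInterlocked.CompareExchange(FFlag, 0, 1) = 1;
end;

procedure TSpinThenWaitSignal.Signal;
begin
  TInterlocked.Exchange(FFlag, 1); // publish the flag first ...
  FEvent.SetEvent;                 // ... then wake a sleeping waiter
end;

procedure TSpinThenWaitSignal.Wait(SpinIterations: Integer);
var
  N: Integer;
begin
  repeat
    for N := 1 to SpinIterations do
      if TryConsume then
        Exit;
    // Spin budget used up: block in the kernel. The wake-up may be left
    // over from a signal already consumed while spinning, so re-check.
    FEvent.WaitFor(WaitForever);
  until TryConsume;
end;

function RunDemo(SpinIterations: Integer): Integer;
var
  Done: TSpinThenWaitSignal;
  Worker: TThread;
  Answer: Integer;
begin
  Answer := 0;
  Done := TSpinThenWaitSignal.Create;
  try
    Worker := TThread.CreateAnonymousThread(
      procedure
      begin
        Answer := 42; // ...do work...
        Done.Signal;  // then raise the flag
      end);
    Worker.FreeOnTerminate := False; // we wait for it and free it ourselves
    Worker.Start;
    Done.Wait(SpinIterations);
    Result := Answer;
    Worker.WaitFor;
    Worker.Free;
  finally
    Done.Free;
  end;
end;

begin
  Writeln(RunDemo(10000));
end.
```

With SpinIterations set to 0 this degenerates to a pure event wait; the right budget depends on how an event-based switch times on the target platform.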

Another alternative to events and spinwaits on Windows would be I/O Completion Ports, which are basically very high-performance, kernel-optimized queues. You can use them to pass messages across threads and wait on those messages in a very efficient fashion (the event/spinwait logic is built in). It's platform-specific though.

David Millington, November 17, 2014 at 3:37 AM
Stefan Glienke Why coroutines instead of threads? (I didn't see this asked by anyone above; Lars Fosdal touched on it.)

I'm just wondering if there is a more standard, thread-based solution applicable to your problem, which would give comparable program behaviour with less CPU usage or other overhead.

Also I know there are some Delphi coroutine implementations, and even some TP ones. Could some of those be useful to you?

Lars Fosdal, November 17, 2014 at 3:50 AM
Without having read through coroutine use cases, we obviously have generators as one case; another is iterators that do some sort of prefetch, evaluation or calculation to prepare for the next item in the loop. Not sure if parallel loop processing could be a possible third. Parallel-capable parsers, tree-walking analyzers or search functions could also be candidates.

My initial thought was that I would have to do a lot of thinking about the possible side effects of each yield.

Stefan Glienke, November 17, 2014 at 4:03 AM
David Millington I will not explain the difference between coroutines and threads here. You can google that yourself and find articles that probably explain it better than I could.

I think I know all public coroutine implementations in Delphi because I have been on that subject on and off for quite some time now. They are all implemented in x86 assembler, which does not help me as I am aiming for cross-platform. On Windows I am using the fiber API that has been available since XE/2003.

I am not desperate enough yet to look into Boost and C++ code or even try to use that from Delphi on mobile platforms.

Lars Fosdal The problem with iterators is that you then have to write classes and MoveNext methods in a different way than you can with coroutines. See the Fibonacci example I posted. Of course that is not the real use case, because you can easily write it differently without a coroutine, but it shows how clean and iterative the code looks when you don't have to implement the state machine yourself.
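
For contrast, this is roughly what the hand-written state machine looks like for Fibonacci. It is an illustrative sketch, not the example posted with the question; TFibonacciEnumerator and NthFibonacci are made-up names. The state that a coroutine would keep in ordinary local variables across yields has to be moved into fields here.

```pascal
program FibonacciStateMachine;
{$APPTYPE CONSOLE}

type
  // Hypothetical hand-written enumerator: the loop state lives in fields.
  TFibonacciEnumerator = class
  private
    FCurrent, FNext: UInt64;
    FStarted: Boolean;
  public
    constructor Create;
    function MoveNext: Boolean;
    property Current: UInt64 read FCurrent;
  end;

constructor TFibonacciEnumerator.Create;
begin
  inherited Create;
  FCurrent := 0;
  FNext := 1;
end;

function TFibonacciEnumerator.MoveNext: Boolean;
var
  Sum: UInt64;
begin
  if not FStarted then
    FStarted := True // first call: Current is already 0
  else
  begin
    Sum := FCurrent + FNext;
    FCurrent := FNext;
    FNext := Sum;
  end;
  Result := True; // endless sequence (overflow is ignored in this sketch)
end;

// Returns the N-th value (0-based) by stepping the enumerator.
function NthFibonacci(N: Integer): UInt64;
var
  E: TFibonacciEnumerator;
  I: Integer;
begin
  E := TFibonacciEnumerator.Create;
  try
    for I := 0 to N do
      E.MoveNext;
    Result := E.Current;
  finally
    E.Free;
  end;
end;

begin
  Writeln(NthFibonacci(10));
end.
```

Even for a sequence this small, the "have I started yet" flag and the saved pair of values are bookkeeping that a coroutine body would not need.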

Eric Grange Why not use TThread.SpinWait, and what should I use instead? Just a loop decrementing the counter? I read this article (http://joeduffyblog.com/2006/08/22/priorityinduced-starvation-why-sleep1-is-better-than-sleep0-and-the-windows-balance-set-manager/) and was about to implement it in a similar way. Would that be something?
Also, for Windows I am already using the fiber API, so the thread implementation is basically for mobile and OSX.

David Millington, November 17, 2014 at 4:33 AM
Stefan Glienke I know what a coroutine is, I just am curious why you are using them. They're rare. Perhaps something similar can be done using a more common multitasking system.

Stefan Glienke, November 17, 2014 at 4:37 AM
David Millington It's not about the problem they can solve. It's about implementing them. :)