Since there was this offtopic argument going on about the performance of for-in versus for-to:

http://delphisorcery.blogspot.com/2018/07/loop-wars-measuring-performance.html

Comments

  1. Great write-up.

    Now I can remove this from my to-do list. I've had a pending item for over 2 months to blog about this.

    I'm all for using for..in for code safety. Even with the "default" performance hit, it's only noticeable in very large loops, or on servers which collectively perform a lot of loop iterations.

    I'm not for wasting CPU, but I strongly prefer safe code even if a little CPU must be sacrificed. But when it's a lot, then it's a different story.

  2. Alas, as you noticed, the solutions are well known, and have been for a long while... :/

    Also of note is that using TStrings imposes a penalty (regardless of loop flavor), because a function returning a string will trigger other compiler shortcomings with reference counting, exception frames and such...

    If performance is critical, use a structure where the string field can be accessed "directly". In some cases (when filtering and cascading filters) a thin string boxing class can have benefits by reducing the reference counting compared to manipulating the string directly (under the non-ARC compilers, of course).
    This is even more important when the strings are actually immutable and you're heavily multithreaded: the refcounting is then just a sort of pointless mini-mutex.
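
    A minimal sketch of what such a boxing class might look like (the name TStringBox and its members are illustrative, not from any library): the string is assigned once, and afterwards only the object reference - a plain pointer under the non-ARC compilers - is handed around, so copying it causes no refcount traffic.

    type
      TStringBox = class
      public
        Value: string;   // written once, then treated as immutable
        constructor Create(const AValue: string);
      end;

    constructor TStringBox.Create(const AValue: string);
    begin
      inherited Create;
      Value := AValue;   // the only string refcount increment
    end;

    // Cascading filters can then keep TStringBox references in their lists;
    // moving a box between lists copies a pointer, not a refcounted string.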

  3. Also, I didn't post this in the original thread, but the code I posted was scaled back. I did several tests to simulate both tight loops and ones with overhead.

    Most loop contents have enough overhead that the difference of for..in vs for, even with the full performance hit, is negligible in context. It's like measuring the amount of bathwater spilled over by dropping a marble into a full bathtub. Sure, it's more than a smaller marble, but compared to the contents of a full tub, it's nothing in context.

    In tight loops with a high count, however, or tight loops on servers with many threads or other concurrent executions, there can be a noticeable difference in high-performance applications.

  4. You forgot the most visible conclusion of all: for-in can be almost as fast as a simple for loop if your compiler does a superb job, but it will never be faster.

  5. Alexandre Machado It can indeed be faster when it is just a shifting pointer over the source.

  6. Short of handcoded assembly (or serious hacks) and a very optimized corner case, I don't see that happening.

    And that's more like the linked list I talked about in the other thread. Things that normally aren't indexed easily or don't index well (speed-wise) generally don't get iterated with for..to loops anyway.

  7. I don't see it happening, not even in .NET. There are several detailed comparisons out there. In the real world, "for" is faster 99% of the time. In the other 1%, "foreach" (or for-in) can match its speed. This is an old one from Jon Skeet:
    codeblog.jonskeet.uk - For vs Foreach on arrays and lists

  8. Alexandre Machado We are comparing apples and oranges here - the .NET C# JIT generates amazingly well-optimized code that Delphi can only dream of (yes, even though it compiles to native code! rolleyes) - anyway, look at these results (which were posted in the comments of Jon's article) - http://cc.davelozinski.com/c-sharp/for-vs-foreach-vs-while

    Notice the difference between the for and foreach loop of the int array and his assumption: "I can only think that this is because it takes longer for a For loop to calculate and access a particular array index than it does a ForEach loop to assign the next value to allocated memory."

    What does that mean?

    Well, the compiler can turn a foreach/for-in loop over an array into this:

    p := @nums[0];
    for i := 0 to High(nums) do
    begin
      if p^ = 0 then;
      Inc(p);
    end;

    If you measure that against

    for i := 0 to High(nums) do
      if nums[i] = 0 then;

    You will see that the pointer shifting wins because it trades the second loop variable for a pointer (while still using the count-to-zero one).

    Loops.dpr.87: if nums[i] = 0 then;
    004DC975 8B45FC mov eax,[ebp-$04]
    004DC978 833CB000 cmp dword ptr [eax+esi*4],$00
    004DC97C 46 inc esi
    Loops.dpr.86: for i := 0 to High(nums) do
    004DC97D 4B dec ebx
    004DC97E 75F5 jnz $004dc975

    vs

    Loops.dpr.102: if p^ = 0 then;
    004DCA2B 833800 cmp dword ptr [eax],$00
    Loops.dpr.103: Inc(p);
    004DCA2E 83C004 add eax,$04
    Loops.dpr.100: for i := 0 to High(nums) do
    004DCA31 4B dec ebx
    004DCA32 75F7 jnz $004dca2b

    Having said that - I think a compiler could even turn a for-to loop into that...

    But still, these results tell nothing about real application performance - I have used pointer shifting instead of index access before because it performed better in some microbenchmark, but in some real code it had zero effect.
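
    For anyone who wants to try the two shapes side by side, here is a self-contained sketch (the program name, the array size and the empty if bodies are just placeholders, mirroring the snippets above):

    program LoopShapes;
    {$APPTYPE CONSOLE}

    var
      nums: array of Integer;
      p: PInteger;
      i: Integer;
    begin
      SetLength(nums, 10 * 1000 * 1000);

      // indexed access: the array base is re-read and scaled by i each iteration
      for i := 0 to High(nums) do
        if nums[i] = 0 then;

      // pointer shifting: a single running pointer replaces the index calculation
      p := @nums[0];
      for i := 0 to High(nums) do
      begin
        if p^ = 0 then;
        Inc(p);
      end;
    end.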

  9. Stefan Glienke I see several "suspicious" things in that article you have just posted:
    1- His while loop is 2x faster (List), 4x faster (Dictionary) and 6x faster (DataTable) than a simple for (#loops = 100). Two things come to my mind: (a) his own code creates distortions, or (b) the .NET JIT compiler is severely flawed and is unable to optimize a simple for loop like that compared to a while loop.
    Given that I consider (b) not to be true, the only other viable explanation is (a). If you have another explanation for that, I'm all ears.
    2- Unfortunately, his source code is broken and won't compile in VS 2017. There are dozens of errors (even after replacing the escaped < and > characters) and I don't have enough time to fix it right now, so it is actually impossible for me to determine whether his data is correct (I ran Jon Skeet's benchmark myself, which compiles just fine as is). Maybe that's the reason why there is not a single comment confirming/denying his findings... His code style, which reminds me of something I did more than 30 years ago, doesn't make me want to spend time fixing it either.
    3- Dictionaries are not intended to be accessed the way he does. Delphi's own generic TDictionary doesn't even allow that! Dictionary benchmark should just be ignored, IMO.
    4- If you create a benchmark comparing arrays of simple types (int and double) and lists (I'll deliberately pretend Dictionaries were not included) and in the end you find out that foreach is faster when iterating through arrays and declare that foreach is faster in most cases, the conclusion is very likely to be wrong in real world applications. Arrays are not as versatile as lists and are not ideal for several common tasks in real world apps where lists are more likely to be used. PS: This also applies to Jon Skeet's benchmark.

  10. "Having said that - I think a compiler could even turn a for-to loop into that..."

    Exactly. I don't see why that kind of optimization can't be applied in both cases.

  11. It would be nice if a for..in could return a typed pointer to an element instead of the value. I guess it can be faster in some cases, and it’ll allow you to change the value that it points to.

    The compiler could switch to that behavior if the loop variable is a ^reference to the element type.
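
    Until the compiler does that, something close can be approximated today with a custom enumerator whose Current property is a pointer type. A rough sketch with made-up names (TPtrEnumerator, TPtrView, PtrView), assuming everything is declared in one unit so the private fields are reachable:

    type
      TPtrEnumerator = record
      private
        FCurrent: PInteger;
        FRemaining: Integer;
      public
        function MoveNext: Boolean; inline;
        property Current: PInteger read FCurrent;
      end;

      TPtrView = record
        Data: TArray<Integer>;
        function GetEnumerator: TPtrEnumerator;
      end;

    function TPtrEnumerator.MoveNext: Boolean;
    begin
      Result := FRemaining > 0;
      if Result then
      begin
        Inc(FCurrent);        // step to the next element
        Dec(FRemaining);
      end;
    end;

    function TPtrView.GetEnumerator: TPtrEnumerator;
    begin
      Result.FRemaining := Length(Data);
      if Result.FRemaining > 0 then
      begin
        Result.FCurrent := @Data[0];
        Dec(Result.FCurrent); // MoveNext advances onto the first element
      end;
    end;

    function PtrView(const AData: TArray<Integer>): TPtrView;
    begin
      Result.Data := AData;
    end;

    // usage: the loop variable is a pointer, so the element can be changed in place
    //   var p: PInteger;
    //   for p in PtrView(nums) do
    //     p^ := p^ * 2;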


