Roman Yankovsky I think it's you the guy behind FixInsight, right?

Roman Yankovsky I think it's you the guy behind FixInsight, right?

I was thinking that the following piece of code should be signaled with a warning of non-portability:

for I := 1 to Length( S ) do
begin
  C:-= S[[ I ];
 // stuff here done with C
end;

and the suggested replacement should be:

for C in S do
begin
  // stuff done with C
end;

This would allow to flag all string cycles that aren't portable to FMX and the ARC compilers.

What do you people think?

Agree/Disagree? I think the for in has been
around for enough versions that this is actually doable and desirable.

A

Comments

  1. I think looking for the pattern 

    for := 1 to Length() do

    would be enough even if nothing is done with the single characters inside the loop body.

    Not sure how far FI can go but a true static code analysis would be able to find out if anywhere there is possible range violation inside the code (by evaluating all possible code path as far as possible).

    ReplyDelete
  2. What if the index variable is used inside the loop?  Delphi enumerators don't give you access to the index.  It'd be pretty hard to know via code analysis if the variable needed to be used or not - ie if it really could be replaced with a for..in.

    But I like the idea.  Any 0 to Count-1 or Pred(Count) or whatever, over a type that is enumerable, could show this same warning.

    ReplyDelete
  3. "Any 0 to Count-1 or Pred(Count) or whatever, over a type that is enumerable, could show this same warning."

    Err, no because that code is portable. Andrea is pointing to the fact that the code above does not work for ZBS.

    ReplyDelete
  4. The other thing is that for..in is slow.  I've had to remove it in performance-critical code before.  Makes a noticeable difference.

    Now, if the compiler was smarter... I believe the C# compiler can, in many situations, take a for..in statement equivalent (foreach) and reduce it down to the same code as a plain loop.

    ReplyDelete
  5. "It'd be pretty hard to know via code analysis if the variable needed to be used or not"

    You think so? Check out what code analysis can do:
    https://www.youtube.com/watch?v=C-_dw9iEzhA

    "I believe the C# compiler can, in many situations, take a for..in statement equivalent (foreach) and reduce it down to the same code as a plain loop."

    Yes, it does, but that was not always the case:
    http://www.codeproject.com/Tips/531893/For-Vs-Foreach-Benchmark

    ReplyDelete
  6. For cases where it is slow, you can always keep it as a for, but knowing that what you are doing is not portable. We make compromises like this all the time - it's also why I said it should be a warning instead of an "error" if you get my drift.

    Something like this would also help identify all places where such code exists and allow at the very least to use compiler directives to adjust it.

    I think the wins would far outweigh the losses.

    A

    ReplyDelete
  7. David Millington Stefan Glienke  C# has a peephole optimization for that, meaning if the optimizer looks for a very specific construct, if you deviate, the optimization does not trigger.

    Andrea Raimondi cases where "for c in string" apply are quite corner-case however: even when the index is not required for the loop code itself, it is often required for error reporting...

    ReplyDelete
  8. Eric Grange I argue that given the direction where Delphi is going, For In makes more sense even if you need a supporting variable to keep track of the char count. And as I said, for cases where you need to make a decision, you at least know why it is there and what problems you have in porting it. It can be documented and the knowledge retained.

    ReplyDelete
  9. Good idea. I agree with Stefan Glienke 's first comment. It is pretty easy to implement.

    ReplyDelete
  10. Maybe it worth adding a new type of rules - portability checks.

    ReplyDelete
  11. It may be perfectly valid to start at index 1 even with 0 based strings. We may have handled the first char already. Such a warning may lead to a set of false positives. IMHO the fact that string indexing may vary WAS the issue. Not a poor for loop by itself.

    ReplyDelete
  12. A. Bouchez It's not only the fact that you are starting at 1 but also that you going to Length. Unless you do an i-1 inside the loop its more likely that it does not work with ZBS.

    ReplyDelete
  13. I'd prefer FixInsight to report if the ZBS is enabled instead. That thing is an abomination and should be strictly ignored.

    ReplyDelete
  14. Stefan Glienke Using length()-1 may also occur in a perfectly legitimate use case, if you want to exclude the last char. Only if you set for i := 0 to length(s) (for ZBS) or for i := 1 to length(s)+1 it may emit an useful warning.

    ReplyDelete
  15. A. Bouchez Dude, from 1 to Length(s) is off by one on ZBS. That's what this is about.

    ReplyDelete
  16. David Millington for in is not intrinsically slow. The performance depends very much on the iterators. I believe that for strings and arrays, the built in iterators are reasonable. But a lot of collection iterators, for instance the ones that Emba provide, are class based and so involve heap allocation. That's a perf issue.

    Of course, it's trivial to write these iterators using value types (records) and so avoid heap allocation. But I don't think that the Emba devs that are deployed in this area really appreciate that issue. 

    I would expect that good quality third party libraries provide value type based iterators. I guess Stefan Glienke knows whether or not that is the case.

    ReplyDelete
  17. Actually, looking at the code emitted for 

        for ch in SomeStr do

    it doesn't look as simple as that for
       
        for i := 1 to Length(SomeStr) do

    so I might have to retract some of my previous comment.

    At least in principle, for in loops should be possible to compile efficiently........

    ReplyDelete
  18. David Heffernan That's an interesting tidbit that I never realized.  Is it possible to reintroduce the GetEnumerator function and return a record for any list?

    ReplyDelete
  19. David Heffernan If IEnumerable has the method GetEnumerator: IEnumerator I don't see a value type there. Value types can only be used of you are not working on a type hierarchy. Especially if the iterator is the one implementing the logic (as most extensions in Spring4D do to support lazy evaluation and streaming)

    ReplyDelete
  20. David Heffernan The performance problems I saw were to do with calling the methods, not the heap allocation. Multiple calls to (effectively virtual, through the interface) Current and MoveNext were slow.

    I'd love to see a compiler hint attribute that lets a type support enumeration via existing Count and [] properties. Something like [Enumerable Low=0 High=Count-1 Access=Items] or something.  Then the compiler could auto-generate a for loop for for..in syntax.

    ReplyDelete
  21. David Millington Have  GetEnumerator return a record rather than an interface

    ReplyDelete
  22. I slapped something together - you can see that the difference between record and object is minimal. The difference between object and interface is because of virtual calls.

    http://pastebin.com/5NSiwhag

    Imo nothing to worry about - it's the price you pay for indirection.

    By using records you lose the possibility to have IEnumerator implement forward, backwards, skipping every 2nd items, only returning those that match filter and so on.

    ReplyDelete
  23. Stefan Glienke For IEnumerable, and its GetEnumerator: IEnumerator, sure. For a GetEnumerator that you declare on your own type, you have the freedom to have GetEnumerator return a value type. I can see that Spring4D has other design considerations would not admit that.

    ReplyDelete
  24. David Millington The method calls add on overhead because of the call and another because they reduce register optimization opportunities. That said, the worst hit is when Current return something that needs record cloning or reference counting (this is an area where GC can really leave ARC in the dust).

    Stefan Glienke performance becomes somewhat irrelevant when an enumerator is designed to perform post-filtering anyway, because the post-filtering itself is going to be quite a lot more inefficient than micro-optimizations at the enumerator level.

    ReplyDelete
  25. Stefan Glienke In a multi-threaded environment, the difference between automatic allocation and heap allocation is often not minimal.

    What's more, the overhead of enumerator creation is going to be a lot more significant for collections with small numbers of elements. Try your test again with a single digit item count.

    ReplyDelete
  26. +Lars You could try. As I understand the compiler, for in is implemented by transforming the source code to the equivalent while loop and then passing that to the compiler. So long as name resolution finds your GetEnumerator, you'll be able to impose your will.

    In practice you'll want to sub class because the collections of interest are typically generic and class helpers are no use. Again, we are let down by the language. Helpers in their current form just don't quite get it done.

    ReplyDelete
  27. It would be better to replace it with for
    i:= low(s) to high(s) do ...
    and then it will work for every string (and every array as well).

    ReplyDelete

Post a Comment