I think I found a compiler issue, could you please vote for it?

For "if AChar <= #255 then" the compiler (XE6) generates the following code:
005D734A 66817DFA4F04     cmp word ptr [ebp-$06],$044f

$044f does not equal 255!

But it works fine when the Ord function is used:

if Ord(AChar) <= 255  then
005D7352 668B45FA         mov ax,[ebp-$06]
005D7356 663DFF00         cmp ax,$00ff
http://qc.embarcadero.com/wc/qcmain.aspx?d=124402
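Where does $044F come from? A sketch of the conversion in Python (not Delphi; this merely models the compiler's codepage conversion, and assumes the machine's ANSI codepage is cp1251, Windows Cyrillic, as comes up later in the thread): byte 255 in cp1251 is 'я', i.e. U+044F.

```python
# Model what the compiler does with the #255 literal: the value is taken
# as a byte in the system ANSI codepage and then converted to Unicode.
# With cp1251 (Windows Cyrillic) assumed here:
ch = bytes([255]).decode('cp1251')
print(ch, hex(ord(ch)))  # я 0x44f -- the constant seen in the cmp above
```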

Comments

  1. Actually it's the old Char horror introduced with Delphi 2009.

#255 is interpreted as character 255 in your system codepage, and then converted to Unicode (the outcome of course depends on your system codepage). It has many variants, not all of which involve the # notation.

You have to use #$00FF if you want to specify 255

  2. It looks like a bug to me, and I do that sort of comparison all the time (not using XE6 though).  Did you test with other values (like #13 or #10 which are common tests)?   Voted.

  3. Eric Grange oh, thanks. This makes sense. But it works just fine for code like "if AChar in [#1, #255] then" - in this code #255 = #$ff

  4. Please attach a compiling unit test (preferably a DUnit unit containing the unit test) with regressions for #255, #$FF, #$00FF, Chr(255), Chr($FF), Chr($00FF) and similar permutations for Char(...) and I'll upvote and tag it for promotion to the internal bug system. 
    For more background on the why of these permutations: http://wiert.me/2010/01/18/delphi-highcharunicode-directive-delphi-rad-studio/

  5. Roman Yankovsky Actually no, it doesn't work, it just looks like it works, but that's contextual
    http://www.delphitools.info/2013/11/18/unicode-leftover-bug-from-hell/

  6. This is yet another rule to check for SourceOddity (SpaceOddity theme for Delphi), I guess :)

  7. As Jeroen hinted, there is a HIGHCHARUNICODE directive for compatibility. Do you see a difference when you turn it on and off? If so, I'd say this is expected. The idea is that if you use #80 you get the Euro symbol no matter what (not that Unicode number; it gets converted for you)... helping the code move over.

  8. This error has been there since Delphi 2009. :-(

    Use
    if AChar <= Char(255)

    #$FF -> AnsiChar <> Unicode = #$00FF

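The shorthand "#$FF -> AnsiChar <> Unicode = #$00FF" can be checked directly. Unicode's first 256 code points coincide with ISO-8859-1 (Latin-1), so byte $FF maps to U+00FF only under Latin-1; under a Windows ANSI codepage such as cp1251 the same byte lands elsewhere. A quick sketch in Python (which here just models the byte-to-Unicode conversion):

```python
# Unicode's first 256 code points mirror ISO-8859-1 (Latin-1),
# so a Latin-1 byte $FF really is U+00FF...
assert bytes([0xFF]).decode('latin-1') == '\u00ff'
# ...but under Windows Cyrillic (cp1251) the same byte is U+044F:
assert bytes([0xFF]).decode('cp1251') == '\u044f'
print('byte $FF is codepage-dependent')
```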
  9. Marco Cantù the behavior introduced with D2009 did not provide backward compatibility. All explicit character codes here broke during our migration because of implicit conversion based on system codepage (which of course depends on the system).

    Pre-D2009 the compiler took a "hands off" approach to numeric charcodes, so the code compiled in the same predictable way regardless of the system codepage it was compiled on.

  10. Marco Cantù  As an illustration, from the doc http://docs.embarcadero.com/products/rad_studio/delphiAndcpp2009/HelpUpdate2/EN/html/devcommon/compdirshighcharunicode_xml.html
    the backward-compatible behavior would be a mode in which both the A and W forms give $80, but none of the HIGHCHARUNICODE modes corresponds to that

  11. No, to me the backward compatible behavior is that if you assigned a char you get the same char. This isn't a byte, it's a char! Anyway, I tried the following

    {$HIGHCHARUNICODE ON}

    procedure TForm3.Button1Click(Sender: TObject);
    var
      ch: Char;
    begin
      ch := #128;
      ShowMessage (ch);
      ShowMessage (Ord(ch).ToString);
    end;

    And it does work as you'd expect (in XE6, at least): the ordinal value is indeed 128. With that OFF, you get the Euro symbol (at least in a Western European code page, I know), which is 8364 (U+20AC).

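Marco's two results can be reproduced outside Delphi. This Python sketch models what the two HIGHCHARUNICODE modes do with #128 (assuming the Western European codepage cp1252; the Euro sign is U+20AC, ordinal 8364):

```python
# HIGHCHARUNICODE ON (modeled): #128 is taken as Unicode code point 128.
print(ord(chr(128)))                   # 128

# HIGHCHARUNICODE OFF (modeled): #128 is an ANSI byte in the system
# codepage, converted to Unicode on the way into a WideChar.
euro = bytes([128]).decode('cp1252')
print(euro, ord(euro))                 # € 8364
```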
  12. Marco Cantù I disagree with this. If you specify a char by number in two different ways in any Delphi version it should come out the same.

  13. Marco Cantù What you describe is what the 2009 doc says, but not what previous Delphi versions did, hence it's not backward compatible - by definition : )

    Also in the grand scheme of things, the only two purposes of explicit numeric charcodes are control characters (below 32) and codepage-agnosticity, otherwise you might as well type the literal character in the source code directly.

  14. Jeroen Wiert Pluimers Please specify what you mean by "the same": if you display a char on the screen, should it be the same character? Or if you do some processing, should it be the same numeric value? You cannot have both, because some of the numeric values between 127 and 255 are different in the Windows code page (not ISO) and in Unicode. In any case, Delphi has a compiler option that lets you decide what "the same" means for you; I'm not sure I understand what the suggestion is. Is it changing the default value for that compiler option? Removing it?

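Which numeric values between 127 and 255 actually differ can be enumerated. This Python sketch (assuming cp1252, Windows Western European; undefined bytes are decoded with errors='replace' so they count as differing too) shows that every mismatch sits in the $80..$9F range:

```python
# Byte values whose cp1252 decoding differs from their Unicode code point.
diff = [b for b in range(128, 256)
        if bytes([b]).decode('cp1252', errors='replace') != chr(b)]
print([hex(b) for b in diff])  # all fall in the 0x80..0x9f range
```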
  15. Eric Grange It tries to be backward compatible with previous code, trying to "read" the programmer's intent, which is not always possible. But there is a compiler flag to adjust the behavior. If the compiler flag works as documented, it might be a nuisance, but you cannot qualify it as a bug.
    I fully agree with you that explicit numeric char codes are quite a bad idea in the first place. But they are used by developers. Given this applies to the range 127 to 255, and there is only one "real" control char there on Windows (namely 255), plus a bunch of other literals Microsoft pushed into some "unused space" between 80 and 96 (hex), this is more likely the developer intent. And there is a reason developers don't use those literals: many of them don't show up on keyboards, with the Euro symbol added only more recently...
    So I do understand your point, and it is true things did change, but I still think CodeGear (back then) implemented the best migration strategy in regard to this issue.

  16. Marco Cantù And why is char(128) <> #128?

  17. Marco Cantù You misunderstood: for numeric charcodes, the compiler shouldn't try to be "smart", IMHO; codepage-converting regular characters is okay, codepage-converting numeric charcodes isn't.

    Numeric charcodes are not a "bad idea"; they are a requirement for control characters and special cases, such as detecting the range (ASCII vs not ASCII) or handling non-visible codes (no-break space, Unicode diacritics, etc.), and in all those cases you just don't want the compiler to interpret anything.

    WRT the issue, they actually implemented the worst possible strategy: not only do you not have any backward-compatible option, but you get two hard-to-predict options, both with side effects. The result is that you have to work around the compiler when you need to specify special Unicode chars, and you have to be extra wary of any single-character string/char constants (as there is another terrible design choice that was made there).

  18. Eric Grange I totally agree with you. Did you write unit tests for this? If not: want to chat on how to set this up?

  19. Eric Grange "for numeric charcodes, the compiler shouldn't try to be smart". Ok, so we should make the code less backwards compatible? As I wrote, numeric characters are/were used in practice in other cases, whether you like it or not.

    Having said this, I agree that for AnsiChar you should have the same behavior no matter what HIGHCHARUNICODE is, and I see that as a bug. For Char (i.e. WideChar) you have two different options, providing backwards compatibility for different scenarios (and the option is indeed local and specific to a code fragment). For AnsiChar, it seems it is messed up!

  20. Jeroen Wiert Pluimers Yes, my system has a Cyrillic codepage. I will write unit tests, no problem :)

  21. How did you set your code page? And what is the number of your code page? (I know of `chcp`, but maybe you use another way)

    http://stackoverflow.com/questions/1259084/what-encoding-code-page-is-cmd-exe-using

  22. Marco Cantù I suppose the situation has become very messy, given the D2009 choice wasn't sanitized earlier. But backward-compatibility-wise, the hands off approach is the one that matches the pre-D2009 behavior.

    Neither of the HIGHCHARUNICODE options is backward-compatible, we experienced that firsthand :/

    Jeroen Wiert Pluimers there are some unit tests as part of the DWScript unit tests that stress this (though not enough, as the tokenizer had a bug in some edge cases when compiled from Delphi running on a Cyrillic system...). I think I also saw some in the mOrmot Framework (but I'm not 100% positive).

  23. Eric Grange if you have names of source files, please let me know and I will try to integrate them into a bigger suite of tests. Those should at least serve as knowledge for people on which coding patterns to avoid, and hopefully help some of the Embarcadero guys improve things.

  24. Eric Grange and Roman Yankovsky any more input on unit tests yet?

  25. Roman Yankovsky  Thanks. I'm adapting my code generator so it can generate both my tests and yours. Will keep you posted.

  26. Managed to adapt the code generator so it can generate my own unit tests. See https://bitbucket.org/jeroenp/besharp.net/commits/1f9a3ef28f63da07e7786fa17a874bb66eb8c1c9
    Will start working on your unit tests later this week (problem: you mixed unit tests that do and don't compile with Delphi 2007 in one unit. I need to split that out first.)

