A small tip about Character Encoding:

If you write any non-ASCII characters (above #127) directly in source code, please save the file in a Unicode encoding such as UTF-8 or UTF-16. Otherwise, the source file may fail to compile on a system whose ANSI code page differs from the one the file was written in.
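
Where re-saving the file is not an option, one possible alternative is to keep the source pure ASCII and write such characters by their code instead. The sketch below assumes Delphi 2009 or later, where a four-digit hex literal such as #$00B0 is parsed as a Unicode (WideChar) character; the program and constant names are only illustrative.

```pascal
program DegreeByCode;
{$APPTYPE CONSOLE}

const
  // U+00B0 DEGREE SIGN, written as a four-digit hex character code so the
  // source file contains only ASCII bytes (hypothetical constant name).
  DegreeSign: WideChar = #$00B0;

begin
  // Prints the code point of the constant (176). The value does not depend
  // on the code page used to read the source file.
  Writeln(Ord(DegreeSign));
end.
```

Because no byte in this file is above #127, it reads the same under CP-1250, CP-936, or UTF-8.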

The following function was copied from the SynEdit project (SynEditSearch.pas). It works well under CP-1250, but you might be surprised by what it looks like under other code pages (such as CP-936, Simplified Chinese).

You see, there is not even a closing single quote. Do you know why? Take the character "°" for instance: it was saved as the single byte #176 in the source file. In DBCS code pages this byte is treated as a lead byte (in CP-936, lead bytes are #129 through #254), and the single quote byte that follows is taken as its trail byte, so the two bytes are interpreted as one character. Because this combination is invalid in the current code page, it is displayed as a question mark "?". So ... the compiler fails.
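
The lead-byte behaviour described above can be checked with a small console program. This is a minimal sketch, assuming a Windows target with code pages 1250 and 936 available; it uses the Win32 function IsDBCSLeadByteEx, declared in the Windows unit. The program name and the Describe helper are hypothetical.

```pascal
program LeadByteCheck;
{$APPTYPE CONSOLE}

uses
  Windows; // use Winapi.Windows if unit scope names are not configured

const
  DegreeByte = 176; // the byte stored for "°" in a CP-1250 source file

// Reports how the given code page classifies byte B (illustrative helper).
function Describe(CodePage: Cardinal; B: Byte): string;
begin
  if IsDBCSLeadByteEx(CodePage, B) then
    Result := 'lead byte of a two-byte character'
  else
    Result := 'ordinary single byte';
end;

begin
  // CP-1250 is a single-byte code page, so it has no lead bytes.
  Writeln('CP-1250: ', Describe(1250, DegreeByte));
  // CP-936 is a DBCS code page, so high bytes can start a two-byte character.
  Writeln('CP-936:  ', Describe(936, DegreeByte));
end.
```

On a system where both code pages are installed, one would expect the first line to report an ordinary single byte and the second a lead byte, which is why the quote after "°" gets absorbed under CP-936.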

In addition, this kind of function is also not friendly (or correct) in the Unicode world.

Comments

  1. or...

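    function lettersNotAccents(Str: String): String;
    type
      // AnsiString bound to code page 20127 (7-bit US-ASCII)
      USASCIIString = type AnsiString(20127);
    var
      C: Char;
    begin
      Result := '';
      // Convert each character to US-ASCII and back; the conversion is
      // expected to map accented letters to their plain ASCII counterparts
      for C in Str do
        Result := Result + String(USASCIIString(C));
    end;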

  2. Carlos Eduardo Paulino: Hah, there is a misunderstanding. It doesn't happen at run time but
    at compile time. (The source code cannot even be compiled...)

  3. Sorry, I did not read the post properly before commenting :D
