To "fork" the language strings or to keep them unified?

http://www.delphitools.info/2013/10/08/utf-8-utf-16-or-both-poll/

Comments

  1. why not stick with UTF-8 everywhere? me no likey forky

  2. UTF-8 everywhere would mean a conversion overhead in Delphi, and a very significant overhead for Smart/JS.
    I want FreePascal, but I don't want to sacrifice Windows Delphi or JS for it ;-)

  3. UTF-8 is a variable-length encoding (UTF-16 is as well, but we can safely ignore the variable part), so it's not very handy. UTF-8 should be used when the "backend" is ASCII-based and cannot work with UTF-16.

  4. Eugene Mayevski, there is nothing safe about ignoring the variable part of UTF-16 if you deal with Chinese or Apple text (which can use combining characters for accented letters).

  5. What's wrong with Apple? Did they steal the alphabet as well?

  6. Apple had different conventions and practices with respect to Unicode normalization and equivalence (in simple terms, in Unicode "é" can be precomposed as a single character or written as an "e" + combining diacritic). When dealing with Unicode text, be it Chinese or a non-precomposed character, this means that a character, even a Latin one like "é", can be made of two WideChars, which shouldn't be separated (they can be normalized, though normalization can be lossy as not all combinations have a precomposed character).

    For more details, see http://en.wikipedia.org/wiki/Precomposed_character and http://en.wikipedia.org/wiki/Combining_character

  7. Eugene Mayevski "we can safely ignore the variable part"
    That’s like saying "what the heck, let's just stick to ANSI".
    Read this: http://www.utf8everywhere.org/#faq.almostfw

  8. When I was offering Unicode support for non-Unicode Delphi with ElPack, it was very popular among Chinese users. Nobody ever complained about missing composite characters or other problems with Chinese. So let's separate theory from practice. Well, if one wants to fight about which end of the egg to break first, I quit.

  9. Interestingly enough, non-BMP characters have become quite common thanks to Twitter ;-)
    http://stackoverflow.com/questions/5567249/what-are-the-most-common-non-bmp-unicode-characters-in-actual-use

    Eugene Mayevski, if your components were targeted at Wintel, they didn't have to care about UTF-16 endianness or normalization (as for nobody ever complaining, Google tells otherwise).

  10. Would a fork just manifest itself as having to download the appropriate version of DWS for each platform, i.e. no source code changes for a developer?

    If yes, then it seems the way to go from the standpoint of performance.

  11. Kevin Powick, it would manifest itself as the same source code on the Delphi and FreePascal side, but with slight incompatibilities on the scripting side. The script would match the encoding of the platform that compiled it, rather than having an encoding independent of the platform.

  12. Eric Grange I guess Z̫̜̬̘̩̬͎͙̻͋ͦ̓͗͝͠a̟ͩ̂͆̀l͙̟͓͓͆͢͞g̸͍̙̻͍͔͌ͬ̇ͦͤ̈́̽͘o̱̮̫̟̪̥͙͚ͯͤ̑ͨ̋ ̝̤͈͕̱̭̞͖ͫͫ͢͞͞t̶͓̤͈̣͖̓̎̔̎̓̋̓ͮ̈́͜ͅh̛̥͎̦̖̭̋͂̊ͮ͑́͋͡e̹̲̩̬̮͔̗̦̿̿̋ͨͪ ̷̥̫̰ͦ̋̀G̵̬̜̲̈́̈́ͤ̅͜͜r̾ͦͦͧ̐ͣͬ̒͘͏̴̰͙ȩ̼̭̥͔̳̿̽̿̉́ͅa̶̶̳̲̼͕ͧͩ͘t͕̯̫͇̩̳̤͖ͫ͋̎͋̾ͩ͒̈́͒͢͜͝ͅ ̬͈̜͉͔̺͕͂̆͞O̲̙͖̻̲̠̙̔͂͟͝ͅn̞̈́͐͠e̛̤͖ͨ͐̌ͯͩ͋̌ͤ͞ has been a bit responsible as well...

  13. (interesting; on my Win7 machine, Chrome 28 completely messes up Zalgo the Diacritical One, whereas Firefox 24, Opera 12.16, Safari 5.1.7 and IE 9.0 render it without a hitch).
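
The code-unit counts argued over in this thread can be checked with a small sketch. It uses Python rather than Delphi purely for illustration; the helper names `utf16_units` and `utf8_bytes` are made up for this example, and the sample strings are assumptions chosen to mirror the cases raised above (precomposed vs. combining "é", a non-BMP character, and a combination with no precomposed form).

```python
import unicodedata


def utf16_units(text: str) -> int:
    """Count 16-bit code units, i.e. how many WideChars the text would occupy."""
    return len(text.encode("utf-16-le")) // 2


def utf8_bytes(text: str) -> int:
    """Count the bytes the text would occupy in UTF-8."""
    return len(text.encode("utf-8"))


precomposed = "\u00e9"   # "é" as a single precomposed code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT
emoji = "\U0001F600"     # a non-BMP character (needs a surrogate pair in UTF-16)
no_precomposed = "q\u0301"  # "q" + acute: assumed to have no precomposed form

for label, s in [("precomposed", precomposed), ("decomposed", decomposed),
                 ("emoji", emoji), ("q + acute", no_precomposed)]:
    print(label, "UTF-16 units:", utf16_units(s), "UTF-8 bytes:", utf8_bytes(s))

# NFC composes where a precomposed character exists; NFD decomposes.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == decomposed)

# Where no precomposed character exists, NFC leaves two code units behind.
print(utf16_units(unicodedata.normalize("NFC", no_precomposed)))
```

In both the combining case and the surrogate-pair case, a single visible character spans two WideChars, so indexing or truncating by code unit can split it. That holds whichever of UTF-8 or UTF-16 is chosen as the script's string encoding.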
