Is this a Delphi string bug?

Declaring a short-string type (a String[N] subtype) causes its variables to be non-Unicode:

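program ShortStringDemo;

type
  SmallString = String[20];  // short-string type: a length byte plus up to 20 AnsiChar
var
  SmStr: SmallString;
  Str: String;               // UnicodeString (UTF-16) since Delphi 2009
begin
  Str := 'UnicodeString';
  SmStr := Str; // implicit conversion to a short string: SmStr is no longer Unicode!
  SmStr := SmStr;
end.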

Here is the sequence of allocated bytes for the variable Str (img1):
https://lh5.googleusercontent.com/-4lTgu3CbG_Q/VETvaOtUVrI/AAAAAAAAAUE/Hj34jzJmv8g/s219-p/unicode.PNG

And here is the sequence of allocated bytes for the variable SmStr (img2):
https://lh4.googleusercontent.com/-uKcgkITdwBw/VETv79YasLI/AAAAAAAAAUM/SKVqHl1dpn8/s219-p/nounicode.PNG

Tested with the latest Delphi version (XE7).
Any explanation?
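
For anyone who wants to reproduce the two byte sequences without the debugger's memory view, here is a minimal console sketch. `HexDump` is a hypothetical helper written for this illustration, not an RTL routine, and the program assumes a desktop compiler where `string` is UnicodeString (Delphi 2009 or later).

```delphi
program DumpStringBytes;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  SmallString = String[20];

// Hypothetical helper: formats Count bytes starting at P as hex pairs.
function HexDump(P: PByte; Count: Integer): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to Count - 1 do
  begin
    Result := Result + IntToHex(P^, 2) + ' ';
    Inc(P);
  end;
end;

var
  SmStr: SmallString;
  Str: string;
begin
  Str := 'UnicodeString';
  SmStr := Str; // implicit conversion from UnicodeString to a short string

  // UnicodeString payload: two bytes per character (UTF-16).
  Writeln('Str  : ', HexDump(PByte(PChar(Str)), Length(Str) * SizeOf(Char)));
  // Short string: the length byte first, then one byte per character.
  Writeln('SmStr: ', HexDump(PByte(@SmStr), Length(SmStr) + 1));
end.
```

The Str line should show every character followed by a 00 byte (55 00 6E 00 ...), while the SmStr line should start with the length byte 0D (13 characters) followed by one byte per character (55 6E ...).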

Comments

  1. The documentation says, "For a short string or AnsiString, S[i] is of type AnsiChar."  ( http://docwiki.embarcadero.com/RADStudio/XE7/en/String_Types_(Delphi) )

    So I'd say it's not a bug that it loses Unicode and becomes single-byte - unless there's no compiler warning about it.

    Interestingly the documentation also says a short string is statically allocated at 256 bytes, even if the size dynamically changes (because it can only vary between 0-255 bytes long.) I didn't know that; I always thought if I declared a string[32] it was up to 32 characters long and took only 32/33 bytes of memory.

    What are you using it for? Unless it's a backward-compatible record written to disk or something similar, an ordinary string is probably better. Otherwise, some variation of using an array of (Unicode) characters that you convert to/from a string?

  2. Isn't a short string a Pascal string from the good old days? It is essentially an ANSI string with its length stored in the byte at index zero.

  3. David Millington ShortString is equivalent to string[255], so its allocated size is 256 bytes. If you declare string[10], its size is 11 bytes.

  4. Dalija Prasnikar So it is (I say while furiously reading the docs. http://docwiki.embarcadero.com/Libraries/XE7/en/System.ShortString )

    The more you learn... thanks for the correction :) I always thought a declaration like "string[32]" was a "short string". I have never actually used them since, I suppose, Delphi 1, which is too far away for memory.

  5. There is a compiler warning about an implicit cast and potential data loss.

  6. David Millington There is a small difference between the "type ShortString" and a "short-string type".

  7. David Millington "I always thought if I declared a string[32] it was up to 32 characters long"
    That's true: declaring a string[32] takes 33 bytes of memory.

    BTW, the compiler will not allow you to write something like this:
    type
      SmallStr = AnsiString[32];

  8. Dalija Prasnikar The issue isn't the allocated size.
    It's the loss of Unicode.

  9. Mahdi Safsafi That is because short strings are a legacy type from pre-Unicode versions of Delphi. Actually, their roots go all the way back to Turbo Pascal. They store 8-bit characters in an ANSI encoding and are not suited for storing Unicode characters.

  10. I feel old reading this question :(

  11. Dalija Prasnikar Disturbingly, you can store Unicode in them very easily because they're known-length strings and can have nulls in them. Delphi will work perfectly well if you put UTF-8 strings in, provided you set the length to the byte count of the string, not the char count. UTF-16 or wide-char can be done but it's harder, and UTF-32 is kinda pointless but again can be done.

    Not that I would ever do such a horrid thing. Not me.

  12. Wouldn't a statically sized array of Unicode characters be the best solution here, if you cannot use a normal string for some reason? Known size (good for streaming / saving, records, etc), can be easily converted to/from string, no data loss.

  13. David Millington define "normal string"... the default has changed twice in the time I've been using Delphi.

  14. Moz Le "Normal string" is whatever you get when you write "string".

    Unicode since 2009, non-Unicode before that. (Going way back 20 years, I think D1 had a simpler implementation similar or identical to shortstrings, and I don't even remember Turbo Pascal's strings. But the only change since D1->D2 in 1996 is Unicode, and that is a good one.)

  15. David Millington huge strings were introduced with Delphi 2 and didn't become the default for a while.
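
The size figures from comments 3 and 7, the fixed-size character array suggested in comment 12, and the UTF-8 trick from comment 11 can be checked with a short console sketch. It assumes a desktop Delphi 2009+ compiler; `String10`, `TName20` and the sample text are made up for the illustration and come from nobody's real code.

```delphi
program ShortStringAlternatives;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  String10 = String[10];            // hypothetical short-string type
  TName20 = array[0..19] of Char;   // hypothetical fixed-size Unicode buffer

var
  S10: String10;
  Short: ShortString;
  Buf: TName20;
  Source, Back: string;
  Utf8: UTF8String;
  Raw: ShortString;
begin
  // Sizes: one length byte plus one byte per declared character.
  Writeln(SizeOf(S10));      // 11
  Writeln(SizeOf(Short));    // 256
  Writeln(SizeOf(S10[1]));   // 1, elements are AnsiChar
  Writeln(SizeOf(Buf));      // 40, 20 UTF-16 code units

  // Sample text ending in two CJK characters.
  Source := 'Text ' + #$65E5#$672C;

  // Fixed-size array of Char: copy at most 19 characters plus a terminator.
  StrPLCopy(Buf, Source, High(Buf));
  Back := PChar(@Buf[0]);
  Writeln(Back = Source);    // TRUE, nothing is lost

  // UTF-8 bytes in a short string: the length is the byte count.
  Utf8 := UTF8Encode(Source);
  SetLength(Raw, Length(Utf8));
  Move(Utf8[1], Raw[1], Length(Utf8));

  // Reading it back: copy the bytes out and decode them as UTF-8.
  SetLength(Utf8, Length(Raw));
  Move(Raw[1], Utf8[1], Length(Raw));
  Back := UTF8ToString(Utf8);
  Writeln(Back = Source);    // TRUE
end.
```

Both round trips only work while the text fits: the array holds 19 characters plus the terminator, and a ShortString holds at most 255 bytes of UTF-8, which can be far fewer than 255 characters.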
