Is this a Delphi string bug?
Declaring a type as a short string (a subtype such as String[20]) causes variables of that type to be non-Unicode:
type
  SmallString = String[20];
var
  SmStr: SmallString;
  Str: String;
begin
  Str := 'UnicodeString';
  SmStr := Str; // SmStr is no longer Unicode!
  SmStr := SmStr;
end;
Here is the sequence of allocated bytes for the variable Str (image 1):
https://lh5.googleusercontent.com/-4lTgu3CbG_Q/VETvaOtUVrI/AAAAAAAAAUE/Hj34jzJmv8g/s219-p/unicode.PNG
And here is the sequence of allocated bytes for the variable SmStr (image 2):
https://lh4.googleusercontent.com/-uKcgkITdwBw/VETv79YasLI/AAAAAAAAAUM/SKVqHl1dpn8/s219-p/nounicode.PNG
Tested using the latest Delphi version (XE7).
Any explanation?
The documentation says, "For a short string or AnsiString, S[i] is of type AnsiChar." ( http://docwiki.embarcadero.com/RADStudio/XE7/en/String_Types_(Delphi) )
So I'd say it's not a bug that it loses Unicode and becomes single-byte - unless there's no compiler warning about it.
Interestingly, the documentation also says a short string is statically allocated at 256 bytes, even though its length can change at run time (it can only be 0-255 bytes long). I didn't know that; I always thought that if I declared a string[32], it held up to 32 characters and took only 32 or 33 bytes of memory.
What are you using it for? Unless it's a backward-compatible record written to disk or something similar, an ordinary string is probably better. Otherwise, some variation of using an array of (Unicode) characters that you convert to/from a string?
Isn't a short string the Pascal string from the good old days? It is essentially an ANSI string with its length stored in the byte at index zero.
David Millington ShortString is the equivalent of string[255], so its allocated size is 256 bytes. If you declare string[10], its size will be 11 bytes.
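The sizes quoted above can be checked with a minimal console sketch. The program name and the type names TStr10 and TStr32 are made up for illustration; the expected values in the comments assume that a short string stores one length byte followed by its declared capacity in 8-bit characters.

```delphi
program ShortStringSizes;
{$APPTYPE CONSOLE}

type
  // Hypothetical names, used only for this illustration.
  TStr10 = string[10];
  TStr32 = string[32];

begin
  Writeln(SizeOf(ShortString)); // 256: 1 length byte + 255 characters
  Writeln(SizeOf(TStr10));      // 11:  1 length byte + 10 characters
  Writeln(SizeOf(TStr32));      // 33:  1 length byte + 32 characters
end.
```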
Dalija Prasnikar So it is (I say, while furiously reading the docs: http://docwiki.embarcadero.com/Libraries/XE7/en/System.ShortString )
The more you learn... thanks for the correction :) I always thought a declaration like "string[32]" was a "short string". I have never actually used them since, I suppose, Delphi 1, which is too far back for my memory.
There is a compiler warning about an implicit cast and potential data loss.
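A small sketch of where the narrowing happens, assuming a Unicode-enabled compiler (Delphi 2009 or later). The program name is hypothetical; the types and the assignment are the ones from the question, and the assignment is the line the compiler should flag with the implicit-cast warning.

```delphi
program ShortStringElementSize;
{$APPTYPE CONSOLE}

type
  SmallString = string[20];

var
  Str: string;
  SmStr: SmallString;

begin
  Str := 'UnicodeString';
  SmStr := Str;              // implicit UnicodeString -> short string conversion

  Writeln(SizeOf(Str[1]));   // 2: elements of string are WideChar
  Writeln(SizeOf(SmStr[1])); // 1: elements of a short string are AnsiChar
  Writeln(Length(SmStr));    // 13: pure ASCII text survives the conversion
end.
```

Characters that have no representation in the active ANSI code page would not survive that assignment, which is what the warning is about.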
David Millington There is a small difference between the "type ShortString" and a "short-string type".
David Millington "I always thought if I declared a string[32] it was up to 32 characters long"
That's true: declaring a string[32] takes 33 bytes of memory.
BTW, the compiler will not allow you to write something like this:
type
  SmallStr = AnsiString[32];
Dalija Prasnikar The issue isn't the allocated size. It's about the loss of Unicode.
Mahdi Safsafi That is because short strings are a legacy type from pre-Unicode versions of Delphi. Actually, their roots go all the way back to Turbo Pascal. They store 8-bit characters in ANSI encoding and are not suited for storing Unicode characters.
I feel old reading this question :(
Dalija Prasnikar disturbingly, you can store Unicode in them very easily, because they're known-length strings and can have nulls in them. Delphi will work perfectly well if you put UTF-8 strings in them, provided you set the length to the byte count of the string, not the character count. UTF-16 or wide-char can be done but it's harder, and UTF-32 is kinda pointless but again can be done.
Not that I would ever do such a horrid thing. Not me.
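For the curious, the UTF-8 trick described above might look roughly like this. It is only a sketch, assuming a desktop compiler with 1-based string indexing and an encoded text of at most 255 bytes; the program and variable names are made up. The bytes are copied with Move so that no code-page conversion gets a chance to touch them.

```delphi
program Utf8InShortString;
{$APPTYPE CONSOLE}

var
  Src, Back: string;
  U, U2: UTF8String;
  S: ShortString;

begin
  Src := 'caf'#$00E9;              // 4 UTF-16 characters, the last one is U+00E9
  U := UTF8Encode(Src);            // 5 bytes: U+00E9 needs two bytes in UTF-8

  // Store the raw UTF-8 bytes; the length is the byte count, not the char count.
  SetLength(S, Length(U));
  Move(U[1], S[1], Length(U));

  // Read them back the same way and decode.
  SetLength(U2, Length(S));
  Move(S[1], U2[1], Length(S));
  Back := UTF8ToString(U2);

  Writeln(Length(Src));            // 4
  Writeln(Length(S));              // 5
  Writeln(Back = Src);             // TRUE
end.
```

Anything that treats S as ordinary ANSI text (display, case conversion, assignment to a string) would still misinterpret the multi-byte sequences.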
Wouldn't a statically sized array of Unicode characters be the best solution here, if you cannot use a normal string for some reason? Known size (good for streaming / saving, records, etc), can be easily converted to/from string, no data loss.
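A minimal sketch of that idea, assuming a Unicode-enabled Delphi where Char is a 2-byte WideChar. The type name TName and the 20-element capacity are made up for illustration. StrPLCopy from System.SysUtils copies at most the given number of characters and adds a null terminator, so the limit passed here leaves room for it.

```delphi
program FixedCharArray;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // Hypothetical fixed-size buffer: 20 UTF-16 code units, 40 bytes.
  TName = array[0..19] of Char;

var
  Buf: TName;
  S: string;

begin
  // Copy at most 19 characters, leaving one element for the terminating #0.
  StrPLCopy(Buf, 'UnicodeString', High(Buf));

  S := Buf;                 // zero-based Char array converts back to a string
  Writeln(S);               // UnicodeString
  Writeln(SizeOf(Buf));     // 40
end.
```

The buffer has a known size, so it can sit inside a record that is streamed or saved, and text longer than the capacity is truncated rather than overflowing.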
David Millington define "normal string"... the default has changed twice in the time I've been using Delphi.
Moz Le "Normal string" is whatever you get when you write "string".
Unicode since 2009, non-Unicode before that. (Going way back 20 years, I think D1 had a simpler implementation similar or identical to shortstrings, and I don't even remember Turbo Pascal's strings. But the only change since D1->D2 in 1996 is Unicode, and that is a good one.)
David Millington huge strings were introduced with Delphi 2 and didn't become the default for a while.