Bad surprise of the day: SysUtils.TEncoding in XE2+ defaults to ANSI, while in XE it defaulted to UTF-8.
Among other things this means that TStringList fails to load UTF-8 text files that don't have a BOM, which is very common nowadays.
Wasn't ANSI already obsolete 10 years ago?
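For anyone hitting the same thing, here is a minimal sketch of the usual workaround: pass the encoding to LoadFromFile explicitly instead of relying on whatever the default resolves to. It assumes XE2+ unit scope names (on XE the units would be plain SysUtils, Classes, IOUtils), and the file name is made up for illustration.

```pascal
program LoadUtf8NoBom;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.IOUtils;

const
  FileName = 'utf8_no_bom.txt'; // hypothetical file name, for illustration only

var
  Lines: TStringList;
begin
  // Write "café" as raw UTF-8 bytes, deliberately without a BOM.
  TFile.WriteAllBytes(FileName, TBytes.Create($63, $61, $66, $C3, $A9));

  Lines := TStringList.Create;
  try
    // The explicit encoding is used when no BOM is found in the file,
    // so the result no longer depends on the default encoding.
    Lines.LoadFromFile(FileName, TEncoding.UTF8);

    // Compare against 'café' spelled with a character code, so the check
    // does not depend on how this source file itself is encoded.
    Writeln(Lines[0] = 'caf'#$00E9);
  finally
    Lines.Free;
  end;
end.
```

This only sidesteps the default; it does nothing about input that is not valid UTF-8.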
For Windows it got obsoleted like 15 years ago: http://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows
For Delphi much more recent.
I'd almost say "QC this", but I won't be using QC any more anytime soon.
Jeroen Wiert Pluimers - Why not?
Eric Grange Eh? On XE, TEncoding defaults to TMBCSEncoding.Create(GetACP, 0, 0), which is exactly what ANSI does.
Stefan Glienke I've not investigated in detail. I bumped into it after noticing that the same source code successfully loads BOM-less UTF-8 text files in XE, but doesn't in XE2. I stopped looking after noticing that TEncoding.GetDefault was hard-coded to ANSI in XE2+.
Plenty of strange to go around. Have you seen this? http://blog.spreendigital.de/2014/05/13/exception-while-using-tstreamreader-with-tzipfile/?utm_source=rss&utm_medium=rss&utm_campaign=exception-while-using-tstreamreader-with-tzipfile
Eric Grange It is no more hardcoded to ANSI in XE2+ than it was before. The code that sits in GetANSI on XE2+ is the same code that was in GetDefault before. The problem with GetDefault in XE is that it would not work on any platform except Windows. In XE2+ it defaults to using the ACP on Windows and UTF-8 on UNIX.
Edit: I can confirm that reading a UTF-8 file without a BOM into a stringlist does not work on XE either, because it simply uses ANSI.
Stefan Glienke Indeed, you're right. The issue must be somewhere deeper. I don't have time to investigate much, so I'm bypassing the RTL for now (I also have to work around the limitation that for UTF-8 the TEncoding.GetString method returns an empty string if a single character in the buffer isn't valid UTF-8).
I wouldn't trust the RTL at all with loading non-ASCII text; we've had it hang on invalid UTF-8 sequences more than once.
Asbjørn Heid "we've had it hang on invalid[...]"
Bad that it hangs, but I'd be more concerned if it misbehaved on valid UTF-8.
Alexander B. Well that would be worse. But hanging is not nice when you're writing services.
Um, so change your post title.
A. Pis Run-Time Library