Need 8-bit strings on Delphi mobile compiler?

Please cast your +10 votes for QC 119501
http://qc.embarcadero.com/wc/qcmain.aspx?d=119501

Comments

  1. I can keep repeating it, but to have any chance, we need a description of why this is needed. The only answer in the non-tech thread is for compatibility with existing code. Reading/writing UTF8 is covered by encoding. Managing raw byte strings in memory is covered by TBytes.

    Is AnsiString on mobile a requirement? You'll have to convert for input/output. Is there an actual scenario?

    UTF8String, this one I can understand, but most existing Delphi libraries don't handle the variable-length processing that UTF8 requires. Again, what is the precise scenario? Processing UTF8Strings without converting them?

    RawByteString makes little sense even on desktop, and is never used in libraries. And it is very difficult to use...

    Is TBytes limited as an array of 1-byte elements type? Does it need to be augmented in speed and features? This one I fully understand and would be willing to push, for example. But I think it won't be well received...

    If we can go beyond the drama and discuss the specifics, it can be helpful.

  2. Marco Cantù 
    Lots of industrial equipment uses plain ASCII interfaces. Hooking up to those interfaces with a mobile device is a business opportunity for us. Semantically it is wrong to use TBytes for that. It seems as if your vision for what Delphi can do is limited. If the direction of Delphi for Mobile is towards mass consumers (on the application side), that train left the station years ago. Please support professional developers working for industrial and retail solutions.

  3. Marco Cantù just to avoid such code !

    uses
      System.SysUtils, // TMarshaller
      Androidapi.Log;  // LOGI

    procedure Log(const Str: string);
    var
      M: TMarshaller;
    begin
      // Oh dear ! give us back our AnsiString !
      LOGI(M.AsAnsi(Str).ToPointer);
    end;

  4. Leif Uneus Makes sense, but can you be more specific about the industrial equipment you can hook up to your Android/iOS devices? We support conversion with Encodings, TBytes (not sure why that would be wrong), and other approaches for reading/writing AnsiString. Do you need to do specific in-memory processing of that data? Store it in a DB and/or show it in the user interface?

  5. Paul TOTH 

    And LOGI (Pointer (PAnsiChar (AnsiString(Str)))) would make your day? I see your point, but how common is that code?

  6. Marco Cantù 670 votes on the QC, 47 comments on your blog; these comments represent developers with millions of lines of code. I'm one of them.

    Yes, these applications use AnsiStrings and its predecessor String, as it's a type that's existed in Pascal since the beginning. This codebase is why Delphi still exists, as the "contract" with developers was established decades ago that allowed my code (rtl) from 20 years ago to compile today. Arbitrarily deciding to drop support for a fundamental type is breaking this contract and represents a point in time where Delphi no longer supports my Pascal code.

    You could continue to ask us to prove why we need AnsiStrings, like it's some kind of academic interview question, but I'll always say it's because my existing Pascal codebase needs to compile in Delphi. Why? Because developer-time == $$$. The labor required to audit the codebase and make changes represents a significant amount of money. Much, much more than what EMBT would charge for their Recharging, Upgrading, SLA, etc. 

    Developer Labor is more expensive than your product and I can't let a product determine how I spend my labor. Delphi is a tool, and if Delphi tells me I need to use a nail-gun vs a hammer; then I'll make my own determination and most likely go with the hammer, if I don't need a nail-gun.

    I fear Borland -> Inprise -> CodeGear -> Embt has lost the memory of how they have "stayed in business" all these years. It's certainly not because of how the product is managed and I think that needs to be evaluated.

  7. Marco Cantù I have the feeling that whatever I might say will not be convincing enough, if numerous posts in non-tech, comments in QC, and blogs with various use cases are not enough. Also, this request is supported by some TeamB members and other more prominent members of the Delphi community.

    But here are some points I would like to bring out:
    1. 8-bit strings are not just a matter of legacy code. UTF8 is very much a matter of present and future code.
    2. We are really not talking about adding a new feature to the language but about re-enabling an existing hidden one.
    3. 8-bit strings are not blocking language evolution and new compiler features.
    4. There are many reasons (beyond legacy code) why people would process UTF8 (or any other encoding) directly: memory consumption, performance, the need to keep original Unicode code points, and interfacing and communicating with hardware or other software, where you need fast responses and yet need to process that textual data in some way (TBytes doesn't fit well here).
    5. TBytes are not fit to replace 8-bit strings that handle textual data. There is a reason why people were using 8-bit strings instead of arrays of bytes.
    6. 8-bit string replacement libraries are not fully compatible with the existing desktop compiler's 8-bit strings, and are also less performant. In addition, multiple replacement libraries will emerge rather soon, and you can easily end up using different incompatible replacement implementations of 8-bit strings through 3rd party code.
    7. Delphi has always been an alternative to C++ and not C# or Java.

    If that is not enough, the last and most obvious reason is that there are zillions and zillions of lines of code in the wild that would otherwise work perfectly on the mobile compiler and would not have to be changed at all. Changing that code costs us real money and real time. Neither comes in infinite quantities, and for some those costs can be greater than the benefits.

    To be honest, Delphi is late to the mobile party. Some would say better late than never, but when you are late you should use the existing code base to give you momentum, as an additional reason for your customers to use Delphi for mobile and not as an additional reason against it.

  8. Vin Colgin AnsiString didn't exist since the beginning. In fact, AnsiStrings didn't exist in Delphi 1 and were introduced in Delphi 2. It has been a long time, I know, but that change broke some existing code... Unicode strings have been in since Delphi 2009. You say that your code base from 20 years ago needs to recompile. But I doubt it will work properly if it is not Unicode aware. If you plan on ignoring Unicode on mobile, I doubt it is a great idea. That's why I'm asking for use cases (not to prove you need it).
    I know that change is expensive, but there are times you cannot work around it. Are you asking us to port the VCL to Android for preserving the code? I haven't heard that often, and for good reasons. But how come "not moving to Unicode" is such a compelling reason, when you are ready to redo the user interface?
    Andreas unlocked a System definition, but does everything work after that change, or do you have a type without a working RTL? The latter is of little use...

  9. I make a product. That product is hardware based, with serial I/O. The data going back and forth is US-ASCII (0x20 to 0x7f). The UI is not written in Delphi. The program uses AnsiString to communicate with the serial device. The end.

    When Delphi XE5 Update 2 came out and removed AnsiStrings, I could no longer compile the code with the latest edition of Delphi. Now, I can't continue using Delphi. I have no need for Mobile. Is that a clear enough example?

  10. Dalija Prasnikar I know this request is supported, and I see all you mention, except the "blogs with various use cases". These are what I'm looking for and cannot find.
    1. Supporting UTF8 and supporting Ansi is not the same thing. Do people asking for 8-bit strings want the former, the latter, or both? How much Delphi library code supports UTF8? Very little to my knowledge. More than willing to be proved wrong.
    2. No, you are looking for new RTL code to be written (for example, converting strings in Windows uses native APIs) and a significant QA effort at the compiler and RTL level. Remember, the compiler uses a totally different engine, so things might work differently.
    3. Removing unneeded complexity from a language helps evolution. Sure, if the complexity is needed better keep it...
    4. This is where I'd want more details, because I'm not convinced. Unicode is an extra cost if you don't need it, but can you afford to ignore it? Reading and saving in other formats we handle with Encodings. TBytes is what we use for encodings, and you have full access to that level. Unicode strings preserve code points. I agree on the memory usage and extra processing. But I'm not sure this is true compared to UTF8.
    5. No, people weren't using TBytes because it didn't exist. And it is not backward compatible with old versions of Delphi, which is not a great point to me.
    6. On this one I'm 100% with you and it is something we can consider. I'm pretty sure that if we release it, most people will just belittle it.
    7. C++ has no strings at the compiler level. It has no concept of efficient COW strings. The only language with two string types (8 and 16 bit) I know of is Python, and they deprecated one. This doesn't mean we should not add a feature if other languages don't have it. But I doubt this is a really blocking issue.

    And about the libraries, as I just wrote, is that AnsiString handling code going to work with Unicode strings? Does it support UTF8, or does it need to be fully retested against it? Unicode is a completely new world for string processing, whether developers like it or not. You say that code is "working perfectly on mobile compiler and would not have to be changed at all". I'm asking for a sample of that code....

  11. Marco Cantù
    Marco, it is simple. We need a fast 8-bit string type. Not talking about ANSI codepages or UTF8, simply the 8-bit string type with COW etc. TBytes will not fit in there without COW etc. I've also been a customer for a long time (since TP 3.0), but at the moment I don't have the feeling your company is on the right track. By the way, I'm often in Alicante; maybe we can have a coffee there one day and discuss this directly.

  12. Vin Colgin I understand your program runs on Windows, and Delphi XE5 Update 2 on Windows has the same AnsiString support it had in the past. If not, that's a bug we should address. Here we are discussing AnsiString in the LLVM mobile compiler for iOS and Android, which you say you are not interested in. Or am I missing something?

  13. It's all about the codebase and using the codebase on new devices. 

    - The code is written in Pascal.
    - The code currently is targeted to Windows
    - The code needs to target Android, as the device needs to move from windows to android
    - I need to compile my codebase to the new android target
    - I can't compile it, AnsiString isn't supported by Delphi XE5 Update 2 and I can't target to non-Windows devices

    It's a simple case of trying to utilize an existing codebase that is targeted to multiple platforms. 

    I assume François Piette has the same problem with ICS. He deals with RFC protocols that are not UNICODE, specifically RFC 959 and makes heavy use of AnsiString. A simple grep of the source code shows 971 references to AnsiString. Sucks to be him.

  14. Marco Cantù 
    My situation is very much like Vin Colgin's.
    Industrial ASCII protocols with an existing codebase that has been working for 25+ years (TP) and is still to a large extent reused on XE5 without a rewrite. And we want to use this code for mobile applications. Example protocols are Modbus ASCII, NMEA 0183 and so many more (including support of our own equipment).

  15. I just want to repeat this excellent point:
    7. Delphi has always been an alternative to C++ and not C# or Java.
    Delphi is (was?) a unique language combining very low level capabilities (Asm/C-level) with very high level features (C#/Java level). That is why there is a huge number of in-house data processing and hardware control systems written in Delphi that rely heavily on its unique low-level power (obscure binary data formats, serial communications like ModBus/Profibus/etc. or CAN bus access being only part of this area). Surely you will not see these applications in the retail market, yet they do exist. The codebase for these systems usually embodies years of legacy, development and other know-how. If (when) the need arises to support some mobile additions to these systems, it is expected that the existing code will be reused for the most part, not that complex algorithms with thousands of lines of code will be rewritten or reimplemented just because some compiler guy decided that some language features suddenly became "obsolete" for no reason.
    And yes, I feel that the current direction taken by the Delphi development team (as I understand it, making Delphi a Java clone with just a slightly different syntax and a data access/visual library appended, replacing plain functions with record helpers - silly IOUtils unit! etc.) is not the one I would like it to be. That is why neither I nor my co-workers have updated to any new version since XE, moving more and more towards FPC/Lazarus and C++. I wish some committee would issue a modern Object Pascal language standard to force Embarcadero to support the features which are really required, not the ones they think are...

  16. Marco Cantù 
    You're right, Unicode has been there since D2009... but you know very well that many people remained on D7 before the arrival of multiplatform support.
    The need for an 8-bit string has nothing to do with UTF8 or the VCL; even in TP, the built-in (short)string functions were a very convenient way to handle 8-bit chars for any purpose. Did you notice that all the Internet protocols use 8-bit chars, even when they use some binary encoding like TELNET, where #255 should NOT be codepaged?
    And one of the big advantages of AnsiString is all those built-in functions; TBytes can only store data, and everyone has to build their own functions to manipulate it.

  17. Alexander Elagin Delphi remains an alternative to C++. These are the only two languages that allow you to compile native code on both iOS and Android, AFAIK. ASM support for multi-device is not technically possible, and the need for ASM code has been reduced by CPU speed progress and some compiler progress. Still great, but in niche areas. Asking us for compiler improvements is a good point. You can still handle binary data, use pointers, and use arrays of bytes; writing low level code has nothing to do with AnsiString, which is only a convenience. I understand the legacy codebase, and this is why AnsiString exists and is staying around on Windows many years after Delphi embraced Unicode. Removing it would be a big mistake. But is the need to migrate that exact legacy code to mobile a theoretical need or a practical one? I still haven't got a concrete scenario. As for moving complex algorithms with thousands of lines of code to a mobile app, I have some reservations about whether it makes a lot of practical sense, but I might be wrong.
    Delphi is not being made into a Java clone. Record helpers, for example, are not a Java feature. Since you mention XE, I understand your opinion has nothing to do with our mobile offering. As for a user-driven standard, I read it as a joke. It would be the first ever.

  18. Paul TOTH I know about the need to handle UTF8 and other 8-bit chars for Internet connectivity. As you probably know, Indy works quite smoothly on our mobile compilers (yes, this required some great work by its maintainers). I'm fully aware it is much easier to do processing on AnsiString than on TBytes, but this is something we are more than willing to address. But the question is, is this the REAL requirement? Would there be consensus that fast and complete AnsiString-like support for TBytes would answer the concern about the lack of 1-byte strings on mobile? Or is the requirement AnsiString as it used to work in Delphi 7, or nothing? I'm really asking; otherwise I wouldn't have got involved in this conversation...

  19. Marco Cantù
     OK, a real world example. A few months ago I had to convert one of my applications, which previously ran on Windows XP Embedded, to make it run on embedded Linux and, what's more, on a bare metal ARM board (no OS). Luckily, the application was written in C++ and all I had to do was rewrite some very low-level stuff working with the CAN and MVB buses, timer support and similar things for the Linux version and for the ARM version. All the data processing functionality, including binary and textual (8-bit) data manipulation, remained untouched and was just recompiled. Other applications written in Delphi mostly have visual parts and, if necessary, will be recompiled using Lazarus to target Linux, also practically without rewriting core functionality. If Delphi provides the same level of code compatibility between different platforms, I'll consider using it again for new projects. In its current state - no.

  20. Ok Marco Cantù let's take a look at Indy. In TIdHTTP, the request header is built in UnicodeString until the final magical function ToBytes() that converts everything to TIdBytes. Isn't it a waste of time and resources, especially for small devices?
    And what about reading? Well, there's a beautiful TIdBuffer.IndexOf() function to replace the lost AnsiString Pos(). Is that what you expect from us? To reinvent the wheel?

    I really prefer to write
     s := 'escape with '#27;

    than
     b := TBytes.Create(101, 115, 99, 97, 112, 101, 32, 119, 105, 116, 104, 27);

    or
     b := ToBytes('escape with ') + 27;
    with some magic helper that will fail with
     b := ToBytes('escape with ') + 27 + 13 + 10;

  21. Marco Cantù
    1. I cannot speak for others, but in my case I am talking about UTF8String and RawByteString. You can easily support those on every platform you like. Also, some Ansi encodings are natively supported on Android and iOS. If you cannot support the full spread of Ansi encodings, I am pretty sure that people will be able to understand that those are tightly connected to Windows. But that is not a reason to ditch all 8-bit string support.

    How many libraries work with UTF8? You can start with PCRE, which is shipped with Delphi. Yes, I know you had to convert it to use Unicode strings (and I haven't checked how well that works), but originally that library is built around UTF8String. My library is built on top of UTF8String. Just because you don't know every single line of proprietary code out there doesn't mean that people are not using UTF8String and others.

    2. How does some work on the part of your compiler/RTL team compare to all the work your customers must do instead? We are buying Delphi not to make your life easier but ours.

    3. It is needed badly... 

    4. I am not ignoring Unicode; I was using UTF8 in the Delphi 7 days, when Unicode support was just a blip on Delphi's roadmap. My whole library is fully functional under Unicode versions of Delphi, including the latest ones.
    That library, same code without any IFDEFS when it comes to using string types, also works perfectly under OSX so it is not married to the Windows platform.

    Most textual data consists of ASCII characters and those are represented with only single byte in UTF8 vs two bytes in UTF16. So most of the time UTF16 costs you twice as much as UTF8 in memory terms. 

    Conversion from UTF8 to UTF16 and back can destroy original Unicode code points. I don't have a particular example available at the moment, but there were discussions about that in non-tech.

    5. TBytes and dynamic arrays of bytes are basically the same thing. They are not a suitable replacement when it comes to processing textual data with 8-bit encodings.

    6. So you would rather waste your time on an incompatible, lesser replacement for 8-bit strings than re-enable them in the compiler, where they belong? I'd rather not comment further.

    7. Why do you think people have chosen TP and then Delphi? Because it had stellar string support built in.

  22. Vin Colgin I began using TP when it was first released, on CP/M. I began using Delphi at the beginning, with D1. Through most of my history with both tools, my work has been in what I will refer to as process control systems. My work area has been television broadcasting, but it is not the only area, by far, in which 8-bit ASCII (and sometimes binary) comms ruled.

    Although my work now is in more conventional desktop areas, for many years, I did almost nothing with database, and worked almost every day with machine control protocols. 

    UARTs are still commonplace in embedded processors. Yes, mobile devices are a huge and growing market, but I know from my own experience that many of the process control systems are written in Delphi. Perhaps this comes from Delphi having been first to combine structured exception handling with a very capable visual designer. 

    Whatever the reason, if Delphi moves away from support for serial comms--in this case, convenient 8-bit string support--it removes itself from consideration for such systems.

  23. Marco Cantù I am currently having trouble finding examples of losing code points with UTF8-UTF16 conversions, so I may be wrong here. But I currently don't have much time to investigate this topic further.

  24. Take a component like NativeXML (which is fully Unicode compatible) with 10,000 lines of code and 639*Utf8String, 22*RawByteString, 33*AnsiString. It was really easy to get it working on OSX. But if I wanted to use it for iOS, I'd have to wait until the author makes it compatible with iOS (and I doubt this is going to happen anytime soon) or pretty much understand the logic myself and convert it at my own expense. So let's just assume that I spent weeks on upgrading this and fixing all the bugs; then a new version of NativeXML comes out (without iOS support), and I'd have to do it all over again.

    So in my case the issue is not that I'm unable to remove the AnsiString, AnsiChar, PAnsiChar, ...  from my own code. The problem is the amount of existing components where I just don't have the resources to change them.

  25. Sebastian Zierer That's a good point. On the other hand we've seen moving other XML engines to mobile as simple operations because they used UTF 16 internally, rather than UTF8.

  26. Marco Cantù I think you just won't understand... you should also remove Byte; who is interested in storing data in such a narrow place when 64-bit registers are so nice?

  27. Overall my understanding is there isn't one requirement but many:
    - ability to process UTF8 as a native string type (my worry here is most developers will still think in terms one byte = one char = one codepoint, but this is mostly irrelevant). In other words, UTF8 with no specific library support, will that be good?
    - ability to use RawByteString (not sure, as we are trying to hide it also on the desktop side, as it is really mostly irrelevant)
    - ability to use AnsiString for protocols, hardware integration, and the like, to do native processing while avoiding conversion to/from Unicode and avoiding the slower/less clean TBytes
    - ability to use ShortString (which doesn't even fall into the request of the QC, as ShortStrings are not COW strings) to manage strings in records with fixed size (again a good old static array of bytes...)

    So overall, there isn't one requirement. There are several. This isn't ideal...

    PS: I don't understand the complaints about XE5 Update 2 for Windows, where all string types are still available as usual. Nothing was removed from the Windows (and Mac) compiler.

  28. Marco Cantù  Yes, you get it right. There are different string types, they all have their application domains and cannot be easily replaced with each other - that was the reason they were created. Even the short string is really useful in records, as you have noted. The situation is absolutely similar to different numerical types - bytes, words, dwords and floats all have their use; you cannot easily change Double to Byte or vice versa. The ability to use the most efficient string (or data in general) type in each specific case has always been one of the strongest points of Delphi. Losing it in favour of dubious 'simplification' will severely damage the already shrunk Delphi application domain in favour of C++ or even FPC.
    To summarize: never forget about code efficiency; it is the cornerstone of any software. Using suboptimal data types damages efficiency badly.

  29. Marco Cantù In my code you can probably find all of these four requirements.
    In my opinion, using TBytes for (Ansi-)Text is a bit like using (Ansi-)String for data.
    What does UTF8 "with no specific library support" mean?
    I often see that strings are converted between UTF8 and string when using them on native OSX functions (probably for iOS, too). I think a native UTF-8 string type could be used for improving speed (if used correctly). I'm not complaining that this doesn't work on OSX. But I can't use UTF8String for OSX if I can't compile this on iOS.
    I'm also using AnsiString or ShortStrings or array[0..x] of AnsiChar for protocols or records with fixed sizes.

    Which of these is easier to read:
    var
      Header: array[0..7] of Byte;
    AStream.ReadBuffer(Header, SizeOf(Header));
    if (Header[0] = 70) and (Header[1] = 85) and (Header[2] = 74) and (Header[3] = 73) and (.....) then ....

    var
      Header: array[0..7] of AnsiChar;
    AStream.ReadBuffer(Header, SizeOf(Header));
    if Header = 'FUJIFILM' then ....

    I have converted many components to Unicode and at the same time I tried to avoid the AnsiString whenever it was possible. I wanted a real Unicode Delphi 2009 application, not just some legacy ansi code that compiles on Delphi 2009. I also fully agree that AnsiString is not a good type for storing binary data. But I still have many places where the AnsiString is just the best string type.

  30. Marco Cantù
     While trying to convert the Synapse TCP/IP library to Android: a simple 8-bit string without any conversion (but counted) would be good. Currently we use a record with TBytes and some implicit conversions, but this is not an optimal solution.

  31. Radek Červinka  Yes, I also think so. This is what is really needed: support for an 8-bit copy-on-write string without implicit codepage conversions (aka RawByteString). Classic ShortString support would be ideal, too.

  32. I think that ShortString is of much less importance here than having an 8-bit COW string type. On the other hand, there is the matter of existing components which may have used the ShortString, and are therefore now dead. 

    When I first used D1, the absence of a serial component out of the box was a bit shocking, as so many of us had been doing serial control, and modem work. These days, of course, the PC is losing (has lost) its built-in serial port(s). However, there remain many devices which present a serial interface to the outside world. 

    To the embedded processor world, USB is often more overhead than the product will support. There are hundreds of small CPU types which would not easily connect over USB. These do have UARTs, however, and that is unlikely to change. 

    True, this is not a consumer market, but people who work in this area have developed substantial toolsets, and few would be willing to drop them, and then have to recreate in some other language. The market is clearly less obvious than that for mobile, but it may still represent significant revenue in terms of Delphi licences.

  33. Bill Meyer ShortStrings are of much less importance indeed. But according to Anders Hejlsberg, once you put a feature in the language you cannot just remove it. Too bad that the EMBT folks don't think that way.

  34. Dalija Prasnikar Dalija, I agree. But Marco seems to feel that the "feature list" needs to be reduced. In that spirit, I think few would suffer from the removal of ShortStrings. 

    Removing language features, it seems to me, is a way of removing customer segments. Not sure why anyone would wish to do that, but clearly, EMBT are moving aggressively on the mobile front. It seems to me, they are doing so at the expense of the desktop environment. But the company must make its decisions, and reap the results, positive or negative.
