Thomas Mueller, I just noticed that GExperts' GREP search doesn't work with .pas files saved in UTF-8 format: there are no results from those files, even with the latest SVN trunk version, which I just compiled. Do you have any idea why? Thanks.

Comments


  1. Off-topic: GEx Grep had a nice UI, but eventually I switched from GExperts to CnWizards, which does not have Grep, so I use the IDE's Find in Files. Its UI is not that nice, but otherwise it works. Is there any point to GEx Grep apart from showing results in a fancy UI?

  2. Edwin Yip, could you provide me with an example file and example search settings?
    As far as I know there are no Unicode issues left with regard to Grep search.

  3. Thomas Mueller, yes, how can I send you the file privately? Thanks.

  4. Arioch The, I have been using GExperts' GREP since the D7 days, and it never occurred to me that I should try the IDE's Find in Files. So I just tried it and, yes, it's very close to GREP now, it's good! So far I can see several handy operations provided by GREP that don't exist in my XE4's Find in Files: A) Fold/Expand All Results. B) Replace All Items. C) Replace Selected Item. It's about "handy", not "fancy" :) Anyway, it's really very usable, thanks for the heads-up.

  5. Arioch The, and D) the GREP results can be docked to the left/right side, while the 'Find in Files' results are put in the Messages view, which is not suitable for being permanently docked to the left/right side.

  6. Thomas Mueller, sent the email, thanks!

  7. Edwin Yip, well, I only discovered the IDE's search after CnWiz implemented the Ctrl+Alt+Up/Down feature and I decided to make an overall switch for the sake of their code structure highlighter.
    GEx Grep is really neat, I still remember it.

    A) I think the tree view in Messages can do it too, will have to check
    B) dangerous

    I'll add one more: GEx acts like diff. It shows not only the "found" line but also a few lines above and below, to remind you of the context. The IDE only shows the line

  8. Arioch The, re item B: IIRC replacing text was very stable in the D7 era, but I have actually stopped using that feature, since I found out that replacing text garbles my Chinese characters (I don't remember since when, maybe D2010).

  9. Edwin Yip, I did not mean the implementation was shaky, I just meant mass replace is dangerous per se. You can replace unrelated text that happened to be identical. Easiest example: same-name methods of different classes.
    So, yeah, find them all, but check every place when replacing.

    CnWiz has a "rename term" feature on the F2 key that replaces one word within a file/class/procedure. I use it sometimes. But it tries to narrow the scope as much as possible, and usually I only do it with terms that did not make it out of the implementation. I would just be afraid to do a cross-file automatic replace in 99% of cases

  10. Arioch The, I use all these IDE extensions, including MMX, which also has the 'rename identifier' refactoring function. And yes, "find them all, but check every place *before* replacing" is a very handy workflow and the reason I liked using it.

  11. Edwin Yip, as I already wrote in my e-mail: that is a decoding issue with the RTL's TStringList for some Chinese characters used in the file. There is nothing I can do about it, really. TStringList.LoadFromFile in XE4 fails silently and simply returns an empty list. In 10.2 it shows an error message: "No mapping for the Unicode character exists in the target multi-byte code page", that's probably the reason.
    Notepad++ has a similar issue, but loads the file and displays these characters as xE5 x9C and xE2 x80.
    Not sure whether that is a bug in the RTL or in Windows, or maybe simply an invalid file.
    Maybe GExperts could display an error message, if the file has a length > 0 but TStringList.Text is empty.
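
As a rough illustration of the failure mode described here (in Python rather than Delphi, so none of this is the RTL's or GExperts' actual code): a strict UTF-8 decoder rejects a truncated multi-byte sequence such as E5 9C, and a loader that swallows that error ends up with an empty result for a non-empty file. The helper names `load_silently` and `looks_like_failed_load` are hypothetical.

```python
def load_silently(data: bytes) -> str:
    """Mimics a loader that swallows decoding errors and returns nothing."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return ""


def looks_like_failed_load(data: bytes) -> bool:
    """The proposed heuristic: the file has bytes, but the loaded text is empty."""
    return len(data) > 0 and load_silently(data) == ""


# b"\xe5\x9c" starts a three-byte UTF-8 sequence whose last byte is missing.
broken = b"unit Test;\n// \xe5\x9c\n"
intact = "unit Test;\n// \u5728\n".encode("utf-8")

print(looks_like_failed_load(broken))
print(looks_like_failed_load(intact))
```

The check only works because the whole load fails at once; a loader that returned partial text would not trip it.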

  12. These characters seem to be in the Unicode "private use area", U+E000–U+F8FF (Wikipedia: Private Use Areas).
    So that's probably not even a bug in the RTL (but it should not fail silently!)
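
For anyone who wants to check a decoded file for such code points, a minimal Python sketch follows. The range is the one quoted above; the function names are hypothetical, and the sketch says nothing about how the Delphi RTL treats these characters.

```python
def is_private_use(ch: str) -> bool:
    """True if ch lies in the BMP Private Use Area, U+E000..U+F8FF."""
    return 0xE000 <= ord(ch) <= 0xF8FF


def private_use_positions(text: str) -> list[tuple[int, str]]:
    """List (index, code point) for every private-use character in text."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if is_private_use(ch)]


sample = "a\ue000b\uf8ffc"
print(private_use_positions(sample))
```

Note that in Python a well-formed UTF-8 encoding of a private-use code point decodes without error, so this check only applies to text that could be decoded in the first place.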

  13. Oddly enough, Delphi seems to be able to display these characters.

  14. Edwin Yip, but this workflow is doable in the IDE. The Messages view is "show them all", and editor tabs with Ctrl+Q,A are check & replace :D

  15. Thomas Mueller, TEncoding is a weird class. ISTM they did some quick semi-.NET hack. Their ownership model is insane IMHO, for instance...

    We were hit by their silently-swallow-all-errors design when we were moving a project from D5 to modern Delphi. The project saved data as component streams in a DB, and they used names in Russian.
    So I had to drop down to in-memory RTL patching just to load them back.
    There was no way to extend TEncoding, TComponent, or TReader in the normal way....

  16. Thomas Mueller, this check would not do, sadly.

    1) What if the file consists of a UTF BOM and nothing else?
    2) What if the file contains some private-use characters, but not only them?
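
The two objections can be made concrete with a loose Python analogy. It assumes a loader that quietly drops bytes it cannot decode, which is an assumption for illustration and not how TStringList is known to behave; `naive_check` is a hypothetical name.

```python
import codecs


def naive_check(data: bytes) -> bool:
    """Flags a load as failed if the file has bytes but yields no text."""
    # Assumption: the loader strips a BOM and silently skips undecodable bytes.
    text = data.decode("utf-8-sig", errors="ignore")
    return len(data) > 0 and text == ""


bom_only = codecs.BOM_UTF8         # case 1: a valid file that is just a BOM
partly_bad = b"ok \xe5\x9c\n"      # case 2: valid text plus a truncated sequence

print(naive_check(bom_only))    # flagged, although nothing is wrong with the file
print(naive_check(partly_bad))  # not flagged, although bytes were dropped
```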

  17. Arioch The, anything is better than failing silently. If there is an issue with loading the file, the user will know that the results will not include results from this file.

  18. Thomas Mueller, why not derive your own TEncoding on the affected Delphi versions and then explicitly pass it to LoadFromFile?

  19. Arioch The, you are welcome to try that and submit a patch. I really think you underestimate the complexity of writing a TEncoding implementation.

  20. Thomas Mueller, I did do it in that case: a TEncoding that tries both UTF-8 and 1251 and chooses the one where there was no error.

    The point is, it is not implementing from scratch. It is deriving from ready-made RTL classes that are already implemented.

    In that case, AFAIR, there were only two methods to override. Two.

    The hard part was patching non-extensible static methods to pass this new TEncoding to TComponent instead of the stock UTF8 one, because I did not have control over the TComponent sources unless I wanted to rebuild the whole RTL.

    Also, even if deriving in the OOP way were impossible, you still have the RTL sources and can copy-paste.
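
The "try UTF-8, then 1251, keep whichever decodes without error" idea can be sketched in a few lines of Python. This only illustrates the selection logic; it is not the Delphi TEncoding descendant described above, and `decode_with_fallback` is a hypothetical name.

```python
def decode_with_fallback(data: bytes, encodings=("utf-8", "cp1251")) -> tuple[str, str]:
    """Return (text, encoding name) for the first candidate that decodes strictly."""
    for name in encodings:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue  # this candidate rejected the bytes, try the next one
    raise ValueError("none of the candidate encodings decoded the data cleanly")


word = "\u041f\u0440\u0438\u0432\u0435\u0442"  # a Russian word, written as escapes
print(decode_with_fallback(word.encode("utf-8")))
print(decode_with_fallback(word.encode("cp1251")))
```

The order matters: UTF-8 is strict enough to reject most 1251 text, while 1251 accepts nearly any byte sequence, so it has to come last.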

  21. Edwin Yip, can you upload the file somewhere? (For instance as a gist?)

  22. Hi Thomas Mueller, thanks for the update, yes, after deleting those invalid characters the GREP works again.

    And I'm about 99% sure it's the CodeSite Method Tracer from the IDE's Tools menu that produced those invalid characters, and I guess it has something to do with the existence of my Chinese comments.

    And what's odd is that, on the other hand, the IDE's "Find in Files" and "Find" do search in the file correctly.

    *But*, I'd like to point out that the IDE code editor doesn't display it correctly: the characters are garbled, at least on my Win7/XE4 system.

    Can you tell me which units/functions to try in the GExperts trunk source? I can try to find some time to help with debugging.

  23. Jeroen Wiert Pluimers, a gist wouldn't work since I believe the process involves some encoding conversion, so I used uploadfiles.io instead: uploadfiles.io - GrepNotWorkingU.pas

  24. Edwin Yip, thanks. Try 7-zipping it, just to make sure no re-encoding is done.

  25. Edwin Yip

    When the file is not open in the IDE:
    source/utils/GX_GenericUtils, line 4026, Procedure LoadDiskFileToUnicodeStrings

    When it is open in the IDE:
    source/utils/GX_OtaUtils, line 4539, Procedure GxOtaLoadSourceEditorToUnicodeStrings

    Both eventually call TStringList.LoadFromStream, which fails. If you want to try and find a solution, it might be easier to just call TStringList.LoadFromFile with a problematic file in a test program.

  26. The code editor likely loads the file in a different way as historically most of that code is C-based.

  27. Thomas Mueller, tested just now: TStringList.LoadFromFile returns an empty string.

    I also tried using chsdet.dll (http://chsdet.sourceforge.net/) to detect the encoding, and the strings can be successfully loaded, except that all of my Chinese characters are garbled when the loaded strings are viewed in a TMemo.

  28. Edwin Yip, for Grep it might be OK to skip invalid characters in order to find everything else. But:
    1. People will complain about Grep not working properly for Unicode files ;-)
    2. Source code positions may be off
    3. You then definitely don't want to do a replace in Grep

    I have changed the source in the mentioned procedures to raise an exception if the string list is empty, so at least the user gets an error message.
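
The concern about source positions can be shown with a loose Python illustration (not GExperts code; `find_offsets` is a hypothetical helper): once undecodable bytes are skipped, a match position in the decoded text no longer lines up with the bytes on disk.

```python
def find_offsets(raw: bytes, needle: str) -> tuple[int, int]:
    """Return (offset in the raw bytes, offset in text decoded with bad bytes skipped)."""
    skipped = raw.decode("utf-8", errors="ignore")
    return raw.index(needle.encode("utf-8")), skipped.index(needle)


# Two stray bytes (a truncated UTF-8 sequence) sit in front of the search term.
raw = b"a\xe5\x9cb needle"
byte_pos, text_pos = find_offsets(raw, "needle")
print(byte_pos, text_pos)
```

A replace that writes back at the decoded position would land two bytes too early in this sample, which is the reason for not offering replace on such files.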

  29. Thomas Mueller, I think that's a feasible way to notify the user that the file is in trouble, since the file is somehow in a bad format and the user is responsible for correcting it, not GREP itself. Thanks Thomas!

  30. I can confirm the unit file was corrupted by the CodeSite Method Tracer.

  31. Edwin Yip, try contacting Ray Konopka. I think he is not on G+, but he is on Twitter (@RayKonopka).
