Delphi Developers Archive

- June 29, 2018

Thomas Mueller, I just noticed that GExperts' Grep search doesn't work with .pas files saved in UTF-8 format: it returns no results from those files, even with the latest SVN trunk version, which I just compiled. Do you have any idea? Thanks.

Comments

Arioch The, June 29, 2018 at 4:18 AM

Off-topic: GExperts Grep had a nice UI, but eventually I switched from GExperts to CnWizards, which does not have Grep, so I use the IDE's Find in Files. Its UI is not as nice, but otherwise it works. Is there any point to GExperts Grep apart from showing the results in a fancy UI?
Thomas Mueller (dummzeuch), June 29, 2018 at 8:54 AM
Edwin Yip, could you provide me with an example file and example search settings?
As far as I know, there are no Unicode issues left in the Grep search.
Edwin Yip, June 29, 2018 at 8:47 PM
Thomas Mueller, yes. How can I send you the file privately? Thanks.
Edwin Yip, June 29, 2018 at 9:03 PM
Arioch The, I have been using GExperts' Grep since the D7 days, and it never occurred to me to try the IDE's Find in Files. I just tried it, and yes, it is very close to Grep now; it's good! So far I can see several handy operations provided by Grep that don't exist in my XE4's Find in Files: A) Fold/Expand All Results. B) Replace All Items. C) Replace Selected Item. It's about "handy", not "fancy" :) Anyway, it's really very usable, thanks for the heads-up.
Edwin Yip, June 29, 2018 at 9:04 PM
Arioch The, and D) the Grep results can be docked to the left or right side, while the Find in Files results are put in the Messages view, which is not suitable for being permanently docked to the left or right side.
Thomas Mueller (dummzeuch), June 30, 2018 at 1:25 AM
Edwin Yip, I got your e-mail, thanks.
Edwin Yip, June 30, 2018 at 1:25 AM
Thomas Mueller, I sent the e-mail, thanks!
Arioch The, June 30, 2018 at 1:36 AM
Edwin Yip, well, I only discovered the IDE's search after CnWiz implemented the Ctrl+Alt+Up/Down feature and I decided to switch over entirely for the sake of their code structure highlighter.
GExperts Grep is really neat; I still remember it.

A) I think the tree view in Messages can do that too; I will have to check.
B) Dangerous.

I'll add one more: GExperts Grep acts like diff. It shows not only the found line but also a few lines above and below it, to remind you of the context. The IDE only shows the line itself.
Edwin Yip, June 30, 2018 at 1:44 AM
Arioch The, regarding item B: IIRC replacing text was very stable in the D7 era, but I have actually stopped using that feature since I found out that it garbles my Chinese characters (I don't remember since when, maybe D2010).
Arioch The, June 30, 2018 at 1:56 AM
Edwin Yip, I did not mean that the implementation was shaky; I just meant that mass replace is dangerous per se. You can replace unrelated text that happens to be identical. The simplest example: methods with the same name in different classes.
So, yeah, find them all, but check every place when replacing.

CnWiz has a "rename term" feature on the F2 key that replaces one word within a file, class, or procedure. I use it sometimes. It tries to narrow the scope as far as possible, and I usually only use it on terms that did not make it out of the implementation section. In 99% of cases I would simply be afraid to do an automatic cross-file replace.
Edwin Yip, June 30, 2018 at 4:27 AM
Arioch The, I use all these IDE extensions, including MMX, which also has a "rename identifier" refactoring function. And yes, "find them all, but check every place *before* replacing" is a very handy workflow, which is why I liked using it.
Thomas Mueller (dummzeuch), June 30, 2018 at 4:31 AM
Edwin Yip,
As I already wrote in my e-mail: that is a decoding issue with the RTL's TStringList for some Chinese characters used in the file. There is nothing I can do about it, really. TStringList.LoadFromFile in XE4 fails silently and simply returns an empty list. In 10.2 it shows an error message: "No mapping for the Unicode character exists in the target multi-byte code page", so that's probably the reason.
Notepad++ has a similar issue, but it loads the file and displays these characters as xE5 x9C and xE2 x80.
I'm not sure whether that is a bug in the RTL or in Windows, or maybe simply an invalid file.
Maybe GExperts could display an error message if the file has a length > 0 but TStringList.Text is empty.
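
To illustrate the failure mode outside Delphi, here is a minimal Python sketch (not GExperts or RTL code). It assumes that the bytes Notepad++ shows, E5 9C and E2 80, are three-byte UTF-8 sequences that lost their final byte; the sample bytes and the helper name `try_decode_utf8` are made up for illustration. A strict decoder rejects such input and can report where, instead of returning nothing.

```python
def try_decode_utf8(raw: bytes):
    """Strictly decode raw as UTF-8; return (text, error_message)."""
    try:
        return raw.decode("utf-8"), None
    except UnicodeDecodeError as exc:
        # exc.start is the byte offset of the first undecodable byte.
        return None, f"invalid UTF-8 at byte {exc.start}: {exc.reason}"


# Hypothetical sample: "// " followed by E5 9C, a three-byte sequence cut short.
sample = b"// \xe5\x9c\r\nbegin end."
text, error = try_decode_utf8(sample)
print(text, error)
```

Reporting the byte offset gives the user something to act on, which an empty result does not.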
Thomas Mueller (dummzeuch), June 30, 2018 at 4:36 AM
These characters seem to be in the Unicode "private use area", U+E000–U+F8FF (see the Wikipedia article "Private Use Areas").
So that's probably not even a bug in the RTL (but it should not fail silently!)
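
A quick way to check that guess on already-decoded text is to scan for code points in the range quoted above. This is a minimal Python sketch with a hypothetical helper name (`find_private_use`) and a made-up sample string; whether such code points are what actually trips the Delphi RTL is not established in this thread.

```python
PUA_FIRST, PUA_LAST = 0xE000, 0xF8FF  # private use area in the Basic Multilingual Plane


def find_private_use(text: str):
    """Return (index, code point label) for each private-use character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if PUA_FIRST <= ord(ch) <= PUA_LAST
    ]


print(find_private_use("x := \ue000;"))  # [(5, 'U+E000')]
```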
Thomas Mueller (dummzeuch), June 30, 2018 at 4:43 AM
Oddly enough, Delphi seems to be able to display these characters.
Arioch The, June 30, 2018 at 5:45 AM
Edwin Yip, but this workflow is doable in the IDE. The Messages view is "show them all", and the editor tabs with Ctrl+Q,A are "check and replace" :D
Arioch The, June 30, 2018 at 5:50 AM
Thomas Mueller, TEncoding is a weird class. It seems to me they did some semi-dotnet fast hack. Their ownership model is insane IMHO, for instance...

We were hit by their "silently swallow all errors" design when we were moving a project from D5 to modern Delphi. The project saved data as component streams in a DB, and the components used names in Russian.
So I had to drop down to in-memory RTL patching just to load them back.
There was no normal way to extend TEncoding, TComponent, or TReader...
Arioch The, June 30, 2018 at 5:52 AM
Thomas Mueller, this check would not do, sadly.

1) What if the file consists of a UTF BOM and nothing else?
2) What if the file contains some private-use values, but not only them?
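
Both objections can be made concrete with a small Python sketch. The function `looks_like_silent_failure` is a hypothetical stand-in for the proposed "length > 0 but text is empty" check, and decoding with `errors="ignore"` models one reading of the second objection: a loader that drops what it cannot decode and returns the rest.

```python
import codecs


def looks_like_silent_failure(raw: bytes, text: str) -> bool:
    """The proposed heuristic: the file has bytes, but loading produced no text."""
    return len(raw) > 0 and text == ""


# Case 1: a file holding only the UTF-8 BOM decodes correctly to an empty string,
# so the heuristic raises a false alarm.
bom_only = codecs.BOM_UTF8
print(looks_like_silent_failure(bom_only, bom_only.decode("utf-8-sig")))  # True

# Case 2: a loader that skips undecodable bytes returns non-empty text,
# so the heuristic stays silent although data was lost.
mixed = b"begin \xe5\x9c end"
partial = mixed.decode("utf-8", errors="ignore")
print(looks_like_silent_failure(mixed, partial))  # False
```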
Thomas Mueller (dummzeuch), June 30, 2018 at 6:12 AM
Arioch The, anything is better than failing silently. If there is an issue with loading the file, the user will know that the search results will not include anything from this file.
Arioch The, June 30, 2018 at 10:19 AM
Thomas Mueller, why not derive your own TEncoding on the affected Delphi versions and then explicitly pass it to LoadFromFile?
Thomas Mueller (dummzeuch), June 30, 2018 at 10:28 AM
Arioch The, you are welcome to try that and submit a patch. I really think you underestimate the complexity of writing a TEncoding implementation.
Arioch The, June 30, 2018 at 10:45 AM
Thomas Mueller, I did do it in that case: a TEncoding that tries both UTF-8 and 1251 and chooses the one that produces no error.

The point is that it is not implementing from scratch. It is deriving from ready-made RTL classes that are already implemented.

In that case, AFAIR, there were only two methods to override. Two.

The hard part was patching non-extensible static methods to pass this new TEncoding to TComponent instead of the stock UTF8, because I did not have control over the TComponent sources unless I wanted to rebuild the whole RTL.

Also, even if deriving in the OOP way were impossible, you still have the RTL sources and can copy and paste.
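
The "try UTF-8, then 1251, keep whichever decodes without error" idea can be sketched in a few lines of Python. This is not the Delphi TEncoding descendant described above, only an illustration of the selection logic; the function name `decode_utf8_or_cp1251` and the sample word are assumptions.

```python
def decode_utf8_or_cp1251(raw: bytes):
    """Try strict UTF-8 first; fall back to Windows-1251 if that fails."""
    for encoding in ("utf-8", "cp1251"):
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("data is neither valid UTF-8 nor valid Windows-1251")


name = "Кнопка"  # a Russian component name, as in the D5 migration story
print(decode_utf8_or_cp1251(name.encode("utf-8")))
print(decode_utf8_or_cp1251(name.encode("cp1251")))
```

The order matters: Windows-1251 accepts almost any byte sequence, so trying it first would misread valid UTF-8 as Cyrillic gibberish.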
Arioch The, June 30, 2018 at 11:11 AM
Maybe I'll add this dual encoding as a special case to https://stackoverflow.com/questions/16532633/is-there-an-easy-way-to-work-around-a-delphi-utf8-file-flaw

When I'm back in the office :D
Jeroen Wiert Pluimers, June 30, 2018 at 3:13 PM
Edwin Yip, can you upload the file somewhere (for instance as a gist)?
Edwin Yip, July 1, 2018 at 7:03 AM
Hi Thomas Mueller, thanks for the update. Yes, after deleting those invalid characters, Grep works again.

I'm about 99% sure that the CodeSite Method Tracer from the IDE's Tools menu produced those invalid characters, and I guess it has something to do with my Chinese comments.

What's odd is that the IDE's "Find in Files" and "Find", on the other hand, do search the file correctly.

*But* I'd like to point out that the IDE code editor doesn't display it correctly: the characters are garbled, at least on my Win7/XE4 system.

Can you tell me which units/functions to look at in the GExperts trunk source? I can try to find some time to help with debugging.
Edwin Yip, July 1, 2018 at 7:09 AM
Jeroen Wiert Pluimers, a gist wouldn't work, since I believe the process involves some encoding conversion, so I used uploadfiles.io instead: GrepNotWorkingU.pas (on uploadfiles.io)
Jeroen Wiert Pluimers, July 1, 2018 at 7:58 AM
Edwin Yip, thanks. Try 7-zipping it, just to make sure no re-encoding is done.
Thomas Mueller (dummzeuch), July 1, 2018 at 8:07 AM
Edwin Yip,

When the file is not open in the IDE:
source/utils/GX_GenericUtils, line 4026, procedure LoadDiskFileToUnicodeStrings

When it is open in the IDE:
source/utils/GX_OtaUtils, line 4539, procedure GxOtaLoadSourceEditorToUnicodeStrings

Both eventually call TStringList.LoadFromStream, which fails. If you want to try to find a solution, it might be easier to just call TStringList.LoadFromFile with a problematic file in a test program.
Jeroen Wiert Pluimers, July 1, 2018 at 9:58 AM
The code editor likely loads the file differently, since historically most of that code is C-based.
Edwin Yip, July 1, 2018 at 8:22 PM
Thomas Mueller, I tested it just now: TStringList.LoadFromFile returns an empty string.

I also tried using chsdet.dll (http://chsdet.sourceforge.net/) to detect the encoding. With that, the strings load successfully, except that all of my Chinese characters are garbled when the loaded strings are viewed in a TMemo.
Thomas Mueller (dummzeuch), July 1, 2018 at 9:33 PM
Edwin Yip, for Grep it might be OK to skip invalid characters in order to find everything else. But:
1. People will complain about Grep not working properly for Unicode files ;-)
2. Source code positions may be off
3. You then definitely don't want to do a replace in Grep

I have changed the source in the mentioned procedures to raise an exception if the string list is empty, so at least the user gets an error message.
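
Point 2 can be seen in a small Python sketch. It compares the character index of a search hit when undecodable bytes are dropped (`errors="ignore"`) with the index when each bad sequence is kept as a placeholder (`errors="replace"`). The helper name `column_of` and the sample line are hypothetical; how the IDE editor itself counts columns for such a line is not known from this thread.

```python
def column_of(needle: str, raw_line: bytes, errors: str) -> int:
    """0-based character index of needle after decoding raw_line as UTF-8."""
    return raw_line.decode("utf-8", errors=errors).index(needle)


# Hypothetical source line with a truncated UTF-8 sequence inside a string literal.
line = b"s := '\xe5\x9c'; Foo;"

skipped = column_of("Foo", line, "ignore")   # bad bytes silently dropped
marked = column_of("Foo", line, "replace")   # bad bytes shown as U+FFFD
print(skipped, marked)
```

Every dropped sequence shifts later hits to the left, which is harmless for a plain search but risky once a replace writes back at those positions.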
Edwin Yip, July 2, 2018 at 1:46 AM
Thomas Mueller, I think notifying the user that the file is in trouble is a feasible solution, since the file is somehow in a bad format and correcting it is the user's responsibility, not Grep's. Thanks, Thomas!
Edwin Yip, July 6, 2018 at 5:12 AM
I can confirm that the unit file was corrupted by the CodeSite Method Tracer.
Jeroen Wiert Pluimers, July 6, 2018 at 5:29 AM
Edwin Yip, try contacting Ray Konopka. I think he is not on G+, but he is on Twitter as @RayKonopka.