BACKGROUND
I developed a Calibration and Maintenance tracking system that I sell online. In 2011 I ported the application from VB6 to Delphi XE. At the same time I created translations into 9 languages that use code page 1252. To provide spell checking, I used the JEDI jvSpellchecker. This seems to have worked acceptably for all 10 languages.

PROBLEM STATEMENT
I have recently added Polish and Czech translations, but spell checking is giving me problems. The spell checker appears to treat every word containing characters unique to code page 1250 as an error. At the moment I don't know whether this affects all such characters or only some of them. The problem is bound to get worse, because I have Vietnamese and Chinese translations in the works.
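
One plausible cause, not a confirmed diagnosis, is that bytes written in code page 1250 are being read somewhere as code page 1252. The minimal Python sketch below (Python only for illustration, not Delphi code; the helper name misread is hypothetical) shows what that does to a Polish and a Czech word.

```python
def misread(word: str, actual: str = "cp1250", assumed: str = "cp1252") -> str:
    """Encode `word` in the code page the file really uses, then decode it with the wrong one."""
    return word.encode(actual).decode(assumed)

if __name__ == "__main__":
    # Polish "meadow" and Czech "river"; both contain letters that exist only in code page 1250
    for word in ("łąka", "řeka"):
        print(word, "->", misread(word))
```

Under that assumption "łąka" becomes "³¹ka" and "řeka" becomes "øeka", so the spell checker would reject them. Letters such as ó, á and é sit on the same byte in both code pages and survive unchanged, which would explain why only some accented words are flagged.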

I am not looking for a detailed answer to every problem I face, only some ideas on how to get started. Having started on this problem, I realize that there are too many things I don't know. I need some resources to better prepare myself to resolve them.

THINGS THAT WOULD HELP (REQUESTS)
I cannot find any documentation on the usage and capabilities of the jvSpellChecker. Does anyone know whether such documentation exists, and if so, where I can find it?

A quick look at the jvSpellchecker source code makes me worry that it is not Unicode-compatible. Does anyone know whether this code works correctly with Delphi XE? Has anyone actually used this spell checker with a code page 1250 language?

Am I correct that there is no programmatic way to determine the code page of an ANSI file without direct inspection and some knowledge of the target language's orthography?
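
As far as I know there is no reliable detector, but knowledge of the orthography can be turned into a heuristic: decode the file with each candidate code page and keep the one that yields the most letters specific to the target language. The Python sketch below assumes a short list of candidates and hand-written letter sets. All names in it are hypothetical. It is a guess, not a proof.

```python
from typing import Iterable, Optional, Set

# Letters specific to each language (assumed sets, upper and lower case)
POLISH_LETTERS = set("ąćęłńóśźżĄĆĘŁŃÓŚŹŻ")
CZECH_LETTERS = set("áčďéěíňóřšťúůýžÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ")

def guess_code_page(
    raw: bytes,
    expected_letters: Set[str],
    candidates: Iterable[str] = ("cp1250", "iso8859_2", "cp1252"),
) -> Optional[str]:
    """Return the candidate whose decoding contains the most language-specific letters.

    Heuristic only: pure ASCII gives no evidence (returns None), and ties go to
    the earlier candidate in `candidates`.
    """
    best: Optional[str] = None
    best_score = 0
    for code_page in candidates:
        try:
            text = raw.decode(code_page)
        except UnicodeDecodeError:
            continue  # the bytes are invalid in this code page, so rule it out
        score = sum(ch in expected_letters for ch in text)
        if score > best_score:
            best, best_score = code_page, score
    return best
```

It only discriminates when the language-specific letters fall on different bytes in the candidate code pages, so it needs a reasonably large sample of the word list rather than a single word.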

The dictionaries I am using are derived from publicly available word lists, which I then compressed. It is quite possible that I mishandled these files because of my OS's locale and language settings. To be safe, do I need to change the Windows locale and language in order to work on foreign-language files?
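
As I understand it, changing the Windows locale should not be necessary if every read and write names its encoding explicitly instead of relying on the system's default ANSI code page. Here is a minimal Python sketch of converting a word list to UTF-8 before any further processing. It assumes one word per line and a source code page that is already known. The function name is hypothetical, and the compression step is not covered.

```python
from pathlib import Path

def convert_word_list(src: Path, dst: Path, source_code_page: str) -> int:
    """Re-encode an ANSI word list (one word per line) as UTF-8.

    The source code page is passed explicitly, so the result does not depend on
    the Windows locale. Returns the number of words written.
    """
    text = src.read_text(encoding=source_code_page)  # bytes -> Unicode via the named code page
    words = [line.strip() for line in text.splitlines() if line.strip()]
    dst.write_text("\n".join(words) + "\n", encoding="utf-8")  # Unicode -> UTF-8 bytes
    return len(words)
```

For example, `convert_word_list(Path("pl.txt"), Path("pl_utf8.txt"), "cp1250")` gives the same result on an English or a Polish Windows installation.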

I realize that I would suffer permanent excommunication from Stack Overflow if I posted this question there. I am hoping for a little flexibility here to help me get started.

Comments

  1. I would try to ensure that all internal operations are done in Unicode, so that your first step is always to translate code-page ANSI text properly into Unicode. Code pages really are relics of the past.

  2. Thank you for the information. It seems that I probably should not have used the JEDI spellchecker because it doesn't work reliably with Unicode. This was fairly evident from the JVCL sources. Unfortunately, I do not have enough time to dissect it and stitch it back together.

    Furthermore, the original word lists that I used to create the dictionaries were not UTF-8 but ANSI (as you suspected). This was OK until I moved on to code pages other than 1252. I was completely unable to get reliable results in either Polish or Czech because I had no convenient way to determine the original encoding. Sometimes I don't like to look back, because it reveals that I started this task without properly learning and preparing for it.

    As a result, I have integrated "NHunSpell for Delphi", an implementation of Hunspell written by Alexander Halser (see the link below). It uses OpenOffice dictionaries. So far I have completed and tested Czech, Danish, English, German and Polish. I had to do some work on the user interface to adapt it to my application, but it was a relatively minor task. The interface to Hunspell appears to be flawless and robust. Now I just have to test Italian, French, Norwegian, Portuguese, Spanish and Swedish.

    I would like to thank Alexander Halser for making this code available. I can also heartily recommend NHunSpell for Delphi.
    cc.embarcadero.com - 28268 OpenOffice Spell Check for Delphi

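
The first comment's advice still applies with Hunspell: the OpenOffice/Hunspell dictionaries declare their encoding with a SET line in the .aff file (for example `SET UTF-8` or `SET ISO8859-2`), and the words should be decoded to Unicode at that boundary. The Python sketch below reads the declaration and decodes a .dic word list. It is not NHunSpell's code or API. The function names are hypothetical. The fallback used when SET is missing is an assumption. The .dic parsing is simplified.

```python
import unicodedata
from typing import List

def hunspell_encoding(aff_bytes: bytes) -> str:
    """Find the SET directive in a Hunspell .aff file, e.g. 'SET ISO8859-2'."""
    # The directive itself is ASCII; latin-1 accepts every byte, so it is safe for scanning.
    for line in aff_bytes.decode("latin-1").splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "SET":
            return parts[1]
    return "ISO8859-1"  # assumption: treated as the default when SET is absent

def read_dic_words(dic_bytes: bytes, encoding: str) -> List[str]:
    """Decode a Hunspell .dic file to Unicode and return the bare words.

    Simplified: skips the first line (the word count), drops affix flags after '/'
    and any whitespace-separated fields, and ignores escaped slashes.
    """
    text = dic_bytes.decode(encoding)  # Python accepts names such as "UTF-8" and "ISO8859-2"
    words = []
    for line in text.splitlines()[1:]:
        fields = line.split()
        if not fields:
            continue
        word = fields[0].split("/", 1)[0]
        # NFC gives one code point per accented letter, so comparisons stay stable
        words.append(unicodedata.normalize("NFC", word))
    return words
```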
