On 06/24/2012 05:01 PM, webmaster-Kracked_P_P wrote:

> BUT, I would like to make sure all of my .oxt dictionaries have the words/terms we use every day
> in articles and email support for LibreOffice and other open source related "items".

If you have the disk capacity, then:
* Download the Wikipedia article database;
* Run a script that writes each word it finds into a file;
* Manually go through the list, to pick up misspellings;
* Merge the "correct words" list into your existing wordlist;
* Merge the "known misspelling" list into the autocorrect list;

Two potential issues with this approach:
* Proper nouns (names of individuals, organizations, and things) are included;
* Foreign words are included.

Whilst there are ways to eliminate both of those problems, the usual
result of doing so with scripts is that legitimate words in the target
language are removed along with the foreign words and proper nouns.  As
one example, the Afrikaans dictionary omitted the word "die" (the
Afrikaans definite article, "the") for several years, because the script
that was used to eliminate non-Afrikaans words read that word as the
English "die".
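
To make that failure mode concrete, here is a small Python sketch of a naive foreign-word filter. It is not the script that was used for the Afrikaans dictionary; the function name, the "protected" parameter, and the toy word lists are all invented for illustration. The point is only that a word spelled the same in both languages gets dropped unless it is explicitly protected.

```python
def remove_foreign(candidates, foreign_words, protected=frozenset()):
    """Split candidates into (kept, dropped).

    A candidate is dropped when it appears in foreign_words,
    unless it is also listed in protected.
    """
    kept, dropped = [], []
    for word in candidates:
        key = word.lower()
        if key in foreign_words and key not in protected:
            dropped.append(word)
        else:
            kept.append(word)
    return kept, dropped


# Toy data: candidate words for an Afrikaans list, and an English wordlist.
afrikaans_candidates = ["die", "huis", "download"]
english_words = {"die", "download"}

# Naive filtering loses the legitimate Afrikaans word "die".
naive_kept, naive_dropped = remove_foreign(afrikaans_candidates, english_words)

# Protecting known homographs keeps it, at the cost of maintaining that list by hand.
safe_kept, safe_dropped = remove_foreign(
    afrikaans_candidates, english_words, protected={"die"}
)
```

The protected list still has to come from someone who knows the target language, so this only narrows the manual review; it does not replace it.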



