On Fri, 2011-12-30 at 14:43 -0500, webmaster for Kracked Press
Productions wrote:
> I need some "guidance" with the extent of my dictionary files for
> LibreOffice and OOo.
>
> My largest dictionaries are about 638,000 words in the spelling word
> .dic file.  I need to know how large is too large.
>
> I found out this morning that if I compare that word list with a
> combined list for chemical and medical words, over 98,000 words from
> that combined list are not in the current .dic word list[s].
>
> Now here is the issue: how far should I take this project?
>
> I am going to add all the "missing" words that are part of the
> open-source community's lexicon that are not in the current lists, but
> where do I stop, and how should I format the "finalized" files?
>
> Should there be one super large list, or should I break it up into
> sub-lists?  Should the "standard" words go into one .dic file, while
> medical, chemistry, and computer/tech words each have their own .dic
> file within the .oxt file?
>
> Right now, there is an English dictionary [default one?] that includes
> US, British, Canadian, and some other versions of English put together
> as one .oxt file, but separate .dic files.  I was wondering if that
> would be the route I should go with my super-size dictionaries.
>
> To be honest, 20 years ago the spelling dictionary project I was working
> on had about 177,000 words and I was told that the English language was
> about 250,000 words.  Now I have looked at a combined word list and it
> has about 737K words in it, and there are more words/terms still needing
> to be checked.  The largest book-style dictionary now has 25+ volumes,
> when it had only about 15 some 15-20 years ago.  So I really think the
> final super-sized dictionary word list could go over one million words
> in the next year or two.  I just have to figure out if it is worth
> building a list of that size for LO.
>
> Your input would help me make the best US, British, and Canadian English
> dictionaries out there for LibreOffice.  This is for our users to use,
> so it would be nice for users to let me know what they think.
     Something to remember: the main dictionary for the language used by
LO is a binary file kept in the installation folder. If a language pack
is added, its dictionary is also a binary file kept in the same place.
These are large files.
     User-created dictionary files (.dic) are kept in the personal
settings folder. These are plain-text files.
     Some time ago, someone asked about the size of user-created .dic
files. The reply was that 22K or less per file seemed like a good limit,
and it was mentioned that OOo would not use a dictionary file that was
too large.
     The user dictionary files (.dic) are text documents whose first
four lines are especially important. Below are the first four lines of
an English user-created .dic file (left) and a German user-created .dic
file (right).

           OOoUserDict1             OOoUserDict1
           lang: en-US      OR        lang: de-DE
           type: positive           type: positive
           ---                      ---
It appears that the second line is the one that changes from language
to language. The letters before the hyphen are the language (en, English;
de, Deutsch) and the letters after it are the country (US, USA; GB, Great
Britain; etc.).
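The header described above is simple enough to generate. Here is a minimal sketch that builds the text of a user dictionary with the four header lines shown in the examples, followed by one word per line, and splits a lang tag into its language and country parts. The function names are hypothetical, and writing the file as UTF-8 with "\n" line endings is an assumption, not something confirmed here.

```python
def user_dict_text(words, lang="en-US"):
    """Build user dictionary text: the four header lines, then one word per line.

    The header mirrors the examples above: signature, lang, type, separator.
    """
    header = ["OOoUserDict1", f"lang: {lang}", "type: positive", "---"]
    return "\n".join(header + list(words)) + "\n"


def split_lang(tag):
    """Split a tag such as 'en-US' into ('en', 'US'): language, then country."""
    language, _, country = tag.partition("-")
    return language, country


def write_user_dict(path, words, lang="en-US"):
    # Assumption: UTF-8 encoding and Unix line endings are acceptable to LO.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(user_dict_text(words, lang))


if __name__ == "__main__":
    print(user_dict_text(["colour", "honour"], lang="en-GB"), end="")
    print(split_lang("de-DE"))
```

Only the lang argument changes between the English and German examples, which matches the observation that the second line is the language-specific one.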
     But with the number of entries you have, you need to find some way
to make a binary file that LO can read as a .dic file. From what I remember
about the creation of the Australian dictionary, creating the binary files
is very time-consuming.


