

<snip>
On 01/01/2012 08:09 PM, Dan Lewis wrote:
      Something to remember: the main dictionary for the language used by
LO is a binary file kept in the Installation folder. If a language pack
is added, this language is also binary and kept in the same place. These
are large files.
      User created dictionary files (.dic) are kept in the personal
settings folder. These are text files.
      Some time ago, someone asked about dictionary file sizes, referring to
the user created .dic files. The reply was that 22K or less per file seemed
like a good number. It was mentioned that OOo would not use a dictionary
file if it was too large.
       The dictionary files (.dic) are text documents whose
first four lines are very important as far as content is concerned.
Below are the first four lines of an English user created .dic file, followed
by those of a German user created .dic file.

            OOoUserDict1             OOoUserDict1
            lang: en-US      OR        lang: de-DE
            type: positive           type: positive
            ---                      ---
It appears that the second line is the one that has to be changed from
language to language. The letters before the hyphen are the language
(en, English; de, Deutsch) and the letters afterward are the country
(US, USA; GB, Great Britain; etc.).
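
A minimal sketch of reading that four-line header, assuming the layout is
exactly as shown above (magic line, "lang:", "type:", then "---"). The
function and class names here are hypothetical, made up for illustration,
and are not part of LO/OOo.

```python
from typing import NamedTuple


class UserDictHeader(NamedTuple):
    magic: str      # first line, e.g. "OOoUserDict1"
    language: str   # letters before the hyphen, e.g. "en"
    country: str    # letters after the hyphen, e.g. "US"
    dict_type: str  # e.g. "positive"


def parse_user_dict_header(text: str) -> UserDictHeader:
    """Parse the first four lines of a user created .dic file."""
    lines = [line.strip() for line in text.splitlines()[:4]]
    if len(lines) < 4 or lines[3] != "---":
        raise ValueError("expected four header lines ending with '---'")
    fields = {}
    for line in lines[1:3]:
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    # Split "de-DE" into language and country at the hyphen.
    language, _, country = fields.get("lang", "").partition("-")
    return UserDictHeader(lines[0], language, country, fields.get("type", ""))


# Made-up sample content for illustration only.
sample = "OOoUserDict1\nlang: de-DE\ntype: positive\n---\nBeispielwort\n"
header = parse_user_dict_header(sample)
print(header)
```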
      But with the number of entries you have, you need to find some way
to make a binary file that LO can read as a .dic file. From what I remember
about the creation of the Australian dictionary, it is very time consuming
to create the binary files.

--Dan
Dan
Every dictionary .dic file that I looked at (English, German, French, etc.), both for LibreOffice and from OOo's own site, is laid out as follows:
line 1:  the number of words in the list
line 2 through the end:  the list of words and any control codes
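
A rough sketch of reading a word list laid out this way. It assumes the
control codes follow a slash after the word (the Hunspell-style convention,
e.g. "work/DGS"); that detail is an assumption, not something stated above.
The function name is hypothetical.

```python
def parse_word_list(text: str):
    """Return (declared_count, [(word, control_codes), ...])."""
    lines = text.splitlines()
    declared = int(lines[0].strip())  # line 1: number of words in the list
    entries = []
    for line in lines[1:]:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumed convention: anything after "/" is the control codes.
        word, _, codes = line.partition("/")
        entries.append((word, codes))
    return declared, entries


# Made-up three-word sample for illustration only.
sample = "3\nhello\nwork/DGS\nzebra\n"
declared, entries = parse_word_list(sample)
print(declared, entries)
```

Comparing `declared` with `len(entries)` is a quick way to check whether
the first line still matches the list after hand editing.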

All of the dictionaries are in .oxt files, which are a type of archive.
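
Since an .oxt file is a zip archive, the .dic files inside one can be
listed with Python's standard zipfile module. This is only a sketch: the
helper name and the member names in the demo archive are made up, and the
demo builds a tiny archive in memory instead of opening a real download.

```python
import io
import zipfile


def list_dic_members(oxt_file):
    """Return (name, uncompressed size in bytes) for each .dic member."""
    with zipfile.ZipFile(oxt_file) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if info.filename.lower().endswith(".dic")
        ]


# Build a small stand-in archive in memory (hypothetical member names).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("en_XX.dic", "2\nalpha\nbeta\n")
    archive.writestr("description.xml", "<description/>")
buffer.seek(0)

members = list_dic_members(buffer)
print(members)
```

With a real dictionary, pass the path to the downloaded .oxt file instead
of the in-memory buffer.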

The largest dictionary I have is Irish/Gaeilge, which is about 7 MB in size for the .oxt file, but the .dic file inside it is about 2 MB. The thesaurus and hyphenation files make up most of the rest of the file size.
I do not know of any .oxt files that are in a "binary" format.  Maybe 
the .dic files are converted to binary somewhere, but not on the 
dictionary creator's end.
When using my dictionaries, I do not notice any slowing down of the 
loading process, even with all of my American English .oxt files 
enabled, plus the largest British English and Canadian English ones.  
I do not know which "binary conversion" takes a long time, but I do not 
see it.
I have the Australian .oxt file with the .dic file from 2008-12-15, 
and also the Australian Medical dictionary with its .dic file from 
2008-07-01.  Neither .dic file is in a binary format.  When opened, 
each is just an ASCII text file inside an archive file.  There are control 
codes after some of the words in that .dic file, so maybe that was what 
took time to create: the words and their control codes.
Here is a link to the English dictionary section; the Australian ones are 
the first listed.
http://libreoffice-na.us/English-3.4-installs/dictionary.html#english

Now, I have large word lists in my .dic files: 6.4 MB for the 638K word list. But there are no control codes in my .dic files, only the top line stating the number of words in the list.
Now, I offer several word list sizes for my dictionaries; 98K, 217K, 
390K, and 638K words, with no 98K for Canada since I did not have a word 
list [yet] that size to use for one.  So if the user wants to use the 
98K word list for their spelling words, they can do it.  There is the 
638K word list dictionary since someone on this list asked me for a 
dictionary with the largest word list that I had.  I asked before I made 
them.
As for seeing these in the .dic files I got from the OOo dictionary 
list, sorry I did not see them.
           OOoUserDict1             OOoUserDict1
           lang: en-US      OR        lang: de-DE
           type: positive           type: positive

Maybe they are created in the folder that they reside in after LO/OOo loads them through the Extension Manager. I know that I used some of these dictionaries when I used OOo 3.x.x, and the .dic files were still 500K or more then.
There is an 8074 word list with a .dic file of 87.9KB.  How many words 
would be in a 22K .dic file?  Where did you get that 22K size info?  I 
went to the .libreoffice hidden folder [Ubuntu 10.04] and not one of the 
.dic files listed there is anywhere near that 22K size.  Most are in the 
1 to 3 MB range.
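
As a back-of-the-envelope answer to the "how many words" question, the
8074 words / 87.9KB figures above can be scaled down to 22K. This assumes
the average bytes per word stays the same, which is only a rough guess;
the function name is hypothetical.

```python
def estimate_word_count(target_kb: float, sample_words: int, sample_kb: float) -> int:
    """Scale a known (words, size) pair to a different file size."""
    kb_per_word = sample_kb / sample_words  # average size of one entry
    return round(target_kb / kb_per_word)


# Figures from the paragraph above: 8074 words in an 87.9KB .dic file.
estimate = estimate_word_count(22, 8074, 87.9)
print(estimate)
```

By this estimate a 22K file would hold only about 2,000 words.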
I did a lot of looking into what documentation I could find for creating 
a language dictionary, and nowhere did I find any info about file sizes 
or about converting the .dic files to a binary format.  I understand 
binary to mean a file that does not show readable "text" when opened in 
a text editor.  I have seen "true binary" files when I had to program in 
Assembly and C.  The resulting files ended up in a binary format 
unreadable in a text editor.  The .dic files are not like that, as far 
as I can see.  So I do not know what "binary conversion" you are talking 
about.
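
One crude way to tell "text" from "true binary" without opening an editor
is to look for NUL bytes and other control characters. This is only a
heuristic sketch with a made-up function name and an arbitrary threshold.
Bytes above 127 are allowed so that accented words in Latin-1 or UTF-8
lists still count as text.

```python
def looks_like_text(data: bytes, max_control_ratio: float = 0.01) -> bool:
    """Heuristic: text has no NUL bytes and very few control characters."""
    if not data:
        return True
    if b"\x00" in data:
        return False
    allowed = {9, 10, 13}  # tab, line feed, carriage return
    control = sum(1 for byte in data if byte < 32 and byte not in allowed)
    return control / len(data) <= max_control_ratio


# Made-up samples: a word-list fragment and the start of a compiled program.
print(looks_like_text(b"3\nhello\nwork/DGS\n"))
print(looks_like_text(b"\x7fELF\x02\x01\x01\x00"))
```

To check a real file, read it with open(path, "rb").read() and pass the
bytes to the function.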
Also, if you download my dictionaries [making sure the .oxt 
file extension is there], then install them using the Extension Manager, 
you will have no issues.  I have installed them on Ubuntu computers and 
Windows computers with equal results.  So, my dictionaries work as they 
are.
All I really wanted to know was how far I should go with the number 
and types of words for these dictionaries.  All the English words, plus 
English medical and chemistry, add up to over 736,000 words [with no 
control codes after them].



--
For unsubscribe instructions e-mail to: users+help@global.libreoffice.org
Problems? http://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/
Posting guidelines + more: http://wiki.documentfoundation.org/Netiquette
List archive: http://listarchives.libreoffice.org/global/users/
All messages sent to this list will be publicly archived and cannot be deleted
