<snip>
On 01/01/2012 08:09 PM, Dan Lewis wrote:
Something to remember: the main dictionary for the language used by
LO is a binary file kept in the Installation folder. If a language pack
is added, its dictionary is also a binary file kept in the same place.
These are large files.
User created dictionary files (.dic) are kept in the personal
settings folder. These are text files.
Some time ago, someone asked about dictionary file sizes, referring to
the user created .dic files. The reply was that 22K or less per file
seemed like a good number. It was mentioned that OOo would not use a
dictionary file if it was too large.
The dictionary files (.dic) are text documents, and the first four
lines are very important as far as content is concerned.
Below are the first four lines of an English user created .dic file,
followed by those of a German user created .dic file.
OOoUserDict1        OOoUserDict1
lang: en-US    OR   lang: de-DE
type: positive      type: positive
---                 ---
It appears that the second line is the one that has to be changed from
language to language. The letters before the hyphen are the language
(en, English; de, Deutsch) and the letters afterward are the country
(US, USA; GB, Great Britain; etc.).
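As a rough illustration of the four-line header described above, a small Python sketch could write such a file. The function name write_user_dict, the file name demo.dic, and the sample words are made up for illustration, and UTF-8 is only an assumption about the encoding; the header lines themselves are the ones shown above.

```python
import os
import tempfile

def write_user_dict(path, lang, words):
    """Write a user dictionary: four header lines, then one word per line."""
    # Header as shown above; only the "lang:" line changes per language.
    header = ["OOoUserDict1", f"lang: {lang}", "type: positive", "---"]
    with open(path, "w", encoding="utf-8", newline="\n") as f:  # encoding is an assumption
        f.write("\n".join(header + list(words)) + "\n")

# Hypothetical usage: a small German user dictionary in a temp folder.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.dic")
write_user_dict(demo_path, "de-DE", ["Beispielwort", "Testwort"])
with open(demo_path, encoding="utf-8") as f:
    demo_lines = f.read().splitlines()
```

Switching the second argument to "en-US" would give the English variant of the same header.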
But with the number of entries you have, you need to find some way
to make a binary file that LO can read as a .dic file. From what I remember
about the creation of the Australian dictionary, it was very time consuming
to create the binary files.
--Dan
Dan
Every dictionary .dic file that I looked at (English, German, French, etc.)
for LibreOffice, and from OOo's own site, has the following layout:
line 1: the number of words in the list
line 2 through the end: the list of words and any control codes
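To make that layout concrete, here is a minimal Python sketch that reads such a file. It assumes the control codes follow the word after a "/" (as in Hunspell-style dictionaries); the function name read_dic and the sample words are invented for the example, and escaped slashes or extra fields are not handled.

```python
import io

def read_dic(stream):
    """Return (declared_count, entries) from a .dic-style text stream.

    Each entry is (word, codes); codes is '' when the word has none.
    """
    lines = [ln.strip() for ln in stream if ln.strip()]
    declared = int(lines[0].split()[0])      # line 1: number of words
    entries = []
    for ln in lines[1:]:                     # line 2 onward: the words
        word, _, codes = ln.partition("/")   # assumption: codes follow a slash
        entries.append((word, codes))
    return declared, entries

# Tiny made-up sample instead of a real dictionary file.
sample = io.StringIO("3\nhello\nwork/DGS\ncolour\n")
declared, entries = read_dic(sample)
```

Comparing declared with len(entries) is a quick sanity check on a word list before packaging it.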
All of the dictionaries are in .oxt files, which are a type of archive.
The largest dictionary I have is Irish/Gaeilge, which is about 7 meg in
size for the .oxt file, but the .dic file inside it is about 2 meg. The
thesaurus and hyphenation files make up most of the rest of the file size.
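Since an .oxt file is a zip archive, the sizes of the files inside it can be checked with a few lines of Python. This is only a sketch: the function name list_oxt_members is made up, and the archive here is a tiny stand-in built in memory with invented member names, not a real dictionary extension.

```python
import io
import zipfile

def list_oxt_members(oxt):
    """Return {member_name: uncompressed_size_in_bytes} for a .oxt archive."""
    with zipfile.ZipFile(oxt) as zf:
        return {info.filename: info.file_size for info in zf.infolist()}

# Build a small stand-in archive in memory (member names are hypothetical).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("en_XX.dic", "2\nalpha\nbeta\n")
    zf.writestr("th_en_XX.dat", "placeholder thesaurus data")
buf.seek(0)

sizes = list_oxt_members(buf)
dic_sizes = {name: size for name, size in sizes.items() if name.endswith(".dic")}
```

Passing the path of a downloaded .oxt file instead of buf would list the real .dic, thesaurus, and hyphenation file sizes.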
I do not know of any .oxt files that are in a "binary" format. Maybe
the .dic files are converted to binary somewhere, but not on the
dictionary creator's end.
When using my dictionaries, I do not notice any slowing down of the
loading process, even with all of my American English .oxt files
enabled, plus the largest British English and Canadian English ones.
I do not know what "binary conversion" takes a long time, but I do not
see it.
I have the Australian .oxt file with the .dic file from 2008-12-15,
and also the Australian Medical dictionary with its .dic file from
2008-07-01. Neither .dic file is in a binary format. When opened,
they are just ASCII text files inside an archive. There are control
codes after some of the words in that .dic file, so maybe that was what
took time to create: the words and their control codes.
Here is a link to the English dictionary section; the Australian ones
are the first listed.
http://libreoffice-na.us/English-3.4-installs/dictionary.html#english
Now, I have large word lists in my .dic files: 6.4 meg for the 638K
word list. But there are no control codes in my .dic files, only the
top line stating the number of words in the list.
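A plain .dic of that kind (a count on the top line, then bare words) could be generated from a word list roughly like this. The function name build_plain_dic is made up, and the de-duplication and sorting are my own choices for the sketch, not something the format is known to require.

```python
def build_plain_dic(words):
    """Return .dic text: word count on line 1, then one word per line."""
    # Strip whitespace, drop empties and duplicates, sort for stable output.
    unique = sorted(set(w.strip() for w in words if w.strip()))
    return "\n".join([str(len(unique))] + unique) + "\n"

# Made-up input with a duplicate and stray spaces.
dic_text = build_plain_dic(["zebra", "apple", "apple", " mango "])
```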
Now, I offer several word list sizes for my dictionaries; 98K, 217K,
390K, and 638K words, with no 98K for Canada since I did not have a word
list [yet] that size to use for one. So if the user wants to use the
98K word list for their spelling words, they can do it. There is the
638K word list dictionary since someone on this list asked me for a
dictionary with the largest word list that I had. I asked before I made
them.
As for these lines, sorry, I did not see them in the .dic files I got
from the OOo dictionary list:
OOoUserDict1        OOoUserDict1
lang: en-US    OR   lang: de-DE
type: positive      type: positive
Maybe they are created in the folder that they reside in after
LO/OOo loads them up through the Extension Manager. I know that I used
some of these dictionaries when I used OOo 3.x.x and the .dic files
were still 500K or more then.
There is an 8074 word list with a .dic file of 87.9KB. How many words
would be in a 22K .dic file? Where did you get that 22K size info? I
went to the .libreoffice hidden folder [Ubuntu 10.04] and not one of the
.dic files listed there is anywhere near that 22K size. Most are in the
1 to 3 MB range.
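A back-of-the-envelope answer to the "how many words in 22K" question, assuming a 22K file has the same average size per word as the 8074 word, 87.9KB file above (the variable names are just for this sketch):

```python
# Figures from the message above.
words_in_list = 8074
list_size_kb = 87.9
limit_kb = 22

# Average size per word, then scale down to the 22K limit.
kb_per_word = list_size_kb / words_in_list
est_words_in_22k = limit_kb / kb_per_word

print(f"about {est_words_in_22k:.0f} words")
```

So at that average, a 22K file would hold roughly two thousand words, a small fraction of the lists discussed here.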
I did a lot of looking into what documentation I could find for creating
a language dictionary, and nowhere did I find any info about file sizes
or about converting the .dic files to a binary format. I know binary as
something other than a file that shows the actual "text" of the file in
a text editor. I have seen "true binary" files when I had to program in
Assembly and C. The resulting files ended up in a binary format
unreadable in a text editor. The .dic files are not like that, as far
as I can see. So I do not know what "binary conversion" you are talking
about.
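One rough way to check the "text or binary" question for a given file is a heuristic like the following. It is only a sketch: the function name looks_like_text and both byte samples are made up, and a clean decode with no NUL bytes is a rule of thumb, not proof. Dictionaries saved in an older 8-bit encoding would need that encoding passed in instead of UTF-8.

```python
def looks_like_text(data, encoding="utf-8"):
    """Heuristic: no NUL bytes, and the bytes decode cleanly."""
    if b"\x00" in data:
        return False
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True

# Made-up samples: a few .dic-style lines, and some arbitrary binary bytes.
text_sample = b"3\nhello\nwork/DGS\n"
binary_sample = bytes([0x7F, 0x45, 0x4C, 0x46, 0x00, 0x01, 0xFF])

text_result = looks_like_text(text_sample)
binary_result = looks_like_text(binary_sample)
```

Reading the first few kilobytes of a .dic file in "rb" mode and passing them to the function would apply the same check to a real dictionary.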
Also, if you download my dictionaries [making sure the .oxt file
extension is there], then install them using the Extension Manager,
you will have no issues. I have installed them on Ubuntu computers and
Windows computers with equal results. So, my dictionaries work as they
are.
All I really wanted to know was how far I should go with the number
and types of words for these dictionaries. All the English words, plus
English medical and chemistry, add up to over 736,000 words [with no
control codes after them].