On 05/23/2014 03:24 PM, libreoffice-ml.mbourne@spamgourmet.com wrote:
On 05/22/2014 12:10 PM, Urmas wrote:
"Kracked_P_P---webmaster":
There are 797866 lines in the .dic file, with the top one giving the
number of words.
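For readers unfamiliar with the layout being described: in a Hunspell-style .dic file the first line holds an (approximate) entry count, and each following line is one entry, optionally followed by "/" and affix flags. A minimal sketch of comparing the declared count with the lines actually present; the function name and the three-entry sample are made up for illustration and are not taken from the dictionary under discussion.

```python
def dic_counts(dic_text):
    """Return (declared_count, actual_entry_lines) for .dic-style text."""
    # Ignore blank lines so a trailing newline does not skew the count.
    lines = [ln.strip() for ln in dic_text.splitlines() if ln.strip()]
    declared = int(lines[0].split()[0])  # top line: the stated word count
    entries = lines[1:]                  # every other line is one entry
    return declared, len(entries)


# Hypothetical sample, not real dictionary data.
SAMPLE_DIC = """3
walk/SDG
cat/S
the
"""

declared, actual = dic_counts(SAMPLE_DIC)
print(declared, actual)
```

If the two numbers disagree, the header is simply out of date; it says nothing about whether the entries are munched or not.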
Due to the author's error, it is shipped unmunched. In the proper form
it contains 476898 entries, probably even fewer if some wordforms are
missing. That overstates the word count by close to 70%.
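The percentage follows from the two figures quoted above. A quick check of the arithmetic, assuming the overstatement is measured relative to the munched entry count (the variable names are only for illustration):

```python
unmunched_lines = 797866   # lines in the shipped .dic file
munched_entries = 476898   # entries after munching, as quoted above

extra = unmunched_lines - munched_entries
overstatement = extra / munched_entries

print(extra)                   # additional lines in the unmunched file
print(f"{overstatement:.1%}")  # share by which the count is inflated
```

This comes out at roughly 67%, which matches the "close to 70%" figure.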
I don't know how spell-check dictionaries are usually compared but, to
me, it would make sense to count each form as a separate word. It may
be more efficient in use to compress the dictionary into a smaller
number of entries, but if there's a single entry encoding 4 forms of
the same root word, I'd count that as 4 words. Otherwise, a dictionary
containing 100000 words but only the root word of each would seem just
as good as a dictionary containing the same 100000 root words plus all
the variations encoded into each entry.
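To make the two ways of counting concrete, here is a toy sketch. The flags S, D and G and their suffix rules are invented for this example; real Hunspell .aff rules also carry conditions and are defined per dictionary, so this is not how any actual en_US file is expanded.

```python
# Hypothetical affix table: flag -> list of (suffix_to_strip, suffix_to_add).
SUFFIX_RULES = {
    "S": [("", "s")],
    "D": [("", "ed")],
    "G": [("", "ing")],
}


def expand(entry):
    """Return the set of word forms encoded by one 'word/FLAGS' entry."""
    word, _, flags = entry.partition("/")
    forms = {word}
    for flag in flags:
        for strip, add in SUFFIX_RULES.get(flag, []):
            if strip and not word.endswith(strip):
                continue  # rule does not apply to this word
            stem = word[: len(word) - len(strip)] if strip else word
            forms.add(stem + add)
    return forms


# A "munched" list: four entries, some encoding several forms.
munched = ["walk/SDG", "talk/SDG", "cat/S", "the"]

# The "unmunched" equivalent: every form listed separately.
unmunched = sorted(set().union(*(expand(e) for e in munched)))

print(len(munched))    # number of entries
print(len(unmunched))  # number of distinct word forms
```

Here four entries stand for eleven word forms, so the same dictionary can honestly be described with either number, depending on which one is being counted.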
Kracked_P_P---webmaster wrote:
What do you mean by the term "unmunched"?
munch
/mʌntʃ/
verb (used with object)
1. to chew with steady or vigorous working of the jaws, often audibly.
...
Related forms
un·munched, adjective
Yes, I have heard of the term, but not used in connection with a dictionary file.
(http://dictionary.reference.com/browse/unmunched - I didn't swallow
the dictionary, munched or otherwise)
Never heard of that term in relation to a .dic file.
Since a .dic file doesn't strike me as being particularly tasty, nor
useful after chewing, perhaps we should be glad that it is unmunched.
(FWIW, neither LibreOffice nor SeaMonkey recognises 'unmunched'...)
That must mean that mine is "munched", since it seems to work just fine for me.
With my 797K dictionary enabled, it has checked itself just fine. So
the words that are in it but not in the default en_US work.
Mark.
Here are the file sizes of several versions of the en_US .ext file[s].
Each file holds the spell-checking words, thesaurus, and hyphenation
data; the sizes are listed by "unmunched" word count.
98,000 words - 5.5 MB
217,000 words - 5.8 MB
390,000 words - 6.2 MB
797,000 words - 6.8 MB
3 million words - about 11 MB (experimental and not published)
The file sizes are as shown by the Linux Mint "Caja" file manager.
I am not the only one who produced a 5-7 MB .ext file for the spell
checking, etc., system.