On 05/23/2014 03:24 PM, libreoffice-ml.mbourne@spamgourmet.com wrote:
On 05/22/2014 12:10 PM, Urmas wrote:
"Kracked_P_P---webmaster":

There are 797866 lines in the .dic file, with the top line giving the
number of words.

Due to the author's error, it is shipped unmunched. In the proper form
it contains 476898 entries, probably even fewer if some wordforms are
missing. That is close to 70% misrepresentation.
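
As a quick check of the "close to 70%" figure, the arithmetic can be sketched as below. It takes the two counts quoted above at face value; the variable names are only illustrative, and whether the 797866 figure includes the leading count line is not settled by the thread.

```python
# Counts as quoted in the thread; taken at face value, not re-measured.
unmunched_lines = 797866   # lines reported in the shipped .dic file
munched_entries = 476898   # entries claimed for the munched form

extra = unmunched_lines - munched_entries

# Overstatement measured against the munched entry count.
overstatement = extra / munched_entries
# The same gap measured against the shipped line count.
reduction = extra / unmunched_lines

print(f"extra lines:   {extra}")
print(f"overstatement: {overstatement:.1%}")
print(f"reduction:     {reduction:.1%}")
```

Measured against the munched count, the shipped file is roughly two-thirds larger, which is where a figure near 70% would come from; measured against the shipped count, munching would remove about 40% of the lines.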

I don't know how spell-check dictionaries are usually compared but, to me, it would make sense to count each form as a separate word. It may be more efficient in use to compress the dictionary into a smaller number of entries, but if there's a single entry encoding 4 forms of the same root word, I'd count that as 4 words. Otherwise, a dictionary containing 100000 words but only the root word of each would seem just as good as a dictionary containing the same 100000 root words plus all the variations encoded into each entry.
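
To make the two ways of counting concrete, here is a minimal sketch using made-up data. It assumes the usual Hunspell-style convention of a root word followed by "/" and affix flags; the rules, flags, and entries below are hypothetical and much simpler than a real .aff file, whose conditions are pattern-based.

```python
# Hypothetical suffix rules keyed by flag: (strip, add, condition).
# The condition here is a plain ending the root must have; real affix
# files use richer patterns, so this is a toy model only.
SUFFIX_RULES = {
    "S": [("", "s", "")],
    "D": [("", "ed", "")],
    "G": [("", "ing", "")],
}

# A tiny "munched" word list: roots with flags, invented for illustration.
MUNCHED_ENTRIES = ["walk/SDG", "talk/SDG", "cat/S", "very"]


def expand(entry: str) -> list[str]:
    """Return the root plus every form its flags generate."""
    root, _, flags = entry.partition("/")
    forms = [root]
    for flag in flags:
        for strip, add, condition in SUFFIX_RULES.get(flag, []):
            if not root.endswith(condition):
                continue
            # Remove the stripped ending (if any) before adding the suffix.
            stem = root[: len(root) - len(strip)] if strip else root
            forms.append(stem + add)
    return forms


# The "unmunched" list has one line per generated word form.
unmunched = [form for entry in MUNCHED_ENTRIES for form in expand(entry)]

print(len(MUNCHED_ENTRIES), "entries ->", len(unmunched), "word forms")
```

Both lists accept exactly the same words; only the number of lines differs, which is why the two counts are not directly comparable.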

Kracked_P_P---webmaster wrote:
What do you mean by the term "unmunched"?

munch
/mʌntʃ/
verb (used with object)
1. to chew with steady or vigorous working of the jaws, often audibly.
...
Related forms
un·munched, adjective

YES, I have heard of the term, but not used in connection with a dictionary file.


(http://dictionary.reference.com/browse/unmunched - I didn't swallow the dictionary, munched or otherwise)

Never heard of that term in relation to a .dic file.

Since a .dic file doesn't strike me as being particularly tasty, nor useful after chewing, perhaps we should be glad that it is unmunched.

(FWIW, neither LibreOffice nor SeaMonkey recognises 'unmunched'...)

That must mean that mine is "munched", since it seems to work just fine for me.
With my 797K dictionary enabled, it has checked itself just fine. So the words that are in it but not in the default en_US work.

Mark.



Here are the file sizes of several versions of the en_US .ext file[s].

File size (spell-checking words, thesaurus, and hyphenation) by number of words, "unmunched":

   98,000 words -  5.5 MB
  217,000 words -  5.8 MB
  390,000 words -  6.2 MB
  797,000 words -  6.8 MB
3 million words - about 11 MB (experimental and not published)

The file sizes are as shown in the "Caja" file manager on Linux Mint.
I am not the only one who has produced a 5-7 MB .ext file for the spell checking, etc., system.



