

       reminds me of "and the longest word in the English language is ... "

          or is it supercalifragilisticexpialidocious  ;-)
              https://www.youtube.com/watch?v=tRFHXMQP-QU



From: Kracked_P_P---webmaster <webmaster@krackedpress.com>
Date: Thu, May 22, 2014 at 8:58 AM
Subject: Re: [libreoffice-users] Re: Spell Check Dictionary
To: users@global.libreoffice.org


There are 797,866 lines in the .dic file.  The top line holds the number of
words, and every line after it is treated as one individual word.
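
As a rough illustration of that layout, here is a minimal Python sketch
that splits .dic-style text into the count on the first line and the words
that follow.  The function name read_dic and the three-word sample are made
up for this example; this is not the code any spell checker actually uses.

```python
def read_dic(text):
    """Split .dic-style content into (declared_count, words).

    Assumes the layout described above: the first line is a number,
    and every following non-empty line is one word.
    """
    lines = text.splitlines()
    declared = int(lines[0].strip())
    words = [line.strip() for line in lines[1:] if line.strip()]
    return declared, words


# Tiny made-up sample: a count line followed by three words.
sample = "3\nTuesday\ntime\ntimed\n"
declared, words = read_dic(sample)
```

Comparing declared with len(words) is a quick way to check that the number
on the top line matches the words actually listed.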

Each line is a correct spelling of a word.  The capitalized words come
first in the list, followed by the lowercased ones.

"timed" and "timing" are two forms of a single root word, but they are not
considered the same word as "time".  If you create a word list of all of
the words used in a document, then time, timed, and timing are three
individually listed words.  Just because they share the same root word does
not mean they are the same word.

Also, for a spell checker, a word with the first letter uppercased and the
same word with that letter lowercased are treated differently.  When a word
is not the first word in a sentence, some words are allowed, or even
required, to have the first letter uppercased, while others will be marked
as misspelled if the first letter is uppercased.  That is defined in the
spell checking .dic file.
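
To make that concrete, here is a small Python sketch of the behavior as
described in this message.  It is only a simplified model: the function
name is_accepted, the sentence_start parameter, and the two-word dictionary
are assumptions made for the example, and a real spell checker may apply
more lenient or more detailed case rules.

```python
def is_accepted(token, dictionary, sentence_start=False):
    """Simplified case check modelled on the description above.

    A token is accepted if it is listed exactly as written.  A token
    with an uppercased first letter is also accepted at the start of a
    sentence when its lowercased form is listed.
    """
    if token in dictionary:
        return True
    if sentence_start and token[:1].isupper():
        return token[0].lower() + token[1:] in dictionary
    return False


# Made-up dictionary: one capitalized entry, one lowercased entry.
dictionary = {"Tuesday", "time"}
```

With this model, "Tuesday" is accepted anywhere and "tuesday" is rejected,
while "Time" is accepted only when sentence_start is True.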

You can either list each version of a word on its own line, or you can
work out the control "options" that follow the word so that one entry also
defines all of the prefixed and suffixed versions of that word.  Since I do
not know those control codes, I wrote out each form or version of the word
in the list, which also let me give a "good" word count.
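
For anyone curious about the second approach, here is a hedged Python
sketch of how an entry with control codes could expand into several words.
It assumes the Hunspell-style convention of writing the codes after a slash
(for example time/DG), with the rules themselves kept in a companion .aff
file.  The RULES table, the flag letters D and G, and the function name
expand are illustrative only and are not copied from any shipped
dictionary.

```python
import re

# Hypothetical suffix rules: flag -> list of (strip, add, condition).
# "strip" is removed from the end of the word, "add" is appended, and
# the rule applies only when the word matches the condition regex.
RULES = {
    "D": [("", "d", r"e$"), ("", "ed", r"[^ey]$")],
    "G": [("e", "ing", r"e$"), ("", "ing", r"[^e]$")],
}


def expand(entry):
    """Expand an entry such as 'time/DG' into every form it defines."""
    word, _, flags = entry.partition("/")
    forms = [word]
    for flag in flags:
        for strip, add, condition in RULES.get(flag, []):
            if re.search(condition, word):
                stem = word[: len(word) - len(strip)] if strip else word
                forms.append(stem + add)
    return forms


print(expand("time/DG"))  # ['time', 'timed', 'timing']
```

One flagged line then stands in for three plain lines, which is why a
flagged list and a fully written-out list of the same vocabulary give very
different line counts.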

So the count of 797,865 words in the .dic file is correct.

Would you like to deal with my unpublished 3,068,588 word .dic file that
has even more versions and correct spellings of "en_US" words?  It
contains many, many suffix and prefix versions that are rarely seen but
technically spelled correctly.  I only created that version to see how
massive it could get, and I will not publish it as a single dictionary.
It would be divided up into "common" and "rare" files to be
enabled/disabled as the user chooses.  For now, the spell checking
extension project is not going to be continued until a lot of other
projects are finished - LO projects and many more non-LO projects.
