

Hi,

2012/10/25 Caolán McNamara <caolanm@redhat.com> wrote:
On Mon, 2012-10-15 at 09:37 +0200, Németh László wrote:
Hi,

Adding a simple new item to en_US.dic, like

men's

will extend the dictionary. The biggest plus in the American English
dictionary of LibreOffice is the morphological data (also based on
Kevin's data and maybe WordNet) for stemming and morphological
generation in thesaurus suggestions, see the attached conversion
script in https://issues.apache.org/ooo/show_bug.cgi?id=19563.
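A minimal sketch of such a one-word addition follows. It assumes a UTF-8 `en_US.dic` in the usual Hunspell layout: the first line is an approximate entry count, used as a hash-table size hint, and each following line holds one `word[/FLAGS]` entry. The `add_words` helper is hypothetical. It is not part of Hunspell or the LibreOffice build.

```python
import tempfile
from pathlib import Path


def add_words(dic_path, new_words):
    """Append words missing from a Hunspell .dic file and refresh its count line.

    Hypothetical helper; assumes a UTF-8 encoded .dic file.
    """
    path = Path(dic_path)
    # Line 1 is the approximate entry count; the remaining non-blank lines are entries.
    entries = [e for e in path.read_text(encoding="utf-8").splitlines()[1:] if e.strip()]
    # Entries look like "word/FLAGS morph-fields"; keep only the bare word as the key.
    existing = {e.split()[0].split("/")[0] for e in entries}
    added = [w for w in dict.fromkeys(new_words) if w not in existing]
    entries.extend(added)
    path.write_text("\n".join([str(len(entries)), *entries]) + "\n", encoding="utf-8")
    return added


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dic = Path(tmp) / "en_US.dic"
        dic.write_text("2\nman/M\nmen\n", encoding="utf-8")
        print(add_words(dic, ["men's", "man"]))  # ["men's"]
        print(dic.read_text(encoding="utf-8").splitlines()[0])  # 3
```

Because Hunspell only uses the count as a sizing hint, keeping it roughly right is enough. Rewriting it on every addition is simply the least surprising choice.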

So basically one attractive route would be to build our dictionary
ourselves at LibreOffice build time from WordNet + the
custom-libreoffice-words patch + that script. That would give us
something we can easily sync whenever WordNet gets updated, without
losing the extra morphological data. Or are there any gotchas with
doing that?
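The following rough sketch makes the build-time idea concrete. It reads lines from WordNet's irregular-form exception lists (the `*.exc` files, with one `inflected base [base ...]` line each) and turns them into Hunspell dictionary entries that carry a `st:` (stem) morphological field. It is not the conversion script from the bug report above. The function names are hypothetical, and the real en_US data may use additional morphological fields.

```python
def parse_exc(lines):
    """Map each irregular form to its base forms, given WordNet .exc lines."""
    irregular = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # "inflected base [base ...]"
            irregular.setdefault(fields[0], []).extend(fields[1:])
    return irregular


def morph_entries(irregular):
    """Emit one hypothetical 'form st:stem' .dic line per (form, stem) pair."""
    entries = []
    for form, stems in sorted(irregular.items()):
        for stem in stems:
            # WordNet writes multi-word items with '_'; skip them, since a
            # Hunspell .dic entry is a single whitespace-free token.
            if "_" in form or "_" in stem:
                continue
            entries.append(f"{form} st:{stem}")
    return entries


if __name__ == "__main__":
    # Illustrative sample lines, not copied from a specific WordNet release.
    sample = ["men man", "geese goose", "mice mouse", "attorneys_general attorney_general"]
    for entry in morph_entries(parse_exc(sample)):
        print(entry)
```

With this split, syncing to a newer WordNet would only mean rerunning the conversion on the updated `.exc` files. The hand-maintained word list stays a separate input.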

Only a small part of WordNet, the list of irregular forms, is used
by the script. But the thesaurus of LibreOffice is based on the full
WordNet, so it would be fine to add thesaurus generation to the
build process as well. We would then be able to add some attractive
thesaurus improvements, too, like Unicode symbols as synonyms, e.g.
alpha -> α, skull -> ☠, as in the Hungarian thesaurus.
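The following minimal sketch shows what symbol synonyms could look like in generated thesaurus data. It assumes the MyThes `.dat` layout: an encoding line, then `word|meaning-count`, followed by one `(pos)|synonym|...` line per meaning. The helpers and the sample meanings are illustrative only. The matching `.idx` index file would still need to be regenerated afterwards.

```python
def mythes_entry(word, meanings):
    """Format one MyThes .dat entry from (part_of_speech, synonyms) pairs."""
    lines = [f"{word}|{len(meanings)}"]
    for pos, synonyms in meanings:
        lines.append("|".join([f"({pos})", *synonyms]))
    return lines


def with_symbol(meanings, symbol):
    """Prepend a Unicode symbol to the synonyms of the first (required) meaning."""
    (pos, synonyms), *rest = meanings
    return [(pos, [symbol, *synonyms]), *rest]


if __name__ == "__main__":
    dat = ["UTF-8"]
    # Illustrative meanings only; real data would come from WordNet.
    dat += mythes_entry("alpha", with_symbol([("noun", ["letter"])], "α"))
    dat += mythes_entry("skull", with_symbol([("noun", ["cranium"])], "☠"))
    print("\n".join(dat))
```

Putting the symbol first in an existing meaning keeps the meaning count unchanged, so the `word|count` header line stays consistent.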

Gotchas: there were some manual fixes (documented in
README_en_US.txt) to handle Unicode apostrophes and ligatures, so
adding a small list of the most urgent words would be easier for me.
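The manual fixes themselves are described only in README_en_US.txt. The following is therefore just a guess at how the apostrophe part could be scripted. For every word containing an ASCII apostrophe, it also emits a variant that uses U+2019 RIGHT SINGLE QUOTATION MARK. The function is hypothetical. Hunspell's `ICONV` affix-file table, which converts input characters before lookup, is an alternative that avoids duplicating entries.

```python
TYPOGRAPHIC_APOSTROPHE = "\u2019"  # RIGHT SINGLE QUOTATION MARK


def with_apostrophe_variants(words):
    """Return the words, each followed by its U+2019 variant if it contains "'"."""
    result = []
    for word in words:
        result.append(word)
        if "'" in word:
            result.append(word.replace("'", TYPOGRAPHIC_APOSTROPHE))
    return result


if __name__ == "__main__":
    print(with_apostrophe_variants(["men's", "man"]))
```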

I also tried to find an old OpenOffice.org issue about the quality
analysis/extension of the (American) English dictionary, but I
found only the one about the en-GB-oed dictionary for international
organizations, see
https://issues.apache.org/ooo/show_bug.cgi?id=51093,
http://ftp.nluug.nl/office/openoffice/contrib/dictionaries/README_en_GB-oed.txt.

Best regards,
László



C.

