

On Mon, 2012-02-13 at 22:39 +0000, Richard Wordingham wrote:
> The spell-checker seems to break up a phrase consisting of just กุหลาบ
> into 3 or 4 words.

Hmm, so I played around with this and here's what I think is the
problem...

We have some customized break iterator rules in LibreOffice, so we're
using those and *not* the built-in ICU ones. But we lack a customized
Thai rule set, so Thai falls through to some ultra-generic word-breaking
rules and never goes near the special Thai iterator built into ICU :-(
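
To illustrate why the generic rules fall short: Thai is written without
spaces between words, and as far as I know the ICU Thai iterator finds
word boundaries by looking words up in a dictionary. Below is a minimal
sketch of that idea using greedy longest-match. It is not the ICU
algorithm and not LibreOffice code; the function name `segment` and the
tiny `TOY_DICTIONARY` are made up purely for illustration.

```python
def segment(text, dictionary):
    """Split text into words by greedy longest-match against a dictionary.

    Characters that start no dictionary word become one-character tokens.
    This is an illustrative sketch only, not the ICU implementation.
    """
    words = []
    max_len = max(map(len, dictionary), default=1)
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shrink.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                break
        else:
            # No dictionary word starts here: emit a single character.
            candidate = text[i]
        words.append(candidate)
        i += len(candidate)
    return words


# Hypothetical toy dictionary; a real one holds tens of thousands of words.
TOY_DICTIONARY = {"กุหลาบ", "ดอก", "สี", "แดง"}

if __name__ == "__main__":
    print(segment("กุหลาบ", TOY_DICTIONARY))
    print(segment("ดอกกุหลาบสีแดง", TOY_DICTIONARY))
```

With the word in the dictionary, "กุหลาบ" comes back as one token; with
an empty dictionary the same function splits it into single characters,
which is roughly the kind of over-splitting the spell-checker was seeing.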

I think this change:
http://cgit.freedesktop.org/libreoffice/core/commit/?id=475d0c59c66fb7752d230f76130b17145aad0c12
should improve matters a lot. It makes "กุหลาบ" get treated as a single
word in the unit test there now, anyway. The Northern Thai one is still
not considered a single word, though; that might be due to the oldish ICU
we're still using.

After some googling I'm unsure whether the "right way to go" to further
improve Thai break iteration is simply to have another go at upgrading
ICU to get the latest and greatest there, or for "someone" to have a go
at integrating libthai into LibreOffice and hand off break iteration for
Thai to that. Either way, the link above and the related unit test give
an entry point to the relevant code.

C. 

