On Sat, 2012-02-11 at 16:23 +0000, Richard Wordingham wrote:
> Is it possible to create an experimental alternative to the Thai
> break iterator that can be shared with other people as a LibreOffice
> extension?
I don't think we have any way to override our breakiterators from
extensions.
FWIW, i18npool/source/breakiterator is where we have our word,
character, sentence and line break iterators implemented.
Typically we forward everything on to icu to do the real work, albeit
with some customization of the default icu rules.
What I'd *expect* to happen is that text marked as "Thai" should end up
getting broken into words by the default icu word break iterator, which
at http://userguide.icu-project.org/boundaryanalysis claims "ICU
provides a special dictionary-based break iterator."
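To make "dictionary-based" concrete, here is a minimal sketch of greedy longest-match word segmentation. It is only an illustration of the general idea, not icu's actual algorithm or API; the function name wordBoundaries, the maxWordLen parameter, and the fallback of treating an unknown code point as its own segment are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper, not part of icu or LibreOffice.
// Returns boundary offsets (in code points) found by greedy longest-match
// lookup against a word list. The first offset is always 0.
std::vector<std::size_t> wordBoundaries(const std::u32string& text,
                                        const std::set<std::u32string>& dict,
                                        std::size_t maxWordLen)
{
    std::vector<std::size_t> bounds{0};
    std::size_t pos = 0;
    while (pos < text.size()) {
        // Assumed fallback: a code point that starts no dictionary word
        // becomes a one-code-point segment.
        std::size_t step = 1;
        const std::size_t longest = std::min(maxWordLen, text.size() - pos);
        // Try the longest candidate first, then shrink.
        for (std::size_t len = longest; len > 1; --len) {
            if (dict.count(text.substr(pos, len)) != 0) {
                step = len;
                break;
            }
        }
        pos += step;
        bounds.push_back(pos);
    }
    return bounds;
}
```

With a dictionary holding just ภาษา and ไทย and maxWordLen set to 4, the text ภาษาไทย yields the offsets 0, 4, 7. A real dictionary-based iterator has to deal with ambiguity and out-of-dictionary words far more carefully than this greedy version does.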
So, assuming that nothing is simply broken, improving the icu Thai break
iterator should improve LibreOffice "for free".
I'd be sort of interested in confirming that what we have right now
actually works correctly, in the sense that Thai text definitely *is*
getting run through the special Thai-specific icu word break handler.
There is an i18npool/qa/cppunit/test_breakiterator.cxx which we use to
make sure that some existing edge-cases continue to work. If you wanted
to hack that to add some Thai word break tests, that'd be helpful.
Alternatively, simply pass me some sample text where we *are* doing the
right thing and some where we *aren't*, and I could populate a test in
there with that data and turn the problem into a developer-friendly
"enable this word-break unit test and make it work" problem.
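As a rough sketch of how such sample data could be organized, each sample can pair a text with the boundary offsets a Thai reader would expect, plus a flag for whether it works today. This is standalone code, not the cppunit fixture in test_breakiterator.cxx; BreakCase, Segmenter, and failingCases are hypothetical names, and the segmenter is passed in as a callback so the sketch assumes nothing about the real break iterator interface.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical record for one piece of sample text.
struct BreakCase {
    std::u32string text;                // sample text, as code points
    std::vector<std::size_t> expected;  // boundary offsets we want to see
    bool knownGood;                     // true if this already works today
};

// Any function that maps text to boundary offsets can be plugged in here.
using Segmenter =
    std::function<std::vector<std::size_t>(const std::u32string&)>;

// Returns the indices of the cases whose actual boundaries differ from
// the expected ones.
std::vector<std::size_t> failingCases(const std::vector<BreakCase>& cases,
                                      const Segmenter& segment)
{
    std::vector<std::size_t> failures;
    for (std::size_t i = 0; i < cases.size(); ++i) {
        if (segment(cases[i].text) != cases[i].expected) {
            failures.push_back(i);
        }
    }
    return failures;
}
```

Cases with knownGood set guard against regressions, while the others are the ones a developer would enable and then work on until failingCases comes back empty.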
C.