On Fri, 17 Feb 2012 14:10:21 +0000
Caolán McNamara <caolanm@redhat.com> wrote:
> On Thu, 2012-02-16 at 23:24 +0000, Richard Wordingham wrote:
> Indeed, yeah, I suppose, assuming it's as complicated as "Thai", that
> the right direction would be for someone to write for icu new
> dictionary-based breakiterators for the "nod"(?) language and then the
> rather trivial changes to LibreOffice to know about the language in
> order to mark text as that language to bubble that info down to icu.

Northern Thai's not quite as simple or standardised as Siamese! One can
encounter (at least) the following spelling systems:
1) Chiangmai phonetics
2) Chiangrai phonetics (different mapping of tones to Siamese spelling
rules)
3) Transliteration from Tai Tham script (probably rare for connected
text)
4) Tai Tham script

However, perhaps dictionary-based break iterators are something to be
treated like spelling dictionaries. There are several other writing
systems that could probably benefit from them:
Thai script:
  Northern Thai
  NE Thai (for recording songs - use of Siamese tone rules scrambles
  the tone marks compared to Siamese cognates)
Khmer script:
  Khmer - there's already a project for this set up on SourceForge.
  Pali
Tai Tham script:
  Tai Khuen
  Tai Lue
  Pali
Lao script:
  Lao
Tibetan script:
  Tibetan

I've a feeling Burmese may also need dictionary-based text breaking,
though it's better behaved for syllable breaking than most of the
others listed here. Shan would fall into the same category.
The above list is not exhaustive. Tai Lue in Lao script probably
belongs in the list.

Not all Thai-script writing systems need a break iterator: some of the
minority languages separate words with spaces. That is partly a matter
of literacy, though; Thais start writing Thai with interword gaps and
then learn to suppress them. Pali written in Thai script also separates
words with spaces, but Pali has some very long words!
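
To illustrate what "dictionary-based" breaking means for text written
without interword gaps, here is a minimal sketch assuming a plain greedy
longest-match strategy. This is not ICU's actual algorithm (its
dictionary break engines use more elaborate heuristics), and the
function name `segment` and the toy word lists are hypothetical.

```python
from typing import List, Optional, Set


def segment(text: str, dictionary: Set[str],
            max_len: Optional[int] = None) -> List[str]:
    """Toy greedy longest-match word segmentation (hypothetical helper)."""
    if max_len is None:
        # Longest candidate worth trying is the longest dictionary entry.
        max_len = max((len(w) for w in dictionary), default=1)
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest dictionary word starting at position i first.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                words.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: emit one character and move on.
            words.append(text[i])
            i += 1
    return words


print(segment("ภาษาไทย", {"ภาษา", "ไทย", "ภา"}))
```

Greedy matching is easily misled: with the word list
{"the", "table", "tab", "le", "theta"}, the string "thetable" comes out
as ["theta", "b", "le"] rather than ["the", "table"], which is one
reason real break iterators look ahead at more than one candidate.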
Richard.