On Fri, 17 Feb 2012 14:10:21 +0000
Caolán McNamara <caolanm@redhat.com> wrote:

> On Thu, 2012-02-16 at 23:24 +0000, Richard Wordingham wrote:
> Indeed, yeah, I suppose, assuming it's as complicated as "Thai", that
> the right direction would be for someone to write for icu new
> dictionary-based breakiterators for the "nod"(?) language and then the
> rather trivial changes to LibreOffice to know about the language in
> order to mark text as that language to bubble that info down to icu

Northern Thai's not quite as simple or standardised as Siamese!  One can
meet (at least) the following spelling systems:

1) Chiangmai phonetics
2) Chiangrai phonetics (different mapping of tones to Siamese spelling
rules)
3) Transliteration from Tai Tham script (probably rare for connected
text)
4) Tai Tham script

However, perhaps dictionary-based break iterators are something to be
treated like dictionaries.  There are several other writing systems
that could probably benefit from them:

Thai script:
  Northern Thai
  NE Thai (for recording songs - use of Siamese tone rules scrambles
  the tonemarks compared to Siamese cognates)

Khmer script:
  Khmer - there's already a project for this set up on SourceForge.
  Pali

Tai Tham script:
  Tai Khuen
  Tai Lue
  Pali

Lao script:
  Lao

Tibetan script:
  Tibetan

I've a feeling Burmese may also have a need for dictionary-based text
breaking, though it's better behaved for syllable breaking than most of
the others listed here.  Shan would come in the same category.

The above list is not exhaustive.  Tai Lue in Lao script probably
belongs in the list.
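
For readers unfamiliar with the idea, dictionary-based breaking can be
illustrated with a minimal sketch in Python.  It uses greedy
longest-match lookup, which is a deliberately simple stand-in and is
not claimed to be the algorithm ICU uses.  The function name and the
three-word toy dictionary are assumptions made up for this example; a
real break iterator needs a full word list for the language and
spelling system in question.

```python
# Hypothetical toy word list; a real one would hold thousands of entries.
TOY_DICTIONARY = {"ภาษา", "ไทย", "เหนือ"}


def segment(text, dictionary):
    """Split unspaced text into words by greedy longest match.

    Illustrative only: at each position, take the longest dictionary
    word that starts there; if none matches, emit a single character.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking to one character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


if __name__ == "__main__":
    print(segment("".join(["ภาษา", "ไทย", "เหนือ"]), TOY_DICTIONARY))
```

Greedy matching picks the wrong split when a longer word happens to
start at the same place as the intended shorter one, and the result
depends entirely on the word list.  That is one reason each spelling
system listed above would need its own dictionary.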

Not all Thai script writing systems need a break iterator - some of the
minority languages separate words with spaces, but that's partially a
matter of literacy - Thais start writing Thai with interword gaps and
then learn to suppress the gaps.  Pali written in Thai also separates
words with spaces - but Pali has some very long words!

Richard.
