2017 Archives by date, by thread · List index


I am trying to put together a workable solution for spell-checking
Northern Thai in the Lanna (a.k.a. Tai Tham) script.  I have a good idea
of how to do it, and it is already working in Firefox.  The solution
may not be suitable for run-of-the-mill users, but I don't believe such
users need it.  Additionally, a Thai or English user interface is
probably better than a Northern Thai interface.

There are a number of problems, but the significant ones all relate to
fonts.  The others are all soluble.

1) The Universal Shaping Engine

The Universal Shaping Engine (USE) inserts far too many dotted circles
into Tai Tham text.  Most closed syllables cannot be written in
accordance with Unicode's principle of phonetic ordering, and some
cannot be written at all.  This I have overcome by creating a font that
removes inappropriate dotted circles.

With such a font, the Universal Shaping Engine's approach becomes a
usable solution for DirectWrite, HarfBuzz and AAT.

2) Scriptio Continua

The Tai languages in the Tai Tham script do not separate words by
spaces.  The old solution to this problem, U+200B ZERO WIDTH SPACE,
works.  (By contrast, Pali, at least in modern texts, tends to have
spaces between words, as is done in Pali in the Thai script.
Significant sandhi may suppress the word-breaks.)
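
As a minimal sketch of the U+200B approach (not taken from my actual
setup; the helper names join_words and split_words are made up for
illustration), already-segmented words can be joined with ZERO WIDTH
SPACE and recovered again like this:

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def join_words(words):
    """Join already-segmented words with invisible word boundaries."""
    return ZWSP.join(words)


def split_words(text):
    """Recover words from text that uses ZWSP and/or ordinary whitespace."""
    tokens = []
    # str.split() with no argument splits on whitespace only; U+200B is a
    # format character, not whitespace, so it is handled separately.
    for chunk in text.split():
        tokens.extend(word for word in chunk.split(ZWSP) if word)
    return tokens
```

A spell-checker sees each ZWSP-delimited chunk as a word, while the
rendered text still looks continuous.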

3) Northern Thai is not supported by LibreOffice

It is, however, supported by Open Document Format.  The solution is to
set the CTL language directly in the document's XML, and then propagate
and edit text for which nod-TH is the CTL language.
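
The XML edit could be scripted roughly as below.  This is only a sketch
under assumptions: it assumes the relevant attributes are ODF's
style:language-complex and style:country-complex on
style:text-properties elements, it bluntly changes every such element,
and the function name set_ctl_language is hypothetical.  It works on the
XML text only; unpacking and repacking the .odt archive is left out.

```python
import xml.etree.ElementTree as ET

# ODF style namespace; registering it keeps the "style:" prefix on output.
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
ET.register_namespace("style", STYLE_NS)


def set_ctl_language(xml_text, language="nod", country="TH"):
    """Set the CTL (complex text) language on every style:text-properties.

    Returns the modified XML text and the number of elements changed.
    """
    root = ET.fromstring(xml_text)
    changed = 0
    for props in root.iter(f"{{{STYLE_NS}}}text-properties"):
        props.set(f"{{{STYLE_NS}}}language-complex", language)
        props.set(f"{{{STYLE_NS}}}country-complex", country)
        changed += 1
    return ET.tostring(root, encoding="unicode"), changed
```

Note that ElementTree renames any namespace prefix it has not been told
about, so a real content.xml or styles.xml would need its other
namespaces registered too, or a hand edit may simply be safer.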

The lack of a Northern Thai interface is probably not a problem.  Any
need for it is emotional rather than practical.

Burmese, Chinese, English and perhaps Lao interfaces may similarly
cater for Tai Khuen and Tai Lue users.

4) Visually Ambiguous Spelling

Words that normally look identical may be sorted and pronounced
differently.  Actually, there are surprisingly few visual homographs
with such differences.

So that users may see what they are typing, the solution I have adopted
is to colour code the glyphs so that users can see whether a consonant
precedes or follows the vowel of the syllable in coding and phonetic
order.

5) Font Support

Does LibreOffice support any type of multi-colour font?  I may have to
devise a shape difference to indicate the spelling, which is less
appealing.  This would be most important in choosing a spelling
correction.

To see what it is that one has actually typed, switching to a
transliteration font and then undoing the change is one approach. 

6) Font Selection

How does one control the font used in the spell-checking interface?  I
am particularly interested in the solution for Ubuntu, but it would be
good to also know the solution for Windows.  For Ubuntu, I suspect the
answer will lie in Fontconfig, but I first need to know how to identify
the font that LibreOffice tries to use.  Fontconfig would work by
controlling the fallback.

Even without grammar coding, there may be an issue in that some Lanna
script fonts are barely usable in the User Interface - readable Northern
Thai text can need much greater vertical extent than English, depending
on the style.

7) Dictionary Creation

I currently have a large, working Northern Thai dictionary.  I do need
to sort out IP issues before I can share it.  Even then, there needs to
be a lot of shake-down testing to eliminate my typographical errors,
and birds, fish and trees need to be added.
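
One cheap shake-down check, sketched here under the assumption that
every dictionary entry should consist solely of characters from the Tai
Tham block (U+1A20 to U+1AAF), is to list entries containing anything
else, such as a stray Latin letter or a leftover ZERO WIDTH SPACE.  The
function name suspicious_entries is made up for illustration, and
entries that legitimately contain other characters would need an
allow-list.

```python
# Code points of the Unicode Tai Tham block, U+1A20..U+1AAF inclusive.
TAI_THAM = range(0x1A20, 0x1AB0)


def suspicious_entries(words):
    """Return (word, offending characters) for entries with non-Tai Tham text."""
    report = []
    for word in words:
        bad = [ch for ch in word if ord(ch) not in TAI_THAM]
        if bad:
            report.append((word, bad))
    return report
```

This catches only stray characters, not misspellings written entirely
in the Lanna script; those still need proof-reading.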

