Hi Richard,

On Monday, 2015-06-29 22:27:45 +0100, Richard Wordingham wrote:
1) Determine script from character(s).
2) Categorise script as Western/CTL/CJK
Sounds good.
3) Locale is then the Western locale, the CTL locale or the CJK locale as appropriate.
That's more or less what we do already. If a portion of text has a Western and a CJK locale assigned, it depends on the script used in the text which one is actually taken for a segment of text.
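For illustration only, the three steps could be sketched roughly like this. It is not how LibreOffice implements it: the range table is a tiny hand-picked subset standing in for the real Unicode Script property, and the names script_of, script_class, segment and resolve are hypothetical, as are the locale tags in the usage lines. Neutral characters (spaces, digits, punctuation, combining marks) are assumed to simply join the run they occur in.

```python
# Hypothetical, heavily simplified table: (first, last, script).
# Real code would query the Unicode Script property instead.
_RANGES = [
    (0x0041, 0x005A, "Latin"), (0x0061, 0x007A, "Latin"),
    (0x00C0, 0x024F, "Latin"),
    (0x0370, 0x03FF, "Greek"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0590, 0x05FF, "Hebrew"),
    (0x0600, 0x06FF, "Arabic"),
    (0x0E00, 0x0E7F, "Thai"),
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x4E00, 0x9FFF, "Han"),
    (0xAC00, 0xD7A3, "Hangul"),
]

_CLASS_OF = {
    "Latin": "Western", "Greek": "Western", "Cyrillic": "Western",
    "Hebrew": "CTL", "Arabic": "CTL", "Thai": "CTL",
    "Hiragana": "CJK", "Katakana": "CJK", "Han": "CJK", "Hangul": "CJK",
}


def script_of(ch):
    """Step 1: script of a character, or None if neutral/unknown."""
    if not ch.isalpha():
        return None  # spaces, digits, punctuation, marks are neutral here
    cp = ord(ch)
    for first, last, script in _RANGES:
        if first <= cp <= last:
            return script
    return None


def script_class(script):
    """Step 2: map a script to Western / CTL / CJK (None stays None)."""
    return _CLASS_OF.get(script)


def segment(text):
    """Split text into (class, substring) runs; neutrals join the current run."""
    runs, current, buf = [], None, ""
    for ch in text:
        cls = script_class(script_of(ch))
        if cls is not None and current is not None and cls != current:
            runs.append((current, buf))
            buf = ""
        if cls is not None:
            current = cls
        buf += ch
    if buf:
        runs.append((current, buf))
    return runs


def resolve(text, locales):
    """Step 3: pick the Western, CTL or CJK locale for each run.

    Runs with no classifiable character fall back to the Western locale,
    which is an assumption of this sketch.
    """
    return [(part, locales[cls or "Western"]) for cls, part in segment(text)]


# The three locales assigned to the portion of text (example values).
assigned = {"Western": "en-GB", "CTL": "th-TH", "CJK": "ja-JP"}
print(resolve("Hello 日本語", assigned))
```

With all three locales assigned to the same portion of text, the script of each run decides which one applies. Note that this only picks among the three classes; it does not distinguish, say, Latin from Cyrillic within the Western class.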
Unless one first categorises the script, one does not know what the language is.
Unless the user wants to assign it, for example if s/he wants to assign a language tag (note again, I'm talking of BCP 47 here) before there is any content.
Now, with more support, one may need the script. For example, a Serbian date field should depend on the script (Latin v. Cyrillic) as well as just the language, and Serbian is not the only language using competing scripts in the same class. However, what a date field picks up from its environment is curious. If I copy a Thai date field and paste it into the middle of an English word, I get a date in English!
That's quite certainly an implementation detail that could be solved, and not the general W/C/C classification problem.

  Eike

-- 
LibreOffice Calc developer. Number formatter stricken i18n transpositionizer.
GPG key "ID" 0x65632D3A - 2265 D7F3 A7B0 95CC 3918 630B 6A6C D5B7 6563 2D3A
Better use 64-bit 0x6A6CD5B765632D3A here is why: https://evil32.com/
Care about Free Software, support the FSFE https://fsfe.org/support/?erack