

Hi,

On Wednesday, 2019-04-17 22:11:58 +0100, Richard Wordingham wrote:

> Is there a pointer as to which tag sequences that "strictly follow the
> BCP 47 language tag specification" are "correct"?

"strictly" here means: do not invent anything, specifically nothing
that is not covered by the defined syntax, e.g. es-ES_tradnl is not
a valid language tag; do not invent language codes, tags or subtags, and
do not use unassigned language codes. Be aware that an "x-..."
private-use tag indeed *does* mean private and thus should not be
stored in documents that reach the wild.
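
To illustrate the "covered by the syntax" part, here is a minimal Python
sketch of a syntax-only check. It is an assumption-laden simplification,
not how LibreOffice does it: the name parse_langtag is hypothetical, and
the pattern covers only language, script, region, variants and a
private-use suffix (no extlang, extensions or grandfathered tags). It
also does not look anything up in the IANA subtag registry, so it cannot
detect invented or unassigned codes.

```python
import re

# Simplified BCP 47 shape: language[-script][-region][-variant]*[-x-...]
# Assumption: extlang, extensions and grandfathered tags are left out.
_LANGTAG = re.compile(
    r"""
    ^(?P<language>[A-Za-z]{2,3})                    # language code, 2 or 3 letters
    (?:-(?P<script>[A-Za-z]{4}))?                   # optional script, 4 letters
    (?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?          # optional region, 2 letters or 3 digits
    (?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*  # optional variants
    (?P<private>-x(?:-[A-Za-z0-9]{1,8})+)?$         # optional private-use suffix
    """,
    re.VERBOSE,
)


def parse_langtag(tag):
    """Return the recognised subtags, or None if the syntax does not match."""
    m = _LANGTAG.match(tag)
    if m is None:
        return None
    return {
        "language": m.group("language"),
        "script": m.group("script"),
        "region": m.group("region"),
        "private_use": m.group("private") is not None,
    }


if __name__ == "__main__":
    for tag in ("sa-IN", "sa-150", "sa-Deva-150", "sa-Latf", "es-ES_tradnl"):
        print(tag, parse_langtag(tag))
```

With this sketch es-ES_tradnl is rejected because of the underscore,
while a well-formed tag built from an unassigned code would still pass,
so it only covers the syntactic half of "strictly".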

"correct" here also means that, beyond being strict, the tag should make
sense. E.g. assigning a ...-Latn tag to the CTL category does not make
sense, nor does a language-script combination that does not exist.

> As far as I can tell, the following all strictly follow the
> specification:

Yes.

> "sa-IN" Sanskrit as used in India - so far as I can tell, that could be
> in, for example, Devanagari, Grantha or even the Tamil script!  For
> Devanagari at least, I understand that this implies that homorganic
> nasals may be written using U+0902 DEVANAGARI SIGN ANUSVARA.

If in doubt, ask Microsoft, provided the LCID assigned in isolang.cxx
isn't LANGUAGE_USER_...; here it is LANGUAGE_SANSKRIT 0x044F. Most of
these even predate the existence of BCP 47, when only combinations of
language code and country code were used (also due to the Java Locale
restrictions).

What I usually did was look up the language at SIL and the Ethnologue
and use the most prevalent script as the implied default script. Here,
https://www.ethnologue.com/language/san would lead to Devanagari, but in
this case what MS assigned the LCID for is more important.
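
As a rough Python sketch of the "implied default script" idea: the
table DEFAULT_SCRIPT and the function implied_script are hypothetical
names for illustration, and the only entry is the Sanskrit one discussed
here. The sketch assumes the script, if present, is the second subtag
(so extlang subtags are not handled).

```python
# Hypothetical table of implied default scripts per language code.
# Only the Sanskrit entry comes from this thread.
DEFAULT_SCRIPT = {"sa": "Deva"}


def implied_script(tag):
    """Return the explicit script subtag if present, else the implied default."""
    subtags = tag.split("-")
    # Assumption: a script subtag, if any, directly follows the language.
    if len(subtags) > 1 and len(subtags[1]) == 4 and subtags[1].isalpha():
        return subtags[1].title()
    # No explicit script: fall back to the table, or None if unknown.
    return DEFAULT_SCRIPT.get(subtags[0].lower())
```

So sa-IN would be treated as Devanagari, while sa-Latn keeps its
explicit script and a language missing from the table yields None.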

> "sa-150" Sanskrit written using European conventions - so, any script,
> but, at least for Devanagari, the anusvara sign is not used for
> homorganic nasals.

Though it is valid, LibreOffice doesn't use the numeric UN M.49 codes;
such a tag may be accepted but might not work everywhere.

> "sa-Deva-150" Sanskrit written in Devanagari in the manner used in
> Europe.

Same here.

> "sa-Latn" Sanskrit written in the Roman script.
>
> "sa-Latf" Sanskrit written in Fraktur (I'm not sure that this exists.
> It might need a hint as to where to find a Fraktur script with a
> combining candrabindu.)

Both are perfectly valid, if they serve any purpose. Though with sa-Latn
I doubt there's a use case, so I wouldn't call that "correct" in the
common sense.

I also just learned that sa-Latf somehow exists.

  Eike

-- 
GPG key 0x6A6CD5B765632D3A - 2265 D7F3 A7B0 95CC 3918  630B 6A6C D5B7 6563 2D3A
