Hi,

On Wednesday, 2019-04-17 22:11:58 +0100, Richard Wordingham wrote:
> Is there a pointer as to which tag sequences that "strictly follow the BCP 47 language tag specification" are "correct"?
"strictly" here means, do not invent stuff, specifically not anything that is not covered by the syntax defined, e.g. es-ES_tradnl is not a valid language tag; do not invent language codes, tags or subtags, do not use unassigned language codes. Be aware that an "x-..." private use tag indeed *does* mean private and thus should not be stored in documents that reach the wild. "correct" here also means, furthermore than being strict it should make sense.. e.g. assigning a ...-Latn tag to the CTL category does not make sense, a language-script combination that does not exist also doesn't make sense.
> As far as I can tell, the following all strictly follow the specification:
Yes.
"sa-IN" Sanskrit as used in India - so far as I can tell, that could be in, for example, Devanagari, Grantha or even the Tamil script! For Devanagari at least, I understand that this implies that homorganic nasals may be written using U+0902 DEVANAGARI SIGN ANUSVARA.
If in doubt, and if the LCID assigned in isolang.cxx isn't a LANGUAGE_USER_... one, ask Microsoft; here it is LANGUAGE_SANSKRIT 0x044F. Most of these even predate the existence of BCP 47, when only combinations of language code and country code were used (also due to the Java Locale restrictions). What I usually did was look up the language at SIL and the Ethnologue and use the most prevalent script as the implied default script. Here, https://www.ethnologue.com/language/san would lead to Devanagari, but in this case what MS assigned the LCID for matters more.
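The "implied default script" idea can be sketched in Python as follows. This is not how LibreOffice implements it; the table IMPLIED_SCRIPT and the function effective_script are hypothetical names for this example, and the single table entry is just the Sanskrit/Devanagari case discussed above. The input is assumed to be a well-formed tag.

```python
# Hypothetical table: language subtag -> implied default script (ISO 15924 code).
IMPLIED_SCRIPT = {"sa": "Deva"}


def effective_script(tag):
    """Return the explicit script subtag if present, else the implied default."""
    subtags = tag.split("-")
    for sub in subtags[1:]:
        if len(sub) == 3 and sub.isalpha():
            continue  # extlang subtag; a script subtag may still follow
        if len(sub) == 4 and sub.isalpha():
            return sub.title()  # explicit script, normalized to title case
        break  # region, variant or singleton: no explicit script in this tag
    return IMPLIED_SCRIPT.get(subtags[0].lower())


print(effective_script("sa-IN"))    # no script subtag, falls back to the table
print(effective_script("sa-Latn"))  # explicit script wins over the default
```

With such a table, a legacy language-country pair like sa-IN keeps working without a script subtag, while an explicit script subtag always takes precedence.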
"sa-150" Sanskrit written using European conventions - so, any script, but, at least for Devanagari, the anusvara sign is not used for homorganic nasals.
Though such a tag is valid, LibreOffice doesn't use the numeric UN M.49 code; it may be accepted but might not work everywhere.
"sa-Deva-150" Sanskrit written in Devanagari in the manner used in Europe.
Same here.
"sa-Latn" Sanskrit written in the Roman script. "sa-Latf" Sanskrit written in Fraktur (I'm not sure that this exists. It might need a hint as to where to find a Fraktur script with a combining candrabindu.)
Both are perfectly valid, if they serve any purpose. Though with sa-Latn I doubt there's a use case, so I wouldn't call that "correct" in the common sense. I also just learned that sa-Latf somehow exists.

  Eike

--
GPG key 0x6A6CD5B765632D3A - 2265 D7F3 A7B0 95CC 3918 630B 6A6C D5B7 6563 2D3A