Hi Tomáš,

On Saturday, 2012-03-31 21:52:06 +0200, Tomáš Chvátal wrote:
Gentoo dev reporting the issue actually tried to write a patch. [1]
It is backported from ICU upstream. [2]

Let me know if it is correct enough for committing and I will do it.

[1] http://people.apache.org/~Arfrever/libreoffice-3.5.2.2-icu-49.patch
[2] https://ssl.icu-project.org/trac/changeset/31071/icu/trunk/source/data/brkitr/char.txt

It took me some digging around to find out what the comment in [2], "TODO: Restore if the Prepend set becomes non-empty again", was actually referring to. According to UAX #29 rev19 for Unicode version 6.1, section 3.1
http://www.unicode.org/reports/tr29/tr29-19.html#Default_Grapheme_Cluster_Table
says:

  Prepend: (Currently there are no characters with this value.)

UAX #29 rev17 for Unicode Version 6.0 was the last revision that listed some characters for Prepend, and Unicode Version 6.1 changed that handling. Note that Prepend characters were defined only for THAI, LAO and TAI VIET, see
http://www.unicode.org/reports/tr29/tr29-17.html#Default_Grapheme_Cluster_Table
and as our char_in.txt is used only for Indic languages, the Prepend rule shouldn't have any effect there anyway. Or I think so ...

So yes, Tomáš, I think it's safe to commit the patch.

Btw, we have a slight problem here: if Prepend became non-empty again, we wouldn't notice other than by polling UAX #29 for changes. More generally, we are using modified RBBI rules based on maybe completely outdated rules we once adapted for an ancient ICU version. The problem is that no one can judge them other than native speakers, AND those have to be savvy in both Unicode segmentation rules and ICU RBBI rules ...

  Eike

-- 
LibreOffice Calc developer. Number formatter stricken i18n transpositionizer.
GnuPG key 0x293C05FD : 997A 4C60 CE41 0149 0DB3 9E96 2F1A D073 293C 05FD