Hi Mike,

On Wednesday, 2011-12-14 07:48:49 -0800, Mike Whiteley wrote:
The current libicudata is about 15MB, but it doesn't need to be that large. We can probably get it down to 5MB or less. I took a bit of time on this. 1) The configuration files needed to modify what is actually in icudata are NOT included in the package in libreoffice's repository. A real "source" package can be downloaded, which is what we'll need to do if we want to customize this library.
There's an easier way by using the ICU data library customizer available at http://apps.icu-project.org/datacustom/
2) The bulk of the data in (our non-source package) icudata comes from an input file ./source/data/in/icudt44l.dat. This file can be removed, which shrinks the resulting icudata library from 15MB to 4.4MB.
The configurable data lib is 13945 KB in size. We can't remove that in its entirety. We need at least the following (numbers taken from ICU 4.4, http://apps.icu-project.org/datacustom/ICUData44.html):

* Break Iterator (534 KB)
* Collators (4830 KB)
* Transliterators (308 KB)

From "Miscellaneous Data (4282 KB)" we'd need only parts. Currently quite safe to remove are:

* Charset Mapping Tables (3469 KB)
* Rule Based Number Format (275 KB)
* Formatting, Display Names and Other Localized Data (572 KB)

The resulting lib would be 10197 KB (including miscellaneous) or 5916 KB (miscellaneous removed). So we could gain between ~4MB and ~8MB. Given that only systems where ICU doesn't already exist (Windows and?) would benefit from this, it's surely a benevolent task for some merciful soul ;-)
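To make the "between ~4MB and ~8MB" estimate easy to recheck, here is a small sketch of the arithmetic. It only uses the three totals quoted above; the constant and function names are made up for illustration, and the sizes are the customizer's figures for ICU 4.4, not something measured on a real build.

```python
# Sizes in KB, as quoted above from the ICU 4.4 data customizer page.
FULL_KB = 13945          # complete configurable data library
WITH_MISC_KB = 10197     # stripped library, "Miscellaneous Data" kept
WITHOUT_MISC_KB = 5916   # stripped library, "Miscellaneous Data" removed


def saving_kb(full_kb, stripped_kb):
    """Return how many KB are saved by shipping the stripped library."""
    return full_kb - stripped_kb


def savings():
    """Return the saving in KB for both stripped variants."""
    return {
        "keep_misc": saving_kb(FULL_KB, WITH_MISC_KB),
        "drop_misc": saving_kb(FULL_KB, WITHOUT_MISC_KB),
    }


if __name__ == "__main__":
    for label, kb in savings().items():
        print(f"{label}: {kb} KB saved (~{kb / 1024:.1f} MB)")
```

The real saving will land somewhere between the two figures, depending on how much of the miscellaneous data turns out to be needed.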
HELP: We should try this solution first. Would someone who knows more about icudata please check whether a library built this way is enough for what LibreOffice needs?
The only way to be sure is to exchange the library in the build environment, build, run and test.
3) There are two icudata packages in my repository. Probably one of them can be deleted (also, these are not pure source packages anyways).
Which repository? If you mean the external source tarballs downloaded during the build, then one can probably be removed; at least LibO 3.4 and later use ICU 4.4.2, if ICU is built internally at all.
5) Keep in mind my knowledge of icudata is still very limited, and this information is only from me reading their web pages for 20 min, and me messing with code for another 20 minutes.
With good results :-)
Anyways, I just thought this would be helpful.
Sure, thanks. Would be nice if you could try out stripped-down versions of the library and report back the results.

  Eike

-- 
LibreOffice Calc developer. Number formatter stricken i18n transpositionizer.
GnuPG key 0x293C05FD : 997A 4C60 CE41 0149 0DB3 9E96 2F1A D073 293C 05FD