


On 2012-10-01 12:38, Michael Meeks wrote:
> We could do some magic there; of course - space is a bit of an issue - we already pointlessly bloat bazillions of ascii strings into UCS-2 (nominally UTF-16) representations and nail a ref-count and length on the beginning. If you turn on the lifecycle diagnostics in sal/rtl/source/strimp.hxx with the #ifdef and re-build sal, you can start to see the scale of the problem when you launch libreoffice ;-)

Changing subject because I'm changing the topic.

That was something I was thinking about the other day - given that the bulk of our strings are pure 7-bit ASCII, it might be a worthwhile optimisation to store a bit that says "this string is 7-bit ASCII", and then store the string as a sequence of bytes.

The latest Java VM does this trick internally - it pretends that a String is stored as an array of 16-bit values, but actually it can store pure-ASCII strings as an array of bytes rather than as UTF-16.
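
A minimal sketch of the idea, assuming a hypothetical CompactString class (none of these names exist in sal, and this is not how rtl_uString is laid out). It keeps a flag saying "this string is 7-bit ASCII", stores one byte per character when the flag is set, and still hands out 16-bit code units to callers. A real implementation would more likely fold the flag into an existing header field than carry a separate bool and two buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical illustration only; not the rtl_uString layout used in sal.
class CompactString {
public:
    explicit CompactString(const std::u16string& s)
        : length_(s.size()), ascii_(true)
    {
        // The string qualifies only if every code unit is 7-bit ASCII.
        for (char16_t c : s) {
            if (c > 0x7F) { ascii_ = false; break; }
        }
        if (ascii_) {
            // Narrow storage: one byte per character.
            bytes_.reserve(s.size());
            for (char16_t c : s) {
                bytes_.push_back(static_cast<std::uint8_t>(c));
            }
        } else {
            // Fall back to the usual 16-bit representation.
            wide_ = s;
        }
    }

    bool isAscii() const { return ascii_; }
    std::size_t length() const { return length_; }

    // Callers still see 16-bit code units, whatever the storage is.
    char16_t at(std::size_t i) const {
        assert(i < length_);
        return ascii_ ? static_cast<char16_t>(bytes_[i]) : wide_[i];
    }

    // Bytes used by the character data alone (header fields excluded).
    std::size_t payloadBytes() const {
        return ascii_ ? bytes_.size() : wide_.size() * sizeof(char16_t);
    }

private:
    std::size_t length_;
    bool ascii_;
    std::vector<std::uint8_t> bytes_;  // used when ascii_ is true
    std::u16string wide_;              // used when ascii_ is false
};
```

With this sketch, an ASCII string has a payload of one byte per character, while a string containing even one non-ASCII character keeps the full two bytes per code unit.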

Even in an app running in a language other than US-English, strings are used for so many internal things that >90% of the strings are 7-bit ASCII.


Disclaimer: http://www.peralex.com/disclaimer.html




