


On Mon, 2012-10-01 at 13:02 +0200, Noel Grandin wrote:
> That was something I was thinking about the other day - given that the
> bulk of our strings are pure 7-bit ASCII, it might be a worthwhile
> optimisation to store a bit that says "this string is 7-bit ASCII", and
> then store the string as a sequence of bytes.

        Optimisation ? :-) IMHO the ideal is to store all strings as UTF-8
underneath the hatches anyway. All the people I've discussed this with
that objected to that, turned out (after some discussion) to have a weak
understanding of UTF-8, UTF-16 and of rendering complex text ;-) Of
course, perhaps I should discuss with more people.

        The only problem with changing that is our ABI - which explicitly
exposes the encoding of our strings.

> The latest Java VM does this trick internally - it pretends that String
> is stored with an array of 16-bit values, but actually it stores them as
> UTF-8.

        Interesting - for all strings? Is there a pointer to the code / docs
for that detail somewhere? :-) Last I looked Java also stored partial
strings chained to its parent; so 'substring' takes a reference on the
parent (be it ever so large), and can return a single character string
out of it without re-allocation. IIRC that can cause huge grief when
parsing big files into little ones ;-)
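
A minimal sketch of that sharing scheme, written in C++ rather than Java, shows where the grief comes from. The names SharedSubstring and substring are hypothetical and exist only for illustration; this is not how any particular VM or LibreOffice implements it. The point is that a one-character substring holds a reference on the whole parent buffer, so the parent cannot be freed while the substring lives.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical type: a substring that shares its parent's buffer
// instead of copying the characters it covers.
struct SharedSubstring {
    std::shared_ptr<const std::string> parent; // keeps the whole parent alive
    std::size_t offset;
    std::size_t length;

    // Materialise the characters as an independent string.
    std::string copy() const { return parent->substr(offset, length); }

    // Bytes of parent payload pinned in memory by this substring.
    std::size_t retainedBytes() const { return parent->size(); }
};

// Take a substring without re-allocating: only a reference is taken.
SharedSubstring substring(const std::shared_ptr<const std::string>& parent,
                          std::size_t offset, std::size_t length) {
    assert(offset <= parent->size() && length <= parent->size() - offset);
    return SharedSubstring{parent, offset, length};
}
```

Calling copy() and dropping the SharedSubstring is the usual way out: it costs one small allocation but releases the large parent.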

> Even in an app running in a language other than US-English, strings are
> used for so many internal things that >90% of the strings are 7-bit ASCII.

        Sure - so define the define, see what it prints, and do the quick
calculation of how much time/space we save by doing it :-)
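
The quick calculation could look something like the sketch below. It assumes the strings can be collected as UTF-16 (std::u16string here stands in for whatever the real string type is), and the names isAscii7, StringStats and measure are made up for illustration. It counts payload bytes only and ignores per-string headers and allocator overhead, so treat the result as a rough upper bound on the space saving.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// True if every UTF-16 code unit is in the 7-bit ASCII range.
bool isAscii7(const std::u16string& s) {
    for (char16_t c : s) {
        if (c > 0x7F) return false;
    }
    return true;
}

// Hypothetical tally of what a byte-per-character ASCII mode would save.
struct StringStats {
    std::size_t total = 0;        // number of strings seen
    std::size_t ascii = 0;        // how many were pure 7-bit ASCII
    std::size_t utf16Bytes = 0;   // payload bytes at 2 bytes per code unit
    std::size_t compactBytes = 0; // payload bytes with ASCII strings at 1 byte per char
};

StringStats measure(const std::vector<std::u16string>& strings) {
    StringStats st;
    for (const auto& s : strings) {
        ++st.total;
        st.utf16Bytes += s.size() * sizeof(char16_t);
        if (isAscii7(s)) {
            ++st.ascii;
            st.compactBytes += s.size();
        } else {
            st.compactBytes += s.size() * sizeof(char16_t);
        }
    }
    return st;
}
```

Comparing utf16Bytes with compactBytes gives the space side; the time side (the extra branch on every access) would need a real profile.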

        Then again - last I looked we still had some real dumbness that needed
hunting down relating to many (tens of?) thousands of allocations and
frees of the "/" string at startup ;-)

        ATB,

                Michael.

-- 
michael.meeks@suse.com  <><, Pseudo Engineer, itinerant idiot

