On Mon, 2012-10-01 at 13:02 +0200, Noel Grandin wrote:
> That was something I was thinking about the other day: given that the
> bulk of our strings are pure 7-bit ASCII, it might be a worthwhile
> optimisation to store a bit that says "this string is 7-bit ASCII", and
> then store the string as a sequence of bytes.
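A minimal C++ sketch of the check described above. It assumes the string's code units are available as a `std::u16string`, which stands in for LibreOffice's own 16-bit string type, and the function names are hypothetical. A single pass decides whether every unit is 7-bit ASCII. If so, the string can be narrowed to one byte per character without loss.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: true if every UTF-16 code unit is 7-bit ASCII (0x00-0x7F).
bool isAscii7(const std::u16string& s) {
    for (char16_t c : s)
        if (c > 0x7F)
            return false;
    return true;
}

// Hypothetical helper: if s is pure 7-bit ASCII, return it narrowed to one
// byte per character; otherwise return std::nullopt so the caller keeps UTF-16.
std::optional<std::string> narrowIfAscii(const std::u16string& s) {
    if (!isAscii7(s))
        return std::nullopt;
    std::string out;
    out.reserve(s.size());
    for (char16_t c : s)
        out.push_back(static_cast<char>(c));  // lossless because c <= 0x7F
    assert(out.size() == s.size());
    return out;
}
```

The "this string is 7-bit ASCII" bit is then just the result of `isAscii7`, computed once when the string is created.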
Optimisation? :-) IMHO the ideal is to store all strings as UTF-8
underneath the hatches anyway. Everyone I've discussed this with who
objected turned out (after some discussion) to have a weak understanding
of UTF-8, UTF-16 and of rendering complex text ;-) Of course, perhaps I
should discuss it with more people.
The only problem with changing that is our ABI, which explicitly
exposes the string encoding.
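To put rough numbers on the UTF-8 versus UTF-16 trade-off, here is a sketch that computes how many bytes a UTF-16 string would need as UTF-8. The helper names are hypothetical, and the code assumes well-formed UTF-16 input. ASCII text halves in size. Text in the U+0080 to U+07FF range stays the same size. Most other BMP text, such as CJK, grows from 2 to 3 bytes per character. Supplementary characters take 4 bytes either way.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: bytes needed to encode s as UTF-8.
// Assumes well-formed UTF-16: every high surrogate (0xD800-0xDBFF)
// is followed by a low surrogate, and the pair becomes 4 UTF-8 bytes.
std::size_t utf8Length(const std::u16string& s) {
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char16_t c = s[i];
        if (c < 0x80) {
            bytes += 1;                      // ASCII
        } else if (c < 0x800) {
            bytes += 2;                      // e.g. Latin-1 supplement, Greek, Cyrillic
        } else if (c >= 0xD800 && c <= 0xDBFF) {
            assert(i + 1 < s.size());        // high surrogate needs a partner
            bytes += 4;                      // one supplementary code point
            ++i;                             // skip the low surrogate
        } else {
            bytes += 3;                      // rest of the BMP, e.g. CJK
        }
    }
    return bytes;
}

// Hypothetical helper: bytes used by the same string stored as UTF-16.
std::size_t utf16Length(const std::u16string& s) {
    return s.size() * sizeof(char16_t);
}
```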
> The latest Java VM does this trick internally - it pretends that String
> is stored with an array of 16-bit values, but actually it stores them as
> UTF-8.
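The general trick can be sketched as a class that keeps a 16-bit code-unit interface while choosing its storage for each string. `CompactString` below is hypothetical and is not the JVM's code. It narrows only pure 7-bit ASCII content, which matches the idea earlier in the thread, rather than using UTF-8. A fixed one-byte width keeps `charAt` a constant-time lookup. With variable-width UTF-8 underneath, indexing by UTF-16 position would need a scan or an auxiliary index.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <variant>

// Hypothetical CompactString: callers see UTF-16 code units, but pure
// 7-bit ASCII content is stored as one byte per character.
class CompactString {
public:
    explicit CompactString(const std::u16string& s) {
        for (char16_t c : s) {
            if (c > 0x7F) { data_ = s; return; }   // needs wide storage
        }
        std::string narrow;
        narrow.reserve(s.size());
        for (char16_t c : s) narrow.push_back(static_cast<char>(c));
        data_ = std::move(narrow);
    }

    bool isNarrow() const { return std::holds_alternative<std::string>(data_); }

    std::size_t length() const {
        return isNarrow() ? std::get<std::string>(data_).size()
                          : std::get<std::u16string>(data_).size();
    }

    // Same contract as indexing a 16-bit array, whatever the storage.
    char16_t charAt(std::size_t i) const {
        assert(i < length());
        if (isNarrow())
            return static_cast<unsigned char>(std::get<std::string>(data_)[i]);
        return std::get<std::u16string>(data_)[i];
    }

    // Payload bytes actually used by the chosen representation.
    std::size_t storageBytes() const {
        return isNarrow() ? length() : length() * sizeof(char16_t);
    }

private:
    std::variant<std::string, std::u16string> data_;
};
```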
Interesting: for all strings? Is there a pointer to the code / docs
for that detail somewhere? :-) Last I looked, Java also stored partial
strings chained to their parent, so 'substring' takes a reference on the
parent (be it ever so large) and can return a single-character string
out of it without re-allocation. IIRC that can cause huge grief when
parsing big files into little ones ;-)
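The substring behaviour described above can be sketched with a reference-counted shared buffer plus an offset and length. `SharedSlice` is a hypothetical illustration, not Java's or LibreOffice's implementation. `substring` is O(1) and allocation-free, but `retainedUnits()` shows the grief. A one-character slice of a large parsed file keeps the whole file buffer alive until the slice is explicitly copied out.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical SharedSlice: substrings reference the parent's buffer
// (offset + length) instead of copying it.
class SharedSlice {
public:
    explicit SharedSlice(std::u16string s)
        : buf_(std::make_shared<std::u16string>(std::move(s))),
          offset_(0), length_(buf_->size()) {}

    // O(1), no allocation: the result shares buf_ with *this.
    SharedSlice substring(std::size_t begin, std::size_t end) const {
        assert(begin <= end && end <= length_);
        return SharedSlice(buf_, offset_ + begin, end - begin);
    }

    std::size_t length() const { return length_; }

    char16_t charAt(std::size_t i) const {
        assert(i < length_);
        return (*buf_)[offset_ + i];
    }

    // The catch: even a tiny slice keeps the whole parent buffer alive.
    std::size_t retainedUnits() const { return buf_->size(); }

    // Copying the characters out breaks the link so the parent can be freed.
    SharedSlice detached() const {
        return SharedSlice(buf_->substr(offset_, length_));
    }

private:
    SharedSlice(std::shared_ptr<const std::u16string> buf,
                std::size_t off, std::size_t len)
        : buf_(std::move(buf)), offset_(off), length_(len) {}

    std::shared_ptr<const std::u16string> buf_;
    std::size_t offset_;
    std::size_t length_;
};
```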
> Even in an app running in a language other than US-English, strings are
> used for so many internal things that >90% of the strings are 7-bit ASCII.
Sure - so define the define, see what it prints, and do the quick
calculation of how much time/space we save by doing it :-)
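The "quick calculation" could look something like this. It assumes a sample of the program's strings has already been collected. The collection step and the `estimateSavings` name are hypothetical. It counts only character payload and ignores per-string headers, reference counts and allocator rounding, so it is a rough estimate of the payload saving only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct SavingsEstimate {
    std::size_t strings = 0;
    std::size_t asciiStrings = 0;
    std::size_t utf16Bytes = 0;    // payload at 2 bytes per code unit
    std::size_t compactBytes = 0;  // payload if ASCII strings used 1 byte per char
};

// Hypothetical helper: compare UTF-16 payload with an "ASCII flag" layout.
SavingsEstimate estimateSavings(const std::vector<std::u16string>& sample) {
    SavingsEstimate e;
    for (const auto& s : sample) {
        bool ascii = true;
        for (char16_t c : s)
            if (c > 0x7F) { ascii = false; break; }
        ++e.strings;
        e.utf16Bytes += s.size() * 2;
        if (ascii) {
            ++e.asciiStrings;
            e.compactBytes += s.size();
        } else {
            e.compactBytes += s.size() * 2;  // non-ASCII stays 16-bit
        }
    }
    assert(e.compactBytes <= e.utf16Bytes);
    return e;
}
```

For example, a sample of `/`, `hello` and `café` gives 20 bytes as UTF-16 versus 14 bytes with the two ASCII strings narrowed.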
Then again - last I looked we still had some real dumbness that needed
hunting down relating to many (tens of?) thousands of allocations and
frees of the "/" string at startup ;-)
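One common way to cut repeated construction of the same constant string is to build it once and hand out a reference. The sketch below is hypothetical. The function names and the construction counter exist only for illustration, and it uses `std::u16string` rather than LibreOffice's string type. For a one-character string, `std::u16string` may avoid heap allocation through the small-string optimisation. In a heap-allocated, reference-counted string type, each construction would typically cost an allocation and a free.

```cpp
#include <cassert>
#include <string>

// Hypothetical counter so the example can show how many strings get built.
static int g_constructions = 0;

std::u16string makeSlash() {
    ++g_constructions;
    return u"/";
}

// Before: every call constructs (and the caller later destroys) a new "/".
std::u16string slashEachTime() { return makeSlash(); }

// After: constructed once on first use (thread-safe since C++11),
// then shared by const reference.
const std::u16string& slashShared() {
    static const std::u16string slash = makeSlash();
    return slash;
}
```

After 1000 calls, `slashEachTime` has constructed 1000 strings while `slashShared` has constructed only one.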
ATB,
Michael.
--
michael.meeks@suse.com <><, Pseudo Engineer, itinerant idiot