On Mon, Oct 01, 2012 at 01:58:24PM +0200, Michael Stahl wrote:
On 01/10/12 13:25, Michael Meeks wrote:
On Mon, 2012-10-01 at 13:02 +0200, Noel Grandin wrote:
That was something I was thinking about the other day - given that
the bulk of our strings are pure 7-bit ASCII, it might be a
worthwhile optimisation to store a bit that says "this string is
7-bit ASCII", and then store the string as a sequence of bytes.
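
A rough sketch of the "flag plus bytes" idea, in Python rather than C++ for brevity. The class name CompactString and its methods are hypothetical, and the UTF-16 fallback for non-ASCII text is an assumption made for illustration; nothing here is actual OUString code.

```python
class CompactString:
    """Hypothetical illustration: one flag plus a byte buffer."""

    def __init__(self, text):
        try:
            # Pure 7-bit ASCII: one byte per character.
            self.data = text.encode("ascii")
            self.is_ascii = True
        except UnicodeEncodeError:
            # Assumed fallback: two-byte UTF-16 code units.
            self.data = text.encode("utf-16-le")
            self.is_ascii = False

    def __str__(self):
        # Decode according to the flag set at construction time.
        return self.data.decode("ascii" if self.is_ascii else "utf-16-le")

    def storage_bytes(self):
        # Size of the payload only, ignoring the flag and object overhead.
        return len(self.data)


if __name__ == "__main__":
    for s in ("hello", "h\u00e9llo"):
        c = CompactString(s)
        print(repr(str(c)), c.is_ascii, c.storage_bytes())
```

In this sketch the ASCII case needs half the payload of the UTF-16 case, which is the saving being suggested.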
Optimisation ? :-) IMHO the ideal is to store all strings as UTF-8
underneath the hatches anyway.
The only problem with a change there is our ABI - which explicitly
exposes the encoding of that.
of course this would only affect C++ binding (and possibly Python -- am
not up to date how that does Unicode; there are differences between 2
and 3 iirc; of course we should migrate to Python 3 as well...)
How the Python 2 and Python 3.2 C ABIs deal with strings is ... a
compile-time option! It can be UCS2 or UCS4. The actual type
(Py_UNICODE) can be a typedef for wchar_t, unsigned short or unsigned
long.
http://docs.python.org/c-api/unicode.html
Python 3.3 and later, on the other hand, switch between ASCII, UCS1,
UCS2 and UCS4 on the fly, depending on the contents of each particular
string.
http://docs.python.org/py3k/c-api/unicode.html
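
The per-string switching can be observed from Python itself. The following is a minimal sketch that assumes CPython 3.3 or later; the helper name per_char_bytes is made up for this example, and sys.getsizeof reports implementation-specific sizes, so other interpreters may behave differently.

```python
import sys


def per_char_bytes(ch):
    """Bytes that one more copy of ch adds to a string made only of ch."""
    # Comparing two lengths cancels out the fixed object header.
    return sys.getsizeof(ch * 101) - sys.getsizeof(ch * 100)


if __name__ == "__main__":
    samples = [
        ("ASCII", "a"),
        ("Latin-1 (UCS1)", "\xe9"),
        ("BMP (UCS2)", "\u20ac"),
        ("non-BMP (UCS4)", "\U0001F600"),
    ]
    for label, ch in samples:
        print(label, per_char_bytes(ch))
```

On such an interpreter the storage per character grows from 1 to 2 to 4 bytes as the widest code point in the string grows.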
--
Lionel