On Wednesday 22 of February 2012, Stephan Bergmann wrote:
> On 02/22/2012 11:25 AM, Michael Meeks wrote:
> > Great! :-) Incidentally, I had one minor point around the ASCII vs.
> > UTF-8 side; the rtl_string2UString (cf. sal/rtl/source/string.cxx) does
> > a typically slower UTF-8 length counting loop; I suggest that we could
> > do better performance-wise (and we do create a biggish scad of these
> > strings) by sticking with ASCII, and doing a single, simple copy/expand
> > of the string. Perhaps in a new rtl_uString_newFromAsciiL method.
Actually rtl_string2UString() is reasonably optimized for the case when the
data is ASCII or UTF-8-that-in-fact-is-ASCII, so the one loop analysing the
contents is the only overhead. Makes me wonder if avoiding that one loop is
really worth it. I'll go with 'no' for the time being, until somebody shows
me otherwise.
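To make the trade-off concrete, here is a minimal sketch of what a "one analysing loop, then copy/expand" fast path looks like. It is not the code from sal/rtl/source/string.cxx; the function name asciiToUtf16 and the use of std::u16string are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not the sal implementation.
// Returns true and fills 'out' when the input is pure 7-bit ASCII.
// Returns false (leaving 'out' untouched) otherwise, in which case a
// caller would fall back to a full UTF-8 decoder.
bool asciiToUtf16(const char* s, std::size_t len, std::u16string& out)
{
    assert(s != nullptr || len == 0);
    // The one analysing loop: stop at the first byte with the high bit set.
    for (std::size_t i = 0; i < len; ++i)
        if (static_cast<unsigned char>(s[i]) & 0x80)
            return false;
    // Pure ASCII: every byte maps to exactly one UTF-16 code unit.
    out.clear();
    out.reserve(len);
    for (std::size_t i = 0; i < len; ++i)
        out.push_back(static_cast<char16_t>(s[i]));
    return true;
}
```

Compared with an ASCII-only copy, the only extra cost in this sketch is the first loop.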
> Thinking about it again, the restriction to ASCII could become a
> hindrance in the longer run. C++11 has provision for UTF-8 string
> literals (u8"..."), but they still have type char const[], so are not
> distinguishable from traditional plain "..." literals via function
> overloading. So, if we ever wanted to extend the new facilities to also
> support UTF-8 string literals, but would want to keep the performance
> benefit for the ASCII-only case, we could not offer the same simple syntax
>     rtl::OUString("foo");
>     rtl::OUString(u8"I\u2764C++");
> for both.
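The overloading point can be checked with a small sketch. The function whichOverload is a hypothetical name used only for illustration. The u8 checks are guarded because compilers that implement char8_t (C++20) give u8 literals a different type, so the statement in the text applies to C++11 through C++17.

```cpp
#include <type_traits>

// Hypothetical overload used only for illustration.
// Returns 1 when the "array of const char" overload is chosen.
template <int N>
constexpr int whichOverload(const char (&)[N]) { return 1; }

static_assert(std::is_same<decltype("foo"), const char (&)[4]>::value,
              "a plain literal is an array of const char");

#if !defined(__cpp_char8_t)
// Without char8_t, a u8 literal has the same array-of-char type, so it
// selects the very same overload as a plain literal.
static_assert(std::is_same<decltype(u8"foo"), decltype("foo")>::value,
              "u8 literal has the same type as a plain literal");
static_assert(whichOverload(u8"foo") == whichOverload("foo"),
              "both literals pick the same overload");
#endif
```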
We could have OUString::fromUtf8( utf8literal ), which I consider acceptable,
especially given that IMO we are unlikely to have a larger number of utf8
literals anyway. But I think it's better to go for utf8 always and optimize
if we find out it's worth it.
I thought there could be a way to test string literal contents at
compile-time, but string literals are not considered to be compile-time
constants just because the standard says so, so templates can't take them as
arguments, and while I've eventually found a way to do it, based on
http://www.macieira.org/blog/2011/07/initialising-an-array-with-cx0x-using-constexpr-and-variadic-templates/
(see attachment), it turns out to be unusable in practice. Maybe later.
--
Lubos Lunak
l.lunak@suse.cz
// With gcc-4.5.1 this is awfully slow to compile.
// Also, for longer strings the computation is no longer done at compile
// time and instead code for handling it at runtime is generated.
#include <stdio.h>

// End of the recursion: the sum of no values is 0.
constexpr inline
int sum()
{
    return 0;
}

// Sum of all arguments, evaluated recursively.
template< typename... T >
constexpr inline
int sum( int v1, T... v2 )
{
    return v1 + sum( v2... );
}

// Contribution of a single byte to the character count.
// Returning the sequence length of a lead byte would be the other way
// around: it would require skipping the ret-1 following bytes, which a
// plain sum over all bytes cannot do. Counting 1 for every byte that is
// not a continuation byte (10xxxxxx) needs no ordering and, for
// well-formed UTF-8, gives the number of characters.
constexpr inline
int utf8LengthChar( unsigned char c )
{
    return ( c & 0xc0 ) == 0x80 ? 0 : 1;
}

// Compile-time list of indexes.
template< int... >
struct IndexList
{
};

// Appends Right to an IndexList.
template< typename IndexList, int Right >
struct Merge;

template< int... Left, int Right >
struct Merge< IndexList< Left... >, Right >
{
    typedef IndexList< Left..., Right > Range;
};

// Indexes< N >::Range is IndexList< 0, 1, ..., N - 1 >.
template< int N >
struct Indexes
{
    typedef typename Merge< typename Indexes< N - 1 >::Range, N - 1 >::Range Range;
};

template<>
struct Indexes< 0 >
{
    typedef IndexList<> Range;
};

template< int N, typename T >
struct Utf8LengthHelper;

// Expands the index list into one utf8LengthChar() call per byte.
template< int N, int... i >
struct Utf8LengthHelper< N, IndexList< i... > >
{
    constexpr inline Utf8LengthHelper( const char s[ N ] )
        : value( sum( utf8LengthChar( s[ i ] )... ))
    {
    }
    const int value;
};

// N is the number of bytes to examine, not counting the terminating NUL.
template< int N >
constexpr inline int utf8Length( const char s[ N ] )
{
    return Utf8LengthHelper< N, typename Indexes< N >::Range >( s ).value;
}

// The array size N includes the terminating NUL, hence N - 1 below.
template< int N >
inline
void foo( const char (&s)[ N ] )
{
    fprintf( stderr, "%s %d\n", s, utf8Length< N - 1 >( s ));
}

int main()
{
    foo( "testé" );
}
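
As the comment on utf8LengthChar() in the attachment points out, using a lead byte's sequence length means the bytes that follow it have to be skipped. A minimal runtime sketch of that skipping loop is below. It could serve as a cross-check for a compile-time result. The names sequenceLength and countUtf8Chars are hypothetical, and the sketch assumes sequences of at most 4 bytes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: sequence length announced by a UTF-8 lead byte.
// Stray continuation bytes and invalid bytes count as length 1, so the
// loop below always makes progress.
inline std::size_t sequenceLength(unsigned char c)
{
    return (c & 0x80) == 0x00 ? 1
         : (c & 0xe0) == 0xc0 ? 2
         : (c & 0xf0) == 0xe0 ? 3
         : (c & 0xf8) == 0xf0 ? 4
         : 1;
}

// Counts characters by jumping from lead byte to lead byte.
inline std::size_t countUtf8Chars(const char* s, std::size_t len)
{
    assert(s != nullptr || len == 0);
    std::size_t count = 0;
    for (std::size_t i = 0; i < len; ) {
        // Advance past the lead byte and its continuation bytes.
        i += sequenceLength(static_cast<unsigned char>(s[i]));
        ++count;
    }
    return count;
}
```

Because the loop consumes whole sequences in order, the ordering problem of a parameter-pack expansion does not arise here.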