On Thursday 09 of February 2012, Lubos Lunak wrote:
On Thursday 09 of February 2012, Michael Meeks wrote:
Personally, I'd -love- someone to rename ~all of these
RTL_USTR() or RTL_STR() or somesuch - we have enough pointlessly hard to
read and indent coding around the place. Hey - we could even have an:
RTL_USTRING("foo")
that hid all the:
rtl::OUString(RTL_CONSTASCII_USTRINGPARAM("foo"))
madness from sight ;-)
It'd be probably almost the same amount of work to get rid of the macro
completely and add a special ctor to handle string literals (can you
arrange a HackWeek for me soon ;) ? ).
Actually, no need for a week, this one is reasonably simple. Therefore I
challenge RTL_CONSTASCII_USTRINGPARAM to a fight to the death. Its death,
that is.
Attached is a patch that removes the need for RTL_CONSTASCII_USTRINGPARAM
(henceafter called 'macro', because it annoys me to write this even in its
death warrant) when using the OUString ctor. Assuming this code compiles also
with MSVC[*], the result should be at least as good as when using the macro,
except that it is not needed to use the macro.
Technically, the trick is to use "const char (&)[ N ]' , which ensures that
the parameter passed really will be a const char array, which either will be
a string literal, or a real const char array, which in turn is unlikely to be
anything else than a string literal. If OUString had also "const char*" ctor,
then compiler would actually prefer that one to instantiating the template,
but luckily that's not the case.
Note that the patch only removes the need of the macro for ctor calls, but
there can be further changes that allow string literals for other methods.
That would allow deprecating the C-style "overloads" fooAsciiL() and using
proper overloads.
There are few questions about this:
- Technical flaws: As can be seen in the unittest, the only problem is
having "const char [][ N ]" array, where the length for all literals will be
considered to be N-1 and therefore possibly wrong. This is however not used,
and SAL_N_ELEMENTS() (and thus the macro) has the same flaw, so I don't
consider this to be a problem in practice.
- OString is not included in the patch, as that one does have "const char*"
ctor. Since it is inline, it actually does not matter for binary
compatibility, so I can change this, but I didn't want to do this now before
I get some feedback. It should be later changed as well.
- The String class in tools is obsolete AFAIK, and so I do not see the need to
bother with that one.
- I added an explicit check that when converting from an 8-bit string there is
no \0 inside the string (i.e. that the length really matches), but the check
actually triggers a couple of times. One case is editeng's ImpEditEngine
ctor, where I'd expect the problem is real and somebody thought "\0xff"
was "\xff", but I don't know for sure. Other case is sal's
test_oustring_endswith.cxx , which intentionally creates OStrings with \0's
in them and even operates on them, which suggests that OString in fact does
not stop at \0's when handling strings. I'm kind of baffled by this, I
personally consider it a flaw that a string class does this, but the unittest
is very explicit about this. Does somebody know?
- The biggest question probably : I have not made the ctor explicit. That
means that the string literal to OUString conversion is automatic.
I however do not see anything wrong with automatic string literal to OUString
conversions. UI strings, AFAICS, are not included in .cxx files themselves,
but come from elsewhere, so sources contain only technical strings. Therefore
it is a very fair assumption that all string literals are ASCII, since
anything outside of ASCII would presumably need translation anyway, and the
few legal cases (math symbols not in ASCII maybe) can simply be explicit
about it (there will be a runtime warning if this is not the case, and since
it is a string literal, it should be noticed by the developer). Since
non-const data will not trigger this automatic conversion, the room for any
problems caused by the automatic conversion should be minimal and far
outweighted by the convenience. I mean, this way it can be possible to just
do things like
foo( "test", 10, "bar" ) or str = "<tag>"
instead of
foo( OUString( "test" ), 10, OUString( "bar" )) or str = OUString( "<tag>" )
Moreover in fact the explicit OUString doesn't really buy us anything, as
nobody says foo's arguments or 'str' really are OUString, so this is about as
good as Hungarian notation.
So while I can be talked into adding the explicit, I can't find any good
reason for doing that.
Comments?
[*] Could somebody please try if the patch compiles with MSVC? Just building
sal/ including the unittest should be enough.
--
Lubos Lunak
l.lunak@suse.cz
diff --git a/sal/CppunitTest_sal_rtl_strings.mk b/sal/CppunitTest_sal_rtl_strings.mk
index fe6da66..c1db17d 100644
--- a/sal/CppunitTest_sal_rtl_strings.mk
+++ b/sal/CppunitTest_sal_rtl_strings.mk
@@ -32,6 +32,7 @@ $(eval $(call gb_CppunitTest_add_exception_objects,sal_rtl_strings,\
sal/qa/rtl/strings/test_oustring_convert \
sal/qa/rtl/strings/test_oustring_endswith \
sal/qa/rtl/strings/test_oustring_noadditional \
+ sal/qa/rtl/strings/test_oustring_stringliterals \
))
$(eval $(call gb_CppunitTest_add_linked_libs,sal_rtl_strings,\
diff --git a/sal/inc/rtl/ustring.hxx b/sal/inc/rtl/ustring.hxx
index ace4b1a..6c35022 100644
--- a/sal/inc/rtl/ustring.hxx
+++ b/sal/inc/rtl/ustring.hxx
@@ -168,6 +168,65 @@ public:
}
/**
+ New string from an 8-Bit string literal that contains only characters
+ in the ASCII character set. All string literals in the codebase are
+ assumed to be only ASCII, so this constructor allows an efficient
+ and convenient way to create OUString instances from literals.
+
+ @param value the 8-bit string literal
+
+ @exception std::bad_alloc is thrown if an out-of-memory condition occurs
+ */
+ template< int N >
+ OUString( const char (&literal)[ N ] )
+ {
+ pData = 0;
+ rtl_string2UString( &pData, literal, N - 1, RTL_TEXTENCODING_ASCII_US, 0 );
+ if (pData == 0) {
+#if defined EXCEPTIONS_OFF
+ SAL_WARN("sal", "std::bad_alloc but EXCEPTIONS_OFF");
+#else
+ throw std::bad_alloc();
+#endif
+ }
+ }
+
+ /**
+ * This overload exists only to avoid creating instances directly from (non-const) char[],
+ * which would otherwise be picked up by the optimized const char[] constructor.
+ * Since the non-const array cannot be guaranteed to contain only ASCII characters,
+ * this needs to be prevented.
+ *
+ * It is an error to try to call this overload.
+ *
+ * @internal
+ */
+ template< int N >
+ OUString( char (&value)[ N ] )
+ {
+#ifndef RTL_STRING_UNITTEST
+ error_non_const_char_to_OUString_conversion_is_not_automatic( value );
+#else
+ (void) value; // unused
+ pData = 0; // for the unittest create an empty string
+ rtl_uString_new( &pData );
+#endif
+ }
+
+#ifdef RTL_STRING_UNITTEST
+ /**
+ * Only used by unittests to detect incorrect conversions.
+ * @internal
+ */
+ template< typename T >
+ OUString( T )
+ {
+ pData = 0;
+ rtl_uString_new( &pData );
+ }
+#endif
+
+ /**
New string from a 8-Bit character buffer array.
@param value a 8-Bit character array.
@@ -1492,6 +1551,9 @@ public:
values are not converted in any way. The caller has to make sure that
all ASCII characters are in the allowed range between 0 and
127. The ASCII string must be NULL-terminated.
+
+ Note that for string literals it is simpler and more efficient
+ to directly use the OUString constructor.
@param value the 8-Bit ASCII character string
@return a string with the string representation of the argument.
diff --git a/sal/inc/sal/log-areas.dox b/sal/inc/sal/log-areas.dox
index 1af17a3..9b238d7 100644
--- a/sal/inc/sal/log-areas.dox
+++ b/sal/inc/sal/log-areas.dox
@@ -21,6 +21,10 @@ certain functionality.
@li oox.xmlstream - XmlStream class
+@section SAL/RTL
+
+@li rtl.string - rtl::OString, rtl::OUString and related functionality
+
@section VCL
@li vcl.gdi - the GDI part of VCL: devices, bitmaps, etc.
diff --git a/sal/qa/rtl/strings/test_oustring_stringliterals.cxx
b/sal/qa/rtl/strings/test_oustring_stringliterals.cxx
new file mode 100644
index 0000000..142f53d
--- /dev/null
+++ b/sal/qa/rtl/strings/test_oustring_stringliterals.cxx
@@ -0,0 +1,91 @@
+/* -*- Mode: C++; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */
+/*************************************************************************
+ *
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * Copyright 2000, 2010 Oracle and/or its affiliates.
+ *
+ * OpenOffice.org - a multi-platform office productivity suite
+ *
+ * This file is part of OpenOffice.org.
+ *
+ * OpenOffice.org is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License version 3
+ * only, as published by the Free Software Foundation.
+ *
+ * OpenOffice.org is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser General Public License version 3 for more details
+ * (a copy is included in the LICENSE file that accompanied this code).
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * version 3 along with OpenOffice.org. If not, see
+ * <http://www.openoffice.org/license.html>
+ * for a copy of the LGPLv3 License.
+ *
+ ************************************************************************/
+
+// activate the extra needed ctor
+#define RTL_STRING_UNITTEST
+
+#include "sal/config.h"
+#include "sal/precppunit.hxx"
+
+#include <cppunit/TestFixture.h>
+#include <cppunit/extensions/HelperMacros.h>
+#include "rtl/string.h"
+#include "rtl/ustring.hxx"
+
+namespace test { namespace oustring {
+
+class StringLiterals: public CppUnit::TestFixture
+{
+private:
+ void checkCtors();
+
+ void testcall( const char str[] );
+ // invalid conversions will trigger templated OUString ctor that creates an empty string
+ // (see RTL_STRING_UNITTEST)
+ bool validConversion( const rtl::OUString& str ) { return !str.isEmpty(); }
+
+CPPUNIT_TEST_SUITE(StringLiterals);
+CPPUNIT_TEST(checkCtors);
+CPPUNIT_TEST_SUITE_END();
+};
+
+void test::oustring::StringLiterals::checkCtors()
+{
+ CPPUNIT_ASSERT( validConversion( rtl::OUString( "test" )));
+ const char good1[] = "test";
+ CPPUNIT_ASSERT( validConversion( rtl::OUString( good1 )));
+
+ CPPUNIT_ASSERT( !validConversion( rtl::OUString( (const char*) "test" )));
+ const char* bad1 = good1;
+ CPPUNIT_ASSERT( !validConversion( rtl::OUString( bad1 )));
+ char bad2[] = "test";
+ CPPUNIT_ASSERT( !validConversion( rtl::OUString( bad2 )));
+ char* bad3 = bad2;
+ CPPUNIT_ASSERT( !validConversion( rtl::OUString( bad3 )));
+ const char* bad4[] = { "test1" };
+ CPPUNIT_ASSERT( !validConversion( rtl::OUString( bad4[ 0 ] )));
+ testcall( good1 );
+
+// This one is technically broken, since the first element is 6 characters test\0\0,
+// but there does not appear a way to detect this by compile time (runtime will complain).
+// RTL_CONSTASCII_USTRINGPARAM() has the same flaw.
+ const char bad5[][ 6 ] = { "test", "test2" };
+// CPPUNIT_ASSERT( validConversion( rtl::OUString( bad5[ 0 ] )));
+ CPPUNIT_ASSERT( validConversion( rtl::OUString( bad5[ 1 ] )));
+}
+
+void test::oustring::StringLiterals::testcall( const char str[] )
+{
+ CPPUNIT_ASSERT( !validConversion( rtl::OUString( str )));
+}
+
+}} // namespace
+
+CPPUNIT_TEST_SUITE_REGISTRATION(test::oustring::StringLiterals);
+
+/* vim:set shiftwidth=4 softtabstop=4 expandtab: */
diff --git a/sal/rtl/source/ustring.cxx b/sal/rtl/source/ustring.cxx
index a37353c..0314e36 100644
--- a/sal/rtl/source/ustring.cxx
+++ b/sal/rtl/source/ustring.cxx
@@ -39,6 +39,7 @@
#include <string.h>
#include <sal/alloca.h>
+#include <sal/log.hxx>
#include "hash.hxx"
#include "strimp.hxx"
@@ -594,8 +595,11 @@ static void rtl_string2UString_status( rtl_uString** ppThis,
do
{
/* Check ASCII range */
- OSL_ENSURE( ((unsigned char)*pStr) <= 127,
+ SAL_WARN_IF( ((unsigned char)*pStr) > 127, "rtl.string",
"rtl_string2UString_status() - Found char > 127 and
RTL_TEXTENCODING_ASCII_US is specified" );
+ /* Check given length (no \0 in the string) */
+ SAL_WARN_IF( ((unsigned char)*pStr) == 0, "rtl.string",
+ "rtl_string2UString_status() - Found char == \'\\0\' and
RTL_TEXTENCODING_ASCII_US is specified" );
*pBuffer = *pStr;
pBuffer++;
Context
- Obsoleting RTL_CONSTASCII_USTRINGPARAM · Lubos Lunak
Privacy Policy |
Impressum (Legal Info) |
Copyright information: Unless otherwise specified, all text and images
on this website are licensed under the
Creative Commons Attribution-Share Alike 3.0 License.
This does not include the source code of LibreOffice, which is
licensed under the Mozilla Public License (
MPLv2).
"LibreOffice" and "The Document Foundation" are
registered trademarks of their corresponding registered owners or are
in actual use as trademarks in one or more countries. Their respective
logos and icons are also subject to international copyright laws. Use
thereof is explained in our
trademark policy.