

On 10/01/2012 06:05 PM, Dennis E. Hamilton wrote:
The mention that the latest Java VM uses UTF-8 internally instead of unsigned short
arrays is rather daunting.  There is an easy way to test it: see whether char values that are not
admissible UTF-16 codes can be used to construct a string and then be extracted correctly.  If
they can, no transformation to and from UTF-8 can have occurred.  If they can't, it is an
interesting breaking change in Java.  With regard to string literals, it would also be interesting
to see what can be introduced into those via escape codes.
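
The test proposed above could be sketched roughly as follows. This is only an illustration: the class and method names (LoneSurrogateCheck, roundTrips) are made up for this example, and the unpaired surrogates 0xD800 and 0xDC00 are used as the "not admissible UTF-16" values. A passing round trip shows only that the char values survive; on its own it does not reveal how the VM stores them.

```java
import java.util.Arrays;

class LoneSurrogateCheck {
    // Builds a String from the given chars and reports whether the very same
    // char values come back out of it.
    static boolean roundTrips(char[] input) {
        String s = new String(input);
        return Arrays.equals(input, s.toCharArray());
    }

    public static void main(String[] args) {
        // A high surrogate with no low surrogate after it, a low surrogate
        // with no high surrogate before it, and a trailing high surrogate.
        char[] lone = { 'a', (char) 0xD800, 'b', (char) 0xDC00, (char) 0xD800 };
        System.out.println("round trip preserved: " + roundTrips(lone));
    }
}
```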

Note that the JVM traditionally also makes use of a modified form of UTF-8 (encoding surrogate code points individually, and encoding \u0000 as 0xC0 0x80); see the JNI spec.
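
One way to look at that modified encoding from plain Java, as a small sketch: java.io.DataOutputStream.writeUTF is documented to write modified UTF-8 (preceded by a two-byte length), so its output can be compared with standard UTF-8. The class and helper names (ModifiedUtf8Demo, modifiedUtf8, hex) are made up for this example, and it says nothing about what a given VM does with strings internally.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ModifiedUtf8Demo {
    // Returns the modified UTF-8 bytes that writeUTF produces for s,
    // with the two-byte length prefix stripped off.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Formats bytes as space-separated hex for printing.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String nul = String.valueOf((char) 0);
        // A supplementary character, stored as a surrogate pair in the String.
        String supplementary = new String(Character.toChars(0x1F600));

        System.out.println("NUL modified: " + hex(modifiedUtf8(nul)));
        System.out.println("NUL standard: " + hex(nul.getBytes(StandardCharsets.UTF_8)));
        System.out.println("pair modified: " + hex(modifiedUtf8(supplementary)));
        System.out.println("pair standard: " + hex(supplementary.getBytes(StandardCharsets.UTF_8)));
    }
}
```

In the modified form the NUL char takes two bytes and the supplementary character takes six (three per surrogate), where standard UTF-8 uses one and four bytes respectively.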

Stephan
