Hi Stephan,

Thanks a lot for your reply.

On Mon, 23 Jan 2017 10:26:09 +0100, Stephan Bergmann <sbergman@redhat.com> wrote:
> On 01/20/2017 03:25 AM, Takeshi Abe wrote:
>> Preparing a patch for tdf#105382 [1], I come across a question about
>> character encoding for the path part of a URL representing a
>> com.sun.star.frame.XStorable's location.
>> I wonder if the original (before percent-encoded) path of such a URL can
>> be in an encoding other than UTF-8 or even in a different charset due
>> to e.g. a code page of some legacy filesystems.
>> Is it possible?
>> And, if so, is there any reasonable way to tell the encoding?
>
> A conforming URL itself, by definition, is written with a subset of ASCII-only
> characters.
>
> For file URLs, there never was a definition how to interpret the octets encoded
> in the URL's path component, so OOo/LO came up with the convention of always
> interpreting those as UTF-8.  (So any code that converts between file URLs and
> native pathnames needs to do that mapping between UTF-8 and the relevant native
> pathname encoding, which LO assumes to be as reported by
> osl_getThreadTextEncoding.)

Got it. It is now clear what should be done for tdf#105382.
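
To make the convention concrete, here is a minimal Python sketch of the mapping described above. It is only an illustration, not LibreOffice code: the function names are hypothetical, the native_encoding parameter stands in for whatever osl_getThreadTextEncoding would report, and only POSIX-style absolute paths without a host part are handled.

```python
from urllib.parse import quote, unquote_to_bytes, urlsplit


def file_url_to_native_bytes(url: str, native_encoding: str) -> bytes:
    """Map a file URL to a native pathname (hypothetical helper).

    Assumption: the percent-encoded octets of the path are UTF-8,
    following the OOo/LO convention quoted above.
    """
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError("not a file URL: " + url)
    utf8_octets = unquote_to_bytes(parts.path)  # undo percent-encoding
    text = utf8_octets.decode("utf-8")          # interpret octets as UTF-8
    return text.encode(native_encoding)         # re-encode for the filesystem


def native_bytes_to_file_url(path: bytes, native_encoding: str) -> str:
    """Inverse mapping: native pathname octets to a file URL."""
    text = path.decode(native_encoding)
    # Percent-encode the UTF-8 octets, keeping "/" as the path separator.
    return "file://" + quote(text.encode("utf-8"), safe="/")


if __name__ == "__main__":
    url = "file:///tmp/%E3%81%82.odt"
    native = file_url_to_native_bytes(url, "shift_jis")
    print(native)
    print(native_bytes_to_file_url(native, "shift_jis"))
```

The point of the sketch is that the URL itself never carries the legacy code page: the native encoding only appears at the boundary where the URL is turned into a pathname or back.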

IIUC the basic strategy for encoding a file URL for UNO is the same as the one
the current standard [1] describes in Section 2.5, "Identifying Data":
   (...) A system that internally provides identifiers in the form of a
   different character encoding, such as EBCDIC, will generally perform
   character translation of textual identifiers to UTF-8 [STD63] (or
   some other superset of the US-ASCII character encoding) at an
   internal interface, thereby providing more meaningful identifiers
   than those resulting from simply percent-encoding the original
   octets.
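
The difference between the two approaches named in that passage can be seen in a small Python sketch. It assumes Python's cp037 codec as a stand-in for "EBCDIC", and the two helper names are hypothetical.

```python
from urllib.parse import quote


def percent_encode_original_octets(octets: bytes) -> str:
    """Percent-encode the octets exactly as the system stores them."""
    return quote(octets, safe="")


def percent_encode_translated(octets: bytes, source_encoding: str) -> str:
    """Translate the identifier to UTF-8 first, then percent-encode."""
    text = octets.decode(source_encoding)
    return quote(text.encode("utf-8"), safe="")


if __name__ == "__main__":
    stored = "abc".encode("cp037")  # identifier as held by an EBCDIC system
    print(percent_encode_original_octets(stored))      # %81%82%83
    print(percent_encode_translated(stored, "cp037"))  # abc
```

Translating first yields an identifier that a reader on any ASCII-compatible system can recognize, which is the "more meaningful" outcome the RFC text refers to.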

[1] https://tools.ietf.org/html/rfc3986

Cheers,
-- Takeshi Abe
