Re: Your presentation on LibreOffice code

Arvind Kumar <arvind.kumar -AT- rocketmail.com>
Sat, 21 Mar 2020 17:02:52 +0000 (UTC)

On Fri, 20 Mar 2020 Jan-Marek Glogowski wrote:

Hmm - I know fcitx uses some kind of tables for the direct mappings. My
Debian has fcitx-table-emoji. Guess that would be the easiest starting
point, if your language's typed letters don't depend on already existing
previous or next letters and just need a key to code point mapping.

There are two separate issues here: keyboard input and display of the glyph. Leaving the input
mechanism aside for the moment, and assuming that I have done what you suggest, I'd like to
understand the code that deals with display in LO. Even if some external method did the input
mappings and the resulting keycode came into LO, the problem remains that although everything works
fine with copy-paste, it does not with keyboard input.

In the case of keyboard input, keycodes with a value above 65535 are truncated to a 16-bit short as
they pass through the various layers of functions that handle them. The PUA code points I use all
have values greater than 65535.
As an example, the values of keyval and aOrigCode in the arguments of GtkSalFrame::doKeyCallback
are both 97 when you type the letter 'a' on the standard keyboard. Printing the individual elements
of the array pStr in CommonSalLayout::LayoutText also shows the value 97. Now change the 97 to a
PUA value in doKeyCallback (e.g. 1051531) and the corresponding value printed in LayoutText is the
truncated one: 2955 instead of 1051531. 2955 is 1051531 modulo 65536, which is what remains when an
integer holding 1051531 is written into a 16-bit short.
I also see that uInt16 is used in many places in the code.
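
The arithmetic above can be checked with a small standalone sketch. It is not LibreOffice code, and
the helper names (truncate_to_u16, to_surrogates, from_surrogates) are made up for illustration. It
shows the narrowing that yields 2955, and, assuming a layer that stores text as 16-bit UTF-16
units, how a code point above 65535 would have to be carried as a surrogate pair instead of a
single unit.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Narrowing a 32-bit code point to 16 bits keeps only the low 16 bits,
// i.e. the value modulo 65536.
std::uint16_t truncate_to_u16(std::uint32_t cp)
{
    return static_cast<std::uint16_t>(cp);
}

// Split a supplementary-plane code point (U+10000..U+10FFFF) into a
// UTF-16 surrogate pair {high, low}.
std::pair<std::uint16_t, std::uint16_t> to_surrogates(std::uint32_t cp)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    const std::uint32_t v = cp - 0x10000; // 20 significant bits remain
    return { static_cast<std::uint16_t>(0xD800 + (v >> 10)),
             static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF)) };
}

// Recombine a surrogate pair into the original code point.
std::uint32_t from_surrogates(std::uint16_t high, std::uint16_t low)
{
    assert(high >= 0xD800 && high <= 0xDBFF);
    assert(low >= 0xDC00 && low <= 0xDFFF);
    return 0x10000
         + ((static_cast<std::uint32_t>(high) - 0xD800) << 10)
         + (static_cast<std::uint32_t>(low) - 0xDC00);
}
```

With these helpers, truncate_to_u16(1051531) gives 2955, while to_surrogates(1051531) gives the
pair 0xDBC2, 0xDF8B, which from_surrogates turns back into 1051531.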

At this point, I just want to understand the flow. I'm not suggesting that LO make any change.
Where in the code do the key values get handled as they are typed in, and where in the code do they
get mapped to the value needed for displaying the glyph? I assume the value for display will be
encoded in UTF-8. I'd like to know where in the source code that happens as well.
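
For reference, here is a minimal sketch of what UTF-8 encoding of a single code point looks like.
It only illustrates the assumption stated above. encode_utf8 is a hypothetical helper, not a
function from LO or GLib, and it does not reject surrogate values.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode one code point (at most U+10FFFF) as a UTF-8 byte sequence.
// Assumption: the caller has already excluded surrogates (U+D800..U+DFFF).
std::string encode_utf8(std::uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {                 // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Under this sketch, 1051531 encodes to the four bytes F4 80 AE 8B, whereas the truncated value 2955
encodes to the three bytes E0 AE 8B, which is a different code point altogether.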

Yup. No LO changes needed, unless you find some bug.

I'm definitely not suggesting changes, but am trying to understand the code, as I explained above.
However, I would not rule out the possibility that the copy-paste part of the code works well
because it correctly reads the UTF-8 encoded values of the codepoints expected by the font file,
while with keyboard input these values become incorrect as they pass through the various layers of
the program. I just want to know what these layers are.

I'm not sure I understand you. Is this a Gtk-only problem, so qt5 or kf5
works? I'm not aware of any restriction regarding file names. Sure Gtk+
and Qt5 default to UTF-8 encoding, but that should just work. Or do they
reject PUA code points (which IMHO makes sense, because a filename has
no font)?


Not sure about other systems, but GNOME restricts to valid Unicode values. It does not reject PUA
but rejects 32-bit values encoded in UTF-8. I wrote my own UTF-8 encoding mechanism that would take
32-bit values, but some GNOME functions fail, which is why I mapped my coding system to PUAs. As far
as this discussion of LO's functionality is concerned, it is only related to PUA values.
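
The distinction drawn above can be sketched as two range checks. This is an illustration of the
Unicode ranges involved, on the assumption that "valid Unicode values" means scalar values up to
U+10FFFF. It is not GLib's implementation, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// True if cp is a Unicode scalar value: at most U+10FFFF and not a surrogate.
bool is_unicode_scalar(std::uint32_t cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// True if cp lies in one of the three private use ranges.
bool is_private_use(std::uint32_t cp)
{
    return (cp >= 0xE000 && cp <= 0xF8FF)       // Private Use Area (BMP)
        || (cp >= 0xF0000 && cp <= 0xFFFFD)     // Supplementary PUA-A (plane 15)
        || (cp >= 0x100000 && cp <= 0x10FFFD);  // Supplementary PUA-B (plane 16)
}

// Small usage example with the value from this thread.
void check_examples()
{
    assert(is_unicode_scalar(1051531));    // inside the Unicode range
    assert(is_private_use(1051531));       // and in Supplementary PUA-B
    assert(!is_unicode_scalar(0x110000));  // a 32-bit value beyond the range
}
```

A PUA value such as 1051531 passes both checks, while an arbitrary 32-bit value above 0x10FFFF
fails the first one, which matches the behaviour described above.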

From the filesystem POV it's all just bytes.


This is not related to LO, but this is where many GNOME libraries impose the restriction. They do
not follow the filesystem view of filenames being just bytes. If you try using a g_filesystem*
function and pass a filename containing a character which is not approved by the Unicode Consortium,
it will fail. GNOME is not agnostic to the various standards out there but follows the standards set
by some organizations. Of course, in those cases, I just use fopen or related calls.

-a
