As a newcomer I am trying out LibreOffice 220.127.116.11 and am quite
satisfied with most of its features, especially the experimental sidebar
panel.
However, several issues occurred when I was playing around with Unicode
characters in LO.
Environment: MS Win7 64bit Ultimate English with French and Chinese language
pack, non-Unicode code page: Chinese (CP936), LibreOffice 18.104.22.168, JRE
1.7.0_09 (both 32-bit and 64-bit).
THE "SPECIAL CHARACTERS" DIALOG
1. no unicode code point entry field in the dialog
Let's first talk about issues regarding the Special Characters dialog.
Unlike MS Word, it does not feature a Unicode code point entry field. This
causes some inconvenience if you know the code point of a certain glyph and
need to insert it regularly. The “Compose Character” extension only solves
part of the puzzle, which will be explained later in the issues with
non-zero plane glyphs.
2. wrong detection of unicode ranges
A more complicated problem with the Special Characters dialog is the
misidentification of supported code points in a font. LO doesn't seem to
handle this correctly at all. For a Unicode range only partially supported
by the font, it displays boxes, boxed question marks or blanks for the
unsupported glyphs; for a range the font does not support at all, it
displays glyphs from fallback fonts assigned by the OS. There is almost no
sign of suppressing these unsupported glyphs or ranges from display. Only
very few fonts are correctly identified (Cardo, Code2000 and Quivira, for
example).
To illustrate this, install FreeSerif (version 0412.2263) and bring up
the Special Characters dialog in Writer. Switch to this font and go to the
“Tibetan” subset. FreeSerif does not support Tibetan glyphs as of now, but
LO still assumes that it does, and therefore displays strings of boxed
question marks in this range.
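
The difference between full, partial and no coverage of a Unicode range can
be sketched as below. This is only an illustration of the behavior one might
expect from the dialog, not LO's actual logic. The `classify_block` helper
and the sample code point set are hypothetical; the Tibetan block bounds
(U+0F00 to U+0FFF) come from the Unicode standard.

```python
# Hypothetical sketch: classify how well a font covers a Unicode block.
# `supported` stands in for the set of code points in a font's character
# map; here it is made-up sample data, not read from a real font file.

TIBETAN = range(0x0F00, 0x1000)  # Tibetan block, U+0F00..U+0FFF


def classify_block(supported, block):
    """Return 'none', 'partial' or 'full' for the given block.

    'full' here means every position in the block, including unassigned
    ones; a real check would compare against assigned characters only.
    """
    hits = sum(1 for cp in block if cp in supported)
    if hits == 0:
        return "none"      # the subset could be hidden from the drop-down
    if hits == len(block):
        return "full"
    return "partial"       # only the covered cells should show glyphs


latin_only_font = set(range(0x0020, 0x0080))
print(classify_block(latin_only_font, TIBETAN))  # none
```

With a check of this kind, a font without Tibetan glyphs would yield "none"
for the Tibetan subset, and the subset would not need to be listed at all.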
GLYPHS IN NON-ZERO UNICODE PLANES
3. limited support for fonts with non-zero plane glyphs?
More serious issues are found in LO’s support for non-zero plane Unicode
glyphs. For many fonts, LO doesn't seem to support non-zero plane glyphs
even though the font itself does. Pasting the glyph from external
applications doesn't help either, regardless of the text format:
unformatted, RTF or HTML. The glyph is shown as a square, a question mark or
a blank.
For example, suppose you try to input in LO the Unicode character U+1F374
(fork and knife), which is supported by the font Segoe UI Symbol (version
5.01). It is not shown in the Special Characters dialog: the highest code
point available for this font is within the BMP (plane 0). You open MS Word
2010 and find that in Insert→Symbols this glyph is correctly displayed and
inserted when you switch to this font. You select this glyph, copy it, and
paste it into LO. The glyph becomes a square. Paste it into Windows Notepad
(ensuring that Segoe UI Symbol is the font used), and the glyph is again
displayed correctly. You copy the glyph again from Notepad; it is still
shown as a square in LO.
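
One possible factor, not verified against LO's source, is UTF-16 encoding:
code points above U+FFFF are stored as two 16-bit units (a surrogate pair),
and code that treats each unit as a separate character would see two
unrenderable halves. A small Python sketch of the arithmetic follows;
`to_surrogates` is a hypothetical helper name used only for illustration.

```python
def to_surrogates(cp):
    """Split a code point above U+FFFF into its UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


high, low = to_surrogates(0x1F374)    # U+1F374, fork and knife
print(f"U+{high:04X} U+{low:04X}")    # U+D83C U+DF74
```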
In fact, among all the fonts with non-zero plane glyphs installed on my PC
(namely: FreeSerif, FreeSans, FreeMono, Code2001, Code2002, WenQuan Yi Zen
Hei, Cardo, PMingLiU-ExtB, Quivira, Segoe UI Symbol, Sun-ExtB, SunManPUA,
SimSun-ExtB, and SimSun (Founder Extended)), only Code2001, Cardo and
Quivira are correctly identified for the non-zero plane glyphs the three
fonts each support. The other fonts either do not expose their non-zero
plane glyphs in LO or do so only sometimes.
4. no built-in mechanism to produce glyphs from typed code points
Nor can you work around this issue by manually specifying code points, as
is usually done in MS Word (to generate the glyph previously discussed, key
in 1F374 and press [Alt]+[X]). There is no built-in mechanism in LO to
generate a glyph from a typed code point. One may argue that the “Compose
Character” extension will work, but this extension follows a fairly
outdated version of the Unicode standard and does not recognize any code
point higher than U+FFFF, i.e. it has no support for non-zero plane glyphs.
Some professionals would suggest configuring the Windows registry to allow
hexadecimal Unicode Alt input using the numpad, but this method does not
support non-zero planes either. Therefore, you can basically only work with
non-zero plane glyphs in LO using a fairly limited set of fonts, which may
not share the same typeface as your document.
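
For reference, the conversion that an Alt+X style shortcut performs amounts
to very little. The sketch below shows the idea in Python; it is not how MS
Word or any LO extension implements it, and `from_hex_code_point` is a
hypothetical name.

```python
def from_hex_code_point(text):
    """Convert typed hex digits such as '1F374' or 'U+1F374' to a character."""
    digits = text.strip().upper()
    if digits.startswith("U+"):
        digits = digits[2:]
    cp = int(digits, 16)  # raises ValueError on non-hex input
    # Reject values outside Unicode and the surrogate range U+D800..U+DFFF.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value")
    return chr(cp)


ch = from_hex_code_point("1F374")
print(ord(ch) >> 16)  # plane number: 1
```

Nothing in this conversion is specific to plane 0, so a limit at U+FFFF is a
property of a particular implementation rather than of the approach.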
5. unstable detection of unicode ranges: weird behaviors in the Special
Characters dialog
This compatibility issue is further complicated by some strange behavior
of the Special Characters dialog in Writer. Sometimes, when you switch to
a font which supports non-zero plane glyphs after browsing or inserting
glyphs using another font that does not, the non-zero plane Unicode ranges
supported by the new font disappear from the “subset” drop-down, and remain
invisible until you restart Writer.
Conversely, when you switch to a font which does not support non-zero plane
glyphs after browsing or inserting glyphs using another font that does, LO
still tries to display the last used glyph in the new font, and occasionally
even the last used Unicode range. What you see is, of course, boxes,
question marks or blanks.
Interestingly enough, this issue does not occur every time I use Writer.
Two hours ago I was encountering this issue constantly, but right now, with
no settings changed, no new programs launched and no updates installed in
the meantime, Writer appears to be comparatively more stable in terms of
non-zero plane support, although still not all Unicode ranges are correctly
detected for all fonts.
MESSED UP WITH MIXED-SCRIPT DOCUMENTS
Some of the most dramatic blunders I have come across are found in
documents with multiple languages. Generally, support for documents with
multiple writing systems and/or complex scripts exhibits defects in four
aspects: broken bi-di display, broken glyph display, improperly spaced
glyphs, and wrongly applied fonts and glyphs.
A typical example which illustrates this issue to its full extent is the
UTF-8 test page from Columbia University. This test page features glyphs
from a number of writing systems of different families, and can be used as a
stress test for correct rendering of Unicode glyphs in different
applications. Internet Explorer displayed the page perfectly without any
glitches, and Microsoft Word, as well as Firefox, also scored high.
LibreOffice, however, failed to present a number of lines correctly. The
renditions by IE, MS Word and LO have been printed with PDFCreator into
three separate PDF files. You can compare them for future research. I have
highlighted the lines that are evidently ill-displayed in these files.
6. broken bi-di display in mixed-script document
LO Writer failed to display lines with LTR text mixed with RTL text,
including Pashto, Persian/Farsi, Hebrew, Yiddish, Arabic and Urdu. These
lines appear completely in RTL direction and may sometimes display glyphs
abnormally overlapped. However, pasting these lines one by one unformatted
into a new document resolves the problem. This is not seen in IE or MS Word.
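
To make the expected behavior concrete, the following Python sketch splits a
line into runs of strongly LTR and strongly RTL characters using the bidi
classes from the standard `unicodedata` module. It is a deliberate
simplification and not the full Unicode Bidirectional Algorithm: neutral
characters (spaces, punctuation, digits) are simply dropped. The function
names are hypothetical.

```python
import unicodedata
from itertools import groupby


def strong_direction(ch):
    """Map a character to 'LTR', 'RTL' or None using its bidi class."""
    cls = unicodedata.bidirectional(ch)
    if cls == "L":
        return "LTR"
    if cls in ("R", "AL"):   # Hebrew-type and Arabic-type letters
        return "RTL"
    return None              # direction of neutrals depends on context


def direction_runs(text):
    """Group consecutive strongly directional characters into runs."""
    strong = [(strong_direction(c), c) for c in text]
    strong = [(d, c) for d, c in strong if d is not None]
    return [(d, "".join(c for _, c in group))
            for d, group in groupby(strong, key=lambda pair: pair[0])]


# "Hebrew" followed by the Hebrew word shalom, written with escapes.
print(direction_runs("Hebrew: \u05e9\u05dc\u05d5\u05dd"))
```

A line such as the sample should yield one LTR run and one RTL run; laying
out the whole line in RTL direction, as described above, would ignore the
first of them.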
7. broken glyph display in mixed-script document
LO Writer failed to display glyphs in several languages, including Gothic,
Bengali, Telugu, Sinhalese, Burmese, Vietnamese (nôm), Khmer, Lao and
Tibetan. Not all glyphs are shown for Vietnamese (nôm), and the diacritics
did not combine in Lao. It appears LO Writer was not able to automatically
assign fonts to these writing systems properly. One has to do so manually,
which still leaves Gothic and Vietnamese (nôm) ill-formed in LO, probably
because the two systems use glyphs from non-zero planes. This is not seen
in IE, or in MS Word (after manually setting the font for Gothic).
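
Whether a given line really contains non-zero plane characters can be
checked outside LO. The sketch below lists every character above U+FFFF in a
string together with its plane number; `supplementary_chars` is a
hypothetical helper, and the sample string is made up for illustration.

```python
import unicodedata


def supplementary_chars(text):
    """Return (code point, plane, name) for each character above U+FFFF."""
    return [(f"U+{ord(c):04X}", ord(c) >> 16, unicodedata.name(c, "<unnamed>"))
            for c in text if ord(c) > 0xFFFF]


# Latin a, one Gothic letter (U+10330), one Lao letter (U+0E81).
sample = "a\U00010330\u0e81"
for code_point, plane, name in supplementary_chars(sample):
    print(code_point, plane, name)
```

Only the Gothic letter is reported here, since the Latin and Lao letters lie
in plane 0.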
8. problematic character spacing
LO Writer failed to space glyphs properly for certain fonts in certain
writing systems. One example is the Runic glyphs used for ancient
Scandinavian texts, displayed using Code2001. The glyphs are densely
arranged, sometimes overlapped, with display problems when one scrolls the
page. Changing the font to Segoe UI Symbol resolves the issue. This is not
seen in IE or MS Word.
9. wrong font information for mixed language documents
LO Writer failed to display font information correctly for mixed language
documents. Although a number of glyphs from non-Latin writing systems have
been identified and correctly displayed, the font information is not set
correspondingly. One example is Ogham. To correctly display Ogham glyphs,
one needs to use fonts like Code2000 or Segoe UI Symbol. However, Times New
Roman, a Latin/Greek/Cyrillic/Arabic font, is displayed in the “font name”
drop-down and the “character” dialog when you select Ogham text.
These are the issues I am concerned with after using LO Writer for a dozen
hours. I am happy to see that the document text is more elegantly displayed
in LO than in MS Office, and that combining marks are well supported in this
system, but to get LO to work as an MS Office substitute means finding a
resolution to these issues, as I rely heavily on these rarely used Unicode
glyphs in my work, especially Chinese characters in the SIP (plane 2).
With only a few hours of experience with LibreOffice, it would be unfair of
me to simply declare these problems bugs or flaws. Still, the limited
Unicode support LO exhibits on first use, compared with MS Office and web
browsers, underlines the need to address these issues, at least by raising
them here. Are there any possible answers to these problems, e.g.
extensions? Or at least workarounds? Are these issues already known to the
LibreOffice development team? Is any work addressing them in progress?
And finally, if these issues can be adequately qualified as bugs or
limitations of the LibreOffice products, how can I report them to the
development team? Thanks very much.