As a newcomer I am trying out LibreOffice 4.1.4.2 and am quite satisfied
with most of its features, especially the experimental sidebar panel.
However, I ran into several issues while playing around with Unicode
characters in LO.
Environment: MS Win7 64bit Ultimate English with French and Chinese
language pack, non-Unicode code page: Chinese (CP936), LibreOffice
4.1.4.2, JRE 1.7.0_09 (both 32-bit and 64-bit).
THE "SPECIAL CHARACTERS" DIALOG
========================
1. no unicode code point entry field in the dialog
Let's first talk about issues regarding the Special Characters dialog.
Unlike MS Word, it does not feature a Unicode code point entry field.
This causes some inconvenience if you know the code point of a certain
glyph and need to insert it regularly. The “Compose Character”
extension only solves part of the puzzle, as explained later under the
issues with non-zero plane glyphs.
2. wrong detection of unicode ranges
A more complicated problem with the Special Characters dialog is the
mis-identification of supported code points in a font. LO doesn't seem
to handle this correctly at all. For a Unicode range that the font only
partially supports, it displays blocks, squared question marks or blanks
for the unsupported glyphs; for a range the font does not support at
all, it displays glyphs from fallback fonts assigned by the OS. There is
almost no sign of these unsupported glyphs or ranges being suppressed
from display. Only very few fonts are correctly identified (Cardo,
Code2000 and Quivira, for example).
To illustrate this, install FreeSerif (version 0412.2263), and bring up
the Special Characters dialog in Writer. Switch to this font, and go to
the “Tibetan” subset. FreeSerif does not support Tibetan glyphs as of
now, but LO still assumes it does, and therefore displays strings
of boxed question marks in this range.
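The filtering I would expect from such a dialog can be sketched roughly
as follows. This is only an illustration in Python, not LO's actual
code: the `supported` set is a stand-in for whatever code points a
font's character map declares, the `visible_subsets` name is made up,
and the demo font is invented.

```python
# Hypothetical sketch: decide which subsets a character picker should list.
# `supported` stands in for the code points a font's character map declares.

# Three real Unicode blocks, given as (first, last) code points.
BLOCKS = {
    "Basic Latin": (0x0000, 0x007F),
    "Tibetan": (0x0F00, 0x0FFF),
    "Gothic": (0x10330, 0x1034F),
}

def visible_subsets(supported):
    """Return {block name: sorted covered code points}, skipping empty blocks."""
    result = {}
    for name, (first, last) in BLOCKS.items():
        covered = sorted(cp for cp in supported if first <= cp <= last)
        if covered:  # a block with no covered code point is not listed at all
            result[name] = covered
    return result

# Invented font: covers A, B, C and one Gothic letter, but nothing Tibetan.
demo_font = {0x41, 0x42, 0x43, 0x10330}
print(visible_subsets(demo_font))
```

With this approach the “Tibetan” subset would simply not be offered for
a font that declares no Tibetan code points.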
GLYPHS IN NON-ZERO UNICODE PLANES
====================
3. limited support for fonts with non-zero plane glyphs?
More serious issues are found in LO’s support for non-zero plane Unicode
glyphs. For many fonts, LO doesn't seem to support non-zero plane
Unicode glyphs even though the font itself does. Pasting the glyph from
external applications doesn't help either, regardless of the text
format: unformatted, RTF or HTML. The glyph is shown as a square, a
question mark or a blank space.
For example, suppose you try to input in LO the Unicode character
U+1F374 (fork and knife), which is supported by the font Segoe UI Symbol
(version 5.01). It is not shown in the Special Characters dialog: the
highest code point available (for this font) is limited to the BMP
(plane 0). You open MS Word 2010, and find that in Insert→Symbols, this
glyph is correctly displayed and inserted when you switch to this font.
You select this glyph, copy it, and paste it into LO. The glyph becomes
a square. Paste it into Windows Notepad (ensuring that Segoe UI Symbol
is the font used), and the glyph is again displayed correctly. You copy
the glyph again from Notepad, and still it is shown as a square in LO.
In fact, among all the fonts with non-zero plane glyphs installed on my
PC (namely: freeserif, freesans, freemono, code2001, code2002, WenQuan
Yi Zen Hei, Cardo, PMingLiU-ExtB, Quivira, Segoe UI Symbol, Sun-ExtB,
SunManPUA, SimSun-ExtB, and SimSun(Founder Extended)), only code2001,
Cardo and Quivira are correctly identified for the non-zero plane
glyphs they each support. The other fonts either do not expose their
non-zero plane glyphs in LO or do so only sometimes.
4. no built-in mechanism to produce glyphs from typed code points
Nor can you work around this issue by manually specifying code points,
as is usually done in MS Word (to generate the glyph previously
discussed, key in 1F374, and press [Alt]+[X]). There is no built-in
mechanism in LO to generate a glyph from typed code points. One may
argue the “Compose Character” extension will work, but this extension
follows a fairly obsolete version of the Unicode standard and does not
recognize any code point higher than U+FFFF, i.e. it has no support for
non-zero plane glyphs. Some professionals would suggest configuring the
Windows registry to allow hexadecimal Unicode Alt input using the
Numpad, but this method does not support non-zero planes either.
Therefore, you can basically only work with non-zero plane glyphs in LO
using a fairly limited set of fonts, which may not share the same
typeface with your document.
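For reference, here is a minimal Python sketch (unrelated to LO or to the
extension; the function names are my own) of what turning typed hex
digits into a character involves. It also shows one general fact that
may be relevant: in UTF-16, a code point above U+FFFF takes two code
units (a surrogate pair) instead of one. I am not claiming this is the
cause of the limits described above.

```python
def from_hex(code_point_hex):
    """Turn typed hex digits such as '1F374' into the character itself."""
    return chr(int(code_point_hex, 16))

def utf16_units(ch):
    """Return the UTF-16 code units of one character as hex strings."""
    data = ch.encode("utf-16-be")
    return [data[i:i + 2].hex().upper() for i in range(0, len(data), 2)]

fork_and_knife = from_hex("1F374")
print(utf16_units(fork_and_knife))    # ['D83C', 'DF74']  (surrogate pair)
print(utf16_units(from_hex("00E9")))  # ['00E9']          (single unit, BMP)
```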
5. unstable detection of unicode ranges: weird behaviors in the Special
Characters dialog
This compatibility issue is further complicated by some strange behavior
of the Special Characters dialog in Writer. Sometimes, when you switch
to a font which supports non-zero plane glyphs after browsing or
inserting glyphs using another font that does not, the non-zero plane
Unicode ranges supported by the new font disappear from the “subset”
drop-down, and remain invisible until you restart Writer. Conversely,
when you switch to a font which does not support non-zero plane glyphs
after browsing or inserting glyphs using another font that does, LO
still tries to display the last used glyph in the new font, and
occasionally even the last used Unicode range. What you see is, of
course, boxes, question marks or blanks.
Interestingly enough, this issue does not occur every time I use Writer.
Two hours ago I was encountering it constantly, but right now, with no
settings changed, no new programs launched and no updates installed in
the meantime, Writer appears to be comparatively more stable in terms of
non-zero plane support, although still not all Unicode ranges are
correctly detected for all fonts.
MESSED UP WITH MIXED-SCRIPT DOCUMENTS
======================
Some of the most dramatic blunders I have come across are found in
documents with multiple languages. Generally, support for documents
with multiple writing systems and/or complex scripts exhibits defects in
four aspects: broken bi-di display, broken glyph display, improperly
spaced glyphs, and wrongly applied fonts and glyphs.
A typical example which illustrates this issue to its full extent is the
UTF-8 test page from Columbia University. This test page features
glyphs from a number of writing systems of different families, and can
be used as a stress test for correct rendering of Unicode glyphs in
different applications. Internet Explorer displayed the page perfectly
without any glitches, and Microsoft Word, as well as Firefox, also
scored high. LibreOffice, however, failed to present a number of lines
correctly. I have printed the renditions by IE, MS Word and LO into
three separate PDF files using PDFCreator, so that you can compare them
for further investigation. I have highlighted the lines that are
evidently ill-displayed in these files.
6. broken bi-di display in mixed-script document
LO Writer failed to display lines with LTR text mixed with RTL text,
including Pashto, Persian/Farsi, Hebrew, Yiddish, Arabic, and Urdu.
These lines appear completely in RTL direction and sometimes display
glyphs abnormally overlapped. However, pasting these lines one by one,
unformatted, into a new document resolves the problem. This is not seen
in IE or MS Word.
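To pick out the lines that mix directions, the bidirectional class that
Unicode assigns to each character can be inspected. The following is a
minimal Python sketch using the standard `unicodedata` module. The
helper name `is_mixed_direction` is my own, and the sketch only detects
mixed-direction text; it does not implement the full bidi algorithm.

```python
import unicodedata

def is_mixed_direction(text):
    """True if text has both strong LTR (L) and strong RTL (R/AL) characters."""
    classes = {unicodedata.bidirectional(ch) for ch in text}
    return "L" in classes and bool(classes & {"R", "AL"})

# Latin label followed by the Hebrew word "shalom", written with escapes
# so that this source file itself contains no RTL text.
sample = "Hebrew: \u05e9\u05dc\u05d5\u05dd"
print(is_mixed_direction(sample))        # True
print(is_mixed_direction("Latin only"))  # False
```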
7. Broken glyph display in mixed-script document
LO Writer failed to display glyphs in several languages, including
Gothic, Bengali, Telugu, Sinhalese, Burmese, Vietnamese (nôm), Khmer,
Lao and Tibetan. Not all glyphs are shown for Vietnamese (nôm), and the
diacritics did not combine in Lao. It appears LO Writer was not able to
automatically assign fonts to these writing systems properly. One has
to do so manually, which still leaves Gothic and Vietnamese (nôm)
ill-formed in LO, probably because the two systems use glyphs from
non-zero planes. This is not seen in IE, or in MS Word (after manually
setting the font for Gothic).
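Whether a given line actually contains non-zero plane characters is easy
to check. Here is a minimal Python sketch (the helper names are my own)
that reports every character outside plane 0. The sample uses one
Gothic letter (U+10330), one CJK Extension B ideograph (U+20000) and one
Lao letter (U+0E81) purely as illustrations.

```python
def plane(ch):
    """Unicode plane of a character: 0 is the BMP, 1 the SMP, 2 the SIP."""
    return ord(ch) >> 16

def non_bmp_chars(text):
    """List the code points in text that lie outside plane 0."""
    return [f"U+{ord(ch):04X}" for ch in text if plane(ch) > 0]

sample = "\U00010330 \U00020000 \u0e81"
print(non_bmp_chars(sample))  # ['U+10330', 'U+20000']
```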
8. problematic character spacing
LO Writer failed to space glyphs properly for certain fonts in certain
writing systems. One example is the Runic glyphs used for ancient
Scandinavian texts, displayed using Code2001. The glyphs are densely
packed, sometimes overlapping, and show display problems when one
scrolls the page. Changing the font to Segoe UI Symbol resolves the
issue. This is not seen in IE or MS Word.
9. wrong font information for mixed language documents
LO Writer failed to display font information correctly for mixed
language documents. Although a number of glyphs from non-Latin writing
systems have been identified and correctly displayed, the font
information is not set correspondingly. One example is Ogham. To
correctly display Ogham glyphs, one needs to use fonts like Code2000 or
Segoe UI Symbol. However, Times New Roman, a Latin/Greek/Cyrillic/Arabic
font, is displayed in the “font name” drop-down and in the “character”
dialog when you select Ogham text.
These are the issues I am concerned with after using LO Writer for a dozen
hours. I am happy to see that the document text is more elegantly
displayed in LO than in MS Office, and that combining marks are well
supported in this system, but to get LO to work as an MS Office substitute
means finding a resolution to these issues, as I rely heavily on these
rarely used Unicode glyphs in my work, especially Chinese characters in
the SIP (plane 2).
With only a few hours of experience with LibreOffice, it would be unfair
of me to simply label these problems as bugs or flaws. Still, the limited
Unicode support LO exhibits on initial runs, as compared to MS Office and
web browsers, makes it pressing to address these issues, at least by
raising them here. Are there any possible answers to these problems,
e.g. extensions? Or at least workarounds? Are these issues already known
to the LibreOffice development team? Is any work addressing them in
progress? And finally, if these issues do qualify as bugs or limitations
of the LibreOffice products, how can I report them to the development
team? Thanks very much.
Sincerely,
Neil Ren