[libreoffice-users] Re: Struggling with Hebrew in LO

CVAlkan <foberle -AT- enteract.com>
Thu, 27 Oct 2016 07:55:31 -0700 (MST)

A very good question but, depending on what you mean specifically by "foundry
data," I'm afraid I don't know that it will. My best guess is that it will
**improve the chances that it will**, but there are many other factors
involved. My understanding of all this is murky at best, given the unclear
and often conflicting information I could find on the web.

In Linux at least, there is a command-line utility called fc-match that
reports which installed fonts are the closest match to the one that doesn't
meet the immediate needs of the calling app. Apps don't normally run fc-match
itself; as far as I can tell they reach the same matching logic through the
fontconfig library, either directly or via their text rendering stack (it
isn't clear to me if, or to what extent, Windows uses this, although Gimp
under Windows certainly does).

fc-match is part of the fontconfig package, written by Keith Packard and long
maintained by Behdad Esfahbod (see
https://en.wikipedia.org/wiki/Behdad_Esfahbod), and it determines the
matches (and ranking) according to a multitude of factors exposed/reported
by the fonts themselves. Since a number of perfectly lovely fonts are either
missing some things, or define them incorrectly or inconsistently, the
answer to your question might be: if you were to fix all of those things,
you probably wouldn't encounter the unwarranted and unexpected font
substitutions.
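
To make the ranking visible, here is a minimal Python sketch that asks fc-match
for its sorted candidate list and splits each line into family, style and
file. The function names and the '|' separator are my own choices for
illustration, and the pattern "DejaVu Serif:lang=he" is only an example; it
assumes fc-match is on the PATH and that your fontconfig supports --format.

```python
import subprocess


def fc_match_command(pattern, sort=True):
    """Build an fc-match command line for a fontconfig pattern."""
    cmd = ["fc-match"]
    if sort:
        cmd.append("--sort")  # list every candidate, best match first
    # '|' is an arbitrary separator chosen for this sketch
    cmd += ["--format", "%{family}|%{style}|%{file}\n", pattern]
    return cmd


def parse_matches(output):
    """Turn 'family|style|file' lines into a list of dicts."""
    matches = []
    for line in output.splitlines():
        if not line.strip():
            continue
        family, style, path = line.split("|", 2)
        matches.append({"family": family, "style": style, "file": path})
    return matches


def ranked_matches(pattern):
    """Run fc-match and return its candidates in ranked order."""
    result = subprocess.run(
        fc_match_command(pattern),
        check=True, capture_output=True, text=True,
    )
    return parse_matches(result.stdout)
```

Calling ranked_matches("DejaVu Serif:lang=he") and printing the first few
entries shows which fonts would be tried first for Hebrew text on that
particular machine.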

There are of course two significant "gotchas" in expecting good results from
fc-match. The first is that, while the font itself is actually suitable, it
doesn't report some of its capabilities correctly, causing an unnecessary
quest for a substitute. The second is that, while looking for a good
substitute, the fonts being examined don't correctly or consistently report
their capabilities. The acronym we used to use in the early days was GIGO
(Garbage In = Garbage Out).

I've long been annoyed that LibreOffice (among other apps, but this is an LO
list) doesn't report that such substitutions were made, but I've since
discovered the possibility that it might not even have been given that
information as feedback from the rendering engine in the first place.
Incidentally, the way I confirm these stealth substitutions is to either
generate a .pdf or save the file as an .fodt; in either case the actual font
being used can be determined from those files.
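
For the .fodt half of that check, a rough Python sketch follows. It lists the
fonts a flat ODF file declares and the ones its text properties actually
refer to. The function name is hypothetical, and it assumes the font names of
interest show up in the style:font-face declarations and the style:font-name
attributes; it only reports what the file records and makes no claim about
what the renderer did afterwards.

```python
import xml.etree.ElementTree as ET

# ODF namespace URIs used by .fodt files
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
SVG_NS = "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"


def fonts_in_fodt(xml_data):
    """Return (declared, used) font names found in flat ODF XML.

    declared maps each style:font-face name to its svg:font-family value;
    used is the set of names referenced by Western, Asian or complex-script
    text properties.
    """
    root = ET.fromstring(xml_data)

    declared = {}
    for face in root.iter(f"{{{STYLE_NS}}}font-face"):
        name = face.get(f"{{{STYLE_NS}}}name")
        declared[name] = face.get(f"{{{SVG_NS}}}font-family")

    used = set()
    for element in root.iter():
        for attr in ("font-name", "font-name-asian", "font-name-complex"):
            value = element.get(f"{{{STYLE_NS}}}{attr}")
            if value:
                used.add(value)

    return declared, used
```

Reading the file as bytes (for example with pathlib.Path(path).read_bytes())
and passing the result in is enough; any name in "used" that you never chose
yourself is worth a closer look.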

So: Trying to answer the question you pose has been part of my goal for a
while, but I first wanted to come up with a list of fonts that would be
suitable for experimentation. There are a number of ways of "looking into" a
font to see what's in it, but they are all fairly tedious if you want to
compare some arbitrary set of fonts, and involve looking at one font at a
time.

I've recently been playing with a shell script I wrote; if I'd known how
deep the water was that I was stepping into, the choice of bash would likely
have been different, but that's water (pun intended) under the bridge. I
give the script multiple command line arguments for the particular scripts
I'm interested in combining in a single document and it gives me back a list
of all the fonts that are "potentials." Along with each font name, there is
a short list of some of the things it has to say about itself. The results
so far are rather fascinating, and tend to confirm your suspicion/hope/guess
that fixing the fonts may fix the problem.
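
This is not my bash script, but the core idea can be sketched in a few lines
of Python: ask fc-list which font files claim each language, then keep only
the files that claim all of them. The function names are made up for this
sketch, and it assumes fc-list is installed and that its --format option
accepts %{file}.

```python
import subprocess


def fonts_for_lang(lang):
    """Files of installed fonts that claim support for one language code."""
    result = subprocess.run(
        ["fc-list", "--format", "%{file}\n", f":lang={lang}"],
        check=True, capture_output=True, text=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def fonts_for_all(langs, lister=fonts_for_lang):
    """Font files that claim every language in langs.

    lister is injectable so the set logic can be tried without fontconfig.
    """
    sets = [set(lister(lang)) for lang in langs]
    if not sets:
        return []
    return sorted(set.intersection(*sets))
```

For example, fonts_for_all(["el", "hy"]) would give the candidates for a
document mixing Greek and Armenian. Fonts that contain the glyphs but fail to
report the language will be missing from the result, which is exactly the
kind of discrepancy worth chasing.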

In no particular order, here are some representative tidbits I've
discovered:

1) When looking at fonts containing both Greek and Armenian characters,
there are 31 of them installed on my machine, and all of those 31 (all from
the FreeFont and DejaVu families) include the appropriate language codes
('el' and 'hy') for this example. BUT: DejaVuSans-ExtraLight.ttf is missing
the 0x0559 character from the available character bit map. Not knowing
Armenian, I don't know what to make of that, but it's interesting.
FreeSerifItalic doesn't report the ISO 15924 script tag 'grek' and FreeMono
fails to report the code 'armn'; DejaVuSansMono-Oblique and FreeMonoOblique
don't report either 'el' or 'hy'. You can see that answering your question
would require some serious experimentation: are there valid reasons some
members of these families are slightly different from others, etc? There are
more examples like this.

2) I have found two versions of Garamond on my system using this script (I
don't believe I would have done that, so I suspect that different apps may
have added them not realizing the other was there). Appearance-wise, their
glyphs seem at least superficially identical, but what they report is quite
different. The base font for one reports it's in a 'Normal' style, while the
other says it's 'Regular.' One of these families is clearly superior in what
it is reporting as capabilities, so I'll soon be purging the other, but the
questions remain: how the heck would I ever have stumbled across this? and
what effect(s) might this have had on unexpected font substitution?

3) For coverage of "upper" Unicode planes (i.e. scripts that begin beyond
0xffff and contain such things as ligatures, box drawing characters,
complete musical symbols and so forth), none of the utilities I've used
seems to report anything. Some fonts that provide these appear, on
examination with FontForge, to be constructed correctly, leading me to
believe that the underlying utilities may never have been updated to handle
extended values, but that's just a guess.

4) Although the Thai and Laotian character sets are different and have
different Unicode block assignments, they are similar enough that they can
be read (though possibly not understood) on either side of the border. Even
so, I have found no fonts whatever that contain both of these. Since my
collection of Thai fonts is rather extensive, I find this odd. If I were ever
to mix Thai and Laotian in the same document - which I haven't - my guess is
that substitution problems would pop up immediately.

5) The Droid family of fonts, by the way, does NOT contain any Thai
characters. Fair enough, as they provide supplemental fonts for other
scripts. For Thai, I have DroidSerifThai-Regular, DroidSerifThai-Bold, and
DroidSansThai installed. You would think, therefore, that when using
DroidSerif-Regular as the font, DroidSerifThai-Regular would be the perfect
substitute for text passages containing Thai. For reasons I've yet to track
down, however, it isn't even on the list of fonts considered as substitutes.
None of these Thai variants reports support for the ISO 639-1 language code
'th', which is probably a good clue but, since I don't actually use Droid, I
haven't pursued that further. (To the OP's original question, there are
equivalent Droid Hebrew fonts as well.)
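
Several of the checks above boil down to asking whether a font's reported
character set covers a given code point or block. Here is a small Python
sketch of that test. It assumes the character set is written the way I
understand fontconfig prints it with %{charset}, as space-separated
hexadecimal ranges such as "20-7e a0-17f 559"; the function names and the
sample ranges in the usage note are invented for illustration.

```python
def parse_charset(text):
    """Parse hex ranges like '20-7e a0-17f 559' into (start, end) tuples."""
    ranges = []
    for token in text.split():
        low, _, high = token.partition("-")
        start = int(low, 16)
        end = int(high, 16) if high else start  # single code point
        ranges.append((start, end))
    return ranges


def covers(ranges, codepoint):
    """True if any range contains the code point."""
    return any(start <= codepoint <= end for start, end in ranges)


def covered_in_block(ranges, first, last):
    """Count the code points in [first, last] that the ranges cover."""
    return sum(1 for cp in range(first, last + 1) if covers(ranges, cp))


def reaches_beyond_bmp(ranges):
    """True if any reported code point lies above 0xffff."""
    return any(end > 0xFFFF for _, end in ranges)
```

With ranges = parse_charset("531-556 55a-55f e01-e3a"), covers(ranges, 0x559)
is False, covered_in_block(ranges, 0x0E00, 0x0E7F) counts the Thai block and
covered_in_block(ranges, 0x0E80, 0x0EFF) counts the Lao block, so a font
covering both would show non-zero counts for each.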

Finally, I have some hesitation in modifying any particular font, since as
far as I know they could be overwritten at any time by a helpful app or OS.
It seems preferable that any font errors that are found be vetted,
confirmed and corrected by the original creating entity. Unfortunately, I'm
not sure how all the existing "faulty" versions could ever be rounded up and
destroyed.

But - that's another problem. If enough definitive examples are found,
perhaps there will be some recognition that there is still work to be done.
I know from past postings that you're familiar with this situation, so if
there is a way for me to pass my bash script along (assuming you have access
to a Linux machine, or the Windows 10 bash shell experiment???), let me
know; I'd be happy to hear any comments or corrections you might have
...

Time to stop here: I have a tendency to run on when goaded.

Regards - Frank



