

The recent removal of the extension prereg mechanism revealed a problem with how we select which dictionaries (which come in the form of bundled extensions) are included in a given installation.

At least with the "official" (<http://download.libreoffice.org>) Linux and Mac OS X installation sets, the base installation set contains en-US localization and only contains dictionaries "related" to that locale (dict-en, dict-es, dict-fr; see below for details of what "related" means). The additional per-language langpacks contain dictionaries "related" to the given langpack (e.g., langpack_de contains dict-de).

However, on Windows, the base installation set contains all available localizations and all available dictionaries. During msi installation, some code apparently determines a default selection of only a subset of the "Additional user interface languages" entries (presumably based on the current system locale settings), but all of the available "Optional Components - Dictionaries" entries are selected by default. This now causes per-user generation of data about all those bundled dictionary extensions at per-user first-start of LO, leading to noticeable time and space requirements (see <https://bugs.freedesktop.org/show_bug.cgi?id=53009> "Large UserInstallation's user/extensions/bundled/ tree").

Hence, one way to address that problem would be to reduce the number of "Optional Components - Dictionaries" entries selected by default during Windows msi installation, similar to how a given combination of base installation set plus langpack(s) on the other platforms also installs only a subset of all the available dictionaries. (That is, the code that apparently already determines a default selection of "Additional user interface languages" entries would need to be extended to also determine a default selection of "related" "Optional Components - Dictionaries" entries.)

Initial reactions on IRC (see below) were that (a) the status quo on Windows was to avoid "political issues" (though that would be inconsistent with the status quo on the other platforms), and (b) to rethink having dictionaries as bundled extensions (though I would prefer to keep things simple, solving the problem by harmonizing behavior across platforms now and leaving anything more ambitious for the future).

Any further thoughts?

Stephan

PS1: The dictionaries "related" to a given locale appear to be determined by the list at setup_native/source/packinfo/spellchecker_selection.txt. That is why the en-US base installation set for Linux and Mac OS X contains dict-en, dict-es, and dict-fr, for example. However, there is an apparent inconsistency: langpack_de contains only dict-de, and not also dict-fr and dict-it, as that list would suggest.
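To illustrate the proposed default selection, here is a minimal sketch of how a "related dictionaries" list could be turned into a default set of dict-* entries for the UI languages that are default-selected. It assumes a hypothetical `locale = "lang,lang,..."` line format; the real format of spellchecker_selection.txt is not verified here, and the function names, the sample entries, and the fallback to the locale's base language are all illustrative assumptions, not the installer's actual logic.

```python
# Hypothetical sketch. ASSUMPTION: each entry looks like  locale = "lang,lang,..."
# This is NOT the verified format of setup_native/source/packinfo/spellchecker_selection.txt.


def parse_selection(text):
    """Map each UI locale to the list of dictionary languages "related" to it."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # strip comments; blank lines become empty
        if not line or "=" not in line:
            continue
        locale, _, value = line.partition("=")
        langs = [l.strip() for l in value.strip().strip('"').split(",") if l.strip()]
        mapping[locale.strip()] = langs
    return mapping


def default_dictionaries(ui_locales, mapping):
    """Union (in first-seen order) of dict-* entries related to the selected UI locales.

    ASSUMPTION: a locale missing from the list falls back to its base language.
    """
    selected = []
    for loc in ui_locales:
        for lang in mapping.get(loc, [loc.split("-")[0]]):
            name = "dict-" + lang
            if name not in selected:
                selected.append(name)
    return selected


# Illustrative entries only (en-US matches the base set described above).
SAMPLE = '''
en-US = "en,es,fr"
de = "de,fr,it"
'''

mapping = parse_selection(SAMPLE)
print(default_dictionaries(["en-US"], mapping))        # ['dict-en', 'dict-es', 'dict-fr']
print(default_dictionaries(["en-US", "de"], mapping))  # adds 'dict-de' and 'dict-it'
```

Taking the union over all default-selected UI languages keeps the behavior consistent with installing a base set plus several langpacks on the other platforms; users could still tick additional dictionaries manually.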

PS2: At least the Mac OS X LO 3.6.1 en-US base installation set contains share/extension/dict-* directories for all available dictionaries, not just dict-en, dict-es, dict-fr, but the additional ones are effectively empty and their existence is a bug.

PS3: For the record, the relevant log of yesterday's #libreoffice-dev:

Aug 29 12:50:57 <sberg> timar, do you know anything about our msi by default installing all "Optional Components - Dictionaries" entries, but only selected (at installation time, I presume?) "Additional user interface languages"?
Aug 29 12:51:59 <timar> sberg: yes, we always install all dictionaries on Windows in order to avoid "political issues"
Aug 29 12:52:26 <tml_> is this the old "omg, I waste SEVERAL MEGABYTES on dictionaries for languages I don't even like" discussion?
Aug 29 12:53:41 <sberg> timar, but that causes one part of the problems of fdo#53009, so I had hoped we could fix that
Aug 29 12:53:44 <IZBot> LibreOffice-Libreoffice normal/medium ASSIGNED Large UserInstallation's user/extensions/bundled/ tree https://bugs.freedesktop.org/show_bug.cgi?id=53009
Aug 29 12:54:41 <tml_> wouldn't the best solution then be to stop treating these as "extensions"?
Aug 29 12:55:12 <tml_> don't we have too much optionality in the installer anyway?
Aug 29 12:55:40 <tml_> hmm, those are orthogonal issues, sorry
Aug 29 12:58:36 <timar> sberg: what is your suggestion?
Aug 29 13:02:55 <sberg> timar, assuming that there is code in our msi to default-enable some subset X of "Additional user interface languages" entries: extend that code to also default-enable only a "matching" subset of "Optional Components - Dictionaries" entries
Aug 29 13:03:44 <tml_> that assumes people would prefer to use software (including the OS) in the same language as they write/edit documents it. not true
Aug 29 13:03:46 <sberg> ...for some suitable definition of "matching"
Aug 29 13:05:01 <timar> sberg: tml_ there is http://opengrok.libreoffice.org/xref/core/setup_native/source/packinfo/spellchecker_selection.txt that we still use for creating Linux langpacks IMHO (not sure)
Aug 29 13:05:11 <sberg> tml_, no, but it might be a better approximation to typical users' needs than the current "install everything" approach (after all, users /can/ install additional dics -- its only about the defaults)
Aug 29 13:06:45 <sberg> timar, yes, that list I had on my mind
Aug 29 13:06:56 <tml_> sberg: one person's good approximation is another person's grave insult to the XXX people ;)
Aug 29 13:07:26 <sberg> tml_, we already use that approximation on other platforms
Aug 29 13:07:45 <tml_> so that is broken, then? ;)
Aug 29 13:09:16 <sberg> tml_, do you have a better suggestion?
Aug 29 13:10:01 <tml_> sberg: is that there are lots of *extensions* that is causing problems, or lots of *dictionaries* ?
Aug 29 13:11:03 <tml_> or, wait, am I smoking crack with this talk about extensions?
Aug 29 13:11:25 <tml_> (I somehow had the impression that many dictionaires are technically packaged as "extensions", are they?)
Aug 29 13:11:51 <timar> tml_: dictionaries are extensions
Aug 29 13:12:15 <sberg> tml_, dictionaries come as bundled extensions, and every bundled extension increases the per-user space reqs and per-user--first-start time reqs (though some do more than others)
Aug 29 13:12:20 <tml_> ok, so then the question above to sberg still holds
Aug 29 13:12:52 <tml_> sberg: ok, so wouldn't the solution then be to stop packaging dictionaries as extensions? or do they *have* to be such for some obscure technical reason?
Aug 29 13:13:05 <tml_> I mean, they could still be optional in the installer even if they weren't extensions
Aug 29 13:13:29 <tml_> just like lots of other things are optional but aren't extensions
Aug 29 13:16:28 <sberg> tml_, I think the origin of having dicts as exts is so that (a) people can install additional ones (OOo traditionally did not come with such a large number of bundled dicts as LO does at least on Windows, IIUC), and (b) people can update dicts independently from updating the app itself (as the dicts were traditionally provided by 3rd parties, IIUC)
Aug 29 13:17:38 <tml_> but having the bundled ones not be extensions wouldn't stop (a), and (b) is made unnecessary by our time-based frequent releases
Aug 29 13:22:54 <sberg> tml_, I'm not arguing that having dicts as exts is necessarily good; what I'm not sure about is whether turning a given dict from ext to non-ext could cause technical problems, if a user installed an ext variant of that dict into a LO that contains that dict as non-ext
Aug 29 13:24:24 <tml_> that is something to check (and fix) then, if the bundled dictionaries would not be extensions any more
Aug 29 13:24:31 <sberg> maybe makes sense to put this on the ESC agenda
Aug 29 13:27:11 <caolan> some of the code for the old pre-extension mechanism for dictionaries still exists in lingucomponent/source/lingutil/lingutil.cxx now used for the system dictionary case
Aug 29 13:27:30 <caolan> its *supposed* to prefer extensions IIRC over system dicts
Aug 29 13:27:41 <caolan> *shrug*
Aug 29 13:28:43 <caolan> the removed pre-extension code had a dictionary.lst in some dir or other that listed the dicts and languages they were for
Aug 29 13:29:47 <caolan> but that was back in pre language tool days, not sure if that makes some of our bundled dicts no longer just simple hunspell/hyphen/mythes containers
Aug 29 13:30:10 <tml_> sberg: but anyway, I am not opposed to making the installer by default select only a (somewhat arbitrary) subset of dictionaries to install, if that fixes a problem for most people
Aug 29 13:30:37 <tml_> and even if I was opposed, that could be ignored;)
Aug 29 13:32:23 <caolan> throw the net wide enough, dict for langpack + top X languages always installed + langs also in use in territory + Y neighbouring langs :-)
Aug 29 13:36:46 <tml_> caolan: but isn't it so that exactly selecting "neighbouring langs" (but not langs from some country a few borders away) can cause immense irritation. "why would we proud Freedonians want to write in the language of those dogs of Elbonia. what we need is the language of our beloved friends from Bulvania"
Aug 29 13:37:36 <tml_> but whatever
Aug 29 13:40:20 <caolan> including Russian in a shortlist of dicts for the Latvian langpack is a potential contender for that problem
Aug 29 13:41:58 <tml_> which is why when including *all* one can always say "we don't make any judgements"
Aug 29 13:42:29 <caolan> Bosnian/Serbian/Croatian, *shudder*
Aug 29 13:45:12 <tml_> caolan: Serbian/Albanian/Russian was the real-world example I had in mind. even if Albanian seems to be a "recognized minority language" in Serbia, so at least officially they couldn't oppose it that heavily
Aug 29 13:46:33 <tml_> caolan: and what do I know, maybe I am too pessimistic, and only a very small minority of people would take stuff like this so seriously
Aug 29 13:46:43 <tml_> caolan: after all, it isn't *maps* ;)
Aug 29 13:47:34 <caolan> tml_: RH has a utility to search for possible maps in software packages :-)
