

Hi Michael,

On 28 January 2011 04:04, Michael Meeks <michael.meeks@novell.com> wrote:


       Licensing wise - I'd like to add the standard LGPLv3+/MPL header to it
(see bootstrap/) but having MIT too is fine if you want.

This patch adds the copyright header from the template to idxdict.cpp,
although I had to tweak the year to 2011.

I have no idea how this would be integrated into the build process as I'm
not even sure where it is called from, but happy if someone wants to
take up the challenge and/or incorporate it as an installer process.

       So - the installer process is more exciting on Windows I think - we'll
need to see how the setup_native/ tools are called and be inspired by
that I think.

I think that in order to do any work on the Windows installer I would
have to work out how to get a Windows compile environment set up.
I currently only have one set up on my Ubuntu machine.

The same set of files using th_gen_idx.pl took around 5 seconds (although
some basic fixups got it done to 3.5 seconds).

       Great - its trivial; indeed - it rather makes you wonder whether we
need the indexes at all ? [ I wonder what they are good for, and/or what
code loads and uses them ;-]. We may discover that in fact there is no
need for them to be indexed - any chance of a dig around ?

I imagine my timings are a bit skewed by the machine I tested on, and
the number of times I ran it.  I'm sure all the dictionaries were well
and truly in buffer cache so there was no I/O for the test.

On slower machines (are you targeting these?) or slower disks there is
a chance the index files may offer a performance improvement.

Here is the same test after I dropped all my buffer cache:
real    0m2.300s
user    0m0.700s
sys     0m0.150s


These range from having the entry count incorrect, causing the index
process to miss a word (lots of these in some dictionaries), to having
words apparently duplicated either as the next entry, or sometimes a long
way apart.

       That is bad; we should mail the l10n list to ask them to have a look I
suppose.

I wasn't aware there was such a list and I can't find one on
freedesktop.org. Is it a LibreOffice-related l10n list, or are these
dictionaries sourced from another project?

I have not attempted to fix these dictionary issues, but if they are
serious it might be worth having a perl script that can validate that
the dictionaries are internally consistent.  Unfortunately, it would have
to use heuristics, as the file format makes it difficult to tell in general
what kind of line is being processed.
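
A minimal sketch of such a consistency check, written in Python rather
than perl purely for illustration. It assumes a layout that this thread
does not spell out: the first line names the encoding, each entry starts
with a "word|count" header, and "count" meaning lines follow it. The
function names and the header heuristic are hypothetical, and the
heuristic can misfire on a meaning line that happens to look like
"text|digits".

```python
def looks_like_header(line):
    """Heuristic (an assumption): a header is 'word|N' with exactly one
    '|' and an all-digit count. Meaning lines usually have a non-numeric
    second field or more than one '|'."""
    parts = line.split("|")
    return len(parts) == 2 and parts[0] != "" and parts[1].isdigit()


def validate(lines):
    """Check a thesaurus file given as a list of lines without newlines.

    lines[0] is assumed to be the encoding line and is skipped.
    Returns a list of (line_number, message) tuples, 1-based.
    """
    problems = []
    seen = {}  # word -> line number of its first header
    i = 1
    while i < len(lines):
        lineno = i + 1
        if not looks_like_header(lines[i]):
            # Only reachable before the first header is found.
            problems.append((lineno, "expected 'word|count' header"))
            i += 1
            continue

        word, count_text = lines[i].split("|")
        declared = int(count_text)

        if word in seen:
            problems.append(
                (lineno, f"duplicate entry {word!r}, first at line {seen[word]}")
            )
        else:
            seen[word] = lineno

        # Count the lines that follow until the next header-like line.
        j = i + 1
        while j < len(lines) and not looks_like_header(lines[j]):
            j += 1
        actual = j - i - 1
        if actual != declared:
            problems.append(
                (lineno, f"{word!r} declares {declared} meanings but {actual} follow")
            )
        i = j
    return problems
```

Because the scan resynchronises on the next header-like line instead of
trusting the declared count, one wrong count is reported once and does
not hide the entries after it. Duplicates are caught whether they are
adjacent or a long way apart.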

       Right; we should validate them as we compile the index perhaps - or at
least, look at the parser and see how it has traditionally interpreted
them.

If a utility were written that can validate the files, would it be
possible to make it reject on commit if it detected errors?

Having multiple entries for a word when loaded into libreoffice?

       The native code thing is great; it'd be wonderful if you had some time
to look at hooking it into the build process in dictionaries/ (?)

Yep... I will have to try to figure out how the build works though.
Back to the wiki. At least I've realised how to make git work across
the multiple checkouts now.
--
Regards,
Steven Butler

Attachment: copyright.patch
Description: Binary data



