

Hey,

On Fri, Oct 31, 2014 at 2:45 PM, Christian Lohmaier
<lohmaier@googlemail.com> wrote:
> Hi Markus, *,
>
> On Fri, Oct 31, 2014 at 2:38 PM, Markus Mohrhard
> <markus.mohrhard@googlemail.com> wrote:
>>
>> The quick and ugly one is to partition the directories into 100-file
>> directories. I have a script for that as I have done exactly that for
>> the memcheck run on the 70 core Largo server. It is a quick and ugly
>> implementation.
>> The clean and much better solution is to move away from directory
>> based invocation and partition by files on the fly.
>
> Yeah, I also thought of keeping the per-directory/filetype processing,
> but instead of running multiple dirs at once, rather divide the set of
> files of a given dir into <number of workers> chunks.
>
>> I have a
>> proof-of-concept somewhere on my machine and will push a working
>> version during the next days.
>
> nice :-)



So a working version is currently running on the VM. The version in
the repo will be updated as soon as the script finishes without
problems. It now parallelizes nearly perfectly, as it divides the work
into 100-file chunks and processes them independently. This means that
after the last update of the test files we have 641 jobs that are put
into a queue, and we process as many jobs in parallel as we want (5 on
the VM at the moment).
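
To illustrate the idea (this is only a rough sketch, not the script from
the repo), the chunking and the job queue could look roughly like the
following. The names make_chunks, process_chunk and run_jobs are made up
for this example, and process_chunk is just a placeholder for whatever a
real job does with its files. Only the chunk size of 100 and the 5
parallel jobs are taken from the description above.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 100   # files per job, as described above
MAX_WORKERS = 5    # jobs processed in parallel on the VM


def make_chunks(files, size=CHUNK_SIZE):
    """Split a list of file paths into consecutive chunks of at most `size`."""
    return [files[i:i + size] for i in range(0, len(files), size)]


def process_chunk(chunk):
    """Hypothetical placeholder for one job.

    A real job would load every file in the chunk; here we only
    return the number of files so the sketch stays self-contained.
    """
    return len(chunk)


def run_jobs(files, workers=MAX_WORKERS):
    """Queue one job per chunk and run at most `workers` jobs at once."""
    jobs = make_chunks(files)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps the results in the same order as the jobs
        return list(pool.map(process_chunk, jobs))


if __name__ == "__main__":
    test_files = ["doc%d.odt" % i for i in range(250)]
    print(run_jobs(test_files))
```

Because each job is independent of the directory layout, a directory with
many files no longer blocks a single worker for the whole run.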

Additionally, the updated version of the script no longer hard-codes a
mapping from file extension to component and instead queries
LibreOffice to see which component opened the file. That allows us to
remove quite a few mappings and results in all file types being
imported. The old version only imported file types that were
registered in that mapping.
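
One possible way to ask a loaded document which component it belongs to
is to check the UNO services it supports. The sketch below is an
assumption about how this could be done, not necessarily what the script
does: the service names are the standard document services, but the
function name detect_component, the short component labels and the
order of the checks are chosen only for this example. FakeDocument is a
stand-in so the example runs without an office instance.

```python
# (UNO service name, component label) pairs, checked in order.
# Presentation is checked before drawing as a precaution (assumption).
SERVICE_TO_COMPONENT = [
    ("com.sun.star.text.TextDocument", "writer"),
    ("com.sun.star.sheet.SpreadsheetDocument", "calc"),
    ("com.sun.star.presentation.PresentationDocument", "impress"),
    ("com.sun.star.drawing.DrawingDocument", "draw"),
    ("com.sun.star.formula.FormulaProperties", "math"),
]


def detect_component(document):
    """Return a component label for a loaded document, or None.

    `document` only needs a supportsService(name) method, which UNO
    document objects provide through the XServiceInfo interface.
    """
    for service, component in SERVICE_TO_COMPONENT:
        if document.supportsService(service):
            return component
    return None


class FakeDocument:
    """Stand-in for a loaded document, for demonstration only."""

    def __init__(self, services):
        self._services = set(services)

    def supportsService(self, name):
        return name in self._services


if __name__ == "__main__":
    doc = FakeDocument(["com.sun.star.sheet.SpreadsheetDocument"])
    print(detect_component(doc))
```

With a check like this, the file extension no longer matters: whatever
LibreOffice manages to open is assigned to the component that opened it.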

The new script should scale nearly perfectly. There are still a few
enhancements on my list, so if anyone is interested in Python tasks,
please talk to me.

Regards,
Markus
