

Hey,

On Fri, Oct 31, 2014 at 2:23 PM, Christian Lohmaier
<lohmaier@googlemail.com> wrote:
Hi *,

On Thu, Oct 30, 2014 at 5:39 PM, Michael Meeks
<michael.meeks@collabora.com> wrote:

* Crashtest futures / automated test scripts (Markus)
    + call on Tuesday; new testing hardware.
    + result - get a Manitu server & leave room in the budget for
      ondemand Amazon instances (with spot pricing) if there is
      special need at some point.
[...]

When I played with the crashtest setup I noticed some limitations in
its current layout that prevent simply throwing lots of cores / high
parallelism at it to get faster results.

The problem is that the run is parallelized per directory, but the
number of files per directory is not evenly distributed at all. So when
the script decides to start the odt tests last, the whole set of odt
files is tested in a single thread, leaving the other CPU cores idling
with nothing to do.

I did add a sorting statement to the script so that it starts with the
directories containing the most files [1], but even with that you run
into the problem that towards the end of the test run not all cores are
used. As the AMD Opterons in the Manitu machines are less capable
per CPU, this sets a limit on how much you can accelerate the run by
just assigning more cores to it.
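
To put a rough number on that limit, here is a back-of-the-envelope
simulation with made-up directory sizes (one huge directory plus many
small ones): with per-directory scheduling the run can never finish
faster than the single largest directory takes in one thread, no matter
how many cores are added.

import heapq

def makespan(dir_sizes, cores):
    # greedy largest-first scheduling: each directory goes to the
    # currently least-loaded core; result is the busiest core's load
    loads = [0] * cores
    heapq.heapify(loads)
    for size in sorted(dir_sizes, reverse=True):
        heapq.heappush(loads, heapq.heappop(loads) + size)
    return max(loads)

sizes = [10500] + [500] * 40   # illustrative only, not the real distribution
for cores in (2, 4, 16, 64):
    print(cores, makespan(sizes, cores))
# going from 2 to 4 cores helps, anything beyond that stays stuck at 10500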

I didn't look into the overall setup to know whether just segmenting the
large directories into smaller ones is easy to do or not (i.e. instead
of having one odt dir with 10500+ files, have 20 with ~500 files each).

ciao
Christian

[1] I added a sorted statement that uses the number of files in a
directory as the sort key:

import os

def get_numfiles(directory):
    # number of entries in the given directory
    return len(os.listdir(directory))

def get_directories():
    d = '.'
    # all subdirectories of the current directory, sorted largest first
    directories = [o for o in os.listdir(d) if os.path.isdir(os.path.join(d, o))]
    return sorted(directories, key=get_numfiles, reverse=True)


This is currently a known limitation, but there are two solutions to the problem:

The quick and ugly one is to partition the directories into 100-file
directories. I have a script for that, as I did exactly this for the
memcheck run on the 70-core Largo server. It is a quick and ugly
implementation.
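
For illustration, a minimal sketch of what such a partitioning step
could look like (this is not the actual script used on Largo; the helper
name and the chunk naming scheme are made up here):

import os
import shutil

def partition_directory(src, chunk_size=100):
    # split a flat directory into sibling directories of at most
    # chunk_size files each (src-000, src-001, ...)
    files = sorted(f for f in os.listdir(src)
                   if os.path.isfile(os.path.join(src, f)))
    for i in range(0, len(files), chunk_size):
        chunk_dir = '%s-%03d' % (src, i // chunk_size)
        os.makedirs(chunk_dir, exist_ok=True)
        for name in files[i:i + chunk_size]:
            shutil.move(os.path.join(src, name),
                        os.path.join(chunk_dir, name))

# e.g. partition_directory('odt') would turn one 10500-file odt directory
# into roughly 105 directories of 100 files each
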
The clean and much better solution is to move away from directory-based
invocation and partition by files on the fly. I have a proof of concept
somewhere on my machine and will push a working version during the next
days. This would even save us about half a day on our current setup, as
ods and odt are normally the last two and run for about half a day
longer than the rest of the script.
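
The general idea of handing out individual files on the fly could look
roughly like this; test_file is just a placeholder for the real per-file
conversion test, so treat this as a sketch rather than the actual
implementation:

import os
from multiprocessing import Pool

def collect_files(root='.'):
    # flatten all test documents into one list, regardless of directory
    paths = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return paths

def test_file(path):
    # placeholder for the real per-file import/export test
    print('testing %s' % path)

if __name__ == '__main__':
    # workers pull single files as they become free, so no core idles
    # just because one directory (odt/ods) happens to be huge
    with Pool(processes=os.cpu_count()) as pool:
        pool.map(test_file, collect_files(), chunksize=1)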

With both solutions this scales perfectly. We have already tested it on
the Largo server, where I was able to keep a load of 70 for exactly a
week (with memcheck, but that only affects the overall runtime).

Regards,
Markus
