

On Thu, Jan 22, 2015 at 10:02 AM, Michael Stahl <mstahl@redhat.com> wrote:

On 22.01.2015 15:31, Ashod Nakashian wrote:

It seems that filter-showIncludes.awk is writing the dependency file .d
to a .tmp file first, then renaming it to .d.
Considering that the input is piped from stdin and .d either doesn't
exist or will be overwritten (and is never an input), the wisdom of
writing to a temp file and then moving it is lost on me.

the point is to have an atomic update of the make target, i.e. if the
build is interrupted for whatever reason, prevent an incompletely
written .d file with up-to-date time stamp that will very likely break
the next build.
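The pattern Michael describes could be sketched roughly as follows (file
names and contents are made up for illustration; this is not the actual
filter-showIncludes.awk invocation):

```shell
deps_file="example.d"
tmp_file="${deps_file}.tmp"

# Write the dependency output to a temp file first ...
printf 'example.o: example.cxx example.hxx\n' > "$tmp_file"

# ... then rename it into place. rename(2) is atomic on POSIX
# filesystems, so make only ever sees either the old .d or the
# complete new one, never a half-written file with a fresh mtime.
mv "$tmp_file" "$deps_file"
```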


A complete build generates close to 9000 .d files in CxxObject alone.
Spawning processes and doing file I/O in the several thousands is bound
to take a toll.

sure but filter-showIncludes is only run when the C++ compiler is also
run on the file and that will take a lot longer than forking a "mv"
process.



I see.

My main issue is I/O contention. mv is a filesystem operation that is done
synchronously (ultimately the filesystem must take a system-wide lock).
On a powerful rig, I/O is the biggest bottleneck when building LO. If you
have enough CPU cores to compile dozens of files in parallel, but not
enough I/O ops, you'll stall the CPU a good deal of the time.
On my 6c/12t rig I could build LO from scratch with dbgutil and
parallelism=16 in 68-70 minutes at best. This was on a 256 GB Samsung 830.
Replace the 830 with the new 850 and what do you get?
Would you have guessed 34-35 minutes? In terms of bandwidth, the 850 isn't
much faster than the 830. But in I/O operations it's ~3x faster, especially
at high queue depths.

Spawning 2 processes and renaming a file is quick compared with compiling
the source in question, but it hinders parallelism on high-core machines.

The temp file solution seems to be the simplest solution to the atomic
update.
How about writing an EOF marker in .d files? We'd then update concat-deps
(that's the consumer of .d, right?) to consider the file corrupt/invalid if
it doesn't end with the marker.
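The marker idea could be sketched like this (the marker string and file
names are hypothetical; concat-deps itself is C code in the build system,
so this only illustrates the producer/consumer contract):

```shell
marker='# END-OF-DEPS'
deps_file="example.d"

# Producer: write the dependencies, then the marker as the last line.
{
  printf 'example.o: example.cxx example.hxx\n'
  printf '%s\n' "$marker"
} > "$deps_file"

# Consumer: treat the file as valid only if the marker is present,
# i.e. the write completed before any interruption.
if [ "$(tail -n 1 "$deps_file")" = "$marker" ]; then
  echo "valid"
else
  echo "truncated"
fi
```

An interrupted write would leave the file without its final marker line,
so the consumer can discard it instead of tripping over stale content.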

I know this is marginally more complex than mv .tmp .d, but so is the whole
business with generating and processing the .d files in the first place,
which serves to accelerate the build.
My numbers above show that there are significant gains to be had from
reducing I/O load during highly parallel builds.

Would the EOF marker solve the issue you raise? Would you agree it's worth
the effort (which I'm volunteering) to reduce I/O?
