On 07/12/2016 09:32 AM, Norbert Thiebaud wrote:
> so I ran some numbers... after upgrading ccache
>
> clang+plugins+dbgutil (time results in minutes: elapsed/user/system)
> cold: 33/840/50
> hot: 9/79/14
> no-op: 4/46/1
>
> clang+dbgutil (no plugins)
> cold: 26/605/46
> hot: 9/79/14
> no-op: 4/46/1
>
> gcc-dbgutil
> cold: 28/621/97
> hot: 9/79/14
> no-op: 4/45/1
>
> note: none of these include make check
>
> so the cost of the plugins on a full build is 7 minutes elapsed, ~240
> minutes cpu.
>
> ccache works fine... on the other hand any change in any of the
> plugins invalidates the cache...

Looking at the ccache documentation at
<https://ccache.samba.org/manual.html#_configuration_settings> and the
source code at <https://github.com/ccache/ccache/blob/master/ccache.c>:
ccache detects that clang is called in a way so that it uses our
compilerplugins/obj/plugin.so, and then includes information about both
the clang executable itself and the plugin.so in its hash (which
determines whether a cached object has been built with the same
toolchain as a newly requested object).
ccache offers several ways to choose what information about a
toolchain entity (the clang executable itself and the plugin.so) to
include in the hash. The default is compiler_check=mtime, which
includes the entity's mtime and size.
Now imagine the Gerrit/Jenkins bot does three builds A, B, C in
sequence, where A and C are based on the same revision of
compilerplugins/, while B is based on a different revision. That means
that compilerplugins/obj/plugin.so will be built anew for each of A, B,
C, and will have different mtime for A and C. That in turn means that
/any/ objects cached during build A will not be taken into account
during build C, even if they would still be in the cache after build B.
Another ccache configuration option is compiler_check=content, which
uses a hash of the entity's content instead of its mtime/size. If the
bot's underlying toolchain produces sufficiently reproducible builds, so
that compilerplugins/obj/plugin.so from builds A and C have identical
content, then build C should be able to reuse the objects from build A
that are still in the cache (given a large enough cache).
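
To make the A, B, C scenario concrete, here is a minimal Python sketch. It is not ccache's actual implementation: the fingerprint() and simulate() helpers, the fake plugin contents, and the forced timestamps are all assumptions made for illustration. It only mimics the two kinds of information described above (mtime plus size versus a hash of the content) and checks whether the fingerprint from build C matches the one from build A.

```python
import hashlib
import os
import tempfile


def fingerprint(path, mode):
    """Return a toolchain fingerprint for path (illustrative, not ccache's code)."""
    if mode == "mtime":
        st = os.stat(path)
        return f"{st.st_mtime_ns}:{st.st_size}"
    if mode == "content":
        with open(path, "rb") as f:
            return hashlib.sha1(f.read()).hexdigest()
    raise ValueError(f"unknown mode: {mode}")


def simulate(mode):
    """Rebuild a fake plugin.so for builds A, B, C; report whether C matches A."""
    # Hypothetical contents: A and C come from the same compilerplugins/ revision.
    revisions = {"A": b"plugins rev 1", "B": b"plugins rev 2", "C": b"plugins rev 1"}
    prints = {}
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "plugin.so")
        for i, (build, content) in enumerate(revisions.items()):
            with open(path, "wb") as f:
                f.write(content)
            # Force a distinct mtime per build, as a real rebuild would produce.
            stamp = (i + 1) * 10 * 10**9
            os.utime(path, ns=(stamp, stamp))
            prints[build] = fingerprint(path, mode)
    return prints["A"] == prints["C"]


if __name__ == "__main__":
    for mode in ("mtime", "content"):
        print(mode, "-> C can reuse A's objects:", simulate(mode))
```

Under these assumptions, the mtime-based fingerprint differs between A and C even though the plugin is byte-for-byte identical, while the content-based one matches. That only holds on the real bot if the plugin.so builds are in fact reproducible.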
compiler_check=content computes the hash of both the clang executable
itself and the plugin.so for each ccache request. Should that turn out
to slow things down too much compared to compiler_check=mtime, a third
option would be compiler_check=string:X, which simply includes the
information "X" for each entity (without looking at the entity's real
characteristics at all). For each build done by the bot, that X could
e.g. be determined as the SHA1 of the latest git commit that modified
compilerplugins/ (and passed from the build to ccache via the
CCACHE_COMPILERCHECK environment variable corresponding to the
compiler_check configuration setting). (That would use the same "X"
when determining the characteristics of both the clang executable itself
and the plugin.so, which would be fine assuming that the clang
executable itself never changes anyway, at least not in a way that
necessarily requires rebuilds. At worst, the ccache would need to be
cleaned by the bot's admin when installing a new version of Clang.)
This option would also work if compiler_check=content does not,
because the bot's builds of compilerplugins/obj/plugin.so turn out
not to produce exactly the same content.
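
A rough sketch of how the bot could derive that X, written in Python. The helper names plugins_revision() and ccache_env() are made up for illustration; the only parts taken from the description above are the CCACHE_COMPILERCHECK variable, the string:X form, and the idea of using the latest commit that modified compilerplugins/. How the bot actually launches its builds is an assumption.

```python
import os
import subprocess


def plugins_revision(repo_dir, subdir="compilerplugins"):
    """Return the SHA-1 of the latest commit that touched subdir."""
    result = subprocess.run(
        ["git", "log", "-1", "--format=%H", "--", subdir],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def ccache_env(revision, base_env=None):
    """Return a copy of the environment with CCACHE_COMPILERCHECK set."""
    env = dict(os.environ if base_env is None else base_env)
    # Same "X" for both clang and plugin.so, as discussed above.
    env["CCACHE_COMPILERCHECK"] = f"string:{revision}"
    return env
```

The bot would then run the build with that environment, for example by passing env=ccache_env(plugins_revision(".")) to subprocess.run(). Note that plugins_revision() returns an empty string if no commit has ever touched the directory, which a real setup should treat as an error.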

> I've enabled an additional build for gerrit doing clang + plugins on linux
> we will see how that performs on average.
> preliminary observation is that there is way too much churn in the
> plugins for this to be viable at this time....

There are generally two ways to address bot performance issues caused by
changes to compilerplugins/: One, make the builds faster. Two, make
changes to compilerplugins/ less frequent.
For one, one option should be to make ccache more effective by using
compiler_check=content or compiler_check=string:X as described above
(and potentially also increasing the ccache size if necessary and
possible). Another option might be to just throw more computing power
at the problem.
For two, two options have been discussed so far: Either restrict
commits to compilerplugins/ to certain points in time (when they come in
in batches). Or break compilerplugins/ out into its own git repo.
The main problem I see with the first choice is that new plugins, or
substantial changes to existing ones, typically also require changes to
very many files across the "real" LO code base. Doing such changes on a
branch (so that it can be merged later, at the next point when commits
to compilerplugins/ are allowed), would thus likely result in
large-scale merge conflicts. The solution might be to commit the
resulting changes to the "real" LO code base directly, and only hold
off the compilerplugins/ changes themselves on a branch.
The idea behind the second choice is to periodically update and rebuild
the bot's compilerplugins repo, so that sequences of LO builds on the
bot are guaranteed to all use the same plugin.so and be able to reuse
cached objects across those builds. However, that would mean that
Gerrit changes based on a relatively old code base could be built
against a newer version of the compilerplugins repo, and so run into
warnings/errors from new plugins whose fixes across the LO code base
are not yet included in the code base revision the change is based
on. So I think this approach isn't feasible. (Another problem
would be that e.g. the name of a class from the LO code base can be
whitelisted in one of the plugins, to suppress warnings/errors from that
class. If the name of the class changes in the LO code base, the
compilerplugins repo must be changed in sync.)
So my proposal would be as follows: First, check whether enabling
compiler_check=content or the compiler_check=string:X setup (and
increasing the ccache size if necessary and possible) gives good-enough
performance. If not, restrict commits to compilerplugins/ to be less
frequent, and see whether that increases the ccache hit rate and results
in good-enough performance.