

On 07/12/2016 09:32 AM, Norbert Thiebaud wrote:
> so I ran some numbers... after upgrading ccache
>
> clang+plugins+dbgutil  (time results in minutes: elapsed/user/system)
>
> cold: 33/840/50
> hot: 9/79/14
> no-op: 4/46/1
>
> clang+dbgutil (no plugins)
>
> cold: 26/605/46
> hot: 9/79/14
> no-op: 4/46/1
>
> gcc-dbgutil
>
> cold: 28/621/97
> hot: 9/79/14
> no-op: 4/45/1
>
> note: none of these include make check
>
> so the cost of the plugins on a full build is 7 minutes elapsed, ~240
> minutes cpu.
>
> ccache works fine... on the other hand any change in any of the
> plugins invalidates the cache...

Looking at the ccache documentation at <https://ccache.samba.org/manual.html#_configuration_settings> and the source code at <https://github.com/ccache/ccache/blob/master/ccache.c>:

ccache detects when clang is invoked so that it loads our compilerplugins/obj/plugin.so, and then includes information about both the clang executable itself and the plugin.so in its hash (which determines whether a cached object was built with the same toolchain as a newly requested object).

ccache supports several choices for what information about a toolchain entity (the clang executable itself and the plugin.so) goes into the hash. The default is compiler_check=mtime, which uses the entity's mtime and size.

Now imagine the Gerrit/Jenkins bot does three builds A, B, C in sequence, where A and C are based on the same revision of compilerplugins/, while B is based on a different revision. That means that compilerplugins/obj/plugin.so will be built anew for each of A, B, C, and will have a different mtime for A and C. That in turn means that /none/ of the objects cached during build A can be reused during build C, even if they are still in the cache after build B.
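The A, B, C scenario can be modelled with a toy cache. This is only a sketch of the idea, not ccache's actual hashing: the ToyCache class, the source file names, and the (mtime, size) values are made up for illustration, and the cache key is simply the pair (source, plugin fingerprint).

```python
class ToyCache:
    """Hypothetical stand-in for ccache: keys are (source, toolchain fingerprint)."""

    def __init__(self):
        self.entries = set()

    def build(self, sources, fingerprint):
        """Compile all sources; return how many were served from the cache."""
        hits = 0
        for src in sources:
            key = (src, fingerprint)
            if key in self.entries:
                hits += 1
            else:
                self.entries.add(key)  # cache miss: compile and store
        return hits


SOURCES = ["a.cxx", "b.cxx", "c.cxx"]

# Assumed (mtime, size) fingerprints of plugin.so. A and C are built from the
# same compilerplugins/ revision, so the size matches, but each rebuild gets
# a fresh mtime.
FINGERPRINTS = {"A": (1000, 4096), "B": (2000, 5120), "C": (3000, 4096)}


def run_builds():
    """Run builds A, B, C in order against one shared cache."""
    cache = ToyCache()
    return {name: cache.build(SOURCES, fp) for name, fp in FINGERPRINTS.items()}


if __name__ == "__main__":
    print(run_builds())
```

Under these assumptions every build reports zero hits, including C, although A compiled exactly the same sources with an equivalent plugin.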

Another ccache configuration option is compiler_check=content, which uses a hash of the entity's content instead of its mtime/size. If the bot's underlying toolchain produces sufficiently reproducible builds, so that compilerplugins/obj/plugin.so from builds A and C have identical content, then build C should be able to reuse the objects from build A that are still in the cache (given a large enough cache).
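To illustrate the difference between the two kinds of fingerprint, here is a minimal sketch that writes the same bytes to two files at different (faked) times. It assumes reproducible plugin builds, uses SHA-256 purely as a stand-in (ccache has its own internal hash), and all function and file names are hypothetical.

```python
import hashlib
import os
import tempfile


def mtime_fingerprint(path):
    """Fingerprint from mtime and size, in the spirit of compiler_check=mtime."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def content_fingerprint(path):
    """Fingerprint from file content, in the spirit of compiler_check=content."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def simulate_rebuild(directory, name, data, mtime):
    """Write a stand-in plugin.so and give it an explicit mtime (in seconds)."""
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(data)
    os.utime(path, (mtime, mtime))
    return path


def compare():
    """Return (mtime fingerprints equal, content fingerprints equal)."""
    with tempfile.TemporaryDirectory() as tmp:
        # Builds A and C produce byte-identical output at different times.
        a = simulate_rebuild(tmp, "plugin_A.so", b"same plugin bytes", 1_000_000)
        c = simulate_rebuild(tmp, "plugin_C.so", b"same plugin bytes", 2_000_000)
        same_mtime = mtime_fingerprint(a) == mtime_fingerprint(c)
        same_content = content_fingerprint(a) == content_fingerprint(c)
        return same_mtime, same_content


if __name__ == "__main__":
    print(compare())
```

Here the mtime-based fingerprints differ while the content-based ones match, which is the property that would let build C reuse objects cached by build A.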

compiler_check=content computes the hash of both the clang executable itself and the plugin.so for each ccache request. Should that turn out to slow things down too much compared to compiler_check=mtime, a third option would be compiler_check=string:X, which simply includes the string "X" for each entity (without looking at the entity's real characteristics at all). For each build done by the bot, that X could e.g. be the SHA1 of the latest git commit that modified compilerplugins/ (passed from the build to ccache via the CCACHE_COMPILERCHECK environment variable, which corresponds to the compiler_check configuration setting). (That would use the same "X" for both the clang executable itself and the plugin.so, which would be fine assuming that the clang executable itself never changes anyway, at least not in a way that necessarily requires rebuilds. At worst, the bot's admin would need to clear the ccache when installing a new version of Clang.) This option would also work if compiler_check=content does not, because the bot's builds of compilerplugins/obj/plugin.so turn out not to produce exactly the same content.
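A rough sketch of how the bot could derive X and hand it to ccache follows. The helper names are hypothetical, and the script assumes it runs with a git checkout of the LO repo available; the only ccache-specific piece is the CCACHE_COMPILERCHECK variable mentioned above.

```python
import os
import subprocess


def latest_plugins_commit(repo_dir, subdir="compilerplugins"):
    """Return the SHA-1 of the latest commit that touched subdir."""
    result = subprocess.run(
        ["git", "log", "-1", "--format=%H", "--", subdir],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def ccache_env(commit_sha, base_env=None):
    """Copy the environment and set CCACHE_COMPILERCHECK to string:<sha>."""
    env = dict(os.environ if base_env is None else base_env)
    env["CCACHE_COMPILERCHECK"] = "string:" + commit_sha
    return env
```

The bot's build step could then run something like subprocess.run(["make"], env=ccache_env(latest_plugins_commit("."))), so that all builds based on the same compilerplugins/ revision share one fingerprint.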

> I've enabled an additional build for gerrit doing clang + plugins on linux
> we will see how that performs on average.
> preliminary observation is that there is way too much churn in the
> plugins for this to be viable at this time....

There are generally two ways to address bot performance issues caused by changes to compilerplugins/: One, make the builds faster. Two, make changes to compilerplugins/ less frequent.

For one, one option should be to make ccache more effective by using compiler_check=content or compiler_check=string:X as described above (and potentially also increasing the ccache size if necessary and possible). Another option might be to just throw more computing power at the problem.

For two, two options have been discussed so far: either restrict commits to compilerplugins/ to certain points in time (so that they come in in batches), or break compilerplugins/ out into its own git repo.

The main problem I see with the first choice is that new plugins, or substantial changes to existing ones, typically also require changes to very many files across the "real" LO code base. Doing such changes on a branch (so that it can be merged later, at the next point when commits to compilerplugins/ are allowed) would thus likely result in large-scale merge conflicts. The solution might be to commit the resulting changes to the "real" LO code base directly, and only hold back the compilerplugins/ changes themselves on a branch.

The idea behind the second choice is to periodically update and rebuild the bot's compilerplugins repo, so that sequences of LO builds on the bot are guaranteed to all use the same plugin.so and be able to reuse cached objects across those builds. However, that would mean that Gerrit changes based on a relatively old code base could be built against a newer version of the compilerplugins repo and run into warnings/errors from new plugins, because the corresponding fixes across the LO code base are not yet included in the revision the change is based on. So I think this approach isn't feasible. (Another problem would be that e.g. the name of a class from the LO code base can be whitelisted in one of the plugins, to suppress warnings/errors from that class. If the name of the class changes in the LO code base, the compilerplugins repo must be changed in sync.)


So my proposal would be as follows: First, check whether enabling compiler_check=content or the compiler_check=string:X setup (and increasing the ccache size if necessary and possible) gives good-enough performance. If not, restrict commits to compilerplugins/ to be less frequent, and see whether that increases the ccache hit rate and results in good-enough performance.
