

Following up on the email thread starting at <https://lists.freedesktop.org/archives/libreoffice/2019-December/084084.html> "How are Jenkins builds killed exactly?": <https://git.libreoffice.org/lode/+/bded43937c6efc82efc5924820a281c8a1ead5ba%5E%21> "kill-wrapper: pstree of hung processes" tried to improve the information provided for a hung and aborted Jenkins build. Typically, such a build is aborted because one or more tests hang, and it would be useful to at least learn which tests hung. To that end, that commit tried to print pstree output of any leftover processes, but it failed; see the comment at <https://gerrit.libreoffice.org/c/lode/+/91496/2#message-8e52d669f48a9edb5f183d1221164784059e8959> "kill-wrapper: pstree of hung processes" for details.

Now, <https://git.libreoffice.org/lode/+/92c9372417f883781471bade5e703518bd1cd5c6%5E%21> "Incorporate timeout-on-idle into kill-wrapper, renaming to timeout-kill-wrapper" and its follow-up <https://git.libreoffice.org/lode/+/4d6d63299fea804ed7cdf63dde46922ed81b4e8a%5E%21> "Simplify transition from old kill-wrapper to new timeout kill-wrapper" fix that by moving the timeout handling from Jenkins into lode's bin/kill-wrapper. The wrapper now accepts an optional second argument specifying a stdout/stderr inactivity timeout in seconds, after which the pstree output is generated and the process tree gets killed. Leaving the argument out or specifying it as zero disables that timeout logic.
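To illustrate the idea of an output-inactivity timeout, here is a minimal Python sketch. It is not the actual lode script. The function name run_with_idle_timeout, its return value, and the use of "pstree -p" and SIGKILL are assumptions made for illustration only. It runs a command in its own process group, forwards its combined stdout/stderr, and, if no output arrives within the given number of seconds, prints the process tree (when a pstree binary is available) and kills the whole group. A timeout of zero disables the check, mirroring the behavior described above.

```python
import os
import selectors
import shutil
import signal
import subprocess
import sys


def run_with_idle_timeout(cmd, idle_timeout=0):
    """Run cmd; return (exit_code, timed_out). Hypothetical helper, not lode's code."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # watch stdout and stderr together
        start_new_session=True,     # own process group, so the whole tree can be killed
    )
    sel = selectors.DefaultSelector()
    sel.register(proc.stdout, selectors.EVENT_READ)
    # Zero (or negative) means: wait for output indefinitely.
    timeout = idle_timeout if idle_timeout > 0 else None
    timed_out = False
    while True:
        if not sel.select(timeout):
            timed_out = True        # no output within the idle window
            break
        chunk = os.read(proc.stdout.fileno(), 4096)
        if not chunk:
            break                   # EOF: all writers closed the pipe
        sys.stdout.write(chunk.decode(errors="replace"))
        sys.stdout.flush()
    if timed_out:
        # Assumption: show the leftover process tree before killing it.
        if shutil.which("pstree"):
            subprocess.run(["pstree", "-p", str(proc.pid)], check=False)
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass                    # the group vanished in the meantime
    sel.close()
    proc.stdout.close()
    return proc.wait(), timed_out
```

For example, run_with_idle_timeout(["make", "check"], 1800) would give up after 30 minutes without any output, while passing 0 would just run the command to completion.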

For now, I have updated <https://ci.libreoffice.org/job/gerrit_linux_clang_dbgutil/> to use the new kill-wrapper timeout feature instead of Jenkins' "Abort the build if it's stuck" option. (I am planning to roll it out to other Linux Jenkins jobs that could benefit from it, once it has proven sufficiently stable.)

<https://ci.libreoffice.org/job/gerrit_linux_clang_dbgutil/60539/> is a live example of such an aborted Gerrit Jenkins job. One noticeable difference is that such a job is now marked as failed (red dot) rather than as aborted (gray dot). But a new "kill-wrapper" failure cause label (i.e., <https://ci.libreoffice.org/job/gerrit_linux_clang_dbgutil/failure-cause-management/48ce9c26-9d0a-43a8-83d8-c44f54920d59/>) should make the actual reason for the failure obvious. And the pstree output (<https://ci.libreoffice.org/job/gerrit_linux_clang_dbgutil/60539/consoleFull#147661240548ce9c26-9d0a-43a8-83d8-c44f54920d59>), while probably a bit overwhelming, should show that apparently all of UITest_calc_tests, UITest_calc_tests4, UITest_calc_tests7, UITest_chart, and UITest_demo_ui hung in this case. That should give at least a hint where to start local debugging...

