

I am still trying to track down why zombie processes sometimes survive on the (Linux) Jenkins build machines. Later, unrelated Jenkins builds on those machines then fail, because the leftover soffice.bin processes still hold onto named pipes that tests from the new builds want to create too.

One such recent case on tb79 was the aborted <https://ci.libreoffice.org/job/gerrit_linux_clang_dbgutil/49895/>. It left behind a zombie python.bin -> oosplash -> soffice.bin process tree executing UITest_calc_tests3. (Where presumably the soffice.bin process had deadlocked, which then caused the Jenkins

Build timed out (after 15 minutes). Marking the build as aborted.
Build was aborted
Finished: ABORTED

reaction. But once I noticed, the images of the involved processes had already been overwritten by later builds, so I couldn't use gdb to get backtraces.)

<https://ci.libreoffice.org/job/gerrit_linux_clang_dbgutil/49895/consoleFull> shows that some entity runs lode's tb_slave_wrapper as (the main) part of the build, see

[linux_clang_dbgutil_64] $ /bin/sh -xe /tmp/jenkins3389683698813990355.sh
+ /home/tdf/lode/bin/tb_slave_wrapper --real --mode=config --clean

That tb_slave_wrapper script contains

trap cleanup 1 2 3 6 15

cleanup()
{
  echo "Caught Signal ... killing everything...."
  # kill everything in same process group (pseudo-pid 0)
  kill -9 0
}

intended to kill all processes if the script itself receives any of SIGHUP/-INT/-QUIT/-ABRT/-TERM.
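
As a quick check of that mapping, here is a minimal sketch that asks the shell to translate the numbers from the trap line into signal names. The helper signal_names is hypothetical and not part of tb_slave_wrapper; it relies only on the standard kill -l.

```shell
#!/bin/sh
# Hypothetical helper (not in tb_slave_wrapper): translate signal numbers
# into their names using the standard "kill -l <number>" form.
signal_names() {
  names=
  for n in "$@"; do
    # Append each name, separated by a single space.
    names="$names${names:+ }$(kill -l "$n")"
  done
  echo "$names"
}

# The numbers used in the script's trap line.
signal_names 1 2 3 6 15
```

With bash or dash on Linux this should print HUP INT QUIT ABRT TERM. SIGKILL (9) is absent from the trap list, and could not be trapped even if it were listed.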

But how does the tb_slave_wrapper script get terminated by whatever entity starts it and prints out the

Build timed out (after 15 minutes). Marking the build as aborted.
Build was aborted
Finished: ABORTED

mentioned above? Could it be that the script itself gets killed with SIGKILL, so its cleanup() trap doesn't fire, and processes (indirectly) spawned from the script may stay alive?
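
The SIGKILL theory can at least be checked in isolation. Below is a minimal sketch that assumes nothing about how Jenkins actually stops the build: the function demo_signal and the tiny inline child script are hypothetical stand-ins for tb_slave_wrapper. The child installs a TERM trap, and the function reports whether that trap ran after sending the given signal.

```shell
#!/bin/sh
# Hypothetical demo (not taken from lode or Jenkins): start a child shell
# with a TERM trap, send it the given signal, and report whether the trap ran.
demo_signal() {
  sig=$1
  out=$(mktemp)
  # Child: install the trap, announce readiness, then idle like a long build.
  sh -c 'trap "echo cleanup ran; exit 0" TERM; echo ready; while :; do sleep 1; done' \
    >"$out" 2>/dev/null &
  pid=$!
  # Wait until the child has installed its trap before signalling it.
  until grep -q ready "$out"; do sleep 1; done
  kill -s "$sig" "$pid"
  wait "$pid" 2>/dev/null || true
  if grep -q 'cleanup ran' "$out"; then
    echo "trap fired"
  else
    echo "trap did not fire"
  fi
  rm -f "$out"
}

demo_signal TERM   # prints "trap fired"
demo_signal KILL   # prints "trap did not fire"
```

In the KILL case the child's in-flight sleep is also left running as an orphan for up to a second, which is a small-scale version of the leftover process tree described above.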

Interestingly, the output from the above

  echo "Caught Signal ... killing everything...."

doesn't show up anywhere in <https://ci.libreoffice.org/job/gerrit_linux_clang_dbgutil/49895/consoleFull> (supporting the theory that cleanup() doesn't run), while other output that apparently stems from similar echo/printf commands in that script does show up there, see

OS:
pwd:/home/tdf/lode/jenkins/workspace/lo_gerrit/Config/linux_clang_dbgutil_64
config mode : linux_clang_dbgutil_64
Taking configuration values from ./distro-configs/Jenkins/linux_clang_dbgutil_64

