On 28/05/2020 22:19, Stephan Bergmann wrote:
Now, <https://git.libreoffice.org/lode/+/92c9372417f883781471bade5e703518bd1cd5c6%5E%21> "Incorporate timeout-on-idle into kill-wrapper, renaming to timeout-kill-wrapper" and its follow-up <https://git.libreoffice.org/lode/+/4d6d63299fea804ed7cdf63dde46922ed81b4e8a%5E%21> "Simplify transition from old kill-wrapper to new timeout kill-wrapper" fix that by moving the timeout handling from Jenkins into lode's bin/kill-wrapper. (It now accepts an optional second argument that specifies a stdout/stderr inactivity timeout in seconds; once that much time passes without any output, the pstree output is generated and the process tree is killed. Omitting the argument or setting it to zero disables the timeout logic.)
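For readers unfamiliar with the idea, an inactivity timeout of this kind can be sketched roughly as follows. This is not lode's actual bin/kill-wrapper; it is a hypothetical Python illustration, and the function name `run_with_idle_timeout` is made up. It assumes that killing the child's process group is a good enough approximation of "the process tree", and that `pstree` (from psmisc) may or may not be installed.

```python
import codecs
import os
import selectors
import signal
import subprocess
import sys


def run_with_idle_timeout(argv, idle_timeout=0):
    """Hypothetical sketch: run argv, echo its merged stdout/stderr, and kill
    its process group if it prints nothing for idle_timeout seconds.
    An idle_timeout of 0 disables the check."""
    # start_new_session=True makes the child a process-group leader, so
    # os.killpg() also reaches descendants that stay in that group.
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, start_new_session=True)
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    sel = selectors.DefaultSelector()
    sel.register(proc.stdout, selectors.EVENT_READ)
    wait_for = idle_timeout if idle_timeout > 0 else None  # None = no limit
    try:
        while True:
            if not sel.select(wait_for):
                # Silent for too long: show the process tree, then kill it.
                try:
                    subprocess.run(["pstree", "-p", str(proc.pid)], check=False)
                except FileNotFoundError:
                    print("(pstree not available)")
                try:
                    os.killpg(proc.pid, signal.SIGKILL)
                except ProcessLookupError:
                    pass  # the group exited in the meantime
                break
            chunk = os.read(proc.stdout.fileno(), 4096)
            if not chunk:  # EOF: the child closed its output
                sys.stdout.write(decoder.decode(b"", final=True))
                break
            sys.stdout.write(decoder.decode(chunk))
            sys.stdout.flush()
    finally:
        sel.close()
        proc.stdout.close()
    return proc.wait()
```

Killing the process group is a simplification: descendants that call setsid() themselves escape it, and a real wrapper would also have to clean up such leftovers.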
With <https://git.libreoffice.org/lode/+/755d10a73be251390d6512c65e93d249c60b0ba1%5E%21> "Print backtraces of leftover processes from kill-wrapper" (and its follow-up <https://git.libreoffice.org/lode/+/1a7b6d021a4ab7118c588fa85f0f9e6c0ee28c85%5E%21> "Don't generate full backtraces, for performance reasons"), we now get backtraces of hung processes for those Jenkins jobs that use kill-wrapper.
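One plausible way to get such backtraces is to attach gdb to each leftover process in batch mode. The sketch below is an assumption, not what kill-wrapper necessarily does; the helper names are hypothetical. It also shows the trade-off behind the follow-up commit as we read it: `bt full` additionally dumps local variables of every frame, which can be considerably slower on large debug builds than a plain `bt`.

```python
import shutil
import subprocess


def backtrace_command(pid, full=False):
    """Hypothetical helper: gdb command line that attaches to pid and prints
    a backtrace of every thread ('bt full' also prints local variables)."""
    bt = "bt full" if full else "bt"
    return ["gdb", "-p", str(pid), "-batch", "-ex", f"thread apply all {bt}"]


def print_backtraces(pids, full=False):
    """Hypothetical helper: print backtraces of the given leftover processes."""
    if shutil.which("gdb") is None:
        print("gdb not found; skipping backtraces")
        return
    for pid in pids:
        # Attaching needs ptrace permission (see kernel.yama.ptrace_scope).
        subprocess.run(backtrace_command(pid, full), check=False)
```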
And it paid off instantly: as soon as I had updated lode on the TDF tb7X Linux machines, <https://ci.libreoffice.org/job/gerrit_linux_clang_dbgutil/62022/consoleFull#133533548648ce9c26-9d0a-43a8-83d8-c44f54920d59> caught one of those dreaded sporadic deadlocks in UITest_writer_dialogs and printed the long-sought information. (The fix is at <https://gerrit.libreoffice.org/c/core/+/96712> "Don't call out to UNO with SolarMutex locked".)