At FOSDEM'11 there was some mumbling about LO's regression tests taking two days to run due to some timeout kludgery necessitated by occasional hangs at exit due to possible threading bugs. Or something like that. I can't remember exactly.

Recently I've been improving Valgrind's Helgrind tool a bit, and I thought I'd try it on a simple startup/exit of LO, to see what happened. It reports a whole bunch of lock order violations (potential deadlocks) during both startup and shutdown, ending up with a thread unlocking a not-locked lock, which doesn't sound good.

One thing I expected to see a lot of was false reports of races due to release methods in thread-safe reference-counted classes. Helgrind doesn't understand the implications of a 1 -> 0 refcount transition in a release method -- that the calling thread is now the only owner, and so can run the destructor without locking -- and requires that such methods have a couple of lines of annotation explaining this. However, I didn't see any races resulting from lack of such annotations, which surprised me. Surely some part of LO uses thread-safe refcounted classes?

A bzip2'd text file containing the actual reports is attached. It also contains details of how to reproduce them.

I don't have time to chase these myself. But I am happy to provide guidance on the most effective use of Helgrind, if anyone else is interested in chasing them.

J
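
P.S. For anyone who hasn't seen the release-method annotations mentioned above, here is a minimal sketch of what they might look like. It is not taken from LO: the class RefCounted, the counter g_destroyed and the helper demo_release_from_two_threads() are hypothetical names made up for illustration. The ANNOTATE_* macros are the ones from valgrind/helgrind.h; if that header isn't installed, the sketch falls back to no-op definitions so that it still builds.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Use the real Helgrind client requests when the header is available;
// otherwise define no-op stand-ins so the sketch still compiles.
#if defined(__has_include)
#  if __has_include(<valgrind/helgrind.h>)
#    include <valgrind/helgrind.h>
#  endif
#endif
#ifndef ANNOTATE_HAPPENS_BEFORE
#  define ANNOTATE_HAPPENS_BEFORE(obj) ((void)(obj))
#  define ANNOTATE_HAPPENS_AFTER(obj) ((void)(obj))
#  define ANNOTATE_HAPPENS_BEFORE_FORGET_ALL(obj) ((void)(obj))
#endif

// Hypothetical counter, used only to observe how often the destructor runs.
static std::atomic<int> g_destroyed{0};

// Hypothetical thread-safe refcounted class (not an LO class).
class RefCounted {
public:
    RefCounted() : refs_(1) {}

    void acquire() { refs_.fetch_add(1, std::memory_order_relaxed); }

    void release() {
        // Everything this thread did to the object so far "happens before"
        // whatever the final releaser does next.
        ANNOTATE_HAPPENS_BEFORE(this);
        int prev = refs_.fetch_sub(1, std::memory_order_acq_rel);
        assert(prev > 0);
        if (prev == 1) {
            // 1 -> 0 transition: this thread is now the sole owner, so the
            // destructor may touch the object without taking any lock.
            ANNOTATE_HAPPENS_AFTER(this);
            // Drop the recorded edges so a later object that reuses this
            // address does not inherit them.
            ANNOTATE_HAPPENS_BEFORE_FORGET_ALL(this);
            delete this;
        }
    }

    int payload = 0;

private:
    ~RefCounted() { ++g_destroyed; }
    std::atomic<int> refs_;
};

// Hypothetical driver: two threads each drop one reference and the
// function reports how many times the destructor ran.
int demo_release_from_two_threads() {
    g_destroyed = 0;
    RefCounted* obj = new RefCounted;  // refcount is 1
    obj->acquire();                    // refcount is 2
    std::thread a([obj] { obj->payload = 1; obj->release(); });
    std::thread b([obj] { obj->release(); });
    a.join();
    b.join();
    return g_destroyed.load();
}
```

Whichever thread performs the last release runs the destructor, and the annotations tell Helgrind that this access is ordered after the other thread's earlier accesses. How Helgrind treats the atomic operations themselves depends on the Valgrind version and the platform, so take this as a starting point rather than a recipe.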
Attachment:
helgrind-results-for-LO-1.txt.bz2
Description: application/bzip