On 10/01/2012 07:40 PM, Riccardo Magliocchetti wrote:
Il 01/10/2012 18:11, Michael Meeks ha scritto:On Fri, 2012-09-28 at 20:26 +0200, Riccardo Magliocchetti wrote:Here you go, you'll find the valgrind output sans all the zlib / python related errors. Thanks a lot!update to git 7a3a8a254363801bf6893b23d67f86a7461f8f3b and more or less same valgrind errors. See attached trace.It looks really interesting :-) it's hard to tell (from valgrind) where the threads were pre-empted [ actually logging that might be quite a nice feature to add to for valgrind ;-], but there is already the SolarMutex guard to try to stop this from happening. I append the odd trace with some thoughts. It is -possible- that we never get around to taking the solar mutex in the first instance in 'main' - such that by the time we come to release and re-take it in vcl/headless/svpinst.cxx: void SvpSalInstance::Yield( bool bWait, bool bHandleAllCurrentEvents ) the count is zero - could you add some fprintf debugging to check that - just dump nAcquireCount. If that is the case this patch might help: --- a/vcl/headless/svpinst.cxx +++ b/vcl/headless/svpinst.cxx @@ -171,6 +171,9 @@ bool SvpSalInstance::CheckTimeout( bool bExecuteTimers ) // timed out, update timeout m_aTimeout = aTimeOfDay; m_aTimeout += m_nTimeoutMS; + + osl::SolarGuard aGuard( mpSalYieldMutex ); + // notify ImplSVData* pSVData = ImplGetSVData(); if( pSVData->mpSalTimer ) If it does - it's not the real fix ;-) but it'd be good to know. Thanks for posting the trace - rather interesting :-)nAcquireCount is always 0 and the patch above let me run two unoconv testsuite runs in a row without valgrind logs.
It indeed looks like Application::Yield shall be routinely called with the solar mutex unlocked, and that timer callbacks called from within Yield shall be called with the solar mutex locked, so e.g. sal_gtk_timeout_dispatch (vcl/unx/gtk/app/gtkdata.cxx; when using the GTK VCL plug-in) explicitly locks the solar mutex before calling the timer callback in a stack like
#7 0x00007ffff32e940e in Timer::ImplTimerCallbackProc () at lo/core/vcl/source/app/timer.cxx:133 #8 0x00007fffe64e370b in SalTimer::CallCallback (this=0x11001d0) at lo/core/vcl/inc/saltimer.hxx:57 #9 0x00007fffe64e2d54 in sal_gtk_timeout_dispatch (pSource=0x12be2a0) at lo/core/vcl/unx/gtk/app/gtkdata.cxx:844 #10 0x0000003d8d647825 in g_main_dispatch (context=0x6597b0) at gmain.c:2539 #11 g_main_context_dispatch (context=context@entry=0x6597b0) at gmain.c:3075 #12 0x0000003d8d647b58 in g_main_context_iterate (context=context@entry=0x6597b0, block=block@entry=0, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3146 #13 0x0000003d8d647c14 in g_main_context_iteration (context=0x6597b0, may_block=0) at gmain.c:3207 #14 0x00007fffe64e207a in GtkData::Yield (this=0x6113a0, bWait=true, bHandleAllCurrentEvents=false) at lo/core/vcl/unx/gtk/app/gtkdata.cxx:582 #15 0x00007fffe64e5c38 in GtkInstance::Yield (this=0x611830, bWait=true, bHandleAllCurrentEvents=false) at lo/core/vcl/unx/gtk/app/gtkinst.cxx:563 #16 0x00007ffff32debcc in ImplYield (i_bWait=true, i_bAllEvents=false) at lo/core/vcl/source/app/svapp.cxx:425 #17 0x00007ffff32dafb7 in Application::Yield (i_bAllEvents=false) at lo/core/vcl/source/app/svapp.cxx:459 #18 0x00007ffff32daf58 in Application::Execute () at lo/core/vcl/source/app/svapp.cxx:404 #19 0x00007ffff7a8cda4 in desktop::Desktop::Main (this=0x7fffffffe430) at lo/core/desktop/source/app/app.cxx:1661 #20 0x00007ffff32e644b in ImplSVMain () at lo/core/vcl/source/app/svmain.cxx:162 #21 0x00007ffff32e6578 in SVMain () at lo/core/vcl/source/app/svmain.cxx:199 #22 0x00007ffff7acddc7 in soffice_main () at lo/core/desktop/source/app/sofficemain.cxx:74 #23 0x0000000000400948 in sal_main () at lo/core/desktop/source/app/main.c:48 #24 0x0000000000400929 in main (argc=1, argv=0x7fffffffe618) at lo/core/desktop/source/app/main.c:47
That means that <http://cgit.freedesktop.org/libreoffice/core/commit/?id=002f8b272bc343477c649c648c5ceb2ccf0fba89> "vcl/headless: [HACK] workaround solarMutexx splat with headless mode" looks like it is the right fix indeed (rather than a hack).
Now, the interesting thing is that the Linux-specific CreateSalInstance (called from ImplSVMain -> InitVCL before calling the application's Main in ImplSVMain) explicitly locks the solar mutex thanks to <http://cgit.freedesktop.org/libreoffice/core/commit/?id=852574f46f686a936a1b267e5780ca17d0f0d5ab> "INTEGRATION: CWS ause0c2 (1.3.18); FILE MERGED: 2004/03/12 15:25:43 hjs 1.3.18.1: #115868# anti freeze," while all the other variants of CreateSalInstance apparently do not (incl. the one in vcl/headless/headlessinst.cxx that is relevant when configured --enable-headless, as is the case for this mail thread). That means that all of desktop::Desktop::Main -> Application::Execute -> Application::Yield runs with the solar mutex locked on Linux (sans --enable-headless configuration) and with the solar mutex unlocked everywhere else.
The commit message for 852574f46f686a936a1b267e5780ca17d0f0d5ab is unfortunately rather unhelpful, but temporarily reverting it and doing "make check" on a Linux box starts to produce crashes within desktop::Desktop::Main -> Application::Execute -> Application::Yield -> ..., in code that apparently expects the solar mutex to be locked.
Stephan
==28574== Thread 1: ==28574== Invalid read of size 8 ==28574== at 0x739C702: SfxStateCache::SetState_Impl(unsigned short, SfxPoolItem const*, unsigned char) (statcach.cxx:454) ==28574== by 0x738042E: SfxBindings::UpdateControllers_Impl(SfxInterface const*, SfxFoundCache_Impl const*, SfxPoolItem const*, unsigned short) (bindings.cxx:1500) ==28574== by 0x7383C5B: SfxBindings::Update_Impl(SfxStateCache*) (bindings.cxx:437) ==28574== by 0x7383FD7: SfxBindings::NextJob_Impl(Timer*) (bindings.cxx:1646) ==28574== by 0x92BF4AD: Timer::ImplTimerCallbackProc() (timer.cxx:142) The YieldMutex -should- be held at this point excluding this other thread (?). ==28574== by 0x95D152D: SvpSalInstance::CheckTimeout(bool) (saltimer.hxx:57) ==28574== by 0x95D17D7: SvpSalInstance::Yield(bool, bool) (svpinst.cxx:288) ==28574== by 0x92B4530: Application::Yield(bool) (svapp.cxx:434) ==28574== by 0x92B45E6: Application::Execute() (svapp.cxx:413) ==28574== by 0x50AED58: desktop::Desktop::Main() (app.cxx:1711) ==28574== by 0x92BDFB8: ImplSVMain() (svmain.cxx:173) ==28574== by 0x92BE8E4: SVMain() (svmain.cxx:210) ==28574== by 0x50E1CF4: soffice_main (sofficemain.cxx:83) ==28574== by 0x40069A: main (main.c:25) ==28574== Address 0x1466bdc8 is 56 bytes inside a block of size 72 free'd ==28574== at 0x4C27FF2: operator delete(void*) (vg_replace_malloc.c:387) ==28574== by 0x737F479: SfxBindings::DeleteControllers_Impl() (bindings.cxx:325) ==28574== by 0x7380572: SfxBindings::~SfxBindings() (bindings.cxx:264) ==28574== by 0x7380718: SfxBindings::~SfxBindings() (bindings.cxx:275) ==28574== by 0x758B91E: SfxFrame::DoClose_Impl() (frame.cxx:178) ==28574== by 0x75AAFC1: SfxBaseController::dispose() (sfxbasecontroller.cxx:1041) sfx2/source/view/sfxbasecontroller.cxx: This method has a live: SolarMutexGuard aGuard; ==28574== by 0x16B8E785: framework::Frame::setComponent(com::sun::star::uno::Reference<com::sun::star::awt::XWindow> const&, com::sun::star::uno::Reference<com::sun::star::frame::XController> const&) (frame.cxx:1380) ==28574== by 0x16B8E1CF: framework::Frame::close(unsigned char) (frame.cxx:1633) ==28574== by 0x758BE87: SfxFrame::DoClose() (frame.cxx:140) ==28574== by 0x797F51C: SfxBroadcaster::Broadcast(SfxHint const&) (brdcst.cxx:49) ==28574== by 0x7511211: SfxModelListener_Impl::notifyClosing(com::sun::star::lang::EventObject const&) (objxtor.cxx:173) ==28574== by 0x7534972: SfxBaseModel::close(unsigned char) (sfxbasemodel.cxx:1491) ==28574== by 0x1E776909: SwXTextDocument::close(unsigned char) (unotxdoc.cxx:578) ==28574== by 0x7537FB0: SfxBaseModel::dispose() (sfxbasemodel.cxx:788) ==28574== by 0x1211D6E7: gcc3::callVirtualMethod(void*, unsigned int, void*, _typelib_TypeDescriptionReference*, bool, unsigned long*, unsigned int, unsigned long*, unsigned int, double*, unsigned int) (callvirtualmethod.cxx:128) ==28574== by 0x12120FEE: cpp_call(bridges::cpp_uno::shared::UnoInterfaceProxy*, bridges::cpp_uno::shared::VtableSlot, _typelib_TypeDescriptionReference*, int, _typelib_MethodParameter*, void*, void**, _uno_Any**) (uno2cpp.cxx:246) ==28574== by 0x12121ACD: bridges::cpp_uno::shared::unoInterfaceProxyDispatch(_uno_Interface*, _typelib_TypeDescription const*, void*, void**, _uno_Any**) (uno2cpp.cxx:440) ==28574== by 0x15D5CE5E: binaryurp::IncomingRequest::execute_throw(binaryurp::BinaryAny*, std::vector<binaryurp::BinaryAny, std::allocator<binaryurp::BinaryAny> >*) const (incomingrequest.cxx:251) ==28574== by 0x15D5D8AB: binaryurp::IncomingRequest::execute() const (incomingrequest.cxx:80) ==28574== by 0x15D61B4C: request (reader.cxx:89) ==28574== by 0x663A506: cppu_threadpool::JobQueue::enter(long, unsigned char) (jobqueue.cxx:115) ==28574== by 0x663B00D: cppu_threadpool::ORequestThread::run() (thread.cxx:179) ==28574== by 0x663B4E9: threadFunc (thread.hxx:196) ==28574== by 0x4E467AE: osl_thread_start_Impl (thread.c:261) ==28574== by 0x5E53EFB: start_thread (pthread_create.c:304) ==28574== by 0x5B8E89C: clone (clone.S:112)