Re: help needed: hsqldb shutdown race condition, how to fix without deadlock

Michael Stahl <mstahl -AT- redhat.com>
Wed, 03 Jun 2015 23:08:36 +0200

On 03.06.2015 17:31, Lionel Elie Mamane wrote:

On Wed, Jun 03, 2015 at 03:01:39PM +0200, Michael Stahl wrote:

On 03.06.2015 14:47, Lionel Elie Mamane wrote:

Since the problem is essentially that the two threads take the same mutexes
in different order, here is a dirty hack that forces the hsqldb thread
to take the "affine bridge" mutex before taking the "HSQL driver"
mutex.

i'm not entirely sure this is sensible...


Thinking the issue from yet another angle, if we go back to the
situation at the start of this thread (hsqldb is *not* affine). I
don't understand why the
connectivity::hsqldb::ODriverDelegator::disposing thread holds the
affine lock, and goes through the affine bridge:

#6  0x00002b4ff40b8342 in connectivity::hsqldb::ODriverDelegator::disposing (this=0x1ca1cf0, 
Source=...)
    at 
/home/master/src/libreoffice/workdirs/libreoffice-5-1/connectivity/source/drivers/hsqldb/HDriver.cxx:566
(...)
#13 0x00002b4ff45f5f4b in AffineBridge::outerDispatch (this=this@entry=0x1ca77b0, 
loop=loop@entry=1)
    at 
/home/master/src/libreoffice/workdirs/libreoffice-5-1/cppu/source/AffineBridge/AffineBridge.cxx:201
#14 0x00002b4ff45f6186 in AffineBridge::v_callInto_v (this=0x1ca77b0, pCallee=0x2b4fd23f9dda 
<s_pull(va_list*)>,
    pParam=0x2b4fecb5d3c0)
    at 
/home/master/src/libreoffice/workdirs/libreoffice-5-1/cppu/source/AffineBridge/AffineBridge.cxx:270
(...)
#26 0x00002b4ff47fdccc in Proxy::dispatch (this=this@entry=0x1cb3ac0,
    pReturnTypeRef=pReturnTypeRef@entry=0x8ce880, pParams=pParams@entry=0x20789e0, 
nParams=nParams@entry=0,
    pMemberType=pMemberType@entry=0x236b4c0, pReturn=pReturn@entry=0x2b4fecb5dd90, 
pArgs=0x2b4fecb5d9a0,
    ppException=0x2b4fecb5da30)
    at 
/home/master/src/libreoffice/workdirs/libreoffice-5-1/cppu/source/helper/purpenv/helper_purpenv_Proxy.cxx:445
#27 0x00002b4ff47fcc7d in s_Proxy_dispatch (pUnoI=0x1cb3ac0, pMemberType=0x236b4c0, 
pReturn=0x2b4fecb5dd90,
    pArgs=0x2b4fecb5d9a0, ppException=0x2b4fecb5da30)
    at 
/home/master/src/libreoffice/workdirs/libreoffice-5-1/cppu/source/helper/purpenv/helper_purpenv_Proxy.cxx:170
(...)
#32 0x00002b4fe831ac12 in privateSnippetExecutor ()
   from /home/master/src/libreoffice/workdirs/libreoffice-5-1/instdir/program/libgcc3_uno.so
#33 0x00002b4fecd6e866 in dbaccess::OConnection::disposing (this=0x2058670)
    at 
/home/master/src/libreoffice/workdirs/libreoffice-5-1/dbaccess/source/core/dataaccess/connection.cxx:492


Why does the call from dbaccess::OConnection::disposing into
connectivity::hsqldb::ODriverDelegator::disposing go through the
affine bridge?


because OConnection is outside the affine environment, so it has to go
through the bridge to call a component inside the affine environment.

however the ODriverDelegator isn't inside the affine environment, what
happens is that while AffineBridge::v_callInto_v() waits for the call on
the "inner" thread to return, it assigns the current (calling) thread as
the "outer" thread and in AffineBridge::outerDispatch() it waits for
outgoing calls from the "inner" thread.

(this is something i mis-remembered: i wrote that outgoing calls happen
on a thread different from the calling one but evidently that is not
always the case; a new "outer" thread may be launched but only if no
"outer" thread exists.)

the "inner" thread executes only calls against the affine component; if
the affine component does a UNO call to another component, the "inner"
thread stores some closure in the AffineBridge instance and signals the
"outer" thread, which will then do the call and once it's done signal
back to the "inner" thread.

because the thread locked the AffineBridge mutex, which is a recursive
mutex, it can call into the affine component again, so you can have
re-entrant calls in the affine component, but no other thread can call
into it.

a component is (in general) responsible for its own thread safety, and
should not make assumptions about how other components are implemented,
in particular wrt. locking.


The hsqldb module is in great part a wrapper around the jdbc module,
so I'm not too worried about hsqldb "knowing too much" about the jdbc
module.

Alternatively, is there any way to force the hsqldb module to run in
the same thread as the jdbc module?


that depends on how many AffineBridges there are...

this implies that if a component takes a lock, it must release the
lock before calling a method that could end up calling into a
different component, to avoid deadlock.


This sounds very restrictive. Essentially, you are saying that a
method from class A in component CA cannot protect itself from being
called "reentrantively" (or another method being called on the same
object) if it has to call a method of an object B from component CB
that is a member of class A:


yes, it is quite non-intuitive and likely violated in thousands of
places currently, even if we ignore the SolarMutex, where fixing this is
essentially impossible.

class A
{
    private:

    mutex m_aMutex;
    B m_aB0;
    B m_aB1;

    public:

    void foo()
    {
       MutexGuard(m_aMutex);
       m_aB1 = m_aB0;
       m_aB0.non_const_method();
    }

    void bar()
    {
       MutexGuard(m_aMutex);
       m_aB0 = m_aB1;
       m_aB1.other_non_const_method();
    }
}

How do you suggest this is made thread-safe without holding m_aMutex?
foo() and bar() need to be mutually exclusive, else the results can
differ. Even assuming B::operator=, B::other_non_const_method and
B::non_const_method are thread-safe, one could have interleaved
execution of foo() and bar():


usually what you need to do is copy member variables to local variables
on the stack

  void foo()
  {
    B tmp;
    {
       MutexGuard g;
       m_aB0 = m_aB1;
       tmp = m_aB0;
    }
    tmp.non_const_method();
  }

of course if the assignment and call must be atomic you're in trouble :-/

if you have parameters that are components you may need to do some silly
things too, e.g. look at commit 7db6b2f7968d8063b12312737136f09cb7549805
which fixed a reported deadlock.

 there may be multiple re-entrant calls into the apartment, where
the affine component calls out into a different component (which is
bridged to a thread that is different from both the original calling
thread and the affine thread), which calls back agian into the
affine component, so a non-recursive mutex could deadlock but a
recursive mutex can not.


I don't think that is possible, since the affine bridge will allow
only one call (which is the issue at hand here). As long as that call
is not finished, the affine bridge will block:

void AffineBridge::v_callInto_v(uno_EnvCallee * pCallee, va_list * pParam)
{
    osl::MutexGuard guard(m_outerMutex); // only one thread at a time can call into
    (...)
}


see discussion of "outer-thread" / "inner-thread" ping-pong above.

I understand there is only one AffineBridge?


not sure about that... it could be per-component-instance or per-factory
(i,e, per-library usually) or per-environment-type?

hmm... how about we solve the JDBC performance problem by, the first
time a Java component is created, the component-factory stuff posts a
message to the main thread via PostUserEvent with a handler that does a
"dummy" JNI call into some Java function that calls back via JNI into
another function that calls Application::Yield again; now JNI calls will
never *enter* the JVM on the main thread because it's always *in* the
JVM already - just needs cleaning up on quit, what could possibly go
wrong...

Context

help needed: hsqldb shutdown race condition, how to fix without deadlock · Lionel Elie Mamane
- Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Noel Grandin
  - Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Lionel Elie Mamane
    - Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Lionel Elie Mamane
      - Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Lionel Elie Mamane
        
        Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Lionel Elie Mamane
        
        Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Michael Stahl
        
        Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Lionel Elie Mamane
        Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Robert Antoni Buj i Gelonch
        Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Robert Antoni Buj i Gelonch
        
        Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Michael Stahl
        Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Lionel Elie Mamane
        Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Michael Stahl
        Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Lionel Elie Mamane
        Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Michael Stahl
        
        Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Lionel Elie Mamane
        
        Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Lionel Elie Mamane

Privacy Policy | Impressum (Legal Info) | Copyright information: Unless otherwise specified, all text and images on this website are licensed under the Creative Commons Attribution-Share Alike 3.0 License. This does not include the source code of LibreOffice, which is licensed under the Mozilla Public License (MPLv2). "LibreOffice" and "The Document Foundation" are registered trademarks of their corresponding registered owners or are in actual use as trademarks in one or more countries. Their respective logos and icons are also subject to international copyright laws. Use thereof is explained in our trademark policy.