Re: help needed: hsqldb shutdown race condition, how to fix without deadlock

Lionel Elie Mamane <lionel -AT- mamane.lu>
Wed, 3 Jun 2015 17:31:37 +0200

On Wed, Jun 03, 2015 at 03:01:39PM +0200, Michael Stahl wrote:

On 03.06.2015 14:47, Lionel Elie Mamane wrote:

Since the problem is essentially that the two threads take the same mutexes
in different order, here is a dirty hack that forces the hsqldb thread
to take the "affine bridge" mutex before taking the "HSQL driver"
mutex.

i'm not entirely sure this is sensible...


Thinking the issue from yet another angle, if we go back to the
situation at the start of this thread (hsqldb is *not* affine). I
don't understand why the
connectivity::hsqldb::ODriverDelegator::disposing thread holds the
affine lock, and goes through the affine bridge:

#6  0x00002b4ff40b8342 in connectivity::hsqldb::ODriverDelegator::disposing (this=0x1ca1cf0, 
Source=...)
    at 
/home/master/src/libreoffice/workdirs/libreoffice-5-1/connectivity/source/drivers/hsqldb/HDriver.cxx:566
(...)
#13 0x00002b4ff45f5f4b in AffineBridge::outerDispatch (this=this@entry=0x1ca77b0, loop=loop@entry=1)
    at 
/home/master/src/libreoffice/workdirs/libreoffice-5-1/cppu/source/AffineBridge/AffineBridge.cxx:201
#14 0x00002b4ff45f6186 in AffineBridge::v_callInto_v (this=0x1ca77b0, pCallee=0x2b4fd23f9dda 
<s_pull(va_list*)>,
    pParam=0x2b4fecb5d3c0)
    at 
/home/master/src/libreoffice/workdirs/libreoffice-5-1/cppu/source/AffineBridge/AffineBridge.cxx:270
(...)
#26 0x00002b4ff47fdccc in Proxy::dispatch (this=this@entry=0x1cb3ac0,
    pReturnTypeRef=pReturnTypeRef@entry=0x8ce880, pParams=pParams@entry=0x20789e0, 
nParams=nParams@entry=0,
    pMemberType=pMemberType@entry=0x236b4c0, pReturn=pReturn@entry=0x2b4fecb5dd90, 
pArgs=0x2b4fecb5d9a0,
    ppException=0x2b4fecb5da30)
    at 
/home/master/src/libreoffice/workdirs/libreoffice-5-1/cppu/source/helper/purpenv/helper_purpenv_Proxy.cxx:445
#27 0x00002b4ff47fcc7d in s_Proxy_dispatch (pUnoI=0x1cb3ac0, pMemberType=0x236b4c0, 
pReturn=0x2b4fecb5dd90,
    pArgs=0x2b4fecb5d9a0, ppException=0x2b4fecb5da30)
    at 
/home/master/src/libreoffice/workdirs/libreoffice-5-1/cppu/source/helper/purpenv/helper_purpenv_Proxy.cxx:170
(...)
#32 0x00002b4fe831ac12 in privateSnippetExecutor ()
   from /home/master/src/libreoffice/workdirs/libreoffice-5-1/instdir/program/libgcc3_uno.so
#33 0x00002b4fecd6e866 in dbaccess::OConnection::disposing (this=0x2058670)
    at 
/home/master/src/libreoffice/workdirs/libreoffice-5-1/dbaccess/source/core/dataaccess/connection.cxx:492


Why does the call from dbaccess::OConnection::disposing into
connectivity::hsqldb::ODriverDelegator::disposing go through the
affine bridge?

a component is (in general) responsible for its own thread safety, and
should not make assumptions about how other components are implemented,
in particular wrt. locking.


The hsqldb module is in great part a wrapper around the jdbc module,
so I'm not too worried about hsqldb "knowing too much" about the jdbc
module.

Alternatively, is there any way to force the hsqldb module to run in
the same thread as the jdbc module?

this implies that if a component takes a lock, it must release the
lock before calling a method that could end up calling into a
different component, to avoid deadlock.


This sounds very restrictive. Essentially, you are saying that a
method from class A in component CA cannot protect itself from being
called "reentrantively" (or another method being called on the same
object) if it has to call a method of an object B from component CB
that is a member of class A:

class A
{
    private:

    mutex m_aMutex;
    B m_aB0;
    B m_aB1;

    public:

    void foo()
    {
       MutexGuard(m_aMutex);
       m_aB1 = m_aB0;
       m_aB0.non_const_method();
    }

    void bar()
    {
       MutexGuard(m_aMutex);
       m_aB0 = m_aB1;
       m_aB1.other_non_const_method();
    }
}

How do you suggest this is made thread-safe without holding m_aMutex?
foo() and bar() need to be mutually exclusive, else the results can
differ. Even assuming B::operator=, B::other_non_const_method and
B::non_const_method are thread-safe, one could have interleaved
execution of foo() and bar():

  m_aB0 = m_aB1;
  m_aB1 = m_aB0;
  m_aB0.non_const_method();
  m_aB1.other_non_const_method();

which is the same as:

  m_aB0 = m_aB1;
  m_aB0.non_const_method();
  m_aB1.other_non_const_method();


which gives a different result than:

  m_aB0 = m_aB1;
  m_aB0.non_const_method();
  m_aB1 = m_aB0;
  m_aB1.other_non_const_method();

or

  m_aB1 = m_aB0;
  m_aB1.other_non_const_method();
  m_aB0 = m_aB1;
  m_aB0.non_const_method();

secondly, those components which run inside the ":affine" purpose
environment can only be called from a single thread, so they do not
need a mutex of their own.


Yeah, hsqldb has no good reason to be affine, except to work around
the issue. Yes, its own mutex is redundant with being affine.

 there may be multiple re-entrant calls into the apartment, where
the affine component calls out into a different component (which is
bridged to a thread that is different from both the original calling
thread and the affine thread), which calls back agian into the
affine component, so a non-recursive mutex could deadlock but a
recursive mutex can not.


I don't think that is possible, since the affine bridge will allow
only one call (which is the issue at hand here). As long as that call
is not finished, the affine bridge will block:

void AffineBridge::v_callInto_v(uno_EnvCallee * pCallee, va_list * pParam)
{
    osl::MutexGuard guard(m_outerMutex); // only one thread at a time can call into
    (...)
}

I understand there is only one AffineBridge?

-- 
Lionel

Context

help needed: hsqldb shutdown race condition, how to fix without deadlock · Lionel Elie Mamane
- Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Noel Grandin
  - Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Lionel Elie Mamane
    - Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Lionel Elie Mamane
      - Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Lionel Elie Mamane
        
        Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Lionel Elie Mamane
        
        Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Michael Stahl
        
        Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Lionel Elie Mamane
        Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Robert Antoni Buj i Gelonch
        Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Robert Antoni Buj i Gelonch
        
        Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Michael Stahl
        Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Lionel Elie Mamane
        Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Michael Stahl
        Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Lionel Elie Mamane
        Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Michael Stahl
        
        Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Lionel Elie Mamane
        
        Re: help needed: hsqldb shutdown race condition, how to fix without deadlock · Lionel Elie Mamane

Privacy Policy | Impressum (Legal Info) | Copyright information: Unless otherwise specified, all text and images on this website are licensed under the Creative Commons Attribution-Share Alike 3.0 License. This does not include the source code of LibreOffice, which is licensed under the Mozilla Public License (MPLv2). "LibreOffice" and "The Document Foundation" are registered trademarks of their corresponding registered owners or are in actual use as trademarks in one or more countries. Their respective logos and icons are also subject to international copyright laws. Use thereof is explained in our trademark policy.