Hi Stephan,

On Fri, 2015-09-11 at 16:04 +0200, Stephan Bergmann wrote:
But I doubt we want to make our code base more capricious than 
necessary, to shield us from behavior exhibited by the Windows debugging 
environment.

        Ah ;-) well - I saw std::abort not aborting, and I added that to make
it actually - die ;-> you recall the discussion: _exit() was the
solution.

        I rather suspect that while the process is being debugged and the
dialog is up, other threads are making progress - anyhow - the
windows behavior is somewhat unusual here.

    Which thread would you expect the signal to be delivered
to (I wonder) - it's all a bit interesting I suspect.

The case should be pretty clear for a synchronous, std::abort-generated 
SIGABRT (hopefully even on Windows).

        I don't find much that's terribly clear about signal handling, and/or
the cross-thread synchronization mess that follows it around under the
covers =)

    My hope was that the watchdog would carry on working in these cases &
kill us again more aggressively if necessary if people insist on
ignoring these guys.

But how should it do that?  Even if the SIGABRT-handling were done on 
another thread, the watchdog thread just couldn't progress past the 
std::abort() (notwithstanding cheating in a debugging environment).

        Good point =) so best to start a new watchdog instance in the abort
handler then.

So there's only a single instance of the watchdog thread supposed to 
ever run.  The odd "static bool bFired" in OpenGLWatchdogThread::execute 
had fooled me into assuming otherwise (for why else should the variable have 
static storage duration).

        Ah - this was a reasonably harmless way to avoid using a variable in a
wider scope ;-) given that this class is a singleton.

Anyway, generalizing that "watchdog the OpenGLWatchdogThread, in case 
our signal handler gets stuck" idea obviously leads to a "watchdog our 
signal handler, in case it gets stuck" feature, i.e., spawn a thread 
early in our signal handler (assuming spawning an additional thread 
doesn't make our violation of what a signal handler is supposed to be 
allowed to do any worse), which will call _exit after a fixed amount of 
time.
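
As a rough illustration of the "watchdog our signal handler" idea, here is a minimal C++ sketch. HandlerWatchdog, crashHandler and the 30-second limit are hypothetical names and values, not taken from the LibreOffice code base, and std::_Exit stands in for the _exit mentioned above. Spawning a thread from a real signal handler is itself outside what a handler is allowed to do, so this only shows the shape of the idea.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <cstdlib>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical helper: runs onTimeout unless disarm() is called before the limit.
class HandlerWatchdog {
public:
    HandlerWatchdog(std::chrono::milliseconds limit, std::function<void()> onTimeout)
        : thread_([this, limit, onTimeout] {
              std::unique_lock<std::mutex> lock(mutex_);
              // wait_for returns false if the limit passed while still armed.
              if (!cond_.wait_for(lock, limit, [this] { return disarmed_; }))
                  onTimeout();
          }) {}

    // Disarm and wait for the watchdog thread to finish.
    ~HandlerWatchdog() {
        disarm();
        if (thread_.joinable())
            thread_.join();
    }

    // Called when the guarded work finished in time.
    void disarm() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            disarmed_ = true;
        }
        cond_.notify_all();
    }

private:
    // Declared before thread_ so they exist when the thread starts running.
    std::mutex mutex_;
    std::condition_variable cond_;
    bool disarmed_ = false;
    std::thread thread_;
};

// Hypothetical handler: arm the watchdog first, then attempt the risky recovery.
extern "C" void crashHandler(int) {
    // 30 s is an arbitrary placeholder; the right value is the open question.
    static HandlerWatchdog watchdog(std::chrono::seconds(30),
                                    [] { std::_Exit(EXIT_FAILURE); });
    // ... emergency document save would run here ...
}
```

Installing it would look like std::signal(SIGABRT, crashHandler) from <csignal>. The callback is a parameter only so the timeout path can be exercised without killing the process.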

        Actually, I think that's a great idea =) I've ~often seen traces out of
bugzilla for hung processes (on Linux at least) where the hang was in the
crash recovery process itself. That leads to these unfortunate dead
windows lingering around etc. and upset users.

  The question just is, what is a reasonable value for that amount 
of time.  Make it too short, and you'll prevent recovery of documents 
that take long to save and for which our document recovery would 
otherwise have happened to work fine.

        Right; hmm =) several of the traces I remember seeing were nasty ones
where eg. the malloc arena mutex was locked - making it rather hard to
make progress ;-) or we were blocked trying to get the solar-mutex.

        I guess if we were truly 31337 we would hook some interaction handler
that had a global progress-bar hook (so we would see the emergency
'save' making progress), and another that would refuse to yield while
waiting for user-interaction (or do we not ask questions during the crash
handler - I forget - there is plenty of GUI stuff there still).

        It might work: I'd say if there is no progress-bar type update from a
file filter in 5 seconds of any kind, it is "really game-over" =)
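
A minimal C++ sketch of that "no progress in 5 seconds" rule might look like the following. ProgressHeartbeat, tick() and stalled() are hypothetical names; the assumption is that some global progress hook could call tick() on every update, which is not something the existing code is known to provide.

```cpp
#include <atomic>
#include <chrono>

// Hypothetical heartbeat: a progress hook calls tick(), a watchdog polls stalled().
class ProgressHeartbeat {
public:
    using Clock = std::chrono::steady_clock;

    explicit ProgressHeartbeat(Clock::duration limit,
                               Clock::time_point start = Clock::now())
        : limit_(limit), last_(start.time_since_epoch().count()) {}

    // Record that the emergency save made visible progress.
    void tick(Clock::time_point now = Clock::now()) {
        last_.store(now.time_since_epoch().count(), std::memory_order_relaxed);
    }

    // True once more than `limit` has passed since the last tick.
    bool stalled(Clock::time_point now = Clock::now()) const {
        Clock::duration idle = now.time_since_epoch()
            - Clock::duration(last_.load(std::memory_order_relaxed));
        return idle > limit_;
    }

private:
    const Clock::duration limit_;
    // Stored as a raw tick count so both threads can touch it without a lock.
    std::atomic<Clock::rep> last_;
};
```

A watchdog thread would then poll stalled() and call _exit once it returns true, so a slow but advancing save is left alone while a save blocked on a mutex is cut off. The time points are parameters only so the logic can be checked without real waiting.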

And the true route ahead of course is to no longer put our document 
recovery strategy at the mercy of a brittle, undefined-behavior--riddled 
signal handler.

        Sure =) far more ideal would be to stream the keystroke / edits that
happen on the document and fsync them to an append-only file every few
keystrokes, and then re-play them on crash-recovery =) so "nothing can
ever be lost" - would be ideal.
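
To make the append-only idea concrete, here is a minimal POSIX-only C++ sketch. EditJournal, replayJournal and the one-edit-per-line record format are assumptions made for illustration; a real format would need to cope with edits containing newlines and with a torn final record.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cerrno>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical journal: appends one edit per line, fsyncs every syncEvery edits.
class EditJournal {
public:
    EditJournal(const std::string& path, unsigned syncEvery)
        : fd_(::open(path.c_str(), O_WRONLY | O_CREAT | O_APPEND, 0600)),
          syncEvery_(syncEvery) {
        if (fd_ < 0)
            throw std::runtime_error("cannot open journal");
    }

    ~EditJournal() {
        ::fsync(fd_);  // flush whatever is still pending
        ::close(fd_);
    }

    EditJournal(const EditJournal&) = delete;
    EditJournal& operator=(const EditJournal&) = delete;

    void append(const std::string& edit) {
        const std::string record = edit + '\n';
        const char* p = record.data();
        std::size_t left = record.size();
        while (left > 0) {  // write() may accept fewer bytes than requested
            ssize_t n = ::write(fd_, p, left);
            if (n < 0) {
                if (errno == EINTR)
                    continue;
                throw std::runtime_error("journal write failed");
            }
            p += n;
            left -= static_cast<std::size_t>(n);
        }
        if (++pending_ >= syncEvery_) {  // bound how many edits a crash can lose
            ::fsync(fd_);
            pending_ = 0;
        }
    }

private:
    int fd_;
    unsigned syncEvery_;
    unsigned pending_ = 0;
};

// Read the recorded edits back, in order, for re-play after a crash.
std::vector<std::string> replayJournal(const std::string& path) {
    std::vector<std::string> edits;
    std::ifstream in(path);
    for (std::string line; std::getline(in, line);)
        edits.push_back(line);
    return edits;
}
```

With syncEvery set to N, at most the last N - 1 edits are at risk, which is the trade-off between keystroke latency and durability that "every few keystrokes" implies.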

        Only problem is - we need to implement something like a collaborative
editor first I think =)

        ATB,

                Michael.

-- 
 michael.meeks@collabora.com  <><, Pseudo Engineer, itinerant idiot

