Hi Stephan,
On Fri, 2015-09-11 at 16:04 +0200, Stephan Bergmann wrote:
But I doubt we want to make our code base more capricious than
necessary, to shield us from behavior exhibited by the Windows debugging
environment.
Ah ;-) well - I saw std::abort not aborting, and I added that to make
it actually - die ;-> you recall the discussion: _exit() was the
solution.
I rather suspect that while the process is being debugged, and the
dialog is up that other threads are making progress - anyhow - the
windows behavior is somewhat unusual here.
Which thread would you expect the signal to be delivered
to (I wonder) - it's all a bit interesting I suspect.
The case should be pretty clear for a synchronous, std::abort-generated
SIGABRT (hopefully even on Windows).
I don't find much that's terribly clear about signal handling, and/or
the cross-thread synchronization mess that follows it around under the
covers =)
My hope was that the watchdog would carry on working in these cases &
kill us again more aggressively if necessary if people insist on
ignoring these guys.
But how should it do that? Even if the SIGABRT-handling were done on
another thread, the watchdog thread just couldn't progress past the
std::abort() (notwithstanding cheating in a debugging environment).
Good point =) so best to start a new watchdog instance in the abort
handler then.
So there's only a single instance of the watchdog thread supposed to
ever run. The odd "static bool bFired" in OpenGLWatchdogThread::execute
had fooled me to assume otherwise (for why else should the variable have
static storage duration).
Ah - this was a reasonably harmless way to avoid using a variable in a
wider scope ;-) given that this class is a singleton.
Anyway, generalizing that "watchdog the OpenGLWatchdogThread, in case
our signal handler gets stuck" idea obviously leads to a "watchdog our
signal handler, in case it gets stuck" feature, i.e., spawn a thread
early in our signal handler (assuming spawning an additional thread
doesn't make our violation of what a signal handler is supposed to be
allowed to do any worse), which will call _exit after a fixed amount of
time.
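
A rough sketch of that "watchdog our signal handler" idea, for illustration only. All names here (g_recoveryDone, deadlineExpired, startHandlerWatchdog, crashHandler, installCrashHandler) and the 60 second timeout are assumptions, not existing LibreOffice code, and it assumes a POSIX _exit from <unistd.h>:

```cpp
#include <atomic>
#include <chrono>
#include <csignal>
#include <thread>
#include <unistd.h>  // _exit (POSIX; Windows would need <process.h>)

// Hypothetical flag, set once emergency recovery has finished.
static std::atomic<bool> g_recoveryDone{false};

// Hypothetical helper: poll until recovery finishes or `timeout` elapses.
// Returns true when the deadline passed with recovery still unfinished.
static bool deadlineExpired(std::chrono::milliseconds timeout)
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (std::chrono::steady_clock::now() < deadline)
    {
        if (g_recoveryDone.load())
            return false;
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
    }
    return !g_recoveryDone.load();
}

// Spawn the watchdog for the signal handler. Creating a thread here is not
// async-signal-safe; it adds to what the handler already violates.
static void startHandlerWatchdog(std::chrono::milliseconds timeout)
{
    std::thread([timeout] {
        if (deadlineExpired(timeout))
            _exit(1);  // skips atexit handlers and static destructors
    }).detach();
}

// Hypothetical handler showing where the watchdog would be started.
extern "C" void crashHandler(int)
{
    startHandlerWatchdog(std::chrono::seconds(60));  // assumed timeout
    // ... emergency document recovery would run here ...
    g_recoveryDone.store(true);
    _exit(1);
}

void installCrashHandler()
{
    std::signal(SIGABRT, crashHandler);
}
```

If the crashing thread holds something like the malloc arena mutex, even creating the thread may block, so this is no more than a best-effort safety net.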
Actually, I think that's a great idea =) I've ~often seen traces out of
bugzilla for hung processes (on Linux at least) where the hang was in a
crash from the recovery process. That leads to these unfortunate dead
windows lingering around etc. and upset users.
The question just is, what is a reasonable value for that amount
of time. Make it too short, and you'll prevent recovery of documents
that take long to save and for which our document recovery would
otherwise have happened to work fine.
Right; hmm =) several of the traces I remember seeing were nasty ones
where e.g. the malloc arena mutex was locked - making it rather hard to
make progress ;-) or we were blocked trying to get the solar-mutex.
I guess if we were truly 31337 we would hook some interaction handler
that had a global progress-bar hook (so we would see the emergency
'save' making progress), and another that would ignore yielding waiting
for user-interaction (or do we not ask questions during the crash
handler - I forget - there is plenty of GUI stuff there still).
It might work: I'd say if there is no progress-bar type update from a
file filter in 5 seconds of any kind, it is "really game-over" =)
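
Roughly what I have in mind, as a sketch only: ProgressHeartbeat is a made-up name, and the global progress-bar hook that would call tick() does not exist yet. The clock is passed in so the check can be exercised without actually waiting:

```cpp
#include <atomic>
#include <chrono>

// Hypothetical heartbeat that a global progress-bar hook would tick.
class ProgressHeartbeat
{
public:
    using Clock = std::chrono::steady_clock;

    explicit ProgressHeartbeat(Clock::time_point now = Clock::now())
        : m_lastTick(now.time_since_epoch().count()) {}

    // Called from the (assumed) progress hook on every filter update.
    void tick(Clock::time_point now = Clock::now())
    {
        m_lastTick.store(now.time_since_epoch().count());
    }

    // True when no update arrived within `limit` before `now`.
    bool isStalled(Clock::duration limit,
                   Clock::time_point now = Clock::now()) const
    {
        const Clock::duration idle =
            now.time_since_epoch() - Clock::duration(m_lastTick.load());
        return idle > limit;
    }

private:
    std::atomic<Clock::rep> m_lastTick;  // ticks since the clock's epoch
};
```

A watchdog thread could then poll isStalled(std::chrono::seconds(5)) and call _exit once it returns true, rather than relying on one fixed overall timeout.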
And the true route ahead of course is to no longer put our document
recovery strategy at the mercy of a brittle, undefined-behavior--riddled
signal handler.
Sure =) far more ideal would be to stream the keystroke / edits that
happen on the document and fsync them to an append-only file every few
keystrokes, and then re-play them on crash-recovery =) so "nothing can
ever be lost" - would be ideal.
Only problem is - we need to implement something like a collaborative
editor first I think =)
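
The journalling half of that is the easy bit; a minimal sketch, assuming POSIX open/write/fsync and a made-up one-edit-per-line text format (EditJournal and replayJournal are hypothetical names, and how an edit gets serialized or re-applied is exactly the part we don't have):

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical append-only journal of edit operations, one per line.
class EditJournal
{
public:
    EditJournal(const std::string& path, unsigned syncEvery)
        : m_fd(::open(path.c_str(), O_WRONLY | O_CREAT | O_APPEND, 0600))
        , m_syncEvery(syncEvery) {}

    ~EditJournal()
    {
        if (m_fd >= 0)
        {
            ::fsync(m_fd);
            ::close(m_fd);
        }
    }

    EditJournal(const EditJournal&) = delete;
    EditJournal& operator=(const EditJournal&) = delete;

    // Append one edit (must not contain '\n'); fsync every m_syncEvery edits.
    bool append(const std::string& edit)
    {
        if (m_fd < 0)
            return false;
        const std::string line = edit + "\n";
        if (::write(m_fd, line.data(), line.size())
            != static_cast<ssize_t>(line.size()))
            return false;  // short writes are treated as failure here
        if (++m_pending >= m_syncEvery)
        {
            m_pending = 0;
            return ::fsync(m_fd) == 0;
        }
        return true;
    }

private:
    int m_fd;
    unsigned m_syncEvery;
    unsigned m_pending = 0;
};

// Re-read the journal after a crash; the caller would re-apply each edit.
std::vector<std::string> replayJournal(const std::string& path)
{
    std::vector<std::string> edits;
    std::ifstream in(path);
    for (std::string line; std::getline(in, line);)
        edits.push_back(line);
    return edits;
}
```

A real version would also need to detect a torn final record (a crash in the middle of a write), which this sketch does not attempt.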
ATB,
Michael.
--
michael.meeks@collabora.com <><, Pseudo Engineer, itinerant idiot