

On 28.02.2018 02:36, Tomaž Vajngerl wrote:
On Tue, Feb 27, 2018 at 4:53 PM, Stephan Bergmann <sbergman@redhat.com> wrote:
SwarmSolverTest::testUnconstrained in sccomp/qa/unit/SwarmSolverTest.cxx has
already been weakened in the past,
<https://cgit.freedesktop.org/libreoffice/core/commit/?id=1fa761af825641da5c87f80c2a17135f92418960>
"Ridiculously large delta for SwarmSolverTest::testUnconstrained for now"
and
<https://cgit.freedesktop.org/libreoffice/core/commit/?id=0c3444c9bcee093ad5976af8948138e6f2a97706>
"Weaken SwarmSolverTest::testUnconstrained even further for now".  The first
one has the following in its commit message: "suggestion by Tomaž Vajngerl
was: 'Let's adapt the delta for now. Generally anything close to 3 should be
acceptable as the algorithm greatly depends on random values.'"

Now <https://ci.libreoffice.org/job/lo_ubsan/833/console> failed with


/home/tdf/lode/jenkins/workspace/lo_ubsan/sccomp/qa/unit/SwarmSolverTest.cxx:106:(anonymous
namespace)::SwarmSolverTest::testUnconstrained
double equality assertion failed
- Expected: 3
- Actual  : 94.6605927051114
- Delta   : 0.9


Is that also an acceptable outcome, or does it indicate a bug somewhere that
would need to be fixed?  What good is a test whose success criterion is the
result of ad-hoc guesswork, instead of being determined precisely up-front
when the test was written?
Can that test please be fixed properly, so that it is actually useful?

Well, it is neither - that's just the nature of stochastic algorithms.
The test is not at fault: it was defined around the exact outcome we
expect (just as the global maximum of a function is exactly one value).
The problem is that the algorithm itself does not guarantee it will
find that solution, or even get close to it, within its allotted time
or allotted number of generations; it may also get stuck in some local
extremum. On a fast enough CPU, though, this should happen only with a
small statistical probability in a normal run of the algorithm.

Then those qualities of the algorithm need to be taken into account when writing the test, I think. Even a small probability of failure is apparently still a problem. We need tests to be reliable.

Maybe I'm wrong, but I don't see this failing on the tinderboxes or
Jenkins, so I wonder what ubsan does to make it fail. The algorithm
has a time limit; could it be that execution is slowed down so much
that the result didn't have time to converge? (I didn't expect that
to be the case.) Could we skip the test for ubsan builds only?

Those ASan+UBSan tinderbox builds execute rather slowly, yes. (<http://clang.llvm.org/docs/AddressSanitizer.html> claims "Typical slowdown introduced by AddressSanitizer is 2x.")

But also as reported by others today on #libreoffice-dev:

Feb 28 09:17:32 <buovjaga>        sberg: I got a swamsolver failure yesterday. Then I pulled later 
and the next build went fine.
Feb 28 09:19:03 <buovjaga>        After the failure, soffice refused to start. I don't have logs, 
unfortunately
