On 28.02.2018 02:36, Tomaž Vajngerl wrote:
On Tue, Feb 27, 2018 at 4:53 PM, Stephan Bergmann <sbergman@redhat.com> wrote:
SwarmSolverTest::testUnconstrained in sccomp/qa/unit/SwarmSolverTest.cxx has
already been weakened in the past,
<https://cgit.freedesktop.org/libreoffice/core/commit/?id=1fa761af825641da5c87f80c2a17135f92418960>
"Ridiculously large delta for SwarmSolverTest::testUnconstrained for now"
and
<https://cgit.freedesktop.org/libreoffice/core/commit/?id=0c3444c9bcee093ad5976af8948138e6f2a97706>
"Weaken SwarmSolverTest::testUnconstrained even further for now". The first
one has the following in its commit message: "suggestion by Tomaž Vajngerl
was: 'Let's adapt the delta for now. Generally anything close to 3 should be
acceptable as the algorithm greatly depends on random values.'"
Now <https://ci.libreoffice.org/job/lo_ubsan/833/console> failed with
/home/tdf/lode/jenkins/workspace/lo_ubsan/sccomp/qa/unit/SwarmSolverTest.cxx:106:(anonymous
namespace)::SwarmSolverTest::testUnconstrained
double equality assertion failed
- Expected: 3
- Actual : 94.6605927051114
- Delta : 0.9
Is that also an acceptable outcome, or does it indicate a bug somewhere that
would need to be fixed? What good is a test whose success criterion is the
result of ad-hoc guesswork, instead of being determined precisely up-front
when the test was written?
Can that test please be fixed properly, so that it is actually useful?
Well, it is neither - that's just the nature of stochastic algorithms.
It is not the fault of the test: the value it was defined against at the
beginning is the exact outcome we would expect (just as the global
maximum of a function is exactly one value). The problem is that the
algorithm itself doesn't guarantee that it finds that solution, or even
comes close to it, within its allotted time or allotted number of
generations; it may also simply get stuck in some local extremum.
However, in a normal run on a fast enough CPU this should happen only
with a small statistical probability.
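(To make the time/generation point concrete, here is a minimal, purely
illustrative sketch - hypothetical code, not the actual sccomp swarm
solver - of a stochastic search with a wall-clock budget. On a CPU slowed
down by instrumentation, the loop completes far fewer iterations within
the same budget, so the best value found can end up far from the true
optimum even though nothing in the code is wrong.)

    #include <chrono>
    #include <random>

    // Hypothetical stand-in for a stochastic, time-limited optimizer.
    // "objective" is the function being maximized; "budget" is the wall-clock limit.
    double runStochasticSearch(double (*objective)(double), std::chrono::milliseconds budget)
    {
        std::mt19937 gen(std::random_device{}());
        std::uniform_real_distribution<double> dist(-100.0, 100.0);

        double best = objective(dist(gen));
        auto const deadline = std::chrono::steady_clock::now() + budget;
        while (std::chrono::steady_clock::now() < deadline)
        {
            // On a slower machine far fewer candidates fit into the same
            // budget, so "best" may stay far from the true optimum.
            double const candidate = objective(dist(gen));
            if (candidate > best)
                best = candidate;
        }
        return best;
    }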
Then those qualities of the algorithm need to be taken into account when
writing the test, I think. A small probability of failure is apparently
still a problem. We need tests to be reliable.
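One way to take that into account, sketched below with hypothetical
helper names (this is not the existing SwarmSolverTest code): re-run the
stochastic solve a bounded number of times and require that at least one
run lands within the tolerance. The acceptance criterion stays
meaningful, while an occasional unlucky run no longer fails the build.

    #include <cmath>
    #include <cppunit/TestAssert.h>

    // Hypothetical helper: "solveOnce" performs one full stochastic solver run
    // and returns the value of the objective cell.
    void assertConvergesWithinAttempts(double (*solveOnce)(), double expected,
                                       double delta, int maxAttempts)
    {
        double lastResult = 0.0;
        for (int i = 0; i < maxAttempts; ++i)
        {
            lastResult = solveOnce();
            if (std::fabs(lastResult - expected) <= delta)
                return; // this run converged closely enough; the test passes
        }
        // Every attempt missed; report the last result so the failure is diagnosable.
        CPPUNIT_ASSERT_DOUBLES_EQUAL(expected, lastResult, delta);
    }

With the numbers quoted in the failure above, a call could look like
assertConvergesWithinAttempts(&solveOnce, 3.0, 0.9, 3).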
Maybe I'm wrong, but I don't see this failing on the tinderboxes or on
Jenkins, so I wonder what ubsan does to make it fail. The algorithm has a
time limit; could it be that execution is slowed down so much that the
result doesn't develop far enough? (I didn't expect this to be the case.)
Could we skip it for ubsan only?
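(If skipping only in the sanitized builds turned out to be the chosen
route, one conceivable shape - hypothetical, not an existing switch in
SwarmSolverTest.cxx - could key off Clang's
__has_feature(address_sanitizer), which the combined ASan+UBSan builds
would report:)

    #include <cppunit/TestAssert.h>

    // Hypothetical compile-time guard; the macro name is made up for this sketch.
    #if defined(__has_feature)
    #  if __has_feature(address_sanitizer)
    #    define SWARM_TEST_SKIP_UNDER_SANITIZERS 1
    #  endif
    #endif

    void checkUnconstrainedResult(double fSolvedValue)
    {
    #ifndef SWARM_TEST_SKIP_UNDER_SANITIZERS
        // Expected value 3 and delta 0.9 are the ones quoted in the failure above.
        CPPUNIT_ASSERT_DOUBLES_EQUAL(3.0, fSolvedValue, 0.9);
    #else
        (void)fSolvedValue; // skipped: the time-limited solver is unreliable when slowed down
    #endif
    }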
Those ASan+UBSan tinderbox builds execute rather slowly, yes.
(<http://clang.llvm.org/docs/AddressSanitizer.html> claims "Typical
slowdown introduced by AddressSanitizer is 2x.")
But also as reported by others today on #libreoffice-dev:
Feb 28 09:17:32 <buovjaga> sberg: I got a swamsolver failure yesterday. Then I pulled later
and the next build went fine.
Feb 28 09:19:03 <buovjaga> After the failure, soffice refused to start. I don't have logs,
unfortunately