Hey,
On Sat, Sep 26, 2015 at 11:39 PM, Michael Meeks <michael.meeks@collabora.com>
wrote:
Hi Markus,
On Sat, 2015-09-26 at 20:51 +0200, Markus Mohrhard wrote:
so we have been running our in-build performance tests now for a few
weeks and recently discovered that our internal memory allocator is
causing spikes in the runtime.
What fun =) the irony is that it was written to avoid exactly such
spikes (which were primarily on Windows) ;-) Thanks for finding this
one !
It became even worse during the weekend with the tests taking 200
times the instructions. Most of it seems to be spent in our memory
handling code and not really in the actual code. (see for example
http://perf.libreoffice.org/perf_html/ftest_of_cppu_sc_on_vm139.details.html
with the annotated callgrind output at http://pastebin.com/ELC64s1n).
We had a profile that showed the issue inside of the memory allocator
much better but I have to find it again.
Interesting. We found a really silly one in the ::Interpret just
recently, and have a simple fix - that could cause some excessive
allocation - but I forget if Tor merged that to master (yet).
TBH - I find using kcachegrind -incredibly- more useful than the
annotated output above.
Me too. The annotated output is what we currently get from the performance
testing in Jenkins. It is much better than nothing, as it allows us
to look into the past by just inspecting the build logs. I'm already
incredibly thankful to Norbert for making it available as it allows me to
see what is going on on the VM.
Is the internal memory allocator really still useful despite showing
sometimes really bad behavior ?
I'd say not myself. My hope is that the windows allocator has also had
some work done on it since ~2005? when the issue was worked around by
mhu.
Personally I would just fall back to the system memory allocator
except for the few cases where we know that it makes a difference
(small memory blocks in calc formula tokens, ...)
Right - it should be far quicker, particularly on Linux.
I'd love to see how a change like that impacts the profiles; worth a
commit to master and a quick revert later if there is a visible issue
anywhere I guess =)
I have just committed such a change. We only have "reliable" data for Linux
but if we see some huge improvement there we should at least consider
keeping it enabled on Linux. The other idea that I had is that it is
related to using swap as we were surprisingly close to the RAM limit on the
VM. However after discussing this idea with Norbert I'm no longer sure
whether using swap would result in a changed callgrind IR count.
Then again - I think we're going to need a custom allocator of some
kind (though prolly rather slow & dumb) for LibreOfficeKit
pre-initialization for cloudy bits - so; perhaps that allocator may be
useful in the end temporarily for that.
Of course there are a few places where we need a custom allocator but if it
really performs badly we might want to limit these places.
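
To make "limit these places" a bit more concrete, here is a minimal sketch of the idea: everything goes to the system allocator by default, and only one small, hot type opts in to a fixed-size pool. The names FixedPool and Token are hypothetical and are not existing LibreOffice classes. The sketch is not thread-safe and is only meant to illustrate the shape of such a change, not how our allocator or the formula tokens are actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical fixed-size pool: carves equally sized blocks out of large
// slabs obtained from the system allocator and recycles them through a
// LIFO free list. Not thread-safe; illustration only.
class FixedPool
{
public:
    explicit FixedPool(std::size_t blockSize, std::size_t blocksPerSlab = 256)
        : m_blockSize(roundUp(blockSize)), m_blocksPerSlab(blocksPerSlab)
    {
        assert(blocksPerSlab > 0);
    }
    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    ~FixedPool()
    {
        for (void* slab : m_slabs)
            std::free(slab);
    }

    void* allocate()
    {
        if (!m_free)
            grow();
        Node* node = m_free;
        m_free = node->next;
        ++m_inUse;
        return node;
    }

    void deallocate(void* p)
    {
        if (!p)
            return;
        // Reuse the freed block itself as a free-list node.
        m_free = new (p) Node{m_free};
        --m_inUse;
    }

    std::size_t inUse() const { return m_inUse; }
    std::size_t slabCount() const { return m_slabs.size(); }

private:
    struct Node { Node* next; };

    // Blocks must be able to hold a Node and stay suitably aligned.
    static std::size_t roundUp(std::size_t n)
    {
        const std::size_t a = alignof(std::max_align_t);
        if (n < sizeof(Node))
            n = sizeof(Node);
        return (n + a - 1) / a * a;
    }

    void grow()
    {
        char* slab = static_cast<char*>(std::malloc(m_blockSize * m_blocksPerSlab));
        if (!slab)
            throw std::bad_alloc();
        m_slabs.push_back(slab);
        for (std::size_t i = 0; i < m_blocksPerSlab; ++i)
            m_free = new (slab + i * m_blockSize) Node{m_free};
    }

    std::size_t m_blockSize;
    std::size_t m_blocksPerSlab;
    std::size_t m_inUse = 0;
    Node* m_free = nullptr;
    std::vector<void*> m_slabs;
};

// Hypothetical small object standing in for something like a formula token.
// Only this type uses the pool; all other allocations are left alone.
struct Token
{
    int opcode;
    double value;

    static FixedPool& pool()
    {
        static FixedPool aPool(sizeof(Token));
        return aPool;
    }

    static void* operator new(std::size_t size)
    {
        // Unexpected sizes (e.g. derived classes) go to the system allocator.
        return size == sizeof(Token) ? pool().allocate() : ::operator new(size);
    }

    static void operator delete(void* p, std::size_t size)
    {
        if (size == sizeof(Token))
            pool().deallocate(p);
        else
            ::operator delete(p);
    }
};
```

With something like this, `new Token{1, 2.0}` and `delete` on the result go through the pool, while every other allocation in the process uses plain malloc/operator new. Whether a pool like that still beats the system allocator would of course have to be checked against the profiles.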
Markus
ATB,
Michael.
--
michael.meeks@collabora.com <><, Pseudo Engineer, itinerant idiot