Date: prev next · Thread: first prev next last
2012 Archives by date, by thread · List index


Hi Daniel,

On Fri, Aug 17, 2012 at 5:50 PM, Daniel Bankston
<daniel.dev.libreoffice@gmail.com> wrote:
Hi, Kohei,

From what I understand, ScXMLImport::SetStyleToRanges was being called for
each row.  For each of it's calls, the result was multiple calls to where
the time was being spent in ScMarkArray::SetMarkArea and
ScMarkArray::Search.  This is from the call chain of
ScXMLImport::SetStyleToRanges's use of
XMLTabelStyleContext::FillPropertySet.

My first goal was to try to reduce the number of
XMLTabelStyleContext::FillPropertySet calls by only calling it if the
current style name was different from the previous style name. This way, if
several rows have the same style, we can just keep saving the ranges and
then make one call to apply styles for all of those rows.

Unfortunately, that didn't improve performance.  As far as I can tell, it
just pushed the amount of time that would have been spent on each row into
one call that took approximately the sum of all of the rows' times.

Ok, I admit I haven't tried to figure out the inner workings of ScMarkData
and ScMarkArray, but I decided maybe changing what methods we use from those
classes would help. ScCellRangesBase::GetMarkData uses
ScMarkData::MarkRangeList where we iterate through an ScRangeList that
ScCellRangesBase maintains. Each iteration causes multiple calls to
ScMarkArray::SetMarkArea and Search.

So I used ScRangeList::Combine to create an ScRange and use that range with
ScMarkData::SetMarkArea.  This combined with my first goal did improve
import time by 2 or 3 seconds, but I was expecting much more!

With these code changes, callgrind output reveals to me that we are no
longer spending lots of time in ScMarkData/Array methods (or ScRangeList for
that matter).  Although we were already spending time in malloc's and
free's, after these changes, we seem to spend just a little bit more time
there now.

So I wonder if this means that styles wasn't necessary this file's biggest
slow down.

Well, the styles are still a significant cause of poor performance.
Maybe not that much with this particular document, since this document
only has a bunch of raw string cells with not much styles.  But even
then, the time it spends on importing styles is not insignificant.

  No doubt that converting sc style UNO calls would help
performance at least a bit.

Yes.  That should certainly help, hopefully more than just "a bit".
:-)  Also note that, removing the UNO calls would also remove the use
of ScMarkData.  ScMarkData is basically a view data, to store current
range selections.  To me it makes no sense to use this when importing
a document.

The biggest win with removing the UNO calls (and also the ScDocFunc
calls) is that doing that would lay down the foundation for further
performance tweak.  By removing these layers and exposing ScDocument
to the XML import code, we could then further optimize style settings
directly to ScDocument.  That would be much easier to do than trying
to do the same thing via UNO layer.

Interesting thing is, we used to have similar performance problem with
styles import with Excel files.  And I did take a very similar
approach to what you just tried.  The difference is that, I set the
styles directly to ScDocument using the same data structure that's
used in ScAttrArray to store the styles attributes, to avoid
unnecessary conversation of data, and also unnecessary broadcasting
etc that it would normally do when modifying its internal content.
That resulted in a pretty big performance improvement.

You can take a look at ScDocument::SetAttrEntries and its call chain
all the way down to ScAttrArray.

So, the approach I would aim for is to change the XML import code to
use ScDocument::SetAttrEntries to set the styles directly, then see if
that would improve the performance.  I'm pretty sure that, as long as
we keep using the UNO layer to set properties there isn't much we can
do to improve performance further.  We need to break that layer first.

I wonder if the biggest problem now for a file
like this is xmloff because it's methods are all pretty high up there still
as far as callgrind time is concerned.   (Strings, lookups, etc.)

Yup. The string allocations during parsing unfortunately plays a large
part of the overall poor import performance. But reworking that would
require reworking the whole XML import code stack, which would require
a bit more effort than we have time for GSOC, to say the least...

Anyway, don't feel too bad about it.  Sometimes the perf work goes
like that, and I have had my share of disappointment over the years.
This styles import code is especially a tough nut to crack....

Best,

Kohei

Context


Privacy Policy | Impressum (Legal Info) | Copyright information: Unless otherwise specified, all text and images on this website are licensed under the Creative Commons Attribution-Share Alike 3.0 License. This does not include the source code of LibreOffice, which is licensed under the Mozilla Public License (MPLv2). "LibreOffice" and "The Document Foundation" are registered trademarks of their corresponding registered owners or are in actual use as trademarks in one or more countries. Their respective logos and icons are also subject to international copyright laws. Use thereof is explained in our trademark policy.