

On Mon, Jul 29, 2013 at 05:53:21PM +0100, Michael Meeks wrote:
I couldn't immediately find the duplication of the names.
In this case the strings are the full zip file entry paths, e.g.
"sw/res/sidebar/pageproppanel/portraitcopy_24x24.png"

      Riight - that's interesting :-) IIRC in the past there were two chunks
of code in package/ that duplicated those names (I think). The fragment
from the (AMD) report from December 2006 shows:

      'package' zip code 
              -1022k
              +500k
      reading the large images.zip file creates a huge hash 
        table with lots of duplicated string stems – 3 days

      Of course, I couldn't tell you if this is still the case; possibly
we're no longer duplicating those strings in that way. The problem was
around 'images.zip' - the archive that has all of our icons in it for
the UI - at least back ~7 years ago ;-)

It makes sense that this is about image paths. Most paths seem
to come from opt/share/config/images.zip, but that file contains 3800+
entries and only a few of them seem to be reused later.

And as far as I can see all the full path names are unique, so no
actual sharing is taking place here. But is there a place where these
strings are reused (and also interned)?

      Interesting; of course - we can dump the contents of the interned table
to see if they have ref-count 1 quite simply (?). 

Replacing the intern with a normal OUString constructor like:
...
Seems to save ~200K of memory at least for a quick:

      Nice :-) well - we should just do that then :-)

I am tempted to. Will do some more testing first to make sure I am not
missing something.

But that might be too quick to see any effects of this intern action.

      The reason it was added was for images.zip - if the package code has
improved then we should take & save that space/time.

I haven't yet found the code that references the images/resources;
maybe that code needs the interning itself. But it certainly looks like
the current code is a bit too eager in interning everything.
 
So I guess my general question is how to measure the effects of
OUString::intern?

      I'd dump the ref-count + string contents of the intern table to see if
there is more wastage.

I'll try that next. For now I used systemtap, which happens to have UTF-16
user string support. It looks like all interned strings go through the function
rtl_ustring_intern_internal, so probing that and printing the string gives
an interesting overview.

$ stap -e 'probe 
process("./solver/unxlngx6.pro/lib/libuno_sal.so").function("rtl_ustring_intern_internal") { 
log("interning: ". $str$$ . " " . user_string_utf16($str->buffer)); }' -c ./install/program/soffice

interning: {.refCount=1, .length=9, .buffer=[108, ...]} links.txt
interning: {.refCount=1, .length=18, .buffer=[114, ...]} res/mainapp_16.png
interning: {.refCount=1, .length=18, .buffer=[114, ...]} res/mainapp_32.png
interning: {.refCount=1, .length=15, .buffer=[114, ...]} res/sx03251.png
interning: {.refCount=1, .length=15, .buffer=[114, ...]} res/lx03251.png
interning: {.refCount=1, .length=18, .buffer=[99, ...]} cmd/lc_openurl.png
interning: {.refCount=1, .length=20, .buffer=[99, ...]} cmd/lc_adddirect.png
interning: {.refCount=1, .length=17, .buffer=[99, ...]} cmd/lc_newdoc.png
[...]

That shows (full output attached, if the mailing list allows that) that interning
(at least during startup) is done 4192 times. Only 128 strings are reused,
and only 6 are interned 5 times or more:

    115  Regular
     57  Bold
     14  Bold Italic
     13  Italic
      5  Light
      5  Book

      You saw the OUString debugging code: RTL_LOG_STRING_NEW /
_STRING_DELETE etc. that can produce a long but crunch-able set of
printfs on stdout: many of which are sadly not that useful due to
OUStringBuffer mutation (IIRC - but presumably some more work could
clean that up).

I hadn't seen that yet, but it might be useful for seeing which strings
are recreated multiple times and so are candidates for interning.
Is there already code to enable/trigger RTL_LOG_STRING_NEW?
Or should I just write my own hooks?

Thanks,

Mark

Attachment: interned.out.bz2
Description: BZip2 compressed data



