On Mon, Jul 29, 2013 at 05:53:21PM +0100, Michael Meeks wrote:
I couldn't immediately find the duplication of the names. In this case the strings are the full zip file entry paths, e.g. "sw/res/sidebar/pageproppanel/portraitcopy_24x24.png"

Riight - that's interesting :-) IIRC in the past there were two chunks of code in package/ that duplicated those names (I think). The fragment from the (AMD) report from December 2006 shows:

    'package' zip code -1022k +500k reading the large images.zip file creates a huge hash table with lots of duplicated string stems – 3 days

Of course, I couldn't tell you if this is still the case; possibly we're no longer duplicating those strings in that way. The problem was around 'images.zip' - the archive that has all of our icons in it for the UI - at least back ~7 years ago ;-)
It makes sense that this is about image paths. Most paths seem to come from opt/share/config/images.zip. But that file contains 3800+ entries and only a few seem to be reused later.
And as far as I can see all the full path names are unique, so no actual sharing is taking place here. But is there a place where these strings are reused (and also interned)?

Interesting; of course - we can dump the contents of the interned table to see if they have ref-count 1 quite simply (?).

Replacing the intern with a normal OUString constructor like:
...
Seems to save ~200K of memory at least for a quick:

Nice :-) well - we should just do that then :-)
I am tempted to. Will do some more testing first to make sure I am not missing something.
But that might be too quick to see any effects of this intern action.

The reason it was added was for images.zip - if the package code has improved then we should take & save that space/time.
I haven't yet found the code which references the images/resources; maybe it needs interning itself. But it certainly looks like the current code is a bit too eager, interning everything.
So I guess my general question is how to measure the effects of OUString::intern?

I'd dump the ref-count + string contents of the intern table to see if there is more wastage.
I'll try that next. For now I used systemtap, which happens to have utf16 user string support. It looks like all interned strings go through the function rtl_ustring_intern_internal. So probing that and printing the string gives an interesting overview.

$ stap -e 'probe process("./solver/unxlngx6.pro/lib/libuno_sal.so").function("rtl_ustring_intern_internal") { log("interning: ". $str$$ . " " . user_string_utf16($str->buffer)); }' -c ./install/program/soffice
interning: {.refCount=1, .length=9, .buffer=[108, ...]} links.txt
interning: {.refCount=1, .length=18, .buffer=[114, ...]} res/mainapp_16.png
interning: {.refCount=1, .length=18, .buffer=[114, ...]} res/mainapp_32.png
interning: {.refCount=1, .length=15, .buffer=[114, ...]} res/sx03251.png
interning: {.refCount=1, .length=15, .buffer=[114, ...]} res/lx03251.png
interning: {.refCount=1, .length=18, .buffer=[99, ...]} cmd/lc_openurl.png
interning: {.refCount=1, .length=20, .buffer=[99, ...]} cmd/lc_adddirect.png
interning: {.refCount=1, .length=17, .buffer=[99, ...]} cmd/lc_newdoc.png
[...]

That shows (full output attached if the mailing list allows that) that interning is done 4192 times, at least during startup. Only 128 strings are reused. And only 6 are interned 5 times or more:

    115 Regular
     57 Bold
     14 Bold Italic
     13 Italic
      5 Light
      5 Book
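For anyone who wants to redo this kind of tally from the probe output, here is a minimal sketch in Python. It assumes every log line has exactly the "interning: {...} string" shape shown above; the names tally_interned and summarize are made up for illustration, and this is not necessarily how the numbers above were produced.

```python
import re
from collections import Counter

# Matches one probe line: the literal prefix, the {...} struct dump,
# a single space, then the interned string (which may contain spaces).
LINE_RE = re.compile(r"^interning: \{[^}]*\} (.*)$")


def tally_interned(lines):
    """Count how often each string shows up in the probe log."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:  # skip anything that is not a probe line
            counts[match.group(1)] += 1
    return counts


def summarize(counts, threshold=5):
    """Return (total intern calls, number of strings seen more than once,
    [(string, count), ...] for strings seen at least `threshold` times)."""
    total = sum(counts.values())
    reused = sum(1 for n in counts.values() if n > 1)
    frequent = [(s, n) for s, n in counts.most_common() if n >= threshold]
    return total, reused, frequent
```

Usage would be something like summarize(tally_interned(open("interned.out"))) on the uncompressed log. Strings that appear only once in the tally are the ones where interning cannot have bought any sharing during that run.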
You saw the OUString debugging code: RTL_LOG_STRING_NEW / _STRING_DELETE etc. that can produce a long but crunch-able set of printfs on stdout: many of which are sadly not that useful due to OUStringBuffer mutation (IIRC - but presumably some more work could clean that up).
I hadn't seen that yet, but that might be useful to see which strings are recreated multiple times and so are candidates for interning. Is there already code to enable/trigger RTL_LOG_STRING_NEW? Or should I just write my own hooks?

Thanks,

Mark
Attachment:
interned.out.bz2
Description: BZip2 compressed data