

Hi Tomas,

On Tue, 2011-12-13 at 00:52 +0100, Tomas Hlavaty wrote:
that they are tiny,

What does "tiny" mean?

        Well - you're going to find it hard to make it bigger than the existing
rdb files ;-) but by tiny I really mean fast to read from disk and fast
to parse.

Currently, rdb files are giant.

        Sure; they are a disaster :-)

I'm not sure why.  If I simply concatenate all idl definitions for
udkapi and offapi into one preprocessed file I get a smaller file while
still being a valid idl file containing all the information:

        Yep; this is well known. It is all done re-using some code not intended
for this purpose, which has been tweaked to the maximum to try to make
it suit it better, but it still doesn't ;-)

Is 200kB considered tiny?

        Sounds fine :-)

And this is just original concatenated idl files.

        Sure - sounds fine; if we can parse it fast. 

How long does reading the type information take at the moment?

        That's quite hard to say; access to it is extremely scattered across
the code. callgrind gives 1.5% in libreg, 0.6% in libstore and some
lowish proportion of the 32% in libuno_sal; say perhaps 2.5%. That IMHO
hides its true cost - we have to force a page-in of all that data before
start-up to avoid the horrible I/O patterns mmap gives us as we seek about
in those big files.
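
To make the page-in point concrete, here is a minimal sketch of the idea: read the file sequentially once so the OS page cache holds it, then map it for scattered access. It is not the actual LibreOffice code; the names `prefetch` and `open_mapped` and the chunk size are assumptions made up for illustration.

```python
import mmap


def prefetch(path, chunk_size=1 << 20):
    """Read the file front to back so the OS can cache it; return bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def open_mapped(path):
    """Warm the page cache with one sequential pass, then map read-only.

    Later random lookups in the mapping should then mostly hit cached pages
    instead of triggering scattered disk seeks (hypothetical helper).
    """
    prefetch(path)
    with open(path, "rb") as f:
        # Python's mmap keeps its own duplicate of the descriptor,
        # so the mapping stays valid after the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```

The sequential pass costs the full file size in I/O every time, which is why a much smaller file helps even when the parse itself is cheap.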

What do we get to do a lot at startup?  I thought we simply load it and
that's it.

        Sure; we load it & that is it  *but* we would really like to be
starting in total in under a second; at the least, choices that hurt
that goal on a fast PC are almost certain to also hurt the goal of
working well on mobile devices etc. :-)

If the new format is a text format (I would prefer text format over
another binary one), there needs to be some parsing.  unoidl2 can parse
the allpp.idl file (containing all type information) and print the
syntax tree in about 200ms:

   $ rm allpp.ast 
   $ time make allpp.ast
   cat allpp.idl | ./unoidl2ast >allpp.ast

   real  0m0.247s
   user  0m0.170s
   sys   0m0.100s

        250ms is a -really- long time IMHO; particularly since we have to parse
the entire file before startup. As Stephan says, perhaps we can overcome
this by inlining more in the generated C++ which may make that
acceptable later (after all bootstrapping python takes a good long time
itself anyway).

If 200ms is slow, we could split the allpp.idl file into something
smaller required at startup and the rest loaded lazily.

        Possibly; or we could invent yet another format for this type
information. Personally, I'd like to keep the number of representations
of the same information as low as possible: we already have IDL, we have
the binaryurp format [ used for IPC on the wire ] (potentially we could
re-use that?), do we have an XML/text IPC protocol ? I suspect we will
want that for the remote Javascript/websockets magic - possibly we could
use a condensed XML format for this that'd be quicker to parse ?
unclear. Stephan - do you have some ideas ? as soon as I see a yacc
parser, I see "slow" and "busts the branch predictor" - but perhaps I'm
paranoid ;-)
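
The split suggested above (a small set of types parsed at startup, the rest loaded lazily) could look roughly like the sketch below. It is only an illustration under stated assumptions: `LazyTypeStore`, its `name = definition` line format, and the `load_rest` callback are hypothetical and are not IDL, unoidl2, or any existing LibreOffice API.

```python
class LazyTypeStore:
    """Parse a small startup set eagerly; load the remainder on first miss."""

    def __init__(self, startup_text, load_rest):
        self._types = self._parse(startup_text)
        self._load_rest = load_rest  # callable returning the remaining text
        self._rest_loaded = False

    @staticmethod
    def _parse(text):
        # Toy format for illustration only: one "name = definition" per line.
        types = {}
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            name, sep, definition = line.partition("=")
            if sep:
                types[name.strip()] = definition.strip()
        return types

    def lookup(self, name):
        # Only pay for the big file when a type is missing from the startup set.
        if name not in self._types and not self._rest_loaded:
            self._types.update(self._parse(self._load_rest()))
            self._rest_loaded = True
        return self._types.get(name)
```

With this shape the startup cost depends only on the small file, and the large one is parsed at most once, the first time something outside the startup set is requested.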

We could have a binary format, something like a mmap dump.  That would
be instant but rather ugly.

        Sure - that'd be bad :-) I like the 'concatenate text files' approach
for building the database (personally).

Are there any other requirements?  Like functionality related to
rdbmerge and how extensibility works?  Or is that not relevant anymore?

        rdbmerge is/was IIRC just a compile-time tool. Clearly we need to
continue to be able to read old types.rdb files for some time to come,
but that can be de-coupled and removed later I think.

I was under the impression that these projects somehow depend on the rdb
code, but if they depend on the typedescription api, then it is better
than I hoped (if that typedescription api is somehow separate from the
rdb file code).

        Sure - there is only one place that we go grubbing with that nasty rdb
format - and it's at the bottom of the stack :-) if we can hot plug that
out with something else, life is good :-)

        Thanks,

                Michael.

-- 
michael.meeks@suse.com  <><, Pseudo Engineer, itinerant idiot

