Hi Noel,
        This belongs CC'd to the dev. list; please do fwd it there to continue
the discussion =)
On Sun, 2016-02-28 at 09:05 +0200, Noel Grandin wrote:
When you guys did the SAX parsing improvements (XFastParser2), why did
we maintain the UNO API?
        Is there an XFastParser2 API ?
Why not use libxml/expat directly ?
        The libxml2 API (the faster parser) is horrendous - the XFastParser API
is at least a tokenized API - which is essentially what we want the code
to consume; ultimately we want to patch libxml2 some more as well to
improve load performance - removing some of the more stupid pieces;
quite possibly we also want to implement an even faster compressed XML
parsing scheme I have up my sleeve behind that API.
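
As a rough illustration of what "tokenized" means here (a Python sketch, not LibreOffice code; the TOKENS table, its values and the tokenize() helper are all invented for the example): element names are turned into small integers once, at parse time, so the consuming code compares integers instead of strings.

```python
import xml.sax

# Hypothetical token table, invented for this sketch; a real filter
# would generate something like it from its schema.
TOKENS = {"document": 1, "p": 2, "span": 3}
UNKNOWN = 0  # token used for any name not in the table


class TokenizingHandler(xml.sax.ContentHandler):
    """Turns string-based SAX callbacks into (event, token) tuples."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", TOKENS.get(name, UNKNOWN)))

    def endElement(self, name):
        self.events.append(("end", TOKENS.get(name, UNKNOWN)))


def tokenize(xml_bytes):
    """Parse xml_bytes and return the list of tokenized events."""
    handler = TokenizingHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.events
```

A consumer can then branch on the integer token rather than re-comparing element names for every event.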
        We did short-circuit UNO for the tokenization piece - which saved a
huge chunk of time, and profiled it rather intensively. Last I looked, I
saw no significant performance cost from the UNO interface.
        Finally - the libxml2 and expat APIs are (like most SAX APIs)
synchronous, and same-thread; a big part of our load-time speed win
comes from doing the XML parse + tokenize in another thread, and
emitting the events in the main thread [ cf. slide decks at several
LibreOffice conferences on the topic ].
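
The shape of that split can be sketched as follows (Python, purely illustrative and not the LibreOffice implementation; QueueingHandler, parse_in_thread, load and the queue size are assumptions made for the example): a worker thread runs the synchronous SAX parser and pushes events into a bounded queue, and the main thread drains the queue and consumes the events. Python threads will not show the real speed-up; the point is only the producer/consumer structure.

```python
import queue
import threading
import xml.sax

END = object()  # sentinel marking the end of the event stream


class QueueingHandler(xml.sax.ContentHandler):
    """Runs in the parser thread; forwards each event to a queue."""

    def __init__(self, out):
        super().__init__()
        self.out = out

    def startElement(self, name, attrs):
        self.out.put(("start", name))

    def endElement(self, name):
        self.out.put(("end", name))


def parse_in_thread(xml_bytes, out):
    """Worker body: a synchronous parse, confined to its own thread."""
    try:
        xml.sax.parseString(xml_bytes, QueueingHandler(out))
    finally:
        out.put(END)  # always unblock the consumer, even on a parse error


def load(xml_bytes):
    """Main-thread side: start the parser thread and consume its events."""
    events = queue.Queue(maxsize=1024)  # bounded, so the parser cannot run far ahead
    worker = threading.Thread(target=parse_in_thread, args=(xml_bytes, events))
    worker.start()
    seen = []
    while True:
        event = events.get()
        if event is END:
            break
        seen.append(event)  # stand-in for building the document model
    worker.join()
    return seen
```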
        ie. nothing to 'fix' there =)
I'm assuming there is something I'm missing?
        Depends what you're trying to achieve =) if you want to improve
performance and cleanliness -by-far- the most useful thing remaining to
be done there is to switch the ODF filters in xmloff/ to use the
FastParser API - currently they do tokenization themselves in a horribly
inefficient way; and of course they don't take advantage of the threaded
parsing etc.
        There was a Munich student (Daniel Sikeler) working on that -
unfortunately with very little time for mentoring; so it may be a
challenge to try to rescue that work. xmloff/ is quite big - and is built
on from outside, in the main components too. So - almost certainly by far the
best way here is an incremental one.
        We need to write a good, clean XFastParser <-> XParser mapping; that
will probably require some love in sax/, since some of the semantics don't map
entirely perfectly in corner cases. I believe Daniel's branch is
feature/fastparser - and you could rescue just this mapper from there I
think.
        That would then allow the threaded processing & tokenization (we would
need to de-tokenize again to the XParser interface but I think we would
still get some nice wins ;-). When that works nicely - we need to
connect the xmloff/ tokenization code to the XFastParser tokenized
results to avoid doing all of that twice, and slowly and carefully push
the interface change across the code to kill the XParser variant.
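
A minimal sketch of the de-tokenizing mapper idea (Python, with simplified stand-in classes rather than the real UNO interfaces; the NAMES table, LegacyHandler and FastToLegacyAdapter are invented for illustration, and attributes and namespaces are left out): the adapter receives tokenized events and forwards them to a string-based handler, passing unknown elements through by name.

```python
# Hypothetical token-to-name table, invented for this sketch.
NAMES = {1: "office:document", 2: "text:p"}


class LegacyHandler:
    """Stand-in for a string-based (XParser-style) consumer."""

    def __init__(self):
        self.calls = []

    def startElement(self, name):
        self.calls.append(("start", name))

    def endElement(self, name):
        self.calls.append(("end", name))


class FastToLegacyAdapter:
    """Receives tokenized events and replays them as string events."""

    def __init__(self, legacy):
        self.legacy = legacy

    def startFastElement(self, token):
        # De-tokenize: look the name up again for the old interface.
        self.legacy.startElement(NAMES[token])

    def endFastElement(self, token):
        self.legacy.endElement(NAMES[token])

    def startUnknownElement(self, name):
        # Names without a token are passed through unchanged.
        self.legacy.startElement(name)

    def endUnknownElement(self, name):
        self.legacy.endElement(name)
```

The extra lookup is the cost of the interim step; it goes away once a consumer is ported to take the tokens directly.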
        At least - that would be my suggestion of something worthwhile & juicy
to dig your teeth into =)
        ATB,
                Michael.