Re: XFastParser - next steps ...

Michael Meeks <michael.meeks -AT- collabora.com>
Mon, 01 Aug 2016 15:51:15 +0100

Hi Mohammed,

On Mon, 2016-08-01 at 12:09 +0530, Mohammed Abdul Azeem wrote:

                A. optimize clearing the pending events - unlikely to give
                   a big win, but nice.

This is done.


        Great.

                B. merge the legacyfastparser pieces into SvXMLImport

If we do this, it will be
XParser -> XFastParser -> unknown elements -> callbackDocumentHandler
-> SvXMLImport -> tokenize (SvXMLNamespaceMap) -> FastContexts


        As a first cut, then yes we will tokenize and de-tokenize and
re-tokenize ;-) but the de-tokenize is just a lookup in an array:

        OUString aTokens[128];

        aTokens[nTokenIndex]

        which is quick - the rest is done in another thread. Also (obviously)
we want to tokenize only once, in that thread, and share the result
moving ahead.
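
        To illustrate why the de-tokenize step is cheap, here is a minimal
sketch. It is not LibreOffice code: TokenTable, kMaxTokens, kUnknownToken and
the method names are hypothetical, and std::string stands in for OUString so
that the example compiles on its own. Tokenizing hashes the string;
de-tokenizing is a single array index.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical names throughout; std::string stands in for OUString.
constexpr std::size_t kMaxTokens = 128;
constexpr int kUnknownToken = -1;

class TokenTable
{
public:
    // Register a name and return its token (existing token if already known,
    // kUnknownToken if the table is full).
    int add(const std::string& rName)
    {
        auto it = maIndex.find(rName);
        if (it != maIndex.end())
            return it->second;
        if (mnCount == kMaxTokens)
            return kUnknownToken;
        const int nToken = static_cast<int>(mnCount);
        maTokens[mnCount++] = rName;
        maIndex.emplace(rName, nToken);
        return nToken;
    }

    // Tokenize: string -> integer. This direction has to hash the string.
    int tokenize(const std::string& rName) const
    {
        auto it = maIndex.find(rName);
        return it == maIndex.end() ? kUnknownToken : it->second;
    }

    // De-tokenize: integer -> string. Just an array lookup.
    const std::string& detokenize(int nTokenIndex) const
    {
        assert(nTokenIndex >= 0
               && static_cast<std::size_t>(nTokenIndex) < mnCount);
        return maTokens[nTokenIndex];
    }

private:
    std::array<std::string, kMaxTokens> maTokens;
    std::unordered_map<std::string, int> maIndex;
    std::size_t mnCount = 0;
};
```

        In this sketch, detokenize(tokenize(name)) returns the original
name for anything previously registered with add().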

                C. consider how to allow XFastParser tokenization selectively
                   just for the elements eg. ScXMLTableRowCellContext that
                   can get the maximum benefit in the short-run.

...

So, then we will somehow selectively tokenize elements and attributes
belonging to ScXMLTableRowCellContext, so as to avoid
SvXMLNamespaceMap pieces.


        Hmm - I think we need to tokenize them all - but let's get there first.
Let's get all of the tokens mapped to and fro; and then let's see if we
can't look at the profile, and work out how to tunnel through a few of
these contexts to use the fast-parser directly =)

        So - eg. currently we have:

SvXMLImport::startElement

        which calls CreateContext(...) - to create the handler for the next
element down the tree.

        We could have a virtual FastContext *CreateFastContext(...) method that
returns a distinct FastParser context, and if it is not there we fall
back to the old / dummy methods there =) Using that we could convert the
XML tree context handlers from the leaves upwards, which would save a
lot of bother. There is already some partial attempt to integrate the
FastParser into xmloff/ that looks unlikely to do anything at all (to
me) =) It is worth not getting confused / tangled up in that, though it
is possibly worth reading it through with 'git grep -5 startFastElement'
inside xmloff.
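
        A rough sketch of that fallback, under stated assumptions: none of
the class names, signatures or the kCellToken value below come from xmloff;
they are hypothetical stand-ins. The point is only the shape of the
dispatch: ask the parent for a fast context first, and use the old
string-based CreateContext when it returns nothing.

```cpp
#include <memory>
#include <string>

// Hypothetical token value for a table-cell element.
constexpr int kCellToken = 7;

// Hypothetical token-based context (stand-in for a FastParser context).
struct FastContext
{
    virtual ~FastContext() = default;
};

// Hypothetical old-style context with both creation hooks.
struct LegacyContext
{
    virtual ~LegacyContext() = default;

    // Old path: string-based child handler, always available.
    virtual std::unique_ptr<LegacyContext> CreateContext(const std::string& /*rName*/)
    {
        return std::make_unique<LegacyContext>();
    }

    // New path: nullptr by default, meaning "not converted yet".
    virtual std::unique_ptr<FastContext> CreateFastContext(int /*nToken*/)
    {
        return nullptr;
    }
};

// A parent whose cell children have been converted (leaves first).
struct RowContext : LegacyContext
{
    std::unique_ptr<FastContext> CreateFastContext(int nToken) override
    {
        if (nToken == kCellToken)
            return std::make_unique<FastContext>();
        return nullptr;
    }
};

// Dispatch sketch: prefer the fast context, otherwise fall back.
// Returns which path was taken so the behaviour is easy to inspect.
std::string startElement(LegacyContext& rParent, int nToken, const std::string& rName)
{
    if (auto pFast = rParent.CreateFastContext(nToken))
        return "fast";
    auto pLegacy = rParent.CreateContext(rName);
    return pLegacy ? "legacy" : "none";
}
```

        With this shape, unconverted contexts need no changes at all: they
inherit the nullptr default and keep going through the old path.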

For this I still have some questions in mind. Are we going to tokenize
everything from FastParser and then de-tokenize in the callback
handler (token based startElement)? That's the idea here?


        Initially - yes; it sounds mad, but then allocation is also expensive -
and there is no 'free' for integer tokens ;-)

If I've got anything wrong or you have some insights, please share it
here. :)


        HTH,

                Michael.


-- 
michael.meeks@collabora.com <><, GM Collabora Productivity
 Skype: mmeeks, Google Hangout: mejmeeks@gmail.com
 (M) +44 7795 666 147 - timezone usually UK / Europe
