Hi Adam,

On Thu, Jun 27, 2013 at 06:18:09PM +0300, Adam Fyne <Adam.Fyne@cloudon.com> wrote:
I didn't post this on the IRC because it is too long and too specific, and I feel it will be lost there…
Sure, for some kind of discussions the mailing list is a better place.
I want to fix a bug with import \ export of a 'Paragraph Tab'. I've attached a really simple DOCX with such a paragraph tab. The XML node is 'w:ptab' inside a 'run' node.
I see. Indeed, looks like this is not imported (correctly).
When it goes through Writer, it is transformed into a simple tab. I would like to fix this so that the 'ptab' is handled end to end:
1. Import 'ptab' from DOCX
2. Store the 'ptab' attributes in the Writer's core
3. Render correctly on the screen (2nd run will be aligned to the right)
4. Export 'ptab' back to DOCX
Hmm, this sounds like a new feature -- doing that would be great, but I would suggest finishing your previous feature first (the character shading one), where the ODF filters are not yet updated.
After doing some digging, I found this in 'model.xml' (http://opengrok.libreoffice.org/xref/core/writerfilter/source/ooxml/model.xml#22530):

    <resource name="CT_PTab" resource="Stream" tag="paragraph">
      <attribute name="alignment" tokenid="ooxml:CT_PTab_alignment"/>
      <attribute name="relativeTo" tokenid="ooxml:CT_PTab_relativeTo"/>
      <attribute name="leader" tokenid="ooxml:CT_PTab_leader"/>
      <action name="end" action="tab"/>
    </resource>

And also found this (http://opengrok.libreoffice.org/xref/core/writerfilter/source/ooxml/model.xml#22574):

    <resource name="CT_Tab" resource="Stream" tag="content">
      <action name="end" action="tab"/>
    </resource>

I have a few questions:

1. Shouldn't "CT_PTab" call "ptab" instead of "tab"?
That's right, except that writerfilter::ooxml::OOXMLFastContextHandler has a tab() method, but no ptab() method, that will be one thing you need to implement first.
2. What is the meaning of the 'tag' attribute of the 'resource' node?
As far as I know, the <action .. action="name"/> is always a method call.
3. The way information is stored in 'model.xml' is so confusing.
You're not alone, writerfilter/documentation/ooxml/model.xml is what we found out so far, feel free to extend that if you manage to decode some more detail. In short, whenever you add support for new XML tags, you typically need to extend the file at two places:

- the new tag is a child of some existing tag, so extend the parent's definition
- you also need to add a matching <resource> tag in model.xml

Once those two definitions match, you get new tokens in dmapper. 363dafefad14411a16f6ea9d2ee0d55b67bc9c8d is hopefully a good example. (Though your case is easier, as you add a new token in an existing namespace.)
Some of the info is stored like this (resource + attributes + action), some are stored as 'define' + 'attribute' + 'ref', some are stored as 'resource' + 'value's. This is more of a general question, but – what is the difference between these nodes?
First probably it makes sense to see how RELAX NG works, e.g. have a look at the RELAX NG definition of the ODF format. ref/define is just a way to avoid copy&paste, you define something first, then you can refer to it (by name, using "ref") multiple times. If I'm not mistaken, the only non-RELAX NG tag you need in model.xml is the <resource> one, as explained above.
From the code – I understood that 'action' calls a function in "OOXMLFastContextHandler". When do we need such actions? Why is this done on some nodes and on other nodes (like 'run', 'paragraph', 'brush' etc) not done? So – say I need to add a new function called 'ptab' to 'OOXMLFastContextHandler' – Do I simply copy the logic of 'tab()' ?
I think it's all about where you want to handle the input. Normally, the tokenizer just generates these tokens, and dmapper does the mapping. However, in the case of tabs, other formats (RTF, WW8) handle the tab as a normal character, so for DOCX an action is used that converts the OOXML tokens to a simple character; that way dmapper always gets a tab character. So actions are used to generate these "fake tokens". Another example: w:hyperlink is also handled in the tokenizer, which generates a HYPERLINK field from it, and dmapper handles only that.
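To make the "fake token" idea concrete, here is a minimal sketch of what such an action does. It is not the real writerfilter code: MockMapper, MockContextHandler, text() and importRunWithTab() are hypothetical stand-ins, and only the names tab() and utext() are taken from the discussion above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for dmapper: it only ever receives text.
struct MockMapper {
    std::string received;
    void utext(const std::string& text) { received += text; }
};

// Hypothetical stand-in for the tokenizer-side context handler.
class MockContextHandler {
public:
    explicit MockContextHandler(MockMapper& mapper) : mapper_(mapper) {}

    // Ordinary run text is forwarded unchanged.
    void text(const std::string& s) { mapper_.utext(s); }

    // Action run at the end of a tab element: instead of passing an
    // element token on, emit a plain tab character (the "fake token").
    void tab() { mapper_.utext("\t"); }

private:
    MockMapper& mapper_;
};

// Simulates importing a run made of text, a tab element, and more text.
std::string importRunWithTab() {
    MockMapper mapper;
    MockContextHandler handler(mapper);
    handler.text("left");
    handler.tab();
    handler.text("right");
    return mapper.received;
}

// Usage check: the mapper never sees a tab element, only "\t".
void tabUsageExample() {
    assert(importRunWithTab() == "left\tright");
}
```

The point of the sketch is only the direction of the data flow: the element is consumed on the tokenizer side, and the mapper ends up with the same character it would get from the other filters.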
What does the 'utext' function do?
Apart from logging, see writerfilter::dmapper::DomainMapper::lcl_utext(). That's where dmapper receives all the unicode text input.
Where do I parse the attributes themselves of the 'ptab'?
If you handle ptab as a normal element in model.xml, you'll have the usual way to get all its attributes. I would recommend going that way, as ptab is not a character (tab is), but an element with attributes.
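As a rough illustration of "an element with attributes" rather than a character, the sketch below collects the three attributes that the CT_PTab resource lists (alignment, relativeTo, leader) into one structure. Everything here is hypothetical: PTabProperties, AttributeMap and readPTab() are invented for the example and are not dmapper API, and the sample values are only illustrative.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical container for the three attributes listed under CT_PTab.
struct PTabProperties {
    std::string alignment;
    std::string relativeTo;
    std::string leader;
};

// Hypothetical attribute list: attribute name -> raw string value.
using AttributeMap = std::map<std::string, std::string>;

// Returns the attribute value, or an empty string when it is absent.
static std::string attributeOrEmpty(const AttributeMap& attrs,
                                    const std::string& name) {
    const auto it = attrs.find(name);
    return it == attrs.end() ? std::string() : it->second;
}

// Collects the ptab attributes so they can be stored together,
// instead of collapsing the element into a single '\t' character.
PTabProperties readPTab(const AttributeMap& attrs) {
    PTabProperties props;
    props.alignment = attributeOrEmpty(attrs, "alignment");
    props.relativeTo = attributeOrEmpty(attrs, "relativeTo");
    props.leader = attributeOrEmpty(attrs, "leader");
    return props;
}

// Usage check with illustrative values for a right-aligned ptab.
void ptabUsageExample() {
    const AttributeMap attrs = {
        {"alignment", "right"}, {"relativeTo", "margin"}, {"leader", "none"}};
    const PTabProperties props = readPTab(attrs);
    assert(props.alignment == "right");
    assert(props.relativeTo == "margin");
    assert(props.leader == "none");
}
```

In the real importer the values would arrive as tokens such as ooxml:CT_PTab_alignment rather than as strings in a map; the sketch only shows that all three pieces of information survive the import step.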
So I hope after I read your advice from this email – I will implement the 'DOCX importer' for the 'ptab'. Should I then create a *new* core object for the 'Paragraph Tab' or should I add it as properties to some existing object of the core?
I would check how existing similar features are implemented, and do something similar. Normal tabs are not a good example, as those are stored as a \t character inside SwTxtNode, but a page break may be a good example.
This email is too long, so I won't burden you now with 'rendering' and 'exporter' questions…
Sure, so -- as usual, the first step would be to design how the document model should store these paragraph tabs, then either do the UNO API or some UI, so you can test it. Then you can continue with filters and layout, etc.

Hope this helps,

Miklos
Attachment:
Paragraph Tab.docx
Description: MS-Word 2007 document