On Thu, May 19, 2011 at 12:52:41PM +0200, Cedric Bosdonnat <cedric.bosdonnat.ooo@free.fr> wrote:
As you'll work on the tokenizer, I think it would be nice to introduce some kind of token dumper that replaces the dmapper and dumps whatever would go into it. That would provide a way to isolate whether an import problem comes from the tokenizer (specific to each format) or the domain mapper (which would impact all handled formats).
Yes, that makes sense.
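To make the idea concrete, here is a minimal sketch of such a dumper in Python. It is only an illustration: the handler methods (`start_group`, `end_group`, `prop`, `text`) and the fake tokenizer are hypothetical names, not the actual writerfilter interfaces. The point is just that the dumper sits where the domain mapper would and records every event it receives, so the output can be compared against an expected dump.

```python
import io


class TokenDumper:
    """Hypothetical stand-in for the domain mapper: logs each event it gets."""

    def __init__(self, out):
        self.out = out
        self.depth = 0

    def _emit(self, line):
        # Indent by nesting depth so the dump stays readable.
        self.out.write("  " * self.depth + line + "\n")

    def start_group(self, name):
        self._emit("<%s>" % name)
        self.depth += 1

    def end_group(self, name):
        self.depth -= 1
        self._emit("</%s>" % name)

    def prop(self, name, value):
        self._emit("prop %s=%r" % (name, value))

    def text(self, chars):
        self._emit("text %r" % chars)


def dump_tokens(feed):
    """Run a tokenizer-like callable against the dumper and return the dump."""
    out = io.StringIO()
    feed(TokenDumper(out))
    return out.getvalue()


def fake_rtf_tokenizer(handler):
    # Stands in for a real tokenizer; the events are made up for illustration.
    handler.start_group("paragraph")
    handler.prop("bold", True)
    handler.text("Hello")
    handler.end_group("paragraph")


if __name__ == "__main__":
    print(dump_tokens(fake_rtf_tokenizer), end="")
```

With something like this, a tokenizer test reduces to comparing the dump of a small input file against a reference dump, without involving the domain mapper at all.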
You would then have a much more reliable way to test that your tokenizer is working... but that wouldn't help with testing the domain mapper. To test that one, I think conversions like the ones you describe are what helps most.
OK.
(I've already heard of the XML dumper for the rendered layout; is there something similar for the internal document model?)
Yes, the ODF is a pretty good representation of the internals... though we could surely implement something closer to the actual data structures. Let me know if it would be of any use to create such a dumper... I'm sure we could come up with something useful pretty quickly.
Fine, I'll use ODF for now; if it turns out to be too much trouble, we can still work on a dumper.

Other question: writerfilter seems to use a lot of XSL to extract the required data from the spec, and we agreed that this is a problem, as XSL is hard to maintain. If I follow the same approach, RTF would introduce another bunch of XSL. :)

So, what could be a solution here? Possible ideas from me:

- even with its problems, we have nothing better; introducing new XSL code for RTF is not ideal, but let's live with it (the conservative one)
- write C++ code to do the transformations at build time (the "I don't know any scripting languages" one)
- use Perl or Python to do the transformations (my Perl-fu is weak, but it's doable; I would vote for Python, but I'm not sure whether reusing our internal Python in the build system is a problem)

Thanks.
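To give a rough feeling for the Python option, here is a minimal sketch of a build-time script that reads keyword data from XML and generates a C++ table. Everything specific in it is an assumption made for illustration: the XML layout, the `RTFKeyword` struct, the `aKeywords` array and the `KEYWORD_*` constants are hypothetical and do not exist in the tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; the real spec data would have its own layout.
SAMPLE_SPEC = """<keywords>
  <keyword name="b" id="RTF_B" type="toggle"/>
  <keyword name="par" id="RTF_PAR" type="symbol"/>
</keywords>
"""


def generate_table(spec_xml):
    """Turn <keyword> elements into the source of a C++ lookup table."""
    root = ET.fromstring(spec_xml)
    lines = ["static const RTFKeyword aKeywords[] = {"]
    for kw in root.iter("keyword"):
        # One initializer per keyword: control word, token id, keyword type.
        lines.append('    {"%s", %s, KEYWORD_%s},' % (
            kw.get("name"), kw.get("id"), kw.get("type").upper()))
    lines.append("};")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(generate_table(SAMPLE_SPEC), end="")
```

This uses only the standard library, so the only build-system question would be which Python interpreter runs it.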