Re: Document conversion engine

Michael Meeks <michael.meeks -AT- suse.com>
Fri, 06 Jul 2012 20:13:30 +0100

Hi Flavio,

On Tue, 2012-07-03 at 11:45 +0100, Flavio Moringa wrote:

> my name is Flávio Moringa, I'm from Portugal and I'm starting my
> Masters Dissertation next September (Master in Open Source software -
> http://moss.dcti.iscte.pt ).


        Welcome :-)

> I'm not a programmer, so what I'm interested in doing is something
> along the lines of investigating the main conversion problems,
> identifying the possible conversion flows, analysing the way the
> conversion flow is implemented in LibreOffice, and eventually trying
> to improve this flow somehow.


        So - it will be hard to improve the flow without being a programmer,
I'm afraid :-)

> From your reply I assume that testing the filters, and doing
> regression tests is something I could do, maybe identifying the main
> conversion issues in groups of documents and kind of creating a "major
> conversion issues" table, and prioritizing those issues. Is there
> already something like that?


        There is a useful QA role in prioritising bug reports and
interoperability issues; we have a real problem with masses of bug
reports many of which could be duplicates. Having said that -
interoperability has many, many known feature / impedance mis-matches
that are non-trivial development problems to fix.

        One thing that -would- be really useful, and that Microsoft have
internally, is an analysis tool for Microsoft's XML document formats -
such that we can get a good idea of which elements and attributes are
actually used much, i.e. by analysing and comparing a large corpus of
documents out there, we can answer questions such as:

        "should we implement surface charts, or 3D doughnut charts ?"

        given whatever amount of feature-development time we have - simply by
referring to the database of crunched XML files to work out which one is
used most.

        It'd be nice to have that for ODF as well, of course, for when we
have to make zero-sum back-compatibility decisions; but for
interoperability, crunching those MS documents would be really good.

        Is that something you could do ? a bit of perl, zip extraction, XML
parsing, etc. ?
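
        As a rough illustration of the kind of crunching meant here, a
minimal sketch follows. It is written in Python rather than Perl, and
every name in it (count_xml_names, crunch_corpus) is hypothetical; it is
not an existing LibreOffice or Microsoft tool. It assumes each document
is a zip package of XML parts (true for OOXML and ODF) and simply counts
how often each element and attribute name appears, and in how many
documents each element is used at all.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter


def count_xml_names(path_or_file):
    """Count element and attribute names across the XML parts of one zip package."""
    elements, attributes = Counter(), Counter()
    with zipfile.ZipFile(path_or_file) as package:
        for name in package.namelist():
            # Assumption: only parts named *.xml or *.rels are worth parsing.
            if not name.endswith((".xml", ".rels")):
                continue
            try:
                root = ET.fromstring(package.read(name))
            except ET.ParseError:
                continue  # skip a malformed part rather than abort the whole run
            for node in root.iter():
                # Tags come back as "{namespace-uri}localname".
                elements[node.tag] += 1
                for attr in node.attrib:
                    attributes[attr] += 1
    return elements, attributes


def crunch_corpus(paths):
    """Count, per element name, how many documents use it at least once."""
    docs_using = Counter()
    for path in paths:
        try:
            elements, _ = count_xml_names(path)
        except zipfile.BadZipFile:
            continue  # not a zip package (e.g. an old binary .doc); ignore it
        docs_using.update(elements.keys())
    return docs_using
```

        Comparing the per-document counts for two element names in the
output of crunch_corpus() is the sort of lookup that would settle a
"which chart type first ?" question; the exact element names to compare
would have to be taken from the format specifications.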

        Developers are -much- more likely to let themselves be led by
objective statistics on real documents out there, rather than subjective
feelings of priority - which can prove rather controversial :-)

        Thanks !

                Michael.

-- 
michael.meeks@suse.com  <><, Pseudo Engineer, itinerant idiot
