Re: Document conversion engine

Flavio Moringa <flavio.moringa -AT- caixamagica.pt>
Mon, 9 Jul 2012 00:25:56 +0100

Hi Michael,


Nice to hear from someone so "up the ranks" like you... makes me feel much
more important :-)

2012/7/6 Michael Meeks <michael.meeks@suse.com>

Hi Flavio,

On Tue, 2012-07-03 at 11:45 +0100, Flavio Moringa wrote:

my name is Flávio Moringa, I'm from Portugal and I'm starting my
Masters Dissertation next September (Master in Open Source software -
http://moss.dcti.iscte.pt ).


        Welcome :-)


Thanks

I'm not a programmer, so what I'm interested in doing is something
along the lines of investigating the main conversion problems, identifying
the possible conversion flows, analysing the way the conversion flow
is implemented in LibreOffice, and eventually trying to improve this
flow somehow.


        So - it will be hard to improve the flow without being a
programmer I'm afraid :-)


Well, although I'm not a programmer right now, I've had my fair share of Perl,
Python, C, Bash, Java, PHP... Maybe I'm not so "fluent" in programming
right now, but I'm certainly no stranger to it, and definitely not afraid to
do it if the need arises... What I meant was that I probably won't be
able to write a conversion engine by myself... but I can definitely mess
around with code...

From your reply I assume that testing the filters, and doing
regression tests is something I could do, maybe identifying the main
conversion issues in groups of documents and kind of creating a "major
conversion issues" table, and prioritizing those issues. Is there
already something like that?


        There is a useful QA role in prioritising bug reports and
interoperability issues; we have a real problem with masses of bug
reports many of which could be duplicates. Having said that -
interoperability has many, many known feature / impedance mis-matches
that are non-trivial development problems to fix.

        One thing that -would- be really useful, and that Microsoft have
internally, is an analysis tool for Microsoft's XML document formats -
such that we can get a good idea of which attributes are actually used
much. ie. by analysing and comparing a large corpus of documents out
there, we can answer questions such as:

        "should we implement surface charts, or 3D doughnut charts ?"

        given whatever amount of feature-development time we have - simply
by referring to the database of crunched XML files to work out which one is
used most.

        It'd be nice to have that for ODF as well too of course for when we
have to make zero-sum back-compatibility decisions; but for
interoperability crunching those MS documents would be really good.

        Is that something you could do ? a bit of perl, zip extraction, XML
parsing, etc. ?


Yes, it's definitely something I can do... I do believe that the harder
part is getting that "large corpus of documents out there...". At least in
my experience, it's hard to get users to send us the documents they use,
either due to privacy concerns or enterprise policies... But a tool like
that makes a lot of sense.


        Developers are -much- more likely to let themselves be led by
objective statistics on real documents out there, rather than subjective
feelings of priority - which can prove rather controversial :-)


I can certainly relate to that...


        Thanks !


For now, then, I'll start doing as you suggest and look in Bugzilla for
documents with conversion problems, to try to compile as many examples as I
can. Then maybe I'll use the latest beta to do the conversion and see which
problems are still there. Then maybe I'll start a Perl script that can scrape
the OOXML files to find the most used tags... and start from there...
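
The tag-counting idea discussed above (zip extraction, XML parsing, then
tallying what is actually used) could be sketched roughly as follows. This
is only an illustration, written in Python rather than Perl; the function
names count_tags and count_corpus are made up for this sketch, and it assumes
that every package part ending in .xml or .rels is worth counting.

```python
import collections
import sys
import zipfile
import xml.etree.ElementTree as ET


def count_tags(ooxml_file):
    """Count element tags and attribute names in the XML parts of one package.

    ooxml_file may be a path or a file-like object (.docx, .xlsx, .pptx).
    Names are returned in ElementTree's "{namespace}local" form.
    """
    tags = collections.Counter()
    attrs = collections.Counter()
    with zipfile.ZipFile(ooxml_file) as package:
        for name in package.namelist():
            # Assumption: only .xml and .rels parts are of interest.
            if not name.endswith((".xml", ".rels")):
                continue
            try:
                root = ET.fromstring(package.read(name))
            except ET.ParseError:
                continue  # skip parts that are not well-formed XML
            for element in root.iter():
                tags[element.tag] += 1
                for attr in element.attrib:
                    attrs[attr] += 1
    return tags, attrs


def count_corpus(paths):
    """Sum the per-document counts over a list of files, skipping non-zip files."""
    total_tags = collections.Counter()
    total_attrs = collections.Counter()
    for path in paths:
        try:
            tags, attrs = count_tags(path)
        except zipfile.BadZipFile:
            continue  # e.g. old binary .doc files are not zip packages
        total_tags.update(tags)
        total_attrs.update(attrs)
    return total_tags, total_attrs


if __name__ == "__main__":
    corpus_tags, corpus_attrs = count_corpus(sys.argv[1:])
    for tag, n in corpus_tags.most_common(20):
        print(n, tag)
```

Run over a directory of sample documents, this prints the twenty most
frequent tags; a real tool would probably also want to record how many
distinct documents use each tag, not just the raw totals.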


                Michael.

--
michael.meeks@suse.com  <><, Pseudo Engineer, itinerant idiot


Thanks a lot for helping out.
Cheers

-- 
*Flávio Moringa*
Project Leader



Caixa Mágica Software
Energia Open Source
Rua Soeiro Pereira Gomes, Lote 1 - 4.º B,
Edifício Espanha, 1600-196 Lisboa - Portugal
Tel.: +351 217 921 260 Fax: +351 217 921 261
http://www.caixamagica.pt
https://twitter.com/flaviomoringa
https://www.facebook.com/flaviomoringa<https://www.facebook.com/flavio.moringa>
http://pt.linkedin.com/in/flaviomoringa
http://people.caixamagica.pt/flaviomoringa

