Clean up/optimize/canonicalize a .odt file

"Mike Cowlishaw" <mfc -AT- speleotrove.com>
Fri, 25 Sep 2015 20:36:08 +0100

 
I am the editor of a document [the IEEE 754-2008 standard] that was created
around 15 years ago (using OpenOffice), and has had nearly 200 drafts, a
number of editors, and countless edits. It was last changed in 2008, but is
now about to go through a new revision cycle.

I was delighted to find that LibreOffice handled the 2008 .odt file almost
perfectly, with only 7 errors (all were weird spurious empty reference tags,
of unknown provenance, that OpenOffice quietly ignored).

While identifying and removing those from the content.xml, I noticed that
there are hundreds (possibly thousands) of redundant tags. These are
typically of the form: <span whatever>text1</span><span
whatever>text2</span> where 'whatever' is identical in both tags, and either
or both of 'text1' and 'text2' may be empty.

Is there a tool to clean these up? I could write one myself (I recently
wrote an XML parser) but if one already exists ...
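
To make the pattern concrete, here is a minimal sketch of the kind of merge
described above. It is not an existing tool: it assumes the spans are ODF
text:span elements, that 'whatever' means the complete attribute set, and
that two spans are only adjacent when no text at all (not even whitespace)
sits between them. The function name merge_adjacent_spans is hypothetical,
and a real content.xml would first have to be extracted from the .odt zip
archive.

```python
import xml.etree.ElementTree as ET

# Namespace of ODF text elements such as text:span and text:p.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
SPAN = f"{{{TEXT_NS}}}span"

# Keep the conventional "text:" prefix when serializing again.
ET.register_namespace("text", TEXT_NS)


def merge_adjacent_spans(parent):
    """Merge adjacent sibling spans with identical attributes (hypothetical helper).

    Returns the number of spans that were folded into a predecessor.
    """
    merged = 0
    i = 0
    while i + 1 < len(parent):
        a, b = parent[i], parent[i + 1]
        same = a.tag == SPAN and b.tag == SPAN and a.attrib == b.attrib
        # a.tail is the text between </span> and the next <span>; any text
        # there (even a single space) means the spans are not truly adjacent.
        if same and not a.tail:
            if len(a):
                # b's leading text follows a's last child element.
                a[-1].tail = (a[-1].tail or "") + (b.text or "")
            else:
                a.text = (a.text or "") + (b.text or "")
            a.extend(list(b))   # move b's child elements into a
            a.tail = b.tail     # text after b now follows a
            del parent[i + 1]   # drop b; stay on a to catch longer runs
            merged += 1
        else:
            i += 1
    # Repeat inside every remaining child, including merged spans.
    for child in parent:
        merged += merge_adjacent_spans(child)
    return merged


if __name__ == "__main__":
    sample = (
        f'<text:p xmlns:text="{TEXT_NS}">'
        '<text:span text:style-name="T1">binary</text:span>'
        '<text:span text:style-name="T1"></text:span>'
        '<text:span text:style-name="T1">64</text:span>'
        '<text:span text:style-name="T2"> format</text:span>'
        '</text:p>'
    )
    root = ET.fromstring(sample)
    print(merge_adjacent_spans(root), "spans merged")
    print(ET.tostring(root, encoding="unicode"))
```

In the sample, the three T1 spans collapse into one and the T2 span is left
alone. Spans that are empty but have no identical neighbour are not removed
by this sketch, and it makes no attempt to remove automatic styles that end
up unused.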

Many thanks -- Mike Cowlishaw

[Apologies if this is a duplicate .. I tried it on askLibo some time ago but
it is still "awaiting moderation".]



