2015 Archives by date, by thread · List index

I am the editor of a document [the IEEE 754-2008 standard] that was created
around 15 years ago (using OpenOffice), and has had nearly 200 drafts, a
number of editors, and countless edits. It was last changed in 2008, but is
now about to go through a new revision cycle.

I was delighted to find that LibreOffice handled the 2008 .odt file almost
perfectly, with only 7 errors (all were weird spurious empty reference tags,
of unknown provenance, that OpenOffice quietly ignored).

While identifying and removing those from the content.xml, I noticed that
there are hundreds (possibly thousands) of redundant tags. These are
typically of the form <span whatever>text1</span><span
whatever>text2</span>, where 'whatever' is identical in both tags and either
or both of 'text1' and 'text2' may be empty.
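
To make the pattern concrete, here is a minimal Python sketch of the kind of
merge in question. It is not an existing tool. It assumes the spans are ODF
text:span elements, that content.xml has already been extracted from the
.odt archive, and that two spans are redundant only when their attributes
match exactly and no text sits between them. The function name
merge_adjacent_spans is hypothetical.

```python
import xml.etree.ElementTree as ET

# ODF text namespace; text:span is the element that carries character styles.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
SPAN = f"{{{TEXT_NS}}}span"


def merge_adjacent_spans(parent, tag=SPAN):
    """Merge adjacent `tag` children with identical attributes, recursively.

    Returns the number of elements that were folded into a neighbour.
    """
    merged = 0
    i = 0
    while i < len(parent) - 1:
        a, b = parent[i], parent[i + 1]
        mergeable = (
            a.tag == tag
            and b.tag == tag
            and a.attrib == b.attrib
            and not a.tail  # any text between the spans blocks the merge
        )
        if not mergeable:
            i += 1
            continue
        # Append b's leading text to the end of a's existing content.
        if len(a):
            a[-1].tail = (a[-1].tail or "") + (b.text or "")
        else:
            a.text = (a.text or "") + (b.text or "")
        a.extend(list(b))   # move b's child elements into a
        a.tail = b.tail     # text after b now follows a
        parent.remove(b)
        merged += 1         # stay at i: a may merge with the next sibling too
    for child in parent:
        merged += merge_adjacent_spans(child, tag)
    return merged
```

Calling merge_adjacent_spans(ET.parse("content.xml").getroot()) would apply
it to a whole document. A real tool would need more care: ElementTree
rewrites namespace prefixes on output unless each one is registered with
ET.register_namespace, and this sketch leaves isolated empty spans alone.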

Is there a tool to clean these up? I could write one myself (I recently
wrote an XML parser) but if one already exists ...

Many thanks -- Mike Cowlishaw

[Apologies if this is a duplicate .. I tried it on askLibo some time ago but
it is still "awaiting moderation".]
