
2014/1/10 Tom Davies <>

I tried opening it with GEdit (a lot like Notepad), but the first two
letters were not PK. Then I checked a different ODT that someone else
sent me earlier, and that one did start with PK. I'm not convinced
about the whole PK thing, but it's interesting.

As we all know, ODT files are ZIP files. "PK" happens to be found at the
beginning of almost all ZIP files.
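In this document (section VI.A.) we can see that every file starts with
the following header: 0x04034b50, which translates (in the correct,
little-endian byte order) to "PK.." ("PK" followed by the two
non-printable bytes 0x03 and 0x04). So every file inside a ZIP archive
begins with a header that starts with PK.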
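To make the byte order concrete, here is a minimal Python sketch (the names
`LOCAL_FILE_HEADER_SIG`, `SIG_BYTES` and `starts_with_local_header` are just
illustrative) that packs 0x04034b50 little-endian and checks whether a buffer
begins with it:

```python
import struct

# Local file header signature, as quoted above.
LOCAL_FILE_HEADER_SIG = 0x04034B50

# ZIP stores multi-byte integers little-endian, so on disk the signature is
# the byte sequence 0x50 0x4B 0x03 0x04: "PK" plus two non-printable bytes.
SIG_BYTES = struct.pack("<I", LOCAL_FILE_HEADER_SIG)


def starts_with_local_header(data: bytes) -> bool:
    """Return True if the buffer begins with a ZIP local file header signature."""
    return data[:4] == SIG_BYTES
```

Passing it the first four bytes of a file, e.g. `open(path, "rb").read(4)`,
does the same check Tom did by eye in GEdit.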

This also means that if a file is somewhat corrupted, looking for this
signature and checking that the following bytes form a valid header allows
one to recover files. For example, if you find the sequence 504b0304
followed, 22 bytes later, by a 2-byte integer (the file name length), 2 more
bytes (the extra field length), and then a file name of that length, you can
recover that file.
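A minimal Python sketch of that scan, assuming the whole damaged file fits in
memory. It never reads the central directory, only the local headers; the
helper names (`scan_local_headers`, `LocalEntry`) are hypothetical, and
decoding names as UTF-8 is an assumption that holds for typical ODF files:

```python
import struct
from typing import Iterator, NamedTuple

SIG = b"PK\x03\x04"

# Fixed 30-byte part of a local file header (little-endian):
# signature, version needed, flags, compression method, mod time, mod date,
# CRC-32, compressed size, uncompressed size, file name length, extra length.
HEADER = struct.Struct("<IHHHHHIIIHH")


class LocalEntry(NamedTuple):
    offset: int           # position of the "PK\x03\x04" signature
    method: int           # 0 = stored, 8 = deflated
    compressed_size: int  # may be 0 if flag bit 3 (data descriptor) is set
    name: str
    data_offset: int      # first byte of the file data after the header


def scan_local_headers(blob: bytes) -> Iterator[LocalEntry]:
    """Yield every plausible local file header found anywhere in blob."""
    pos = blob.find(SIG)
    while pos != -1:
        if pos + HEADER.size <= len(blob):
            fields = HEADER.unpack_from(blob, pos)
            method, csize = fields[3], fields[7]
            name_len, extra_len = fields[9], fields[10]
            # Name length sits 22 bytes after the 4-byte signature (offset 26),
            # the extra length follows, and the name starts at offset 30.
            name_start = pos + HEADER.size
            name_end = name_start + name_len
            if 0 < name_len and name_end <= len(blob):
                try:
                    # Assumption: names are ASCII/UTF-8, as in typical ODF files.
                    name = blob[name_start:name_end].decode("utf-8")
                except UnicodeDecodeError:
                    name = None  # a stray "PK\x03\x04", not a real header
                if name is not None:
                    yield LocalEntry(pos, method, csize, name,
                                     name_end + extra_len)
        pos = blob.find(SIG, pos + 1)
```

Feeding it `open(path, "rb").read()` and printing each entry's `name` lists
what is still findable; a surviving `content.xml` entry is the one that
matters most.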

As we know, an ODT is made of multiple files, some more important than
others. Losing the manifest, for example, is not a big issue, so we can
recover some ODT files with this knowledge: identify the files in the ZIP
structure, then check that we have the "important" parts.
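The "check the important parts" step could be sketched like this in Python.
Which parts count as essential, useful or expendable is my own assumption for
illustration (the ODF standard does not rank them), and `assess_recovery` is a
hypothetical helper. The names it takes could come from scanning the damaged
file for local headers as described above:

```python
from typing import Dict, Iterable

# Assumed split for illustration only; not defined by the ODF standard.
ESSENTIAL = {"content.xml"}              # the document body
USEFUL = {"styles.xml", "meta.xml"}      # formatting and metadata
EXPENDABLE = {"mimetype", "META-INF/manifest.xml", "settings.xml",
              "Thumbnails/thumbnail.png"}


def assess_recovery(names: Iterable[str]) -> Dict[str, object]:
    """Summarise which parts of an ODT package were found among recovered names."""
    found = set(names)
    return {
        # Without content.xml there is no document text to recover.
        "recoverable": ESSENTIAL <= found,
        "missing_essential": sorted(ESSENTIAL - found),
        "missing_useful": sorted(USEFUL - found),
        "missing_expendable": sorted(EXPENDABLE - found),
        "pictures": sorted(n for n in found if n.startswith("Pictures/")),
    }
```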

I haven't yet tried to find a tool for fixing ZIP files. It might be
worth testing one on a copy of a couple of files. There might be an ODT
fixing tool somewhere on the internet too.

I don't know if such a tool exists for ODF, but it might be worth making one
based on my previous rant. In the case of minor corruption (which was *not*
the case for the OP here), retrieving the document content while possibly
losing things like status bar and toolbar settings, the document thumbnail,
or the initial mimetype info (which takes up 77 bytes at the beginning of the
file, enough for it to get corrupted!) is probably an acceptable tradeoff.
Even losing some content (like pictures) might be better than losing the
whole text.
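Salvaging members one at a time, so that a damaged part (say, a picture) does
not cost the whole text, could look like the minimal Python sketch below. It
assumes the data offset, compression method and size were already read from a
local file header; `salvage_entry` is a hypothetical helper, not an existing
tool:

```python
import zlib
from typing import Optional

STORED, DEFLATED = 0, 8  # ZIP compression method codes


def salvage_entry(blob: bytes, data_offset: int, method: int,
                  compressed_size: int = 0) -> Optional[bytes]:
    """Best-effort recovery of one member's data; None means give up on it."""
    if method == STORED:
        # Stored data is not compressed, so the size from the header is needed.
        return blob[data_offset:data_offset + compressed_size]
    if method == DEFLATED:
        # wbits=-15 selects a raw deflate stream (no zlib header), which is how
        # ZIP stores deflated data. The stream marks its own end, so the size
        # from the header is not needed here.
        d = zlib.decompressobj(-15)
        try:
            # Bytes after the end of the stream go to d.unused_data; if the
            # stream is cut short, whatever could be decoded is returned.
            return d.decompress(blob[data_offset:])
        except zlib.error:
            return None  # damaged data: lose this member, keep the others
    return None  # other compression methods are not handled in this sketch
```

As a sanity check on the 77-byte figure: the first entry of an ODF package is
the uncompressed `mimetype` file, and its 30-byte fixed header, the 8-byte
name `mimetype` and the 39-byte value `application/vnd.oasis.opendocument.text`
add up to 30 + 8 + 39 = 77 bytes (assuming no extra field).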

All messages sent to this list will be publicly archived and cannot be deleted


Copyright information: Unless otherwise specified, all text and images on this website are licensed under the Creative Commons Attribution-Share Alike 3.0 License. This does not include the source code of LibreOffice, which is licensed under the Mozilla Public License (MPLv2). "LibreOffice" and "The Document Foundation" are registered trademarks of their corresponding registered owners or are in actual use as trademarks in one or more countries. Their respective logos and icons are also subject to international copyright laws. Use thereof is explained in our trademark policy.