On 28/02/11 21:51, Michael Meeks wrote:
Why the surprise? How long has Huffman been around - 30 years? 40? And
you CANNOT do better than Huffman (there are other equally as good
algorithms, though).
Ho hum; glad to know Huffman compression is optimal ;-) (personally I'd
go for arithmetic compression with a good "guess a libreoffice" model
alogorithm to get it down to a handful of bits ;-). Notice - that ZIP is
in fact two layered compression algorithm: Lempel Ziv, and then Huffman,
it is not clear that this makes it obviously non-optimal but it is
certainly somewhat weaker in that regard.
Interesting ...
I checked on wikipedia after I posted and Huffman is actually SIXTY
years old! Set as a class project, and published in 1952.
So zip should be tricky to beat for compression if it actually uses
Huffman, although I would expect it to take ages.
Your figures imply either (a) zip doesn't do Huffman, or (b) bzip2 and
lzma use a much larger input block size.
Was the wink to say you knew it was optimal already? I thought that was
common knowledge :-) although reading wikipedia it reminded me that
there is a pathological case - lots of identical elements. Given that I
gather Windows files contain large sections of nulls, Huffman won't like
that! That's probably the main time you *do* want double-compression -
some form of RLE will suppress any pathological input into Huffman.
The main reason I suggested adding times to KAMI's table was that it
makes explicit the time/compression tradeoff.
Cheers,
Wol
Context
Privacy Policy |
Impressum (Legal Info) |
Copyright information: Unless otherwise specified, all text and images
on this website are licensed under the
Creative Commons Attribution-Share Alike 3.0 License.
This does not include the source code of LibreOffice, which is
licensed under the Mozilla Public License (
MPLv2).
"LibreOffice" and "The Document Foundation" are
registered trademarks of their corresponding registered owners or are
in actual use as trademarks in one or more countries. Their respective
logos and icons are also subject to international copyright laws. Use
thereof is explained in our
trademark policy.