Hi Wols,
On Mon, 2011-02-28 at 18:04 +0000, Wols Lists wrote:
So - personally, I find it amazing that ZIP is better than LZMA - ever,
it is such an older compression algorithm, and this flies in the face of
everything I've seen in practise with it [ eg. we use LZMA for our RPMs
in openSUSE these days AFAIR, better than bzip2, which in turn was
better than .tgz (essentially zip) ].
Why the surprise? How long has Huffman been around - 30 years? 40? And
you CANNOT do better than Huffman (there are other equally as good
algorithms, though).
Ho hum; glad to know Huffman compression is optimal ;-) (personally I'd
go for arithmetic compression with a good "guess a libreoffice" model
alogorithm to get it down to a handful of bits ;-). Notice - that ZIP is
in fact two layered compression algorithm: Lempel Ziv, and then Huffman,
it is not clear that this makes it obviously non-optimal but it is
certainly somewhat weaker in that regard.
#!/usr/bin/perl -w
for (my $i = 0; $i < 10000; $i++) { print "$i\n"; }
type size
plain 48k
zip 23k
bzip2 11k
lzma 3k
That is not a very representative input (of course ;-), but perhaps not
so far away from the rampant duplication we face. I would (personally)
expect to see lzma do better than plain deflate. Of course, you can
easily imagine a compression algorithm, that turns the file into just
those two lines of perl above ;-)
Having tried to back up a hard disk over ethernet (dd if=/dev/sda
of=//samba/archive.hdi), it was direly slow over a 10Mb card. So I tried
to pipe it through bzip2 to speed up the transfer - it was *even*
*slower* !
bzip2 is not fast; that is certainly so, OTOH it was far better space
wise than the deflate it replaced; this is why much source code is still
packaged as .bz2 - if you want to bend your mind, read up on the
Burrows-Wheeler transform [ FWIW ;-].
Why do you think LZMA is better than zip - because it's faster?
More efficient. Slowness of compression we can cope with easily, as
long as decompression is fast: and for most algorithms this balance is
easyish to achieve.
(I think we need to add compression time to KAMI's nice table :-)
As long as it is within five or so minutes of CPU time on a fast
machine, and/or preferably is multi-thread-able [ though our choice of
tools and platforms strongly limits what we can use at the cab level
anyway ] - compression time is fairly irrelevant.
ATB,
Michael.
--
michael.meeks@novell.com <><, Pseudo Engineer, itinerant idiot
Context
Privacy Policy |
Impressum (Legal Info) |
Copyright information: Unless otherwise specified, all text and images
on this website are licensed under the
Creative Commons Attribution-Share Alike 3.0 License.
This does not include the source code of LibreOffice, which is
licensed under the Mozilla Public License (
MPLv2).
"LibreOffice" and "The Document Foundation" are
registered trademarks of their corresponding registered owners or are
in actual use as trademarks in one or more countries. Their respective
logos and icons are also subject to international copyright laws. Use
thereof is explained in our
trademark policy.