Wed, 07 Apr 2010
Pigz - Shortening backup times with parallel gzip
While searching for a completely different piece of software I stumbled
onto pigz, a
parallel implementation of gzip for modern multi-processor, multi-core
machines. As some of our backups include a gzip step to save
space, I decided to see whether pigz could speed them up.
Using remarkably unscientific means (I just wanted to know whether it's worth further investigation) I ran a couple of sample compression runs. The machine is a quad-core Dell server, the files are three copies of the same 899M SQL dump, and the machine is lightly loaded (what load there is comes mostly from disk IO).
#######################################
# Timings for two normal gzip runs

dwilson@pigztester:~/pgzip/pigz-2.1.6$ time gzip 1 2 3

real 2m43.429s
user 2m39.446s
sys  0m3.988s

real 2m43.403s
user 2m39.582s
sys  0m3.808s

#######################################
# Timings for three pigz runs

dwilson@pigztester:~/pgzip/pigz-2.1.6$ time ./pigz 1 2 3

real 0m46.504s
user 2m56.015s
sys  0m4.116s

real 0m46.976s
user 2m55.983s
sys  0m4.292s

real 0m47.402s
user 2m55.695s
sys  0m4.256s
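To put a number on those timings, here is a small Python sketch that averages the runs and works out the wall-clock speed-up and the extra CPU time pigz spends. The figures are copied from the output above; the helper names (to_seconds, mean) are just for illustration.

```python
# Timings copied from the gzip and pigz runs above.
# "real" is wall-clock time, "user" is CPU time summed over all cores.

def to_seconds(minutes, seconds):
    """Convert the 'XmY.YYYs' format printed by time into seconds."""
    return minutes * 60 + seconds

def mean(values):
    return sum(values) / len(values)

gzip_real = [to_seconds(2, 43.429), to_seconds(2, 43.403)]
gzip_user = [to_seconds(2, 39.446), to_seconds(2, 39.582)]

pigz_real = [to_seconds(0, 46.504), to_seconds(0, 46.976), to_seconds(0, 47.402)]
pigz_user = [to_seconds(2, 56.015), to_seconds(2, 55.983), to_seconds(2, 55.695)]

# How many times faster pigz finished, by the clock on the wall.
speedup = mean(gzip_real) / mean(pigz_real)

# How much more total CPU time pigz burned to get there.
cpu_overhead = mean(pigz_user) / mean(gzip_user) - 1

print(f"wall-clock speed-up: {speedup:.2f}x")
print(f"extra CPU time used by pigz: {cpu_overhead:.1%}")
```

On these numbers pigz finishes roughly three and a half times sooner on the quad-core box, while using about a tenth more CPU time in total. That is the trade you would expect: the same work, plus a little coordination overhead, spread across all four cores.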
Quite an impressive speed-up considering all I did was run a slightly different command. The compressed sizes are pretty much the same (258M with gzip, 257M with pigz), and you can gunzip a pigz'd file and get back a file with the same md5sum as the original.
# before compression
-rw-r--r-- 1 dwilson dwilson 899M 2010-04-06 22:12 1

# post gzip compress
-rw-r--r-- 1 dwilson dwilson 258M 2010-04-06 22:12 1.gz

# post pigz compress
-rw-r--r-- 1 dwilson dwilson 257M 2010-04-06 22:12 1.gzs
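The md5sum round-trip check can be scripted if you want it as part of a backup job. The sketch below is only an illustration: it uses Python's standard gzip module as a stand-in for the command line tools, and the function names and the throwaway dump.sql file are made up for the demo. Because pigz writes ordinary gzip-format output, the same check should apply to a pigz'd file.

```python
import gzip
import hashlib
import os
import tempfile

def md5_of_stream(stream, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

def roundtrip_matches(original_path, compressed_path):
    """True if decompressing compressed_path gives back the original bytes."""
    with open(original_path, "rb") as original:
        expected = md5_of_stream(original)
    with gzip.open(compressed_path, "rb") as decompressed:
        actual = md5_of_stream(decompressed)
    return expected == actual

# Demo on a small throwaway file (hypothetical name and contents).
with tempfile.TemporaryDirectory() as workdir:
    original_path = os.path.join(workdir, "dump.sql")
    compressed_path = original_path + ".gz"

    with open(original_path, "wb") as f:
        f.write(b"INSERT INTO t VALUES (1);\n" * 10000)

    # Stand-in for the gzip/pigz step; keeps the original for comparison.
    with open(original_path, "rb") as src, gzip.open(compressed_path, "wb") as dst:
        dst.write(src.read())

    roundtrip_ok = roundtrip_matches(original_path, compressed_path)
    print("round trip ok:", roundtrip_ok)
```

In a real backup you would need to keep the uncompressed file (or its checksum) around until the comparison is done, since gzip and pigz both remove the original by default.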
I'll need to do some more testing, and compare the system's performance while the compression is happening against a normal run, before I trust it in production. But the speed-ups look appealing and, as it's Mark Adler code, it looks like it might be an easy win in some of our scripts.
Posted: 2010/04/07 08:00 | /tools/commandline

