Pigz - Shortening backup times with parallel gzip

While searching for a completely different piece of software I stumbled onto the pigz application, a parallel implementation of gzip for modern multi-processor, multi-core machines. As some of our backups include a gzip step to conserve space, I decided to see if pigz could be useful in speeding them up.

Using remarkably unscientific means (I just wanted to know if it was worth further investigation) I ran a couple of sample compression runs. The machine is a quad-core Dell server, the files are three copies of the same 899M SQL dump, and the machine is lightly loaded (and mostly busy with disk IO).
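For reference, the setup was no more sophisticated than copying a single dump a few times and timing each tool against the copies. A rough sketch of what that looks like, with dump.sql as a stand-in name rather than the actual file:

# create three identical test files from one SQL dump (dump.sql is a placeholder name)
cp dump.sql 1
cp dump.sql 2
cp dump.sql 3

# time stock gzip, restore the files, then time pigz on the same input
time gzip 1 2 3
gunzip 1.gz 2.gz 3.gz
time ./pigz 1 2 3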


#######################################
# Timings for two normal gzip runs
dwilson@pigztester:~/pgzip/pigz-2.1.6$ time gzip 1 2 3

real    2m43.429s
user    2m39.446s
sys     0m3.988s

real    2m43.403s
user    2m39.582s
sys     0m3.808s

#######################################
# Timings for three pigz runs

dwilson@pigztester:~/pgzip/pigz-2.1.6$ time ./pigz 1 2 3

real    0m46.504s
user    2m56.015s
sys     0m4.116s

real    0m46.976s
user    2m55.983s
sys     0m4.292s

real    0m47.402s
user    2m55.695s
sys     0m4.256s

Quite an impressive speed-up considering all I did was run a slightly different command. The post-compression sizes are pretty much the same (258M when compressed by gzip and 257M with pigz) and you can gunzip a pigz'd file and get back a file with the same md5sum.

# before compression
-rw-r--r-- 1 dwilson dwilson 899M 2010-04-06 22:12 1

# post gzip compress
-rw-r--r-- 1 dwilson dwilson 258M 2010-04-06 22:12 1.gz

# post pigz compress
-rw-r--r-- 1 dwilson dwilson 257M 2010-04-06 22:12 1.gz
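The round-trip check is simple enough to show; a minimal sketch against the same test file (the exact commands here are an assumption rather than a transcript of what I ran):

# note the original checksum, compress with pigz, decompress with plain gunzip
md5sum 1            # record the checksum of the uncompressed file
./pigz 1            # produces 1.gz
gunzip 1.gz         # standard gunzip handles pigz output fine
md5sum 1            # should match the checksum recorded above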

I’ll need to do some more testing, and compare the system’s performance during compression to that of a normal run, before I trust it in production, but the speed-ups look appealing and, as it’s Mark Adler’s code, it looks like it might be an easy win in some of our scripts.
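To give an idea of how small the change would be, here's a sketch of a typical backup step with gzip swapped for pigz (the database name and paths are placeholders, not our actual setup):

# before: single-threaded compression of the nightly dump
# mysqldump --single-transaction exampledb | gzip > /backups/exampledb-$(date +%F).sql.gz

# after: the same pipeline, with compression spread across the available cores
mysqldump --single-transaction exampledb | pigz > /backups/exampledb-$(date +%F).sql.gz

If the box is busy with other work, pigz's -p option (e.g. pigz -p 2) can cap the number of cores it uses.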