Large File Check

I’ve had a find big files script up on my miniprojects page for a while now. It’s not exactly a difficult script to write, but it deals with a couple of less obvious cases (exclude lists) that most of the similar scripts online don’t cater for.

While the script is something that’s easily downloaded and run, if you have anything beyond a handful of machines you need to actually think about how to incorporate it into your checks and how you should run it to get the most return from the least effort.
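As a sketch of the kind of scheduling I mean (the script path, arguments and mail recipient here are my own illustrations, not how the actual script has to be invoked), a weekly cron entry on each machine is usually enough:

  # Hypothetical weekly run that mails the report to the admin team; the
  # findbigfiles path and the mail command are illustrative assumptions.
  0 2 * * 0 /usr/local/bin/findbigfiles | mail -s "Large file report" sysadmin@example.com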

How not to do this is to kick the script off with a file size threshold of something like 50MB. This’ll do nothing but raise huge numbers of false positives, making people both dread its runs and go numb to the results. This happens way too often.

A better way (IMHO) is to start with a large threshold and slowly work down, clearing each stage as you go; 2GB is a good starting point, as most older Linux machines had problems with files over that size. This way you never overload people with information.
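If you want a rough idea of what that first pass will turn up before wiring the script into anything, a plain find gives you much the same picture (the starting path and options here are just an illustration, not what the script itself does):

  # Rough first pass at the 2GB threshold; -xdev keeps find on the local
  # filesystem so it doesn't wander into NFS or other mounts.
  find / -xdev -type f -size +2G -exec ls -lh {} \; 2>/dev/null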

Before lowering the file size threshold you should have at least a single empty run, whether you remove the files, change the rotation schedule or even just add them to the ignore list. This both gives people a feeling of actually getting somewhere and ensures that you’ve not done anything odd. If changing the threshold on a semi-periodic basis is too difficult then simply change tools.
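Exactly how the ignore list is applied depends on the script, but the idea is just a file of known-good paths that gets filtered out before the report goes anywhere. A minimal sketch (the ignore_list location and the fixed-string matching are assumptions on my part):

  # Filter known-good paths out of the report; -F treats each line of the
  # ignore list as a fixed string and -v drops anything that matches.
  find / -xdev -type f -size +2G 2>/dev/null | grep -vFf /etc/findbig/ignore_list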

TODO: Make the find big files script accept regexes of files to ignore.