Short Scripts, Applications, Hacks and Code Snippets - UnixDaemon: In search of (a) life

On this page you will find a number of my shorter scripts, applications and bits of code I find helpful and that scratch my own itches. Each one is accompanied by a short explanation of the script, any dependencies, any needed configuration and a link to the code itself. Unless otherwise stated all the code on this page is GPL'd and you are free to do what you want with it.

Nagios Simple Trender - Show Aggregated Service and Host problems as horizontal bar charts. The Nagios Simple Trender parses Nagios logs and builds a horizontal bar chart for host outages, service warnings and criticals. It's nothing fancy (and the results are a little unpretty) but it does make the attention-seeking services and hosts very easy to find.

While the tool isn’t that technically complex I’ve found it useful in justifying my time on certain parts of the infrastructure. Being able to show how bad NTP is, for example (we had 216 NTP sync problems last month and 36 this month, most of them from one machine with a bad clock), on a very simple chart makes it easier to get buy-in from above. And next month you can show them how much of a positive impact the work had.
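The counting and charting idea can be sketched in a few lines of Python. This is not the Trender itself, just a minimal illustration. It assumes the usual Nagios log line shape (`[epoch] SERVICE ALERT: host;service;STATE;HARD;attempt;output`) and counts only HARD states, which is my choice rather than something the tool is known to do; `count_problems` and `bar_chart` are hypothetical names.

```python
import re
from collections import Counter

# Assumed log line shapes:
#   [epoch] HOST ALERT: host;DOWN;HARD;attempt;output
#   [epoch] SERVICE ALERT: host;service;CRITICAL;HARD;attempt;output
ALERT_RE = re.compile(r"^\[\d+\] (HOST|SERVICE) ALERT: (.+)$")


def count_problems(lines):
    """Count HARD host DOWN and service WARNING/CRITICAL alerts per name."""
    counts = {"DOWN": Counter(), "WARNING": Counter(), "CRITICAL": Counter()}
    for line in lines:
        match = ALERT_RE.match(line.strip())
        if not match:
            continue
        kind, fields = match.group(1), match.group(2).split(";")
        if kind == "HOST" and len(fields) >= 3:
            name, state, state_type = fields[0], fields[1], fields[2]
        elif kind == "SERVICE" and len(fields) >= 4:
            name = f"{fields[0]}/{fields[1]}"
            state, state_type = fields[2], fields[3]
        else:
            continue
        # Counting only HARD states is an assumption of this sketch.
        if state_type == "HARD" and state in counts:
            counts[state][name] += 1
    return counts


def bar_chart(counter, width=40):
    """Render a Counter as text bars scaled to the largest count."""
    if not counter:
        return []
    biggest = max(counter.values())
    rows = []
    for name, count in counter.most_common():
        bar = "#" * max(1, round(count * width / biggest))
        rows.append(f"{name:<30} {bar} {count}")
    return rows
```

Feeding it the lines of a log file and printing `bar_chart(counts["CRITICAL"])` gives one row per service, worst offender first.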

Posted on ‘Sun Jun 3 11:34:14 2007’ by Dean Wilson

Nagios TagClouds - Show Service and Host problems as a Tag Cloud. The Nagios monitoring system is great for showing what's wrong now but sometimes you want a higher level view of what's going on. When Nagios TagCloud - which has a very descriptive name - is invoked (I typically use `nagiosclouds.pl /log/files/*.log > /webdir/nagios_tagcloud.html` from a cronjob) it'll run through the log files and produce an HTML page containing 3 tag clouds, one for host outages, one for service warnings and one for service criticals. Tag clouds don't suit everyone's work style but I came away from running ours with a couple of action points so I think they're useful enough to glance at once a month.
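The heart of any tag cloud is turning a count into a font size. Here is a minimal Python sketch of that step, assuming you already have a mapping of name to problem count (each count at least 1). The log scale and the point sizes are my assumptions, not necessarily what nagiosclouds.pl does, and `font_sizes` and `render_cloud` are hypothetical names.

```python
import html
import math


def font_sizes(counts, smallest=10, largest=36):
    """Map each name's count (>= 1) to a font size in points on a log scale."""
    if not counts:
        return {}
    low = math.log(min(counts.values()))
    high = math.log(max(counts.values()))
    spread = (high - low) or 1.0  # avoid dividing by zero when all counts match
    return {
        name: round(smallest + (math.log(count) - low) / spread * (largest - smallest))
        for name, count in counts.items()
    }


def render_cloud(title, counts):
    """Return an HTML fragment with one sized <span> per name."""
    sizes = font_sizes(counts)
    spans = [
        f'<span style="font-size:{sizes[name]}pt" title="{counts[name]}">'
        f"{html.escape(name)}</span>"
        for name in sorted(counts)
    ]
    return f"<h2>{html.escape(title)}</h2>\n<p>" + " ".join(spans) + "</p>"
```

A log scale keeps one very noisy service from shrinking everything else to the minimum size.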

Posted on ‘Sun Jun 3 11:11:39 2007’ by Dean Wilson

del.icio.us de.dup.er - remove duplicate links from a del.icio.us tag RSS feed. I like del.icio.us and I've been using it for a long while now, but what used to be one of the more handy features, the ability to subscribe to a tag, like 'ruby' or 'linux', has gradually become less useful as more and more people find old links or repost the same link. Again. And again. And, well, you get the idea.

So I wrote the del.icio.us de.dup.er script, a small perl cgi that sits between you and del.icio.us and weeds out any duplicate links. I don’t know how useful it’ll be for other people but I installed it and when comparing the number of posts to those in the unfiltered tag I’m already seeing a lot less traffic. This is only the first draft (it needs a little love and a chunk of re-writing) but it works. So I thought I’d post it. To run it you’ll need a webserver capable of running perl cgi scripts, a couple of non-core perl modules and an area on disk where it can write its state; it maintains a single state file for each tag. I considered making it run as a hosted service to remove these prerequisites but that was more than I need right now.

Notes: Anyone who hits the cgi can force it to update and potentially stop you seeing certain links; I get around this by putting it in a secure (HTTP Auth protected) part of my site. It’s also got a timeout built in: a defined number of days after it first logs a site (30 days by default) it’ll let it through again, and then store it for another 30 days.
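The dedup-with-timeout behaviour described above can be sketched like this in Python. It's an illustration of the idea, not the cgi: the state here is a plain dict of link to first-seen timestamp (which could be saved per tag as JSON), and `filter_links` is a hypothetical name.

```python
import time

DAY = 86400  # seconds


def filter_links(links, state, now=None, hold_days=30):
    """Return the links not seen within hold_days, updating state in place.

    state maps each link to the time it was last let through.
    """
    now = time.time() if now is None else now
    fresh = []
    for link in links:
        first_seen = state.get(link)
        if first_seen is None or now - first_seen >= hold_days * DAY:
            # New link, or its hold period has expired: let it through
            # and start another hold period.
            state[link] = now
            fresh.append(link)
    return fresh
```

Because the state is updated as the feed is read, a link repeated within a single fetch is also suppressed.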

Posted on ‘Sun Jan 7 20:36:13 2007’ by Dean Wilson

frdns.pl - Check Forward and Reverse DNS Agree. Every host in DNS should have (at least) two parts, a forward and reverse record. Put simply, the forward record lets you resolve a name to an IP address. A reverse record resolves an IP address to a hostname. Although to be honest, if you didn't know that, this script isn't going to be of much use to you...

Unfortunately, as is often the case with duplicate information, at least one of the records is often neglected, leading to missing or stale data. The frdns.pl forward and reverse DNS checking script accepts a CIDR range and then polls each IP in it for a reverse DNS record. If it gets one it’ll try to forward resolve the name and compare the two results. If the forward record is missing, or the two parts don’t match, it’ll print the problem (with, hopefully, enough information to allow investigation).

With the newly added -p (ping) option it can now also find active addresses that lack a reverse DNS record. It’s also got better logging and it “does the right thing” with regard to round robin DNS (multiple A records). The frdns.pl forward and reverse DNS checking script itself is GPL’d, written in Perl and should be pretty easy to follow.
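For anyone who wants the shape of the check without reading the Perl, here is a rough Python sketch. It is not frdns.pl and leaves out the -p ping option. The resolver functions are passed in as parameters (my design choice, so the logic can be tried without touching real DNS) and all the function names are hypothetical.

```python
import ipaddress
import socket


def reverse_lookup(ip):
    """Return the PTR name for ip, or None if there is no reverse record."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(name):
    """Return the set of IPv4 addresses for name (empty if it won't resolve)."""
    try:
        return set(socket.gethostbyname_ex(name)[2])
    except OSError:
        return set()


def check_range(cidr, reverse=reverse_lookup, forward=forward_lookup):
    """Report addresses whose reverse name is missing a matching forward record."""
    problems = []
    for address in ipaddress.ip_network(cidr).hosts():
        ip = str(address)
        name = reverse(ip)
        if name is None:
            continue  # no reverse record; nothing to compare
        addresses = forward(name)
        if not addresses:
            problems.append(f"{ip} -> {name}: no forward record")
        elif ip not in addresses:
            # Membership in a set copes with round robin (multiple A records).
            found = ", ".join(sorted(addresses))
            problems.append(f"{ip} -> {name}: forward resolves to {found}")
    return problems
```

Calling `check_range("192.0.2.0/28")` with the defaults does live lookups; passing your own `reverse` and `forward` callables lets you test against canned data.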

Updated on ‘Sat Mar 10 22:49:56 2007’ by Dean Wilson

GoogleSets Command Line Interface. Google labs is one of the 'Net's open secrets. It's a site that gathers up some of Google's ideas for new sites and services and allows people to have a play with them. One of the services, Google Sets, has been quite useful to me recently. So I wrote the GoogleSets Command Line Interface.

The basic premise (of both the site and script) is simple: you give it a list and it tries to expand it. So if you pass in ‘Linux’, ‘HPUX’ and ‘Solaris’ it’ll give you back other operating systems. I’ve been using it in a security project to find and possibly expand links between different host names. In one case it predicted a couple of host names that were in use but not in DNS based upon five existing host names.

The GoogleSets Command Line Interface is GPL’d, does what you’d expect and will soon be turned into a Perl module to allow more programmatic access to sets.

Posted on ‘Sat Jan 28 14:08:32 2006’ by Dean Wilson

Backup RCS Directories Script. Source control is an essential part of a smart techie's life. While the bigger version control systems are mostly useful to developers (SVK rocks) some of the simpler ones can often be found in the sysadmin's toolkit.

A couple of companies I’ve worked for have been heavy users of RCS on their servers and while it’s made configuration safer (and easier to revert) its lack of a central repository is often an unaddressed weakness. The Backup RCS Directories Script scans a machine for any RCS directories and creates a gzipped tar archive from the results. This file can then be pulled off the machine as part of the standard backup routine.

The script itself is pretty simple: it gets a list of all the RCS directories, makes a compressed archive of them and logs the fact it’s run by adding an entry to syslog. It’s simple, free, and will hopefully save me writing it again at my next employer.
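The same three steps (find, archive, log) look roughly like this in Python. This is a sketch under my own assumptions rather than the script itself: the archive path and the syslog message are made up for the example, and `find_rcs_dirs` and `backup_rcs` are hypothetical names.

```python
import os
import tarfile


def find_rcs_dirs(root):
    """Return every directory named RCS below root."""
    found = []
    for dirpath, dirnames, _ in os.walk(root):
        if "RCS" in dirnames:
            found.append(os.path.join(dirpath, "RCS"))
            dirnames.remove("RCS")  # no need to descend into the RCS dir itself
    return sorted(found)


def backup_rcs(root, archive_path):
    """Write all RCS directories under root to a gzipped tar archive."""
    dirs = find_rcs_dirs(root)
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in dirs:
            archive.add(path)
    return dirs


if __name__ == "__main__":
    import syslog  # Unix only

    # Both the start directory and the archive location are assumptions.
    saved = backup_rcs("/", "/var/tmp/rcs_backup.tar.gz")
    syslog.syslog(f"rcs backup: archived {len(saved)} RCS directories")
```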

Posted on ‘Sun Dec 11 16:07:48 2005’ by Dean Wilson

Which package owns this file? Filepkg.sh is another one of those scripts born of a personal itch. I'm spending a fair amount of time cleaning up both Redhat and Debian boxes which have custom software installed, some of it by hand and some via the package management system (we build the packages ourselves).

One of the annoyances I’ve come across while determining which files are managed and which were left by us is that while both dpkg and rpm will tell you the package that owns a file, you need to provide the full path of the file you’re asking about to get the information out. Well no more!

filepkg.sh takes a file name as an argument and tries to do a ‘which’ command on it. If this works then the full path is passed to the native package manager (filepkg.sh currently supports Redhat and Debian) and the owning package, if there is one, is returned. If filepkg.sh is called with a ‘-l’ as the first argument or ‘which’ doesn’t find a file with that name (‘which’ doesn’t deal with config files for example) then the file is passed to ‘locate’; it then looks up the file and passes it to the package manager to get a package name back.
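The which-then-locate-then-package-manager flow can be sketched in Python as below. It is an illustration, not filepkg.sh: it assumes `dpkg -S` on Debian and `rpm -qf` on Redhat are the ownership queries, picks whichever tool is installed (dpkg first, an arbitrary choice), and uses hypothetical function names throughout.

```python
import os
import shutil
import subprocess


def candidate_paths(name, use_locate=False):
    """Resolve a bare file name to full paths: PATH lookup first, then locate."""
    found = None if use_locate else shutil.which(name)
    if found:
        return [found]
    if shutil.which("locate") is None:
        return []
    result = subprocess.run(
        ["locate", name], capture_output=True, text=True, check=False
    )
    # locate matches substrings, so keep only exact file name matches.
    return [p for p in result.stdout.splitlines() if os.path.basename(p) == name]


def owner_command(path):
    """Build the package query for this system, or None if neither tool exists."""
    if shutil.which("dpkg"):
        return ["dpkg", "-S", path]
    if shutil.which("rpm"):
        return ["rpm", "-qf", path]
    return None


def owning_packages(name, use_locate=False):
    """Map each matching path to the package manager's answer (None if unowned)."""
    owners = {}
    for path in candidate_paths(name, use_locate):
        command = owner_command(path)
        if command is None:
            break
        result = subprocess.run(command, capture_output=True, text=True, check=False)
        owners[path] = result.stdout.strip() or None
    return owners
```

`owning_packages("ls")` goes via the PATH lookup; `owning_packages("sshd_config", use_locate=True)` mirrors the ‘-l’ behaviour for files that ‘which’ can't see.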

The idea is simple, the code’s easy to read and it works how I want it so feel free to do what you want with this little chunk of GPL’d code.

Posted on ‘Fri Mar 18 00:19:36 2005’ by Dean Wilson

Display Feed Last Modified Date. I use SharpReader (http://www.sharpreader.net/) as my desktop RSS (and Atom) aggregator of choice. I'm also subscribed to a reasonable number of feeds (140ish), a number of which I didn't think I'd seen posts from in quite a long time. So I quickly wrote the last_post_date.pl script.

This short (and by no means complete) script looks through a SharpReader OPML file (which can be generated by using ‘Export’ on the file menu) and then tries to obtain and display a Last-Modified date for each feed in the file (this is gathered from the header of the same name).
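A rough Python equivalent of the two steps, reading feed URLs out of the OPML and asking each server for its Last-Modified header, might look like this. It's a sketch, not last_post_date.pl: it assumes the export uses the standard OPML `xmlUrl` attribute on `outline` elements and uses a HEAD request, which some servers may not answer usefully. The function names are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET


def feed_urls(opml_text):
    """Return (title, url) pairs for every outline element with an xmlUrl."""
    root = ET.fromstring(opml_text)
    feeds = []
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url:
            title = outline.get("title") or outline.get("text") or url
            feeds.append((title, url))
    return feeds


def last_modified(url, timeout=10):
    """Return the Last-Modified header for url, or None if unavailable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.headers.get("Last-Modified")
    except OSError:  # covers URLError, HTTPError and socket timeouts
        return None
```

Looping over `feed_urls(...)` and printing `last_modified(url)` for each gives the same kind of listing, with `None` for feeds that don't send the header.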

I was a little unsure about posting this as it’s not very complex and is only part of my checking tool chain but it might be useful to someone else (or I might need it while somewhere else :)) so I decided to post it. If you have any problems I’m afraid you’re on your own with this one.

Simple Link Information. The short utility script, Simple Link Information, serves one main purpose, to extract the text used in href tags and display it without the surrounding mark-up. Like most Perl scripts it's short and relies upon existing modules for most of its functionality, in this case LWP::Simple and HTML::LinkExtractor.

You invoke it with an absolute URL and it will display the link text it finds, within single quotes to show whitespace. If you supply the ‘-l’ option then it will also print the link destinations; this allows you to look for inconsistent linking phrases using some shell magic.
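The extraction step can be approximated in Python with the standard library's HTML parser. This sketch works on HTML text you have already fetched (the Perl version uses LWP::Simple for that part) and is not a port of the script; `LinkText` and `link_report` are hypothetical names.

```python
from html.parser import HTMLParser


class LinkText(HTMLParser):
    """Collect (text, href) pairs for every <a href=...> that has text."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text)
            if text.strip():  # links without text are skipped on purpose
                self.links.append((text, self._href))
            self._href = None


def link_report(html_text, show_destinations=False):
    """Return one line per link, with the text in single quotes."""
    parser = LinkText()
    parser.feed(html_text)
    parser.close()
    return [
        f"'{text}' {href}" if show_destinations else f"'{text}'"
        for text, href in parser.links
    ]
```

With `show_destinations=True` the output can be piped through `sort` to spot the same destination being linked with different phrases.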

Note: It will not show links that do not have text. This is a feature not a bug :)

This was first posted ‘Sat Nov 27 13:12:54 2004’ by Dean Wilson

Find Duplicate Filenames

When I take ownership of existing systems there are a number of ‘smells’, borrowing the term from the refactoring community, that I look for. One of them is a large number of files with the same name. This often points to situations where, instead of using version control, copies of whole directories are taken and then left to rot and confuse new users and admins.

The Find Duplicate Filenames script does exactly what you would expect. It scans the mounted file systems and prints a list of files and the number of times each name (with the path part stripped) was found. This allows easy sorting with ‘sort -n’ and, with a little ‘find’ and ‘locate’ magic, allows you to easily find potential trouble spots in the system.
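The counting itself is only a few lines in Python. This is a sketch of the idea rather than the script: it walks a single directory tree you give it, and by default reports only names seen at least twice (my assumption; the original lists every name with its count). `count_basenames` and `report` are hypothetical names.

```python
import os
from collections import Counter


def count_basenames(root):
    """Count how often each file name (path stripped) appears under root."""
    counts = Counter()
    for _, _, filenames in os.walk(root):
        counts.update(filenames)
    return counts


def report(counts, minimum=2):
    """Return 'count name' lines, ready for 'sort -n'."""
    return [
        f"{count} {name}"
        for name, count in sorted(counts.items())
        if count >= minimum
    ]
```

Printing `"\n".join(report(count_basenames("/")))` and piping the result through ‘sort -n’ puts the most duplicated names at the bottom.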

This was first posted ‘Thu Nov 18 23:41:02 2004’ by Dean Wilson

Apache Error log to Access log date converter. As part of my daily server housekeeping I keep an eye on the Apache error logs for each of the servers I'm responsible for. If it's a quiet day I'll grep through the attempted exploits, attacks and formmail scans for any useful error messages. While attempting to track some 404s back to the corresponding access-log entries I got bored of converting the error log's date format into the default date format of the access log so I wrote a small bit of shell that I (badly) named ApacheErrorDate.sh, but without the studly caps, to do it for me.

You invoke the script on the command line with a single argument, the error log date string you want to convert. The script will then return a string in the access log format. If you want to paste the returned string directly into your editor of choice (I tested this with vim) then you can supply the -e option to have the slashes escaped to stop vim treating it as a substitution command.
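The conversion is small enough to show as a Python sketch. It assumes the older error log stamp shape, `Sat Oct 16 17:28:50 2004` (newer Apache versions add microseconds, which this does not handle), and it leaves off the timezone offset that access log lines carry because the error log stamp doesn't include one. `error_to_access` is a hypothetical name, not the shell script's interface.

```python
from datetime import datetime


def error_to_access(stamp, escape=False):
    """Convert 'Sat Oct 16 17:28:50 2004' to '16/Oct/2004:17:28:50'.

    With escape=True the slashes are backslash-escaped, so the result can
    be pasted straight into a vim search.
    """
    parsed = datetime.strptime(stamp.strip("[] "), "%a %b %d %H:%M:%S %Y")
    converted = parsed.strftime("%d/%b/%Y:%H:%M:%S")
    return converted.replace("/", "\\/") if escape else converted
```

For example, `error_to_access("[Sat Oct 16 17:28:50 2004]")` returns `16/Oct/2004:17:28:50`.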

This was first posted ‘Sat Oct 16 17:28:50 2004’ by Dean Wilson

restamp

This short script is a personal itch scratcher. I use Blosxom for my blog, and Blosxom picks up the time stamp of the blog entry from the file's modification time. While this saves me adding one by hand it stops me from tidying up a post if (when) I later spot a typo or grammatical error. If I do make an amendment then the time stamp changes and the post gets moved to the most recently added part of the page. This is annoying.

The restamp script, when invoked with an existing file as its sole argument, will store the file's modification time, invoke the specified editor (vim by default) on the given file and then when the edit is over it’ll restore the time stamp. Does exactly what I want anyway :)
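The save, edit, restore sequence looks like this as a Python sketch. It's not the shell script: it reads the times with `os.stat` rather than ls and awk, and the `run` parameter is a hypothetical hook I added so the editor call can be swapped out.

```python
import os
import subprocess


def restamp(path, editor="vim", run=subprocess.call):
    """Edit path with editor, then put the original time stamps back."""
    before = os.stat(path)  # remember access and modification times
    status = run([editor, path])
    # Restore both times with nanosecond precision.
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
    return status
```

Calling `restamp("entry.txt")` opens vim on the file and, once you quit, the modification time is whatever it was before the edit.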

The script itself is tiny. It’s worth noting that the functionality of the ls command piped through awk could be substituted with some bash ‘set’ magic, saving us the execution time of awk. I wrote the script this way because awk is more portable, more people understand it and the run time really doesn’t bother me.

This was first posted ‘Sun Sep 19 22:21:04 2004’ by Dean Wilson

Get PageRank Script. It's always been difficult to determine the PageRank behind a URL unless you were saddled with IE on Windows. Thanks to the release of the WWW::Google::PageRank module to CPAN you can now programmatically access the PageRank for the URL of your choice.

The getpageranks script (written in Perl) provided here allows you to pass in a file containing URLs, one to a line with whitespace and comments allowed. Each entry in the file will then be checked with Google and the PageRank will be displayed. For full usage instructions run the command with either -h or -u.
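Reading that kind of input file is easy to get subtly wrong, since a URL can itself contain a ‘#’. Here is a small Python sketch of the parsing only (the PageRank lookup is left out). It assumes comments start with ‘#’ at the beginning of a line or after whitespace, which is my guess at the format, and `read_urls` is a hypothetical name.

```python
import re

# A '#' starts a comment only at line start or after whitespace, so URL
# fragments such as http://example.com/page#top survive.
COMMENT_RE = re.compile(r"(^|\s)#.*$")


def read_urls(lines):
    """Return the URLs from lines, ignoring blank lines and comments."""
    urls = []
    for line in lines:
        entry = COMMENT_RE.sub("", line).strip()
        if entry:
            urls.append(entry)
    return urls
```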

Note: If an invalid URL is given the PageRank will be returned as zero; this makes it very difficult to determine which sites are invalid and which are just unpopular. Although you may consider them equivalent ;)

This was first posted ‘Wed Sep 8 23:40:33 2004’ by Dean Wilson

Findbig Files (Unix) Script. One of the few things that grows faster than the size of disks is the ability of the users to consume their space. The findbig script scans a given Unix system, starting from the root, and reports via email any files found that are larger than the given threshold.

While running this script an annoyance became quite obvious: on certain machines the same files appear every run. The script allows you to work around this by creating a file called /etc/exclude; any files listed in this file will not be shown in the report. A few points of note are given below:

Be careful with locally mounted NFS drives; this script will run over those as well. This could potentially cause the same machine to be checked a number of times, causing performance problems.
The locations of all required binaries are given (and checked) in the script. These may need customising for your environment.
The mktemp command may not be present on Unix machines that run an operating system other than Linux. In this case hard-code the names of the temp files and the commands that delete them on exit.
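The scan, threshold and exclude-list logic can be sketched in Python as follows. This is not findbig: it returns the results instead of emailing them, reads the exclude list as one path per line (an assumption about the file's layout), and by default stays on the starting file system as one possible way around the NFS problem, which the original does not do. The function names are hypothetical.

```python
import os


def load_excludes(path):
    """Read one path per line from the exclude file; a missing file means none."""
    try:
        with open(path) as handle:
            return {line.strip() for line in handle if line.strip()}
    except FileNotFoundError:
        return set()


def find_big(root, threshold_bytes, excludes=frozenset(), stay_on_device=True):
    """Return (size, path) for files larger than threshold_bytes, biggest first."""
    root_dev = os.lstat(root).st_dev
    big = []
    for dirpath, dirnames, filenames in os.walk(root):
        if stay_on_device:
            # Prune directories on other file systems (NFS mounts, for example).
            dirnames[:] = [
                d for d in dirnames
                if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev
            ]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if path in excludes:
                continue
            try:
                size = os.lstat(path).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if size > threshold_bytes:
                big.append((size, path))
    return sorted(big, reverse=True)
```

Something like `find_big("/", 100 * 1024 * 1024, load_excludes("/etc/exclude"))` would list everything over 100MB that isn't in the exclude file.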

Sun Dec 18 19:09:41 2005: The findbig script has been updated slightly. On the user side the output now includes the time taken to execute and scan the file system. The other change, a less visible one, is the way that the external binaries are checked. It’s a better, cleaner way but it makes very little difference to most end users.

This was first posted ‘Sun Sep 12 23:50:59 2004’ by Dean Wilson