Small Mosaic


Wed, 09 Mar 2011

Introducing NRPE Runner
It might be a sign that I spend too much time online, but the quicker a system gives me feedback the more useful I find it. While I love knowing my Nagios safety net has me covered when making changes, sometimes waiting for that CGI to refresh can take too long, especially if I'm taking an iterative / test-driven approach to the changes I'm making. For those use cases I wrote nrpe-runner.

The way I typically use Nagios is to have the Nagios server run the checks on the remote host via the NRPE plugin. The checks to be run on the host are normally stored in a config file with each entry looking like this:


command[local_mail]=/usr/local/libexec/nagios/check_local_mail

While this allows you to run each check to confirm that it's still OK, I wanted the ability to run all the commands in the file at once, which I can now do with nrpe-runner. If everything's fine then it exits silently. To confirm that it has actually run, I can ask it for a summary, and I can also filter which checks it runs:



# show everything as it's run whatever the return status
$ /usr/local/sbin/nrpe-runner -a
check_swap => SWAP OK - 100% free (16041 MB out of 16041 MB) |swap=16041MB;12031;9624;0;16041
... snipped ...
freemem => OK: 12% (1732M) free memory.

# show a summary
$ /usr/local/sbin/nrpe-runner -s
Ran 39 checks - OK 39. WARN 0, CRIT 0, UNKNOWN 0

# run any checks with ntp in the name (the part between [])
$ /usr/local/sbin/nrpe-runner -s -n ntp
Ran 3 checks - OK 3. WARN 0, CRIT 0, UNKNOWN 0

# run all process checks (checks the command after the '=')
$ /usr/local/sbin/nrpe-runner -s -c proc
Ran 17 checks - OK 17. WARN 0, CRIT 0, UNKNOWN 0

# show all checks named ntp
$ /usr/local/sbin/nrpe-runner -a -n ntp
ntp_skew_primary => NTP OK: Offset -0.003149271011 secs|offset=-0.003149s;5.000000;9.000000;
ntp_process => PROCS OK: 1 process with command name 'ntpd', args '-u ntp:ntp'
ntp_skew_secondary => NTP OK: Offset -0.002887368202 secs|offset=-0.002887s;5.000000;9.000000;


nrpe-runner also has the option to dump the results as JSON, which I'll be exploring a little further in my next couple of blog posts. While it's not exactly the same as having the checks run by Nagios (the user and environment are often different), I've found that shortening the interval between running puppet or yum and seeing the Nagios feedback has helped my workflow quite a lot when making exploratory system changes - and even more when nothing should have changed but does...
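
For anyone curious how a runner like this might hang together, here is a rough Python sketch. It is not the nrpe-runner source: the function names are made up for illustration, and it assumes only the command[name]=command line format shown above plus the standard Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import re
import shlex
import subprocess
from collections import Counter

# Matches NRPE entries such as: command[local_mail]=/path/to/check_local_mail
COMMAND_RE = re.compile(r"^\s*command\[([^\]]+)\]\s*=\s*(.+?)\s*$")

# Standard Nagios plugin exit codes.
STATES = {0: "OK", 1: "WARN", 2: "CRIT", 3: "UNKNOWN"}


def parse_nrpe_commands(config_text):
    """Return {check_name: command_line} for every command[...] entry."""
    checks = {}
    for line in config_text.splitlines():
        match = COMMAND_RE.match(line)
        if match:
            checks[match.group(1)] = match.group(2)
    return checks


def select_checks(checks, name=None, command=None):
    """Keep checks whose name and/or command line contain the given text."""
    return {
        check: cmd
        for check, cmd in checks.items()
        if (name is None or name in check)
        and (command is None or command in cmd)
    }


def run_checks(checks):
    """Run each check; return {check_name: (state, first_output_line)}."""
    results = {}
    for check, cmd in checks.items():
        try:
            proc = subprocess.run(shlex.split(cmd), capture_output=True, text=True)
        except OSError as err:
            # A plugin that is missing or not executable counts as UNKNOWN.
            results[check] = ("UNKNOWN", str(err))
            continue
        lines = proc.stdout.strip().splitlines()
        state = STATES.get(proc.returncode, "UNKNOWN")
        results[check] = (state, lines[0] if lines else "")
    return results


def summarise(results):
    """Build a one-line summary in the same shape as the output above."""
    counts = Counter(state for state, _ in results.values())
    return "Ran {} checks - OK {}. WARN {}, CRIT {}, UNKNOWN {}".format(
        len(results), counts["OK"], counts["WARN"], counts["CRIT"], counts["UNKNOWN"]
    )
```

Filtering on the name and on the command line are kept separate here to mirror the -n and -c options shown in the transcript.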

Posted: 2011/03/09 19:05 | /tools/commandline


Wed, 07 Apr 2010

Pigz - Shortening backup times with parallel gzip
While searching for a completely different piece of software I stumbled on to the pigz application, a parallel implementation of gzip for modern multi-processor, multi-core machines. As some of our backups have a gzip step to conserve some space I decided to see if pigz could be useful in speeding them up.

Using remarkably unscientific means (I just wanted to know if it's worth further investigation) I ran a couple of sample compression runs. The machine is a quad core Dell server, the files are three copies of the same 899M SQL dump, and the machine is lightly loaded (what load there is comes mostly from disk IO).


#######################################
# Timings for two normal gzip runs
dwilson@pigztester:~/pgzip/pigz-2.1.6$ time gzip 1 2 3

real    2m43.429s
user    2m39.446s
sys     0m3.988s

real    2m43.403s
user    2m39.582s
sys     0m3.808s

#######################################
# Timings for three pigz runs

dwilson@pigztester:~/pgzip/pigz-2.1.6$ time ./pigz 1 2 3

real    0m46.504s
user    2m56.015s
sys     0m4.116s

real    0m46.976s
user    2m55.983s
sys     0m4.292s

real    0m47.402s
user    2m55.695s
sys     0m4.256s

Quite an impressive speed up considering all I did was run a slightly different command. The post compression sizes are pretty much the same (258M when compressed by gzip and 257M with pigz) and you can gunzip a pigz'd file, and get back a file with the same md5sum.

# before compression
-rw-r--r-- 1 dwilson dwilson 899M 2010-04-06 22:12 1

# post gzip compress
-rw-r--r-- 1 dwilson dwilson 258M 2010-04-06 22:12 1.gz

# post pigz compress
-rw-r--r-- 1 dwilson dwilson 257M 2010-04-06 22:12 1.gz
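
To put rough numbers on the timings above, the short Python sketch below averages the "real" and "user" values from both sets of runs. The helper name is made up for illustration; the figures are copied straight from the transcript.

```python
import re
from statistics import mean


def to_seconds(stamp):
    """Convert a `time` style value such as '2m43.429s' to seconds."""
    minutes, seconds = re.fullmatch(r"(\d+)m([\d.]+)s", stamp).groups()
    return int(minutes) * 60 + float(seconds)


# Wall-clock ("real") and CPU ("user") times from the runs above.
gzip_real = [to_seconds(t) for t in ("2m43.429s", "2m43.403s")]
pigz_real = [to_seconds(t) for t in ("0m46.504s", "0m46.976s", "0m47.402s")]
gzip_user = [to_seconds(t) for t in ("2m39.446s", "2m39.582s")]
pigz_user = [to_seconds(t) for t in ("2m56.015s", "2m55.983s", "2m55.695s")]

speedup = mean(gzip_real) / mean(pigz_real)
cpu_overhead = mean(pigz_user) / mean(gzip_user) - 1

print(f"wall-clock speed up: {speedup:.2f}x")
print(f"extra CPU time used by pigz: {cpu_overhead:.1%}")
```

On these figures pigz finishes roughly 3.5 times faster on the quad core box while burning about 10% more CPU time in total, which is the trade you'd expect from spreading the work over all the cores.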

I'll need to do some more testing, and compare the system's performance to a normal run while the compression is happening, before I trust it in production, but the speed ups look appealing and, as it's Mark Adler code, it looks like it might be an easy win in some of our scripts.

Posted: 2010/04/07 08:00 | /tools/commandline


Wed, 30 Sep 2009

Rake - surprisingly enjoyable
I've never really liked makefiles. I don't think I've ever had to write enough C to really appreciate (or just tolerate) them, so I was a little dismissive of Rake - and I was mostly wrong.

Now that we're adding a new member to the systems team I've been doing a lot of thinking about our tool chain: what knowledge assumptions it makes, which parts are still more manual than I'd like and where it has gaps (this is the most annoying one for me). Rake seemed like a potential addition for encoding some of that process knowledge into a tool. I've only added little rakefiles here and there but they do make certain tasks nicer (plus I like the inline descs).

I've not yet worked out any general rules for when to use a shell script and when to use rake, but if nothing else it's helping me spend some time on my Ruby skills. The best rake starting points I found were Martin Fowler's rake article and the rake release notes.

Posted: 2009/09/30 21:48 | /tools/commandline


Wed, 01 Jul 2009

dstat - a window to your system
When it comes to Unix diagnostics I was raised the old-fashioned way, with iostat, vmstat and similar tools. However, times change and tools evolve. dstat, while not as comprehensive as using all the tools one by one, provides a wide range of system performance details in an easy-to-use package.

While it's useful enough in its default state there is even more functionality lurking just below the surface. To see which other modules are available (but are not enabled by default) run dstat -M list. To add an extra module to the output use a command like this one: dstat -a -M topmem -M topcpu

As part of my growing use of the tool I've started to write my own little dstat plugins. I was pleasantly surprised at how easy they were to write and deploy, even with my basic Python skills. The memcached plugin was a proof of concept that I've not needed much, but I've found the process count plugin to be very handy.

dstat is becoming one of the overview tools I use when investigating performance issues and it's worthy of a place in your toolbox too.

Posted: 2009/07/01 21:32 | /tools/commandline


Mon, 09 Mar 2009

Puppet Scripts - extract-report-issues
I spent a little while digging through the default puppet log types the other day and after reading through a batch of activity logs I whipped up extract-report-issues, a script that can be run on the command line (or daily via cron) and displays a list of errors and warnings from the specified glob of hosts and log files. By default it covers all hosts for the current day; we've got it running nightly so we can work through the issues each morning. It's worth noting that the same failure sometimes appears more than once in the output. This is because puppet retries certain operations, such as retrieving a resource.

There is actually a lot of useful information in the puppet reports. To start with I've added a todo item for a script that notes persistent errors (the same issues over two or three runs) that I'll hopefully get to this month. Maybe.

If you're running puppet in production you owe it to yourself to turn on reporting and set up some processes around it. While puppet makes it easy to perform action at a distance you still need to close the loop somehow.

Posted: 2009/03/09 20:57 | /tools/commandline


Tue, 03 Feb 2009

Simple, Single Document Bookmarks in vim
I like vim. I think it's a great editor, worth investing time and effort in learning, but I also think it's one of the most horrible things to watch an inexperienced user typo their way through while you're urgently waiting for them to finish the damn edit. My favourite one this week (and it's only Tuesday) is typing in a probably-unique phrase that you can later search for to return to a specific part of a document.

In an attempt to stop my laptop getting any more back-of-the-head-shaped dents in it from when I've failed to restrain myself, I thought I should point out a much simpler way of doing this. Once you're at the part of a document you want to return to, press m<letter>. This sets a mark. To return to it press '<letter>. That's it. No more pasting in chunks of a string and hoping it only occurs once in the damn document. If you need to mark a couple of locations then fine, just use different letters to set and return to the places you want. And save me sending another laptop back in for warranty.

Posted: 2009/02/03 22:05 | /tools/commandline


Wed, 14 Jan 2009

Soon to be With Added Git?
Despite setting up my own gitweb install I'm still not using git regularly enough to be comfortable with it, so today I went through the Peepcode Press Git Internals book/PDF. While the diagrams and details of what happens under the covers are useful, it's the wrong level for me as a basic user. To ease myself into the move from subversion for some of my personal projects I found Git Magic to be more useful.

I know git requires a mental shift and it's a very complex and powerful tool but for my own needs I'll probably never use more than 10% of its capabilities. Unfortunately most of the projects I use and need to submit patches to have switched - so I'll be a happy sheep and go along for the ride. Even if it turns out to be a roller coaster.

Posted: 2009/01/14 18:14 | /tools/commandline


Tue, 06 Jan 2009

Diffing Files Over Multiple Servers - rd-differ
Ad hoc changes are a very bad thing in many ways; one of the worst is how often they are not fully implemented across all the servers or even pulled back to staging. In an attempt to sanity check the config files when we have to make these little hacks I oddly-proudly present rd-differ, a tool for diffing config files over multiple machines.

The idea is simple: you tell it the file or directory you're interested in, specify a single machine as the baseline and then specify a number of others as the machines to check against it. A sample invocation looks like this: rd-differ /etc/apache2 10.10.100.111 10.10.100.112 10.10.100.113 and the output is shown as a diff.

The files are rsynced down using ssh so your usual keys will work and while the normal output is that of the raw diff it's very easy to wrap the results and add other checks on top of it. The shell's not written to be very defensive (unusual for me) but the code is short enough that it's worth the compromise.
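
The baseline comparison itself is easy to sketch. The Python below is not rd-differ (which is shell plus rsync and diff); it assumes the file contents have already been fetched from each host, and the function name is made up for illustration. It only shows the "one baseline, many candidates" idea using the standard difflib module.

```python
import difflib


def diff_against_baseline(baseline, others):
    """Compare each host's copy of a file with the baseline host's copy.

    baseline is a (host, text) pair and others maps host -> text.
    Returns {host: unified_diff_text} for the hosts that differ.
    """
    base_host, base_text = baseline
    diffs = {}
    for host, text in others.items():
        delta = list(
            difflib.unified_diff(
                base_text.splitlines(keepends=True),
                text.splitlines(keepends=True),
                fromfile=base_host,
                tofile=host,
            )
        )
        if delta:  # identical files produce no diff lines at all
            diffs[host] = "".join(delta)
    return diffs


if __name__ == "__main__":
    # Made-up config contents, purely for illustration.
    baseline = ("10.10.100.111", "Listen 80\nKeepAlive On\n")
    others = {
        "10.10.100.112": "Listen 80\nKeepAlive On\n",
        "10.10.100.113": "Listen 80\nKeepAlive Off\n",
    }
    for host, delta in diff_against_baseline(baseline, others).items():
        print(delta)
```

Returning only the hosts that differ keeps the silent-when-fine behaviour that makes this kind of tool easy to wrap in other checks.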

Posted: 2009/01/06 18:26 | /tools/commandline


Sat, 08 Nov 2008

Rebooting Via Proc and the magic SysRq key
You know what the best way to start the day is? I'm pretty sure that it doesn't include a production web server putting its file systems into read-only mode. When this happens most local commands don't work - init, shutdown, telinit and reboot all stop being useful and you have to resort to desperate measures... and here's the desperate measure of the day.

First, check that your system supports the magic SysRq key -


$ cat /proc/sys/kernel/sysrq
1  # nonzero is good

Now you know you have the power to destroy your system through a single incorrect character, have a look at the Red Hat Sysrq command reference (you want the 'sysrq' section). We used it to sync the disks and reboot - your requirements may vary.


root@web02:~# echo s > /proc/sysrq-trigger
root@web02:~# echo b > /proc/sysrq-trigger

# machine reboots

As techniques go this one's a little obscure but it's very useful in the right circumstances.
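
A value other than 0 or 1 in /proc/sys/kernel/sysrq is a bitmask of the functions that are allowed. The Python sketch below decodes it using the bit values as I understand them from the kernel's sysrq documentation; treat the table as an assumption and check it against the docs for your own kernel. The function names are made up for illustration.

```python
# Bit values as listed in the Linux kernel sysrq documentation
# (an assumption to verify against your kernel version).
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def describe_sysrq(value):
    """Return a list describing what a /proc/sys/kernel/sysrq value allows."""
    if value == 0:
        return ["disabled"]
    if value == 1:
        return ["all functions enabled"]
    return [label for bit, label in SYSRQ_BITS.items() if value & bit]


def read_sysrq(path="/proc/sys/kernel/sysrq"):
    """Read the current setting (Linux only)."""
    with open(path) as handle:
        return int(handle.read().split()[0])
```

For example, describe_sysrq(176) reports the sync, remount read-only and reboot/poweroff functions, since 176 is 16 + 32 + 128.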

Posted: 2008/11/08 12:25 | /tools/commandline


Sat, 23 Aug 2008

Nagios Service and Hosts stats - Graphed in Munin
We've been hitting some load issues on one of our monitoring machines recently and while it looks like the munin graph generation is the culprit we also decided to keep an eye on how many services and hosts Nagios was checking.

One of the downsides of having a very automated server deployment system is how easy it is to suddenly find yourself with an extra dozen hosts you no longer really need. While each check is quite small and quick, add up the frequent runs and multiply it by a reasonable number of servers and you can soon hit problems.

So as a first step towards keeping an eye on those numbers we now have a munin Nagios hosts plugin and a munin Nagios services plugin that show the total number of hosts and services monitored and the states those resources are in.

Posted: 2008/08/23 14:20 | /tools/commandline


Nagios Checks - Validate HTML and Validate Feed
As part of my ongoing attempt to stop myself from silently making mistakes (I don't so much mind the ones I notice) I've added another couple of Nagios Plugins. This time validate_feed and validate_html.

As both of these checks call out to an external, third-party resource, if you use them be sure to tune your Nagios polling so the checks run at a respectfully low frequency.

Posted: 2008/08/23 14:11 | /tools/commandline


Thu, 14 Aug 2008

Filter syslog logs with syslogslicer
While digging through a pile of syslog log files recently I needed something a little more data format aware than pure grep. So I present the first version of syslogslicer - a simple perl script that knows a little bit about the syslog log file format.


 # some example command lines
 syslogslicer -p cron -f program,message /var/log/syslog
 # print the program and message for all lines with program 'cron'

 syslogslicer -p cron -m hourly /var/log/syslog
 # all fields for all lines with program 'cron' and message 'hourly'

 syslogslicer -p cron -m hourly -s 20080810100000 -e 20080810123000 /var/log/syslog
 # all fields for all lines with program 'cron' and message 'hourly'
 # between 20080810100000 and 20080810123000

syslogslicer allows you to filter the output by matching text in the program or log message, only print certain output fields and do basic time based filtering. If you've ever wanted to see all the logs raised by postfix with the word 'database' in them between 10 and 11 am then this might be the tool for you.
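
The filtering idea can be sketched in a few lines of Python. This is not the syslogslicer code (that's Perl); it assumes the traditional "Mon DD HH:MM:SS host program[pid]: message" line layout, takes the year as a parameter because classic syslog timestamps don't carry one, and uses made-up function names.

```python
import re
from datetime import datetime

# Traditional syslog layout: "Aug 10 10:17:01 host program[pid]: message"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<program>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?: ?"
    r"(?P<message>.*)$"
)
STAMP_FORMAT = "%Y%m%d%H%M%S"  # same shape as the -s / -e arguments above


def parse_line(line, year):
    """Split one syslog line into fields, or return None if it doesn't match."""
    match = LINE_RE.match(line)
    if not match:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(
        f"{year} {entry['stamp']}", "%Y %b %d %H:%M:%S"
    )
    return entry


def slice_log(lines, year, program=None, message=None,
              start=None, end=None, fields=None):
    """Yield entries matching the program, message text and time window."""
    start_time = datetime.strptime(start, STAMP_FORMAT) if start else None
    end_time = datetime.strptime(end, STAMP_FORMAT) if end else None
    for line in lines:
        entry = parse_line(line.rstrip("\n"), year)
        if entry is None:
            continue
        if program and program.lower() not in entry["program"].lower():
            continue
        if message and message not in entry["message"]:
            continue
        if start_time and entry["time"] < start_time:
            continue
        if end_time and entry["time"] > end_time:
            continue
        yield {name: entry[name] for name in fields} if fields else entry
```

Calling slice_log(open("/var/log/syslog"), 2008, program="cron", message="hourly", start="20080810100000", end="20080810123000") would roughly mirror the last command line above.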

Posted: 2008/08/14 12:28 | /tools/commandline


Nagios - Check Proxy Check
"This script retrieves a URL via a specified proxy server and alerts (using the standard Nagios conventions) if the request fails."

We're running a couple of services through a proxy server for a number of good, and to be honest a couple of not so good but mandated, reasons. The Check Proxy Check Nagios Plugin ensures that if the proxy goes down in a way that stops us pulling pages through it we know.
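
As a rough illustration of the idea (not the plugin itself), a proxy check can be sketched in Python with nothing but the standard library. The function name and message wording are made up; the return value follows the usual Nagios convention of 0 for OK and 2 for CRITICAL.

```python
import http.client
import urllib.error
import urllib.request

OK, CRITICAL = 0, 2  # standard Nagios exit codes


def check_via_proxy(url, proxy, timeout=10):
    """Fetch url through proxy and return (exit_code, status_message)."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    )
    try:
        with opener.open(url, timeout=timeout) as response:
            return OK, f"PROXY OK - {response.status} from {url} via {proxy}"
    except urllib.error.HTTPError as err:
        # The proxy answered, but with an error status (4xx or 5xx).
        return CRITICAL, f"PROXY CRITICAL - HTTP {err.code} from {url} via {proxy}"
    except (OSError, http.client.HTTPException) as err:
        # Connection refused, timeouts, DNS failures and malformed replies.
        return CRITICAL, f"PROXY CRITICAL - {url} via {proxy} failed: {err}"
```

A real plugin would print the message and exit with the code, so that Nagios picks up both the status text and the state.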

Posted: 2008/08/14 09:30 | /tools/commandline


Wed, 13 Aug 2008

Nagios Disk Check - Mountpoint or Filesystem?
If you mount filesystems under a specific mount point, and monitor them with Nagios, then be sure you understand what happens if the underlying file system goes away. With:

  
    /usr/lib/nagios/plugins/check_disk -w 15% -c 10% -p /a_mount_point
  

you'll get the value from the containing file system. In this case /. If you'd rather know that your chosen mount point has actually gone away, and that you're no longer checking what you thought you were, then add the -E option to the command. This will turn on exact path matching and catch that kind of error.
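
The underlying distinction is easy to see from Python. The sketch below is not how check_disk is implemented; it just shows, with a made-up function name, that asking "is this path a mount point?" is a separate question from "how full is the file system this path lives on?", and that only the first one notices a missing mount.

```python
import os


def check_mountpoint(path):
    """Return (exit_code, message), Nagios style, for an expected mount point."""
    if not os.path.ismount(path):
        # Without this test, statvfs would quietly report on the parent
        # file system (for example /) instead of the missing mount.
        return 2, f"DISK CRITICAL - {path} is not a mount point"
    stats = os.statvfs(path)
    free_pct = 100.0 * stats.f_bavail / stats.f_blocks if stats.f_blocks else 0.0
    return 0, f"DISK OK - {path} is mounted, {free_pct:.0f}% free"
```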

Posted: 2008/08/13 21:54 | /tools/commandline


Testing the 'Net isn't there with Nagios
This week we've had to deliberately cut some machines off so that they can't connect out to the internet - we're building testing versions of some of our more restricted secure environments and this is one of the steps.

It was actually easier to do with iptables than I thought (mostly because I didn't have to do it - my co-worker did) but once the work was done we needed to ensure it didn't accidentally get broken so that networking was functional again. And yes, that's an odd thing to type. So naturally we turned to Nagios and, for my own memory as much as anything else, here is the check we're using:


# put this in the machine's nrpe config file.

/usr/lib/nagios/plugins/negate -t 30 "/usr/lib/nagios/plugins/check_http -w 5 -c 10 -H www.google.com -u /"

In the Nagios 'Status Information' field you'll get a message that looks like this - CRITICAL - Socket timeout after 10 seconds - but the check returns the correct error code so it's all green.
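
The reason the text says CRITICAL while the status is green is that negate only rewrites the exit code and passes the wrapped plugin's output through untouched. A minimal Python sketch of that behaviour, assuming the default mapping where only OK and CRITICAL are swapped (the function names are made up for illustration):

```python
import shlex
import subprocess

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed default mapping: only OK and CRITICAL trade places.
SWAP = {OK: CRITICAL, CRITICAL: OK}


def negate_status(code):
    """Swap OK and CRITICAL; leave every other exit code alone."""
    return SWAP.get(code, code)


def run_negated(command, timeout=30):
    """Run a plugin and return (negated_exit_code, unchanged_output)."""
    try:
        proc = subprocess.run(
            shlex.split(command), capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired) as err:
        # If the wrapper itself can't run the plugin, report UNKNOWN.
        return UNKNOWN, str(err)
    return negate_status(proc.returncode), proc.stdout.strip()
```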

Posted: 2008/08/13 21:50 | /tools/commandline


Tue, 12 Aug 2008

Yumdpkg-provides
I've never really felt as proficient with apt and dpkg as I did with RPM. There always seems to be another option I've never seen before. Luckily there are also big holes in my knowledge of yum to make me feel well rounded.

After reading yum options you may not know exist and spending a while puzzling out how to get the same results in Debian (apt-file seems to be the closest fit but I never got the invocation right) I decided to write dpkg-provides.

It's not packaged, doesn't have a manpage, requires the network and isn't integrated with the existing tools. At least I know how I'd get the information now - from the web. Who'd have thought it?

Note: it's actually quite simple to work out which package provides a file that you've got installed locally (dpkg -S '*/df') - it's more of a pain to probe packages you don't have installed.

Posted: 2008/08/12 15:13 | /tools/commandline


Tue, 08 Jul 2008

Dear Lazyweb - Command Line YSlow!
The title pretty much says it all: I'd like a command line version of YSlow! (what is it with Yahoo and !s) that I can run from cron and import into a nice spreadsheet for trending and site comparisons.

I don't have XUL on my list of things to play with so I'll give it a couple of months and watch someone else implement it. Hopefully.

Posted: 2008/07/08 19:59 | /tools/commandline


Mon, 25 Jun 2007

Navigating Commented Config Files
The current trend with config files is to fill them with comments (let's ignore the fact this isn't a substitute for documentation) and while this is helpful, watching people arrow through them line by line looking for active options drives me nuts.

If you're using vim (as all good people do ;)) you can jump from uncommented directive to uncommented directive with /^[^#] as a search. Pressing n will then move you to the next uncommented option. And save me from pulling out those precious few hairs I have left.
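
The same pattern works outside vim. As a small Python sketch (the function name is made up), ^[^#] matches any line whose first character exists and isn't a #, so blank lines are skipped along with comments. Note that an indented comment such as "  # foo" would still match, in vim just as here.

```python
import re

# Same idea as the vim search /^[^#]: the first character must not be '#'.
ACTIVE_LINE = re.compile(r"^[^#]")


def active_directives(config_text):
    """Return the lines that are neither blank nor start with '#'."""
    return [line for line in config_text.splitlines() if ACTIVE_LINE.match(line)]
```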

Posted: 2007/06/25 21:32 | /tools/commandline


Sun, 03 Jun 2007

Nagios - Simple Trender
Continuing the release of my Nagios code - here's my Nagios Simple Trender. It parses Nagios logs and builds a horizontal barchart for host outages, service warnings and criticals. It's nothing fancy (and the results are a little unpretty) but it does make the attention seeking services and hosts very easy to find.

While the tool isn't that technically complex I've found it useful in justifying my time on certain parts of the infrastructure. Being able to show how bad NTP is for example (we had 216 NTP sync problems last month, this month we had 36; and most of those are one machine with a bad clock) on a very simple chart makes it easier to get buy in from above. And next month you can show them how much of a positive impact the work had.
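
To give a flavour of the approach, here is a rough Python sketch; it is not the Simple Trender code. It assumes the usual Nagios log layout of "[timestamp] SERVICE ALERT: host;service;STATE;..." and "[timestamp] HOST ALERT: host;STATE;...", counts every matching alert line (soft and hard states alike), and uses made-up function names.

```python
import re
from collections import Counter

# Assumed Nagios log layout: "[epoch] SERVICE ALERT: host;service;STATE;..."
ALERT_RE = re.compile(r"^\[(\d+)\] (HOST|SERVICE) ALERT: (.+)$")


def count_alerts(lines):
    """Count host outages, service warnings and service criticals."""
    counts = {
        "host outages": Counter(),
        "service warnings": Counter(),
        "service criticals": Counter(),
    }
    for line in lines:
        match = ALERT_RE.match(line.strip())
        if not match:
            continue
        kind, fields = match.group(2), match.group(3).split(";")
        if kind == "HOST" and len(fields) >= 2 and fields[1] == "DOWN":
            counts["host outages"][fields[0]] += 1
        elif kind == "SERVICE" and len(fields) >= 3:
            label = f"{fields[0]}/{fields[1]}"
            if fields[2] == "WARNING":
                counts["service warnings"][label] += 1
            elif fields[2] == "CRITICAL":
                counts["service criticals"][label] += 1
    return counts


def bar_chart(counter, width=40):
    """Render a Counter as text rows, biggest offender first."""
    if not counter:
        return []
    biggest = max(counter.values())
    rows = []
    for label, count in counter.most_common():
        bar = "#" * max(1, round(width * count / biggest))
        rows.append(f"{label:<30} {bar} {count}")
    return rows
```

Scaling every bar against the worst offender is what makes the attention-seeking services stand out at a glance.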

Posted: 2007/06/03 10:29 | /tools/commandline


The Nagios Tag Cloud
We use the Nagios monitoring system at work (in fact we use four installs of it for physically isolated networks) and while it's damn useful (and service checks are easy to create or extend) it's a little lacking in higher level trending and visualisation tools. Well, at least the very old version we run suffers from this.

Thankfully I work for a company that invests time in its core tools. Over the last couple of hackdays I've written two small scripts for parsing Nagios logfiles and presenting the information in a different, slightly more grouped way. The first of these is the Nagios TagCloud - which has a very descriptive name :)

When invoked (I typically use nagiosclouds.pl /log/files/*.log > /webdir/nagios_tagcloud.html from a cronjob) it'll run through the log files and produce an HTML page containing 3 tag clouds: one for host outages, one for service warnings and one for service criticals. Tag clouds don't suit everyone's work style but I came away from running ours with a couple of action points so I think they're useful enough to glance at once a month.

I should note the perl module that generates the tag cloud is Leon Brocard's HTML::TagCloud and the CSS was graciously given to me by Alex Monney after he burned his eyes looking at my first version.

Posted: 2007/06/03 10:08 | /tools/commandline


Copyright © 2000-2010 Dean Wilson