Wed, 03 Jun 2009
It's been Critical for how long?
Nagios has a wonderful 'duration' column in its web interface that's
always bemused me. At what point does a check sitting in a warning or,
even worse, a critical state stop being a problem worthy of head space
and start being normal operating procedure?
Checks can stay in an extended broken state for many reasons, but they all seem to be symptoms of a larger problem. If it's a small thing, are you getting enough time to do housekeeping? If it's a big thing, do you have enough business buy-in to keep things running optimally? Are you monitoring the wrong thing? Is there even anything you can do to fix it? If not, then maybe Nagios isn't the best place for that monitoring, and a status report would be a better home for it.
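If you'd rather have a list of the long-term offenders than eyeball the duration column, something like the following rough sketch might do. It assumes the Nagios 3.x style status file, where each service is a `servicestatus { key=value ... }` block carrying `current_state` and `last_state_change` fields; the function names and the seven day threshold are made up for illustration, so check the field names against your own install before trusting it.

```python
import re
import time

# Assumption: Nagios 3.x status.dat layout, one "servicestatus { ... }"
# block per service with key=value lines inside. Verify on your version.
BLOCK_RE = re.compile(r"servicestatus\s*\{(.*?)\n\s*\}", re.DOTALL)

# Nagios service states: 1 is warning, 2 is critical.
STATE_NAMES = {1: "WARNING", 2: "CRITICAL"}


def parse_services(status_text):
    """Yield one dict of key/value strings per servicestatus block."""
    for match in BLOCK_RE.finditer(status_text):
        fields = {}
        for line in match.group(1).splitlines():
            key, sep, value = line.strip().partition("=")
            if sep:
                fields[key] = value
        yield fields


def long_broken(status_text, min_days=7, now=None):
    """Return (days, state, host, service) tuples, longest broken first.

    min_days is a hypothetical threshold; pick whatever counts as
    "too long" in your environment.
    """
    now = time.time() if now is None else now
    results = []
    for svc in parse_services(status_text):
        state = int(svc.get("current_state", 0))
        changed = int(svc.get("last_state_change", 0))
        # Skip healthy checks and checks that have never changed state.
        if state not in STATE_NAMES or changed == 0:
            continue
        days = (now - changed) / 86400
        if days >= min_days:
            results.append((round(days, 1), STATE_NAMES[state],
                            svc.get("host_name", "?"),
                            svc.get("service_description", "?")))
    return sorted(results, reverse=True)
```

Feed it the contents of whatever file the `status_file` setting in your nagios.cfg points at, and each entry it returns is a check that deserves one of the questions above.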
Posted: 2009/06/03 07:55 | /sysadmin

