Nagios check_http flaps
We recently had an odd one where a Nagios check_http check, which verified both that a string was present in the response and that the page loaded within a set time, went from reporting ‘CRITICAL - string not found’ to ‘HTTP WARNING: HTTP/1.1 200 OK’. As this was a site pending migration, my first thought was that the URL had moved to a slower machine that already had the fixes released to it. Alas, it’s seldom that obvious.
It turns out that somewhere in the Nagios check, a slow page that exceeds the -w option’s threshold overrides the fact that the string is missing, even though that means a warning replaces a critical. Bah.
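For contrast, here is a minimal sketch of the behaviour I would have expected: every sub-check is evaluated and the worst state wins, so a missing string stays critical however slow the page is. This is not the plugin's actual code. The function name `check_page` and its parameters are made up for illustration, and only the exit codes (0 OK, 1 WARNING, 2 CRITICAL) follow the usual Nagios plugin convention.

```python
# Standard Nagios plugin exit codes (UNKNOWN is left out to keep the sketch small).
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def check_page(body, elapsed, expected, warn_after):
    """Hypothetical stand-in for the two checks: string presence and load time.

    Returns (state, message), where state is the worst of all sub-check results.
    """
    results = []

    # Content check: a missing string is a critical failure.
    if expected not in body:
        results.append((CRITICAL, "string not found"))

    # Timing check: exceeding the warning threshold is only a warning.
    if elapsed > warn_after:
        results.append((WARNING, f"response took {elapsed:.2f}s"))

    if not results:
        return OK, "HTTP OK"

    # Worst state wins, so the warning can never mask the critical.
    state = max(s for s, _ in results)
    detail = ", ".join(msg for _, msg in results)
    return state, f"HTTP {LABELS[state]}: {detail}"


if __name__ == "__main__":
    # Slow page with the string missing: should stay CRITICAL.
    print(check_page("<html>oops</html>", 6.0, "Welcome", 2.0))
```

With this ordering, the flap we saw (critical dropping to warning because the page also got slow) could not happen; the slow response would just be added to the message.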