Small Mosaic


Tue, 08 Jul 2008

More Memory Than Sense
My recent bugbear: servers with inaccessible memory.

You go and spec a nice new server with, say, 8GB of RAM (a little box), you install Debian and start adding applications to the machine, and then a couple of months later some anal sysadmin comes along, runs free -m and mutters about under-specced virtualization servers when he sees:


             total       used       free     shared    buffers     cached
Mem:          3287        225       3062          0         24        149

For those of you not paying attention: the machine isn't using over half of its memory. So first of all, how do you spot this, and secondly, how do you fix it?

If you're on Debian then the spotting is easy (for some hardware). Run apt-get install lshw and then run lshw -class memory | grep -A 4 '\-memory'. If the size it reports is bigger than the total from free then you've got wasted resources.
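The comparison can be scripted. What follows is only a rough sketch: the function names are made up for illustration, the installed size is typed in by hand from the lshw output rather than parsed, and the 10% tolerance is an arbitrary assumption to allow for memory the kernel and firmware keep for themselves.

```shell
#!/bin/sh
# Sketch only: visible_mb and check_memory are hypothetical helper names.

# Print the "total" column (in MB) from `free -m` style output on stdin.
visible_mb() {
    awk '/^Mem:/ { print $2 }'
}

# Compare installed memory (MB, read by hand from lshw) with the memory
# the kernel can see. Complains if more than 10% has gone missing.
check_memory() {
    installed=$1
    visible=$2
    missing=$((installed - visible))
    # The 10% slack is an assumed tolerance, not a rule.
    if [ $((missing * 10)) -gt "$installed" ]; then
        echo "WARNING: ${missing}MB of ${installed}MB is not visible to the kernel"
        return 1
    fi
    echo "OK: ${visible}MB of ${installed}MB visible"
}

# Example for an 8GB box:
#   check_memory 8192 "$(free -m | visible_mb)"
```

With the free output above, the example call would print the warning, which is exactly what the muttering sysadmin spotted by eye.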

The fix? Install the right bigmem kernel (a 32-bit kernel without PAE can only address 4GB, and device mappings eat into that). And then recompile VMware Server. Dammit.

Posted: 2008/07/08 19:55 | /serversmells


Sat, 15 Jan 2005

The Hidden Curse of High Uptime
A number of Unix/Linux people seem to pride themselves on obtaining the highest uptime they can. While this may seem like a little harmless fun, in a production environment (mostly fun-free places) it can hide a number of problems that will later become major issues.

At some point the machine will have to come down and face a power off or reboot, and then it's expected to come back up. This is where the problems can start. In almost any environment, no matter how simple (and it gets worse as more complexity and people are involved), changes will be made to the running system and given some testing time; then they will be forgotten about and never made persistent, so they won't survive a reboot.

Whether it's a firewall rule that's never written to the config file, an unsaved routing table entry or a service nobody remembered to enable in rc.local, on any machine with a high uptime there is a chance that something won't come up. And if it's a remote box it'll be something that stops you getting in to fix it. Murphy ensures this.
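One way to catch this kind of drift before the reboot does is to compare a dump of the live state with the copy that is supposed to be persistent. A minimal sketch follows; check_drift is a made-up helper, and the file paths in the usage comment are assumptions that will differ from site to site.

```shell
#!/bin/sh
# Sketch only: check_drift is a hypothetical helper name.

# Compare a snapshot of live state with its persistent copy.
# Arguments: a label, the live snapshot file, the saved config file.
# Returns non-zero when the two differ.
check_drift() {
    label=$1
    live=$2
    saved=$3
    if diff -q "$saved" "$live" >/dev/null 2>&1; then
        echo "$label: in sync"
    else
        echo "$label: DRIFT (live state differs from $saved)"
        return 1
    fi
}

# Example for firewall rules (run as root; /etc/iptables.rules is an
# assumed location, and the saved file needs the same clean-up applied):
#   iptables-save | sed -e '/^#/d' -e 's/\[[0-9]*:[0-9]*\]//' > /tmp/live.rules
#   check_drift firewall /tmp/live.rules /etc/iptables.rules
```

The sed in the example strips the timestamp comments and packet counters from the iptables-save output, since those change constantly and would otherwise show up as false drift.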

My recommendation? Pick a schedule (once a month, maybe once a quarter), take the machines offline and then see what doesn't come up (you do have monitoring in place, don't you?). If you have the opportunity you should combine this with your UPS testing (and you'd better be testing those!). If you can't afford to take a server down for testing then you've got a resilience problem and a single point of failure that needs addressing.

Posted: 2005/01/15 15:55 | /serversmells
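To find the machines that have dodged the schedule, something like the following could run from cron or a monitoring check. It is a sketch under assumptions: the helper names are invented, and the 90 day limit in the usage comment is just an example of a quarterly schedule.

```shell
#!/bin/sh
# Sketch only: uptime_days and reboot_test_due are hypothetical helper names.

# Convert /proc/uptime style input (seconds up, seconds idle) on stdin
# into whole days of uptime.
uptime_days() {
    awk '{ print int($1 / 86400) }'
}

# Flag a machine whose uptime is past the agreed limit.
# Arguments: days up, limit in days. Returns non-zero when overdue.
reboot_test_due() {
    days=$1
    limit=$2
    if [ "$days" -gt "$limit" ]; then
        echo "reboot test overdue: up $days days (limit $limit)"
        return 1
    fi
    echo "ok: up $days days (limit $limit)"
}

# Example on Linux, with an assumed quarterly limit:
#   reboot_test_due "$(uptime_days < /proc/uptime)" 90
```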


Copyright © 2000-2010 Dean Wilson