Non-intuitive downtime and possibly not lost sales

One of the things you’ll often read in web operations books is the idea that while you’re experiencing downtime, your customers are fleeing in droves and taking their orders to your competitors out of frustration. However, this isn’t always as true as people assume.

If your outages are rare, and your site is normally performant and easy to use (or has a monopoly), you’ll find this behaviour a lot less common than you’ve been told. Most people have a small set of sites they are comfortable using and have gradually built up trust and an order history with. This is especially true if you operate in certain niches, such as being the go-to fashion site, or have a very strongly defined brand.

After a few months of short but recurring outages, we went back over our traffic logs and ran some queries to see how badly we’d been impacted and to help us build our business case for more resources. The results were a little surprising for the members of the team who put more trust in ‘conventional wisdom’.

Expected behaviour

Instead of seeing a reverse hockey stick graph, with our customers deserting us in our hour of need before stabilising at a lower constant than before, we saw something different. Orders did drop off during production outages, as you’d expect from a dead system. But as long as recovery times stayed in the range of minutes, and very rarely a small number of hours, the daily order volume and sales totals always bounced back to within a few percentage points of a normal day. In some cases we even saw brief periods of higher than usual order levels as everyone finished their pending transactions as soon as we returned.

Actual behaviour

After witnessing this, we had a few discussions and made some minor changes while waiting for the larger issues to be resolved. For example, if you can architect your failures so that users keep even some of their effort, you greatly increase the odds of them finishing. Keeping services like baskets and wishlists active makes it more likely they’ll return to complete their transaction with you. Once they’ve gone to the effort of finding their newest ‘must have’, you have a small amount of grace to spend while you’re getting everything back to normal before they’ll write off their own time investment and move on.
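As a rough illustration of preserving a user's effort during a failure, here is a minimal Python sketch. Everything in it is hypothetical: `CheckoutUnavailable`, `place_order`, the `submit_order` callable and the `saved_baskets` dictionary are names made up for this example and are not taken from our actual system. The idea is only that a failed checkout keeps the basket rather than discarding it.

```python
class CheckoutUnavailable(Exception):
    """Raised by the (hypothetical) order backend while it is down."""


def place_order(user_id, basket, submit_order, saved_baskets):
    """Try to place an order; if the backend is down, keep the basket for later.

    submit_order is any callable that takes the basket and returns an order id,
    or raises CheckoutUnavailable. saved_baskets is a dict standing in for
    whatever store still works during the outage (an assumption of this sketch).
    """
    try:
        order_id = submit_order(basket)
    except CheckoutUnavailable:
        # Preserve the user's effort so they can finish once we are back.
        saved_baskets[user_id] = list(basket)
        return {"status": "saved", "items": len(basket)}

    # Order went through, so any previously saved basket is no longer needed.
    saved_baskets.pop(user_id, None)
    return {"status": "ordered", "order_id": order_id}
```

In a real system the saved basket would need to live somewhere that stays up when the order backend does not, which is the harder part of the design.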

It seems that as an industry we’ve managed to train our users to accept small amounts of failure, especially if your customers favour mobile devices on cellular networks. While I don’t want to try and convince you that downtime has no impact, I do think it’s worth going over the numbers after your incidents to see what the slightly longer term impact was and how far from a normal day your recovery curve leaves you.
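One simple way to go over the numbers is to compare the order total for the day of an incident against the average of the days before it. The Python sketch below assumes you already have a list of daily order totals; the function name `recovery_gap`, the seven-day baseline and the figures in the usage example are illustrative assumptions, not our real data or queries.

```python
from statistics import mean


def recovery_gap(daily_orders, incident_day, baseline_days=7):
    """Return how far the incident day's total sits from the recent baseline.

    daily_orders is a list of order totals, one per day. incident_day is the
    index of the day the outage happened. The result is a fraction: -0.03
    means the day finished 3% below the mean of the preceding baseline_days.
    """
    if incident_day < baseline_days:
        raise ValueError("not enough history before the incident day")

    baseline = daily_orders[incident_day - baseline_days:incident_day]
    baseline_mean = mean(baseline)
    return (daily_orders[incident_day] - baseline_mean) / baseline_mean


# Made-up figures: a steady week followed by a day with a short outage.
orders = [100, 100, 100, 100, 100, 100, 100, 97]
print(f"{recovery_gap(orders, incident_day=7):+.1%}")
```

A plain mean ignores weekly patterns, so for a site with strong weekday and weekend differences it would be fairer to compare against the same weekday in previous weeks.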

I should also note that this doesn’t cover security issues. Those have very different knock-on effects and are typically orders of magnitude worse.