Calculating the cost of downtime

Mon, 2009/02/23

Ask any specialist business consultant to map out a strategy for enterprise risk aversion and invariably they will start by recommending that business heads embark on a thorough evaluation of their organisation, identifying the major threats and, more importantly, quantifying the impact of each of those threats on the business’s livelihood.

While the findings of such an analysis are invariably sobering and often surprising, the exercise will consistently identify the IT estate as a high-risk area. This is quite simply because today's organisations place a far higher reliance on their IT systems than ever before. We live in a high-demand world, and providing an always-available service is simply par for the course. Understanding the risks and potential losses associated with being an always-available service company is imperative.

“There’s no bigger threat to any organisation than not being able to deliver service, and today’s businesses are inextricably linked to IT systems,” says Dick Sharod, country manager at Stratus. “If your IT systems fail and you are unable to deliver service during the downtime, the effects can be catastrophic,” he adds.

“What's even more worrying is that most organisations don't take the time to calculate the cost of downtime,” he says. “In reality however, it's not that difficult to do.”

Sharod says that the only thing the organisation needs is an up-to-date view of their revenue streams and a simple formula.

The cost of downtime formula

Working out the cost of downtime for any organisation is relatively simple. One takes the average revenue per transaction and calculates the revenue per minute, based on the organisation’s total TPM (transactions per minute) figure. One then multiplies that figure by 60 to arrive at a revenue per hour figure.

To explain the formula in detail, Sharod uses the example of a hypothetical banking institution, which has 1800 ATMs located across the country.

 “Let’s assume that the bank in question has an average revenue per transaction of one US dollar,” he says, “and that each ATM is capable of completing one transaction per minute.

“That means the average revenue over an hour-long business period, per ATM, is $60; and across the entire ATM fleet it is $108 000 per hour,” he continues.

“This logically means then, that if the central ATM system is down for whatever reason, every transaction the ATM fleet cannot carry out is revenue lost – over the space of an hour, that’s $108 000 of direct lost revenue as a result of downtime,” he says.

“We started with one dollar and ended up with $108 000 – it’s amazing how quickly it adds up,” Sharod adds.
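The formula behind the example can be sketched in a few lines of Python. The $1-per-transaction and 1800-ATM figures are the article's illustrative numbers, not real data:

```python
# Sketch of the downtime-cost formula: average revenue per transaction
# times transactions per minute (TPM), scaled up to an hourly figure.

def revenue_per_hour(avg_revenue_per_transaction: float,
                     transactions_per_minute: float) -> float:
    """Hourly revenue implied by a per-transaction average and a TPM figure."""
    revenue_per_minute = avg_revenue_per_transaction * transactions_per_minute
    return revenue_per_minute * 60

# The bank example: $1 per transaction, 1800 ATMs each completing
# one transaction per minute, so a fleet-wide TPM of 1800.
fleet_tpm = 1800 * 1
print(revenue_per_hour(1.0, fleet_tpm))  # 108000.0
```

The per-hour figure is then the amount of direct revenue at risk for every hour the central system is unavailable.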

For the purpose of keeping the exercise simple, Sharod says one can leave out the other forms of damage that a period of downtime brings with it: damage to the company’s brand, the cost of enabling transactions to take place across competitors’ systems, the cost of deploying service and support engineers to visit field units and rectify issues resulting from the outage, and even the cost of permanently losing customers.

“If you add these costs into the equation, the cost of downtime can rise substantially. Since they tend to be variable in nature however, a downtime cost calculation should primarily be conducted using the direct loss in revenue as a basis,” he says.

Guarding against downtime

The most common route to avoiding downtime, especially for key IT systems, is the deployment of resilient, fault-tolerant servers, which are designed to keep systems online despite hardware and other failures.

“And depending on which solution your organisation chooses, it can expect different levels of uptime,” he says.

Sharod explains that the uptime of these kinds of servers is generally expressed as a percentage.

“We’ve all heard figures like 99.99% uptime,” he says, “which sound impressive, but in reality can be quite sobering when translated into actual days, hours and minutes.”

Sharod gives another example to bring this point home. “It’s a generally accepted and independently backed fact that well-managed, resilient, clustered server infrastructure from any of the leading OEMs in the world yields an organisation in the region of 99.95% uptime, or 0.05% downtime,” he says.

“These figures are readily available from any of the vendors who manufacture high-availability servers in the world,” he says.

Doing a little math, this means an organisation can expect that system to be down when it least expects it (unplanned downtime) for 0.05% of the time, or roughly 4.38 hours per year in the case of a 24/7/365 system.
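The percentage-to-hours conversion is simple arithmetic; a short sketch, assuming the 24/7/365 (8760-hour) year used above:

```python
# Convert an uptime percentage to expected annual unplanned downtime,
# assuming a system that must run 24/7/365 (8760 hours per year).

HOURS_PER_YEAR = 365 * 24  # 8760

def annual_downtime_hours(uptime_percent: float) -> float:
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR

print(round(annual_downtime_hours(99.95), 2))  # 4.38
```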

“It doesn’t sound like a great deal does it?” Sharod asks. “Well, that is of course until you plug into the cost calculation we carried out a little earlier,” he says.

“In our example, 4.38 hours of downtime in a 365-day year translates into $473 040 – no small amount by anyone’s estimation.”
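A quick check of that arithmetic, using the article's own figures:

```python
# Direct lost revenue: annual unplanned downtime times hourly revenue,
# using the article's figures (4.38 hours/year at $108 000/hour).

downtime_hours_per_year = 4.38
revenue_per_hour = 108_000

annual_loss = downtime_hours_per_year * revenue_per_hour
print(f"${annual_loss:,.0f}")  # $473,040
```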

Sharod says that this is one of the reasons organisations should aim for the highest availability figure possible from the hardware running their core IT systems.

“Stratus currently boasts a blanket availability number of 99.9999% or 0.0001% downtime per year,” he says.

“By contrast, that’s roughly less than 1.5 minutes of downtime per 365-day year. In lost revenue, the hypothetical banking institution used in the example could expect to lose only around $2700 per year by using Stratus' purpose-built, fault-tolerant systems,” says Sharod.
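Each additional "nine" of availability cuts annual downtime by a factor of ten. A small sketch makes the scale concrete; the six-nines figure works out to roughly half a minute a year, within the "less than 1.5 minutes" Sharod cites:

```python
# How annual unplanned downtime shrinks with each extra "nine" of
# availability, for a system running 24/7/365 (8760 hours per year).

HOURS_PER_YEAR = 8760

def downtime_minutes_per_year(uptime_percent: float) -> float:
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR * 60

for uptime in (99.95, 99.99, 99.999, 99.9999):
    print(f"{uptime}% uptime -> {downtime_minutes_per_year(uptime):7.2f} min/year")
```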

Choosing a high-availability solution

Sharod says it should now be evident that more goes into choosing a high-availability solution, and thereby averting the risk of IT downtime, than meets the eye.

“It’s imperative that organisations are aware of this,” he continues, “since snap decisions often result in organisations protecting their ‘family jewels’ with the lowest cost high-availability solution they are offered.

“After all, to the naked eye, 99.9% uptime and 99.9999% uptime don’t look all that different.

“In reality however, the savings realised by organisations that choose a solution, such as Stratus’, that costs more in plain acquisition price but offers the higher availability, make the decision a no-brainer. Companies should look beyond the simple acquisition cost,” he says.

“Companies that focus on the cost of downtime averted by a solution, rather than its plain acquisition cost, are therefore beginning to think in the direction of reducing enterprise risk.

“Enterprise risk management from an IT perspective is not about choosing the lowest-cost solution for the job – it’s about choosing the solution that’s most capable of averting risk.

“In the scenario where a solution’s initial acquisition costs are drastically less than the costs of not having that system in place, choosing that system is undoubtedly the best route to follow,” he concludes.