Ecommerce Downtime and how to avoid it

I often get asked how much you should expect to pay to host, for example, a Magento website. As with many similar questions, the answer is “it depends”. A range of factors will drive your choice of environment for a Magento build, including traffic volume, integration with external systems, the number of products, and the number of concurrent users you expect to support on the Magento admin system, to name a few.

Although it’s difficult to put a figure on this, I would say the old adage “you get what you pay for” is often very true. What you need will vary, but hosting is often an area where companies look to save money, which I believe can be a big mistake.

Two major considerations when selecting a hosting provider are uptime and performance. Performance is often given a great deal of focus by business owners, while uptime tends to get far less attention. I would argue uptime is just as important, and this is why.

Uptime Guarantees

For many years I have been creating service level agreements stating our targeted uptime for client sites. On many occasions I have drawn up agreements which offer uptime guarantees of anywhere between 99.5% and 99.9%. These metrics sound great at face value and when most companies hear their service provider is agreeing to provide uptime of more than 99.5% they think “problem solved as long as they are hitting that” and move on.

Those companies would be well advised to look at these guarantees in more detail, both in terms of how the downtime is measured and what effect the downtime will have on revenue.
Let’s take as an example a medium-sized ecommerce business turning over £3 million a year through its website (average daily revenue of about £8,219). The following table shows the annual downtime at each uptime level and the lost revenue associated with it (assuming the downtime occurs during a period of average sales).

Percentage uptime    Downtime (hours per year)    Direct revenue loss
99.9%                8.76                         £3,000
99.8%                17.52                        £6,000
99.7%                26.28                        £9,000
99.6%                35.04                        £12,000
99.5%                43.80                        £15,000

This should make anyone who feels comforted by an offer of 99.7% uptime on their website take notice, and treat this metric as seriously as site performance.
The actual lost revenue is likely to be higher, particularly where downtime is caused by inadequate server resources, because that kind of downtime is most likely to occur when the server is experiencing high traffic.
It is also worth noting that downtime not only causes lost revenue; it can also harm your Google ranking and damage customer loyalty.
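
The figures in the table above can be reproduced with a short calculation. This is a minimal Python sketch, assuming a 365-day year and downtime that falls during periods of average sales; the function names are my own and purely illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours(uptime_percent: float) -> float:
    """Hours of downtime per year allowed by a given uptime percentage."""
    return HOURS_PER_YEAR * (100 - uptime_percent) / 100


def revenue_loss(annual_revenue: float, uptime_percent: float) -> float:
    """Revenue lost per year, assuming downtime hits periods of average sales."""
    return annual_revenue * (100 - uptime_percent) / 100


if __name__ == "__main__":
    annual_revenue = 3_000_000  # GBP, the example business in this post
    for uptime in (99.9, 99.8, 99.7, 99.6, 99.5):
        hours = downtime_hours(uptime)
        loss = revenue_loss(annual_revenue, uptime)
        print(f"{uptime}%  {hours:6.2f} hours  GBP {loss:,.0f}")
```

Swapping in your own annual revenue gives a quick first estimate of what each tenth of a percent in an SLA is worth to you.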

Want to work towards 100% uptime? Here’s what to do

On-server monitoring


There are plenty of hosted services and open source installable products to choose from, such as Zabbix or Copper Egg.
I particularly like Copper Egg and the hosted model for monitoring: it requires less maintenance on your server, is very powerful, and is fairly simple to set up. These monitoring tools provide a whole range of metrics, such as:
– Disk space
– Memory usage
– File system access
– Metrics relating to database activity
You can also set up custom performance monitors to keep a check on metrics such as the responsiveness of APIs.
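
To give a feel for what an on-server check does under the hood, here is a minimal Python sketch of a disk space check using only the standard library. It is not how Zabbix or Copper Egg implement their agents; the function names and the 80% warning threshold are assumptions chosen for illustration.

```python
import shutil


def disk_usage_percent(path: str = "/") -> float:
    """Percentage of the file system containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def check_disk(path: str = "/", warn_at: float = 80.0) -> str:
    """Return a one-line status message; warn_at is an illustrative threshold."""
    percent = disk_usage_percent(path)
    status = "WARNING" if percent >= warn_at else "OK"
    return f"{status}: {path} is {percent:.1f}% full"


if __name__ == "__main__":
    print(check_disk("/"))
```

A real monitoring tool would run checks like this on a schedule, store the history, and alert you when a threshold is crossed.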

External monitoring

External monitoring services such as the popular Pingdom service are an essential tool, and are often used to calculate uptime figures like those quoted in this post.

When setting up external monitoring, here are some things to consider:

– Don’t just use a ping check; use semantic monitoring. This means setting up the monitoring system to actually load site pages and check that specific content is present. If you only do a ping check, the site could be registered as up while delivering a blank page instead of the home page.
– Consider Real User Monitoring, which measures the experience of actual visitors rather than relying only on scheduled test requests.
– Something else I have done in the past is to set up Selenium checks which complete key processes, such as placing an order, and schedule them to run at regular intervals. This is worth it if you have the time: remember that loading the home page every minute tells you nothing about whether the checkout is online and the payment gateway is operating. (Anyone who monitors payment gateway services and the banks’ 3D Secure services will tell you these services regularly have downtime issues.)
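
The difference between a ping check and a semantic check can be sketched in a few lines of Python. This is an illustrative example using only the standard library, not how Pingdom works internally; the function names, the example URL and the expected text are all assumptions.

```python
import urllib.request


def page_is_healthy(status: int, body: str, required_text: str) -> bool:
    """A page counts as 'up' only if it returns 200 AND contains the expected content."""
    return status == 200 and required_text in body


def check_url(url: str, required_text: str, timeout: float = 10.0) -> bool:
    """Load the page and apply the semantic check; network failures count as down."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return page_is_healthy(response.status, body, required_text)
    except OSError:
        # Covers connection errors, timeouts and HTTP error responses (4xx/5xx)
        return False


if __name__ == "__main__":
    # Hypothetical URL and text: use a phrase that only appears on a working page
    print(check_url("https://www.example.com/", "Example Domain"))
```

A blank page served with a 200 status would pass a ping or plain HTTP check but fail here, which is exactly the case the first bullet warns about.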

Cross reference log files, external monitoring and internal monitoring

This is really the key to tuning your hosting environment and moving towards 100% uptime. When you experience downtime, cross-reference the error logs, site traffic logs and on-server monitoring to establish the cause of the downtime.
This will allow you to make changes to avoid recurrence.
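
As a starting point for this kind of cross-referencing, the sketch below counts server errors per minute in an access log during a downtime window reported by your external monitor. It assumes the log uses the Apache/Nginx common or combined format; the function name is illustrative, and your own log format may need a different pattern.

```python
import re
from collections import Counter
from datetime import datetime

# Picks out the timestamp and status code from a common/combined format log line
LINE_RE = re.compile(r'\[(?P<ts>[^\]]+)\] "[^"]*" (?P<status>\d{3}) ')
TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def server_errors_per_minute(lines, start, end):
    """Count 5xx responses per minute between two timezone-aware datetimes."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip lines that do not look like access log entries
        ts = datetime.strptime(match.group("ts"), TS_FORMAT)
        if start <= ts <= end and match.group("status").startswith("5"):
            counts[ts.strftime("%Y-%m-%d %H:%M")] += 1
    return counts
```

Lining up the minutes with the most errors against the on-server metrics for the same period (memory, disk, database activity) usually narrows down the cause quickly.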

Best practice in deployment of changes and site updates

A great deal of downtime is caused by developers deploying changes which have not been properly tested, or deploying changes too frequently.
Ensure your developer follows good practice, including setting up development and staging environments so that changes can be tested before they reach the live site.

Staff training

Providing staff with administrator access to a large hosted application such as Magento poses a considerable risk to a business. Staff are able to install plugins, turn caches on and off, back up databases, upload corrupted data and so on.
Staff training in these areas has an important part to play in avoiding site downtime.