Rackspace Downtime: A Reminder That All Are Vulnerable

37signals went down for about two hours tonight (twice), due to a cooling issue in one of the Rackspace data centers. A quick blog search turned up other applications, such as Syncplicity, that were hosted at the same Rackspace data center and were also down. Rackspace seems to have been jinxed since being named the “most reliable” hosting service by NetCraft back in September of 2007.

They had a major outage only two months later, in November, due to a car accident that killed power to the data center, and experienced several outages in January of this year – which also affected 37signals and other large sites such as Tumblr. After that January incident, Tumblr switched to another provider, even though Rackspace offered an 80% discount. In response to the outages today, Rackspace said:

July 9th 2008 — At approximately 5:00 P.M. CDT, our DFW data center experienced a brief loss of utility power that caused a transfer over to generator as designed. During this transition, some of the chillers did not start up correctly and temperatures within certain areas of the DC began to rise to the 90’s. The issue has been isolated and temperatures are decreasing steadily. We are investigating this issue and will provide future updates as they come available.

We apologize for any inconvenience this may have caused you or your customers.

Rackspace has a solid reputation when it comes to uptime and efficiency. Ironically, Fabio Torlini of Rackspace said in a blog post about the recent Facebook outages: “You wouldn’t expect any downtime from someone who makes their living from a website, during one of their prime times.”

As Rackspace builds toward an eventual IPO after surviving a very harsh post-bubble period, the downtime it faces serves as a reminder of just how vulnerable even the top hosting companies are. In the past week we have seen problems at Rackspace, Google Docs, Facebook and many more. There was a time when internet connectivity to the home was the unreliable part, but with the increasing reliance on applications in the cloud, solving the reliability issue on the hosting end is paramount.