Monday, July 2, 2012

Cloud Blues

I have a site that was affected by Friday's AWS outage, the one that took out Netflix, Instagram, and Pinterest. I thought I was in the clear when I checked on Friday, but when I checked again on Sunday evening, my site was down. The problem was that the database didn't come back correctly. Last night I went through the restore procedure, and the restore tool has been stuck in progress all night. After buying a support contract, I learned that the tool hangs when the backup is corrupt. All of the support documentation says to just wait it out. I am still working with support to get a good backup.

This is very frustrating. My only comfort is that sites much bigger and more important than mine were taken down too. The news coverage of those big sites makes it easier to explain the problem to my users. It definitely beats having to say that I did something stupid.