Control vs. Delegation
A couple of months ago, I fell in love with Heroku, a Platform as a Service (PaaS), for a new web application that we are building. Compared to managing a farm of EC2 instances and other Amazon Web Services (AWS) products, Heroku seemed like a dream. Setup is ridiculously easy, and a simple Git push command deploys your application. You don't need to think through things like failover and escalation procedures. It is just supposed to work. Then I read this article about Heroku on the Rap Genius blog, and all of a sudden the dream started to sound too good to be true. This article on The Virtualization Practice nicely summarizes the differing perspectives of Rap Genius, Heroku, and New Relic. To summarize even further, Heroku and New Relic were not transparent about the performance of the entire system, and this prevented Rap Genius from scaling their application cost-effectively.
In my case, I stuck with Heroku for a while and might have continued with it, but I ran into an issue where I could not easily install a Python library. That hassle, combined with the seeds of doubt already sown, sent me back to hosting directly on AWS. I still think that Heroku offers tremendous practicality and value for many different types of applications. But as with any choice, there are trade-offs. The big one with a PaaS like Heroku seems to be Control vs. Delegation. You can't really have both. If you want complete control over something, there is no way to avoid responsibility for it. With Heroku, if there were an incident or the site were slow, I would just have to say "Heroku is having a bad day" and get on with my own good day. I wouldn't be frantically trying to resolve the problem, because my lack of control would render me powerless. Of course, that wouldn't protect me from blame. Blame would be applied retroactively for making the decision to host on Heroku in the first place (back when I had control).
With AWS, I have a little more control. I can design an architecture that spans different availability zones and regions. I can even create a mirror site on another hosting provider. But that means more effort and money. If I wanted even more control, I could host the sites on our own servers in a colocation facility. But a step toward full control doesn't necessarily reduce risk. It just shifts it. When I have more control, the sources of risk are my own failures in design and planning. And in my case, I just don't trust my knowledge of network and server administration to be better than that of the specialists. As with any decision, you need to find a balance: you need enough control to fulfill your responsibilities, but not so much that it exceeds your capability to handle it.