Not that I have any sympathy for Amazon Web Services (I don’t), but it appears to me that they’re going to keep making headlines every time sites get blown offline, until someone sits down and explains that moving your workload into Amazon’s cloud doesn’t mean you get to walk away from DR planning.
Maybe it has to do with cost, with people seeing downtime as an acceptable trade-off for race-to-the-bottom pricing. Maybe it has to do with the fact that a lot of AWS users are software developers, not infrastructure people to whom these thoughts occur naturally and who dream in Visio diagrams rather than in code. Regardless, it wasn’t like AWS just collapsed. It didn’t. Just one location out of a few did.
If you can’t answer the question “What happens to my workload when whatever it’s running on at that moment goes away, for any reason?” then you’re either doing a half-assed job or you’ve decided that it doesn’t matter if it goes away, so long as it comes back in a window of time you can live with.
The answer to the question should be either “This, this and this detect the event, this, this and this mitigate it, and we keep running” or “We’re offline until everything comes back.”
I’ve said this already, but when you hear about three (10.1 minutes a week), four (1.01 minutes a week), five (6.05 seconds a week) and six (0.605 seconds a week) nines of availability, those aren’t things that come free the moment your cloud provider swipes your credit card.
Each nine costs money, and the more of them you want, the more you’ll pay in your time and in your Disaster Recovery planning and engineering effort.
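
If you want to check the downtime figures quoted above for yourself, here is a quick sketch in Python. The function name is mine, made up for illustration, and it assumes a plain seven-day week and that "N nines" means an availability of 1 - 10^-N (so three nines is 99.9%).

```python
# Downtime budget per week for a given number of "nines" of availability.
# Assumption: "N nines" means availability = 1 - 10**-N (three nines = 99.9%).

SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def weekly_downtime_seconds(nines: int) -> float:
    """Return the seconds per week you may be down and still hit `nines` nines."""
    unavailability = 10 ** -nines  # e.g. 0.001 for three nines
    return SECONDS_PER_WEEK * unavailability


if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        secs = weekly_downtime_seconds(n)
        # Show both units, since the small budgets read better in seconds.
        print(f"{n} nines: {secs / 60:.3g} minutes ({secs:.3g} seconds) a week")
```

Every extra nine cuts the budget by a factor of ten, which is why the jump from three nines to four already takes you from "someone can log in and fix it" to "it had better fix itself."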
