We recently attended the re:Invent 2018 AWS conference in Las Vegas. One of the phrases our team encountered most frequently was the refrain ‘we’re going all in with AWS’. Since it’s now Australian summer, the phrase conjured images of AWS customers dipping their toes into the AWS pool (should that be a data lake?) before committing to the big dive.
Not us, not our style.
We went ‘all-in’ with AWS in 2012. I’m guessing it’s soon going to be fashionable to make that claim – but back in those days we were regularly laughed at for daring to suggest that AWS and Energetiq could do what our very large corporate clients couldn’t. You know, reliably host a high-performance, mission-critical system.
And so we did. We took back our software. We rented some time and space in Sydney with AWS, put up some walls (that’s a VPC) and got our old-school client through the hosting process. Side note – our favourite part of the security questionnaire: ‘In regards to the hosting facility, can you describe the placement of the bolts that attach the air conditioning units to said hosting facility. Are they on the inside or outside?’ No, really.
We were all in with AWS.
Which may have been a little bit risky, some might say. Were we innovating? Were we bleeding edge, leading edge, or just quietly prescient? Or just plain lucky?
Because the truth is, running on the AWS platform has been spectacular. Amazing. Better than even we could have hoped for. Find yourself a thesaurus, and bring out all the synonyms for ‘bloody great’, because that’s how good it’s been. In fact, my pet phrase for how great AWS has been is that AWS is a “tech miracle.” And I use that phrase because the capabilities delivered are indeed miraculous (Multi-AZ RDS is phenomenal) and the service delivery is faultless. The stuff works – pretty much all the time.
So you can imagine the pain we felt recently when we made a shocking discovery about AWS.
Hope you’re sitting down – turns out they’re human.
Turns out that sometimes, just sometimes their code has bugs. You know – just like my code does. And yours.
Some details are worth recounting here. We had an AWS Database Migration Service (DMS) task running smoothly for about a year. It was important to us and our client. We broke it – our fault. We recovered it (recovery takes about 24 hours) – and it got ‘stuck’. We have some monster tables, so we’ve seen this before. Usually turning it off and on again works a treat. Not this time. Tried again, bearing in mind each experiment takes 24 hours. Didn’t work. Called AWS, who gave us some ideas to try. Wait 24 hours. Nup. Call, try, wait. Do this a few times and we’re down for over a week. (Side note: our client was being super nice to us – no yelling, not yet.)
Eventually, we realise we have 10 of these services still working, and one that is not. So it’s not all DMS processes. Just this one. We work through the differences – and realise that the broken DMS process is the only one running DMS 3.1.2. All the others are running 2.4.3. Lightbulb moment: we downgrade the DMS engine and boom, we’re back.
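That kind of odd-one-out diagnosis can be scripted. Here’s a minimal Python sketch – the instance names are made up, and in practice the fleet inventory would come from boto3’s DMS client (`describe_replication_instances`) rather than a hard-coded list – that flags whichever replication instance is running a different engine version from the rest:

```python
# Sketch: find the odd-one-out DMS engine version across a fleet of
# replication instances. In real use the `fleet` list would come from:
#   boto3.client("dms").describe_replication_instances()["ReplicationInstances"]
# The identifiers below are hypothetical, for illustration only.

from collections import Counter

def odd_engine_versions(instances):
    """Return identifiers of instances whose EngineVersion differs
    from the most common version in the fleet."""
    versions = Counter(i["EngineVersion"] for i in instances)
    majority, _ = versions.most_common(1)[0]
    return [i["ReplicationInstanceIdentifier"]
            for i in instances
            if i["EngineVersion"] != majority]

# Hypothetical fleet mirroring the story above: ten instances on 2.4.3,
# one stray on 3.1.2.
fleet = [{"ReplicationInstanceIdentifier": f"dms-{n}", "EngineVersion": "2.4.3"}
         for n in range(10)]
fleet.append({"ReplicationInstanceIdentifier": "dms-broken",
              "EngineVersion": "3.1.2"})

print(odd_engine_versions(fleet))  # → ['dms-broken']
```

Comparing against the majority version (rather than a pinned “known good” one) means the check keeps working as the fleet is upgraded over time.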
And a few weeks after that, we have our confirmation from AWS – it was indeed a bug in that version of the DMS engine. Triggering it required a very specific combination of source database, target database and type of data – yep, that’s exactly what we had.
So, turns out AWS is human after all.
Was it that shocking? No, not really (and yep, that headline was a bit of clickbait. Sorry, not sorry). For me it reinforces just how great their service has been over the last six years. The fact that this one incident is so memorable is a testament not to a single defect, but to how accustomed we’ve become to a service without defects. And that’s pretty bloody fantastic.
So thanks once again to AWS for your impressive service (and thanks also for letting us know that you’re human, just like us. I feel so much better about my own defects now!)