Something happened today that normally we wouldn’t be proud to advertise. One of our hosted sites slowed to a crawl. To make matters worse, the site was loading so slowly that the customer called to ask whether something was wrong.
Yeah, I know, not the greatest PR angle to start a story with. The good part is what happened next. After all, problems will always happen. What makes the difference is how fast we react to them and how well we keep them from happening again.
Within a few minutes I had connected to the server and could see that the CPU was being overrun. I stopped the server, took a complete backup, and brought the server back online. I then found some pending updates waiting to be applied, ran them, and confirmed the site was running smoothly again, all in under 10 minutes.
I checked in with the customer, who was happy, and then I skied down the mountain.
You see, I wasn’t in our data center, because we don’t have one. I got the site back up and running while I was on the side of a mountain in Utah, not in our home office in Rochester. Our data center is this little company called Amazon: we use Amazon Web Services to deploy and manage most of our websites and applications.
This particular website belongs to a small company and runs on a single EC2 instance located in a data center in Virginia. There is a plethora of tools at my disposal that could do all of the work I did today automatically. This is not a high-availability site with a large amount of expected traffic, so we scale the resources to match the needs of the customer.
In this case I have since added an alarm to the site using CloudWatch, so I get notified if it happens again before the customer notices. I can then add detailed monitoring to the server to see exactly what is happening if this scenario returns.
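For the curious, here is roughly what an alarm like that could look like in Python with boto3, the AWS SDK. This is a minimal sketch rather than the exact configuration on this server: the 80% threshold, the two five-minute periods, the alarm name, the SNS topic used for notifications, and the `us-east-1` region (the API name for the Northern Virginia location) are all assumptions chosen for illustration.

```python
def build_cpu_alarm(instance_id, sns_topic_arn, threshold_percent=80.0):
    """Return the keyword arguments for CloudWatch's put_metric_alarm call.

    The alarm name and default threshold are illustrative assumptions.
    """
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "AlarmDescription": "CPU has stayed high; check the site.",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        # Basic EC2 monitoring reports every 5 minutes, so use a 300 s period.
        "Period": 300,
        # Require two consecutive high periods to avoid alerting on a brief spike.
        "EvaluationPeriods": 2,
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
        # An SNS topic that emails or texts whoever is on call (assumed to exist).
        "AlarmActions": [sns_topic_arn],
    }


def create_cpu_alarm(instance_id, sns_topic_arn, region="us-east-1"):
    """Create the alarm in AWS. Needs boto3 and valid AWS credentials."""
    import boto3  # imported here so build_cpu_alarm works without the SDK

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**build_cpu_alarm(instance_id, sns_topic_arn))
```

Calling `create_cpu_alarm` with a real instance ID and topic ARN would register the alarm. Keeping the settings in a separate function makes them easy to review, or reuse for another server, before anything touches the AWS account.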
For higher-demand sites I can add load balancing to split the traffic between multiple servers, and even across different locations. If this were a site that went through drastic changes in demand, I could use Auto Scaling to add resources when needed, paying only for what was actually running.
For now, I am grateful to be able to support our customers with cloud-based hosting while I am above the clouds.