A typo caused the big Amazon Web Services outage
Mistakes happen to the best of us, but it seems one tiny “human error” has caused considerable embarrassment for Amazon. Amazon Web Services went down this week, disrupting many internet services for hours, and the cause turned out to be a typo. On Tuesday morning, an employee doing routine maintenance entered a command incorrectly while trying to take “a small number of servers” offline. The mistyped command instead took down a “larger set of servers,” including those that support two S3 subsystems. These subsystems underpin S3, the data storage service used by a number of web-based services.
With these two subsystems offline, many services went down with them, including the ones Amazon uses to update its own status page. And because the subsystems had not been completely restarted “for many years,” bringing them back up took “longer than expected.” Amazon has acknowledged and apologized for the mistake, and says it has put safeguards in place to make sure this doesn’t happen again.
Source: Mashable