In 1990, something incredible happened. AT&T, then the most lauded and reliable phone system, went "dead." It started with a piece of bad code that was triggered when one of its 114 switching centers failed. When that switch returned to normal, it sent a shutdown signal to all the other switching centers, and every one of them collapsed. This was devastating for many AT&T business customers, including American Airlines, which lost 200,000 reservation calls, and CBS, which was unable to reach any of its bureaus.
As the company investigated the issue, it realized that its own internal redundancy had failed because it wasn't insulated from the new main system. In other words, the backup crashed as well. Of course, we've come a long way since then, but disasters come in many forms (cybercrime, data center equipment failures, power outages, etc.) and no business is immune. In 2018, Microsoft, Google Cloud, Slack, Visa, and many others all suffered outages.
The reasons behind these failures include human error, equipment failure, code errors, and improper load balancing of cloud infrastructure. Fortunately, none of these outages lasted more than a few hours, because all of these companies have a comprehensive disaster recovery plan that includes processes and procedures for ensuring business continuity during a disaster. In fact, 47% of enterprises and 38% of small companies now use disaster recovery in at least half of their production infrastructure.