How Can Businesses Achieve Resilience Amid IT Outages?

February 7, 2025

The CrowdStrike outage of 2024 exposed significant vulnerabilities in modern IT systems and underscored the importance of business continuity planning. The incident prompted analysis from across the industry, notably from Charles Betz, an analyst at Forrester Research. His reflections point to fundamental gaps between IT and broader business operations, areas that conventional security and operations standards often do not cover. As businesses rely more heavily on digital infrastructure, the cost of such disruptions grows, and this one affected organizations around the world. The outage offers a compelling case study in the relationship between IT functionality and overall business resilience.

The Importance of Integrating Business Continuity and IT Strategies

A reimagined approach to enterprise resilience combines long-established practices in business continuity and risk management with modern IT strategies. Betz’s observations suggest that the risks and potential impacts associated with IT have been grossly underestimated in enterprise risk registers. His report, “The State of Modern Technology Operations Maturity,” emphasizes a growing need to redefine enterprise resilience, especially in the face of large-scale disruptions like the CrowdStrike incident. As the technology ecosystem evolves, businesses need to rethink their approach and give IT risks greater weight within their strategic frameworks.

Cloud and automation technologies are intended to enhance operational efficiency, yet they can, ironically, amplify the impact of outages. The CrowdStrike outage of 2024 is a prime example: the disruption cascaded to affect numerous businesses worldwide. A survey conducted by the incident management firm PagerDuty in December 2024 found that 83% of the 1,000 business and IT executives surveyed were blindsided by the incident, and 88% expect another major incident within the next year. These figures underscore the urgency for business leaders to rethink their crisis management strategies and build more resilient systems.

Learning from Worst-Case Scenarios

Worst-case scenarios can provide valuable lessons for improving resilience. Delta Air Lines, which was severely affected by disruptions to its crew-tracking systems during the CrowdStrike outage, is a notable case study. Betz emphasizes that the failure was rooted not in IT disaster recovery but in broader business continuity. Even with functional IT infrastructure, the airline struggled with logistical constraints in its physical operations, such as repositioning crew and aircraft. The case shows why business continuity planning must extend beyond IT systems.

This perspective raises the broader question of resilience engineering in sectors beyond IT. Aerospace engineering, for example, has robust practices and a deep understanding of resilience from which IT can benefit. Betz argues that IT professionals should look to these domains for more effective resilience strategies. By borrowing from other engineering disciplines, businesses can better prepare for and mitigate future risks.

The Role of Configuration Management Databases (CMDBs)

Configuration Management Databases (CMDBs) play a critical role in bolstering business continuity. Forrester’s research highlights a resurgence in the relevance of CMDBs, despite long-standing criticism that they are cumbersome and prone to failure. According to the firm’s 2022 trend report, organizations with effective CMDB systems tend to achieve superior outcomes across various IT priorities. These databases give IT teams a deep understanding of their technology assets and dependencies, as illustrated by successful use cases at Scotiabank and the Bank of England.

Betz advises against being overly fixated on the term “CMDB” itself, suggesting instead that businesses recognize the broader challenge of data management within IT. IT departments must modernize and contextualize older methods to tackle contemporary challenges effectively. By doing so, companies can ensure they derive holistic, actionable insights into IT dependencies, which is essential for maintaining resilience in today’s interconnected digital landscape.
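
To make the idea of dependency insight more concrete, the following is a minimal sketch in Python of the kind of question a CMDB-style dependency map can answer: if one component fails, which systems and business services are affected downstream? It is an illustration under stated assumptions, not a description of any particular CMDB product or of the systems involved in the CrowdStrike incident. Every item name and the helper functions build_reverse_index and impacted_by are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical dependency records, as a CMDB export might provide them:
# each pair reads "(item, the thing it depends on)". All names are invented.
DEPENDS_ON = [
    ("crew-scheduling", "crew-tracking-app"),
    ("crew-tracking-app", "windows-host-01"),
    ("flight-booking", "booking-api"),
    ("booking-api", "windows-host-02"),
    ("windows-host-01", "endpoint-agent"),
    ("windows-host-02", "endpoint-agent"),
    ("payroll", "linux-host-07"),
]


def build_reverse_index(edges):
    """Map each item to the set of items that depend on it directly."""
    dependents = defaultdict(set)
    for item, dependency in edges:
        dependents[dependency].add(item)
    return dependents


def impacted_by(failed_item, edges):
    """Return every item that depends, directly or indirectly, on failed_item."""
    dependents = build_reverse_index(edges)
    seen = set()
    queue = deque([failed_item])
    # Breadth-first walk "upward" from the failed item to its dependents.
    while queue:
        current = queue.popleft()
        for item in dependents.get(current, ()):
            if item not in seen:
                seen.add(item)
                queue.append(item)
    return seen


if __name__ == "__main__":
    for name in sorted(impacted_by("endpoint-agent", DEPENDS_ON)):
        print(name)
```

In this invented data set, a failure of the shared endpoint agent reaches both Windows hosts and, through them, the crew scheduling and flight booking services, while payroll on a separate host is untouched. A real dependency map would be far larger and would need continuous upkeep, which is the data management challenge Betz describes.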

Bridging the Gap Between IT and Business Operations

The CrowdStrike outage underscores the need to bridge the operational gap between IT and broader business functions. The main lessons from the incident are to elevate IT risks within enterprise risk management frameworks, to learn from engineering domains with advanced resilience practices, and to modernize traditional tools like CMDBs. These steps align with the larger trend of rethinking business continuity in an increasingly hyperconnected, digital world.

The analysis moves beyond a narrow, IT-centric mindset toward a view of resilience that integrates insights from several disciplines. Taken together, the lessons of the CrowdStrike incident argue for a more comprehensive, forward-looking approach to business continuity planning. By drawing on diverse perspectives and lessons from other fields, businesses can develop a robust strategy for future challenges.

Lessons for a Resilient Future

The 2024 CrowdStrike outage exposed significant vulnerabilities in current IT systems and showed why robust business continuity planning matters. The analysis it prompted, notably from Charles Betz of Forrester Research, points to substantial gaps between IT operations and overall business activities, gaps that traditional security and operational standards often neglect. As companies depend more heavily on digital infrastructure, the repercussions of such disruptions grow accordingly. The incident remains a vital case study in the connection between IT performance and business resilience, and it underscores the importance of integrating IT and business strategies so that organizations can keep operating through future crises.
