In light of the recent CrowdStrike outage, which caused significant disruptions, we understand the impact this has had on Aspire customers and on organisations across the world. As a dedicated Managed Security Service Provider, our primary goal is to ensure the security and operational continuity of our customers’ systems.
We would like to explain how we managed this incident and the steps we took to support our customers, and to answer any queries that our customers may have.
Overview of the CrowdStrike Outage:
The CrowdStrike Falcon platform is a critical security solution used by organisations of all sizes, across various industries. The CrowdStrike outage was caused by a defect found in a Falcon Rapid Response content update, specifically designed for Windows hosts.
CrowdStrike delivers security content configuration updates to their sensors in two ways: Sensor Content that is shipped with each sensor version, and Rapid Response Content that is designed to respond to the changing threat landscape at operational speed. By nature, Rapid Response Content updates are released multiple times per day to provide users with the latest threat mitigation and protection. CrowdStrike performs validation checks on these updates through their Content Validator before they are published.
On July 19, 2024, CrowdStrike released a routine Rapid Response content update for the Windows sensor. A bug in CrowdStrike’s Content Validator allowed this content update to pass validation checks, despite containing a defect. This defect triggered an exception that Windows operating systems could not handle, resulting in a Blue Screen of Death (BSOD) on affected systems.
CrowdStrike identified the issue and deployed a fix within 79 minutes. However, for many users this reversion came too late: their devices had already received the update and encountered the BSOD error.
Importantly, this was not a cyber attack. Microsoft estimated that the update affected 8.5 million Windows devices, representing less than 1% of all Windows machines. While the incident caused significant disruptions, CrowdStrike’s swift response helped to minimise the overall impact.
Immediate Steps Taken by Aspire:
Our customers’ systems running Falcon were affected, amounting to thousands of devices. Our immediate focus was on providing customers with technical guidance and support. Here’s a timeline of our response:
- Notification and Response: We were notified of the outage in the early hours of Friday 19 July. We immediately issued customer communications and provided regular updates. We shared technical guidance and advice, which can be found here, and supported customers through calls and customer visits. By 5pm that day, nearly 90% of affected devices were fully operational.
- Weekend Support: Our team worked throughout the weekend to bring the remaining devices back online. Aside from a few systems, our customers’ devices are now restored and fully operational.
Key Information for Our Customers:
How will the reliability of services be ensured moving forward?
To ensure the continued reliability of our services, we are working closely with CrowdStrike to thoroughly understand the root cause of the outage and to implement measures that prevent any future occurrences. In addition to this, we are conducting a comprehensive review of our response protocols to identify any areas for improvement. This review will help us refine our processes and strengthen our overall service reliability. Our commitment to delivering best-in-class solutions and maintaining excellent service remains. We will continue to communicate transparently with our customers and provide updates as new information becomes available.
How will CrowdStrike prevent this from happening again?
CrowdStrike has released a Preliminary Post Incident Review setting out the key improvements it is making to prevent similar outages. These include:
- Software Resiliency and Testing: Diversifying and improving testing methods such as local developer, rollback, stress, fuzzing, fault injection, stability, and interface testing. Adding extra validation checks to the Content Validator and enhancing existing error handling in the Content Interpreter.
- Rapid Response Content Deployment: Implementing a staggered deployment strategy for Rapid Response Content, starting with canary deployments. Improving monitoring during rollouts. Providing customers with greater control over the delivery of Rapid Response Content updates. Providing content update details via release notes, which customers can subscribe to.
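To illustrate the idea behind a staggered deployment, here is a minimal sketch in Python. It is not CrowdStrike’s implementation: the `Ring` type, the `staged_rollout` function, the `deploy` and `is_healthy` callbacks, and the 1% failure threshold are all hypothetical names and values chosen for illustration. The update goes to a small canary group first, and the rollout halts before reaching later groups if too many hosts in the current group report problems.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Ring:
    """A group of hosts that receives the update together (hypothetical type)."""
    name: str
    hosts: Sequence[str]


def staged_rollout(
    rings: Sequence[Ring],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.01,  # assumed threshold, for illustration only
) -> List[str]:
    """Deploy ring by ring and stop at the first ring that looks unhealthy.

    Returns the names of the rings that completed within the threshold.
    """
    completed: List[str] = []
    for ring in rings:
        failures = 0
        for host in ring.hosts:
            deploy(host)
            if not is_healthy(host):
                failures += 1
        rate = failures / len(ring.hosts) if ring.hosts else 0.0
        if rate > max_failure_rate:
            # Halt here: later, larger rings never receive the update.
            break
        completed.append(ring.name)
    return completed


if __name__ == "__main__":
    rings = [
        Ring("canary", ["c1", "c2"]),
        Ring("early", ["e1", "e2", "e3"]),
        Ring("broad", [f"b{i}" for i in range(10)]),
    ]
    touched: List[str] = []
    # Simulate a defective update: every host that receives it becomes unhealthy.
    done = staged_rollout(rings, touched.append, lambda host: False)
    print("completed rings:", done)
    print("hosts that received the update:", touched)
```

In this simulation only the two canary hosts receive the defective update; the remaining thirteen hosts are never touched, which is the practical benefit of starting with a canary group.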
In addition to their Preliminary Post Incident Review, CrowdStrike has committed to publicly releasing the full Root Cause Analysis once their investigation is complete.
Why were Apple and Linux systems not affected by the outage?
CrowdStrike’s software operates not only on Microsoft Windows but also on Apple’s macOS and on Linux. However, the outage exclusively impacted Microsoft Windows. The content update was specific to Windows and was not deployed to macOS or Linux systems. Moreover, the Falcon sensor’s integration as a Windows kernel process differs from its integration on macOS or Linux.
Can Falcon access be limited?
For any anti-malware software to be effective, it needs to operate at the most primitive and most privileged level within the operating system: that of the kernel. This is necessary to afford systems the maximum protection, but inevitably kernel-level software can cause serious issues if there are corrupt or problematic files.
How can customers stay informed about updates and new information regarding the outage?
Customers can check our Service Status page for detailed information, and can reach out to our support desk with any immediate questions or requests for assistance. As always, our team is available to help and provide support as needed. CrowdStrike is also continually updating their Remediation and Guidance Hub to keep customers informed.
Next Steps:
While software updates may occasionally cause disruptions, incidents of this magnitude are rare. We will continue to work closely with CrowdStrike to ensure the highest level of protection for your organisation. Our confidence in CrowdStrike’s leadership in cyber security remains, and through our partnership, we will advocate for enhanced prevention measures. We will provide customer updates as new information becomes available.
We recommend that organisations issue guidance to their employees to be vigilant against phishing attacks. The NCSC has reported a notable increase in phishing emails related to this event, as well as an uptick in lookalike domain registrations. It is crucial to verify the authenticity of any communications claiming to be from CrowdStrike, Microsoft, or Aspire.
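As an illustration of how lookalike domains can be spotted, the following minimal Python sketch flags domains that resemble a trusted brand without belonging to it. It is an assumption-laden example rather than a vetted detection tool: the `OFFICIAL_DOMAINS` set, the `looks_like_impersonation` function, and the 0.8 similarity threshold are hypothetical choices, and a real deployment would maintain its own list of trusted domains.

```python
import re
from difflib import SequenceMatcher

# Illustrative allow-list; extend it with the domains your organisation trusts.
OFFICIAL_DOMAINS = {"crowdstrike.com", "microsoft.com"}


def looks_like_impersonation(domain, official=OFFICIAL_DOMAINS, threshold=0.8):
    """Return True if `domain` resembles a trusted brand but is not its domain.

    `threshold` is an assumed similarity cut-off, not a recommended value.
    """
    domain = domain.lower().rstrip(".")

    # Genuine domains and their subdomains are never flagged.
    for trusted in official:
        if domain == trusted or domain.endswith("." + trusted):
            return False

    # Compare each dot- or hyphen-separated part against each brand name.
    tokens = [t for t in re.split(r"[.-]", domain) if t]
    for trusted in official:
        brand = trusted.split(".")[0]
        if brand in domain:
            return True  # brand name embedded in an unrelated domain
        for token in tokens:
            if SequenceMatcher(None, token, brand).ratio() >= threshold:
                return True  # near-miss spelling of the brand name
    return False


if __name__ == "__main__":
    for candidate in [
        "crowdstrike.com",
        "support.crowdstrike.com",
        "crowdstrike-fix.com",
        "crowdstr1ke.com",
        "example.org",
    ]:
        print(candidate, looks_like_impersonation(candidate))
```

A check like this can only raise a flag for human review. It does not replace verifying a message through a known, trusted contact channel.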
We appreciate the cooperation, support, and patience of our customers and team during this time. Moving forward, we will share insights and next steps based on our learnings. Should you have any questions or need further assistance, please do not hesitate to reach out to us.