Texas-based cybersecurity company CrowdStrike has published its final root cause analysis of the failure that led to a widespread IT outage affecting millions of Microsoft Windows hosts. The incident caused chaos around the world, disrupting airlines, banks and healthcare services.
The outage was traced to a faulty update delivered to CrowdStrike's Falcon sensor, software designed to protect systems by detecting and mitigating the latest cyber threats.
The incident, referred to as the “Channel File 291” issue, was first outlined in a Preliminary Post Incident Review (PIR). That analysis identified the cause as a content validation problem that emerged after a new Template Type was introduced in sensor version 7.11, released in February 2024. The new Template Type was designed to improve detection of novel attack techniques that abuse named pipes and other Windows interprocess communication (IPC) mechanisms.
According to the PIR, the new IPC Template Type was rigorously developed and tested following CrowdStrike's standard Sensor Content development procedures. IPC Template Instances are delivered to sensors as Rapid Response Content via a Channel File, in this case Channel File 291.
The crash occurred on July 19, 2024, when two additional IPC Template Instances were deployed. One of them included a non-wildcard matching criterion for the 21st input parameter, which meant the sensor now had to inspect that parameter, something no earlier version of the channel file had required.
“The new IPC Template Type defined 21 input parameter fields, but the integration code that invoked the Content Interpreter with Channel File 291’s Template Instances supplied only 20 input values to match against. This parameter count mismatch evaded multiple layers of build validation and testing, as it was not discovered during the sensor release testing process, the Template Type (using a test Template Instance) stress testing or the first several successful deployments of IPC Template Instances in the field. In part, this was due to the use of wildcard matching criteria for the 21st input during testing and in the initial IPC Template Instances,” CrowdStrike explained.
In short, the Content Validator evaluated the new Template Instances on the assumption that the IPC Template Type would receive 21 inputs. When the updated Channel File 291 reached the sensors, however, it triggered an out-of-bounds read in the Content Interpreter: as sensors processed IPC notifications from the operating system, the interpreter tried to access a 21st input value even though only 20 values had been supplied. Reading beyond the end of the input data array caused the widespread crashes.
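To make the mismatch concrete, here is a minimal toy model, assuming a template is just a list of per-field matching criteria where a wildcard matches anything without reading its input. Every name in it (`TemplateInstance`, `interpret`, `WILDCARD`, the field counts as constants) is hypothetical and is not CrowdStrike's code. It shows why wildcard criteria in field 21 masked the bug: the missing input is simply never read. Python raises `IndexError` where native code would instead read past the end of the array.

```python
# Toy model of the Channel File 291 parameter-count mismatch.
# Every name here is hypothetical; this is not CrowdStrike's code.
from dataclasses import dataclass
from typing import Optional, Sequence

TEMPLATE_FIELD_COUNT = 21  # fields defined by the modelled IPC Template Type
SUPPLIED_INPUT_COUNT = 20  # values the modelled integration code passes in
WILDCARD = None  # a wildcard criterion matches anything without reading its input


@dataclass
class TemplateInstance:
    # One matching criterion per template field, in field order.
    criteria: Sequence[Optional[str]]


def interpret(instance: TemplateInstance, inputs: Sequence[str]) -> bool:
    """Return True if every non-wildcard criterion equals its input value.

    Like the unchecked interpreter described above, this assumes `inputs`
    holds one value per template field. Python raises IndexError where
    native code would read past the end of the array (undefined behavior
    that can crash the process).
    """
    for index, criterion in enumerate(instance.criteria):
        if criterion is WILDCARD:
            continue  # the input is never read, so a missing value goes unnoticed
        if inputs[index] != criterion:
            return False
    return True


if __name__ == "__main__":
    inputs = [f"value{i}" for i in range(SUPPLIED_INPUT_COUNT)]

    # Earlier-style instance: field 21 is a wildcard, so the missing 21st input is never read.
    earlier = TemplateInstance([WILDCARD] * TEMPLATE_FIELD_COUNT)
    print(interpret(earlier, inputs))  # True

    # July 19-style instance: a concrete criterion in field 21 forces a read of inputs[20].
    july = TemplateInstance([WILDCARD] * (TEMPLATE_FIELD_COUNT - 1) + ["example-pipe"])
    try:
        interpret(july, inputs)
    except IndexError as exc:
        print("out-of-bounds read:", exc)
```

The same 20-element input list passes every test that uses wildcards and fails only once a concrete value appears in field 21, which mirrors how the mismatch survived earlier testing and deployments.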
Following the incident, CrowdStrike has implemented several measures to prevent a recurrence. The company added compile-time validation that the number of input fields in a Template Type matches the number of inputs supplied, along with runtime input array bounds checks in the Content Interpreter to prevent out-of-bounds memory reads. It also corrected the number of inputs supplied to the IPC Template Type. Together, these changes are intended to prevent system crashes by ensuring that the input array is always as large as the Template Type expects.
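The two guards described above can be sketched as follows. This is an illustration under stated assumptions, not CrowdStrike's implementation: `TemplateType`, `check_field_count`, `interpret_checked` and `ContentError` are hypothetical names, and the "compile-time" check is modelled as an ordinary function because Python has no separate compile step. The key design choice is that a missing input produces a controlled error the caller can handle (for example, by skipping the offending content) instead of an out-of-bounds read.

```python
# Sketch of a field-count check plus a runtime bounds check.
# All names are hypothetical; this is not CrowdStrike's code.
from dataclasses import dataclass
from typing import Optional, Sequence

WILDCARD = None  # a wildcard criterion matches anything without reading its input


class ContentError(Exception):
    """Signals content that must be rejected instead of evaluated."""


@dataclass
class TemplateType:
    name: str
    field_count: int


def check_field_count(template: TemplateType, supplied_inputs: int) -> None:
    """Reject a template whose field count differs from the supplied input count.

    Modelled as a plain function; in a compiled sensor this comparison
    could run at build time so that a mismatch never ships.
    """
    if template.field_count != supplied_inputs:
        raise ContentError(
            f"{template.name}: defines {template.field_count} fields, "
            f"but {supplied_inputs} inputs are supplied"
        )


def interpret_checked(criteria: Sequence[Optional[str]], inputs: Sequence[str]) -> bool:
    """Evaluate criteria, refusing any non-wildcard criterion that has no input."""
    for index, criterion in enumerate(criteria):
        if criterion is WILDCARD:
            continue
        if index >= len(inputs):
            # Runtime bounds check: fail in a controlled way rather than read past the end.
            raise ContentError(f"criterion for field {index + 1} has no matching input")
        if inputs[index] != criterion:
            return False
    return True


if __name__ == "__main__":
    ipc = TemplateType("IPC", field_count=21)
    try:
        check_field_count(ipc, supplied_inputs=20)
    except ContentError as exc:
        print("rejected:", exc)
    check_field_count(ipc, supplied_inputs=21)  # passes once the input count is corrected

    inputs = [f"value{i}" for i in range(20)]
    try:
        interpret_checked([WILDCARD] * 20 + ["example-pipe"], inputs)
    except ContentError as exc:
        print("rejected at runtime:", exc)
```

Either guard alone would have stopped the faulty instance: the field-count check catches the mismatch before release, and the bounds check contains it if a mismatch slips through.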
CrowdStrike also plans to broaden test coverage during Template Type development by adding test cases that use non-wildcard matching criteria. In addition, the Content Validator has been modified to check that matching criteria do not reference more fields than the sensor supplies, and to permit only wildcard criteria in the 21st field, preventing out-of-bounds access.
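A validator check and the matching test coverage might look like the sketch below. It assumes the same toy representation as a list of per-field criteria with `None` as the wildcard; `validate_instance`, `non_wildcard_cases` and the problem-message format are hypothetical and do not describe CrowdStrike's actual Content Validator. The test generator places a concrete criterion in one field at a time, so every input position is exercised at least once rather than only the wildcard path.

```python
# Sketch of a validator check plus non-wildcard test-case generation.
# All names and messages are hypothetical; this is not CrowdStrike's code.
from typing import List, Optional, Sequence

WILDCARD = None  # a wildcard criterion matches anything without reading its input


def validate_instance(
    criteria: Sequence[Optional[str]],
    supplied_inputs: int,
    wildcard_only_fields: Sequence[int] = (),
) -> List[str]:
    """Return a list of problems; an empty list means the instance may ship.

    criteria: one matching criterion per template field.
    supplied_inputs: how many input values the sensor actually passes in.
    wildcard_only_fields: 1-based field numbers that must stay wildcards.
    """
    problems = []
    for number, criterion in enumerate(criteria, start=1):
        if criterion is WILDCARD:
            continue
        if number > supplied_inputs:
            problems.append(f"field {number}: criterion exceeds the {supplied_inputs} supplied inputs")
        if number in wildcard_only_fields:
            problems.append(f"field {number}: only wildcard matching is allowed")
    return problems


def non_wildcard_cases(field_count: int, value: str = "probe") -> List[List[Optional[str]]]:
    """Build one test instance per field, each with a concrete criterion in that field only."""
    return [
        [value if i == target else WILDCARD for i in range(field_count)]
        for target in range(field_count)
    ]


if __name__ == "__main__":
    july = [WILDCARD] * 20 + ["example-pipe"]
    for problem in validate_instance(july, supplied_inputs=20, wildcard_only_fields=(21,)):
        print(problem)

    # Only the case with a concrete criterion in field 21 should be flagged.
    for case in non_wildcard_cases(21):
        if validate_instance(case, supplied_inputs=20):
            print("flagged test case with criterion in field", case.index("probe") + 1)
```

With wildcard-only test data, the validator and interpreter never see a criterion in field 21; generating one concrete case per field is a simple way to close that gap.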
The Content Configuration System now includes new test procedures and additional deployment layers to ensure comprehensive testing of every new Template Instance. The Falcon platform has also been enhanced to give customers more control over Rapid Response Content delivery.
Lastly, CrowdStrike has engaged two independent third-party security vendors for a thorough review of the Falcon sensor code and is conducting an independent review of the end-to-end quality process.