CrowdStrike hints at root cause of Friday's sweeping IT outage

While the company is still analyzing the error that caused many Windows systems to crash, it said a logic error in a channel file was the cause.

During a widespread IT outage that started Friday because of an issue with CrowdStrike's cybersecurity software Falcon Sensor, the company acknowledged a "content update" lay at the heart of the fiasco. On Saturday, the company offered a few additional details and promised more information to come.

The Saturday blog post discloses that a "logic error" in an updated file led to the blue screen of death (BSOD) on Windows systems, but details on how this logic error occurred remain unclear. The company is conducting a root cause analysis, the blog post reads. It has not provided a timeline for publishing this analysis.

The error prevented numerous Windows users from logging into their computers, including at Fifth Third Bank. At TD Bank, digital systems were disrupted. Synovus Financial had to implement "contingency plans" to minimize disruptions to clients. All branches and bank offices of Canandaigua National Bank, a $5 billion institution in Canandaigua, New York, were affected.

At a high level, what happened on Friday to these banks and many other companies, particularly airlines, is that CrowdStrike's buggy update crashed the computers onto which it was automatically installed. CrowdStrike then sent out an update with a fix, but many computers could not pick it up because they had already crashed.

Here are the technical details on how the crashes occurred, why many systems didn't automatically recover, what information remains unclear, and what it all means for customers and the global IT system:

What is Falcon Sensor?

Falcon Sensor is an endpoint security platform from CrowdStrike. In other words, it is software for a variety of platforms — Windows, Mac, Linux, and mobile devices — that detects and shuts down cybersecurity threats on the computer where it is installed.

Here's one example of how it works: If a bank issues an employee a Windows device with Falcon Sensor installed, and the employee attempts to install an unauthorized application, Falcon Sensor will notify the bank of this potential security hazard.

One of the many threats that Falcon Sensor attempts to detect and prevent is malicious named pipes.

What is a pipe?

A pipe is a section of memory in a computer that different pieces of software can use to communicate with each other. A pipe is so named because its function resembles real-life pipes; as items (or, in the case of computers, messages) stack up inside a pipe, the first item placed in one end is the first one to come out the other end.
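
The first-in, first-out behavior described above can be seen with an ordinary anonymous pipe. What follows is a minimal Python sketch using the standard library's os.pipe; the function name and the messages are made up for illustration and have nothing to do with Falcon Sensor itself.

```python
import os

def send_through_pipe(messages):
    """Write messages into one end of a pipe and read them from the other."""
    read_end, write_end = os.pipe()  # anonymous pipe: a pair of file descriptors
    for message in messages:
        os.write(write_end, (message + "\n").encode())
    os.close(write_end)  # closing the write end signals that no more data is coming

    # Messages come out in the same order they went in.
    with os.fdopen(read_end) as reader:
        return [line.rstrip("\n") for line in reader]

print(send_through_pipe(["first", "second", "third"]))
```

The sketch keeps both ends in one program for brevity; in practice the two ends usually belong to different pieces of software.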

While most pipes last only as long as the piece of software that created them, some remain even if that piece of software exits — whether the program finishes up its work or gets shut down. These are called named pipes; they have a name that software can use to find them. Named pipes have a large variety of uses, including enabling certain internet connections.
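
To make the idea of finding a pipe by its name concrete, here is a small sketch using a POSIX named pipe (a FIFO) on Linux or macOS. It is an assumption-laden illustration: Windows named pipes are created through a different API and live under a different namespace, and the pipe name "example_pipe" is hypothetical.

```python
import os
import tempfile
import threading

def round_trip_named_pipe(message):
    """Send one message through a named pipe that both sides find by its path."""
    directory = tempfile.mkdtemp()
    pipe_path = os.path.join(directory, "example_pipe")  # hypothetical name
    os.mkfifo(pipe_path)  # POSIX only: the pipe now exists under this name

    def writer():
        # Opening a FIFO for writing blocks until a reader opens it as well.
        with open(pipe_path, "w") as pipe:
            pipe.write(message)

    thread = threading.Thread(target=writer)
    thread.start()
    with open(pipe_path) as pipe:  # the reader locates the pipe by name
        received = pipe.read()
    thread.join()

    # On POSIX the name is still there after both sides have closed the pipe.
    still_exists = os.path.exists(pipe_path)
    os.remove(pipe_path)
    os.rmdir(directory)
    return received, still_exists

print(round_trip_named_pipe("hello"))
```

The writer runs in a separate thread only because opening a FIFO blocks until both ends are open; two separate programs would behave the same way.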

One malicious use of a pipe occurs when malware connects to the server of the cyberattacker that controls it. When the malware establishes this connection, the attacker can send instructions back to the malware to extract a file from the victim's computer, open up an application, or take a number of other actions to control the computer. If a pipe is used in the internet connection, all the instructions flow through the pipe.

This type of remote control over a computer is known as a command and control (C2) scheme. In command and control schemes, cyberattackers tend to reuse the same type of connection — or, in some instances, the same name for a named pipe — to take control of different victims' computers.

Even if the name of the pipe isn't obviously malicious ("Evil Pipe" or "Pipe for Stealing Data"), attackers might use the same, ordinary-seeming name in multiple attacks. Or, they might load the pipe with the same initial data in multiple attacks. CrowdStrike and other cybersecurity companies use these so-called "signals" to identify potential risks and either flag or automatically extinguish them.
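
The matching described above can be sketched in a few lines of Python. This is not CrowdStrike's detection logic, which the company has not published; the PipeSignal structure, the pipe names and the byte patterns below are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PipeSignal:
    """One hypothetical red flag: a reused pipe name or reused opening bytes."""
    name: str = ""              # pipe name seen in earlier attacks, if any
    initial_data: bytes = b""   # leading bytes seen in earlier attacks, if any

# Invented indicators; real ones would come from threat research.
KNOWN_SIGNALS = [
    PipeSignal(name="status_sync_01"),
    PipeSignal(initial_data=b"\x00HELLO-C2"),
]

def matches_known_signal(pipe_name, first_bytes, signals=KNOWN_SIGNALS):
    """Return True if the pipe's name or its first bytes match a known signal."""
    for signal in signals:
        if signal.name and signal.name == pipe_name:
            return True
        if signal.initial_data and first_bytes.startswith(signal.initial_data):
            return True
    return False

print(matches_known_signal("status_sync_01", b""))   # ordinary-looking but reused name
print(matches_known_signal("printer_spool", b"ok"))  # matches nothing on the list
```

Keeping the signals in a plain list, separate from the matching function, mirrors the idea that the list can be updated as new attacks appear without changing the program that checks it.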

As cyberattackers use new types of attacks, CrowdStrike has to update the Falcon Sensor applications installed on Windows computers all across the world, so that the Falcon Sensors know what to look out for. Falcon Sensor automatically installs these updates on a regular basis. This is where the fateful logic error comes in.

Where was the logic error?

On a Windows computer with Falcon Sensor installed, CrowdStrike uses so-called "Channel Files" to store information about the signals that malware might be installed or active on the computer. These Channel Files list the various red flags of malware, such as a new connection to a black-listed IP address, or a newly downloaded application that has been used in other cyberattacks.

One of the many Channel Files that CrowdStrike maintains, Channel File 291, lists the red flags that a named pipe might be malicious. At 12:09 a.m. on Friday, CrowdStrike pushed out an update to Channel File 291. The company pushes out such updates multiple times a day as it detects new threats; the updates help the Falcon Sensor application detect those threats on the computers where it is installed.

The problem was that, for some reason that CrowdStrike has not yet identified publicly, the updated Channel File 291 caused an error on the computers where it was downloaded.

About 78 minutes later, at 1:27 a.m., CrowdStrike pushed out another update to Channel File 291 that reverted the change that had caused the error. By then, however, Windows computers across the world had already downloaded the flawed file, causing them to crash.

Why didn't computers recover automatically after the update?

Even though CrowdStrike released a fix at 1:27 a.m., Windows computers that had already crashed because of the 12:09 a.m. update could not download it: as they began to boot up, they crashed again before the download could happen.

At least, that is what happened to many computers.

Some Windows computers were able to recover automatically, but often only after being rebooted manually, sometimes multiple times in a row. CrowdStrike has not explained why some Windows computers recovered automatically and others did not, but there is a potential hint.

CrowdStrike's official recommendation for fixing individual computers (rather than cloud computers) is to reboot the computer while it is connected to the network via a wired connection rather than Wi-Fi. This, the company said, allows the computer to acquire internet connectivity considerably faster.

In other words, the ability of some machines to recover automatically may be effectively random, but the odds of recovery may be improved by a faster internet connection, which allows the computer to download Channel File updates faster than Falcon Sensor can try to open the old versions.
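
The race described above can be illustrated with a toy simulation. Everything in it is an assumption made for the sake of the example: the timing figures are invented, the random delays are modeled with a simple exponential distribution, and nothing here comes from CrowdStrike's analysis.

```python
import random

def recovery_rate(mean_network_seconds, mean_crash_seconds,
                  download_seconds=1.0, trials=10_000, seed=0):
    """Estimate how often the fixed file arrives before the old one is opened."""
    rng = random.Random(seed)
    recovered = 0
    for _ in range(trials):
        # Time after boot until the network is usable (assumed random).
        network_ready = rng.expovariate(1 / mean_network_seconds)
        # Time after boot until the flawed file is opened and the crash occurs.
        crash_time = rng.expovariate(1 / mean_crash_seconds)
        if network_ready + download_seconds < crash_time:
            recovered += 1
    return recovered / trials

# Invented figures: a wired link is assumed to come up three times faster.
wired = recovery_rate(mean_network_seconds=5, mean_crash_seconds=20)
wifi = recovery_rate(mean_network_seconds=15, mean_crash_seconds=20)
print(f"wired: {wired:.0%}, Wi-Fi: {wifi:.0%}")
```

Under these assumptions a single reboot is never guaranteed to work on either kind of connection, which is consistent with reports that some machines needed several reboots, but the faster connection wins the race more often.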
