We have seen the public impact of the recent global IT meltdown – days of cancelled/delayed flights, 911 systems that were down, and other government and commercial organizations essentially put on hold. Unfortunately, this is very unlikely to be the last such incident we see.
As a brief recap, on Thursday night, July 18th, CrowdStrike, a publisher of cyber security software, released an update. This update caused many computers running Microsoft Windows operating systems to crash (the so-called Blue Screen of Death). 74% of government computers and 300 of the Fortune 500 run CrowdStrike. Even though CrowdStrike quickly released a fix, the damage was already done. To reverse the crash, just about every affected machine had to be individually booted into safe mode and accessed, and specific files had to be removed. Since most companies make these types of changes en masse using automation software, this individual attention was laborious and took a long time to perform. Hence the days of outages. Microsoft reported that fewer than 1% of Windows machines were affected, about 8.5 million, but it was enough to cause over $1 billion in losses.
This could happen because of a major shift in how IT handles updates. It used to be that when a new release came out, the IT department would roll it out first to a set of test machines, then to relatively unimportant production machines, and so on until they felt the software was safe for general deployment, at which time it would be pushed out to all the machines under management. But now, software often updates itself with little or no warning to IT management. For example, if you run Microsoft 365 products, that software by default typically updates itself without notification or approval; it just happens. The same is true for security software, where getting a security patch installed or plugging a vulnerability quickly is paramount to keeping cyber criminals out. In effect, responsibility for testing the safety of releases has shifted from the local IT department to the publishers. IT departments have come to trust that the publishers will adequately test their software. Nearly all of the time this works out well. But when it doesn’t, the effects can be far-reaching.
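For readers who manage their own updates, the older ring-by-ring approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ring names, machine names, and the `install` and `healthy` callbacks are hypothetical placeholders, not part of any real deployment tool.

```python
from typing import Callable, List, Tuple

# Hypothetical rings, ordered from least to most critical machines.
RINGS: List[Tuple[str, List[str]]] = [
    ("test machines", ["test-01", "test-02"]),
    ("low-priority production", ["kiosk-01", "print-01"]),
    ("general deployment", ["pos-01", "pos-02", "db-01"]),
]


def staged_rollout(
    rings: List[Tuple[str, List[str]]],
    install: Callable[[str], None],
    healthy: Callable[[str], bool],
) -> List[str]:
    """Install an update one ring at a time.

    After each ring, every machine in it is health-checked. If any machine
    is unhealthy, the rollout halts so later rings never get the update.
    Returns the machines that received the update.
    """
    updated: List[str] = []
    for ring_name, machines in rings:
        for machine in machines:
            install(machine)
            updated.append(machine)
        failed = [m for m in machines if not healthy(m)]
        if failed:
            print(f"Halting after ring '{ring_name}': unhealthy {failed}")
            break
    return updated


# Simulated bad update: one test machine fails its health check,
# so the production rings are never touched.
def fake_install(machine: str) -> None:
    pass  # placeholder for a real installation step


def fake_healthy(machine: str) -> bool:
    return machine != "test-02"


updated = staged_rollout(RINGS, fake_install, fake_healthy)
```

In this simulated run, only the two test machines receive the faulty update. The trade-off is speed: each ring adds delay before a security patch reaches every machine, which is part of why publishers moved to automatic updates.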
This has also happened before. In 2020, SolarWinds, which makes IT management software, released updates containing malware that had been inserted by hackers. When the software updates were pushed out, customers who installed them were exposed to a potential data breach. The consumer impact was not as large, but the fundamentals of how a bad software release can harm a publisher’s customers are largely the same.
Of course, this brings up the nightmare scenario where malware or other faulty software gets injected into a Microsoft OS release (Microsoft was just recently chastised by the US Government for lackluster security practices), and the 850 million Windows machines out there are taken out or compromised in a matter of a few days. But even if this happened to a company with a smaller installed base, such as Google, Adobe, McAfee, VMware, Bitdefender, Intuit, SentinelOne, or Zoom, the global impact would still be enormous.
Until a different framework is demanded of software publishers, it’s only a matter of time until we have another global IT meltdown. Most SMBs can’t do much to stop this from happening, but they can prepare by planning for these outages and working out how they can continue to operate for a few days without their IT systems. For many businesses, this level of preparation could mean the difference between going out of business and staying in business. If you haven’t done that yet, a common first step is preparing a business continuity plan or a business resilience plan, which looks at what you would do should you experience an IT outage, among other things.
At SpotLink, we have helped several companies prepare such plans, and we are here to assist you as well. While we can’t prevent the cloud from going down or poorly tested software updates from being pushed out to your systems, we can help you survive when either happens.
Robert Hood
CEO & Founder
SpotLink