Worldwide IT outage affecting Brown services and computers
Incident Report for Brown University
Postmortem

What services were affected? For how long?

From approximately 12:15 a.m. on Friday, July 19 through 12:30 p.m. on Saturday, July 20, many Brown IT services were unavailable or experiencing intermittent problems. In addition, from 12:15 a.m. on Friday, July 19 through 4 p.m. on Tuesday, July 23, a large number of Windows computers used by Brown community members were failing to start correctly. On both servers and PCs, the Windows operating system was continuously crashing and restarting.

Who was affected?

This outage affected all members of the Brown community who were trying to use any of the affected services during the outage, as well as anyone using a Windows PC that was affected by the problem.

What happened?

Soon after midnight on Friday, July 19, a misconfigured software update from the security provider CrowdStrike started causing technology outages around the world. Affected computers crashed and restarted continuously.

This affected most Brown IT services, running on more than 1,100 servers, as well as more than 1,000 Windows desktops and laptops used by students, faculty and staff across the Brown community.

The Office of Information Technology (OIT) first observed services failing shortly after 1 a.m. on Friday and immediately assembled to diagnose the problem and start recovering critical services. CrowdStrike issued a fix for the problem by 1:45 a.m. on Friday to prevent any further impact, but the damage had already been done on any Windows computer with CrowdStrike installed that was running between 12:15 a.m. and 1:30 a.m. Affected servers and PCs could not be fixed remotely or programmatically; instead, each one needed a manual fix applied by someone at the computer who knew its individual storage encryption key (these keys are used on all Brown-owned computers). Knowing this would take many hours to resolve, OIT moved quickly to alert the Brown community. Because Brown's usual bulk email services were affected, OIT sent email to the community before 7 a.m. using the Brown Alert system, and published an outage message on the OIT Statuspage service dashboard and in the phone greeting at the OIT Help Desk.

Over the course of the day on Friday, more than 150 people from many OIT teams worked urgently to restore services and to repair individual PCs by hand across our College Hill and Jewelry District campus areas. OIT also worked closely with groups of departmental IT partners to resolve as many cases as quickly as possible. By the end of the day on Friday, more than 500 people had working PCs again and most mission-critical services were running properly. OIT finished restoring all services by approximately noon on Saturday, July 20, and updated the Statuspage alert at that time. OIT and IT partners worked for several more days the following week to resolve as many affected PCs as possible.

CrowdStrike has been very transparent, responsive, and supportive throughout this outage, and has released an extended technical analysis of the entire event on its outage-specific information hub. In addition, the company immediately began work on a full audit and improvement of its code release processes. CrowdStrike continues to keep us informed in detail as a customer of its services.

What is OIT doing about it?

As of the publication date of this After-Action Report, almost 200 additional PCs across the Brown community are still expected to need attention. If you have a PC that is constantly restarting, please refer to the IT Help Article available from OIT, or contact your usual IT support professional or the OIT Help Desk so they can help you.

In addition, we have held analytical reviews of our service architecture and incident mitigation steps, our support work to resolve PCs in the field, and our multiple communication steps during this significant outage. Each of these reviews has led to multiple plans to improve our service resiliency and our readiness for future major incidents.

OIT would like to express our appreciation for the compassion, patience, and many words of encouragement and thanks we received from students, faculty and staff during this major outage. We are proud to support you, and we are truly grateful for your kindness and your trust.

Posted Aug 14, 2024 - 15:09 EDT

Resolved
This incident has been resolved.
Posted Jul 22, 2024 - 09:26 EDT
Identified
OIT has brought all production services online. Some dev/QA services remain to be addressed on Monday, but the critical recovery effort has concluded. Thank you for your patience.
Posted Jul 20, 2024 - 10:57 EDT
Update
OIT is pausing recovery efforts for the evening. Our business-critical services are largely operational, though some functionality may still be missing. Our team is taking a well-deserved break and will resume work on Saturday morning. Thank you for your patience.
Posted Jul 19, 2024 - 19:13 EDT
Update
Because this outage affected our scheduling system (AppWorx), the batch shift (jobs and process flows) scheduled for last night, Thursday, July 18, did not finish, and the batch shift (jobs and process flows) scheduled for today, Friday, July 19, will not be executed as planned.

The OIT team is actively working on resolving this issue, and we anticipate a resolution soon.
Posted Jul 19, 2024 - 13:11 EDT
Update
Anyone having trouble logging in to Brown's Workday service can use the Workday button published on https://www.brown.edu/staff
Posted Jul 19, 2024 - 10:35 EDT
Update
We are continuing to investigate this issue.
Posted Jul 19, 2024 - 08:44 EDT
Update
If you are experiencing repeated reboots or crashes on a Windows computer today, regardless of being Brown-owned or a personal computer, this IT help article will help you resolve the issue: https://ithelp.brown.edu/kb/articles/2108
Posted Jul 19, 2024 - 08:29 EDT
Update
Overnight, a software update from security provider CrowdStrike caused technology system outages at organizations across the world. A majority of Brown IT services are impacted, as well as many Windows desktops and laptops, including Brown-owned and any personal computers running CrowdStrike. The 'Brown' wifi network may also be intermittent. The Office of Information Technology has been working urgently to resolve these issues.

At this time, services such as Shibboleth authentication, VPN, file storage and many others may be intermittent or unavailable. If you have a Windows laptop or desktop computer that is rebooting or crashing repeatedly, please reach out to the OIT Help Desk (http://ithelp.brown.edu/) or your IT support team for help. OIT will publish updates at https://brownuniversity.statuspage.io as we make progress restoring services.

We recognize the significant impact that this will have on operations and appreciate your patience as we work to return all services to normal status.
Posted Jul 19, 2024 - 06:58 EDT
Update
There is a major outage of most OIT services and Windows-based desktop and laptop computers. This is related to a widespread problem in a recent update of the CrowdStrike security service, which is affecting computers around the world. Most services, including file storage, VPN, Shibboleth authentication, and others, are unavailable. OIT is working on resolving this with the highest urgency. We will provide updates in this alert as they are available.
Posted Jul 19, 2024 - 03:16 EDT
Investigating
We’re experiencing issues with multiple services. A subset of users is impacted and our team is actively working to resolve this. We will provide more details as soon as more information becomes available.
Posted Jul 19, 2024 - 01:51 EDT
This incident affected: Infrastructure (Hosted Virtual Machines (Hyper-V)).