From approximately 6:45 p.m. on Sunday, March 24, to 3:30 a.m. on Monday, March 25, most IT services were unavailable, including wired, wireless, and VPN networks, Shibboleth Single Sign-On, and many services hosted within Brown's datacenter.
All members of the Brown community working on campus, and anyone connecting to our VPN or most services hosted at Brown, were unable to connect successfully.
At approximately 6:45 p.m. on Sunday, March 24, OIT received several automated alerts and customer reports of lost connectivity and service failures. We immediately initiated a service incident response and began troubleshooting the problem. We issued a public service alert to the community via our Statuspage alert dashboard at 7:17 p.m. and placed an updated greeting about the outage on the IT Service Center phone line.
Because the problem affected connectivity and authentication, OIT staff were unable to troubleshoot remotely, so we immediately dispatched multiple team members to campus locations. While we could physically access campus spaces and systems, the underlying problem still presented obstacles to logging in to systems.
After some early analysis, we determined that the network infrastructure was not operating correctly, leading to failures in network routing, domain name resolution, and general connectivity. OIT contacted our network vendor to join our response. As we investigated the problem, we found workarounds to access critical systems.
At approximately 2:50 a.m. we determined that one of our core network routers in the Brown datacenter was not passing network traffic correctly. We restarted this router, and found that services were restored. We validated multiple services and networks, and resolved our service alert at 3:27 a.m.
OIT did not find any evidence that this problem was caused by malicious actors.
We will continue to analyze available logs and the outage timeline, and will work with our vendor to identify any possible root causes of this problem.
In addition, this outage demonstrated that we need to implement a means of accessing critical systems more quickly if a similar outage happens again. OIT will explore any viable technical approaches and will schedule implementation work in the immediate future.