Facebook and its app family (including Instagram and WhatsApp) seemingly disappeared from the internet for over five hours on Monday. The outage impacted more than 3.5 billion users worldwide who use Facebook and related platforms to connect with friends and family, to drive business through advertising, or to sway politics through outreach.
What Damage Did The Facebook Outage Cause?
With the disruption in service, panic ensued. The lapse caused a loss in revenue for some and a loss in human connection for others, “showcasing just how dependent the world has become on a company under intense scrutiny…” (Facebook is under scrutiny for its negative impact on younger users.)
Influential Twitter account @TheTweetOfGod surmised that “Instagram and Facebook are currently not working, as are democracy, society and a healthy sense of self.” Facebook itself was more matter-of-fact in its Twitter coverage of the incident: “We’re sorry….Thank you for bearing with us.”
In “our increasingly digitally mediated work economy”, what implications does this Facebook service interruption have for third-party risk management (TPRM)?
What Caused The Facebook Outage?
While it’s too soon to confirm, it’s widely believed the recent outage on Facebook was related to DNS configurations and/or BGP routes. So what does this mean? DNS stands for Domain Name System, which translates names such as facebook.com into IP addresses. BGP is the Border Gateway Protocol, which networks use to tell each other which routes lead to those addresses.
Think of it this way. When you want driving directions to your favorite restaurant, you may not know the address offhand, so you look it up (DNS). That’s OK, because the address is static and not likely to change. You then rely on your smart device to give you directions along the fastest route (BGP). The same is true for Internet traffic.
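To make the analogy concrete, here is a minimal sketch in Python of the two lookups: first the name is turned into an address, then a route to that address is chosen. It is a toy model only, not a description of Facebook’s network or of what actually happened. The name social.example, the addresses (taken from ranges reserved for documentation), the peer names, and the functions resolve, find_route, and reach are all hypothetical.

```python
import ipaddress

# Hypothetical "address book" (the DNS step): name -> IP address.
DNS_RECORDS = {"social.example": "203.0.113.10"}

# Hypothetical routing table (the BGP step): network prefix -> next hop.
ROUTES = {
    ipaddress.ip_network("203.0.113.0/24"): "peer-a",
    ipaddress.ip_network("198.51.100.0/24"): "peer-b",
}


def resolve(name, records):
    """Return the address for a name, or None if the name is unknown."""
    return records.get(name)


def find_route(address, routes):
    """Return the next hop for the most specific matching prefix, or None."""
    ip = ipaddress.ip_address(address)
    matches = [net for net in routes if ip in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[best]


def reach(name, records, routes):
    """Describe whether a name can be reached: resolve it, then route to it."""
    address = resolve(name, records)
    if address is None:
        return "name not found"
    next_hop = find_route(address, routes)
    if next_hop is None:
        return "no route to " + address
    return "via " + next_hop


# Normal day: the name resolves and a route exists.
print(reach("social.example", DNS_RECORDS, ROUTES))

# Simulated withdrawal: the address is still known, but no route leads to it.
withdrawn = ipaddress.ip_network("203.0.113.0/24")
routes_after = {net: hop for net, hop in ROUTES.items() if net != withdrawn}
print(reach("social.example", DNS_RECORDS, routes_after))
```

The second call shows why a routing problem can look like a disappearance: the restaurant’s address has not changed, but every road to it has been taken off the map.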
Human Element In Technology
How does this relate back to Facebook and the human element? Business computer “street addresses” rarely (if ever) change, especially on the global scale of Facebook. Millions of users asked their phone or computer to take them to Facebook, and the route was unknown, too busy, or inaccessible (it happens all the time in L.A., where traffic is brutal). DNS servers and BGP routers are closely guarded assets due to their criticality. Imagine closing down the Golden Gate Bridge or the Lincoln Tunnel during rush hour. Internet routers, switches, firewalls, and DNS servers don’t change configuration without human action. Whether it was intentional or accidental, internal or external, the fact remains that it was a major outage, and I’m certain Facebook is deep in the throes of a root cause analysis.
What Can TPRM Do To Avoid Outages?
Now more than ever, third-party risk management practices must confirm that vendors uphold basic IT security tenets such as change control, privileged access management, logging and reporting, and intrusion detection/prevention, along with all of the other layers of the security onion that envelop them. The trust but verify model works, but you have to do both. It would be fascinating to see Facebook’s SOC 2 Type II report.