What you don’t know about Facebook’s Outage

Date: 4 February 2022

Facebook had an outage in October 2021, and everyone heard about it, even those living under a rock. Chances are that, regardless of your location, you use Facebook Messenger or WhatsApp to communicate.

The impact? We all had a very productive six hours, because six hours is how long Facebook, Instagram and WhatsApp were down.

Facebook attributed the outage to faulty configuration changes to the company’s routers. This raises the question: if something like this could happen to Facebook, how safe is an average business that doesn't have Facebook’s technological or financial resources?

Amar Singh, CEO and Co-Founder of Cyber Management Alliance, and Dawid Kowalski, Senior Technical Director, EMEA, FireMon, recently put their heads together to answer this question and others around the Facebook outage.

In a webinar entitled, “What You Don’t Know About Facebook’s Outage?,” the two cybersecurity experts unpacked some interesting aspects of the incident that we can all learn from.

Key topics covered in the webinar:

What really happened in the outage?
Was it a ‘good’ or a ‘bad’ outage?
The real threat behind such outages.
How can such outages be prevented?

The crux of the discussion on the webinar was simple: Don’t underestimate the ability of weak cybersecurity foundations to completely wreck your business. It’s not always the advanced Nation State actors that will bring ruin to your reputation or bottom line. Sometimes, a simple human error or a faulty process can be just as damaging.

And damaging it was for Facebook. For organisations the size of Facebook, every second of downtime means millions of dollars lost. The recent Facebook outage revealed how dependent the platform is on the Facebook algorithm, which is designed to personalize user experiences but can lead to significant disruptions when issues arise.

This particular outage was almost unbelievable for most users. Even the ‘Login with Facebook’ service, which millions depend on for using other apps, was down. Businesses were not able to advertise on Facebook (or create Facebook ads) or Instagram, which means the company lost an estimated $100 million in ad revenue. Facebook shares fell by 5%, which means that $40 billion was wiped out in a matter of a few hours!

So what exactly was behind the Facebook outage?

As mentioned above, it was a faulty change management process.

Dawid explained in the webinar that every single organisation does, and should, have a change management process. These change processes, however, are sometimes just a tick-box exercise. Very often, they don’t include any simulation or analysis of what will happen if the change is made. That’s probably where Facebook failed too.

The company said in its announcement that the outage “impacted many of the internal tools and systems”.

Several organisations today, like Facebook, try to build internal tools, but since they don’t specialise in building such tools, there are invariably some flaws and some areas that remain uncovered. In such a case, the IT staff tries to move fast. They know something is broken, so they try to resurrect it, and that’s when the problem starts: when processes aren’t followed properly.

This is exactly what happened in the case of Facebook - a simple maintenance/configuration change caused Facebook to make news across the globe.

During the webinar, Dawid gave a brief overview of how he believes this happened. Facebook did plan for the change - they had some scripts for the risk assessment of the change, but the scripts didn’t spot the problem. Many people speculate that they didn’t actually have scripts for that specific type of change because it was expected to have minimal impact - removing some of the network connectivity within the environment.

For the detailed explanation, tune into the webinar at the 23:00 mark.
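To make the idea of simulating a change before applying it more concrete, here is a minimal sketch in Python. It is not Facebook's tooling or anything shown in the webinar. The topology, the node names and the `simulate_link_removal` helper are all hypothetical, and it assumes the network can be modelled as a simple undirected graph. It checks whether removing a set of links would cut a source off from nodes you have marked as critical.

```python
from collections import deque


def reachable(links, start):
    """Return the set of nodes reachable from `start` over undirected links."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def simulate_link_removal(links, to_remove, source, critical):
    """List the critical nodes that `source` could no longer reach
    if the links in `to_remove` were taken out of service."""
    removed = {frozenset(link) for link in to_remove}
    remaining = [link for link in links if frozenset(link) not in removed]
    still_reachable = reachable(remaining, source)
    return sorted(node for node in critical if node not in still_reachable)


# Hypothetical topology, for illustration only.
LINKS = [
    ("edge", "backbone-1"),
    ("edge", "backbone-2"),
    ("backbone-1", "datacentre"),
    ("backbone-2", "datacentre"),
    ("datacentre", "dns"),
]
CRITICAL = {"datacentre", "dns"}

safe_change = [("edge", "backbone-1")]
risky_change = [("edge", "backbone-1"), ("edge", "backbone-2")]

print(simulate_link_removal(LINKS, safe_change, "edge", CRITICAL))   # []
print(simulate_link_removal(LINKS, risky_change, "edge", CRITICAL))  # ['datacentre', 'dns']
```

An empty list means the simulated change leaves every critical node reachable. Anything else is a reason to stop and review before the change goes live. Real environments would need a far richer model (routing policy, firewall rules, redundancy), which is where dedicated tooling comes in.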

What do we need to know about effective Change Management?

Configuration management/change management can appear to be a boring topic but it’s actually very important. If not managed and tested properly, it can be a big problem as was seen in the outage in question.

Today’s networks and environments all started out as much smaller networks, with one switch that later became multiple switches and one firewall that evolved into multiple firewalls.

Any planned change, therefore, is impossible for a human being to analyse. The environment is too large for a human to assess the change and correlate it with the security requirements.

As the two experts highlighted in the webinar, the human element needs to be eliminated when trying to analyse a change. The human knows that a change needs to be made and that’s where their role should end. You then need automation that can handle the complexity of IT and cyber. If you don’t have automation, humans are going to fail.

The 4 key terms for effective change management then are complexity, visibility, the human and automation. These 4 terms clearly explain the problem and the solutions to the problem in a Facebook-like outage situation.

How to Protect your Organisation from Network Outages?

Misconfiguration is usually what causes network outages. Basically, traditional approaches to managing network security policy inhibit your company’s ability to innovate and adapt to change.

Here are the five steps the experts offered on the webinar for protecting your organisation from such a network outage:

Conduct an assessment of your security policies & clean up
Streamline and accelerate security with automation
Gain visibility of your network across cloud and on-prem environments
Visualise the impact of security policy changes before you apply them
Integrate your security tools to maximise their performance

To conclude, Amar reiterated that integration of the technology stack is really critical. If you’re not able to automate, integrate and visualise, you are at the mercy of luck when it comes to your network security.

Visibility and scalability are key - the network security policy platform that you choose should be flexible and capable of securing your networks as they get larger and more complex, while maintaining desired workflows.

Is this type of an outage perceived as a ‘good one’?

While no outage can be labelled as ‘good’, this one is not considered bad in technical terms because it wasn’t caused by malicious outsiders.

A bad outage would typically be a ransomware attack or any attack that leads to a data breach or loss of customer data.

The Facebook outage didn’t lead to data compromise and it was a good lesson for everyone even remotely invested in cybersecurity, making experts label it as a ‘good’ outage.

Watch the Webinar here.

Read more about this on the FireMon blog on the Facebook Outage.

To access similar high-quality and educational content, subscribe to the Cyber Management Alliance BrightTALK channel.
