Dubai: Facebook, the world’s largest social media platform, along with its instant messaging service WhatsApp and photo and video sharing app Instagram, was down for almost 6 hours Monday. This was the longest period any major social platform was inaccessible in recent times. It left billions of people disconnected from news sources, events and friends.
The outage, the largest ever tracked by web monitoring group Downdetector, prevented the company’s 3.5 billion users from accessing its social media and messaging services such as WhatsApp, Instagram and Messenger.
Confused subscribers flocked to alternative platforms like Twitter, TikTok and Telegram. Facebook shares fell 4.9% and the company lost an estimated $6 billion in revenue in the process.
“To every small and large business, family, and individual who depends on us, I’m sorry,” Facebook Chief Technology Officer Mike Schroepfer tweeted, adding that it “may take some time to get to 100%.”
There was speculation about a cyberattack, and even rumours that the Facebook domain had been put up for sale. The outage affected not only the site and apps but also Facebook’s internal networks and official email.
In a statement, Facebook blamed the outage on a ‘faulty configuration change’. However, the company’s blog post did not say why the configuration was changed.
“This disruption to network traffic had a cascading effect on the way our data centres communicate, bringing our services to a halt,” the company’s engineering team wrote in a blog post on Monday night.
Many Facebook employees who declined to be named told Reuters that the outage was caused by an internal mistake in how internet traffic is routed to its systems.
But what actually caused the outage?
What Facebook says
Santosh Janardhan, Facebook’s vice president of infrastructure, wrote in a blog post that “configuration changes on the backbone routers that coordinate network traffic between our data centres caused issues that interrupted this communication”.
What does it mean?
Cyber experts think the problem boils down to something called BGP, or Border Gateway Protocol. It is the system that networks on the Internet use to tell one another which routes are available and to select the best one for moving packets of information around.
Sami Slim of data centre company Telehouse compared BGP to ‘the internet equivalent of air traffic control’, AFP noted.
In the same way that air traffic controllers sometimes make changes to flight schedules, ‘Facebook did an update of these routes,’ Slim said.
However, this routing update contained a crucial error.
It’s not yet clear how or why, but the outage was triggered when Facebook’s routers essentially sent a message to the internet announcing that the company’s servers no longer existed.
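To see why a withdrawn route makes a service seem to vanish, consider the toy sketch below. It is not Facebook’s configuration or a real BGP implementation; the class name ToyRoutingTable, the peer names, the address block 192.0.2.0/24 and the network numbers 64500 to 64510 are all made up for illustration (they come from ranges reserved for documentation). The sketch also assumes the simplest possible rule for picking a route, namely the path that crosses the fewest networks, whereas real routers weigh several other factors.

```python
from ipaddress import ip_address, ip_network


class ToyRoutingTable:
    """A hypothetical, heavily simplified view of one router's table."""

    def __init__(self):
        # Maps an address block to the paths each neighbour has announced.
        self.routes = {}

    def announce(self, prefix, peer, as_path):
        """A neighbour says: 'you can reach this block through me'."""
        self.routes.setdefault(ip_network(prefix), {})[peer] = list(as_path)

    def withdraw(self, prefix, peer):
        """A neighbour says: 'I can no longer reach this block'."""
        net = ip_network(prefix)
        paths = self.routes.get(net, {})
        paths.pop(peer, None)
        if not paths:
            # Nobody offers a path any more, so the block disappears.
            self.routes.pop(net, None)

    def best_path(self, address):
        """Return the shortest known path to an address, or None."""
        addr = ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        # Prefer the most specific block, then the fewest networks crossed.
        most_specific = max(matches, key=lambda net: net.prefixlen)
        return min(self.routes[most_specific].values(), key=len)


table = ToyRoutingTable()
table.announce("192.0.2.0/24", "peer_a", [64500, 64510])
table.announce("192.0.2.0/24", "peer_b", [64501, 64502, 64510])
before = table.best_path("192.0.2.53")

# The faulty update: every announcement for the block is withdrawn.
table.withdraw("192.0.2.0/24", "peer_a")
table.withdraw("192.0.2.0/24", "peer_b")
after = table.best_path("192.0.2.53")

print(before)
print(after)
```

Before the withdrawals the router knows two ways to reach the address and picks the shorter one. Afterwards it knows none, so as far as the rest of the internet can tell, the servers behind that block are simply gone, even though the machines themselves are still running.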
Why did it take so long to fix it?
Facebook’s technical infrastructure is unusually reliant on its own systems, and experts noted that this proved disastrous.
Once the fateful routing update was sent, Facebook engineers got locked out of the system that would allow them to communicate that the update had, in fact, been an error. So they couldn’t fix the problem.
The knock-on effects of the shutdown were so extensive that some Facebook employees were unable to even enter their buildings because their security badges no longer worked, further slowing the response.