This isn't the first outage caused by Border Gateway Protocol (BGP); it probably won't be the last.
Facebook, Instagram, and WhatsApp back online after BGP fix
F*c*book had not responded to inquiries from reporters yesterday, but then they spent most of the afternoon trying to pick up the pieces. Even the F*c*book email servers were down.
Today, at approximately 11:50 AM EST, all three websites [Facebook, Instagram, and WhatsApp] were suddenly unreachable, with browsers displaying DNS errors when attempting to open them.
The funniest/saddest part of the day was when they announced that they...
- needed to get physical access to the data centers to correct the configuration errors, and
- couldn't get into the buildings because the access system relied on the FB internet servers, which were broken.
You can file this under "no one does proper systems design anymore." Building access was required to fix the internet problems, but the internet was required for building access. I'm sure that they are not alone, and I doubt that they will change much about their access system. It would be expensive, for one thing, and they would have to admit that it was wrong to begin with. "Modes of Failure" is a term people have forgotten about.
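The loop described above (the fix needs building access, building access needs the network, the network needs the fix) is the kind of thing a simple dependency check can flag before it bites. Here is a minimal sketch, assuming the dependencies are written down as a plain dictionary. The node names (`config_fix`, `building_access`, `company_network`) and the function `find_cycle` are hypothetical labels for illustration, not anything taken from the company's actual systems.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of nodes, or None if none exists.

    deps maps each item to the things it needs in order to work.
    Uses a depth-first search that tracks the nodes on the current path.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        state[node] = IN_PROGRESS
        path.append(node)
        for dep in deps.get(node, []):
            dep_state = state.get(dep, UNSEEN)
            if dep_state == IN_PROGRESS:
                # dep is already on the current path, so we have looped back.
                return path[path.index(dep):] + [dep]
            if dep_state == UNSEEN:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for node in deps:
        if state.get(node, UNSEEN) == UNSEEN:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical model of the situation described in the post.
outage_deps = {
    "config_fix": ["building_access"],       # fixing the config needs people on site
    "building_access": ["company_network"],  # the door system needs the network
    "company_network": ["config_fix"],       # the network needs the config fixed
}

print(find_cycle(outage_deps))
```

Any cycle it reports is a candidate for an out-of-band fallback, such as a door that can still be opened with a physical key.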
The outage lasted until 5PM Eastern, when the BGP routes were available again, and the servers reappeared on the internet.
It is unclear what caused today's outage, but it was likely due to a configuration error, like many other BGP-related outages in the past.
It will be interesting to see how this is explained.
And when F*c*book made this error, they shot themselves in the foot. There are other places a similar configuration error could shoot other people in the foot.
I had a friend who put the combo to his safe in his safety deposit box -- and then put the key to the safety deposit box in the safe. All was fine until he passed.
In reality the Internet is robust, as proven by what just happened. Portions of it can fail but the rest will continue to function. Taking the web down completely would be a monumental task.