How BGP Actually Works: A No-Nonsense Guide for Application Developers

Written by

Prashant Sharma

@eshuplus

Wednesday, April 8, 2026
Let's be honest. Most developers hear "BGP" and immediately picture grey-haired network engineers in a dimly lit NOC, surrounded by Cisco routers and half-empty coffee mugs. And honestly? That image isn't totally wrong. BGP has historically been the domain of network operators, not people writing REST APIs or debugging Kubernetes pods.

But here's the thing: BGP is the reason your app is reachable at all. Every HTTP request your users make, every DNS lookup, every TLS handshake travels a path that BGP chose. And when BGP breaks, everything breaks. Facebook's 6-hour outage in 2021? BGP. Pakistan Telecom accidentally hijacking YouTube in 2008? BGP. Cloudflare going dark in 2022? BGP.

You don't need to become a network engineer. But you absolutely should understand how this thing works. Let's get into it.

01 THE BIG PICTURE - The Internet Is Not One Network. It's 100,000+ Networks Talking to Each Other

Here's a mental model that will make everything click: think of the internet as a federation of countries. Each country (called an Autonomous System, or AS) manages its own territory: its own IP addresses, its own internal routing, its own rules. Your ISP is a country. AWS is a country. Google is a country. Cloudflare is a country.

Now, how do these countries negotiate traffic? They can't just yell across the border. They need a common diplomatic language. That language is BGP, the Border Gateway Protocol.

Quick Definition

An Autonomous System (AS) is a collection of IP networks under a single organization's control, with a unified routing policy. Each AS gets a globally unique ASN (Autonomous System Number), like AS13335 for Cloudflare or AS15169 for Google.

BGP is what allows AS13335 (Cloudflare) to tell AS15169 (Google): "Hey, I know how to reach 104.16.0.0/12. Route packets destined for that block through me." Google accepts this, notes it down in its routing table, and starts sending matching traffic Cloudflare's way. Simple concept. Insanely complex implementation.
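
If "I know how to reach 104.16.0.0/12" feels abstract, here is a minimal sketch of what that claim covers, using Python's standard ipaddress module. The prefix is the illustrative one from the paragraph above (it isn't checked against live routing data), and covered_by is a hypothetical helper name.

```python
import ipaddress

# Illustrative prefix taken from the text above; not checked against live routing data.
ADVERTISED_PREFIX = ipaddress.ip_network("104.16.0.0/12")


def covered_by(prefix, address: str) -> bool:
    """Return True if `address` falls inside the advertised `prefix`."""
    return ipaddress.ip_address(address) in prefix


print(ADVERTISED_PREFIX.num_addresses)                   # 1048576 addresses in a /12
print(covered_by(ADVERTISED_PREFIX, "104.16.132.229"))   # True
print(covered_by(ADVERTISED_PREFIX, "8.8.8.8"))          # False
```

A router that accepts the announcement forwards any packet whose destination passes this kind of check toward the announcing neighbor.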

02 THE MECHANICS - How BGP Peers Actually Talk to Each Other

BGP is a path-vector protocol. Unlike distance-vector protocols (which share hop counts) or link-state protocols (which map the whole topology), BGP says: "Here is the exact sequence of ASes you'd travel through to reach this prefix." That sequence is called the AS Path.
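
To make "path-vector" concrete, here is a toy sketch of the two things the AS Path is used for: each AS prepends its own number before passing a route on, and a router refuses any route that already contains its own ASN (that's loop prevention). The function names are made up for illustration; real routers do this inside the UPDATE handling, not with Python lists.

```python
def advertise(as_path: list[int], local_asn: int) -> list[int]:
    """Prepend our own ASN before passing a route to an external peer."""
    return [local_asn] + as_path


def accept(as_path: list[int], local_asn: int) -> bool:
    """Reject any route whose AS path already contains our ASN (a loop)."""
    return local_asn not in as_path


path = advertise([], 15169)      # AS15169 originates the prefix
path = advertise(path, 3356)     # AS3356 passes it on
path = advertise(path, 1299)     # AS1299 passes it on
print(path)                      # [1299, 3356, 15169]
print(accept(path, 64512))       # True: AS64512 is not in the path yet
print(accept(path, 3356))        # False: AS3356 would be taking its own route back
```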

Two BGP routers that agree to exchange routing info are called peers or neighbors. Before they can share routes, they establish a TCP session (yes, plain TCP, on port 179). Once connected, they exchange four core message types:

  • OPEN: "Hello, I'm AS13335. Here are my capabilities." Kicks off the session.
  • UPDATE: "I'm advertising these prefixes" or "withdraw that route I told you about." The bread and butter of BGP.
  • KEEPALIVE: "Still alive. Don't drop me." Sent every 60 seconds by default on many implementations; the interval is configurable.
  • NOTIFICATION: "Something went wrong. I'm closing this session." Then it slams the TCP connection shut.
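
On the wire, every one of these messages starts with the same 19-byte header defined in RFC 4271: a 16-byte marker of all ones, a 2-byte total length, and a 1-byte type code. Here is a small sketch that builds and parses that header with Python's struct module. The helper names are hypothetical, and it only handles the header, not the message bodies.

```python
import struct

MARKER = b"\xff" * 16                # 16-byte marker, all ones
HEADER = struct.Struct("!16sHB")     # marker, total length, message type (19 bytes)
TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}


def build_keepalive() -> bytes:
    """A KEEPALIVE is just the header: length 19, type 4, no body."""
    return HEADER.pack(MARKER, HEADER.size, 4)


def parse_header(data: bytes) -> tuple[str, int]:
    """Return (message type name, total length) for a raw BGP message."""
    marker, length, msg_type = HEADER.unpack_from(data)
    if marker != MARKER:
        raise ValueError("bad marker")
    if not 19 <= length <= 4096:
        raise ValueError("length outside the 19..4096 range")
    return TYPES.get(msg_type, "UNKNOWN"), length


print(parse_header(build_keepalive()))   # ('KEEPALIVE', 19)
```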

There are two flavors of BGP peering. eBGP (external BGP) runs between different autonomous systems; this is the internet-facing stuff, the "diplomatic" layer. iBGP (internal BGP) runs within a single AS, allowing routers inside the same organization to share routes they learned from the outside world.

03 THE ROUTING TABLE - What a BGP Route Actually Looks Like

When a BGP router receives an UPDATE, it stores the route in a structure called the RIB (Routing Information Base). Each entry looks something like this in real life (simplified for humans):

Prefix        AS Path            Next Hop        Local Pref   MED   Best?
8.8.8.0/24    15169              203.0.113.1     100          0     Best
8.8.8.0/24    3356 15169         198.51.100.5    100          50    Backup
8.8.8.0/24    1299 3356 15169    192.0.2.9       80           0     Unused

When multiple routes exist for the same prefix (which is super common), BGP runs its famous best path selection algorithm: a strict decision process that evaluates attributes in order until it picks a winner. The full list has about 13 steps on common implementations, but the most important ones in practice are:

// BGP Best Path Selection (simplified order)
1. Highest Local Preference wins // set by you, stays within your AS
2. Shortest AS Path wins // fewer hops = preferred
3. Lowest Origin code wins // IGP < EGP < incomplete
4. Lowest MED wins // hint from neighbor: "use this exit"
5. eBGP over iBGP // prefer external peers
6. Lowest IGP metric to next-hop // internal cost to reach the exit
7. Lowest Router ID breaks tie // basically a coin flip at this point

This matters to you as a developer because understanding these attributes is how large companies control where their traffic enters and exits the internet. It's not magic; it's just knobs.
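
Here is a rough sketch of that simplified seven-step order as a Python sort key, fed with the three example routes from the table above. It is not how any real router is implemented: the Route class and its field names are assumptions, and it skips vendor-specific steps and the rule that MED is normally only compared between routes from the same neighboring AS.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple[int, ...]
    next_hop: str
    local_pref: int = 100
    origin: int = 0            # 0 = IGP, 1 = EGP, 2 = incomplete
    med: int = 0
    is_ebgp: bool = True
    igp_metric: int = 0
    router_id: str = "0.0.0.0"


def preference_key(r: Route) -> tuple:
    """Smaller tuple = more preferred, following the simplified order above."""
    return (
        -r.local_pref,                           # 1. highest local preference
        len(r.as_path),                          # 2. shortest AS path
        r.origin,                                # 3. lowest origin code
        r.med,                                   # 4. lowest MED
        not r.is_ebgp,                           # 5. eBGP before iBGP
        r.igp_metric,                            # 6. lowest IGP metric to next hop
        int(ipaddress.ip_address(r.router_id)),  # 7. lowest router ID
    )


def best_path(routes: list[Route]) -> Route:
    return min(routes, key=preference_key)


routes = [
    Route("8.8.8.0/24", (1299, 3356, 15169), "192.0.2.9", local_pref=80),
    Route("8.8.8.0/24", (3356, 15169), "198.51.100.5", med=50),
    Route("8.8.8.0/24", (15169,), "203.0.113.1"),
]
print(best_path(routes).next_hop)   # 203.0.113.1
```

Raising local_pref on the third-best route to 200 would make it win outright, which is exactly the kind of knob operators turn.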

04 THE SCARY PART - BGP Has No Built-In Authentication, and That's Terrifying

Here's what should keep you up at night: by default, BGP trusts whatever its neighbors tell it. If a router announces "I own 1.1.1.0/24 (Cloudflare's IP block)," other routers might just... believe it. This is called a BGP hijack, and it's caused some of the most spectacular internet outages and security incidents in history.

Real Incident: April 2010

China Telecom (AS23724) advertised tens of thousands of IP prefixes it didn't own (estimates range from roughly 37,000 to 50,000), affecting traffic destined for the US Senate, the US Army, and major banks. Some of that traffic was silently rerouted through China for about 18 minutes. Nobody could definitively prove intent. BGP didn't care either way.

The fix that's slowly rolling out is called RPKI (Resource Public Key Infrastructure). Think of it as a certificate authority system for BGP routes: each IP prefix owner can cryptographically sign a record saying "only AS13335 is allowed to originate 104.16.0.0/12." Routers that validate RPKI records will reject any announcement that doesn't match. As of 2025, adoption is solid among major networks but far from universal.

# What an RPKI Route Origin Authorization (ROA) looks like:
Prefix: 104.16.0.0/12
Max Length: 24
Origin ASN: AS13335
Valid Until: 2026-06-30
Signature: [cryptographic proof]
 
# Any announcement of 104.16.0.0/12 from a different ASN
# will be marked INVALID and dropped by RPKI-validating routers
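
The matching logic behind "valid" and "invalid" is simple enough to sketch. The version below follows the three outcomes described in RFC 6811 (valid, invalid, not-found), using the example ROA above as data. It is a toy under stated assumptions: the ROA class and validate function are hypothetical names, and it does none of the cryptographic verification a real RPKI validator performs.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class ROA:
    prefix: str
    max_length: int
    origin_asn: int


def validate(announced_prefix: str, origin_asn: int, roas: list[ROA]) -> str:
    """Classify an announcement as 'valid', 'invalid', or 'not-found'."""
    announced = ipaddress.ip_network(announced_prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # Skip ROAs that do not cover the announced prefix at all.
        if announced.version != roa_net.version or not announced.subnet_of(roa_net):
            continue
        covered = True
        if roa.origin_asn == origin_asn and announced.prefixlen <= roa.max_length:
            return "valid"
    # Covered by some ROA but never matched: invalid. No covering ROA: not-found.
    return "invalid" if covered else "not-found"


roas = [ROA("104.16.0.0/12", 24, 13335)]
print(validate("104.16.0.0/12", 13335, roas))   # valid
print(validate("104.16.0.0/12", 64496, roas))   # invalid (wrong origin ASN)
print(validate("104.16.1.0/25", 13335, roas))   # invalid (longer than max length)
print(validate("8.8.8.0/24", 15169, roas))      # not-found (no covering ROA)
```

Note the "not-found" case: prefixes with no ROA at all are not rejected, which is why partial adoption still leaves gaps.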

05 WHY DEVELOPERS SHOULD CARE - BGP and Your Application: More Connected Than You Think

Okay, so BGP is a big deal at the infrastructure level. But you're writing Go microservices or React apps. Why does any of this matter to you? Here's where it gets practical:

Anycast routing, the magic behind why 1.1.1.1 is fast everywhere, is entirely BGP-powered. Cloudflare announces the same IP prefix from data centers in 300+ cities simultaneously. BGP then steers you to whichever site wins best path selection from your network's point of view, which is usually (but not always) the geographically nearest one. If you're building a globally distributed service and want the same behavior, you're going to need to understand BGP or work with a provider that does.

Latency is political, not just physical. Traffic between two cities in the same country might route through a completely different continent if the business peering agreements work out that way. You've probably hit this when running traceroute and wondering why a hop to a server 50 miles away goes through London. That's BGP deciding based on economics, not geography.

# Ever run this and gotten confused results?
$ traceroute 8.8.8.8
 
1 192.168.1.1 1.2ms (your router)
2 10.50.4.1 5.4ms (ISP edge)
3 72.14.215.165 18.2ms (Google peering point - could be far away!)
4 8.8.8.8 19.1ms (destination)
 
# That hop 3 is where BGP made a decision.
# It might have gone through Chicago to reach a server in New York.
# BGP picked that path because the AS Path was shorter or preferred.

Multi-CDN and failover strategies: if you're running traffic through multiple CDN providers for redundancy, you're implicitly depending on BGP to steer traffic correctly when one goes down. Understanding prefix specificity (routers forward on the longest matching prefix, so a /24 beats a covering /20 for the addresses they share) helps you design smarter failover systems.
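
The "/24 beats a /20" rule is longest-prefix matching, and it fits in a few lines. This is a minimal sketch with a made-up forwarding table (private 10.x addresses and hypothetical "cdn-a" / "cdn-b" next hops); real routers use tries or hardware lookups rather than a linear scan.

```python
import ipaddress


def longest_prefix_match(table: dict[str, str], destination: str):
    """Return (prefix, next_hop) for the most specific matching route, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in table.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    network, next_hop = max(matches, key=lambda m: m[0].prefixlen)
    return str(network), next_hop


# Hypothetical table: a broad /20 via one provider, a more specific /24 via another.
table = {"10.20.0.0/20": "cdn-a", "10.20.5.0/24": "cdn-b"}

print(longest_prefix_match(table, "10.20.5.7"))   # ('10.20.5.0/24', 'cdn-b')
print(longest_prefix_match(table, "10.20.9.1"))   # ('10.20.0.0/20', 'cdn-a')
print(longest_prefix_match(table, "10.99.0.1"))   # None
```

If the /24 disappears from the table, traffic for 10.20.5.7 falls back to the /20, which is the basic mechanism behind many failover designs.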

Developer Pro Tip

Tools like BGPlay (via RIPE NCC) let you visually replay historical BGP events for any prefix. If you're ever investigating a past incident where traffic mysteriously disappeared or rerouted, this is your forensics tool.

06 THE FACEBOOK POSTMORTEM EXPLAINED - That Famous 2021 Outage, Decoded

In October 2021, Facebook, Instagram, and WhatsApp went dark for over six hours. The trigger was a configuration change during routine maintenance that disconnected Facebook's backbone network; as a result, the BGP announcements for the prefixes serving its DNS were withdrawn from the global routing table. From the internet's perspective, Facebook's names stopped resolving and the company just vanished.

Here's what made it catastrophic: Facebook's internal tooling depended on the same network and DNS that had just gone away. When the BGP routes disappeared, Facebook's own systems couldn't talk to each other. The engineers who needed to fix it remotely couldn't log in because their access paths relied on the broken infrastructure. They had to be sent on-site to the data centers, where physical security slowed things down further.

# What a BGP withdrawal looks like in a route announcement:
 
# Normal day - Facebook announces its prefixes:
ANNOUNCE 157.240.0.0/17 AS-PATH: 32934
 
# After the bad config change - withdrawal sent:
WITHDRAW 157.240.0.0/17
 
# Every router on earth that received this
# immediately deleted the route from its table.
# 3.5 billion users. Unreachable. Instantly.
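
Here is a toy sketch of what a router does with those two messages. The Rib class is a hypothetical simplification (one stored path per peer, shortest AS path wins), and AS64500 is a documentation ASN standing in for a transit provider. It shows why a single withdrawal normally just triggers a fallback, and why withdrawing from every peer leaves nothing at all.

```python
class Rib:
    """Tiny routing table: prefix -> {peer name: AS path}."""

    def __init__(self) -> None:
        self._paths: dict[str, dict[str, tuple[int, ...]]] = {}

    def announce(self, peer: str, prefix: str, as_path: list[int]) -> None:
        self._paths.setdefault(prefix, {})[peer] = tuple(as_path)

    def withdraw(self, peer: str, prefix: str) -> None:
        peers = self._paths.get(prefix, {})
        peers.pop(peer, None)
        if not peers:
            self._paths.pop(prefix, None)   # no paths left: the prefix is gone

    def best(self, prefix: str):
        """Return (peer, as_path) with the shortest AS path, or None."""
        peers = self._paths.get(prefix)
        if not peers:
            return None
        return min(peers.items(), key=lambda kv: (len(kv[1]), kv[0]))


rib = Rib()
rib.announce("peer-a", "157.240.0.0/17", [32934])
rib.announce("peer-b", "157.240.0.0/17", [64500, 32934])
print(rib.best("157.240.0.0/17"))   # ('peer-a', (32934,))

rib.withdraw("peer-a", "157.240.0.0/17")
print(rib.best("157.240.0.0/17"))   # ('peer-b', (64500, 32934))

rib.withdraw("peer-b", "157.240.0.0/17")
print(rib.best("157.240.0.0/17"))   # None: no route, packets get dropped
```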

The lesson isn't just "be careful with BGP changes." The deeper lesson is: when BGP breaks, you lose the ability to fix it remotely. It's like accidentally locking yourself out of a house and also dropping the spare key inside. Your runbooks need to account for the scenario where your own tools are unreachable.

07 THE GLOSSARY YOU ACTUALLY NEED - BGP Jargon, Translated for Normal Humans

Essential Terms

  • Prefix: A block of IP addresses in CIDR notation. 10.0.0.0/8 means "all 16 million IPs starting with 10." BGP routes are advertised per-prefix.
  • Route leak: When an AS accidentally redistributes routes it shouldn't. Like a middleman repeating a private conversation to everyone. Usually unintentional, always disruptive.
  • Transit: When AS-A pays AS-B to carry its traffic to the rest of the internet. AS-B is the transit provider. AS-A is the customer.
  • Peering: A mutual agreement between two networks to exchange traffic directly. It is usually settlement-free (no money changes hands), and both sides benefit from shorter paths.
  • IXP (Internet Exchange Point): A physical location (like AMS-IX in Amsterdam or Equinix in Ashburn) where hundreds of networks plug in to peer with each other cheaply and efficiently.
  • Convergence: The time it takes for routers across the internet to agree on new routing info after a change. BGP convergence can take anywhere from seconds to minutes, not milliseconds. That's why outages can linger.
  • Black hole: Traffic gets sent toward a prefix but never arrives. Classic symptom of a route leak or misconfiguration. Your packets just disappear silently.

08 WRAPPING UP - You Don't Need to Run BGP, but You Should Respect It

Unless you're working at a company large enough to have its own ASN (and many SaaS companies do, eventually), you're unlikely to configure BGP directly. But that's not the point. The point is that BGP is the substrate everything else runs on.

When you're debugging a latency spike and your cloud provider says "there was a BGP event," you now know what that means. When you're designing a multi-region architecture and thinking about failover, you understand why traffic doesn't instantly reroute. When you read a post-mortem about a routing misconfiguration, you can follow the logic instead of glazing over the technical bits.

The internet is not a cloud; it's 100,000+ negotiated agreements between autonomous systems, held together by an ancient protocol that runs on trust, TCP port 179, and the best practices of the engineers maintaining it. BGP is elegant in its simplicity and terrifying in its fragility. And every request your users make depends on it working correctly.

Next time your monitoring shows a spike in global latency? Before you blame your code, check BGP. It's probably BGP.