Network Fundamentals
This is the bedrock. Everything else in networking — HTTP, TLS, gRPC, load balancing — is just specialization built on these concepts. Get this right and debugging gets much, much easier.
OSI Model vs TCP/IP Model
The Models
The OSI model has 7 layers. The TCP/IP model (also called the Internet model) has 4. They both describe the same thing — how data travels from one machine to another — but OSI is a conceptual teaching framework while TCP/IP is what actually ships.
| Layer | OSI Name | TCP/IP Layer | Real Protocols |
|---|---|---|---|
| 7 | Application | Application | HTTP, DNS, SMTP, FTP, SSH |
| 6 | Presentation | Application | TLS/SSL, MIME, compression |
| 5 | Session | Application | TLS sessions, RPC, NetBIOS |
| 4 | Transport | Transport | TCP, UDP, SCTP |
| 3 | Network | Internet | IP (v4/v6), ICMP, OSPF, BGP |
| 2 | Data Link | Network Access | Ethernet, Wi-Fi (802.11), ARP |
| 1 | Physical | Network Access | Cables, fiber, radio signals |
In practice: when engineers say “L3” or “L4,” they mean OSI layers. When they say “transport layer,” they mean TCP/UDP. The two vocabularies coexist.
ELI5: OSI is like a university textbook definition of how a restaurant works — front-of-house, kitchen, suppliers, etc. TCP/IP is the actual restaurant: host takes order, kitchen makes food, done. The textbook model has more categories; the real restaurant has what actually works. Use the textbook model to understand concepts, work in the real restaurant every day.
Why This Matters for Debugging
Knowing which layer your problem lives at cuts debugging time dramatically:
- Can’t ping the host at all? → L1/L2 issue. Check physical cable, switch, ARP table.
- Ping works but TCP connection refused? → L3/L4. Check firewall rules, netstat, port binding.
- Connection established but HTTP 502? → L7. Backend process crashed, check app logs.
- Intermittent packet loss? → L1/L2. Check for duplex mismatch, NIC errors, switch port.
Why this matters: Engineers who don’t know their layers waste hours looking at HTTP logs when the problem is a misconfigured MTU. Layers give you a systematic place to start.
Packets, Frames, and Segments
Encapsulation: Headers All the Way Down
When your app sends "Hello" over HTTP, that string gets wrapped in headers at each layer before hitting the wire. This is encapsulation.
App data: [ "Hello" ]
L7 (HTTP): [ HTTP headers | "Hello" ]
L4 (TCP): [ TCP header | HTTP headers | "Hello" ]
L3 (IP): [ IP header | TCP header | HTTP headers | "Hello" ]
L2 (Eth): [ Eth header | IP header | TCP header | HTTP headers | "Hello" | Eth trailer ]
Each wrapper has a specific name:
| Layer | PDU Name | Key Header Fields |
|---|---|---|
| L2 | Frame | Source MAC, Dest MAC, EtherType |
| L3 | Packet | Source IP, Dest IP, TTL, Protocol |
| L4 (TCP) | Segment | Source Port, Dest Port, Seq#, Ack# |
| L4 (UDP) | Datagram | Source Port, Dest Port, Length |
ELI5: Think of sending a letter internationally. You write your letter (app data), put it in an envelope with a recipient name (TCP/HTTP), put that inside a mailer with a full address (IP), and then a courier wraps it in a shipping package with a barcode for the sorting machine (Ethernet frame). Each wrapper is only read by the right handler — the sorting machine only reads the barcode, not your letter.
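To make the wrapping concrete, here is a minimal sketch of encapsulation in Python. The headers are deliberately toy-sized (a real TCP header is at least 20 bytes, a real IPv4 header at least 20), and the function names, addresses, and MACs are illustrative assumptions, not a real stack.

```python
import struct

def tcp_segment(payload: bytes, src_port: int, dst_port: int) -> bytes:
    """L4: prepend a toy 4-byte header holding only the two ports."""
    return struct.pack("!HH", src_port, dst_port) + payload

def ip_packet(segment: bytes, src_ip: bytes, dst_ip: bytes) -> bytes:
    """L3: prepend a toy 8-byte header holding only source and destination IPs."""
    return src_ip + dst_ip + segment

def ethernet_frame(packet: bytes, src_mac: bytes, dst_mac: bytes) -> bytes:
    """L2: prepend dest MAC, source MAC, EtherType 0x0800 (IPv4); trailer omitted."""
    return dst_mac + src_mac + struct.pack("!H", 0x0800) + packet

app_data = b"Hello"
http_message = b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\n" + app_data
segment = tcp_segment(http_message, src_port=80, dst_port=54321)
packet = ip_packet(segment, bytes([203, 0, 113, 10]), bytes([192, 168, 1, 5]))
frame = ethernet_frame(packet, bytes.fromhex("020000000001"), bytes.fromhex("aabbccddeeff"))

# Each layer only adds bytes in front; the application data is untouched.
for name, pdu in [("HTTP", http_message), ("segment", segment),
                  ("packet", packet), ("frame", frame)]:
    print(f"{name:8} {len(pdu):3} bytes")
```

Notice that every layer treats whatever it receives as an opaque payload, which is why the original string is still sitting at the end of the frame.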
MTU and Fragmentation
MTU (Maximum Transmission Unit) is the largest single packet a link can carry. Ethernet default: 1500 bytes. Your IP header (20 bytes) + TCP header (20 bytes) = 40 bytes of overhead, leaving 1460 bytes for data per segment (the TCP MSS, Maximum Segment Size).
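The MSS arithmetic is simple enough to check in a few lines. This sketch assumes IPv4 and TCP headers with no options (20 bytes each); the helper names are made up for illustration.

```python
import math

IP_HEADER = 20   # bytes, IPv4 header without options
TCP_HEADER = 20  # bytes, TCP header without options

def mss(mtu: int) -> int:
    """Largest TCP payload that fits in one packet on a link with this MTU."""
    return mtu - IP_HEADER - TCP_HEADER

def segments_needed(payload_bytes: int, mtu: int = 1500) -> int:
    """How many full-or-partial segments it takes to carry the payload."""
    return math.ceil(payload_bytes / mss(mtu))

print(mss(1500))                         # standard Ethernet
print(mss(9000))                         # jumbo frames
print(segments_needed(1_000_000))        # 1 MB over standard Ethernet
print(segments_needed(1_000_000, 9000))  # 1 MB over jumbo frames
```

TCP options such as timestamps shrink the usable payload further, so treat 1460 as an upper bound on a 1500-byte link.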
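What happens when a packet is too big for the next link? With IPv4, the router fragments it (unless the Don't Fragment bit is set, in which case the router drops it and sends back an ICMP error), and the receiving host reassembles the pieces. IPv6 routers never fragment; only the sender can. Problems: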
- Fragmentation is expensive — CPU overhead on routers
- If any fragment is lost, the entire packet is retransmitted
- Some firewalls drop fragments
- Path MTU Discovery (PMTUD) uses ICMP “Fragmentation Needed” messages to find the lowest MTU on a path — if your firewall blocks ICMP, PMTUD breaks silently
Jumbo frames: MTU up to 9000 bytes, used inside data centers. Never cross the public internet. Saves CPU overhead on storage/backup traffic between servers.
Common mistake: Enabling jumbo frames on servers but not on the switch between them. A Layer 2 switch does not fragment; it simply drops frames larger than its configured MTU. Small packets still get through, so you get mysterious failures or performance degradation that only affect large transfers.
IP Addressing
IPv4: 32 Bits, Running Out Since 2011
IPv4 gives $2^{32}$ ≈ 4.3 billion addresses. Sounds like a lot until you realize the internet has 5 billion people, each with multiple devices. IANA handed out its last unallocated IPv4 blocks in February 2011, and the regional registries have been exhausting their own pools since. We survive through NAT (more on that below).
Format: four octets in decimal, separated by dots. 192.168.1.100 = 11000000.10101000.00000001.01100100 in binary.
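A quick way to convince yourself that a dotted-quad address is just a 32-bit number, using only the standard library (the `to_binary` helper is an illustrative name):

```python
import ipaddress

def to_binary(dotted: str) -> str:
    """Render each octet of an IPv4 address as 8 binary digits."""
    return ".".join(f"{int(octet):08b}" for octet in dotted.split("."))

addr = "192.168.1.100"
print(to_binary(addr))                   # the four octets in binary
print(int(ipaddress.IPv4Address(addr)))  # the same address as one 32-bit integer
```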
IPv6: 128 Bits, Plenty of Room
$2^{128}$ ≈ $3.4 \times 10^{38}$ addresses. Every grain of sand on Earth could have a trillion addresses. Format: eight groups of 4 hex digits, for example 2001:0db8:85a3:0000:0000:8a2e:0370:7334. Leading zeros in each group can be omitted, and one run of consecutive all-zero groups can collapse to :: (only once per address, otherwise the expansion would be ambiguous).
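Python's standard `ipaddress` module applies the shortening rules for you, which makes it a handy way to check a hand-written address. A small sketch using the example address from the text:

```python
import ipaddress

addr = ipaddress.IPv6Address("2001:0db8:85a3:0000:0000:8a2e:0370:7334")

short_form = addr.compressed  # leading zeros dropped, zero run collapsed to ::
long_form = addr.exploded     # all eight groups, four hex digits each
total_addresses = 2 ** 128

print(short_form)
print(long_form)
print(f"{total_addresses:.1e}")
```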
IPv6 adoption is slow because: existing IPv4 infrastructure works, NAT reduced the pressure, dual-stack deployments are complex, and ISPs/enterprises move slowly. But it’s growing — ~40% of Google traffic is IPv6 now.
Private vs Public IP Ranges
| Range | CIDR | Addresses | Use |
|---|---|---|---|
| 10.0.0.0 – 10.255.255.255 | 10.0.0.0/8 | 16.7 million | Corporate networks, cloud VPCs |
| 172.16.0.0 – 172.31.255.255 | 172.16.0.0/12 | 1 million | Docker default, some corp networks |
| 192.168.0.0 – 192.168.255.255 | 192.168.0.0/16 | 65,536 | Home routers, small offices |
Private IPs are not routable on the public internet — your router at home translates them.
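If you need to test membership in exactly the three ranges from the table, a minimal sketch looks like this. The names `RFC1918` and `is_rfc1918` are illustrative; the networks themselves come straight from the table above.

```python
import ipaddress

RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_rfc1918(addr: str) -> bool:
    """True if the address falls inside one of the three private ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918)

for candidate in ["10.1.2.3", "172.31.255.255", "172.32.0.1", "192.168.1.100", "8.8.8.8"]:
    print(candidate, is_rfc1918(candidate))

# Block sizes, matching the "Addresses" column.
print([net.num_addresses for net in RFC1918])
```

The built-in `is_private` attribute on address objects is broader than this list (it also covers loopback, link-local, and other reserved ranges), so an explicit list is safer when you specifically mean these three blocks.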
NAT: Network Address Translation
NAT is how your home router lets 50 devices share one public IP. The router maintains a translation table: when device 192.168.1.5:54321 sends a packet to 8.8.8.8:53, the router rewrites the source to 203.0.113.1:54321 (its public IP, keeping the port if it is free and picking another if a different device already uses it), records the mapping, and when the response comes back, rewrites the destination back to 192.168.1.5:54321.
ELI5: NAT is like a corporate mail room. Employees use internal extension numbers (private IPs). The mail room has one street address (public IP). When you send a letter out, the mail room puts their address on it. When a reply comes back to the mail room, they look up who originally sent it and deliver it internally. The outside world only ever sees the mail room’s address.
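The translation table can be sketched as two dictionaries. This is a simplified model, not how any particular router implements it: the `NatTable` class is hypothetical, it ignores the destination, protocol, and entry timeouts, and it assumes the router keeps the original source port when it is free.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (ip, port)

class NatTable:
    def __init__(self, public_ip: str) -> None:
        self.public_ip = public_ip
        self._outbound: Dict[Endpoint, int] = {}  # private endpoint -> public port
        self._inbound: Dict[int, Endpoint] = {}   # public port -> private endpoint

    def translate_outbound(self, src: Endpoint) -> Endpoint:
        """Rewrite a private source endpoint to the shared public IP."""
        if src not in self._outbound:
            port = src[1]
            # Keep the original port if free, otherwise probe upward.
            # (No handling of full exhaustion in this sketch.)
            while port in self._inbound:
                port = port + 1 if port < 65535 else 1024
            self._outbound[src] = port
            self._inbound[port] = src
        return (self.public_ip, self._outbound[src])

    def translate_inbound(self, public_port: int) -> Optional[Endpoint]:
        """Find which private endpoint a reply belongs to, or None if unsolicited."""
        return self._inbound.get(public_port)

nat = NatTable("203.0.113.1")
first = nat.translate_outbound(("192.168.1.5", 54321))
second = nat.translate_outbound(("192.168.1.6", 54321))  # same port, different device
print(first, second)
print(nat.translate_inbound(54321), nat.translate_inbound(9999))
```

The `None` result for port 9999 is the reason unsolicited inbound connections fail: there is no table entry telling the router where to deliver them.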
NAT implications for engineers:
- Servers behind NAT can’t receive unsolicited incoming connections without port forwarding
- P2P protocols (WebRTC, game servers) need NAT traversal techniques (STUN, TURN)
- NAT breaks IP-level security assumptions — the source IP in a packet isn’t the true sender
CIDR Notation
192.168.1.0/24 means: the first 24 bits are the network prefix, the remaining 8 bits are for hosts.
$$\text{Hosts} = 2^{(32 - \text{prefix})} - 2$$
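The -2 subtracts the network address (all host bits 0) and broadcast address (all host bits 1). The formula holds for prefixes up to /30. A /32 is a single host address, and a /31 is a special case for point-to-point links (RFC 3021) where both addresses are usable.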
| CIDR | Subnet Mask | Total IPs | Usable Hosts |
|---|---|---|---|
| /8 | 255.0.0.0 | 16,777,216 | 16,777,214 |
| /16 | 255.255.0.0 | 65,536 | 65,534 |
| /24 | 255.255.255.0 | 256 | 254 |
| /28 | 255.255.255.240 | 16 | 14 |
| /30 | 255.255.255.252 | 4 | 2 |
| /32 | 255.255.255.255 | 1 | 1 host (single host route) |
Quick mental math: /24 = 256, /25 = 128, /26 = 64. Each bit you add to the prefix halves the address space.
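The table can be reproduced with a few lines of Python. The `usable_hosts` helper is an illustrative name, and it treats /31 and /32 as special cases where no network or broadcast address is subtracted.

```python
import ipaddress

def usable_hosts(prefix: int) -> int:
    """Usable host addresses in an IPv4 block with the given prefix length."""
    total = 2 ** (32 - prefix)
    if prefix >= 31:
        return total  # /31 point-to-point links and /32 host routes
    return total - 2  # minus network and broadcast addresses

for prefix in (8, 16, 24, 28, 30, 32):
    mask = ipaddress.ip_network(f"0.0.0.0/{prefix}").netmask
    print(f"/{prefix:<3} {str(mask):16} {2 ** (32 - prefix):>10} {usable_hosts(prefix):>10}")
```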
Subnetting
Why Subnets Exist
Three reasons engineers create subnets instead of one flat network:
- Broadcast control: Every L2 broadcast goes to every device in a subnet. A flat /8 with 16 million devices would drown in ARP broadcasts. Routers don’t forward broadcasts, so subnets contain the noise.
- Security: Put your database in a different subnet with restrictive routing rules. Traffic between subnets goes through a router/firewall you control.
- Organization: Dev, staging, prod in separate subnets. Each team’s services isolated. Cloud VPCs use this to separate public-facing (web servers) from private (databases).
ELI5: Subnets are like neighborhoods in a city. Mail within your neighborhood (subnet) gets delivered by the local postal worker (switch). Mail to another neighborhood goes through the central post office (router). Without neighborhoods, one postal worker would have to know everyone in the entire city — chaos.
Calculating Subnet Ranges
For 10.0.1.0/24:
- Network address: 10.0.1.0 (all host bits = 0)
- Broadcast address: 10.0.1.255 (all host bits = 1)
- Usable range: 10.0.1.1–10.0.1.254
For 10.0.0.0/26 (borrowing 2 bits from a /24 gives 4 subnets of 64 addresses each):
- Subnet 0: 10.0.0.0–10.0.0.63
- Subnet 1: 10.0.0.64–10.0.0.127
- Subnet 2: 10.0.0.128–10.0.0.191
- Subnet 3: 10.0.0.192–10.0.0.255
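You rarely need to do this by hand. The standard `ipaddress` module computes the same ranges; the `describe` helper below is just an illustrative wrapper around it.

```python
import ipaddress
from typing import Tuple

def describe(cidr: str) -> Tuple[str, str, str, str]:
    """Return (network, broadcast, first usable, last usable) for a CIDR block."""
    net = ipaddress.ip_network(cidr)
    hosts = list(net.hosts())
    return (str(net.network_address), str(net.broadcast_address),
            str(hosts[0]), str(hosts[-1]))

print(describe("10.0.1.0/24"))

# Split a /24 into four /26 subnets (borrowing 2 bits).
subnets = list(ipaddress.ip_network("10.0.0.0/24").subnets(new_prefix=26))
for subnet in subnets:
    print(subnet, subnet.network_address, "to", subnet.broadcast_address)
```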
VLSM: Right-Sizing Subnets
VLSM (Variable Length Subnet Masking) lets you allocate different-sized subnets from the same block. A /30 for a point-to-point link (only 2 hosts needed), a /24 for a large office, a /28 for a DMZ.
Cloud patterns you’ll see constantly:
- VPC: /16 (65,536 IPs), with room to grow and not too wasteful
- Subnets: /24 per availability zone per tier (web, app, db)
- Management/bastion: /28 (tiny, just a few IPs)
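Common mistake: Sizing every subnet as a /24 by habit. A /16 VPC holds only 256 of them, and across several availability zones, tiers, and environments that budget runs out faster than you expect, while most of those /24s sit nearly empty. Plan your CIDR hierarchy before you start. You can’t resize subnets in AWS without recreating them.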
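One way to plan a VLSM layout is to allocate the largest subnets first, which keeps every block naturally aligned. The sketch below is one possible approach under that assumption; the `allocate` function, the parent block, and the subnet names are all hypothetical.

```python
import ipaddress
from typing import Dict, List, Tuple

def allocate(block: str, requests: List[Tuple[str, int]]) -> Dict[str, ipaddress.IPv4Network]:
    """Carve subnets of the requested prefix lengths out of block, largest first."""
    parent = ipaddress.ip_network(block)
    cursor = int(parent.network_address)
    plan: Dict[str, ipaddress.IPv4Network] = {}
    # Smaller prefix number = bigger subnet, so sort ascending by prefix.
    for name, prefix in sorted(requests, key=lambda request: request[1]):
        subnet = ipaddress.ip_network((cursor, prefix))
        if not subnet.subnet_of(parent):
            raise ValueError(f"{block} has no room left for {name} (/{prefix})")
        plan[name] = subnet
        cursor += subnet.num_addresses
    return plan

plan = allocate("10.0.0.0/22", [("p2p-link", 30), ("office", 24), ("dmz", 28)])
for name, subnet in plan.items():
    print(f"{name:10} {subnet}")
```

Allocating in descending size order means the cursor is always a multiple of the next subnet's size, so no address space is wasted on alignment gaps.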
Routing
Longest Prefix Match: The Core Algorithm
When a router receives a packet, it looks up the destination IP in its routing table and picks the most specific matching route — the one with the longest prefix (highest CIDR number).
Routing table:
0.0.0.0/0 → gateway 1.2.3.4 (default route)
10.0.0.0/8 → gateway 10.0.0.1
10.0.1.0/24 → gateway 10.0.1.1
10.0.1.5/32 → gateway 10.0.1.5
Packet dest: 10.0.1.5
Match candidates: /0, /8, /24, /32 → picks /32 (most specific)
ELI5: It’s like GPS routing. If you’re looking for “123 Main St, Springfield, Illinois, USA,” the router picks the most specific match it knows. It knows about Illinois before it knows about the specific street. The more specific the address it has, the more precisely it can route you.
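Longest prefix match over the example routing table can be sketched as a linear scan. Real routers use tries or hardware lookup tables instead; the `lookup` function here is an illustrative name and only shows the selection rule.

```python
import ipaddress
from typing import List, Optional, Tuple

ROUTES: List[Tuple[str, str]] = [
    ("0.0.0.0/0", "1.2.3.4"),      # default route
    ("10.0.0.0/8", "10.0.0.1"),
    ("10.0.1.0/24", "10.0.1.1"),
    ("10.0.1.5/32", "10.0.1.5"),
]

def lookup(dest: str, routes: List[Tuple[str, str]] = ROUTES) -> Optional[Tuple[str, str]]:
    """Return (matching prefix, gateway) for the most specific route, or None."""
    ip = ipaddress.ip_address(dest)
    best = None
    for cidr, gateway in routes:
        net = ipaddress.ip_network(cidr)
        # A route is a candidate if it contains the destination;
        # among candidates, the longest prefix wins.
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway)
    return (str(best[0]), best[1]) if best else None

for dest in ["10.0.1.5", "10.0.1.9", "10.2.3.4", "8.8.8.8"]:
    print(dest, "->", lookup(dest))
```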
Default Gateway
The default route 0.0.0.0/0 matches everything: it’s the “I have no idea, send it here” route. Your laptop has a default gateway (your home router). Your home router has a default gateway (your ISP). Your ISP’s edge routers may in turn default to an upstream provider. At the core of the internet the chain ends: backbone routers carry a full BGP routing table and no default route, so a packet for a prefix nobody announces is dropped there.
Static vs Dynamic Routing
| Type | How | When to use |
|---|---|---|
| Static | You manually configure routes | Small networks, specific security requirements |
| OSPF | Routers auto-discover neighbors, share topology | Enterprise internal routing |
| BGP | Routers exchange reachability info between autonomous systems | Internet routing, cloud multi-homing |
BGP is the protocol that runs the internet. Every ISP, cloud provider, and large company is an Autonomous System (AS) — a network under a single administrative control with an AS number (e.g., AWS is AS16509). BGP is how AS16509 tells the rest of the internet “I have routes to these IP prefixes.” When you see news about “internet routing incidents,” it’s almost always BGP: a misconfiguration causes wrong routes to propagate, and large swaths of the internet become unreachable.
Traceroute: TTL Tricks
Every IP packet has a TTL (Time to Live) field — an integer that each router decrements by 1. When TTL hits 0, the router drops the packet and sends back an ICMP “Time Exceeded” message (with its own IP in the source).
traceroute exploits this: send packets with TTL=1, then TTL=2, then TTL=3… Each router along the path drops the packet when TTL expires and reveals itself via the ICMP response.
traceroute google.com
# Output: hop-by-hop path, RTT for each hop
# On Linux, traceroute uses UDP by default:
traceroute -T google.com # TCP mode (bypasses some firewalls)
traceroute -I google.com # ICMP mode (like Windows tracert)
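What traceroute tells you: where latency is added (a big RTT jump that persists on later hops = slow link or far-away hop) and where packets stop (firewall blocking or a routing problem). It only shows the forward path; the return path may differ (asymmetric routing), so run it from both ends when you can.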
Common mistake: Trusting traceroute completely. ICMP responses from routers are low priority — a router can forward packets fast but respond to ICMP slowly, making a hop look slower than it is. Use it for direction, not precise measurement.
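The TTL trick is easy to see in a simulation. The sketch below does not send any packets; it models a path as a list of router IPs (all addresses are made-up documentation ranges) and shows which hop would answer for each starting TTL.

```python
from typing import List, Tuple

def forward(path: List[str], ttl: int) -> Tuple[str, str]:
    """Walk one probe along the path; return (reply type, IP that replied).

    path lists the routers in order, with the destination host last.
    """
    for router in path[:-1]:
        ttl -= 1  # every router decrements TTL before forwarding
        if ttl == 0:
            return ("time-exceeded", router)
    return ("reached", path[-1])

def traceroute(path: List[str], max_hops: int = 30) -> List[str]:
    """Send probes with TTL = 1, 2, 3... and collect whoever answers."""
    hops = []
    for ttl in range(1, max_hops + 1):
        kind, ip = forward(path, ttl)
        hops.append(ip)
        if kind == "reached":
            break
    return hops

path = ["192.168.1.1", "203.0.113.1", "198.51.100.7", "192.0.2.10"]
for number, hop in enumerate(traceroute(path), start=1):
    print(number, hop)
```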
ARP, DHCP, and ICMP
ARP: Bridging L2 and L3
IP works at L3, but Ethernet frames at L2 use MAC addresses. Before sending a frame, a device needs to know the MAC address for a given IP. That’s ARP’s job.
Device A wants to reach 192.168.1.5:
1. A broadcasts: "Who has 192.168.1.5? Tell 192.168.1.1"
2. Device at .5 replies: "192.168.1.5 is at MAC aa:bb:cc:dd:ee:ff"
3. A caches this mapping in its ARP table
4. A sends the Ethernet frame to that MAC address
arp -n # view ARP table on Linux
ip neigh show # modern equivalent
ARP cache entries expire (usually 5–20 minutes). If a device’s IP changes, stale ARP entries cause brief connectivity loss until they expire or are flushed.
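A cache with expiry is the whole data structure behind an ARP table. This is a toy model, assuming a single fixed timeout; the `ArpCache` class is hypothetical, and real kernels use more nuanced entry states. Time is passed in explicitly so the behavior is easy to follow.

```python
from typing import Dict, Optional, Tuple

class ArpCache:
    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self.ttl = ttl_seconds
        self._entries: Dict[str, Tuple[str, float]] = {}  # ip -> (mac, expiry time)

    def learn(self, ip: str, mac: str, now: float) -> None:
        """Record or refresh a mapping from an ARP reply."""
        self._entries[ip] = (mac, now + self.ttl)

    def lookup(self, ip: str, now: float) -> Optional[str]:
        """Return the cached MAC, or None if unknown or expired (send a new ARP request)."""
        entry = self._entries.get(ip)
        if entry is None:
            return None
        mac, expires_at = entry
        if now >= expires_at:
            del self._entries[ip]
            return None
        return mac

cache = ArpCache(ttl_seconds=300)
cache.learn("192.168.1.5", "aa:bb:cc:dd:ee:ff", now=0)
print(cache.lookup("192.168.1.5", now=299))  # still cached
print(cache.lookup("192.168.1.5", now=300))  # expired, must re-ARP
```

The same structure also shows why spoofing works: `learn` trusts whatever reply arrives, so a forged reply simply overwrites the entry.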
ARP spoofing: A malicious device responds to ARP requests claiming to have the MAC for a gateway IP. All traffic intended for the gateway goes to the attacker instead — a classic man-in-the-middle attack. Only possible on local networks. Modern switches with Dynamic ARP Inspection (DAI) can detect and block this.
ELI5: ARP is like asking “hey, who in this room goes by the name Bob?” out loud (broadcast). Bob raises his hand (unicast reply). You note “Bob is the tall guy in the corner” (cache it). Next time you need Bob, you go directly to him. ARP spoofing is someone else raising their hand and claiming to be Bob.
DHCP: Automatic IP Assignment
DHCP lets devices get an IP address, subnet mask, default gateway, and DNS server automatically. Without DHCP, every device needs manual configuration. The exchange is DORA:
- Discover: Client broadcasts “anyone have a DHCP server?” (UDP, src 0.0.0.0:68, dst 255.255.255.255:67)
- Offer: Server replies “I can give you 192.168.1.100, valid for 24 hours”
- Request: Client broadcasts “I’ll take the offer from server X” (still broadcast, since there may be multiple DHCP servers)
- Acknowledge: Server confirms the lease
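Leases have a duration. At 50% of lease time (T1), the client tries to renew with the server that issued the lease. If that fails, at 87.5% (T2) it broadcasts to any DHCP server. If the lease expires without a renewal, the client must stop using the IP and start over with Discover.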
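For a concrete feel for the timers, here is the arithmetic for the 24-hour lease from the Offer example. The `lease_timers` helper is an illustrative name, and the 50% and 87.5% fractions are the defaults described above; a server can override them.

```python
from typing import Dict

def lease_timers(lease_seconds: int) -> Dict[str, float]:
    """Seconds after lease start at which each DHCP timer fires."""
    return {
        "renew": lease_seconds * 0.5,     # ask the original server
        "rebind": lease_seconds * 0.875,  # ask any server
        "expire": float(lease_seconds),   # stop using the address
    }

timers = lease_timers(24 * 60 * 60)
for name, seconds in timers.items():
    print(f"{name:7} after {seconds / 3600:.1f} hours")
```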
ELI5: DHCP is like checking into a hotel. You show up (Discover), the front desk offers you a room (Offer), you confirm you want it (Request), they hand you the key (Acknowledge). The key works for your stay (lease duration). If you want to stay longer, you ask to renew before checkout.
ICMP: Network Diagnostics Protocol
ICMP is not a data transport — it’s for control messages and diagnostics. Key message types:
| Type | Name | Used by |
|---|---|---|
| 0 | Echo Reply | ping response |
| 3 | Destination Unreachable | Port closed, host unreachable, fragmentation needed |
| 8 | Echo Request | ping |
| 11 | Time Exceeded | traceroute |
ping -c 4 google.com # 4 packets, RTT stats
ping -M do -s 1472 192.168.1.1 # MTU check: 1472 payload + 8 ICMP + 20 IP = 1500; DF set, so oversized packets fail instead of fragmenting
Common mistake: Blocking all ICMP on firewalls for “security.” This breaks Path MTU Discovery (PMTUD) and makes debugging hell. You should block echo requests from the public internet, but always allow ICMP Type 3 (Destination Unreachable) and Type 11 (Time Exceeded) — these are critical for correct operation.
Ports and Sockets
Ports: Demultiplexing Traffic on a Host
An IP address gets traffic to a machine. A port gets traffic to the right process on that machine. 16-bit integer: 0–65535.
| Range | Name | Use |
|---|---|---|
| 0–1023 | Well-known | HTTP (80), HTTPS (443), SSH (22), DNS (53), SMTP (25) |
| 1024–49151 | Registered | MySQL (3306), PostgreSQL (5432), Redis (6379), MongoDB (27017) |
| 49152–65535 | Ephemeral | OS-assigned source ports for outgoing connections |
A connected socket is identified by the combination of (source IP, source port, destination IP, destination port, protocol). This 5-tuple is unique for every connection in the network stack. Two clients can connect to the same server IP:port because their source IP or source port differs.
ss -tuln # show listening sockets (no DNS resolution, numeric ports)
ss -tupn # show established connections with process IDs
netstat -antp # older equivalent, still widely available
ELI5: An IP address is like a building address, a port is like an apartment number. The postal service (OS network stack) delivers mail (packets) to the right apartment (process). The socket is the full address including who’s sending — “apartment 443 in building 1.2.3.4, sent by apartment 54321 in building 5.6.7.8.”
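The 5-tuple works exactly like a dictionary key. This sketch models a connection table with a hypothetical `FiveTuple` type, reusing the addresses from the analogy above; it illustrates the lookup idea, not how a kernel actually stores connections.

```python
from typing import Dict, NamedTuple

class FiveTuple(NamedTuple):
    protocol: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

connections: Dict[FiveTuple, str] = {}

# Two clients on the same machine, talking to the same server IP:port.
conn_a = FiveTuple("tcp", "5.6.7.8", 54321, "1.2.3.4", 443)
conn_b = FiveTuple("tcp", "5.6.7.8", 54322, "1.2.3.4", 443)
connections[conn_a] = "client A"
connections[conn_b] = "client B"

# Only the source port differs, yet the keys are distinct.
print(len(connections))
print(connections[FiveTuple("tcp", "5.6.7.8", 54321, "1.2.3.4", 443)])
```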
Ephemeral Port Exhaustion
Each outgoing TCP connection consumes one ephemeral port. The default range on Linux is 32768–60999 — about 28,000 ports. A high-traffic service making thousands of connections per second to the same destination can exhaust them.
Symptoms: connect: cannot assign requested address errors, even though CPU/memory are fine.
cat /proc/sys/net/ipv4/ip_local_port_range # current ephemeral range
sysctl -w net.ipv4.ip_local_port_range="1024 65535" # expand range
# Also check TIME_WAIT connections eating ports:
ss -s # socket statistics summary
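Solutions: use connection pooling or keep-alive to cut connection churn (the most effective fix), expand the ephemeral port range, or enable net.ipv4.tcp_tw_reuse so new outbound connections can reuse sockets sitting in TIME_WAIT. SO_REUSEADDR and SO_REUSEPORT mostly help listening sockets rebind and rarely fix outbound exhaustion.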
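A back-of-the-envelope calculation shows how quickly the ports run out. This sketch assumes the worst case: one source IP, one destination IP:port, the client closes every connection, and each closed connection holds its port in TIME_WAIT for 60 seconds (the usual Linux value). The function names are illustrative.

```python
def ephemeral_ports(low: int = 32768, high: int = 60999) -> int:
    """Number of ports in an inclusive ephemeral range."""
    return high - low + 1

def sustainable_rate(ports: int, time_wait_seconds: int = 60) -> float:
    """New connections per second before every port is stuck in TIME_WAIT."""
    return ports / time_wait_seconds

default_ports = ephemeral_ports()
expanded_ports = ephemeral_ports(1024, 65535)

print(default_ports, round(sustainable_rate(default_ports)))
print(expanded_ports, round(sustainable_rate(expanded_ports)))
```

Under these assumptions the default range sustains only a few hundred new connections per second to a single destination, which is why pooling helps far more than widening the range.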
Why this matters: Port exhaustion is a real production issue for API gateways, proxies, and services that make many outbound requests. It’s non-obvious — the machine appears healthy but all new connections fail.
Network Types and Topologies
LAN, WAN, VPN
| Type | Scope | Speed | Latency | Your Control |
|---|---|---|---|---|
| LAN | Building/campus | 1–100 Gbps | <1ms | Full |
| WAN | Cities/countries | 10–100 Gbps (backbone) | 10–200ms | ISP-dependent |
| VPN | Any — tunneled over WAN | Limited by underlying WAN | +5–20ms overhead | Encryption layer only |
VPNs create an encrypted tunnel. Traffic exits your VPN client, gets encrypted, travels to the VPN endpoint, gets decrypted, then goes to the destination. You trade latency and bandwidth for privacy and simulated co-location.
Cloud Networking: VPCs
A VPC (Virtual Private Cloud) is your private network inside a cloud provider. Key components:
- Subnets: Subdivisions of the VPC CIDR, tied to availability zones
- Route tables: Attached to subnets, control where traffic goes
- Security Groups: Stateful firewall at the instance/ENI level (allow rules only)
- NACLs: Stateless firewall at the subnet level (allow and deny rules)
- Internet Gateway: The door between your VPC and the public internet
- NAT Gateway: Lets private subnet instances reach the internet without being reachable from it
ELI5: A VPC is like a private office building. The VPC is the building. Subnets are individual floors. Security groups are the locks on each office door. NACLs are the badge readers at each floor’s elevator. The Internet Gateway is the lobby entrance. The NAT Gateway is the mail slot — packages can go out, but nothing gets in unsolicited.
Overlay Networks: VXLAN and Container Networking
Physical networks use VLANs (max 4096) to segment traffic. At cloud scale, you need millions of virtual networks. VXLAN (Virtual Extensible LAN) solves this by encapsulating Ethernet frames inside UDP packets — an overlay over the existing IP network. VXLAN supports 16 million virtual network IDs.
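Docker (with its overlay network driver) and Kubernetes commonly use overlay networks. In Kubernetes the network comes from a CNI plugin such as Flannel, Calico, or Cilium, which may use VXLAN, another encapsulation such as IP-in-IP, or plain routing with no overlay at all. In the overlay case: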
- Each pod gets an IP in a virtual network
- Traffic between pods on different hosts gets VXLAN-encapsulated and routed via the host network
- From the pod’s perspective, it’s just talking to another IP address
# Inspect Docker network
docker network ls
docker network inspect bridge
# View routing in a Kubernetes pod
kubectl exec -it <pod> -- ip route
kubectl exec -it <pod> -- ip addr
ELI5: VXLAN is like putting a letter inside another letter. The outer envelope (UDP/IP) is addressed to the physical machine. The inner envelope (Ethernet frame) is addressed to the virtual container. The outer envelope gets the data to the right building; the inner envelope gets it to the right tenant.
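The VXLAN header itself is only 8 bytes: a flags byte, 24 reserved bits, the 24-bit network ID (VNI), and 8 more reserved bits. The sketch below packs and parses just that header, following the layout in RFC 7348; the outer UDP/IP/Ethernet headers are left out, and the function names are illustrative.

```python
import struct

VNI_FLAG = 0x08  # "I" flag: the VNI field is valid

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit virtual network ID."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, 24 reserved bits.
    # Word 2: VNI in the top 24 bits, 8 reserved bits.
    return struct.pack("!II", VNI_FLAG << 24, vni << 8)

def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prefix an inner Ethernet frame with a VXLAN header (becomes the UDP payload)."""
    return vxlan_header(vni) + inner_frame

def parse_vni(udp_payload: bytes) -> int:
    """Read the VNI back out of a VXLAN-encapsulated payload."""
    _, second_word = struct.unpack("!II", udp_payload[:8])
    return second_word >> 8

payload = encapsulate(b"inner ethernet frame", vni=5000)
print(len(vxlan_header(5000)), parse_vni(payload))
print(2 ** 12, 2 ** 24)  # VLAN ID space vs VXLAN VNI space
```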
Summary: Quick Reference
Layer Reference
| When you see… | Layer | Likely cause |
|---|---|---|
| Can’t ping, ARP failing | L1–L2 | Cable, switch, VLAN |
| Ping works, TCP RST | L3–L4 | Firewall, routing, closed port |
| TCP connects, HTTP 5xx | L7 | Application, load balancer config |
| High latency, packet loss | L1–L3 | Congestion, bad link, routing loop |
Address Types at a Glance
| Address | Length | Scope | Changes? |
|---|---|---|---|
| MAC | 48 bits (6 bytes) | Local segment only | Rarely (hardware-assigned, but can be overridden or randomized) |
| IPv4 | 32 bits | Global (public) or local (private) | Yes (DHCP, NAT) |
| IPv6 | 128 bits | Global | Rarely |
| Port | 16 bits | Per-host process identifier | Per connection (ephemeral) |
Subnet Quick Reference
| CIDR | Mask | Hosts | Common Use |
|---|---|---|---|
| /8 | 255.0.0.0 | ~16M | Large private networks |
| /16 | 255.255.0.0 | ~65K | Cloud VPCs |
| /24 | 255.255.255.0 | 254 | Subnets, typical floor |
| /28 | 255.255.255.240 | 14 | Small segments, DMZ |
| /30 | 255.255.255.252 | 2 | Point-to-point links |
| /32 | 255.255.255.255 | 1 | Single host routes |
The Debugging Ladder
Is the host reachable at all? → ping
├─ No → check L1/L2: arp -n, check switch
└─ Yes → Is the port open? → nc -zv host port
   ├─ No → check firewall, ss -tuln on target
   └─ Yes → Is the app responding? → curl -v
      └─ No → check app logs, L7 config
Next up: TCP & UDP Deep Dive — handshakes, flow control, congestion, and why TCP’s reliability costs you latency.