Container Networking
Network issues are among the most common production pain points for containers. If you don’t understand how container networking works at the iptables/veth level, you’ll spend hours debugging problems that take minutes once you see the architecture.
Docker Network Drivers
| Driver | Scope | Use Case | Container-to-Container | External Access |
|---|---|---|---|---|
| bridge | Single host | Default. Dev/test. | Via bridge IP | Port mapping (-p) |
| host | Single host | Max network performance | Via localhost | Direct (no NAT) |
| none | Single host | Fully isolated | No networking | None |
| overlay | Multi-host | Swarm clusters (K8s uses CNI plugins instead) | Across hosts via VXLAN | Via ingress/load balancer |
| macvlan | Single host | Legacy integration | Each container gets real MAC | Direct on physical network |
| ipvlan | Single host | Like macvlan, shared MAC | Shares host MAC address | Direct on physical network |
ELI5: Bridge is like a home WiFi router — all your devices connect to it and share one public IP. Host is like plugging directly into the modem — fastest, but no isolation. Overlay is like a VPN — containers on different machines talk as if they’re on the same network. Macvlan gives each container its own “phone number” on the physical network.
Bridge Network (Default)
Every Docker installation creates a docker0 bridge. When you run a container without specifying a network, it connects here.
┌─────────────┐      ┌─────────────┐
│ Container A │      │ Container B │
│    eth0     │      │    eth0     │
│ 172.17.0.2  │      │ 172.17.0.3  │
└──────┬──────┘      └──────┬──────┘
       │ veth               │ veth
       │                    │
┌──────┴────────────────────┴──────┐
│          docker0 bridge          │
│            172.17.0.1            │
└────────────────┬─────────────────┘
                 │ NAT (iptables)
┌────────────────┴─────────────────┐
│            Host eth0             │
│          192.168.1.100           │
└──────────────────────────────────┘
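The addressing in the diagram can be reproduced with a small sketch. It assumes the default 172.17.0.0/16 subnet and, purely for illustration, that the bridge takes the first usable address and containers take the following ones in order; the function name and container names are hypothetical, and real allocation is handled by Docker's IPAM.

```python
import ipaddress

# Assumed default docker0 subnet, matching the diagram above.
BRIDGE_SUBNET = ipaddress.ip_network("172.17.0.0/16")


def plan_bridge_addresses(subnet, container_names):
    """Return (gateway_ip, {container_name: ip}) for a bridge subnet.

    Simplification: the first usable address is the bridge itself and
    containers receive the next addresses in order.
    """
    hosts = iter(subnet.hosts())      # usable addresses, lowest first
    gateway = next(hosts)             # the bridge's own address
    assignments = {name: next(hosts) for name in container_names}
    return gateway, assignments


gateway, containers = plan_bridge_addresses(
    BRIDGE_SUBNET, ["container-a", "container-b"]
)
print(f"docker0      {gateway}")
for name, ip in containers.items():
    print(f"{name}  {ip}")
```

Because the addresses depend on start order in this model, it also illustrates why a container's IP is not a stable identifier.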
Default Bridge vs User-Defined Bridge
| Feature | Default bridge (docker0) | User-defined bridge |
|---|---|---|
| DNS resolution | NO — only IP addresses | YES — containers resolve by name |
| Automatic connection | All containers by default | Only explicitly connected containers |
| Isolation | All containers can see each other | Only containers on same network |
| Live connect/disconnect | No | Yes (docker network connect/disconnect) |
| Link legacy support | --link needed for name resolution | Built-in DNS |
Why this matters: On the default bridge, containers can only reach each other by IP address. IPs change on restart. This is why beginners hardcode IPs and then everything breaks. Always use user-defined bridge networks — they provide DNS-based service discovery automatically.
Common mistake: Using the default bridge network in production. Always create a named network: docker network create myapp. The DNS resolution alone is worth it.
How Port Mapping Works
docker run -p 8080:80 does this under the hood:
- Docker adds an iptables DNAT rule: traffic to host:8080 → container:80
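- Docker adds an iptables MASQUERADE rule so outbound traffic from the container subnet is source-NATed to the host’s IP; replies on DNAT’d connections are translated back by connection tracking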
- A docker-proxy process listens on host:8080 as a userland fallback for traffic the NAT rules don’t cover (for example, connections from the host’s own localhost, or hairpin NAT)
# See the actual iptables rules Docker creates
iptables -t nat -L -n | grep DNAT
# DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:8080 to:172.17.0.2:80
Think of it this way: Port mapping is like a receptionist at a hotel. You call the hotel’s main number (host:8080), and the receptionist forwards your call to Room 204 (container:80). The iptables DNAT rule IS the receptionist.
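To make the "receptionist" concrete, here is a minimal in-memory model of a DNAT lookup. It is a sketch only: the `Endpoint` type and the `publish` and `translate` helpers are hypothetical names, and the real rewrite happens in the kernel's netfilter, not in user code.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Endpoint(NamedTuple):
    ip: str
    port: int


# Hypothetical stand-in for the nat table's DNAT rules.
# Key: (protocol, host port). Value: container endpoint.
DnatTable = Dict[Tuple[str, int], Endpoint]


def publish(table: DnatTable, proto: str, host_port: int, target: Endpoint) -> None:
    """Model the effect of `docker run -p host_port:container_port`."""
    if (proto, host_port) in table:
        raise ValueError(f"{proto}/{host_port} is already published")
    table[(proto, host_port)] = target


def translate(table: DnatTable, proto: str, dst_port: int) -> Optional[Endpoint]:
    """Return the rewritten destination, or None when no rule matches."""
    return table.get((proto, dst_port))


rules: DnatTable = {}
publish(rules, "tcp", 8080, Endpoint("172.17.0.2", 80))
print(translate(rules, "tcp", 8080))   # forwarded to the container
print(translate(rules, "tcp", 9090))   # no rule, so no translation
```

The duplicate check in `publish` mirrors the familiar failure when two containers try to publish the same host port.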
Host Network
docker run --network host: the container uses the host’s network stack directly. It does not get a separate network namespace.
When to use:
- Maximum network performance (no NAT overhead, no bridge hop)
- Applications that need to bind to many ports dynamically
- Network monitoring tools that need to see all host traffic
When NOT to use:
- Multiple containers that need the same port (port conflicts)
- Any security-sensitive deployment (no network isolation at all)
- Production services (usually) — you lose container network isolation
ELI5: Host network is like giving your container a master key to the entire building’s phone system. It can use any phone line (port) directly, but it can also accidentally pick up calls meant for other residents.
Overlay Network (Multi-Host)
Overlay networks connect containers across multiple Docker hosts. Docker’s overlay driver is used by Docker Swarm; Kubernetes gets the same effect through CNI plugins, several of which also use VXLAN.
Host A (192.168.1.10)             Host B (192.168.1.20)
┌──────────────────┐              ┌──────────────────┐
│   Container A    │              │   Container B    │
│     10.0.0.2     │              │     10.0.0.3     │
└────────┬─────────┘              └────────┬─────────┘
         │                                 │
┌────────┴─────────┐              ┌────────┴─────────┐
│ Overlay network  │              │ Overlay network  │
│ VXLAN tunnel     ├──────────────┤ VXLAN tunnel     │
│ (encapsulates)   │   UDP 4789   │ (decapsulates)   │
└────────┬─────────┘              └────────┬─────────┘
         │                                 │
┌────────┴─────────────────────────────────┴─────────┐
│                  Physical Network                  │
└────────────────────────────────────────────────────┘
How it works:
- Container A sends a packet to 10.0.0.3 (Container B’s overlay IP)
- Docker encapsulates the packet inside a VXLAN UDP packet (port 4789)
- Outer header: Host A → Host B
- Host B’s VTEP decapsulates and delivers to Container B
- Container B sees a packet from 10.0.0.2 — no idea it crossed physical networks
ELI5: Overlay networking is like putting a letter inside another envelope. The inner envelope says “To: Container B.” The outer envelope says “To: Host B.” The post office (physical network) only reads the outer envelope. Host B opens the outer envelope and delivers the inner letter to the right container.
Performance impact: VXLAN adds ~50 bytes overhead per packet and CPU cost for encap/decap. For most workloads, negligible. For high-throughput, latency-sensitive workloads (10Gbps+), consider macvlan or host networking.
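The ~50 byte figure can be derived by adding up the outer headers. The sketch below builds and parses the 8-byte VXLAN header and sums the overhead, assuming an untagged outer Ethernet frame and an outer IPv4 header without options. The VNI value and the helper names are illustrative, not taken from Docker.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08   # "I" flag: the VNI field carries a valid ID
OUTER_ETHERNET = 14           # outer MAC header, no 802.1Q tag (assumption)
OUTER_IPV4 = 20               # outer IPv4 header, no options (assumption)
OUTER_UDP = 8                 # outer UDP header (destination port 4789)
VXLAN_HEADER = 8              # flags + reserved + 24-bit VNI + reserved


def vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte VXLAN header for a 24-bit network identifier."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prefix the inner frame with a VXLAN header (outer headers omitted)."""
    return vxlan_header(vni) + inner_frame


def decapsulate(payload: bytes):
    """Return (vni, inner_frame) from a VXLAN payload."""
    flags_word, vni_word = struct.unpack("!II", payload[:VXLAN_HEADER])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8, payload[VXLAN_HEADER:]


overhead = OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER
print(f"per-packet overhead: {overhead} bytes")
print(decapsulate(encapsulate(b"inner frame", vni=4097)))
```

This overhead is also why overlay interfaces usually need a smaller MTU than the physical network they ride on.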
Macvlan and IPvlan
Both give containers direct access to the physical network — no NAT, no bridge.
| Feature | Macvlan | IPvlan L2 | IPvlan L3 |
|---|---|---|---|
| MAC address | Unique per container | Shared with host | Shared with host |
| L2 adjacency | Yes | Yes | No (routed) |
| Promiscuous mode | Required on parent | Not required | Not required |
| Cloud compatibility | Often blocked (AWS, etc.) | Better cloud support | Best cloud support |
| Use case | Legacy VLANs, bare metal | Cloud, single MAC required | Pure routing, no broadcast |
When to use macvlan/ipvlan: You have legacy applications that need to be on a specific VLAN, need direct L2 connectivity, or need performance without NAT overhead. Common in network appliance containers and telecom workloads.
Common mistake: Using macvlan in AWS/GCP/Azure. Cloud providers typically don’t allow multiple MAC addresses per NIC (or charge for it). Use ipvlan L2 instead.
DNS and Service Discovery
Docker’s Built-in DNS (User-Defined Networks)
Docker runs an embedded DNS server at 127.0.0.11 for containers on user-defined networks.
Container A → DNS query "web" → 127.0.0.11 → resolves to 172.18.0.3
Resolution order:
- Container’s /etc/hosts entries
- Docker’s embedded DNS (container name → IP)
- Host’s DNS configuration (for external domains)
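The resolution order above can be sketched as a chain of lookups. This is a simplified model, not Docker's implementation: the dictionaries stand in for /etc/hosts and the embedded DNS records, and `upstream` is a hypothetical callable representing the host's configured resolvers.

```python
from typing import Callable, Dict, Optional, Tuple


def resolve(
    name: str,
    hosts_file: Dict[str, str],
    embedded_dns: Dict[str, str],
    upstream: Callable[[str], Optional[str]],
) -> Optional[Tuple[str, str]]:
    """Return (ip, source) using the first source that knows the name."""
    if name in hosts_file:
        return hosts_file[name], "/etc/hosts"
    if name in embedded_dns:
        return embedded_dns[name], "embedded DNS"
    ip = upstream(name)
    if ip is not None:
        return ip, "host DNS"
    return None


# Illustrative data only; 203.0.113.10 is a documentation address.
hosts_file = {"localhost": "127.0.0.1"}
embedded_dns = {"web": "172.18.0.3"}
external = {"example.com": "203.0.113.10"}

for name in ("localhost", "web", "example.com", "missing"):
    print(name, resolve(name, hosts_file, embedded_dns, external.get))
```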
Docker Compose Service Discovery
services:
  web:
    image: nginx
  api:
    image: myapi
    # can reach nginx at http://web:80 (service name = DNS name)
In Compose, the service name IS the DNS name. All services on the same network can resolve each other by name.
ELI5: Docker DNS works like a hotel directory. You don’t need to know what room number (IP) the “restaurant” (container) is in — you just ask the front desk (DNS server) for “restaurant” and they tell you the room number. If the restaurant moves to a different room (container restarts with new IP), the directory updates automatically.
Kubernetes DNS (CoreDNS)
<service-name>.<namespace>.svc.cluster.local
Every Service in K8s gets a DNS entry. Pods resolve services by name within the same namespace or by FQDN across namespaces.
# Same namespace
curl http://api-service:8080
# Cross namespace
curl http://api-service.production.svc.cluster.local:8080
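The naming scheme is mechanical enough to build in code. A minimal sketch, assuming the default cluster domain cluster.local (clusters can be configured with a different one); the helper names are hypothetical.

```python
from typing import Optional


def service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Build <service-name>.<namespace>.svc.<cluster-domain>."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def service_url(service: str, port: int, namespace: Optional[str] = None) -> str:
    """Use the short name within a namespace, the FQDN across namespaces."""
    host = service if namespace is None else service_fqdn(service, namespace)
    return f"http://{host}:{port}"


print(service_url("api-service", 8080))
print(service_url("api-service", 8080, namespace="production"))
```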
Kubernetes Networking Model
K8s has three fundamental networking requirements:
- Every pod gets its own IP — no NAT between pods
- Pods can communicate with any other pod — across nodes, without NAT
- The IP a pod sees itself as = the IP others see it as
CNI Plugins
The Container Network Interface (CNI) is how K8s implements networking. The cluster operator chooses a CNI plugin.
| CNI Plugin | Networking | Network Policy | Extra Features | Best For |
|---|---|---|---|---|
| Calico | L3 (BGP), VXLAN, IPIP | Yes (full) | eBPF dataplane option | General purpose, performance |
| Cilium | eBPF-based | Yes (L3-L7) | Service mesh, observability, encryption | Advanced security, L7 policies |
| Flannel | VXLAN overlay | No (needs Calico for policy) | Simple, minimal config | Small clusters, simplicity |
| Weave | VXLAN, sleeve | Yes | Encryption, multicast | Small/medium clusters |
| AWS VPC CNI | AWS ENI-based | Yes (with Calico) | Native VPC IP per pod | AWS EKS (default) |
| Azure CNI | Azure VNET | Yes | Native Azure IP per pod | AKS (default) |
Decision framework: Starting a new cluster? Cilium if you want eBPF + L7 policies + service mesh. Calico if you want proven stability + good performance. Flannel if you want simplicity and don’t need network policies. Cloud-managed cluster? Use the cloud’s default CNI (VPC CNI on EKS, Azure CNI on AKS) unless you need features they don’t provide.
Service Types
| Type | How It Works | Use Case |
|---|---|---|
| ClusterIP | Internal IP reachable only within cluster | Service-to-service communication |
| NodePort | Opens a port (30000-32767) on every node | Dev/test, direct node access |
| LoadBalancer | Provisions cloud LB pointing to NodePorts | Production external traffic |
| ExternalName | CNAME record to external DNS | Proxy to external services |
Client → Cloud LB → NodePort on any node → kube-proxy → Pod
Common mistake: Using NodePort in production. NodePort exposes a port on EVERY node, limits you to ports 30000-32767, and doesn’t load-balance well. Use LoadBalancer (cloud) or Ingress (for HTTP).
Ingress and Gateway API
| Feature | Ingress (legacy) | Gateway API (modern) |
|---|---|---|
| Maturity | Stable, widely supported | GA since Gateway API v1.0 (October 2023); versioned independently of K8s releases |
| L7 routing | Host/path-based | Host/path/header/method-based |
| TCP/UDP support | Depends on controller | TCPRoute, UDPRoute (experimental channel) |
| Multi-tenancy | No built-in model | GatewayClass for shared infra |
| TLS termination | Yes | Yes, more flexible |
| Traffic splitting | Controller-specific annotations | Native (weight-based) |
Think of it this way: Ingress is like a simple reception desk — it looks at your name and directs you to the right room. Gateway API is like a smart building management system — it can route you based on your name, your department, your badge type, and even split visitors across multiple rooms for A/B testing.
Network Policies (Kubernetes)
By default, all pods can talk to all pods. Network Policies restrict this.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-ingress
spec:
  podSelector: {}   # applies to all pods in namespace
  policyTypes:
    - Ingress       # block all incoming traffic
  # no ingress rules = deny all
Key principles:
- Network Policies are additive: if any policy allows traffic, it’s allowed
- An empty podSelector {} means “all pods in this namespace”
- If no NetworkPolicy selects a pod, all traffic is allowed (default allow)
- Once ANY NetworkPolicy selects a pod, all non-matching traffic in the directions that policy covers (its policyTypes) is denied (default deny for that pod)
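These rules fit in a few lines of code. The sketch below models ingress only, within a single namespace, with label-equality selectors and no port rules. `IngressPolicy`, `matches` and `ingress_allowed` are hypothetical names, and enforcement in a real cluster is done by the CNI plugin.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Labels = Dict[str, str]


def matches(selector: Labels, labels: Labels) -> bool:
    """An empty selector matches every pod, like podSelector: {}."""
    return all(labels.get(key) == value for key, value in selector.items())


@dataclass
class IngressPolicy:
    pod_selector: Labels                      # which pods the policy selects
    allow_from: List[Labels] = field(default_factory=list)  # allowed sources


def ingress_allowed(policies: List[IngressPolicy], src: Labels, dst: Labels) -> bool:
    selecting = [p for p in policies if matches(p.pod_selector, dst)]
    if not selecting:
        return True   # no policy selects the pod: default allow
    # Additive: one matching allow rule in any selecting policy is enough.
    return any(matches(peer, src) for p in selecting for peer in p.allow_from)


deny_all = IngressPolicy(pod_selector={})
web_to_api = IngressPolicy(pod_selector={"app": "api"}, allow_from=[{"app": "web"}])
web, api, db = {"app": "web"}, {"app": "api"}, {"app": "db"}

print(ingress_allowed([], web, api))                      # no policies
print(ingress_allowed([deny_all], web, api))              # default deny only
print(ingress_allowed([deny_all, web_to_api], web, api))  # explicit allow
print(ingress_allowed([deny_all, web_to_api], db, api))   # not in allow list
```

Note that adding `web_to_api` never removes access granted elsewhere; policies can only widen what is allowed for the pods they select.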
Common mistake: Assuming NetworkPolicies are enforced by default. They’re NOT — your CNI plugin must support them. Flannel doesn’t. Calico and Cilium do. Deploy Flannel + NetworkPolicies = false sense of security.
Interview pattern — “Defense in depth”:
- Default deny all ingress in every namespace
- Explicitly allow only needed communication paths
- Use labels for pod selection (not IPs — IPs change)
- Separate namespaces for different trust boundaries
Debugging Network Issues
Essential Commands
# What network is a container on?
docker inspect <container> | jq '.[0].NetworkSettings.Networks'
# DNS resolution inside container
docker exec <container> nslookup <service-name>
# See all iptables NAT rules Docker created
iptables -t nat -L -n -v
# Trace packet path (on host)
tcpdump -i docker0 -n port 80
# K8s: debug DNS resolution
kubectl run dnstest --image=busybox:1.36 --rm -it -- nslookup kubernetes.default
# K8s: check if NetworkPolicy is blocking
kubectl describe networkpolicy -n <namespace>
# K8s: check service endpoints
kubectl get endpoints <service-name>
Common Problems & Root Causes
| Symptom | Likely Cause | Fix |
|---|---|---|
| Container can’t reach internet | Missing NAT rule or DNS config | Check iptables -t nat -L, check /etc/resolv.conf |
| Container can’t reach other container by name | Using default bridge (no DNS) | Use user-defined network |
| Service unreachable in K8s | No endpoints (pods not ready) | kubectl get endpoints, check readiness probes |
| Intermittent timeouts in K8s | DNS resolution issues (ndots:5) | Add dnsConfig.options: [{name: ndots, value: "2"}] |
| Cross-node pod communication fails | CNI plugin misconfigured, firewall | Check CNI pods, check node firewall rules for VXLAN (4789) |
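Why this matters: The ndots:5 default in Kubernetes means any name with fewer than five dots is tried against every domain in the pod’s search list before it is queried as an absolute name. A lookup for api.example.com (two dots) therefore goes through each search domain first (typically three cluster domains, plus any inherited from the node), generating several failed queries before the one that succeeds, and twice as many when both A and AAAA records are requested. This causes latency and DNS server load. Setting ndots: 2 fixes most cases.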
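The effect of ndots is easiest to see by listing the names a resolver would try. This sketch follows the usual resolv.conf search-list logic; the search list shown is an assumption for a pod in the default namespace, and `query_order` is a hypothetical helper. A real resolver stops at the first name that returns an answer.

```python
from typing import List

# Assumed search list for a pod in the "default" namespace.
SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]


def query_order(name: str, search: List[str], ndots: int) -> List[str]:
    """Return candidate names in the order a resolver would try them."""
    if name.endswith("."):
        return [name]                     # already absolute: no expansion
    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        return [absolute] + expanded      # enough dots: try as-is first
    return expanded + [absolute]          # too few dots: search list first


for ndots in (5, 2):
    print(f"ndots:{ndots}")
    for candidate in query_order("api.example.com", SEARCH, ndots):
        print("  ", candidate)
```

With ndots:5 the external name is tried last, after every search domain has failed; with ndots:2 it is tried first. Short names such as api-service still go through the search list in both cases, so in-cluster lookups keep working.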
Key Takeaways for Interviews
- “Explain Docker networking” → veth pairs connect containers to bridges. iptables DNAT handles port mapping. User-defined bridges provide DNS. Overlay uses VXLAN for multi-host.
- “How does K8s networking work?” → CNI plugin assigns IPs. Every pod gets a routable IP. kube-proxy (iptables/IPVS) implements Services. CoreDNS handles discovery.
- “Bridge vs overlay vs macvlan?” → Bridge = single host + NAT. Overlay = multi-host + VXLAN encapsulation. Macvlan = direct physical network access.
- “How do you secure pod communication?” → Network Policies (deny all + explicit allow), mTLS via service mesh (Istio/Cilium), namespace isolation.
- “CNI plugin selection?” → Cilium for eBPF + L7. Calico for stability + performance. Cloud default for managed clusters.