Container Networking

Network issues are among the most common production pain points for containers. If you don’t understand how container networking works at the iptables/veth level, you can spend hours debugging problems that take minutes once you see the architecture.


Docker Network Drivers

| Driver | Scope | Use Case | Container-to-Container | External Access |
| --- | --- | --- | --- | --- |
| bridge | Single host | Default. Dev/test. | Via bridge IP | Port mapping (-p) |
| host | Single host | Max network performance | Via localhost | Direct (no NAT) |
| none | Single host | Fully isolated | No networking | None |
| overlay | Multi-host | Swarm/K8s clusters | Across hosts via VXLAN | Via ingress/load balancer |
| macvlan | Single host | Legacy integration | Each container gets real MAC | Direct on physical network |
| ipvlan | Single host | Like macvlan, shared MAC | Shares host MAC address | Direct on physical network |

ELI5: Bridge is like a home WiFi router — all your devices connect to it and share one public IP. Host is like plugging directly into the modem — fastest, but no isolation. Overlay is like a VPN — containers on different machines talk as if they’re on the same network. Macvlan gives each container its own “phone number” on the physical network.


Bridge Network (Default)

Every Docker installation creates a docker0 bridge. When you run a container without specifying a network, it connects here.

┌──────────────┐  ┌──────────────┐
│ Container A  │  │ Container B  │
│  eth0        │  │  eth0        │
│  172.17.0.2  │  │  172.17.0.3  │
└──────┬───────┘  └──────┬───────┘
       │ veth            │ veth
       │                 │
┌──────┴─────────────────┴───────┐
│         docker0 bridge          │
│         172.17.0.1              │
└────────────┬────────────────────┘
             │ NAT (iptables)
┌────────────┴────────────────────┐
│         Host eth0               │
│         192.168.1.100           │
└─────────────────────────────────┘

Default Bridge vs User-Defined Bridge

| Feature | Default bridge (docker0) | User-defined bridge |
| --- | --- | --- |
| DNS resolution | No (IP addresses only) | Yes (containers resolve by name) |
| Automatic connection | All containers by default | Only explicitly connected containers |
| Isolation | All containers can see each other | Only containers on same network |
| Live connect/disconnect | No | Yes (docker network connect/disconnect) |
| Link legacy support | --link needed for name resolution | Built-in DNS |

Why this matters: On the default bridge, containers can only reach each other by IP address, and a container’s IP can change whenever it is restarted or recreated. Hardcoded IPs therefore break sooner or later. Always use user-defined bridge networks, which provide DNS-based service discovery automatically.

Common mistake: Using the default bridge network in production. Always create a named network: docker network create myapp. The DNS resolution alone is worth it.

How Port Mapping Works

docker run -p 8080:80 does this under the hood:

  1. Docker adds an iptables DNAT rule: traffic to host:8080 → container:80
  2. Connection tracking reverses that translation for reply packets; a separate MASQUERADE rule source-NATs outbound traffic that containers initiate
  3. A docker-proxy process listens on host:8080 as a userland fallback for paths the DNAT rule does not cover (for example, connections from the host’s own loopback address)

# See the actual iptables rules Docker creates (run as root)
iptables -t nat -L -n | grep DNAT
# DNAT  tcp  --  0.0.0.0/0  0.0.0.0/0  tcp dpt:8080 to:172.17.0.2:80
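
To make the DNAT step concrete, here is a minimal Python sketch that models the rule lookup as a dictionary. The DnatTable class and its method names are hypothetical and only illustrate the idea; the real translation happens in the kernel's netfilter, not in Docker's own code.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


class DnatTable:
    """Toy model of the nat-table rules created for published ports."""

    def __init__(self) -> None:
        self._rules: Dict[Tuple[str, int], Endpoint] = {}

    def publish(self, host_port: int, container: Endpoint, proto: str = "tcp") -> None:
        # Similar in spirit to: docker run -p <host_port>:<container_port>
        self._rules[(proto, host_port)] = container

    def translate(self, dst_port: int, proto: str = "tcp") -> Optional[Endpoint]:
        # Rewrite the destination if a rule matches; None means "no rule".
        return self._rules.get((proto, dst_port))


table = DnatTable()
table.publish(8080, ("172.17.0.2", 80))
print(table.translate(8080))  # ('172.17.0.2', 80)
print(table.translate(9090))  # None
```

Rules are keyed by protocol as well as port, which is why publishing 8080/tcp does nothing for UDP traffic arriving on the same port.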

Think of it this way: Port mapping is like a receptionist at a hotel. You call the hotel’s main number (host:8080), and the receptionist forwards your call to Room 204 (container:80). The iptables DNAT rule IS the receptionist.


Host Network

docker run --network host: the container uses the host’s network stack directly. It shares the host’s network namespace instead of getting its own, so -p port mappings have no effect.

When to use:

  • Maximum network performance (no NAT overhead, no bridge hop)
  • Applications that need to bind to many ports dynamically
  • Network monitoring tools that need to see all host traffic

When NOT to use:

  • Multiple containers that need the same port (port conflicts)
  • Any security-sensitive deployment (no network isolation at all)
  • Production services (usually) — you lose container network isolation

ELI5: Host network is like giving your container a master key to the entire building’s phone system. It can use any phone line (port) directly, but it can also accidentally pick up calls meant for other residents.


Overlay Network (Multi-Host)

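Overlay networks connect containers across multiple Docker hosts. Docker’s overlay driver is what Swarm uses. Kubernetes does not use the Docker driver, but CNI plugins such as Flannel build the same kind of VXLAN overlay.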

Host A (192.168.1.10)              Host B (192.168.1.20)
┌──────────────────┐              ┌──────────────────┐
│ Container A      │              │ Container B      │
│ 10.0.0.2         │              │ 10.0.0.3         │
└───────┬──────────┘              └───────┬──────────┘
        │                                 │
┌───────┴──────────┐              ┌───────┴──────────┐
│ Overlay network  │              │ Overlay network  │
│ VXLAN tunnel     ├──────────────┤ VXLAN tunnel     │
│ (encapsulates)   │  UDP 4789    │ (decapsulates)   │
└───────┬──────────┘              └───────┬──────────┘
        │                                 │
┌───────┴──────────────────────────────────┴──────────┐
│                Physical Network                       │
└──────────────────────────────────────────────────────┘

How it works:

  1. Container A sends a packet to 10.0.0.3 (Container B’s overlay IP)
  2. Docker encapsulates the packet inside a VXLAN UDP packet (port 4789)
  3. Outer header: Host A → Host B
  4. Host B’s VTEP decapsulates and delivers to Container B
  5. Container B sees a packet from 10.0.0.2 — no idea it crossed physical networks
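
The encapsulation in steps 2 to 4 can be sketched in a few lines of Python. This is an illustration of the 8-byte VXLAN header only, under the assumption that the kernel adds the outer UDP, IP, and Ethernet headers; the function names and the VNI value are hypothetical.

```python
import struct
from typing import Tuple

VXLAN_PORT = 4789            # UDP destination port used by VXLAN
VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid


def vxlan_encap(inner_frame: bytes, vni: int) -> bytes:
    """Prefix an inner Ethernet frame with the 8-byte VXLAN header."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + 24 reserved bits.
    # Word 2: VNI (24 bits) + 8 reserved bits.
    header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + inner_frame


def vxlan_decap(payload: bytes) -> Tuple[int, bytes]:
    """Return (vni, inner_frame) from a VXLAN UDP payload."""
    word1, word2 = struct.unpack("!II", payload[:8])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return word2 >> 8, payload[8:]


inner = b"frame from 10.0.0.2 to 10.0.0.3"
packet = vxlan_encap(inner, vni=4097)
vni, recovered = vxlan_decap(packet)
print(len(packet) - len(inner), vni, recovered == inner)  # 8 4097 True
```

The inner frame comes out byte-for-byte unchanged, which is why Container B cannot tell that the packet crossed a physical network.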

ELI5: Overlay networking is like putting a letter inside another envelope. The inner envelope says “To: Container B.” The outer envelope says “To: Host B.” The post office (physical network) only reads the outer envelope. Host B opens the outer envelope and delivers the inner letter to the right container.

Performance impact: VXLAN adds ~50 bytes overhead per packet and CPU cost for encap/decap. For most workloads, negligible. For high-throughput, latency-sensitive workloads (10Gbps+), consider macvlan or host networking.
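
Where the roughly 50 bytes come from can be checked with simple arithmetic. The sketch below assumes an IPv4 underlay with no VLAN tag and no IP options; the constant and function names are illustrative.

```python
OUTER_ETHERNET = 14  # outer MAC header (no 802.1Q tag)
OUTER_IPV4 = 20      # outer IPv4 header without options
OUTER_UDP = 8
VXLAN_HEADER = 8
INNER_ETHERNET = 14  # the container's own Ethernet header travels inside the tunnel


def wire_overhead() -> int:
    """Extra bytes on the wire per encapsulated frame."""
    return OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER


def overlay_mtu(underlay_mtu: int = 1500) -> int:
    """Largest inner IP packet that fits in one underlay IP packet."""
    return underlay_mtu - (OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET)


print(wire_overhead())    # 50
print(overlay_mtu())      # 1450
print(overlay_mtu(9000))  # 8950
```

This is also why overlay interfaces usually run with an MTU of 1450 on a standard 1500-byte network: a mismatch here is a common cause of large transfers stalling while small requests work.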


Macvlan and IPvlan

Both give containers direct access to the physical network — no NAT, no bridge.

| Feature | Macvlan | IPvlan L2 | IPvlan L3 |
| --- | --- | --- | --- |
| MAC address | Unique per container | Shared with host | Shared with host |
| L2 adjacency | Yes | Yes | No (routed) |
| Promiscuous mode | Required on parent | Not required | Not required |
| Cloud compatibility | Often blocked (AWS, etc.) | Better cloud support | Best cloud support |
| Use case | Legacy VLANs, bare metal | Cloud, single MAC required | Pure routing, no broadcast |

When to use macvlan/ipvlan: You have legacy applications that need to be on a specific VLAN, need direct L2 connectivity, or need performance without NAT overhead. Common in network appliance containers and telecom workloads.

Common mistake: Using macvlan in AWS/GCP/Azure. Cloud providers typically don’t allow multiple MAC addresses per NIC (or charge for it). Use ipvlan L2 instead.


DNS and Service Discovery

Docker’s Built-in DNS (User-Defined Networks)

Docker runs an embedded DNS server at 127.0.0.11 for containers on user-defined networks.

Container A → DNS query "web" → 127.0.0.11 → resolves to 172.18.0.3

Resolution order:

  1. Container’s /etc/hosts entries
  2. Docker’s embedded DNS (container name → IP)
  3. Host’s DNS configuration (for external domains)
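
The resolution order above amounts to a first-match chain. Here is a small Python sketch of that chain; the resolve function, the dictionaries, and the addresses are hypothetical stand-ins, not Docker's resolver code.

```python
from typing import Callable, Dict, Optional


def resolve(
    name: str,
    hosts_file: Dict[str, str],
    embedded_dns: Dict[str, str],
    upstream: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Try each source in the order listed above; the first hit wins."""
    if name in hosts_file:
        return hosts_file[name]
    if name in embedded_dns:
        return embedded_dns[name]
    return upstream(name)


hosts = {"localhost": "127.0.0.1"}
containers = {"web": "172.18.0.3"}           # hypothetical container-name records
external = {"example.com": "203.0.113.10"}   # stand-in for the host's resolver

print(resolve("web", hosts, containers, external.get))          # 172.18.0.3
print(resolve("example.com", hosts, containers, external.get))  # 203.0.113.10
print(resolve("missing", hosts, containers, external.get))      # None
```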

Docker Compose Service Discovery

services:
  web:
    image: nginx
  api:
    image: myapi
    # can reach nginx at http://web:80 — service name = DNS name

In Compose, the service name IS the DNS name. All services on the same network can resolve each other by name.

ELI5: Docker DNS works like a hotel directory. You don’t need to know what room number (IP) the “restaurant” (container) is in — you just ask the front desk (DNS server) for “restaurant” and they tell you the room number. If the restaurant moves to a different room (container restarts with new IP), the directory updates automatically.

Kubernetes DNS (CoreDNS)

<service-name>.<namespace>.svc.cluster.local

Every Service in K8s gets a DNS entry. Pods resolve services by name within the same namespace or by FQDN across namespaces.

# Same namespace
curl http://api-service:8080

# Cross namespace
curl http://api-service.production.svc.cluster.local:8080

Kubernetes Networking Model

K8s has three fundamental networking requirements:

  1. Every pod gets its own IP — no NAT between pods
  2. Pods can communicate with any other pod — across nodes, without NAT
  3. The IP a pod sees itself as = the IP others see it as
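
One common way to satisfy these requirements is to carve a cluster-wide range into one non-overlapping block per node, so every pod IP is unique and routable without NAT. The sketch below assumes a 10.244.0.0/16 cluster range split into /24 blocks; the range, node names, and the pod_ip helper are illustrative, and real allocation is done by the CNI plugin or controller manager.

```python
import ipaddress

cluster_cidr = ipaddress.ip_network("10.244.0.0/16")  # hypothetical cluster range
node_names = ["node-a", "node-b", "node-c"]

# Give each node its own /24 slice of the cluster range.
node_cidrs = dict(zip(node_names, cluster_cidr.subnets(new_prefix=24)))


def pod_ip(node: str, index: int) -> ipaddress.IPv4Address:
    """Pick the index-th pod address in a node's range.

    Assumption: .0 is the network address and .1 is kept for the node's bridge.
    """
    return node_cidrs[node][index + 2]


print(node_cidrs["node-b"])                                    # 10.244.1.0/24
print(pod_ip("node-a", 0), pod_ip("node-b", 0))                # 10.244.0.2 10.244.1.2
print(node_cidrs["node-a"].overlaps(node_cidrs["node-b"]))     # False
```

Because the blocks never overlap, a pod's address identifies both the pod and the node to route to, which is what lets other pods reach it at the same IP it sees for itself.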

CNI Plugins

The Container Network Interface (CNI) is how K8s implements networking. The cluster operator chooses a CNI plugin.

| CNI Plugin | Networking | Network Policy | Extra Features | Best For |
| --- | --- | --- | --- | --- |
| Calico | L3 (BGP), VXLAN, IPIP | Yes (full) | eBPF dataplane option | General purpose, performance |
| Cilium | eBPF-based | Yes (L3-L7) | Service mesh, observability, encryption | Advanced security, L7 policies |
| Flannel | VXLAN overlay | No (needs Calico for policy) | Simple, minimal config | Small clusters, simplicity |
| Weave | VXLAN, sleeve | Yes | Encryption, multicast | Small/medium clusters |
| AWS VPC CNI | AWS ENI-based | Yes (with Calico) | Native VPC IP per pod | AWS EKS (default) |
| Azure CNI | Azure VNET | Yes | Native Azure IP per pod | AKS (default) |

Decision framework: Starting a new cluster? Cilium if you want eBPF + L7 policies + service mesh. Calico if you want proven stability + good performance. Flannel if you want simplicity and don’t need network policies. Cloud-managed cluster? Use the cloud’s default CNI (VPC CNI on EKS, Azure CNI on AKS) unless you need features they don’t provide.

Service Types

| Type | How It Works | Use Case |
| --- | --- | --- |
| ClusterIP | Internal IP reachable only within cluster | Service-to-service communication |
| NodePort | Opens a port (30000-32767) on every node | Dev/test, direct node access |
| LoadBalancer | Provisions cloud LB pointing to NodePorts | Production external traffic |
| ExternalName | CNAME record to external DNS | DNS alias for external services (no proxying) |

Client → Cloud LB → NodePort on any node → kube-proxy → Pod

Common mistake: Using NodePort in production. NodePort exposes a port on EVERY node, limits you to ports 30000-32767 by default, and leaves it to clients or an external balancer to spread traffic across nodes and cope with node failures. Use LoadBalancer (cloud) or Ingress (for HTTP).

Ingress and Gateway API

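| Feature | Ingress (legacy) | Gateway API (modern) |
| --- | --- | --- |
| Maturity | Stable, widely supported | GA since Gateway API v1.0 (October 2023); versioned independently of K8s releases |
| L7 routing | Host/path-based | Host/path/header/method-based |
| TCP/UDP support | Depends on controller | Dedicated TCPRoute/UDPRoute resources (experimental channel as of v1.0) |
| Multi-tenancy | No built-in model | GatewayClass for shared infra |
| TLS termination | Yes | Yes, more flexible |
| Traffic splitting | Controller-specific annotations | Native (weight-based) |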

Think of it this way: Ingress is like a simple reception desk — it looks at your name and directs you to the right room. Gateway API is like a smart building management system — it can route you based on your name, your department, your badge type, and even split visitors across multiple rooms for A/B testing.


Network Policies (Kubernetes)

By default, all pods can talk to all pods. Network Policies restrict this.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-ingress
spec:
  podSelector: {}      # applies to all pods in namespace
  policyTypes:
  - Ingress             # block all incoming traffic
  # no ingress rules = deny all

Key principles:

  • Network Policies are additive — if any policy allows traffic, it’s allowed
  • An empty podSelector {} means “all pods in this namespace”
  • If no NetworkPolicy selects a pod, all traffic is allowed (default allow)
  • Once ANY NetworkPolicy selects a pod for a direction (ingress or egress), traffic in that direction that no policy allows is denied (default deny for that pod)
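
These principles can be checked with a small evaluation model. The sketch below is a deliberately simplified illustration: it covers ingress only and ignores namespaces, ports, and IP blocks. The IngressPolicy class and the labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Labels = Dict[str, str]


def matches(selector: Labels, labels: Labels) -> bool:
    """An empty selector matches everything, like podSelector: {}."""
    return all(labels.get(k) == v for k, v in selector.items())


@dataclass
class IngressPolicy:
    pod_selector: Labels
    allow_from: List[Labels] = field(default_factory=list)  # peer label selectors


def ingress_allowed(policies: List[IngressPolicy], src: Labels, dst: Labels) -> bool:
    selecting = [p for p in policies if matches(p.pod_selector, dst)]
    if not selecting:
        return True  # no policy selects the pod: default allow
    # Additive: one matching allow rule in any selecting policy is enough.
    return any(matches(peer, src) for p in selecting for peer in p.allow_from)


deny_all = IngressPolicy(pod_selector={})
allow_web = IngressPolicy(pod_selector={"app": "api"}, allow_from=[{"app": "web"}])
web, api, db = {"app": "web"}, {"app": "api"}, {"app": "db"}

print(ingress_allowed([], web, api))                     # True
print(ingress_allowed([deny_all], web, api))             # False
print(ingress_allowed([deny_all, allow_web], web, api))  # True
print(ingress_allowed([deny_all, allow_web], db, api))   # False
```

Note that adding allow_web never removes access granted elsewhere; policies can only widen what is allowed for the pods they select.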

Common mistake: Assuming NetworkPolicies are enforced by default. They’re NOT — your CNI plugin must support them. Flannel doesn’t. Calico and Cilium do. Deploy Flannel + NetworkPolicies = false sense of security.

Interview pattern — “Defense in depth”:

  1. Default deny all ingress in every namespace
  2. Explicitly allow only needed communication paths
  3. Use labels for pod selection (not IPs — IPs change)
  4. Separate namespaces for different trust boundaries

Debugging Network Issues

Essential Commands

# What network is a container on?
docker inspect <container> | jq '.[0].NetworkSettings.Networks'

# DNS resolution inside container
docker exec <container> nslookup <service-name>

# See all iptables NAT rules Docker created
iptables -t nat -L -n -v

# Trace packet path (on host)
tcpdump -i docker0 -n port 80

# K8s: debug DNS resolution
kubectl run dnstest --image=busybox:1.36 --rm -it -- nslookup kubernetes.default

# K8s: check if NetworkPolicy is blocking
kubectl describe networkpolicy -n <namespace>

# K8s: check service endpoints
kubectl get endpoints <service-name>

Common Problems & Root Causes

| Symptom | Likely Cause | Fix |
| --- | --- | --- |
| Container can’t reach internet | Missing NAT rule or DNS config | Check iptables -t nat -L, check /etc/resolv.conf |
| Container can’t reach other container by name | Using default bridge (no DNS) | Use user-defined network |
| Service unreachable in K8s | No endpoints (pods not ready) | kubectl get endpoints, check readiness probes |
| Intermittent timeouts in K8s | DNS resolution issues (ndots:5) | Add dnsConfig.options: [{name: ndots, value: "2"}] |
| Cross-node pod communication fails | CNI plugin misconfigured, firewall | Check CNI pods, check node firewall rules for VXLAN (4789) |

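Why this matters: The ndots:5 default in Kubernetes means any name with fewer than five dots is first tried against every search domain in the pod’s /etc/resolv.conf (typically three cluster domains, plus any inherited from the node) before it is tried as written. A lookup for api.example.com therefore generates several failing queries, often doubled for A and AAAA records, before the one that succeeds. This causes latency and DNS server load. Setting ndots: 2 fixes most cases, because names with two or more dots are then tried as written first.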
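
The effect of ndots on query order can be sketched as follows. This models glibc-style resolver behavior under the assumption of the three cluster search domains a pod in the default namespace usually gets; other resolver libraries may behave differently, and the function name is illustrative.

```python
from typing import List


def candidate_queries(name: str, search: List[str], ndots: int) -> List[str]:
    """Order in which a glibc-style resolver tries names for one lookup."""
    if name.endswith("."):
        return [name]  # trailing dot: already fully qualified, no search list
    absolute = name + "."
    via_search = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        return [absolute] + via_search
    return via_search + [absolute]


search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
slow = candidate_queries("api.example.com", search, ndots=5)
fast = candidate_queries("api.example.com", search, ndots=2)

# Position of the query that actually succeeds for an external name.
print(slow.index("api.example.com.") + 1)  # 4
print(fast.index("api.example.com.") + 1)  # 1
```

Short in-cluster names such as api-service still go through the search list with ndots: 2, so service discovery keeps working. Appending a trailing dot to an external name skips the search list entirely.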


Key Takeaways for Interviews

  1. “Explain Docker networking” → veth pairs connect containers to bridges. iptables DNAT handles port mapping. User-defined bridges provide DNS. Overlay uses VXLAN for multi-host.
  2. “How does K8s networking work?” → CNI plugin assigns IPs. Every pod gets a routable IP. kube-proxy (iptables/IPVS) implements Services. CoreDNS handles discovery.
  3. “Bridge vs overlay vs macvlan?” → Bridge = single host + NAT. Overlay = multi-host + VXLAN encapsulation. Macvlan = direct physical network access.
  4. “How do you secure pod communication?” → Network Policies (deny all + explicit allow), mTLS via service mesh (Istio/Cilium), namespace isolation.
  5. “CNI plugin selection?” → Cilium for eBPF + L7. Calico for stability + performance. Cloud default for managed clusters.