WebSockets, SSE & Real-time Communication

HTTP was designed for a world where the client always starts the conversation. That assumption held for decades. Then we built chat apps, live dashboards, multiplayer games, and collaborative editors — and suddenly the server needed to talk first. This note is about how the web learned to handle that.

The Real-time Problem

HTTP is fundamentally request-response. Client sends a request, server responds, connection closes (or persists for reuse, but still idle). The server has no way to say “hey, something happened” without the client asking first.

This created a set of hacks before real solutions existed.

Polling

Client sends a request every N seconds: “anything new?” Server answers yes or no.

Problems: worst-case latency equals your poll interval (a 1s poll means up to 1s of delay). Most responses are an empty “no, nothing new.” At scale, you’re hammering your servers with useless requests: 1000 users polling every second is 1000 req/sec, almost all of it returning nothing.

Long Polling

Client sends a request. Server holds it open — doesn’t respond until there’s actual data (or a timeout). Client gets the response, immediately opens a new long-poll request.

Better than polling because you get near-instant delivery. But HTTP wasn’t designed for held-open connections. Every held connection ties up a server thread (in traditional thread-per-request models). Timeouts and reconnection logic get messy. Proxies and load balancers sometimes kill idle connections.

This is how Comet worked. It shipped real products — Gmail used it. But it was always fighting the protocol.

ELI5: Polling is like calling a restaurant every 5 minutes asking if your table is ready. Long polling is like calling and staying on hold until they tell you. WebSockets is like them calling you back when it’s ready.

WebSockets

WebSocket is a full-duplex, persistent connection over TCP. Once established, either side can send data at any time. The connection stays open until explicitly closed.

The Upgrade Handshake

WebSocket starts as HTTP. This is clever — it means it works through the same port 80/443, through existing infrastructure.

GET /chat HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

After the 101 Switching Protocols response, the TCP connection is handed off. It’s no longer HTTP. The same bytes on the wire are now speaking WebSocket framing.

The Sec-WebSocket-Key and Sec-WebSocket-Accept are a challenge-response to prevent cross-protocol attacks. The server hashes the key with a magic GUID using SHA-1 and returns the base64 result. It’s not cryptographic security — it just prevents a plain HTTP cache from being tricked into replaying a WebSocket handshake.
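The hash step is small enough to sketch in a few lines. This assumes Node.js and its built-in crypto module; the function name computeAcceptValue is made up for illustration, while the GUID is the fixed string from RFC 6455.

```javascript
const crypto = require("node:crypto");

// Fixed GUID from RFC 6455; every conforming server appends this same string.
const WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Hypothetical helper: derive Sec-WebSocket-Accept from the client's Sec-WebSocket-Key.
function computeAcceptValue(secWebSocketKey) {
  return crypto
    .createHash("sha1")
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest("base64");
}

// The key from the handshake above yields the accept value shown in the 101 response.
console.log(computeAcceptValue("dGhlIHNhbXBsZSBub25jZQ==")); // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

A client that gets back any other value should fail the connection, since the peer evidently did not understand the WebSocket handshake.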

Frame-Based Protocol

After handshake, communication is frame-based:

Frame Type	Opcode	Purpose
Text	0x1	UTF-8 text data
Binary	0x2	Binary data
Close	0x8	Graceful close
Ping	0x9	Heartbeat check
Pong	0xA	Heartbeat reply
Continuation	0x0	Fragment of previous frame

A WebSocket frame has: a 2-byte header (FIN bit, opcode, MASK bit, payload length), optional extended length, optional masking key, then payload. Client-to-server frames are always masked; server-to-client are not.
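To make the header layout concrete, here is a minimal sketch of encoding and decoding a single unfragmented text frame in Node.js. It only handles payloads up to 125 bytes (no extended length), and the function names are illustrative rather than taken from any library.

```javascript
// Encode one unfragmented text frame with a payload of at most 125 bytes.
// maskKey is a 4-byte Buffer for client-to-server frames, or null for server-to-client.
function encodeSmallTextFrame(text, maskKey) {
  const payload = Buffer.from(text, "utf8");
  if (payload.length > 125) throw new RangeError("sketch handles only 7-bit lengths");
  const header = Buffer.from([
    0x80 | 0x1, // FIN = 1, opcode 0x1 (text)
    (maskKey ? 0x80 : 0x00) | payload.length, // MASK bit plus 7-bit payload length
  ]);
  if (!maskKey) return Buffer.concat([header, payload]);
  const masked = Buffer.from(payload); // copy, then XOR each byte with the rotating key
  for (let i = 0; i < masked.length; i++) masked[i] ^= maskKey[i % 4];
  return Buffer.concat([header, maskKey, masked]);
}

// Decode a frame produced by the encoder above.
function decodeSmallFrame(frame) {
  const fin = (frame[0] & 0x80) !== 0;
  const opcode = frame[0] & 0x0f;
  const isMasked = (frame[1] & 0x80) !== 0;
  const length = frame[1] & 0x7f;
  if (length > 125) throw new RangeError("extended lengths are out of scope for this sketch");
  const payloadStart = isMasked ? 6 : 2; // the masking key occupies bytes 2..5
  const payload = Buffer.from(frame.subarray(payloadStart, payloadStart + length));
  if (isMasked) {
    const key = frame.subarray(2, 6);
    for (let i = 0; i < payload.length; i++) payload[i] ^= key[i % 4];
  }
  return { fin, opcode, isMasked, text: payload.toString("utf8") };
}
```

With this encoder a 20-byte payload becomes a 26-byte frame when masked (2 header + 4 key + 20 payload) and a 22-byte frame when unmasked.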

Message fragmentation: a large message can be split into continuation frames. The FIN bit marks the last fragment. This lets the protocol handle arbitrarily large messages without buffering the whole thing.

No HTTP headers per message: headers were negotiated once at handshake. Sending a 20-byte message costs 2 bytes of framing from server to client, or 6 bytes from client to server (the 4-byte masking key is added), versus often 500+ bytes of headers for an HTTP request.

ws:// vs wss://

ws:// — plain TCP. wss:// — TLS. Always use wss:// in production. Same reasons as HTTP vs HTTPS, plus: some corporate proxies actively block or corrupt unencrypted WebSocket traffic.

ELI5: WebSocket is like upgrading from sending letters (HTTP) to getting a phone line installed. First you send a letter to ask for the phone line (the HTTP upgrade), then you just talk directly anytime.

Common mistake: Opening a new WebSocket connection per message. One connection, many messages. Connections are expensive. If you’re creating WebSocket connections like HTTP requests, you’re doing it wrong.

Server-Sent Events (SSE)

SSE is the often-overlooked middle ground. It’s simpler than WebSocket and sufficient whenever data only needs to flow from server to client.

The idea: the server sends a response that never ends. The Content-Type is text/event-stream. The browser reads it as a stream and fires events as data arrives.

Wire Format

data: Hello world\n\n

event: user-joined\n
data: {"userId": "abc123"}\n\n

id: 42\n
event: message\n
data: {"text": "hi"}\n
retry: 3000\n\n

Each event is separated by a blank line. Fields:

data: carries the payload (for a multi-line payload, repeat data: on every line; the blank line after the last field ends the event)
event: sets a custom event type (default is “message”)
id: sets the event ID; the browser remembers the last one it received
retry: sets the reconnect delay in milliseconds
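On the server, producing this format is mostly string concatenation. A minimal sketch, assuming a hypothetical formatSseEvent helper whose output you would write to the open response:

```javascript
// Hypothetical helper: serialize one event into the text/event-stream wire format.
function formatSseEvent({ id, event, data, retry }) {
  const lines = [];
  if (id !== undefined) lines.push(`id: ${id}`);
  if (event !== undefined) lines.push(`event: ${event}`);
  if (retry !== undefined) lines.push(`retry: ${retry}`);
  // Each line of a multi-line payload gets its own data: field.
  for (const line of String(data).split("\n")) lines.push(`data: ${line}`);
  // The trailing blank line tells the browser the event is complete.
  return lines.join("\n") + "\n\n";
}

process.stdout.write(formatSseEvent({ event: "user-joined", data: '{"userId": "abc123"}' }));
```

The browser joins repeated data: lines with newlines before handing the result to the event listener, so multi-line payloads survive the round trip.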

Automatic Reconnection

This is SSE’s killer feature. If the connection drops, the browser automatically reconnects and sends the last id: it received in a Last-Event-ID request header. A server that keeps recent events can use that ID to resume from where the client left off. The reconnection logic comes for free; the replay is still the server’s job.

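// Opens a long-lived GET request; the browser reconnects on its own if it drops.
const es = new EventSource('/events');
// Fires only for events sent with "event: user-joined".
es.addEventListener('user-joined', (e) => {
  const data = JSON.parse(e.data); // e.data is the raw string from the data: field
  console.log('user joined:', data.userId);
});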
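Resuming from Last-Event-ID only works if the server remembers recent events. One way to sketch that, assuming numeric IDs and a hypothetical in-memory EventHistory class (a real deployment would more likely keep this in a shared store):

```javascript
// Hypothetical bounded history of recent events, kept so reconnecting clients can catch up.
class EventHistory {
  constructor(capacity) {
    this.capacity = capacity;
    this.events = [];
    this.nextId = 1;
  }

  append(data) {
    const event = { id: this.nextId++, data };
    this.events.push(event);
    if (this.events.length > this.capacity) this.events.shift(); // forget the oldest
    return event;
  }

  // lastEventId is the raw Last-Event-ID header value (a string), or undefined on first connect.
  eventsAfter(lastEventId) {
    const last = Number.parseInt(lastEventId, 10);
    if (Number.isNaN(last)) return []; // fresh client: nothing to replay
    return this.events.filter((e) => e.id > last);
  }
}

const history = new EventHistory(100);
history.append("a");
history.append("b");
history.append("c");
console.log(history.eventsAfter("1").map((e) => e.id)); // [ 2, 3 ]
```

If the client was away for longer than the history covers, the replay has a gap, so a design like this usually needs a fallback such as a full state refresh.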

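SSE is just HTTP. It works over HTTP/1.1 and HTTP/2 (where each SSE stream is one HTTP/2 stream, so you can have many on one connection). Over HTTP/1.1, browsers cap connections at roughly six per origin, so a handful of open tabs can exhaust the limit. It works through reverse proxies and CDNs that understand HTTP, provided they don’t buffer the response, and through corporate firewalls that block WebSocket upgrades.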

ELI5: SSE is like subscribing to a news feed. You open the paper once and new articles keep appearing. You don’t need to ask for each one.

Common mistake: Using WebSocket when you only need server-push. SSE needs less code, has better proxy support, and reconnects automatically. If the client never sends real-time data, SSE is almost always the right choice.

WebSocket vs SSE vs Long Polling

Feature	Long Polling	SSE	WebSocket
Direction	Server → Client	Server → Client	Bidirectional
Protocol	HTTP	HTTP	WebSocket
Auto-reconnect	Manual	Built-in	Manual
Browser support	Universal	All modern browsers	All modern browsers
Proxy/firewall	Works	Works	Sometimes broken
Multiplexing	One req/response	One stream	One connection
Overhead	High (full HTTP)	Low	Very low
Complexity	Medium	Low	Medium-High
HTTP/2 benefit	Yes	Yes (multiple streams)	No (its own connection)

Decision Framework

Ask one question first: does the client need to send real-time data to the server?

No → use SSE. Notifications, live feeds, dashboards, progress updates.
Yes → is it occasional (user actions, form submits)? Regular HTTP POST still works fine alongside SSE.
Yes, truly bidirectional and latency-critical → WebSocket. Chat, collaborative editing, multiplayer games.

ELI5: SSE is a one-way radio broadcast. WebSocket is a walkie-talkie. Don’t buy a walkie-talkie when you just want to listen to the news.

WebSocket at Scale

Single-server WebSocket is easy. Scaling to 100K or 1M connections is where it gets interesting.

The C10K Problem

In the early 2000s, handling 10,000 concurrent connections on one server seemed impossible. The problem: traditional servers used one thread per connection. 10,000 threads meant 10,000 × ~1MB stack = 10GB RAM, plus context-switching overhead.

The fix: event-driven I/O with epoll (Linux) or kqueue (BSD/macOS). Instead of a thread waiting on each connection, one thread monitors all file descriptors with a single syscall. When any connection has data ready, handle it. Otherwise, sleep.

Node.js, nginx, Go’s net package, and Erlang/Elixir all use this model. That’s why Node can handle 100K+ WebSocket connections on modest hardware.

Memory Per Connection

Idle connections cost memory. Rough numbers:

Runtime	Idle WS memory
Node.js	~10KB
Go	~4KB (goroutine stack)
Elixir/Erlang	~2-3KB (lightweight process)
Java (virtual threads)	~1-2KB
Nginx (C)	~1KB

At 100K connections: Node.js uses ~1GB for connection state alone (100,000 × 10KB). Erlang uses ~200-300MB. These numbers matter at scale.
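The arithmetic is worth keeping in a helper when you do capacity planning. A tiny sketch using the rough per-connection figures from the table (decimal units, so 1000KB = 1MB); idleMemoryMB is an illustrative name:

```javascript
// Rough estimate of memory held by idle connections, in MB (1 MB = 1000 KB here).
function idleMemoryMB(connections, kbPerConnection) {
  return (connections * kbPerConnection) / 1000;
}

console.log(idleMemoryMB(100000, 10)); // 1000 (about 1GB, the Node.js row)
console.log(idleMemoryMB(100000, 2)); // 200 (low end of the Erlang row)
console.log(idleMemoryMB(100000, 3)); // 300 (high end of the Erlang row)
```

These are floor estimates: per-connection application state such as user objects, subscriptions, and send buffers comes on top.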

Heartbeats and Dead Connection Detection

TCP doesn’t tell you when the other side disappears silently (mobile app killed, laptop lid closed, NAT timeout). You find out when you try to write and get an error — which might be minutes later.

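Solution: WebSocket ping/pong control frames. Server sends a ping frame every 30 seconds. The client must respond with a pong (browsers do this automatically; the JavaScript API cannot send or observe pings). No pong within N seconds = dead connection, close it.

Don’t rely on TCP keepalive: the intervals are too long (default 2 hours on Linux), and keepalive probes only cover a single TCP hop, so they never pass through a proxy to the real client.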
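The bookkeeping behind dead-connection detection can be sketched without any networking. This assumes a hypothetical HeartbeatMonitor that the server updates whenever a pong arrives; timestamps are passed in explicitly so the logic is easy to test, where a real server would use Date.now() and a timer.

```javascript
// Hypothetical tracker: remembers when each connection last answered a ping.
class HeartbeatMonitor {
  constructor(timeoutMs) {
    this.timeoutMs = timeoutMs;
    this.lastSeenAt = new Map(); // connectionId -> timestamp of last pong (or connect)
  }

  track(connectionId, nowMs) {
    this.lastSeenAt.set(connectionId, nowMs);
  }

  recordPong(connectionId, nowMs) {
    if (this.lastSeenAt.has(connectionId)) this.lastSeenAt.set(connectionId, nowMs);
  }

  // Returns, and stops tracking, every connection silent for longer than the timeout.
  reapDead(nowMs) {
    const dead = [];
    for (const [id, seenAt] of this.lastSeenAt) {
      if (nowMs - seenAt > this.timeoutMs) {
        dead.push(id);
        this.lastSeenAt.delete(id);
      }
    }
    return dead;
  }
}

const monitor = new HeartbeatMonitor(45000); // 30s ping interval plus 15s of grace
monitor.track("a", 0);
monitor.track("b", 0);
monitor.recordPong("a", 30000); // only "a" answers the first ping
console.log(monitor.reapDead(50000)); // [ 'b' ]
```

The caller is expected to close the sockets for the returned IDs; the monitor only decides who is dead.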

Horizontal Scaling

WebSocket connections are stateful — a client is connected to a specific server instance. When user A (on server 1) messages user B (on server 2), server 1 can’t directly deliver to server 2’s connection.

Standard solution: pub/sub message broker.

Client A → Server 1 → Publish to Redis channel → Server 2 → Client B

Redis Pub/Sub works at moderate scale. NATS is faster. Kafka gives you durability and replay.

The pattern: each server subscribes to topics relevant to its connected clients. When a message comes in, publish to the broker. Every server that has subscribers for that message receives it and delivers to its local connections.
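Here is the shape of that pattern with the broker replaced by an in-memory stand-in. InMemoryBroker and GatewayServer are hypothetical names, and a real setup would put Redis, NATS, or Kafka where the Map is; the point is only that server 1 never touches server 2’s connections directly.

```javascript
// Stand-in for Redis/NATS: topic -> set of subscriber callbacks.
class InMemoryBroker {
  constructor() {
    this.subscribers = new Map();
  }
  subscribe(topic, handler) {
    if (!this.subscribers.has(topic)) this.subscribers.set(topic, new Set());
    this.subscribers.get(topic).add(handler);
  }
  publish(topic, message) {
    for (const handler of this.subscribers.get(topic) ?? []) handler(message);
  }
}

// One WebSocket server instance; it only knows about its own local clients.
class GatewayServer {
  constructor(name, broker) {
    this.name = name;
    this.broker = broker;
  }
  // Simulates a client connecting: the returned inbox array stands in for the socket.
  connect(userId) {
    const inbox = [];
    this.broker.subscribe(`user:${userId}`, (message) => inbox.push(message));
    return inbox;
  }
  // Sending never looks up the recipient locally; it always goes through the broker.
  send(toUserId, text) {
    this.broker.publish(`user:${toUserId}`, { text, via: this.name });
  }
}

const broker = new InMemoryBroker();
const server1 = new GatewayServer("server-1", broker);
const server2 = new GatewayServer("server-2", broker);
server1.connect("A");
const inboxB = server2.connect("B");
server1.send("B", "hi");
console.log(inboxB); // [ { text: 'hi', via: 'server-1' } ]
```

Subscribing per user keeps the sketch short; subscribing per room or per channel follows the same shape with fewer, busier topics.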

ELI5: Imagine a building where each floor has a phone operator. If you’re on floor 3 and want to reach someone on floor 7, you don’t take the stairs — you call the intercom and floor 7’s operator delivers the message.

Load Balancing WebSockets

WebSocket connections are long-lived. This breaks some assumptions:

Sticky sessions (IP hash or cookie): Route all requests from a client to the same server instance. Simple but creates uneven load as connections close unevenly.
WebSocket-aware L7 load balancer: nginx, HAProxy, AWS ALB all support WebSocket proxying. They must forward the Upgrade header and handle connection persistence.

Common mistake: Using an HTTP load balancer that terminates idle connections after 60 seconds. Many cloud load balancers have idle timeout defaults that will kill quiet WebSocket connections. Raise the idle timeout, or send ping frames more often than the timeout so the connection never looks idle.

Connection Limits

At high scale:

File descriptors: Each connection is an FD. Default Linux limit is 1024 per process. Raise with ulimit -n and /etc/security/limits.conf. Set to 1M+ for WebSocket servers.
Port exhaustion: Only applies to outbound connections. Servers accept on one port, so this isn’t a concern for WebSocket servers (it matters for outbound HTTP clients).
Memory: As above, roughly 1-10KB per idle connection depending on the runtime.

WebTransport

WebTransport is the next evolution, built on HTTP/3 (QUIC). It’s shipping in Chrome and Firefox, still evolving.

What it adds over WebSocket:

Multiple independent streams over one connection: no head-of-line blocking between streams
Unreliable datagrams: UDP-like send-and-forget, no retransmission
Lower latency: QUIC folds the transport and TLS handshakes into one round trip, with 0-RTT on resumed connections
Connection migration: survives IP address changes (mobile roaming)

Feature	WebSocket	WebTransport
Protocol	TCP	QUIC (UDP)
Streams	One	Multiple multiplexed
Reliability	Guaranteed	Reliable streams + unreliable datagrams
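HOL blocking	Yes (single TCP stream)	No (between streams)
Connection setup	TCP + TLS handshakes, then HTTP upgrade	Combined QUIC + TLS handshake (0-RTT on resumption)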
Status	Stable	Shipping, evolving

Use cases where WebTransport wins: real-time gaming (position updates can be lossy, order doesn’t matter), live video/audio where stale frames should be dropped, and anything where you currently work around WebSocket’s single-stream limitation.

ELI5: WebSocket is a single garden hose — reliable but one stream. WebTransport is multiple hoses bundled together, some reliable, some just spray-and-pray for when you need speed more than accuracy.

Real-time Architecture Patterns

Pub/Sub

The backbone of real-time at scale. Publishers emit events. Subscribers receive events matching their subscription. Publishers and subscribers don’t know about each other.

Redis Pub/Sub: simple, fast, in-memory, no persistence. Messages are lost if no subscriber is listening.
NATS: higher throughput than Redis; JetStream adds persistence.
Kafka: durable, replayable, high throughput. Overkill for most real-time chat, but right for event sourcing.

Fan-Out

One event → many clients. A stock price update goes to 50,000 connected users watching that ticker. Naively, that’s 50,000 individual writes. At scale, you need to optimize:

Do the fan-out (the write amplification) in the broker or gateway layer, not in application code
Consider read fan-out (clients pull from a shared feed) vs write fan-out (push to each)
WebSocket gateways that batch and broadcast efficiently
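One way a gateway can batch is to coalesce updates and serialize once per flush. A minimal sketch, assuming a hypothetical TickerBatcher and connection objects that expose a send(string) method:

```javascript
// Hypothetical batcher: keeps only the latest price per ticker between flushes.
class TickerBatcher {
  constructor() {
    this.pending = new Map(); // ticker -> latest price seen in this window
  }

  // A later update for the same ticker overwrites the earlier one.
  record(ticker, price) {
    this.pending.set(ticker, price);
  }

  // Serialize the batch once and write the same string to every connection.
  flush(connections) {
    if (this.pending.size === 0) return null;
    const payload = JSON.stringify(Object.fromEntries(this.pending));
    this.pending.clear();
    for (const connection of connections) connection.send(payload);
    return payload;
  }
}

const batcher = new TickerBatcher();
batcher.record("ACME", 10);
batcher.record("XYZ", 5);
batcher.record("ACME", 11); // replaces the stale ACME price
const sent = [];
batcher.flush([{ send: (message) => sent.push(message) }]);
console.log(sent); // [ '{"ACME":11,"XYZ":5}' ]
```

Calling flush on a timer (say every 50-100ms, an assumed figure) trades a little latency for far fewer writes, and the JSON encoding cost is paid once instead of once per client.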

Presence

Tracking who’s online sounds easy. It’s not.

Naive approach: when a client connects, mark them online. When they disconnect, mark offline. Problem: connections drop and reconnect. Mobile apps go background. Network hiccups. Is the user offline or just temporarily disconnected?

Real approach: heartbeats with TTL. Client sends a heartbeat every 15 seconds. Server stores user:{id}:last_seen with a 30-second TTL in Redis. If TTL expires, user is considered offline. Heartbeat refresh resets the TTL. This handles flaky connections gracefully.
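The TTL logic is easy to see in isolation. This sketch keeps expiry times in a Map instead of Redis and takes the current time as an argument; PresenceTracker is an illustrative name, and in Redis the same effect comes from writing the key with an expiry on every heartbeat.

```javascript
// Hypothetical in-memory version of "key with a TTL": userId -> expiry timestamp.
class PresenceTracker {
  constructor(ttlMs) {
    this.ttlMs = ttlMs;
    this.expiresAt = new Map();
  }

  // Every heartbeat pushes the expiry ttlMs into the future.
  heartbeat(userId, nowMs) {
    this.expiresAt.set(userId, nowMs + this.ttlMs);
  }

  isOnline(userId, nowMs) {
    const expiry = this.expiresAt.get(userId);
    return expiry !== undefined && nowMs < expiry;
  }
}

const presence = new PresenceTracker(30000); // 30-second TTL
presence.heartbeat("u1", 0);
presence.heartbeat("u1", 15000); // refresh: now valid until t = 45s
console.log(presence.isOnline("u1", 40000)); // true (one missed heartbeat is tolerated)
console.log(presence.isOnline("u1", 45000)); // false (TTL expired)
```

With a 15-second heartbeat and a 30-second TTL, a client can miss one heartbeat without flickering offline.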

Delivery Guarantees

Guarantee	Meaning	Use case
At-most-once	May be lost, never duplicated	Metrics, position updates
At-least-once	Never lost, may duplicate	Notifications, events
Exactly-once	Never lost, never duplicated	Payments, state changes

At-least-once with idempotent processing is the practical default. Exactly-once is expensive and usually unnecessary.
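Idempotent processing usually means deduplicating on a message ID. A minimal sketch, assuming every message carries a unique id field and using a hypothetical IdempotentConsumer wrapper:

```javascript
// Hypothetical wrapper: runs the handler at most once per message ID.
class IdempotentConsumer {
  constructor(handler) {
    this.handler = handler;
    this.seen = new Set(); // unbounded here; a real system would expire old IDs
  }

  // Returns true if the message was processed, false if it was a duplicate.
  receive(message) {
    if (this.seen.has(message.id)) return false;
    this.handler(message);
    this.seen.add(message.id); // mark only after the handler succeeds, so failures get retried
    return true;
  }
}

const applied = [];
const consumer = new IdempotentConsumer((message) => applied.push(message.body));
consumer.receive({ id: "m1", body: "hello" });
consumer.receive({ id: "m1", body: "hello" }); // redelivery from the broker, ignored
consumer.receive({ id: "m2", body: "world" });
console.log(applied); // [ 'hello', 'world' ]
```

Because the ID is recorded after the handler returns, a handler that throws leaves the message eligible for redelivery, which is the at-least-once half of the bargain.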

Backpressure

What happens when clients can’t keep up with the message rate? If you just buffer, you’ll OOM. Options:

Drop old messages (ring buffer): good for live data where stale = irrelevant
Drop new messages: good when old data matters (chat history)
Signal slow consumers: pause sending, apply rate limiting
Disconnect slow consumers: drastic, but sometimes necessary for real-time systems
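The first option, dropping old messages, fits in a few lines as a bounded per-client queue. This is a sketch with an illustrative DropOldestQueue class; it uses a plain array for clarity, where a true ring buffer would avoid the cost of shift() at large capacities.

```javascript
// Hypothetical per-client send queue that never grows past its capacity.
class DropOldestQueue {
  constructor(capacity) {
    this.capacity = capacity;
    this.items = [];
    this.dropped = 0; // worth exporting as a metric: it shows who the slow consumers are
  }

  push(item) {
    if (this.items.length === this.capacity) {
      this.items.shift(); // discard the stalest message to make room
      this.dropped++;
    }
    this.items.push(item);
  }

  // Hands over everything queued, e.g. when the socket becomes writable again.
  drain() {
    const out = this.items;
    this.items = [];
    return out;
  }
}

const queue = new DropOldestQueue(3);
for (let price = 1; price <= 5; price++) queue.push(price);
console.log(queue.drain(), queue.dropped); // [ 3, 4, 5 ] 2
```

The dropped counter also gives you a natural trigger for the last option: disconnect a client once it crosses some threshold.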

Practical Implementation

Socket.IO

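A library with its own protocol layered on WebSocket and HTTP long polling. By default it connects over long polling and upgrades to WebSocket when possible, so it keeps working where WebSocket is blocked. Adds rooms (grouped connections), namespaces (logical separation), and acknowledgements (RPC-like responses). A plain WebSocket client cannot talk to a Socket.IO server.

Use it when: you need broad compatibility and don’t want to handle reconnection/fallback yourself. Don’t use it if you only target modern browsers on networks that allow WebSocket: native WebSocket is simpler and avoids the abstraction overhead.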

MQTT

Publish-subscribe designed for constrained devices (IoT). Messages are tiny. QoS levels: 0 (fire and forget), 1 (at-least-once), 2 (exactly-once). Runs over TCP or WebSocket.

Use it for IoT, sensor data, any environment where bandwidth is precious.

Phoenix Channels (Elixir)

WebSocket abstraction that maps connections to lightweight Erlang processes. Built-in presence tracking (Phoenix Presence uses CRDTs for distributed presence). Handles massive concurrency gracefully — 2M WebSocket connections on a single server has been benchmarked.

If you’re building real-time at scale and choosing a stack, Elixir/Phoenix deserves serious consideration.

When to Use a Managed Service

Build your own if: you have specific requirements, want full control, or scale is predictable and manageable.

Use Pusher/Ably/Firebase if: you don’t want to operate WebSocket infrastructure, you need SDKs across many platforms, or you’re early stage and time-to-market matters more than cost.

The hidden cost of managed services: at scale, they’re expensive. At 1M monthly active users, you’re often better off running your own.

Security

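Authentication at connect: Authenticate before allowing a WebSocket upgrade. The browser WebSocket API can’t set custom headers, so pass a token as a query param (wss://api.example.com/ws?token=xxx) or in an initial message. Query strings end up in access logs, so make query-param tokens short-lived and single-use. Don’t rely on cookies for cross-origin scenarios.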

Per-message authorization: Just because a user is connected doesn’t mean they can subscribe to any channel. Validate permissions when subscribing to channels, not just at connect.

Origin checking: The WebSocket handshake includes an Origin header. Validate it server-side to prevent cross-site WebSocket hijacking (CSWSH). Browsers enforce CORS for HTTP but not for WebSocket upgrades.
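The check itself should be an exact match against an allowlist. A minimal sketch, with isAllowedOrigin and the example origin as placeholders; you would call it on the Origin header of the upgrade request and refuse the handshake when it returns false.

```javascript
// Hypothetical check run during the upgrade request, before accepting the handshake.
function isAllowedOrigin(originHeader, allowedOrigins) {
  // A missing header is rejected here; loosen this only if non-browser clients must connect.
  if (typeof originHeader !== "string") return false;
  // Exact match only: prefix or substring checks let look-alike domains through.
  return allowedOrigins.has(originHeader);
}

const allowed = new Set(["https://app.example.com"]); // placeholder allowlist
console.log(isAllowedOrigin("https://app.example.com", allowed)); // true
console.log(isAllowedOrigin("https://app.example.com.evil.test", allowed)); // false
```

Origin is only trustworthy coming from browsers, since any other client can send whatever header it likes, so this guards against CSWSH and does not replace authentication.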

Common mistake: Treating an authenticated WebSocket connection as authorized for all operations forever. Token expiry, permission revocation, and user logout need to close or re-authenticate WebSocket connections.

Final Decision Table

Scenario	Recommended	Reason
Live notifications (no client send)	SSE	Simpler, auto-reconnect, proxy-friendly
Live dashboard / analytics	SSE	Server-push only
Chat application	WebSocket	Bidirectional
Collaborative editing	WebSocket	Low-latency bidirectional
Multiplayer game (critical state)	WebSocket	Bidirectional, reliable
Multiplayer game (position updates)	WebTransport	UDP-like, tolerates loss
IoT sensor data	MQTT over WebSocket	Lightweight, QoS options
Progress bars, job status	SSE	Server-push only
Occasional user actions	HTTP + SSE	SSE for push, POST for actions
Deep in corporate firewall	Long polling (fallback)	WebSocket/SSE may be blocked

Start with SSE. Upgrade to WebSocket only when you actually need the client to send real-time data. Consider WebTransport when you need multiple streams or can tolerate unreliable delivery.
