WebSockets, SSE & Real-time Communication
HTTP was designed for a world where the client always starts the conversation. That assumption held for decades. Then we built chat apps, live dashboards, multiplayer games, and collaborative editors — and suddenly the server needed to talk first. This note is about how the web learned to handle that.
The Real-time Problem
HTTP is fundamentally request-response. The client sends a request, the server responds, and the connection either closes or is kept alive for reuse, sitting idle until the client asks again. The server has no way to say “hey, something happened” without the client asking first.
This created a set of hacks before real solutions existed.
Polling
Client sends a request every N seconds: “anything new?” Server answers yes or no.
Problems: worst-case latency equals your poll interval (a 1s poll means up to 1s of delay). Most responses are an empty “no, nothing new.” At scale, you’re hammering your servers with useless requests: 1000 users polling every second = 1000 req/sec of mostly nothing.
Long Polling
Client sends a request. Server holds it open — doesn’t respond until there’s actual data (or a timeout). Client gets the response, immediately opens a new long-poll request.
Better than polling because you get near-instant delivery. But HTTP wasn’t designed for held-open connections. Every held connection ties up a server thread (in traditional thread-per-request models). Timeouts and reconnection logic get messy. Proxies and load balancers sometimes kill idle connections.
This is how Comet worked. It shipped real products — Gmail used it. But it was always fighting the protocol.
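Here is a minimal sketch of the server side of long polling in JavaScript (Node.js), matching the browser snippet later in this note. LongPollHub, wait(), and publish() are hypothetical names, not a library API. Each pending request is parked as a promise that resolves when a message arrives or when its timeout fires.

```javascript
// Minimal long-polling hub (hypothetical, not a library API).
// Each parked "request" is a promise that settles on publish or on timeout.
class LongPollHub {
  constructor() {
    this.waiters = new Set(); // requests currently held open
  }

  // Park a request. Resolves with [message] when something is published,
  // or with [] on timeout; either way the client should re-poll immediately.
  wait(timeoutMs) {
    return new Promise((resolve) => {
      const waiter = {
        resolve,
        timer: setTimeout(() => {
          this.waiters.delete(waiter);
          resolve([]);
        }, timeoutMs),
      };
      this.waiters.add(waiter);
    });
  }

  // Answer every held request with the new message and cancel their timers.
  publish(message) {
    for (const waiter of this.waiters) {
      clearTimeout(waiter.timer);
      waiter.resolve([message]);
    }
    this.waiters.clear();
  }
}
```

In a Node http handler you might `await hub.wait(25000)` and send the result as JSON. This sketch drops any message published while no request is parked. A production version would keep a per-client buffer or cursor so nothing published between polls is lost.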
ELI5: Polling is like calling a restaurant every 5 minutes asking if your table is ready. Long polling is like calling and staying on hold until they tell you. WebSockets is like them calling you back when it’s ready.
WebSockets
WebSocket is a full-duplex, persistent connection over TCP. Once established, either side can send data at any time. The connection stays open until explicitly closed.
The Upgrade Handshake
WebSocket starts as HTTP. This is clever: it runs on the same ports (80/443) and passes through existing HTTP infrastructure.
```http
GET /chat HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```
After the 101 Switching Protocols response, the same TCP connection stops speaking HTTP. From then on, the bytes on the wire are WebSocket frames.
The Sec-WebSocket-Key and Sec-WebSocket-Accept headers form a challenge-response that guards against cross-protocol attacks. The server appends the fixed GUID 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 to the key, hashes the result with SHA-1, and returns the base64-encoded digest. It’s not cryptographic security. It proves the server actually understood the WebSocket handshake, and it keeps a plain HTTP cache from being tricked into replaying one.
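The accept computation is small enough to show in full. This sketch uses Node’s built-in crypto module. computeAccept is a hypothetical helper name, and the sample key is the example from RFC 6455 (the same one in the handshake above).

```javascript
const crypto = require("node:crypto");

// Fixed GUID defined by RFC 6455; every server uses the same value.
const WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Derive Sec-WebSocket-Accept from the client's Sec-WebSocket-Key:
// base64(SHA-1(key + GUID)).
function computeAccept(secWebSocketKey) {
  return crypto
    .createHash("sha1")
    .update(secWebSocketKey + WS_GUID)
    .digest("base64");
}

console.log(computeAccept("dGhlIHNhbXBsZSBub25jZQ==")); // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```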
Frame-Based Protocol
After handshake, communication is frame-based:
| Frame Type | Opcode | Purpose |
|---|---|---|
| Text | 0x1 | UTF-8 text data |
| Binary | 0x2 | Binary data |
| Close | 0x8 | Graceful close |
| Ping | 0x9 | Heartbeat check |
| Pong | 0xA | Heartbeat reply |
| Continuation | 0x0 | Next fragment of a fragmented message |
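A WebSocket frame has a 2-byte base header (FIN bit, three reserved bits, 4-bit opcode, MASK bit, 7-bit payload length), an optional extended length (16 bits when the 7-bit field is 126, 64 bits when it is 127), an optional 4-byte masking key, then the payload. Client-to-server frames must be masked; server-to-client frames must not be.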
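This sketch encodes single frames using Node.js Buffers. encodeFrame and its options object are hypothetical and not part of any library. The code follows the bit layout described above, including the extended-length cases and client-side masking.

```javascript
const crypto = require("node:crypto");

// Encode one WebSocket frame following the RFC 6455 base framing layout.
// opcode: 0x1 text, 0x2 binary, 0x0 continuation, 0x8-0xA control frames.
// fin:  false for every fragment except the last one of a message.
// mask: true for client-to-server frames, false for server-to-client.
function encodeFrame(opcode, payload, { fin = true, mask = false } = {}) {
  const data = Buffer.isBuffer(payload) ? payload : Buffer.from(payload, "utf8");
  const len = data.length;

  // Byte 1 holds the MASK bit plus a 7-bit length; 126 and 127 are escape
  // values meaning "a 16-bit" or "a 64-bit" extended length follows.
  let header;
  if (len < 126) {
    header = Buffer.alloc(2);
    header[1] = len;
  } else if (len < 65536) {
    header = Buffer.alloc(4);
    header[1] = 126;
    header.writeUInt16BE(len, 2);
  } else {
    header = Buffer.alloc(10);
    header[1] = 127;
    header.writeBigUInt64BE(BigInt(len), 2);
  }
  // Byte 0: FIN bit, three reserved bits (left 0), 4-bit opcode.
  header[0] = (fin ? 0x80 : 0x00) | (opcode & 0x0f);

  if (!mask) return Buffer.concat([header, data]);

  // Client frames: set MASK, append a random 4-byte key, XOR the payload.
  header[1] |= 0x80;
  const key = crypto.randomBytes(4);
  const masked = Buffer.alloc(len);
  for (let i = 0; i < len; i++) masked[i] = data[i] ^ key[i % 4];
  return Buffer.concat([header, key, masked]);
}
```

For example, `encodeFrame(0x1, "hello")` produces 7 bytes: a 2-byte header (0x81, 0x05) plus the 5-byte payload. With `{ mask: true }` the same call produces 11 bytes, because of the 4-byte masking key.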
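Message fragmentation: a large message can be split into a first frame (text or binary opcode) followed by continuation frames. The FIN bit marks the last fragment. This lets a sender stream a message of unknown length without buffering the whole thing first. Most receiving APIs, including the browser’s, still deliver only complete messages.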
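On the receiving side, the fragments have to be stitched back together. This sketch assumes frames have already been parsed into hypothetical `{ fin, opcode, payload }` objects. It enforces the ordering rules: a message starts with a text or binary opcode and continues with 0x0. It also lets control frames such as ping arrive between fragments, which RFC 6455 permits.

```javascript
// Reassemble fragmented messages from already-parsed frames, each assumed to
// look like { fin: boolean, opcode: number, payload: Buffer }.
function createReassembler(onMessage, onControl) {
  let opcode = null; // opcode of the message in progress (0x1 or 0x2)
  let parts = [];

  return function handleFrame(frame) {
    // Control frames (0x8-0xF) are never fragmented and may arrive between
    // the fragments of a data message; handle them right away.
    if (frame.opcode >= 0x8) {
      onControl(frame);
      return;
    }
    if (frame.opcode === 0x0) {
      if (opcode === null) throw new Error("continuation without a message in progress");
    } else {
      if (opcode !== null) throw new Error("new message started before previous one finished");
      opcode = frame.opcode;
    }
    parts.push(frame.payload);
    if (frame.fin) {
      // Last fragment: join the pieces and decode text messages as UTF-8.
      const data = Buffer.concat(parts);
      onMessage(opcode === 0x1 ? data.toString("utf8") : data);
      opcode = null;
      parts = [];
    }
  };
}
```

This receiver buffers each message until FIN. That matches what browser WebSocket APIs expose: one `message` event per complete message.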
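No HTTP headers per message: headers were negotiated once at handshake. A 20-byte message costs 2 bytes of framing server-to-client, or 6 bytes client-to-server (the extra 4 are the masking key). An HTTP request for the same message often carries several hundred bytes of headers.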
ws:// vs wss://
ws:// — plain TCP. wss:// — TLS. Always use wss:// in production. Same reasons as HTTP vs HTTPS, plus: some corporate proxies actively block or corrupt unencrypted WebSocket traffic.
ELI5: WebSocket is like upgrading from sending letters (HTTP) to getting a phone line installed. First you send a letter to ask for the phone line (the HTTP upgrade), then you just talk directly anytime.
Common mistake: Opening a new WebSocket connection per message. One connection, many messages. Connections are expensive. If you’re creating WebSocket connections like HTTP requests, you’re doing it wrong.
Server-Sent Events (SSE)
SSE is the often-overlooked middle ground. It’s simpler than WebSocket and sufficient for most server-push use cases.
The idea: the server sends a response that never ends. The Content-Type is text/event-stream. The browser reads it as a stream and fires events as data arrives.
Wire Format
```text
data: Hello world\n\n
event: user-joined\n
data: {"userId": "abc123"}\n\n
id: 42\n
event: message\n
data: {"text": "hi"}\n
retry: 3000\n\n
```
Each event is terminated by a blank line. Fields:
- data: the payload. A multi-line payload is sent as several data: lines, which the browser joins with newlines.
- event: custom event type (default is “message”)
- id: event ID; the browser remembers the last one it received
- retry: reconnect delay in milliseconds
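Here is a sketch of a serializer for this format. formatEvent is a hypothetical helper. It assumes the event, id, and retry values contain no line breaks, and it splits multi-line data across several data: lines as the format requires.

```javascript
// Serialize one event in text/event-stream format. Assumes event, id, and
// retry contain no line breaks; data may be a string or a JSON-able value.
function formatEvent({ data, event, id, retry }) {
  let out = "";
  if (event !== undefined) out += `event: ${event}\n`;
  if (id !== undefined) out += `id: ${id}\n`;
  if (retry !== undefined) out += `retry: ${retry}\n`;
  const text = typeof data === "string" ? data : JSON.stringify(data);
  // Each line of the payload gets its own "data:" field; the browser joins
  // them back together with "\n".
  for (const line of text.split("\n")) out += `data: ${line}\n`;
  return out + "\n"; // blank line terminates the event
}
```

For example, `formatEvent({ event: "user-joined", data: { userId: "abc123" } })` yields `event: user-joined\ndata: {"userId":"abc123"}\n\n`.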
Automatic Reconnection
This is SSE’s killer feature. If the connection drops, the browser automatically reconnects (after the retry: delay) and sends the last ID it received in a Last-Event-ID request header. The server can then resume from where it left off, provided it kept the events the client missed. You get reconnection for free, without writing any client-side retry logic.
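```javascript
// Connect to the SSE endpoint; the browser reconnects automatically on drop
const es = new EventSource('/events');

// Listen for the custom "user-joined" event type (sent as "event: user-joined")
es.addEventListener('user-joined', (e) => {
  const data = JSON.parse(e.data);
  console.log('user joined:', data.userId);
});
```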
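Resuming after a reconnect takes only a few lines on the server. Here is a minimal Node.js sketch. It assumes an unbounded in-memory log; a real server would cap the log or back it with a store such as Redis. publish, write, and sseHandler are hypothetical names. On reconnect, the handler reads the Last-Event-ID request header and replays everything newer before it starts streaming live events.

```javascript
// Minimal SSE endpoint with Last-Event-ID resume. The event log is an
// unbounded in-memory array purely for illustration.
const log = []; // every event ever published, in id order
let nextId = 1;
const clients = new Set(); // open responses receiving live events

function write(res, { id, event, data }) {
  res.write(`id: ${id}\nevent: ${event}\ndata: ${data}\n\n`);
}

// Record an event and push it to every connected client.
function publish(event, payload) {
  const entry = { id: nextId++, event, data: JSON.stringify(payload) };
  log.push(entry);
  for (const res of clients) write(res, entry);
}

function sseHandler(req, res) {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
  });
  // On reconnect the browser sends the last id it saw; replay anything newer.
  const lastId = Number(req.headers["last-event-id"] || 0);
  for (const entry of log) {
    if (entry.id > lastId) write(res, entry);
  }
  clients.add(res);
  req.on("close", () => clients.delete(res));
}
```

Wire it up with `require("node:http").createServer(sseHandler).listen(8080)`. Every replayed event carries its own `id:`, so the browser’s stored ID stays current and repeated drops still resume correctly.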
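SSE is just HTTP. It works over HTTP/1.1 and HTTP/2. On HTTP/2, each SSE stream is one HTTP/2 stream, so many can share a single connection. On HTTP/1.1, each stream holds a connection open, and browsers cap connections per origin at around six. That limits how many SSE streams (for example, across tabs) can be open at once. SSE works through reverse proxies and CDNs that understand HTTP, and through corporate firewalls that block WebSocket upgrades.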
ELI5: SSE is like subscribing to a news feed. You open the paper once and new articles keep appearing. You don’t need to ask for each one.
Common mistake: Using WebSocket when you only need server-push. SSE is less code, better proxy support, and auto-reconnects. If the client never sends real-time data, SSE is almost always the right choice.
WebSocket vs SSE vs Long Polling
| Feature | Long Polling | SSE | WebSocket |
|---|---|---|---|
| Direction | Server → Client | Server → Client | Bidirectional |
| Protocol | HTTP | HTTP | WebSocket |
| Auto-reconnect | Manual | Built-in | Manual |
| Browser support | Universal | All modern browsers | All modern browsers |
| Proxy/firewall | Works | Works | Sometimes broken |
| Multiplexing | One req/response | One stream | One connection |
| Overhead | High (full HTTP) | Low | Very low |
| Complexity | Medium | Low | Medium-High |
| HTTP/2 benefit | Yes | Yes (multiple streams) | No (its own connection) |
Decision Framework
Ask one question first: does the client need to send real-time data to the server?
- No → use SSE. Notifications, live feeds, dashboards, progress updates.
- Yes, but only occasionally (user actions, form submits) → SSE for push, plus regular HTTP POST for the actions.
- Yes, truly bidirectional and latency-critical → WebSocket. Chat, collaborative editing, multiplayer games.
ELI5: SSE is a one-way radio broadcast. WebSocket is a walkie-talkie. Don’t buy a walkie-talkie when you just want to listen to the news.
WebSocket at Scale
Single-server WebSocket is easy. Scaling to 100K or 1M connections is where it gets interesting.
The C10K Problem
In the early 2000s, handling 10,000 concurrent connections on one server seemed impossible. The problem: traditional servers used one thread per connection. 10,000 threads meant 10,000 × ~1MB stack = 10GB RAM, plus context-switching overhead.
The fix: event-driven I/O with epoll (Linux) or kqueue (BSD/macOS). Instead of a thread waiting on each connection, one thread monitors all file descriptors with a single syscall. When any connection has data ready, handle it. Otherwise, sleep.
Node.js, nginx, Go’s net package, and Erlang/Elixir all use this model. That’s why Node can handle 100K+ WebSocket connections on modest hardware.
Memory Per Connection
Idle connections cost memory. Rough numbers:
| Runtime | Idle WS memory |
|---|---|
| Node.js | ~10KB |
| Go | ~4KB (goroutine stack) |
| Elixir/Erlang | ~2-3KB (lightweight process) |
| Java (virtual threads) | ~1-2KB |
| Nginx (C) | ~1KB |
At 100K connections: Node.js uses ~1GB for connection state alone. Erlang uses ~200-300MB. These numbers matter at scale.
Heartbeats and Dead Connection Detection
TCP doesn’t tell you when the other side disappears silently (mobile app killed, laptop lid closed, NAT timeout). You find out when you try to write and get an error — which might be minutes later.
Solution: WebSocket ping/pong control frames, which detect dead peers above the TCP layer. The server sends a ping frame every 30 seconds, and the client must respond with a pong (browsers do this automatically). No pong within N seconds = dead connection, close it.
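Don’t rely on TCP keepalive. The default timing is far too slow (2 hours before the first probe on Linux). Proxies and load balancers also terminate TCP themselves, so keepalive probes never reach the other end.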
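Here is a sketch of the sweep logic, modeled on the common “isAlive” pattern used with Node’s ws library. The conn objects are assumed to expose ping() and terminate(), as ws’s WebSocket does. HeartbeatMonitor itself is hypothetical. With this pattern, a peer that dies is detected within two intervals.

```javascript
// Ping/pong sweep using the "isAlive" pattern. Each conn is assumed to have
// ping() and terminate() methods.
class HeartbeatMonitor {
  constructor() {
    this.alive = new Map(); // conn -> did it answer the last ping?
  }
  add(conn) {
    this.alive.set(conn, true);
  }
  remove(conn) {
    this.alive.delete(conn);
  }
  // Call from the connection's pong handler.
  onPong(conn) {
    if (this.alive.has(conn)) this.alive.set(conn, true);
  }
  // Call once per interval (e.g. every 30s).
  tick() {
    for (const [conn, answered] of this.alive) {
      if (!answered) {
        // No pong since the previous tick: treat the peer as dead.
        conn.terminate();
        this.alive.delete(conn);
        continue;
      }
      this.alive.set(conn, false); // must be flipped back by a pong
      conn.ping();
    }
  }
}
```

To wire it up, call `monitor.onPong(conn)` from each connection’s pong event and `monitor.tick()` from a `setInterval` at the heartbeat interval.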
Horizontal Scaling
WebSocket connections are stateful — a client is connected to a specific server instance. When user A (on server 1) messages user B (on server 2), server 1 can’t directly deliver to server 2’s connection.
Standard solution: pub/sub message broker.
Client A → Server 1 → Publish to Redis channel → Server 2 → Client B
Redis Pub/Sub works at moderate scale. NATS is faster. Kafka gives you durability and replay.
The pattern: each server subscribes to topics relevant to its connected clients. When a message comes in, publish to the broker. Every server that has subscribers for that message receives it and delivers to its local connections.
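Here is a sketch of that pattern with an in-process stand-in for the broker, so it runs without Redis or NATS. Broker and GatewayServer are hypothetical classes. A real deployment would swap Broker for the broker’s client library. It would also unsubscribe when a server’s last local subscriber for a topic leaves.

```javascript
// In-process stand-in for Redis Pub/Sub, NATS, etc. (hypothetical).
class Broker {
  constructor() {
    this.subs = new Map(); // topic -> Set of handler functions
  }
  subscribe(topic, handler) {
    if (!this.subs.has(topic)) this.subs.set(topic, new Set());
    this.subs.get(topic).add(handler);
  }
  publish(topic, msg) {
    for (const handler of this.subs.get(topic) ?? []) handler(msg);
  }
}

// One WebSocket server instance. It subscribes to a topic on the broker the
// first time one of its local clients joins, and delivers broker messages
// only to its own local connections.
class GatewayServer {
  constructor(broker) {
    this.broker = broker;
    this.local = new Map(); // topic -> Set of local connections
  }
  join(topic, conn) {
    if (!this.local.has(topic)) {
      this.local.set(topic, new Set());
      this.broker.subscribe(topic, (msg) => this.deliver(topic, msg));
    }
    this.local.get(topic).add(conn);
  }
  // A local client sent a message: publish it instead of delivering directly,
  // so clients on other servers receive it too.
  send(topic, msg) {
    this.broker.publish(topic, msg);
  }
  deliver(topic, msg) {
    for (const conn of this.local.get(topic)) conn.send(msg);
  }
}
```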
ELI5: Imagine a building where each floor has a phone operator. If you’re on floor 3 and want to reach someone on floor 7, you don’t take the stairs — you call the intercom and floor 7’s operator delivers the message.
Load Balancing WebSockets
WebSocket connections are long-lived. This breaks some assumptions:
- Sticky sessions (IP hash or cookie): Route all requests from a client to the same server instance. Simple but creates uneven load as connections close unevenly.
- WebSocket-aware L7 load balancer: nginx, HAProxy, and AWS ALB all support WebSocket proxying. They must forward the Upgrade and Connection headers and keep the upgraded connection open.
Common mistake: Using a load balancer whose idle timeout kills connections after 60 seconds. Many cloud load balancers ship with idle timeout defaults in that range, which silently drop quiet WebSocket connections. Raise the timeout, or send heartbeats more often than it fires.
Connection Limits
At high scale:
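- File descriptors: Each connection is an FD. The default soft limit on Linux is typically 1024 per process. Raise it with ulimit -n and /etc/security/limits.conf; set it to 1M+ for WebSocket servers.
- Port exhaustion: Only applies to outbound connections. A server accepts every client on one listening port (each connection is distinguished by the client’s address and port), so this isn’t a concern for the WebSocket server itself. It does matter for outbound clients, including a proxy or load balancer opening connections to backend WebSocket servers.
- Memory: As above, roughly 1-10KB per idle connection depending on runtime.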
WebTransport
WebTransport is the next evolution, built on HTTP/3 (QUIC). It’s shipping in Chrome and Firefox, still evolving.
What it adds over WebSocket:
- Multiple independent streams over one connection — no head-of-line blocking between streams
- Unreliable datagrams — UDP-like send-and-forget, no retransmission
- Faster setup: QUIC merges the transport and TLS handshakes, and resumed connections can use 0-RTT
- Connection migration — survives IP address changes (mobile roaming)
| Feature | WebSocket | WebTransport |
|---|---|---|
| Protocol | TCP | QUIC (UDP) |
| Streams | One | Multiple multiplexed |
| Reliability | Guaranteed | Reliable streams + unreliable datagrams |
| HOL blocking | Yes | Not between streams |
| Connection setup | Separate TCP and TLS handshakes, then the upgrade | Fewer round trips (combined QUIC/TLS handshake) |
| Status | Stable | Shipping, evolving |
Use cases where WebTransport wins: real-time gaming (position updates can be lossy, and only the latest one matters), live video/audio where stale frames should be dropped, and anything where you currently work around WebSocket’s single-stream limitation.
ELI5: WebSocket is a single garden hose — reliable but one stream. WebTransport is multiple hoses bundled together, some reliable, some just spray-and-pray for when you need speed more than accuracy.
Real-time Architecture Patterns
Pub/Sub
The backbone of real-time at scale. Publishers emit events. Subscribers receive events matching their subscription. Publishers and subscribers don’t know about each other.
- Redis Pub/Sub: simple, fast, in-memory, no persistence. Messages are lost if no subscriber is listening.
- NATS: higher throughput than Redis; JetStream adds persistence.
- Kafka: durable, replayable, high throughput. Overkill for most real-time chat, but right for event sourcing.
Fan-Out
One event → many clients. A stock price update goes to 50,000 connected users watching that ticker. Naively, that’s 50,000 individual writes. At scale, you need to optimize:
- Do the write amplification (one event into N writes) at the broker or gateway layer, not in application code
- Consider read fan-out (clients pull from a shared feed) vs write fan-out (push to each)
- WebSocket gateways that batch and broadcast efficiently
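Here is a sketch of gateway-side batching. It assumes connections expose a send() method that accepts a Buffer, as ws’s WebSocket does. BatchingBroadcaster and getSubscribers are hypothetical. Events for a topic are collected and flushed periodically. Each batch is serialized once, and the same Buffer goes to every subscriber instead of being re-encoded per connection.

```javascript
// Collects events per topic and flushes them as one serialized batch.
// getSubscribers(topic) is an assumed lookup returning connections with send().
class BatchingBroadcaster {
  constructor(getSubscribers) {
    this.getSubscribers = getSubscribers;
    this.pending = new Map(); // topic -> array of events awaiting flush
  }
  enqueue(topic, event) {
    if (!this.pending.has(topic)) this.pending.set(topic, []);
    this.pending.get(topic).push(event);
  }
  // Serialize each topic's batch once and share the Buffer across subscribers.
  // Returns the number of socket writes performed.
  flush() {
    let writeCount = 0;
    for (const [topic, events] of this.pending) {
      const payload = Buffer.from(JSON.stringify(events));
      for (const conn of this.getSubscribers(topic)) {
        conn.send(payload);
        writeCount++;
      }
    }
    this.pending.clear();
    return writeCount;
  }
}
```

Calling `flush()` from something like `setInterval(() => b.flush(), 50)` trades up to 50ms of extra latency for far fewer writes. The right interval depends on how stale the data is allowed to get.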
Presence
Tracking who’s online sounds easy. It’s not.
Naive approach: when a client connects, mark them online. When they disconnect, mark offline. Problem: connections drop and reconnect. Mobile apps go background. Network hiccups. Is the user offline or just temporarily disconnected?
Real approach: heartbeats with TTL. Client sends a heartbeat every 15 seconds. Server stores user:{id}:last_seen with a 30-second TTL in Redis. If TTL expires, user is considered offline. Heartbeat refresh resets the TTL. This handles flaky connections gracefully.
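Here is an in-memory sketch of the TTL approach. It takes an injectable clock so expiry is easy to test. PresenceStore is hypothetical. Against Redis, each heartbeat could be a single `SET user:{id}:last_seen <timestamp> EX 30`, and “online” simply means the key still exists.

```javascript
// Presence via heartbeats with a TTL. `now` is injectable for testing.
class PresenceStore {
  constructor({ ttlMs = 30000, now = Date.now } = {}) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.expiresAt = new Map(); // userId -> timestamp when presence lapses
  }
  // Each heartbeat (e.g. every 15s) pushes the expiry forward by the TTL.
  heartbeat(userId) {
    this.expiresAt.set(userId, this.now() + this.ttlMs);
  }
  isOnline(userId) {
    const expiry = this.expiresAt.get(userId);
    if (expiry === undefined) return false;
    if (this.now() >= expiry) {
      this.expiresAt.delete(userId); // lazily expire, like a TTL'd key
      return false;
    }
    return true;
  }
  onlineUsers() {
    return [...this.expiresAt.keys()].filter((id) => this.isOnline(id));
  }
}
```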
Delivery Guarantees
| Guarantee | Meaning | Use case |
|---|---|---|
| At-most-once | May be lost, never duplicated | Metrics, position updates |
| At-least-once | Never lost, may duplicate | Notifications, events |
| Exactly-once | Never lost, never duplicated | Payments, state changes |
At-least-once with idempotent processing is the practical default. Exactly-once is expensive and usually unnecessary.
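Here is a sketch of at-least-once delivery plus idempotent processing. It assumes every message carries a unique `id`, which is a hypothetical field. The handler remembers recently processed IDs in a bounded, insertion-ordered Set and ignores redeliveries. The ID is recorded only after `handle()` succeeds, so a failure leaves the message eligible for retry.

```javascript
// Wrap a handler so redelivered messages (same id) are processed only once.
// maxRemembered bounds memory; the oldest IDs are forgotten first.
function createIdempotentHandler(handle, maxRemembered = 10000) {
  const seen = new Set(); // insertion-ordered, so the oldest ID comes first
  return function onMessage(msg) {
    if (seen.has(msg.id)) return false; // duplicate: already processed
    handle(msg); // if this throws, the id is not recorded and a retry can succeed
    seen.add(msg.id);
    if (seen.size > maxRemembered) {
      seen.delete(seen.values().next().value); // evict the oldest id
    }
    return true;
  };
}
```

The bound is a trade-off. A duplicate that arrives after its ID has been evicted will be processed again, so `maxRemembered` should cover the broker’s realistic redelivery window.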
Backpressure
What happens when clients can’t keep up with the message rate? If you buffer without limit, you’ll eventually run out of memory. Options:
- Drop old messages (ring buffer): good for live data where stale = irrelevant
- Drop new messages: good when old data matters (chat history)
- Signal slow consumers: pause sending, apply rate limiting
- Disconnect slow consumers: drastic, but sometimes necessary for real-time systems
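Here is a sketch of a per-client outbound queue that implements the first two policies. It is a fixed-capacity ring buffer that, when full, either overwrites the oldest entry or rejects the newest. BoundedQueue and its policy strings are hypothetical. A sender would push() into the queue and drain() it whenever the socket can accept more data.

```javascript
// Fixed-capacity ring buffer with a configurable overflow policy.
// "drop-oldest": overwrite the oldest item (live data, stale = irrelevant).
// "drop-newest": reject the incoming item (keep the existing backlog).
class BoundedQueue {
  constructor(capacity, policy = "drop-oldest") {
    this.buf = new Array(capacity);
    this.capacity = capacity;
    this.policy = policy;
    this.head = 0; // index of the oldest item
    this.size = 0;
    this.dropped = 0; // count of messages lost to overflow
  }
  push(msg) {
    if (this.size < this.capacity) {
      this.buf[(this.head + this.size) % this.capacity] = msg;
      this.size++;
      return;
    }
    this.dropped++;
    if (this.policy === "drop-oldest") {
      // Overwrite the oldest slot and advance head past it.
      this.buf[this.head] = msg;
      this.head = (this.head + 1) % this.capacity;
    }
    // "drop-newest": ignore msg, leave the queue unchanged.
  }
  // Remove and return all queued items, oldest first.
  drain() {
    const out = [];
    for (let i = 0; i < this.size; i++) out.push(this.buf[(this.head + i) % this.capacity]);
    this.head = 0;
    this.size = 0;
    return out;
  }
}
```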
Practical Implementation
Socket.IO
WebSocket with fallbacks. If WebSocket fails, drops to long polling automatically. Adds rooms (grouped connections), namespaces (logical separation), and acknowledgements (RPC-like responses).
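Use it when: you need broad compatibility and don’t want to handle reconnection/fallback yourself. Don’t use it if you only target modern browsers; native WebSocket is simpler and avoids the extra abstraction layer. Note that Socket.IO speaks its own protocol on top of WebSocket, so a plain WebSocket client can’t connect to a Socket.IO server.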
MQTT
Publish-subscribe protocol designed for constrained devices (IoT). Messages are tiny: the fixed header can be as small as 2 bytes. QoS levels: 0 (fire and forget), 1 (at-least-once), 2 (exactly-once). Runs over TCP or WebSocket.
Use it for IoT, sensor data, any environment where bandwidth is precious.
Phoenix Channels (Elixir)
WebSocket abstraction that maps each connection to a lightweight Erlang process. Built-in presence tracking (Phoenix Presence uses CRDTs for distributed presence). Handles massive concurrency gracefully: 2M WebSocket connections on a single server have been benchmarked.
If you’re building real-time at scale and choosing a stack, Elixir/Phoenix deserves serious consideration.
When to Use a Managed Service
Build your own if: you have specific requirements, want full control, or scale is predictable and manageable.
Use Pusher/Ably/Firebase if: you don’t want to operate WebSocket infrastructure, you need SDKs across many platforms, or you’re early stage and time-to-market matters more than cost.
The hidden cost of managed services: at scale, they’re expensive. At 1M monthly active users, you’re often better off running your own.
Security
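Authentication at connect: Authenticate before allowing a WebSocket upgrade. The browser WebSocket API can’t set custom headers such as Authorization. You therefore have two options: pass a token as a query param (wss://api.example.com/ws?token=xxx), or send it in the first message after connecting and accept nothing else until it validates. Query strings often end up in access logs, so prefer short-lived tokens there. Don’t rely on cookies for cross-origin scenarios.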
Per-message authorization: Just because a user is connected doesn’t mean they can subscribe to any channel. Validate permissions when subscribing to channels, not just at connect.
Origin checking: The WebSocket handshake includes an Origin header. Validate it server-side to prevent cross-site WebSocket hijacking (CSWSH). Browsers enforce CORS for HTTP but not for WebSocket upgrades.
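Here is a sketch of both handshake-time checks, written as a pure function over the upgrade request so it can be tested without a server. allowedOrigins and verifyToken are assumptions; verifyToken might wrap JWT verification. checkUpgrade is hypothetical. In Node, it would typically run in the http server’s 'upgrade' event handler, before the handshake completes.

```javascript
// Run on the HTTP upgrade request before completing the WebSocket handshake.
// allowedOrigins: Set of trusted origins (assumed configuration).
// verifyToken: assumed callback returning a user object, or null if invalid.
function checkUpgrade(req, { allowedOrigins, verifyToken }) {
  // CSWSH defense: browsers send Origin on every WebSocket handshake but do
  // not block cross-origin ones, so the server must check it.
  const origin = req.headers.origin;
  if (!origin || !allowedOrigins.has(origin)) return { ok: false, status: 403 };

  // req.url is a path like "/ws?token=xxx"; the base URL is only for parsing.
  const url = new URL(req.url, "http://localhost");
  const token = url.searchParams.get("token");
  const user = token ? verifyToken(token) : null;
  if (!user) return { ok: false, status: 401 };

  return { ok: true, user };
}
```

Rejecting a missing Origin also rejects non-browser clients, which aren’t required to send one. A server that also serves native apps may need a separate path that authenticates those by token alone.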
Common mistake: Treating an authenticated WebSocket connection as authorized for all operations forever. Token expiry, permission revocation, and user logout need to close or re-authenticate WebSocket connections.
Final Decision Table
| Scenario | Recommended | Reason |
|---|---|---|
| Live notifications (no client send) | SSE | Simpler, auto-reconnect, proxy-friendly |
| Live dashboard / analytics | SSE | Server-push only |
| Chat application | WebSocket | Bidirectional |
| Collaborative editing | WebSocket | Low-latency bidirectional |
| Multiplayer game (critical state) | WebSocket | Bidirectional, reliable |
| Multiplayer game (position updates) | WebTransport | UDP-like, tolerates loss |
| IoT sensor data | MQTT over WebSocket | Lightweight, QoS options |
| Progress bars, job status | SSE | Server-push only |
| Occasional user actions | HTTP + SSE | SSE for push, POST for actions |
| Behind a restrictive corporate firewall | Long polling (fallback) | WebSocket/SSE may be blocked |
Start with SSE. Upgrade to WebSocket only when you actually need the client to send real-time data. Consider WebTransport when you need multiple streams or can tolerate unreliable delivery.