HTTP/2
HTTP/2 is not a new protocol — it’s a new wire format for the same semantics you already know. Same methods, same status codes, same headers. Different everything underneath.
Understanding HTTP/2 properly means understanding what HTTP/1.1 got wrong first. Because HTTP/2’s design is almost entirely a response to those specific failures.
Why HTTP/2 Exists
HTTP/1.1’s Fundamental Problems
HTTP/1.1 (1999) was designed for a web that no longer exists. A typical page in 1999: one HTML file, a few images. A typical page today: 80-200 separate resources. The protocol never kept up.
Head-of-line blocking (HOL blocking)
HTTP/1.1 is strictly sequential. On a single connection, you send request 1, wait for response 1, then send request 2. Browsers added pipelining (send multiple requests without waiting), but it barely worked — if response 1 was slow, responses 2, 3, 4 all waited regardless. This is HOL blocking.
Browsers worked around it by opening 6-8 parallel TCP connections per domain. That’s the fix the web ran on for 15 years.
Header overhead
Every single request sends full headers — cookies, user-agent, accept-encoding, authorization — as plaintext. On a site that sets 10 cookies, that’s 500-1000 bytes of headers per request, repeated verbatim every time. No compression, no deduplication.
Text format
HTTP/1.1 is human-readable text. That’s nice for debugging but bad for performance. Parsing text requires more CPU than parsing binary, and text is less compact.
ELI5: HTTP/1.1 is like ordering food at a restaurant where you can only have one order per waiter, and you have to repeat your entire dietary history (“no gluten, no nuts, allergic to shellfish…”) every single time you order anything, even a glass of water.
SPDY → HTTP/2
Google announced SPDY in 2009 as an experiment. It proved multiplexing and header compression worked in practice. HTTP/2 (RFC 7540, 2015) is essentially a standardized, refined version of SPDY. Google deprecated SPDY after HTTP/2 shipped.
Semantic compatibility is the key design decision. HTTP/2 doesn’t change what an HTTP request means. GET /api/users means the same thing. Status 404 means the same thing. This let the upgrade happen at the infrastructure layer — load balancers, proxies, servers — without changing application code.
Binary Framing Layer
The biggest conceptual shift in HTTP/2: everything is a frame.
Frame Structure
+-----------------------------------------------+
|               Length (24 bits)                |
+---------------+---------------+---------------+
|   Type (8)    |   Flags (8)   |
+-+-------------+---------------+-------------------------------+
|R|                 Stream Identifier (31 bits)                 |
+=+=============================================================+
|                Frame Payload (0..2^24-1 bytes)                |
+---------------------------------------------------------------+
Every HTTP/2 message is a sequence of frames. The fixed 9-byte header tells you exactly how long the payload is, what kind of frame it is, which stream it belongs to, and what flags apply. You never have to scan for a newline or delimiter.
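The fixed header layout described above can be sketched in a few lines of Python. This is a minimal illustration, not a full HTTP/2 parser; the names `FrameHeader`, `parse_frame_header`, and `build_frame_header` are made up for this example.

```python
import struct
from typing import NamedTuple


class FrameHeader(NamedTuple):
    length: int     # payload length in bytes (24-bit field)
    type: int       # frame type code (8-bit field)
    flags: int      # type-specific flags (8-bit field)
    stream_id: int  # stream identifier (31-bit field)


def parse_frame_header(buf: bytes) -> FrameHeader:
    """Decode the fixed 9-byte header at the start of buf."""
    if len(buf) < 9:
        raise ValueError("need at least 9 bytes for a frame header")
    # The 24-bit length is read as one high byte plus a 16-bit word.
    hi, lo, ftype, flags, sid = struct.unpack("!BHBBI", buf[:9])
    length = (hi << 16) | lo
    stream_id = sid & 0x7FFFFFFF  # mask off the reserved R bit
    return FrameHeader(length, ftype, flags, stream_id)


def build_frame_header(length: int, ftype: int, flags: int, stream_id: int) -> bytes:
    """Encode a 9-byte header (inverse of parse_frame_header)."""
    return struct.pack(
        "!BHBBI", length >> 16, length & 0xFFFF, ftype, flags, stream_id & 0x7FFFFFFF
    )


# A DATA frame (type 0x0) with END_STREAM (flag 0x1), 5 payload bytes, stream 3.
example = parse_frame_header(bytes([0, 0, 5, 0x0, 0x1, 0, 0, 0, 3]))
```

Because the length is known up front, a reader can slice exactly `length` bytes for the payload and move straight on to the next frame.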
Frame Types
| Type | Purpose |
|---|---|
| DATA | Request/response body chunks |
| HEADERS | Request/response headers (compressed with HPACK) |
| PRIORITY | Stream dependency and weight hints |
| RST_STREAM | Immediately terminate a stream |
| SETTINGS | Connection configuration (window sizes, max streams) |
| PUSH_PROMISE | Server announces an upcoming push |
| PING | Round-trip measurement, keepalive |
| GOAWAY | Graceful connection shutdown |
| WINDOW_UPDATE | Flow control credit |
| CONTINUATION | Overflow from HEADERS frame |
DATA and HEADERS are the workhorses. The rest are control frames that manage the connection and its streams.
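For reference, here is a small sketch of the frame types as an enum, using the type codes from the HTTP/2 RFC. The helper `allowed_on` is a hypothetical name for this example; it encodes the rule that some control frames live only on stream 0 while stream-scoped frames never do.

```python
from enum import IntEnum


class FrameType(IntEnum):
    DATA = 0x0
    HEADERS = 0x1
    PRIORITY = 0x2
    RST_STREAM = 0x3
    SETTINGS = 0x4
    PUSH_PROMISE = 0x5
    PING = 0x6
    GOAWAY = 0x7
    WINDOW_UPDATE = 0x8
    CONTINUATION = 0x9


# Frames that apply to the whole connection and must use stream 0.
CONNECTION_ONLY = {FrameType.SETTINGS, FrameType.PING, FrameType.GOAWAY}


def allowed_on(frame_type: FrameType, stream_id: int) -> bool:
    """Return True if this frame type may appear on the given stream ID."""
    if frame_type is FrameType.WINDOW_UPDATE:
        return True  # valid for both a single stream and the connection
    if frame_type in CONNECTION_ONLY:
        return stream_id == 0
    return stream_id != 0
```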
Why Binary
Binary parsing is mechanical — read N bytes, done. Text parsing requires scanning character by character looking for \r\n, handling edge cases in whitespace, dealing with case-insensitive comparisons. Binary is faster, more compact, and harder to accidentally malform.
ELI5: HTTP/1.1 headers are like a handwritten letter you have to read word by word. HTTP/2 frames are like a form with fixed boxes — you know exactly where everything is before you start reading.
Streams and Multiplexing
What a Stream Is
A stream is a logical bidirectional channel within a single TCP connection. It has an ID, a state, and carries a sequence of frames. Multiple streams coexist on one connection simultaneously.
TCP Connection
│
├── Stream 1 ──→ [HEADERS] [DATA] [DATA] [DATA(END_STREAM)]
│ ↕ interleaved freely
├── Stream 3 ──→ [HEADERS] [DATA(END_STREAM)]
│
├── Stream 5 ──→ [HEADERS] [DATA] ...
│
└── Stream 7 ──→ [HEADERS] ...
The client sends frames from all active streams on the same TCP connection. The server reassembles each stream independently.
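Reassembly is conceptually just grouping frames by stream ID while preserving arrival order. A minimal sketch, assuming frames have already been parsed into `(stream_id, frame_type, payload)` tuples (a made-up representation for this example):

```python
from collections import defaultdict


def demultiplex(frames):
    """Group (stream_id, frame_type, payload) tuples by stream, keeping order."""
    streams = defaultdict(list)
    for stream_id, frame_type, payload in frames:
        streams[stream_id].append((frame_type, payload))
    return dict(streams)


# Frames from two streams interleaved on one connection.
wire = [
    (1, "HEADERS", b""),
    (3, "HEADERS", b""),
    (1, "DATA", b"he"),
    (3, "DATA", b"css"),
    (1, "DATA", b"llo"),
]

# Concatenate each stream's DATA payloads to recover its body.
bodies = {
    stream_id: b"".join(payload for kind, payload in frames if kind == "DATA")
    for stream_id, frames in demultiplex(wire).items()
}
```

Here `bodies` maps stream 1 to `b"hello"` and stream 3 to `b"css"`, even though their frames arrived interleaved.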
Stream ID Rules
- Client-initiated streams: odd numbers (1, 3, 5, …)
- Server-initiated streams (pushes): even numbers (2, 4, 6, …)
- Stream 0: reserved for connection-level control frames (SETTINGS, PING, GOAWAY)
- IDs are never reused within a connection. Once the ID space is exhausted, open a new connection
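These rules are easy to express in code. The sketch below uses hypothetical helper names (`stream_initiator`, `next_client_stream_id`) and relies on the stream identifier being a 31-bit field.

```python
from typing import Optional

MAX_STREAM_ID = 2**31 - 1  # stream identifiers are 31 bits wide


def stream_initiator(stream_id: int) -> str:
    """Classify a stream ID by who is allowed to open it."""
    if not 0 <= stream_id <= MAX_STREAM_ID:
        raise ValueError("stream ID out of range")
    if stream_id == 0:
        return "connection"
    return "client" if stream_id % 2 == 1 else "server"


def next_client_stream_id(last_used: int) -> Optional[int]:
    """Return the next odd ID, or None when the ID space is exhausted."""
    candidate = 1 if last_used == 0 else last_used + 2
    return candidate if candidate <= MAX_STREAM_ID else None
```

When `next_client_stream_id` returns `None`, the only option is a fresh connection.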
Stream States
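                  idle
                   │
           send/recv HEADERS
                   │
                   ▼
                  open
          ┌────────┴─────────┐
   send END_STREAM     recv END_STREAM
          │                  │
          ▼                  ▼
 half-closed (local)  half-closed (remote)
          │                  │
   recv END_STREAM     send END_STREAM
          │                  │
          └────────┬─────────┘
                   ▼
                closed

(RST_STREAM from either side moves an open or half-closed stream straight to closed. The reserved states used for server push are omitted.)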
END_STREAM flag on a DATA or HEADERS frame signals “I’m done sending on this stream.” Both sides reaching END_STREAM closes the stream.
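The stream lifecycle can be sketched as a small transition table. This is a simplified model: it ignores the reserved states used for server push, and the event names (`send_headers`, `recv_end_stream`, and so on) are invented for this example.

```python
# (current state, event) -> next state
TRANSITIONS = {
    ("idle", "send_headers"): "open",
    ("idle", "recv_headers"): "open",
    ("open", "send_end_stream"): "half-closed (local)",
    ("open", "recv_end_stream"): "half-closed (remote)",
    ("half-closed (local)", "recv_end_stream"): "closed",
    ("half-closed (remote)", "send_end_stream"): "closed",
}

RESET_EVENTS = {"send_rst_stream", "recv_rst_stream"}


def step(state: str, event: str) -> str:
    """Advance the stream state, raising on transitions the model forbids."""
    if event in RESET_EVENTS and state not in ("idle", "closed"):
        return "closed"  # RST_STREAM ends an active stream immediately
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"invalid transition: {event!r} in state {state!r}")


# A typical request/response from the client's point of view.
state = "idle"
for event in ("send_headers", "send_end_stream", "recv_end_stream"):
    state = step(state, event)
```

After the loop, `state` is `"closed"`: the client finished sending, then the server did.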
ELI5: Streams are like having multiple phone calls happening on the same wire at once. Each call has its own ID. The wire carries a chunk from call #1, then a chunk from call #3, then back to call #1. Each endpoint sorts the chunks by ID and reassembles each conversation independently.
HOL Blocking: Fixed at Application, Not TCP
This is critical. HTTP/2 solves application-level HOL blocking — a slow response for one resource no longer blocks other resources. But it does not solve TCP-level HOL blocking.
TCP delivers bytes in order. If a TCP segment is lost, all streams on that connection stall until retransmission arrives — even streams that have no data in the lost segment. One packet loss affects all 50 concurrent streams.
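A toy model makes the stall concrete. Assume, purely for illustration, that each TCP segment carries one frame and that `segment_owner` records which stream each segment belongs to. TCP can only hand the application the contiguous prefix of what it has received.

```python
def in_order_prefix(received: set, total: int) -> int:
    """Number of segments TCP can deliver: the contiguous run starting at 0."""
    count = 0
    while count < total and count in received:
        count += 1
    return count


# Which stream each segment's frame belongs to (hypothetical traffic).
segment_owner = [1, 3, 5, 1, 3, 5]
lost = 1                                  # segment 1 (stream 3) is dropped
received = {0, 2, 3, 4, 5}                # everything else arrived

delivered = in_order_prefix(received, len(segment_owner))
# Streams with data sitting in the receive buffer, waiting on the gap.
stalled = {segment_owner[i] for i in received if i >= delivered}
# Streams that lost nothing themselves but are stalled anyway.
innocent = stalled - {segment_owner[lost]}

received.add(lost)                        # the retransmission arrives
delivered_after = in_order_prefix(received, len(segment_owner))
```

Only one segment is deliverable until the retransmission lands, so streams 1 and 5 wait on a loss that belonged to stream 3.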
This is the primary motivation for HTTP/3’s switch to QUIC (UDP-based).
Common mistake: Assuming HTTP/2 eliminates HOL blocking entirely. It eliminates application-layer HOL blocking. TCP-layer HOL blocking is worse with HTTP/2 because one connection carries more streams — one loss event impacts more concurrent requests.
Header Compression (HPACK)
The Problem
HTTP/1.1 sends full plaintext headers every request. A typical authenticated API call sends:
- authorization: Bearer eyJ0eXAiOiJKV1Q... (500+ bytes)
- cookie: session_id=abc123; ... (100-300 bytes)
- user-agent: Mozilla/5.0 ... (100+ bytes)
- Various accept, content-type, cache-control headers
All of that, repeated on every single request.
HPACK Architecture
HPACK uses two tables:
Static table — 61 pre-defined entries that every HTTP/2 implementation knows:
| Index | Header Name | Header Value |
|---|---|---|
| 2 | :method | GET |
| 3 | :method | POST |
| 8 | :status | 200 |
| 13 | :status | 404 |
| 24 | cache-control | |
| 55 | set-cookie | |
A GET request to a known path can reference :method: GET as a single byte (index 2).
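The single-byte claim follows from HPACK's indexed header field representation: a leading 1 bit followed by a 7-bit index. A minimal sketch, limited to indices that fit in one byte (larger indices need HPACK's multi-byte integer encoding, which is left out here); the function names are made up for this example.

```python
# A few static table entries, keyed by index.
STATIC_TABLE = {
    2: (":method", "GET"),
    3: (":method", "POST"),
    8: (":status", "200"),
}


def encode_indexed(index: int) -> bytes:
    """Encode a fully indexed header field as one byte: 1xxxxxxx."""
    if not 1 <= index < 127:
        raise ValueError("this sketch only handles single-byte indices")
    return bytes([0x80 | index])


def decode_indexed(byte: int) -> int:
    """Recover the table index from a single-byte indexed field."""
    if not byte & 0x80:
        raise ValueError("not an indexed header field")
    return byte & 0x7F


encoded_get = encode_indexed(2)  # the whole ":method: GET" header
```

So `:method: GET` travels as the single byte `0x82` instead of a dozen or so bytes of text.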
Dynamic table — connection-specific. When the client sends a new header value not in the static table, it’s added to the dynamic table. Subsequent requests reference it by index.
First request: authorization: Bearer abc123 ← add to dynamic table at index 62
Second request: [index 62] ← single byte replaces entire header
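Here is a toy model of the dynamic table's indexing. It assumes no size limit and no eviction, both of which real HPACK has, and the class and method names are hypothetical. One detail it does capture: the newest entry always sits at index 62, so older entries shift to higher indices as new ones arrive.

```python
from collections import deque

STATIC_TABLE_SIZE = 61  # dynamic indices start right after the static table


class DynamicTable:
    def __init__(self):
        self._entries = deque()  # newest entry first

    def add(self, name: str, value: str) -> None:
        """Insert a header at the front; existing entries shift up by one."""
        self._entries.appendleft((name, value))

    def index_of(self, name: str, value: str):
        """Return the HPACK index for an exact match, or None."""
        for offset, entry in enumerate(self._entries):
            if entry == (name, value):
                return STATIC_TABLE_SIZE + 1 + offset
        return None

    def lookup(self, index: int):
        """Resolve a dynamic index back to its (name, value) pair."""
        return self._entries[index - STATIC_TABLE_SIZE - 1]


table = DynamicTable()
table.add("authorization", "Bearer abc123")
first_index = table.index_of("authorization", "Bearer abc123")

table.add("cookie", "session_id=abc123")
shifted_index = table.index_of("authorization", "Bearer abc123")
```

Both endpoints apply the same insertions in the same order, which is how they stay in agreement about what each index means.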
Huffman encoding compresses string values further using a static Huffman code optimized for HTTP header characters.
ELI5: First time you call someone, you introduce yourself fully: “Hi, I’m John, I work at Acme, my badge number is 12345.” Every call after that, you just say “it’s John again” and they pull up your file. HPACK is that — you describe yourself once, then use a short code forever.
Why Not gzip?
gzip would compress better. But gzip on headers over TLS enables the CRIME attack (2012). The attacker injects known strings into requests (via script in browser), then observes compressed size. Because compression reveals shared prefixes, the attacker can deduce secret header values (like session tokens) by measuring size differences.
HPACK avoids this by not compressing across request boundaries in the same way gzip does. The Huffman encoding operates on individual values; the table deduplication is explicit, not compression-based.
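The size side channel behind CRIME is easy to reproduce with `zlib`. This is a toy: it guesses the entire secret at once, whereas a real attack recovers it a byte at a time by observing encrypted record lengths. The cookie value and the helper name `compressed_size` are invented for this example.

```python
import zlib

# The secret the attacker cannot read directly (hypothetical value).
SECRET_HEADER = b"cookie: session=7f3a9c0e5b2d4186a0c9e7f1b3d5a2c4\r\n"


def compressed_size(attacker_controlled: bytes) -> int:
    """Size of a request that mixes attacker input with the secret header."""
    request = b"GET /" + attacker_controlled + b" HTTP/1.1\r\n" + SECRET_HEADER
    return len(zlib.compress(request))


# Both guesses have the same length, so any size difference comes from
# how well the guess matches the secret.
right = compressed_size(b"session=7f3a9c0e5b2d4186a0c9e7f1b3d5a2c4")
wrong = compressed_size(b"session=e1d07c5a93b648f2a7c0d3b59e6f1a28")
```

The correct guess compresses smaller because the secret becomes a back-reference to the attacker's own text. HPACK's explicit, exact-match table lookups are designed to make this kind of partial-match probing much harder.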
Compression Ratios
| Request | Headers Before | Headers After | Reduction |
|---|---|---|---|
| First request | 800 bytes | 200 bytes | 75% |
| Subsequent same-origin requests | 800 bytes | 20-50 bytes | 94-97% |
Server Push
The Idea
The browser requests index.html. While parsing it, the browser will discover it needs style.css and app.js — but that discovery requires a round trip first. Server push lets the server skip that wait: when you send index.html, simultaneously push style.css and app.js.
Client Server
│── GET /index.html ─────────→│
│ │ ← sends PUSH_PROMISE for /style.css
│ │ ← sends PUSH_PROMISE for /app.js
│←──── HEADERS (index.html) ──│
│←──── DATA (index.html) ─────│
│←──── HEADERS (style.css) ───│ ← pushed without request
│←──── DATA (style.css) ──────│
│←──── HEADERS (app.js) ──────│
│←──── DATA (app.js) ─────────│
The PUSH_PROMISE frame arrives before the pushed resource, telling the client “I’m about to send this.” The client can reject it with RST_STREAM if it already has the resource cached.
Why It Mostly Failed
Server push has a fatal flaw: the server doesn’t know what’s in the client’s cache.
If the browser has style.css cached from a previous visit, the server pushes it anyway. Wasted bandwidth. The browser receives it, checks the cache, discards it. Worse: the pushed resource takes bandwidth away from resources the browser actually needs.
Getting push right requires the server to track per-client cache state, which is complex and stateful. Most teams got it wrong, saw worse performance, and turned it off.
Common mistake: Treating server push as a free performance win. In practice, push helps only when: the pushed resource is small, the client definitely doesn’t have it cached, and it’s needed very soon. Hard to get all three right.
The replacement: HTTP’s 103 Early Hints status code lets the server send Link: <style.css>; rel=preload before the full response. The browser initiates the fetch itself — respecting its cache — while the server is still preparing the main response. This is cache-aware and simpler.
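As a rough sketch, an Early Hints response is just a header list with status 103 and one `Link` header per resource. The helper names below are hypothetical, and the `as=` attribute is an addition not shown in the text above; browsers use it to decide how to treat the preloaded resource.

```python
def preload_link(path: str, as_type: str) -> str:
    """Format a Link header value that asks the browser to preload a resource."""
    return f"<{path}>; rel=preload; as={as_type}"


def early_hints_headers(resources):
    """Build the header list for a 103 response from (path, type) pairs."""
    headers = [(":status", "103")]
    headers += [("link", preload_link(path, kind)) for path, kind in resources]
    return headers


hints = early_hints_headers([("/style.css", "style"), ("/app.js", "script")])
```

The server sends these headers first, then the final response (for example a 200 with the HTML) once it is ready.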
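Browsers have since abandoned push: Chrome disabled HTTP/2 server push by default in version 106 (2022). HTTP/3 still defines push in its spec, but it is rarely implemented.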
Flow Control
Why HTTP/2 Needs Its Own Flow Control
TCP already has flow control (receiver advertises a window). But HTTP/2 runs multiple streams over one TCP connection. TCP flow control applies to the whole connection — it can’t distinguish between “stream 3 is processing slowly” and “streams 1, 5, 7 are fine.”
HTTP/2 adds flow control at the stream level.
How It Works
Both sides start with a default window of 65,535 bytes (configurable via SETTINGS). When the sender transmits DATA frames, it decrements its view of the window. When the receiver processes data, it sends WINDOW_UPDATE to grant more credit.
Sender: window = 65535
[DATA, 16384 bytes] → window = 49151
[DATA, 16384 bytes] → window = 32767
[DATA, 16384 bytes] → window = 16383
[DATA, 16383 bytes] → window = 0, STOP SENDING
Receiver: processes data, sends WINDOW_UPDATE(65535)
Sender: window = 65535, RESUME
This happens independently for each stream AND for the whole connection. A slow stream doesn’t starve a fast one.
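The two-level accounting can be sketched as follows. This is a simplified sender-side model under stated assumptions: the `FlowControl` class and its methods are hypothetical, and it omits details such as the maximum window size and SETTINGS changes mid-connection.

```python
DEFAULT_WINDOW = 65_535


class FlowControl:
    """Sender-side view of the connection window and per-stream windows."""

    def __init__(self):
        self.connection = DEFAULT_WINDOW
        self.streams = {}

    def open_stream(self, stream_id: int) -> None:
        self.streams[stream_id] = DEFAULT_WINDOW

    def send_data(self, stream_id: int, nbytes: int) -> int:
        """Send as much as both windows allow; return the bytes actually sent."""
        allowed = min(nbytes, self.streams[stream_id], self.connection)
        self.streams[stream_id] -= allowed
        self.connection -= allowed
        return allowed

    def window_update(self, stream_id: int, increment: int) -> None:
        """Apply a WINDOW_UPDATE; stream 0 means the connection window."""
        if stream_id == 0:
            self.connection += increment
        else:
            self.streams[stream_id] += increment


fc = FlowControl()
fc.open_stream(1)
sent = [fc.send_data(1, 16_384) for _ in range(5)]

# The receiver grants more credit on both levels, and sending resumes.
fc.window_update(0, 65_535)
fc.window_update(1, 65_535)
resumed = fc.send_data(1, 16_384)
```

The fourth send is cut to 16,383 bytes because that is all the window has left, and the fifth sends nothing until a WINDOW_UPDATE arrives.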
ELI5: Imagine you’re filling water glasses at a dinner table with a limited pitcher. Each guest has their own glass (stream window) and you also have a limited amount of water total (connection window). You can fill glasses quickly if guests drain them fast. If one guest doesn’t drink, you stop refilling theirs — but you keep filling everyone else’s.
Practical Implications
Default 65,535-byte window is small by modern standards. Downloading a 5 MB file with default settings requires ~80 WINDOW_UPDATE round trips. High-throughput applications configure initial windows to 1 MB or more via SETTINGS.
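The ~80 figure is simple arithmetic, sketched below. It assumes the worst case, where the receiver only sends WINDOW_UPDATE once the window is fully drained; real implementations usually top up earlier. The function name is made up for this example.

```python
import math


def window_stalls(payload_bytes: int, window_bytes: int = 65_535) -> int:
    """Times the sender must wait for WINDOW_UPDATE (the first window is free)."""
    return max(0, math.ceil(payload_bytes / window_bytes) - 1)


five_mib = 5 * 1024 * 1024
stalls_default = window_stalls(five_mib)            # default 65,535-byte window
stalls_1mib = window_stalls(five_mib, 1024 * 1024)  # 1 MiB window
```

With the default window the transfer stalls 80 times; with a 1 MiB window it stalls 4 times.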
Common mistake: Leaving the default window size for server-to-server connections with large payloads. Always tune SETTINGS_INITIAL_WINDOW_SIZE for your workload.
Stream Prioritization
The Model
Streams can declare a dependency on another stream, forming a dependency tree. Siblings share the parent’s bandwidth proportionally to their weight (1-256).
Stream 1 (weight 12)
├── Stream 3 (weight 4) ← gets 4/(4+8) = 33% of stream 1's share
└── Stream 5 (weight 8) ← gets 8/(4+8) = 67% of stream 1's share
Stream 7 (added as an exclusive child of stream 3)
└── Stream 7 becomes stream 3's only child; any previous children of stream 3 are re-parented under stream 7
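The proportional split among siblings is a one-line calculation. A small sketch, with `sibling_shares` as a hypothetical helper name:

```python
from typing import Dict


def sibling_shares(weights: Dict[int, int]) -> Dict[int, float]:
    """Split a parent's bandwidth among its children in proportion to weight."""
    total = sum(weights.values())
    return {stream_id: weight / total for stream_id, weight in weights.items()}


# Streams 3 and 5 are both children of stream 1.
shares = sibling_shares({3: 4, 5: 8})
```

Stream 3 gets one third of the parent's share and stream 5 gets two thirds, matching the diagram.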
Critical path resources (HTML, critical CSS) should have higher weights. Images and analytics scripts can have lower weights.
The Reality
Most servers ignore priority hints or implement them incorrectly. Browsers implement them differently from each other. The spec is complex and the performance gains are hard to measure.
HTTP/3 dropped the dependency tree entirely in favor of a simpler “urgency + incremental” scheme.
ELI5: Priority hints are like the “urgent” checkbox on an email form. In theory, urgent emails get processed first. In practice, the recipients (servers) don’t all honor it the same way, and some ignore it entirely.
HTTP/2 in Practice
Negotiation: ALPN
Browsers negotiate HTTP/2 via ALPN (Application-Layer Protocol Negotiation), a TLS extension. During the TLS handshake, the client sends a list: ["h2", "http/1.1"]. The server picks one and confirms. Zero extra round trips.
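From a client's point of view, ALPN is a one-line addition to the TLS setup. The sketch below uses Python's standard `ssl` module; the function names are invented for this example, and `negotiated_protocol` needs network access to a real host.

```python
import socket
import ssl


def make_h2_context() -> ssl.SSLContext:
    """TLS client context that offers HTTP/2 first, then HTTP/1.1."""
    context = ssl.create_default_context()
    context.set_alpn_protocols(["h2", "http/1.1"])
    return context


def negotiated_protocol(host: str, port: int = 443, timeout: float = 5.0):
    """Connect, complete the TLS handshake, and report what the server chose."""
    context = make_h2_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            # "h2", "http/1.1", or None if the server did not use ALPN.
            return tls.selected_alpn_protocol()
```

Calling `negotiated_protocol` with the hostname of an HTTP/2-capable server should return `"h2"`. The choice is made inside the handshake itself, which is why it costs no extra round trip.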
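For plaintext connections, HTTP/2 (called h2c) can start from an HTTP/1.1 request carrying an Upgrade: h2c header, or the client can speak HTTP/2 immediately if it already knows the server supports it (prior knowledge). No major browser supports plaintext HTTP/2, so on the public web HTTP/2 effectively requires TLS.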
Undoing HTTP/1.1 Performance Hacks
HTTP/1.1’s limitations drove a generation of “best practices” that HTTP/2 makes harmful:
| HTTP/1.1 hack | Reason | HTTP/2 verdict |
|---|---|---|
| Domain sharding (cdn1.example.com, cdn2.example.com) | 6 connections per domain × N domains = more parallelism | Counter-productive. Defeats HPACK (separate connections = separate tables), requires more TLS handshakes |
| CSS/JS concatenation | Fewer requests = fewer connections | Unnecessary. HTTP/2 multiplexes freely. Concatenation breaks cache granularity |
| CSS sprites | Fewer image requests | Unnecessary. Individual images can be fetched concurrently |
| Inline critical resources | Saves a round trip | Sometimes still valid for truly tiny resources, but less important |
Common mistake: Applying HTTP/1.1 “performance best practices” to HTTP/2 deployments without re-evaluating them. Domain sharding actively hurts on HTTP/2.
gRPC Is HTTP/2
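gRPC uses HTTP/2 as its transport. This is why gRPC gets streaming, multiplexing, header compression, and binary framing for free. A gRPC call is an HTTP/2 POST to a path like /package.Service/Method with a protobuf body. Each call maps to one HTTP/2 stream: bidirectional streaming is both sides sending multiple messages on that stream, and concurrent calls are multiplexed as separate streams on the same connection.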
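Inside the DATA frames, gRPC wraps each message in a 5-byte prefix: a 1-byte compressed flag followed by a 4-byte big-endian length. The sketch below shows that framing and the path convention; the function names are hypothetical, and the message bytes stand in for a serialized protobuf.

```python
import struct


def grpc_path(package: str, service: str, method: str) -> str:
    """Build the :path pseudo-header for a gRPC call."""
    return f"/{package}.{service}/{method}"


def frame_message(message: bytes, compressed: bool = False) -> bytes:
    """Prefix a serialized message with the compressed flag and its length."""
    return struct.pack("!BI", int(compressed), len(message)) + message


def unframe_messages(buf: bytes):
    """Split a byte stream back into the individual message payloads."""
    messages = []
    offset = 0
    while offset < len(buf):
        _flag, length = struct.unpack_from("!BI", buf, offset)
        offset += 5
        messages.append(buf[offset:offset + length])
        offset += length
    return messages


path = grpc_path("package", "Service", "Method")
stream_bytes = frame_message(b"a") + frame_message(b"bc")
```

Because every message carries its own length, a streaming call can send many messages back to back on one HTTP/2 stream and the receiver can still tell them apart.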
If you’re debugging gRPC, you’re debugging HTTP/2.
When HTTP/2 Doesn’t Help
HTTP/2 improves high-request-count, latency-sensitive scenarios. It doesn’t help everywhere:
| Scenario | HTTP/2 benefit |
|---|---|
| Single large file download (video, backup) | None — one stream, already maxes out bandwidth |
| Already-fast LAN connections with low latency | Minimal — HOL blocking is less painful on fast links |
| Low-traffic API with few concurrent requests | Minimal — existing keep-alive usually sufficient |
| High packet-loss networks | Can be worse — one loss event stalls all streams |
High packet-loss networks are where TCP-level HOL blocking really hurts, and where HTTP/3’s QUIC transport shines.
ELI5: HTTP/2 is a better highway for carrying lots of small cars (requests) efficiently. If you’re moving one giant truck (large file download), the highway design doesn’t matter much — you’re still limited by the truck’s speed.
HTTP/1.1 vs HTTP/2 Comparison
| Dimension | HTTP/1.1 | HTTP/2 |
|---|---|---|
| Wire format | Plain text | Binary frames |
| Connections per origin | 6-8 (browser workaround) | 1 (by design) |
| Request concurrency | Sequential per connection, parallel via multiple connections | True multiplexing on 1 connection |
| HOL blocking | Application + TCP level | TCP level only |
| Header compression | None (plaintext, repeated every request) | HPACK (static + dynamic table, Huffman) |
| Server push | Not supported | Supported in the spec, but removed from Chrome and rarely used |
| Flow control | TCP-level only | Per-stream + per-connection |
| Prioritization | None | Dependency tree + weights (mostly ignored) |
| TLS requirement | Optional | Optional in spec, required in practice |
| Negotiation | N/A | ALPN (TLS extension) |
| gRPC support | No | Yes (native transport) |
| Best for | Simple sites, legacy systems | APIs, SPAs, high-resource-count pages |
| RFC | RFC 7230-7235 (2014) | RFC 7540 (2015), RFC 9113 (2022) |
What Comes Next
HTTP/2 solved application-layer HOL blocking. It didn’t solve TCP-layer HOL blocking. On a lossy connection (mobile, cross-continental, WiFi), one dropped packet stalls all 50 concurrent streams. The more you multiplex, the worse the blast radius.
HTTP/3 replaces TCP with QUIC (a UDP-based transport) where each stream is independent at the transport layer. One lost packet only affects the stream it belongs to. That’s the next topic.