HTTP/2

HTTP/2 is not a new protocol — it’s a new wire format for the same semantics you already know. Same methods, same status codes, same headers. Different everything underneath.

Understanding HTTP/2 properly means understanding what HTTP/1.1 got wrong first. Because HTTP/2’s design is almost entirely a response to those specific failures.


Why HTTP/2 Exists

HTTP/1.1’s Fundamental Problems

HTTP/1.1 (1999) was designed for a web that no longer exists. A typical page in 1999: one HTML file, a few images. A typical page today: 80-200 separate resources. The protocol never kept up.

Head-of-line blocking (HOL blocking)

HTTP/1.1 is strictly sequential. On a single connection, you send request 1, wait for response 1, then send request 2. The spec also defined pipelining (sending several requests without waiting), but it barely worked: responses still had to come back in request order, so if response 1 was slow, responses 2, 3, and 4 all waited behind it, and most browsers left pipelining disabled. This is HOL blocking.

Browsers worked around it by opening 6-8 parallel TCP connections per domain. That’s the fix the web ran on for 15 years.
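A minimal sketch of why in-order delivery hurts, assuming we only model when each response becomes ready on the server and ignore transfer time. The function names and the millisecond values are hypothetical, chosen for illustration.

```python
from itertools import accumulate

def in_order_delivery(ready_times_ms):
    """Responses must leave in request order, so each one waits for every
    earlier response: delivery time is the running maximum."""
    return list(accumulate(ready_times_ms, max))

def independent_delivery(ready_times_ms):
    """Each response is delivered as soon as it is ready."""
    return list(ready_times_ms)

# Hypothetical: the first response is slow, the next three are fast.
ready = [900, 40, 30, 50]
print(in_order_delivery(ready))     # [900, 900, 900, 900]
print(independent_delivery(ready))  # [900, 40, 30, 50]
```

The first line is the HTTP/1.1 single-connection case; the second is what independent delivery would achieve.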

Header overhead

Every single request sends full headers — cookies, user-agent, accept-encoding, authorization — as plaintext. On a site that sets 10 cookies, that’s 500-1000 bytes of headers per request, repeated verbatim every time. No compression, no deduplication.

Text format

HTTP/1.1 is human-readable text. That’s nice for debugging but bad for performance. Parsing text requires more CPU than parsing binary, and text is less compact.

ELI5: HTTP/1.1 is like ordering food at a restaurant where you can only have one order per waiter, and you have to repeat your entire dietary history (“no gluten, no nuts, allergic to shellfish…”) every single time you order anything, even a glass of water.

SPDY → HTTP/2

Google announced SPDY in 2009 as an experiment. It proved multiplexing and header compression worked in practice. HTTP/2 (RFC 7540, 2015) is essentially a standardized, refined version of SPDY. Google deprecated SPDY after HTTP/2 shipped.

Semantic compatibility is the key design decision. HTTP/2 doesn’t change what an HTTP request means. GET /api/users means the same thing. Status 404 means the same thing. This let the upgrade happen at the infrastructure layer — load balancers, proxies, servers — without changing application code.


Binary Framing Layer

The biggest conceptual shift in HTTP/2: everything is a frame.

Frame Structure

+-----------------------------------------------+
|                 Length (24 bits)              |
+---------------+---------------+---------------+
|   Type (8)    |   Flags (8)   |
+-+-------------+---------------+-------------------------------+
|R|                 Stream Identifier (31 bits)                 |
+=+=============================================================+
|                Frame Payload (0..2^24-1 bytes)                |
+---------------------------------------------------------------+

Every HTTP/2 message is a sequence of frames. The fixed 9-byte header tells you exactly how long the payload is, what kind of frame it is, which stream it belongs to, and what flags apply. You never have to scan for a newline or delimiter.
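As a rough illustration of how mechanical this is, the sketch below decodes the 9-byte header with Python's struct module. The function name and the returned dictionary layout are assumptions made for this example; the type codes follow the HTTP/2 specification.

```python
import struct

FRAME_TYPES = {
    0x0: "DATA", 0x1: "HEADERS", 0x2: "PRIORITY", 0x3: "RST_STREAM",
    0x4: "SETTINGS", 0x5: "PUSH_PROMISE", 0x6: "PING", 0x7: "GOAWAY",
    0x8: "WINDOW_UPDATE", 0x9: "CONTINUATION",
}

def parse_frame_header(header: bytes) -> dict:
    """Decode the fixed 9-byte HTTP/2 frame header."""
    if len(header) != 9:
        raise ValueError("frame header must be exactly 9 bytes")
    # 24-bit length is split into 1 + 2 bytes; then type, flags, stream word.
    len_hi, len_lo, ftype, flags, stream_word = struct.unpack(">BHBBI", header)
    return {
        "length": (len_hi << 16) | len_lo,
        "type": FRAME_TYPES.get(ftype, f"UNKNOWN(0x{ftype:02x})"),
        "flags": flags,
        "stream_id": stream_word & 0x7FFFFFFF,  # drop the reserved R bit
    }

# A HEADERS frame on stream 1 with a 16-byte payload and flags 0x05.
print(parse_frame_header(bytes.fromhex("000010010500000001")))
```

No scanning is involved: one fixed-size unpack yields the payload length, so the parser knows exactly how many bytes to read next.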

Frame Types

| Type | Purpose |
| --- | --- |
| DATA | Request/response body chunks |
| HEADERS | Request/response headers (compressed with HPACK) |
| PRIORITY | Stream dependency and weight hints |
| RST_STREAM | Immediately terminate a stream |
| SETTINGS | Connection configuration (window sizes, max streams) |
| PUSH_PROMISE | Server announces an upcoming push |
| PING | Round-trip measurement, keepalive |
| GOAWAY | Graceful connection shutdown |
| WINDOW_UPDATE | Flow control credit |
| CONTINUATION | Overflow from HEADERS frame |

DATA and HEADERS are the workhorses. The rest are control frames that manage the connection and its streams.

Why Binary

Binary parsing is mechanical — read N bytes, done. Text parsing requires scanning character by character looking for \r\n, handling edge cases in whitespace, dealing with case-insensitive comparisons. Binary is faster, more compact, and harder to accidentally malform.

ELI5: HTTP/1.1 headers are like a handwritten letter you have to read word by word. HTTP/2 frames are like a form with fixed boxes — you know exactly where everything is before you start reading.


Streams and Multiplexing

What a Stream Is

A stream is a logical bidirectional channel within a single TCP connection. It has an ID, a state, and carries a sequence of frames. Multiple streams coexist on one connection simultaneously.

TCP Connection
│
├── Stream 1  ──→  [HEADERS] [DATA] [DATA] [DATA(END_STREAM)]
│                              ↕ interleaved freely
├── Stream 3  ──→  [HEADERS] [DATA(END_STREAM)]
│
├── Stream 5  ──→  [HEADERS] [DATA] ...
│
└── Stream 7  ──→  [HEADERS] ...

The client sends frames from all active streams on the same TCP connection. The server reassembles each stream independently.
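The reassembly step can be sketched in a few lines, assuming frames have already been parsed into (stream_id, payload) pairs. That tuple shape and the function name are simplifications for illustration, not a real library API.

```python
from collections import defaultdict

def demultiplex(frames):
    """Group interleaved DATA payloads by stream ID, preserving the
    per-stream order in which they arrived on the wire."""
    streams = defaultdict(bytearray)
    for stream_id, payload in frames:
        streams[stream_id] += payload
    return {sid: bytes(buf) for sid, buf in streams.items()}

# Frames from streams 1 and 3 interleaved on one connection.
wire = [(1, b"<html>"), (3, b"body{"), (1, b"</html>"), (3, b"}")]
print(demultiplex(wire))  # {1: b'<html></html>', 3: b'body{}'}
```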

Stream ID Rules

  • Client-initiated streams: odd numbers (1, 3, 5, …)
  • Server-initiated streams (pushes): even numbers (2, 4, 6, …)
  • Stream 0: reserved for connection-level control frames (SETTINGS, PING, GOAWAY)
  • Stream IDs are never reused within a connection. Once the ID space is exhausted, open a new connection
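These rules are easy to express in code. The helpers below are a hypothetical sketch (the names are not from any library) that classifies an ID and picks the next client-initiated ID.

```python
MAX_STREAM_ID = 2**31 - 1  # stream identifiers are 31 bits wide

def initiator(stream_id: int) -> str:
    """Say who opened a stream, based only on its ID."""
    if not 0 <= stream_id <= MAX_STREAM_ID:
        raise ValueError("stream ID out of range")
    if stream_id == 0:
        return "connection"  # connection-level control frames
    return "client" if stream_id % 2 == 1 else "server"

def next_client_stream_id(last_used: int):
    """Next odd ID above last_used, or None once the ID space is
    exhausted and a new connection is needed."""
    candidate = last_used + 2 if last_used % 2 == 1 else last_used + 1
    return candidate if candidate <= MAX_STREAM_ID else None

print(initiator(5), initiator(2), initiator(0))  # client server connection
print(next_client_stream_id(5))                  # 7
```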

Stream States

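                    idle
                     │
             send/recv HEADERS
                     │
                     ▼
                    open
           ┌─────────┴──────────┐
    send END_STREAM       recv END_STREAM
           │                    │
           ▼                    ▼
  half-closed (local)   half-closed (remote)
           │                    │
    recv END_STREAM       send END_STREAM
           │                    │
           └─────────┬──────────┘
                     ▼
                   closed

RST_STREAM from open or either half-closed state moves the stream straight to closed. The reserved states used by server push are omitted here.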

The END_STREAM flag on a DATA or HEADERS frame signals “I’m done sending on this stream.” Once both sides have sent END_STREAM, the stream is closed.
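The lifecycle can be sketched as a small transition table. This is a simplified subset of the state machine in the HTTP/2 specification: the reserved states used by server push are left out, and the event names are made up for this example.

```python
TRANSITIONS = {
    ("idle", "send_headers"): "open",
    ("idle", "recv_headers"): "open",
    ("open", "send_end_stream"): "half-closed (local)",
    ("open", "recv_end_stream"): "half-closed (remote)",
    ("half-closed (local)", "recv_end_stream"): "closed",
    ("half-closed (remote)", "send_end_stream"): "closed",
}
RESETTABLE = {"open", "half-closed (local)", "half-closed (remote)"}

def step(state: str, event: str) -> str:
    """Return the next stream state, or raise on an invalid transition."""
    if event in ("send_rst_stream", "recv_rst_stream") and state in RESETTABLE:
        return "closed"  # RST_STREAM ends the stream immediately
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"invalid event {event!r} in state {state!r}") from None

# A typical client request: send HEADERS with END_STREAM, then read the response.
state = "idle"
for event in ("send_headers", "send_end_stream", "recv_end_stream"):
    state = step(state, event)
print(state)  # closed
```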

ELI5: Streams are like having multiple phone calls happening on the same wire at once. Each call has its own ID. The wire carries a chunk from call #1, then a chunk from call #3, then back to call #1. Each endpoint sorts the chunks by ID and reassembles each conversation independently.

HOL Blocking: Fixed at Application, Not TCP

This is critical. HTTP/2 solves application-level HOL blocking — a slow response for one resource no longer blocks other resources. But it does not solve TCP-level HOL blocking.

TCP delivers bytes in order. If a TCP segment is lost, all streams on that connection stall until the retransmission arrives, even streams that have no data in the lost segment. With 50 concurrent streams on the connection, one lost packet stalls all 50.

This is the primary motivation for HTTP/3’s switch to QUIC (UDP-based).

Common mistake: Assuming HTTP/2 eliminates HOL blocking entirely. It eliminates application-layer HOL blocking. TCP-layer HOL blocking is worse with HTTP/2 because one connection carries more streams — one loss event impacts more concurrent requests.


Header Compression (HPACK)

The Problem

HTTP/1.1 sends full plaintext headers every request. A typical authenticated API call sends:

  • authorization: Bearer eyJ0eXAiOiJKV1Q... — 500+ bytes
  • cookie: session_id=abc123; ... — 100-300 bytes
  • user-agent: Mozilla/5.0 ... — 100+ bytes
  • Various accept, content-type, cache-control headers

All of that, repeated on every single request.

HPACK Architecture

HPACK uses two tables:

Static table — 61 pre-defined entries that every HTTP/2 implementation knows:

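| Index | Header Name | Header Value |
| --- | --- | --- |
| 2 | :method | GET |
| 3 | :method | POST |
| 8 | :status | 200 |
| 13 | :status | 404 |
| 24 | cache-control | (empty) |
| 55 | set-cookie | (empty) |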

Any GET request can encode :method: GET as a single byte: index 2 with the high bit set (0x82).

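Dynamic table: connection-specific, with a separate table for each direction. When the encoder sends a header field that is not in the static table, it can instruct the peer to add it to the dynamic table. Later requests on the same connection reference it by index. The table has a size limit (4,096 bytes by default, adjustable via SETTINGS_HEADER_TABLE_SIZE), and the oldest entries are evicted first.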

First request:  authorization: Bearer abc123   ← add to dynamic table at index 62
Second request: [index 62]                      ← single byte replaces entire header

Huffman encoding compresses string values further using a static Huffman code optimized for HTTP header characters.
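To make the indexing concrete, here is a toy encoder, assuming no Huffman coding, no table size limit or eviction, and only three static entries. ToyEncoder is a hypothetical name; the integer and literal encodings follow the HPACK specification (RFC 7541).

```python
def encode_integer(value: int, prefix_bits: int, flags: int = 0) -> bytes:
    """HPACK prefix integer: small values fit in the first byte, larger
    ones continue in 7-bit groups with a continuation bit."""
    limit = (1 << prefix_bits) - 1
    if value < limit:
        return bytes([flags | value])
    out = [flags | limit]
    value -= limit
    while value >= 128:
        out.append((value % 128) | 0x80)
        value //= 128
    out.append(value)
    return bytes(out)

def encode_indexed(index: int) -> bytes:
    """Indexed header field: high bit set, then a 7-bit-prefix index."""
    return encode_integer(index, 7, 0x80)

class ToyEncoder:
    STATIC = {(":method", "GET"): 2, (":method", "POST"): 3, (":status", "200"): 8}

    def __init__(self):
        self.dynamic = []  # newest entry first; the newest has index 62

    def encode(self, name: str, value: str) -> bytes:
        key = (name, value)
        if key in self.STATIC:
            return encode_indexed(self.STATIC[key])
        if key in self.dynamic:
            return encode_indexed(62 + self.dynamic.index(key))
        self.dynamic.insert(0, key)
        # Literal with incremental indexing, new name (0x40), plain strings.
        n, v = name.encode("ascii"), value.encode("ascii")
        return b"\x40" + encode_integer(len(n), 7) + n + encode_integer(len(v), 7) + v

enc = ToyEncoder()
first = enc.encode("authorization", "Bearer abc123")
second = enc.encode("authorization", "Bearer abc123")
print(len(first), second.hex())  # 29 be
```

The first send costs the full literal; the repeat is one byte (0xbe, index 62). Real encoders also Huffman-code the strings and evict old entries when the table is full.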

ELI5: First time you call someone, you introduce yourself fully: “Hi, I’m John, I work at Acme, my badge number is 12345.” Every call after that, you just say “it’s John again” and they pull up your file. HPACK is that — you describe yourself once, then use a short code forever.

Why Not gzip?

gzip would compress better. But gzip on headers over TLS enables the CRIME attack (2012). The attacker injects known strings into requests (via script in browser), then observes compressed size. Because compression reveals shared prefixes, the attacker can deduce secret header values (like session tokens) by measuring size differences.

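HPACK avoids this by design. The dynamic table matches only whole header fields, never partial strings, so a guess shrinks the output only if it equals the entire secret value; the attacker cannot recover it one character at a time. The Huffman code is static, so the encoded length of a value does not depend on any other header on the connection.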
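The size side channel is easy to reproduce with zlib. This is only a sketch of the principle, not the real attack: the secret value and the guesses are invented, and a real attacker would observe encrypted record lengths rather than call the compressor directly.

```python
import zlib

# Hypothetical secret header that shares a compression context
# with attacker-influenced data.
SECRET_HEADER = b"cookie: session=7f3a9c2e1b\r\n"

def compressed_size(attacker_data: bytes) -> int:
    """Size an eavesdropper could observe for secret + attacker data."""
    return len(zlib.compress(SECRET_HEADER + attacker_data))

right_prefix = compressed_size(b"cookie: session=7f3a9")  # matches the secret so far
wrong_prefix = compressed_size(b"cookie: session=q8w4z")  # same length, wrong guess

# The correct partial guess is absorbed into one longer back-reference,
# so it compresses smaller than the wrong guess of identical length.
print(right_prefix < wrong_prefix)  # True
```

Because a partial match is already visible in the size, the attacker can extend the guess step by step. That is the property HPACK's whole-field matching removes.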

Compression Ratios

| Request | Headers Before | Headers After | Reduction |
| --- | --- | --- | --- |
| First request | 800 bytes | 200 bytes | 75% |
| Subsequent same-origin requests | 800 bytes | 20-50 bytes | 94-97% |

Server Push

The Idea

The browser requests index.html. While parsing it, the browser will discover it needs style.css and app.js, but that discovery only happens after the first round trip, and fetching them costs another. Server push lets the server skip that wait: while it sends index.html, it also pushes style.css and app.js.

Client                          Server
  │── GET /index.html ─────────→│
  │                              │ ← sends PUSH_PROMISE for /style.css
  │                              │ ← sends PUSH_PROMISE for /app.js
  │←──── HEADERS (index.html) ──│
  │←──── DATA (index.html) ─────│
  │←──── HEADERS (style.css) ───│  ← pushed without request
  │←──── DATA (style.css) ──────│
  │←──── HEADERS (app.js) ──────│
  │←──── DATA (app.js) ─────────│

The PUSH_PROMISE frame arrives before the pushed resource, telling the client “I’m about to send this.” The client can reject it with RST_STREAM if it already has the resource cached.

Why It Mostly Failed

Server push has a fatal flaw: the server doesn’t know what’s in the client’s cache.

If the browser has style.css cached from a previous visit, the server pushes it anyway. Wasted bandwidth. The browser receives it, checks the cache, discards it. Worse: the pushed resource takes bandwidth away from resources the browser actually needs.

Getting push right requires the server to track per-client cache state, which is complex and stateful. Most teams got it wrong, saw worse performance, and turned it off.

Common mistake: Treating server push as a free performance win. In practice, push helps only when: the pushed resource is small, the client definitely doesn’t have it cached, and it’s needed very soon. Hard to get all three right.

The replacement: HTTP’s 103 Early Hints status code lets the server send Link: <style.css>; rel=preload before the full response. The browser initiates the fetch itself — respecting its cache — while the server is still preparing the main response. This is cache-aware and simpler.

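Push did not survive in browsers. Chrome disabled HTTP/2 server push by default in 2022, and although HTTP/3 still defines push in its spec, browser support for it is rare.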


Flow Control

Why HTTP/2 Needs Its Own Flow Control

TCP already has flow control (receiver advertises a window). But HTTP/2 runs multiple streams over one TCP connection. TCP flow control applies to the whole connection — it can’t distinguish between “stream 3 is processing slowly” and “streams 1, 5, 7 are fine.”

HTTP/2 adds flow control at the stream level.

How It Works

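Every stream starts with a default window of 65,535 bytes (the initial stream window is configurable via SETTINGS_INITIAL_WINDOW_SIZE). The connection as a whole has its own 65,535-byte window, which can only be enlarged with WINDOW_UPDATE. Only DATA frames count against these windows. When the sender transmits a DATA frame, it decrements its view of both windows. As the receiver consumes data, it sends WINDOW_UPDATE to grant more credit.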

Sender:  window = 65535
         [DATA, 16384 bytes] → window = 49151
         [DATA, 16384 bytes] → window = 32767
         [DATA, 16384 bytes] → window = 16383
         [DATA, 16383 bytes] → window = 0, STOP SENDING
Receiver: processes data, sends WINDOW_UPDATE(65535)
Sender:  window = 65535, RESUME

This happens independently for each stream AND for the whole connection. A slow stream doesn’t starve a fast one.
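A minimal sketch of the sender's bookkeeping, assuming a single class that tracks both levels. FlowControlledSender and its method names are hypothetical; real implementations also handle SETTINGS changes and overflow checks.

```python
class FlowControlledSender:
    """Tracks per-stream and connection-level send windows."""

    def __init__(self, stream_window=65535, connection_window=65535):
        self.initial_stream_window = stream_window
        self.connection_window = connection_window
        self.stream_windows = {}

    def send_data(self, stream_id: int, wanted: int) -> int:
        """Send up to `wanted` bytes; returns how many both windows allow."""
        stream = self.stream_windows.setdefault(stream_id, self.initial_stream_window)
        n = max(0, min(wanted, stream, self.connection_window))
        self.stream_windows[stream_id] -= n
        self.connection_window -= n
        return n

    def window_update(self, stream_id: int, increment: int) -> None:
        """Stream 0 credits the connection; any other ID credits that stream."""
        if stream_id == 0:
            self.connection_window += increment
        else:
            current = self.stream_windows.get(stream_id, self.initial_stream_window)
            self.stream_windows[stream_id] = current + increment

sender = FlowControlledSender()
print([sender.send_data(1, 16384) for _ in range(5)])
# [16384, 16384, 16384, 16383, 0]
sender.window_update(1, 65535)
print(sender.send_data(1, 100))  # 0: the connection window is still empty
sender.window_update(0, 65535)
print(sender.send_data(1, 100))  # 100
```

The last three lines show why both levels matter: crediting only the stream is not enough when the connection window is also drained.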

ELI5: Imagine you’re filling water glasses at a dinner table with a limited pitcher. Each guest has their own glass (stream window) and you also have a limited amount of water total (connection window). You can fill glasses quickly if guests drain them fast. If one guest doesn’t drink, you stop refilling theirs — but you keep filling everyone else’s.

Practical Implications

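The default 65,535-byte window is small by modern standards. If the receiver replenishes the window only once it is drained, downloading a 5 MB file takes about 80 WINDOW_UPDATE round trips, and per-stream throughput is capped at roughly one window per round trip. High-throughput applications raise the initial window to 1 MB or more via SETTINGS.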
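The arithmetic behind that estimate, as a back-of-the-envelope sketch. It assumes the worst case where the receiver tops up the window only after it is fully drained; the 100 ms round-trip time is an invented example value.

```python
import math

def window_refills(payload_bytes: int, window_bytes: int) -> int:
    """WINDOW_UPDATE refills needed when the receiver only replenishes
    a fully drained window (a pessimistic assumption)."""
    return max(0, math.ceil(payload_bytes / window_bytes) - 1)

def throughput_cap(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound in bytes/second: at most one window per round trip."""
    return window_bytes / rtt_seconds

five_mb = 5 * 1024 * 1024
print(window_refills(five_mb, 65535))        # 80
print(window_refills(five_mb, 1024 * 1024))  # 4
print(round(throughput_cap(65535, 0.100)))   # 655350 bytes/s on a 100 ms path
```

On a 100 ms path the default window limits a single stream to about 0.66 MB/s regardless of link speed, which is why large transfers need a bigger window.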

Common mistake: Leaving the default window size for server-to-server connections with large payloads. Always tune SETTINGS_INITIAL_WINDOW_SIZE for your workload.


Stream Prioritization

The Model

Streams can declare a dependency on another stream, forming a dependency tree. Siblings share the parent’s bandwidth proportionally to their weight (1-256).

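Stream 1 (weight 12)
    ├── Stream 3 (weight 4)    ← gets 4/(4+8) = 33% of stream 1's share
    └── Stream 5 (weight 8)    ← gets 8/(4+8) = 67% of stream 1's share

Adding stream 7 as an exclusive child of stream 1:

Stream 1 (weight 12)
    └── Stream 7                       ← becomes the only child of stream 1
            ├── Stream 3 (weight 4)    ← former children of stream 1 now depend on stream 7
            └── Stream 5 (weight 8)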

Critical path resources (HTML, critical CSS) should have higher weights. Images and analytics scripts can have lower weights.
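The proportional split among siblings is simple to compute. The function below is an illustrative sketch, not part of any HTTP/2 library.

```python
from fractions import Fraction

def sibling_shares(weights: dict) -> dict:
    """Split a parent's bandwidth among sibling streams in proportion
    to their weights (each weight is between 1 and 256)."""
    total = sum(weights.values())
    return {stream_id: Fraction(w, total) for stream_id, w in weights.items()}

shares = sibling_shares({3: 4, 5: 8})
print(shares[3], shares[5])  # 1/3 2/3
```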

The Reality

Most servers ignore priority hints or implement them incorrectly. Browsers implement them differently from each other. The spec is complex and the performance gains are hard to measure.

HTTP/3 dropped the dependency tree entirely in favor of a simpler “urgency + incremental” scheme.

ELI5: Priority hints are like the “urgent” checkbox on an email form. In theory, urgent emails get processed first. In practice, the recipients (servers) don’t all honor it the same way, and some ignore it entirely.


HTTP/2 in Practice

Negotiation: ALPN

Browsers negotiate HTTP/2 via ALPN (Application-Layer Protocol Negotiation), a TLS extension. During the TLS handshake, the client sends a list: ["h2", "http/1.1"]. The server picks one and confirms. Zero extra round trips.
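From a client's point of view, the negotiation looks roughly like this with Python's standard ssl module. The helper names are hypothetical, and the host passed in is whatever server you want to probe.

```python
import socket
import ssl

def make_alpn_context() -> ssl.SSLContext:
    """TLS client context that offers HTTP/2 first, then HTTP/1.1."""
    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols(["h2", "http/1.1"])
    return ctx

def negotiated_protocol(host: str, port: int = 443, timeout: float = 5.0):
    """Complete a TLS handshake and report the protocol the server picked.
    Returns None if the server did not select any ALPN protocol."""
    ctx = make_alpn_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()
```

Calling negotiated_protocol with the hostname of an HTTP/2-capable server should return "h2". The answer arrives as part of the handshake itself, which is why ALPN adds no extra round trip.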

For plaintext connections, HTTP/2 (called h2c) originally used the HTTP Upgrade mechanism: the client sends a special Upgrade: h2c header. RFC 9113 has since deprecated that mechanism, and no major browser supports plaintext HTTP/2. In practice: HTTP/2 requires TLS.

Undoing HTTP/1.1 Performance Hacks

HTTP/1.1’s limitations drove a generation of “best practices” that HTTP/2 makes harmful:

| HTTP/1.1 hack | Reason | HTTP/2 verdict |
| --- | --- | --- |
| Domain sharding (cdn1.example.com, cdn2.example.com) | 6 connections per domain × N domains = more parallelism | Counter-productive. Defeats HPACK (separate connections = separate tables), requires more TLS handshakes |
| CSS/JS concatenation | Fewer requests = fewer connections | Unnecessary. HTTP/2 multiplexes freely. Concatenation breaks cache granularity |
| CSS sprites | Fewer image requests | Unnecessary. Individual images can be fetched concurrently |
| Inline critical resources | Saves a round trip | Sometimes still valid for truly tiny resources, but less important |

Common mistake: Applying HTTP/1.1 “performance best practices” to HTTP/2 deployments without re-evaluating them. Domain sharding actively hurts on HTTP/2.

gRPC Is HTTP/2

gRPC uses HTTP/2 as its transport. This is why gRPC gets streaming, multiplexing, header compression, and binary framing for free. A gRPC call is an HTTP/2 POST to a path like /package.Service/Method with a protobuf body. Each call runs on its own HTTP/2 stream: bidirectional streaming is both sides sending messages on that one stream, and concurrent calls are multiplexed over the same connection.

If you’re debugging gRPC, you’re debugging HTTP/2.
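One detail that helps when reading gRPC traffic: inside the DATA frames, each message carries a 5-byte prefix (a 1-byte compressed flag and a 4-byte big-endian length). The sketch below frames and unframes messages under that layout; the function names are hypothetical and the payload is a stand-in for real protobuf bytes.

```python
import struct

def frame_message(message: bytes, compressed: bool = False) -> bytes:
    """Prefix a serialized message with the gRPC 5-byte header."""
    return struct.pack(">BI", int(compressed), len(message)) + message

def unframe_messages(body: bytes):
    """Split a request or response body into (compressed, message) pairs."""
    messages, offset = [], 0
    while offset < len(body):
        compressed, length = struct.unpack_from(">BI", body, offset)
        offset += 5
        messages.append((bool(compressed), body[offset:offset + length]))
        offset += length
    return messages

body = frame_message(b"hi") + frame_message(b"there")
print(body[:7])                # b'\x00\x00\x00\x00\x02hi'
print(unframe_messages(body))  # [(False, b'hi'), (False, b'there')]
```

A streaming call is simply several of these length-prefixed messages sent on the same HTTP/2 stream.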

When HTTP/2 Doesn’t Help

HTTP/2 improves high-request-count, latency-sensitive scenarios. It doesn’t help everywhere:

| Scenario | HTTP/2 benefit |
| --- | --- |
| Single large file download (video, backup) | None: one stream, already maxes out bandwidth |
| Already-fast LAN connections with low latency | Minimal: HOL blocking is less painful on fast links |
| Low-traffic API with few concurrent requests | Minimal: existing keep-alive usually sufficient |
| High packet-loss networks | Can be worse: one loss event stalls all streams |

High packet-loss networks are where TCP-level HOL blocking really hurts, and where HTTP/3’s QUIC transport shines.

ELI5: HTTP/2 is a better highway for carrying lots of small cars (requests) efficiently. If you’re moving one giant truck (large file download), the highway design doesn’t matter much — you’re still limited by the truck’s speed.


HTTP/1.1 vs HTTP/2 Comparison

| Dimension | HTTP/1.1 | HTTP/2 |
| --- | --- | --- |
| Wire format | Plain text | Binary frames |
| Connections per origin | 6-8 (browser workaround) | 1 (by design) |
| Request concurrency | Sequential per connection, parallel via multiple connections | True multiplexing on 1 connection |
| HOL blocking | Application + TCP level | TCP level only |
| Header compression | None (plaintext, repeated every request) | HPACK (static + dynamic table, Huffman) |
| Server push | Not supported | Supported in the spec (but since dropped by major browsers) |
| Flow control | TCP-level only | Per-stream + per-connection |
| Prioritization | None | Dependency tree + weights (mostly ignored) |
| TLS requirement | Optional | Optional in spec, required in practice |
| Negotiation | N/A | ALPN (TLS extension) |
| gRPC support | No | Yes (native transport) |
| Best for | Simple sites, legacy systems | APIs, SPAs, high-resource-count pages |
| RFC | RFC 7230-7235 (2014) | RFC 7540 (2015), RFC 9113 (2022) |

What Comes Next

HTTP/2 solved application-layer HOL blocking. It didn’t solve TCP-layer HOL blocking. On a lossy connection (mobile, cross-continental, WiFi), one dropped packet stalls every concurrent stream on the connection. The more you multiplex, the worse the blast radius.

HTTP/3 replaces TCP with QUIC (a UDP-based transport) where each stream is independent at the transport layer. One lost packet only affects the stream it belongs to. That’s the next topic.