gRPC & Protocol Buffers

REST+JSON is the default choice for APIs because it’s familiar, curl-friendly, and works in every browser. That’s fine for public-facing APIs. But when you’re building 50 microservices talking to each other thousands of times per second, “curl-friendly” stops being a virtue and starts being a tax you pay on every request.

gRPC is what happens when you stop treating internal service communication like an HTTP endpoint and start treating it like a function call — because that’s what it actually is.

Why gRPC Exists

The REST+JSON Problem

REST over JSON has four quiet costs that most people don’t think about until they’re running at scale:

Text parsing overhead. JSON is human-readable, which means it’s not machine-optimal. Parsing {"user_id": 12345, "active": true} requires tokenizing text, type inference, and allocating strings. A binary format that says “field 1 is an int32 with value 12345, field 2 is a bool with value true” is decoded with a few byte reads and bit shifts, with no tokenizing and no type inference.

No streaming. HTTP/1.1 is request-response. You ask, you wait, you get one answer. If you want a live feed of events, you’re hacking it with Server-Sent Events or WebSockets — neither of which feels like a natural extension of REST semantics.

No type safety. JSON has no schema enforcement at the wire level. The server says a field is an integer, the client receives it as a float, someone adds a new field and forgets to update the documentation. OpenAPI/Swagger helps, but it’s optional and often stale.

Schema discovery. With REST, there’s no standard way for a client to ask “what endpoints exist and what do they accept?” Swagger docs help but require manual maintenance. gRPC defines a standard reflection service that servers can enable for exactly this.

ELI5: REST+JSON for internal services is like two developers communicating via sticky notes — readable, flexible, but slow to write and easy to misread. gRPC is like using a shared type-safe function library — both sides compile against the same contract, so typos get caught at build time, not 2 AM on-call.

What gRPC Actually Is

gRPC is three things bundled together:

HTTP/2 transport — multiplexed, binary, bi-directional
Protocol Buffers serialization — binary, schema-first, 3-10x smaller than JSON
Code generation — protoc compiles .proto files into client + server stubs in 10+ languages

Google built an internal version called Stubby in the early 2000s. It ran essentially every Google service-to-service call. In 2015, they rebuilt it on HTTP/2 (instead of a proprietary protocol) and open-sourced it as gRPC. The design was proven at Google scale before most companies got anywhere near the problem.

When gRPC shines:

Internal microservice communication
Polyglot environments (Go services calling Python services calling Java services)
Streaming workloads: real-time feeds, file uploads, bidirectional sync
High-throughput, low-latency APIs where serialization overhead is measurable
Anywhere you want type safety enforced at the compiler level

Protocol Buffers (Protobuf)

Schema Definition

Everything starts with a .proto file. This is the contract between client and server — both sides compile against it, and the compiler generates the boring boilerplate.

syntax = "proto3";

package user.v1;

service UserService {
  rpc GetUser(GetUserRequest) returns (GetUserResponse);
  rpc ListUsers(ListUsersRequest) returns (stream User);
}

message GetUserRequest {
  int64 user_id = 1;
}

message GetUserResponse {
  User user = 1;
}

message User {
  int64  id         = 1;
  string email      = 2;
  bool   is_active  = 3;
  repeated string roles = 4;
}

Run protoc --go_out=. --go-grpc_out=. user.proto and you get type-safe client and server code in Go (or Python, Java, Rust, C++, etc.).

ELI5: A .proto file is like a legal contract both sides sign before talking. Once both parties compile against the same .proto, the generated code guarantees they’re speaking the same language. If a change to the contract no longer matches the code that uses it, the build breaks when you regenerate, which is much better than finding out at runtime.

Binary Serialization: Why It’s Faster

Protobuf uses a binary wire format. Each field is encoded as:

[field_number << 3 | wire_type] [value]

Field numbers, not field names, appear on the wire. The string "user_id" never travels. Instead, field 1 (a varint) carries the value 12345.
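To make the tag-plus-value layout concrete, here is a minimal sketch that hand-encodes the two-field message from the running example and compares it with the JSON text. The helper names (`encode_varint`, `encode_varint_field`) are hypothetical and cover only non-negative varint fields; a real project would use the code generated by protoc.

```python
import json

WIRE_VARINT = 0  # wire type 0: varint (int32, int64, bool, enum, ...)


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow: set continuation bit
        else:
            out.append(low7)         # last byte: continuation bit clear
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Emit [field_number << 3 | wire_type] followed by the varint value."""
    tag = (field_number << 3) | WIRE_VARINT
    return encode_varint(tag) + encode_varint(value)


# user_id = 12345 in field 1, active = true in field 2
payload = encode_varint_field(1, 12345) + encode_varint_field(2, 1)
json_text = json.dumps({"user_id": 12345, "active": True})

print(payload.hex(), len(payload))    # 08b9601001 5
print(len(json_text.encode("utf-8")))  # 34
```

The first byte, 0x08, is field 1 shifted left by three bits with wire type 0; the name "user_id" appears nowhere in the five bytes.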

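Format	Size for {"user_id": 12345, "active": true}	Parse time (relative)
JSON	34 bytes (31 minified)	1x (baseline)
XML	~70 bytes	~2x slower
Protobuf	5 bytes	Typically several times faster

For this message the Protobuf encoding is about 7x smaller. Across realistic payloads, Protobuf is typically 3-10x smaller than JSON and several times faster to parse, with the exact ratio depending on language, library, and message shape. At 100k RPS, that difference is measurable in CPU budget.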

ELI5: JSON is like shipping a package and writing “This box contains: shoes, size 42, color blue” on the outside in full sentences. Protobuf is like using a barcode that both sender and receiver know means “item #3, size 42, blue.” The barcode is tiny and scanned instantly; the sentence takes time to read and parse.

Field Numbers and Wire Compatibility

This is the part people get wrong. Field numbers are forever. Once you ship field 1 as user_id, that number is burned for the lifetime of the message.

Schema evolution rules (memorize these):

Action	Safe?	Why
Add a new field with a new number	Yes	Old clients ignore unknown fields
Remove a field	Yes, but…	Mark as `reserved` so the number is never reused
Rename a field	Yes	Names don’t appear on the wire
Change a field’s type	Rarely	Only compatible pairs (int32 → int64 is ok; int32 → string is not)
Reuse a deleted field number	Never	Old clients will misinterpret the data
Change a field number	Never	This is the same as deleting one and adding another

message User {
  reserved 5, 6;          // old fields, numbers retired forever
  reserved "old_name";    // old field name, never reuse
  int64  id    = 1;
  string email = 2;
}

ELI5: Field numbers are like apartment numbers in a building. If apartment 5 used to be a storage room and you tear it down, you don’t renumber 6, 7, 8… to 5, 6, 7. You leave apartment 5 permanently retired, or mail will get delivered to the wrong place. Old clients don’t know the layout changed.
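The "old clients ignore unknown fields" rule can be sketched with a toy decoder. This is an illustration under assumptions, not the real protobuf runtime: `read_varint` and `decode_known` are hypothetical helpers, and only wire types 0 (varint) and 2 (length-delimited) are handled.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_known(buf: bytes, known: dict[int, str]) -> dict[str, object]:
    """Decode fields listed in `known`; skip every other field number."""
    fields: dict[str, object] = {}
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:                      # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:                    # length-delimited
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if number in known:                     # unknown numbers are dropped
            fields[known[number]] = value
    return fields


# A newer writer added field 7; the old reader only knows fields 1 and 2.
message = bytes([0x08, 0x2A, 0x12, 0x03]) + b"a@b" + bytes([0x38, 0x01])
old_reader_schema = {1: "id", 2: "email"}
decoded = decode_known(message, old_reader_schema)
print(decoded)  # {'id': 42, 'email': b'a@b'}
```

The same sketch shows why reusing a number is dangerous: if field 7 had once meant something else, an old reader would happily interpret the new value under the old meaning.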

proto2 vs proto3

proto3 (2016+) is what you should use. proto2 had required fields, explicit default values, and other complexity that caused more problems than it solved. In proto3, required is gone, no field has to be present, and an unset field reads as its zero value. The optional keyword was later reintroduced for cases where you need to tell “unset” from “zero”. The simplification was intentional: Google’s experience showed that required fields made schema evolution painful.

The Four RPC Patterns

This is gRPC’s biggest differentiator from REST. Not just request-response — four distinct communication patterns.

service ChatService {
  // Unary: one request, one response
  rpc SendMessage(SendMessageRequest) returns (SendMessageResponse);

  // Server streaming: one request, stream of responses
  rpc StreamMessages(StreamRequest) returns (stream Message);

  // Client streaming: stream of requests, one response
  rpc UploadFile(stream FileChunk) returns (UploadResponse);

  // Bidirectional: stream in both directions simultaneously
  rpc Chat(stream ChatMessage) returns (stream ChatMessage);
}

Unary

One request, one response. Functionally identical to a REST endpoint. Use this for anything where you have a single question and expect a single answer: fetching a record, creating a resource, running a calculation.

Real-world example: GetUser(user_id=123) → User{...}

Server Streaming

Client sends one request, server responds with a stream. The connection stays open and the server pushes responses as they’re available.

When to use: Live feed of events, large paginated result sets (stream 10k records instead of paginating), real-time price updates, log tailing.

Real-world example: SubscribeToTrades(symbol="AAPL") → stream of Trade{...} events until the client disconnects.

ELI5: Server streaming is like calling a sports hotline that reads you every score update live instead of just today’s final score. You call once, stay on the line, and they push updates to you as they happen.

Client Streaming

Client sends a stream of requests, server responds once at the end.

When to use: File uploads, batch inserts, telemetry aggregation where you’re sending many small measurements and want one summary back.

Real-world example: Stream 10,000 LogEntry messages → server responds with IngestResponse{records_written: 10000}

Bidirectional Streaming

Both sides send independent streams simultaneously. Either side can close their stream independently.

When to use: Real-time chat, collaborative editing sync, interactive games, any protocol that feels more like a conversation than a transaction.

Real-world example: Chat application where both client and server send ChatMessage streams in parallel.

ELI5: Bidirectional streaming is like a walkie-talkie where both people can talk at the same time. Unary is like a phone call where you ask a question and wait for the full answer. Server streaming is a press conference — one question, the speaker talks until done. Client streaming is dictating a letter — you talk, they wait, then reply with “got it.”
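The four shapes can be modeled with plain Python functions and generators. This is only a sketch of the call shapes, with no network involved; the function names and payloads are hypothetical. Generated gRPC handlers in Python have a similar form (a single request or a request iterator in, a single response or yielded responses out).

```python
from typing import Iterable, Iterator


def get_user(user_id: int) -> dict:
    """Unary: one request in, one response out."""
    return {"id": user_id, "email": f"user{user_id}@example.com"}


def stream_messages(count: int) -> Iterator[str]:
    """Server streaming: one request in, responses yielded one at a time."""
    for i in range(count):
        yield f"message {i}"


def upload_file(chunks: Iterable[bytes]) -> int:
    """Client streaming: consume a stream, answer once at the end."""
    return sum(len(chunk) for chunk in chunks)


def chat(incoming: Iterable[str]) -> Iterator[str]:
    """Bidirectional: read a stream while yielding a stream.

    Simplification: this echo replies once per incoming message, whereas
    real bidirectional streams are independent of each other.
    """
    for text in incoming:
        yield f"echo: {text}"


print(get_user(123))
print(list(stream_messages(3)))
print(upload_file([b"abc", b"de"]))
print(list(chat(["hi", "bye"])))
```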

gRPC Under the Hood

Why HTTP/2 Matters

gRPC runs on HTTP/2, and HTTP/2 gives it three things REST couldn’t have on HTTP/1.1:

Multiplexing. Multiple RPC calls share a single TCP connection without blocking each other. Compare to HTTP/1.1 where each request needed its own connection (or waited in line).

Binary framing. HTTP/2 speaks binary. No text parsing. Fits naturally with Protobuf’s binary serialization.

Header compression (HPACK). gRPC metadata (authentication tokens, trace IDs) is compressed across calls. The first call sends full headers; subsequent calls on the same connection can reference those headers by a short index instead of resending them.

Trailers: The Unusual Part

gRPC uses HTTP/2 trailers — headers sent after the response body — to convey the final status code. This is unusual because most HTTP tooling ignores trailers entirely.

HTTP/2 Response Flow:
  HEADERS frame    → response metadata (content-type, etc.)
  DATA frame(s)    → response body (protobuf-encoded)
  HEADERS frame    → trailers: grpc-status=0, grpc-message=""

This design lets gRPC stream data AND still report a final status (success or error) after all the data is sent. You can’t put the status in the initial headers because you don’t know if the streaming will succeed until it’s done.
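Inside the DATA frames, each protobuf message is wrapped in a small length prefix so the receiver knows where one message ends and the next begins. As I understand the gRPC-over-HTTP/2 description, that prefix is a 1-byte compressed flag followed by a 4-byte big-endian length. The sketch below illustrates that framing only; `frame_message` and `unframe_messages` are hypothetical names, and compression itself is not implemented.

```python
import struct

PREFIX = ">BI"  # 1-byte compressed flag + 4-byte big-endian length
PREFIX_SIZE = struct.calcsize(PREFIX)  # 5 bytes


def frame_message(payload: bytes, compressed: bool = False) -> bytes:
    """Prefix one serialized message with its flag and length."""
    return struct.pack(PREFIX, int(compressed), len(payload)) + payload


def unframe_messages(data: bytes) -> list[bytes]:
    """Split a byte stream of back-to-back framed messages."""
    messages = []
    pos = 0
    while pos < len(data):
        _flag, length = struct.unpack_from(PREFIX, data, pos)
        pos += PREFIX_SIZE
        messages.append(data[pos:pos + length])
        pos += length
    return messages


first = bytes.fromhex("08b9601001")   # a 5-byte protobuf payload
second = b"\x08\x01"
stream = frame_message(first) + frame_message(second)

print(frame_message(first).hex())     # 000000000508b9601001
print(unframe_messages(stream) == [first, second])  # True
```

Because every message carries its own length, a streaming response can hold any number of messages before the trailers arrive with the final status.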

ELI5: Trailers are like the “end of letter” footer that says “everything I told you was accurate.” The status code comes after the data because for streaming calls, the server doesn’t know if everything succeeded until all the data is sent. It’s the difference between signing a receipt before or after the boxes are unloaded.

Deadlines, Metadata, Status Codes

Deadlines are how gRPC propagates timeouts across service boundaries. When Service A calls Service B with a 500ms deadline and Service B calls Service C, Service C receives the remaining deadline (say 380ms), provided B passes its incoming call context along to the outgoing call (automatic in some language implementations, manual in others). If C takes too long, the whole chain gets cancelled. Always set deadlines: the default is no timeout, which means resource leaks under failure.
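The key idea is that a deadline is an absolute point in time, so every hop computes what is left rather than starting a fresh timer. A minimal sketch, assuming a hypothetical `Deadline` class and a fake clock so the numbers are reproducible:

```python
import time


class Deadline:
    """Absolute time by which the whole call chain must finish."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + timeout_s

    def remaining(self) -> float:
        """Seconds left, never negative."""
        return max(0.0, self._expires_at - self._clock())

    def expired(self) -> bool:
        return self.remaining() == 0.0


class FakeClock:
    """Test clock that only moves when told to (hypothetical helper)."""

    def __init__(self) -> None:
        self.now = 100.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


clock = FakeClock()
deadline = Deadline(0.5, clock=clock)   # Service A sets 500 ms
clock.advance(0.12)                     # A and B use up 120 ms
budget_for_c = deadline.remaining()     # what B hands to C
print(round(budget_for_c, 3))           # 0.38

clock.advance(0.5)                      # C is too slow
print(deadline.expired())               # True
```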

Metadata is gRPC’s equivalent of HTTP headers. Key-value pairs attached to a call: auth tokens, trace IDs, request IDs. Sent as HTTP/2 headers (and HPACK-compressed).

gRPC status codes are not HTTP status codes. Know the important ones:

gRPC Status	HTTP Analog	When it occurs
OK (0)	200	Success
CANCELLED (1)	499	Client cancelled the request
DEADLINE_EXCEEDED (4)	504	Timeout hit (server or propagated)
NOT_FOUND (5)	404	Resource doesn’t exist
ALREADY_EXISTS (6)	409	Conflict/duplicate
PERMISSION_DENIED (7)	403	Authenticated but not allowed
UNAUTHENTICATED (16)	401	Not authenticated
RESOURCE_EXHAUSTED (8)	429	Rate limited or quota exceeded
INTERNAL (13)	500	Unexpected server error
UNAVAILABLE (14)	503	Service temporarily down (safe to retry)
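A REST gateway sitting in front of gRPC services needs exactly this kind of translation. The sketch below encodes the table as a lookup; the `GrpcStatus` enum and `http_analog` helper are hypothetical names, and falling back to 500 for codes not listed here is an assumption of this sketch.

```python
from enum import IntEnum


class GrpcStatus(IntEnum):
    """Subset of gRPC status codes covered by the table above."""
    OK = 0
    CANCELLED = 1
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    ALREADY_EXISTS = 6
    PERMISSION_DENIED = 7
    RESOURCE_EXHAUSTED = 8
    INTERNAL = 13
    UNAVAILABLE = 14
    UNAUTHENTICATED = 16


HTTP_ANALOG = {
    GrpcStatus.OK: 200,
    GrpcStatus.CANCELLED: 499,
    GrpcStatus.DEADLINE_EXCEEDED: 504,
    GrpcStatus.NOT_FOUND: 404,
    GrpcStatus.ALREADY_EXISTS: 409,
    GrpcStatus.PERMISSION_DENIED: 403,
    GrpcStatus.RESOURCE_EXHAUSTED: 429,
    GrpcStatus.INTERNAL: 500,
    GrpcStatus.UNAVAILABLE: 503,
    GrpcStatus.UNAUTHENTICATED: 401,
}


def http_analog(code: int) -> int:
    """Map a numeric gRPC status to its closest HTTP status."""
    try:
        return HTTP_ANALOG[GrpcStatus(code)]
    except ValueError:
        return 500  # assumption: treat codes outside the table as server errors


print(http_analog(5))    # 404
print(http_analog(16))   # 401
print(http_analog(2))    # 500 (UNKNOWN is not in the table)
```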

gRPC vs REST — The Real Comparison

Dimension	REST + JSON	gRPC + Protobuf
Performance	Baseline	2-10x faster serialization
Type safety	OpenAPI (optional, often stale)	Enforced at compile time
Streaming	Hacked (SSE, WebSocket)	Native (4 patterns)
Browser support	Native	gRPC-Web proxy required
Schema evolution	Manual discipline	Built into protobuf rules
Debugging	curl-friendly, human-readable	Needs grpcurl or Postman gRPC
Codegen	Optional (swagger-codegen)	Built-in, first-class
Ecosystem tooling	Massive	Growing, smaller
Learning curve	Low	Medium (proto files, codegen workflow)
Error model	HTTP status codes	gRPC status codes

ELI5: REST is a car everyone knows how to drive — comfortable, familiar, works everywhere. gRPC is a sports car that goes 3x faster and handles better, but you need a special license, it doesn’t fit in every parking garage (no native browser support), and when it breaks you need a different mechanic.

The decision framework:

External API (browsers, third parties, mobile apps) → REST. Always. gRPC-Web exists but adds operational complexity for marginal gains.
Internal service-to-service → seriously consider gRPC, especially if you have streaming needs or high throughput.
Mixed: expose REST externally via a gateway, use gRPC internally. This is the common pattern in mature microservice architectures.

Common mistake: Defaulting to REST for internal services because it’s familiar, then discovering 18 months later that JSON parsing is 8% of your CPU budget. The switching cost is high: every client and server of an API has to migrate to protobuf, and you usually have to run both protocols side by side while they do.

Load Balancing gRPC

This is the part that catches people by surprise.

The HTTP/2 Connection Problem

HTTP/2 multiplexes all requests over a single long-lived connection. This is great for performance. It’s terrible for load balancing.

A traditional L4 load balancer (TCP-level) sees one TCP connection from Client → LB → Backend A. All 10,000 RPC calls that connection carries go to Backend A. Backends B and C sit idle. You’ve accidentally built a single-server system.

L4 Load Balancer (WRONG for gRPC):
  Client ─── TCP conn ──► LB ─── TCP conn ──► Backend A (100% load)
                                              Backend B (0% load)
                                              Backend C (0% load)

L7 Load Balancing (Correct)

An L7 load balancer understands HTTP/2 frames and can distribute individual RPC calls across backends, even though they share one TCP connection from client to LB.

L7 Load Balancer (correct):
  Client ─── HTTP/2 conn ──► LB ─┬─ HTTP/2 conn ──► Backend A
                                  ├─ HTTP/2 conn ──► Backend B
                                  └─ HTTP/2 conn ──► Backend C
  (each RPC call routed to a different backend)

Options:

Approach	How it works	When to use
L7 proxy (Envoy, nginx)	Proxy understands HTTP/2 frames	Standard service mesh setup
Client-side LB	Client knows all backends, picks one per call	Small services, no mesh
Service mesh (Istio, Linkerd)	Sidecar proxy handles it transparently	K8s environments
Look-aside LB	External balancer tells client where to connect (xDS)	Large-scale, Google-style

ELI5: A regular load balancer is like a phone operator who connects your call to one receptionist and leaves it there forever. All your follow-up questions go to that same person even if they’re overwhelmed. An L7 load balancer is like a smart operator who routes each individual question to whoever is free right now.
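A tiny simulation shows the difference between balancing per connection and balancing per call. It is a sketch under simple assumptions: one client, one long-lived connection, three hypothetical backends, and round-robin as the L7 policy.

```python
from collections import Counter
from itertools import cycle

BACKENDS = ["A", "B", "C"]


def l4_distribution(num_calls: int, backends=BACKENDS) -> Counter:
    """L4: the backend is chosen once, when the TCP connection opens."""
    pinned = backends[0]
    return Counter(pinned for _ in range(num_calls))


def l7_distribution(num_calls: int, backends=BACKENDS) -> Counter:
    """L7: the backend is chosen again for every RPC on the connection."""
    picker = cycle(backends)
    return Counter(next(picker) for _ in range(num_calls))


l4 = l4_distribution(10_000)
l7 = l7_distribution(10_000)

print([l4[b] for b in BACKENDS])  # [10000, 0, 0]
print([l7[b] for b in BACKENDS])  # [3334, 3333, 3333]
```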

Common mistake: Deploying gRPC behind an L4 load balancer such as an AWS NLB and wondering why one pod has 100% CPU and the others are idle. AWS ALB is an L7 balancer and has supported gRPC since 2020, but you must explicitly set the target group’s protocol version to gRPC. Nginx needs grpc_pass, not proxy_pass.

Error Handling and Resilience

Retries

gRPC has a built-in retry policy you can configure in the service config (no code changes needed):

{
  "methodConfig": [{
    "name": [{ "service": "user.v1.UserService" }],
    "retryPolicy": {
      "maxAttempts": 4,
      "initialBackoff": "0.1s",
      "maxBackoff": "1s",
      "backoffMultiplier": 2,
      "retryableStatusCodes": ["UNAVAILABLE", "DEADLINE_EXCEEDED"]
    }
  }]
}

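Only retry idempotent operations or codes known to be safe. UNAVAILABLE usually means the request never reached the server’s application code, though that is not guaranteed for every failure mode. Retrying DEADLINE_EXCEEDED only helps if the call’s overall deadline still has time left. Never retry INTERNAL or ALREADY_EXISTS blindly.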
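To see what a policy with maxAttempts 4, initialBackoff 0.1s, maxBackoff 1s, and backoffMultiplier 2 means in wall-clock terms, the sketch below computes the backoff ceiling before each retry. The function names are hypothetical, and the uniform jitter below each ceiling is an assumption of this sketch rather than a statement about any particular gRPC implementation.

```python
import random


def backoff_ceilings(max_attempts: int, initial_backoff_s: float,
                     max_backoff_s: float, multiplier: float) -> list[float]:
    """Upper bound of the wait before each retry (attempt 1 is not a retry)."""
    retries = max_attempts - 1
    return [
        min(initial_backoff_s * multiplier ** n, max_backoff_s)
        for n in range(retries)
    ]


def jittered_delays(ceilings: list[float], rng: random.Random) -> list[float]:
    """Assumed jitter: wait a uniform random time between 0 and the ceiling."""
    return [rng.uniform(0.0, ceiling) for ceiling in ceilings]


ceilings = backoff_ceilings(4, 0.1, 1.0, 2)
print(ceilings)                       # [0.1, 0.2, 0.4]
print(round(sum(ceilings), 3))        # 0.7 seconds of backoff at most

delays = jittered_delays(ceilings, random.Random(0))
print(all(0.0 <= d <= c for d, c in zip(delays, ceilings)))  # True
```

The worst-case backoff total is worth comparing against the call deadline: with a 1 second deadline, this policy can spend most of the budget waiting rather than working.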

Deadlines (Non-Negotiable)

Always set a deadline on every call. No exceptions. A call without a deadline will wait forever if the downstream service hangs, consuming thread/goroutine, connection, and memory — all the way up the call chain.

Common mistake: Setting a 30-second deadline because “that seems safe.” Under failure, you have 30 seconds × concurrent callers × chain depth of resource leakage before anything gives up. 1-5 seconds is usually the right range. If your operation genuinely needs 30 seconds, you want an async job, not a synchronous RPC.
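The leak arithmetic is easy to run. A rough upper bound, assuming every call hangs until its deadline and each hop in the chain holds one call open for the full duration (Little's law: calls in flight equal arrival rate times time in the system). The traffic figures are hypothetical.

```python
def stuck_calls(requests_per_second: float, deadline_s: float,
                chain_depth: int) -> float:
    """Calls held open across the chain while a downstream service hangs."""
    return requests_per_second * deadline_s * chain_depth


# Hypothetical service: 200 requests/second, three hops deep.
with_30s = stuck_calls(200, 30, 3)
with_2s = stuck_calls(200, 2, 3)

print(with_30s)            # 18000
print(with_2s)             # 1200
print(with_30s / with_2s)  # 15.0
```

Each of those calls pins a thread or goroutine, a stream, and some memory, so the deadline directly sets how much a single hung dependency can cost.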

Hedging

Send the same request to multiple backends simultaneously, return the first response, cancel the rest. Trades higher server load for lower tail latency (p99/p99.9).

Use hedging when: latency is more important than cost, and operations are read-only/idempotent.
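The first-response-wins logic can be sketched with asyncio. The backends here are simulated with sleeps, and `fake_backend` and `hedged_call` are hypothetical names; the point is the shape of the control flow, not a production client.

```python
import asyncio


async def fake_backend(name: str, latency_s: float) -> str:
    """Stand-in for an RPC to one backend (simulated with a sleep)."""
    await asyncio.sleep(latency_s)
    return f"response from {name}"


async def hedged_call(backends: list[tuple[str, float]]) -> str:
    """Send the same request everywhere, keep the first answer, cancel the rest."""
    tasks = [asyncio.create_task(fake_backend(name, latency))
             for name, latency in backends]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()                                   # stop the slower copies
    await asyncio.gather(*pending, return_exceptions=True)  # let cancels settle
    return next(iter(done)).result()


winner = asyncio.run(hedged_call([("a", 0.2), ("b", 0.01), ("c", 0.3)]))
print(winner)  # response from b
```

Every hedged call costs the backends up to N times the work, which is why the technique is reserved for idempotent reads on latency-critical paths.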

Health Checking Protocol

gRPC standardizes health checking via the grpc.health.v1.Health service (a spec, not just a convention). Load balancers and orchestration systems call its Check RPC, and the backend responds with SERVING, NOT_SERVING, or UNKNOWN. Prefer this over HTTP /health endpoints for gRPC services.

Advanced Patterns

Interceptors

Interceptors are middleware for gRPC. They wrap each call, both on the client and server side, without modifying the actual RPC handlers. The standard way to add cross-cutting concerns:

Auth: verify JWT/mTLS on every incoming call
Logging: log request/response metadata, status codes, latency
Tracing: inject/propagate OpenTelemetry trace context
Metrics: record RPC duration, error rates per method
Rate limiting: per-client or per-method throttling

ELI5: Interceptors are like security guards and timekeepers at a building entrance — every person (RPC call) passes through them automatically. The guards check ID (auth interceptor), the timekeepers note arrival time (metrics), and a camera takes a photo (logging). The office workers inside don’t need to know any of this happens.
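The wrapping idea behind interceptors can be shown without any gRPC library. In this sketch a handler is just a function of (request, metadata), and each interceptor returns a new function around it; the names, the demo token, and the dict-based status are all hypothetical stand-ins for what a real gRPC interceptor API provides.

```python
import time
from typing import Callable

Handler = Callable[[dict, dict], dict]  # (request, metadata) -> response


def auth_interceptor(next_handler: Handler) -> Handler:
    """Reject calls without the expected token before the handler runs."""
    def wrapped(request: dict, metadata: dict) -> dict:
        if metadata.get("authorization") != "Bearer demo-token":
            return {"status": "UNAUTHENTICATED"}
        return next_handler(request, metadata)
    return wrapped


def timing_interceptor(next_handler: Handler, log: list) -> Handler:
    """Record how long each call took, whether it succeeded or not."""
    def wrapped(request: dict, metadata: dict) -> dict:
        start = time.perf_counter()
        try:
            return next_handler(request, metadata)
        finally:
            log.append(time.perf_counter() - start)
    return wrapped


def get_user(request: dict, metadata: dict) -> dict:
    """The actual RPC handler: knows nothing about auth or metrics."""
    return {"status": "OK", "user_id": request["user_id"]}


latencies: list = []
handler = timing_interceptor(auth_interceptor(get_user), latencies)

ok = handler({"user_id": 123}, {"authorization": "Bearer demo-token"})
denied = handler({"user_id": 123}, {})

print(ok)              # {'status': 'OK', 'user_id': 123}
print(denied)          # {'status': 'UNAUTHENTICATED'}
print(len(latencies))  # 2
```

Order matters: timing wraps auth here, so rejected calls are still measured.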

gRPC-Web

Browsers can’t make native gRPC calls because they don’t have direct access to HTTP/2 trailers (the fetch API hides them). gRPC-Web is a slightly different wire protocol that works within browser constraints, mediated by a proxy (usually Envoy) that translates gRPC-Web ↔ gRPC on the server side.

This adds operational complexity. For browser-facing APIs, REST usually wins unless you have a strong reason (heavy bidirectional streaming, enforced schema sharing between frontend and backend).

Connect Protocol

Buf’s Connect is a newer HTTP-compatible RPC protocol that works over HTTP/1.1, HTTP/2, and HTTP/3 without needing gRPC-Web or special proxies. Connect libraries also speak the gRPC protocol, so Connect clients can talk to gRPC servers and vice versa. Worth watching, since it solves the browser support problem more cleanly.

Reflection

gRPC servers can expose a Reflection service that lets clients discover what services and methods exist at runtime. This is how grpcurl works without needing the .proto file. Enable reflection in development/staging, consider disabling in production for security.

When NOT to Use gRPC

Public API (browser clients, external developers): REST is more accessible; gRPC is a higher barrier to entry
Simple CRUD with infrequent calls: the protobuf toolchain overhead isn’t worth it
Teams unfamiliar with generated code workflows: the cognitive overhead of managing .proto files and codegen is real
Services that need to be called from a shell script: curl your REST endpoint, not grpcurl

Protobuf Alternatives

Format	Serialization	Schema	Zero-copy	Best for
Protobuf	Binary, 3-10x smaller than JSON	Yes (.proto)	No	General RPC, gRPC
FlatBuffers	Binary, zero-copy	Yes (.fbs)	Yes	Games, embedded, max perf
Cap’n Proto	Binary, zero-copy	Yes (.capnp)	Yes	Low-latency, same author as Protobuf v2
MessagePack	Binary, ~2x smaller than JSON	No	No	Drop-in JSON replacement
Avro	Binary	Yes (JSON schema)	No	Kafka, data pipelines
Thrift	Binary	Yes (.thrift)	No	Legacy Facebook services

Unless you’re building a game engine or need memory-mapped binary data, Protobuf is the right default. The ecosystem, tooling, and gRPC integration aren’t matched by anything else.

Decision Table: gRPC vs REST

Situation	Choose
Public API, external developers	REST
Browser-first frontend	REST
Internal microservices, high throughput	gRPC
Polyglot team, type safety critical	gRPC
Streaming (server push, bidirectional)	gRPC
Team new to the stack	REST (lower ops overhead)
Existing REST API, adding streaming	Consider gRPC for new services
Mobile app backend	REST (simpler, better tooling)
Data pipeline between internal services	gRPC or Protobuf over Kafka
Need request-level load balancing	gRPC (with L7 LB or mesh)

The real answer is not one or the other — it’s REST at the boundary (external-facing), gRPC in the interior (service mesh). That’s the architecture most large companies converge on after operating both at scale.
