gRPC & Protocol Buffers

REST+JSON is the default choice for APIs because it’s familiar, curl-friendly, and works in every browser. That’s fine for public-facing APIs. But when you’re building 50 microservices talking to each other thousands of times per second, “curl-friendly” stops being a virtue and starts being a tax you pay on every request.

gRPC is what happens when you stop treating internal service communication like an HTTP endpoint and start treating it like a function call — because that’s what it actually is.


Why gRPC Exists

The REST+JSON Problem

REST over JSON has four quiet performance killers that most people don’t think about until they’re running at scale:

Text parsing overhead. JSON is human-readable, which means it’s not machine-optimal. Parsing {"user_id": 12345, "active": true} requires tokenizing text, inferring types, and allocating strings. A binary format that says “field 1 is a varint with value 12345, field 2 is a bool with value true” is decoded with a few byte reads and bit shifts, with no tokenizer in the way.

No streaming. HTTP/1.1 is request-response. You ask, you wait, you get one answer. If you want a live feed of events, you’re hacking it with Server-Sent Events or WebSockets — neither of which feels like a natural extension of REST semantics.

No type safety. JSON has no schema enforcement at the wire level. The server says a field is an integer, the client receives it as a float, someone adds a new field and forgets to update the documentation. OpenAPI/Swagger helps, but it’s optional and often stale.

Schema discovery. With REST, there’s no standard way for a client to ask “what endpoints exist and what do they accept?” Swagger docs help but require manual maintenance. gRPC has reflection built in.

ELI5: REST+JSON for internal services is like two developers communicating via sticky notes — readable, flexible, but slow to write and easy to misread. gRPC is like using a shared type-safe function library — both sides compile against the same contract, so typos get caught at build time, not 2 AM on-call.

What gRPC Actually Is

gRPC is three things bundled together:

  1. HTTP/2 transport: multiplexed, binary, bi-directional
  2. Protocol Buffers serialization: binary, schema-first, 3-10x smaller than JSON
  3. Code generation: protoc compiles .proto files into client + server stubs in 10+ languages

Google built an internal version called Stubby in the early 2000s. It ran essentially every Google service-to-service call. In 2015, they rebuilt it on HTTP/2 (instead of a proprietary protocol) and open-sourced it as gRPC. The design is proven at Google-scale before most companies get anywhere near the problem.

When gRPC shines:

  • Internal microservice communication
  • Polyglot environments (Go services calling Python services calling Java services)
  • Streaming workloads: real-time feeds, file uploads, bidirectional sync
  • High-throughput, low-latency APIs where serialization overhead is measurable
  • Anywhere you want type safety enforced at the compiler level

Protocol Buffers (Protobuf)

Schema Definition

Everything starts with a .proto file. This is the contract between client and server — both sides compile against it, and the compiler generates the boring boilerplate.

syntax = "proto3";

package user.v1;

service UserService {
  rpc GetUser(GetUserRequest) returns (GetUserResponse);
  rpc ListUsers(ListUsersRequest) returns (stream User);
}

message GetUserRequest {
  int64 user_id = 1;
}

message GetUserResponse {
  User user = 1;
}

message User {
  int64  id         = 1;
  string email      = 2;
  bool   is_active  = 3;
  repeated string roles = 4;
}

Run protoc --go_out=. --go-grpc_out=. user.proto and you get type-safe client and server code in Go (or Python, Java, Rust, C++, etc.).

ELI5: A .proto file is like a legal contract both sides sign before talking. Once both parties compile against the same .proto, the compiler guarantees they’re speaking the same language. If you change the contract in a way the other side’s code no longer matches, that code fails to compile as soon as it picks up the new .proto, which is much better than finding out at runtime.

Binary Serialization: Why It’s Faster

Protobuf uses a binary wire format. Each field is encoded as:

[field_number << 3 | wire_type] [value]

Field numbers, not field names, appear on the wire. The string "user_id" never travels. Instead, field 1 (a varint) carries the value 12345.
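To make the tag-plus-value layout concrete, here is a minimal hand-rolled encoder for the two-field example. It is a sketch rather than a replacement for generated code, and it assumes a hypothetical message where field 1 is user_id and field 2 is active, both using the varint wire type (0).

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Emit the tag (field_number << 3 | wire_type) followed by the value."""
    wire_type_varint = 0
    tag = (field_number << 3) | wire_type_varint
    return encode_varint(tag) + encode_varint(value)


# Hypothetical message: field 1 = user_id, field 2 = active (true encodes as 1).
payload = encode_varint_field(1, 12345) + encode_varint_field(2, 1)
as_json = json.dumps({"user_id": 12345, "active": True}).encode()

print(payload.hex(" "))            # 08 b9 60 10 01
print(len(payload), len(as_json))  # 5 34
```

The field names never appear in the output: 0x08 is the tag for field 1, 0xb9 0x60 is 12345 as a varint, and 0x10 0x01 is field 2 set to true.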

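| Format | Size for {"user_id": 12345, "active": true} | Parse time (relative) |
|---|---|---|
| JSON | 34 bytes (with the spaces shown) | 1x (baseline) |
| XML | ~70 bytes | ~2x slower |
| Protobuf | 5 bytes | 20-100x faster |

Typical payloads come out 3-10x smaller than JSON. Parse-time gains depend heavily on language, library, and message shape, so treat the 20-100x figure as a best case and benchmark your own payloads. At 100k RPS, that difference is measurable in CPU budget.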

ELI5: JSON is like shipping a package and writing “This box contains: shoes, size 42, color blue” on the outside in full sentences. Protobuf is like using a barcode that both sender and receiver know means “item #3, size 42, blue.” The barcode is tiny and scanned instantly; the sentence takes time to read and parse.

Field Numbers and Wire Compatibility

This is the part people get wrong. Field numbers are forever. Once you ship field 1 as user_id, that number is burned for the lifetime of the message.

Schema evolution rules (memorize these):

| Action | Safe? | Why |
|---|---|---|
| Add a new field with a new number | Yes | Old clients ignore unknown fields |
| Remove a field | Yes, but… | Mark as reserved so the number is never reused |
| Rename a field | Yes | Names don’t appear on the wire |
| Change a field’s type | Rarely | Only compatible pairs (int32 → int64 is ok; int32 → string is not) |
| Reuse a deleted field number | Never | Old clients will misinterpret the data |
| Change a field number | Never | This is the same as deleting one and adding another |

message User {
  reserved 5, 6;          // old fields, numbers retired forever
  reserved "old_name";    // old field name, never reuse
  int64  id    = 1;
  string email = 2;
}
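
The “old clients ignore unknown fields” rule can be sketched as a tiny decoder that skips any field number it does not recognise. This is an illustration only: it handles just the varint (0) and length-delimited (2) wire types, and field 7 is a hypothetical addition made by a newer server.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Return (value, next_position) for the varint starting at pos."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_known_fields(buf: bytes, known: dict[int, str]) -> dict[str, object]:
    """Decode the fields listed in `known`; silently skip every other number."""
    fields: dict[str, object] = {}
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited: string, bytes, nested message
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if field_number in known:
            fields[known[field_number]] = value
    return fields


# id=12345, email="a@b", is_active=true, plus hypothetical new field 7 = "hi".
new_message = bytes.fromhex("08b960" "1203614062" "1801" "3a026869")
old_client_schema = {1: "id", 2: "email", 3: "is_active"}
print(decode_known_fields(new_message, old_client_schema))
```

The old client reads fields 1 to 3 and steps over field 7 because the wire type tells it how many bytes to skip. Reusing a retired number breaks this: the old client would recognise the number and decode the new data as the old field.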

ELI5: Field numbers are like apartment numbers in a building. If apartment 5 used to be a storage room and you tear it down, you don’t renumber 6, 7, 8… to 5, 6, 7. You leave apartment 5 permanently retired, or mail will get delivered to the wrong place. Old clients don’t know the layout changed.

proto2 vs proto3

proto3 (2016+) is what you should use. proto2 had required fields, explicit default values, and other complexity that caused more problems than it solved. In proto3, every field may be omitted, zero values are the default, and required is gone. (The optional keyword returned in protobuf 3.15, but only to track whether a field was explicitly set.) The simplification was intentional: Google’s experience showed that required fields made schema evolution painful.


The Four RPC Patterns

This is gRPC’s biggest differentiator from REST. Not just request-response — four distinct communication patterns.

service ChatService {
  // Unary: one request, one response
  rpc SendMessage(SendMessageRequest) returns (SendMessageResponse);

  // Server streaming: one request, stream of responses
  rpc StreamMessages(StreamRequest) returns (stream Message);

  // Client streaming: stream of requests, one response
  rpc UploadFile(stream FileChunk) returns (UploadResponse);

  // Bidirectional: stream in both directions simultaneously
  rpc Chat(stream ChatMessage) returns (stream ChatMessage);
}

Unary

One request, one response. Functionally identical to a REST endpoint. Use this for anything where you have a single question and expect a single answer: fetching a record, creating a resource, running a calculation.

Real-world example: GetUser(user_id=123) → User{...}

Server Streaming

Client sends one request, server responds with a stream. The connection stays open and the server pushes responses as they’re available.

When to use: Live feed of events, large paginated result sets (stream 10k records instead of paginating), real-time price updates, log tailing.

Real-world example: SubscribeToTrades(symbol="AAPL") → stream of Trade{...} events until the client disconnects.

ELI5: Server streaming is like calling a sports hotline that reads you every score update live instead of just today’s final score. You call once, stay on the line, and they push updates to you as they happen.

Client Streaming

Client sends a stream of requests, server responds once at the end.

When to use: File uploads, batch inserts, telemetry aggregation where you’re sending many small measurements and want one summary back.

Real-world example: Stream 10,000 LogEntry messages → server responds with IngestResponse{records_written: 10000}

Bidirectional Streaming

Both sides send independent streams simultaneously. Either side can close their stream independently.

When to use: Real-time chat, collaborative editing sync, interactive games, any protocol that feels more like a conversation than a transaction.

Real-world example: Chat application where both client and server send ChatMessage streams in parallel.

ELI5: Bidirectional streaming is like a walkie-talkie where both people can talk at the same time. Unary is like a phone call where you ask a question and wait for the full answer. Server streaming is a press conference — one question, the speaker talks until done. Client streaming is dictating a letter — you talk, they wait, then reply with “got it.”
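
The four shapes can be sketched as plain Python callables: a stream becomes an iterator, and a streaming response becomes a generator. The function names mirror the ChatService definition above, but the bodies are invented for illustration and no gRPC library is involved.

```python
from typing import Iterable, Iterator


def send_message(request: str) -> str:
    """Unary: one request in, one response out."""
    return f"ack:{request}"


def stream_messages(request: str) -> Iterator[str]:
    """Server streaming: one request in, responses yielded as they are ready."""
    for i in range(3):
        yield f"{request}-event-{i}"


def upload_file(chunks: Iterable[bytes]) -> int:
    """Client streaming: consume every chunk, then answer once at the end."""
    return sum(len(chunk) for chunk in chunks)


def chat(incoming: Iterable[str]) -> Iterator[str]:
    """Bidirectional: replies are produced while requests are still arriving.

    This sketch answers each message in turn; a real bidirectional stream lets
    either side send at any time, independent of the other.
    """
    for message in incoming:
        yield message.upper()


print(send_message("hello"))
print(list(stream_messages("AAPL")))
print(upload_file([b"abc", b"de"]))
print(list(chat(iter(["hi", "there"]))))
```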


gRPC Under the Hood

Why HTTP/2 Matters

gRPC runs on HTTP/2, and HTTP/2 gives it three things REST couldn’t have on HTTP/1.1:

Multiplexing. Multiple RPC calls share a single TCP connection without blocking each other. Compare to HTTP/1.1 where each request needed its own connection (or waited in line).

Binary framing. HTTP/2 speaks binary. No text parsing. Fits naturally with Protobuf’s binary serialization.

Header compression (HPACK). gRPC metadata (authentication tokens, trace IDs) is compressed across calls. The first call sends full headers; on later calls, headers that repeat are replaced by short index references into a table that both sides maintain.

Trailers: The Unusual Part

gRPC uses HTTP/2 trailers — headers sent after the response body — to convey the final status code. This is unusual because most HTTP tooling ignores trailers entirely.

HTTP/2 Response Flow:
  HEADERS frame    → response metadata (content-type, etc.)
  DATA frame(s)    → response body (protobuf-encoded)
  HEADERS frame    → trailers: grpc-status=0, grpc-message=""

This design lets gRPC stream data AND still report a final status (success or error) after all the data is sent. You can’t put the status in the initial headers because you don’t know if the streaming will succeed until it’s done.
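
A small simulation shows why the status has to come last. The frame labels and the flaky_source generator are made up for this sketch; the point is only the ordering, where a failure that happens mid-stream can still be reported because the status has not been sent yet.

```python
from typing import Iterable, Iterator


def respond(messages: Iterable[bytes]) -> Iterator[tuple[str, object]]:
    """Yield (frame_type, payload) pairs in the order a response sends them."""
    yield ("HEADERS", {":status": "200", "content-type": "application/grpc"})
    try:
        for message in messages:
            yield ("DATA", message)
    except Exception as exc:
        # Data is already on the wire, so the error goes in the trailers.
        # 13 is the INTERNAL status code.
        yield ("TRAILERS", {"grpc-status": "13", "grpc-message": str(exc)})
    else:
        yield ("TRAILERS", {"grpc-status": "0", "grpc-message": ""})


def flaky_source() -> Iterator[bytes]:
    """Hypothetical data source that fails after two rows."""
    yield b"row-1"
    yield b"row-2"
    raise RuntimeError("database connection lost")


frames = list(respond(flaky_source()))
for frame in frames:
    print(frame)
```

Note that the HTTP status in the first frame is still 200 even though the call failed, which is why tooling that ignores trailers cannot tell a failed gRPC call from a successful one.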

ELI5: Trailers are like the “end of letter” footer that says “everything I told you was accurate.” The status code comes after the data because for streaming calls, the server doesn’t know if everything succeeded until all the data is sent. It’s the difference between signing a receipt before or after the boxes are unloaded.

Deadlines, Metadata, Status Codes

Deadlines are how gRPC propagates timeouts across service boundaries. When Service A calls Service B with a 500ms deadline and Service B calls Service C, Service C receives the remaining deadline (say 380ms), provided B passes its incoming call context along to the outgoing call. If C takes too long, the whole chain gets cancelled. Always set deadlines: the default is no timeout, which means resource leaks under failure.
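
The 500ms to 380ms arithmetic can be sketched by treating the deadline as one absolute expiry time that every hop consults. The Deadline and FakeClock classes and the service_b/service_c functions are hypothetical stand-ins, and the fake clock exists only to keep the example deterministic.

```python
import time


class Deadline:
    """An absolute expiry time shared by every hop in a call chain."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + timeout_s

    def remaining(self) -> float:
        """Seconds left before the deadline; never negative."""
        return max(0.0, self._expires_at - self._clock())


class FakeClock:
    """Deterministic clock so the example does not depend on real time."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


def service_c(deadline: Deadline) -> float:
    """Last hop: refuse to start work if the budget is already spent."""
    if deadline.remaining() == 0.0:
        raise TimeoutError("DEADLINE_EXCEEDED")
    return deadline.remaining()


def service_b(deadline: Deadline, clock: FakeClock) -> float:
    clock.now += 0.120  # B spends 120 ms on its own work
    return service_c(deadline)  # same deadline, less time left


clock = FakeClock()
deadline = Deadline(0.500, clock=clock)  # set once, by Service A
seen_by_c = service_b(deadline, clock)
print(f"{seen_by_c * 1000:.0f} ms")  # 380 ms
```

Passing the same deadline down the chain is the design choice that matters: if each hop started a fresh 500ms timer, C could keep working long after A had given up.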

Metadata is gRPC’s equivalent of HTTP headers. Key-value pairs attached to a call: auth tokens, trace IDs, request IDs. Sent as HTTP/2 headers (and HPACK-compressed).

gRPC status codes are not HTTP status codes. Know the important ones:

| gRPC Status | HTTP Analog | When it occurs |
|---|---|---|
| OK (0) | 200 | Success |
| CANCELLED (1) | 499 | Client cancelled the request |
| DEADLINE_EXCEEDED (4) | 504 | Timeout hit (server or propagated) |
| NOT_FOUND (5) | 404 | Resource doesn’t exist |
| ALREADY_EXISTS (6) | 409 | Conflict/duplicate |
| PERMISSION_DENIED (7) | 403 | Authenticated but not allowed |
| UNAUTHENTICATED (16) | 401 | Not authenticated |
| RESOURCE_EXHAUSTED (8) | 429 | Rate limited or quota exceeded |
| INTERNAL (13) | 500 | Unexpected server error |
| UNAVAILABLE (14) | 503 | Service temporarily down (safe to retry) |

gRPC vs REST — The Real Comparison

| Dimension | REST + JSON | gRPC + Protobuf |
|---|---|---|
| Performance | Baseline | 2-10x faster serialization |
| Type safety | OpenAPI (optional, often stale) | Enforced at compile time |
| Streaming | Hacked (SSE, WebSocket) | Native (4 patterns) |
| Browser support | Native | gRPC-Web proxy required |
| Schema evolution | Manual discipline | Built into protobuf rules |
| Debugging | curl-friendly, human-readable | Needs grpcurl or Postman gRPC |
| Codegen | Optional (swagger-codegen) | Built-in, first-class |
| Ecosystem tooling | Massive | Growing, smaller |
| Learning curve | Low | Medium (proto files, codegen workflow) |
| Error model | HTTP status codes | gRPC status codes |

ELI5: REST is a car everyone knows how to drive — comfortable, familiar, works everywhere. gRPC is a sports car that goes 3x faster and handles better, but you need a special license, it doesn’t fit in every parking garage (no native browser support), and when it breaks you need a different mechanic.

The decision framework:

  • External API (browsers, third parties, mobile apps) → REST. Always. gRPC-Web exists but adds operational complexity for marginal gains.
  • Internal service-to-service → seriously consider gRPC, especially if you have streaming needs or high throughput.
  • Mixed: expose REST externally via a gateway, use gRPC internally. This is the common pattern in mature microservice architectures.

Common mistake: Defaulting to REST for internal services because it’s familiar, then discovering 18 months later that JSON parsing is 8% of your CPU budget. The switching cost is high: every client of a service has to move to the new protocol before the old endpoint can be retired.


Load Balancing gRPC

This is the part that catches people by surprise.

The HTTP/2 Connection Problem

HTTP/2 multiplexes all requests over a single long-lived connection. This is great for performance. It’s terrible for load balancing.

A traditional L4 load balancer (TCP-level) sees one TCP connection from Client → LB → Backend A. All 10,000 RPC calls that connection carries go to Backend A. Backends B and C sit idle. You’ve accidentally built a single-server system.

L4 Load Balancer (WRONG for gRPC):
  Client ─── TCP conn ──► LB ─── TCP conn ──► Backend A (100% load)
                                              Backend B (0% load)
                                              Backend C (0% load)

L7 Load Balancing (Correct)

An L7 load balancer understands HTTP/2 frames and can distribute individual RPC calls across backends, even though they share one TCP connection from client to LB.

L7 Load Balancer (correct):
  Client ─── HTTP/2 conn ──► LB ─┬─ HTTP/2 conn ──► Backend A
                                  ├─ HTTP/2 conn ──► Backend B
                                  └─ HTTP/2 conn ──► Backend C
  (each RPC call routed to a different backend)
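
The difference between the two diagrams comes down to when the backend is chosen. This toy simulation assumes three backends and round-robin as the L7 policy; real proxies offer other policies, and the call count is arbitrary.

```python
from collections import Counter
from itertools import cycle

BACKENDS = ["A", "B", "C"]
CALLS = 9_000


def l4_distribution(calls: int) -> Counter:
    """L4: the backend is picked once, when the TCP connection is opened."""
    pinned = BACKENDS[0]
    return Counter(pinned for _ in range(calls))


def l7_distribution(calls: int) -> Counter:
    """L7: the backend is picked per RPC (round-robin in this sketch)."""
    picker = cycle(BACKENDS)
    return Counter(next(picker) for _ in range(calls))


print("L4:", dict(l4_distribution(CALLS)))  # L4: {'A': 9000}
print("L7:", dict(l7_distribution(CALLS)))  # L7: {'A': 3000, 'B': 3000, 'C': 3000}
```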

Options:

| Approach | How it works | When to use |
|---|---|---|
| L7 proxy (Envoy, nginx) | Proxy understands HTTP/2 frames | Standard service mesh setup |
| Client-side LB | Client knows all backends, picks one per call | Small services, no mesh |
| Service mesh (Istio, Linkerd) | Sidecar proxy handles it transparently | K8s environments |
| Look-aside LB | External balancer tells client where to connect (xDS) | Large-scale, Google-style |

ELI5: A regular load balancer is like a phone operator who connects your call to one receptionist and leaves it there forever. All your follow-up questions go to that same person even if they’re overwhelmed. An L7 load balancer is like a smart operator who routes each individual question to whoever is free right now.

Common mistake: Deploying gRPC behind an L4 load balancer (an AWS NLB, for example) and wondering why one pod has 100% CPU and the others are idle. AWS ALB is L7 and has supported gRPC since 2020, but you must explicitly select gRPC as the target group’s protocol version. Nginx needs grpc_pass, not proxy_pass.


Error Handling and Resilience

Retries

gRPC has a built-in retry policy that you configure per method in the service config, with no changes to handler code:

{
  "methodConfig": [{
    "name": [{ "service": "user.v1.UserService" }],
    "retryPolicy": {
      "maxAttempts": 4,
      "initialBackoff": "0.1s",
      "maxBackoff": "1s",
      "backoffMultiplier": 2,
      "retryableStatusCodes": ["UNAVAILABLE", "DEADLINE_EXCEEDED"]
    }
  }]
}

Only retry idempotent operations or codes known to be safe. UNAVAILABLE usually means the request never reached a handler, but that is not guaranteed, so non-idempotent methods still need care. Never retry INTERNAL or ALREADY_EXISTS blindly.
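
To see what the policy values imply, this sketch computes the upper bound of the wait before each retry and applies the retryable-code check. It assumes exponential growth capped at maxBackoff; real gRPC clients also randomise the actual wait with jitter, which is left out here so the numbers stay deterministic. The function names are illustrative, not part of any gRPC API.

```python
RETRYABLE = {"UNAVAILABLE", "DEADLINE_EXCEEDED"}


def backoff_ceilings(max_attempts: int, initial_backoff: float,
                     max_backoff: float, multiplier: float) -> list[float]:
    """Upper bound of the wait before each retry (attempts 2..max_attempts)."""
    return [
        min(initial_backoff * multiplier ** n, max_backoff)
        for n in range(max_attempts - 1)
    ]


def should_retry(status: str, attempt: int, max_attempts: int) -> bool:
    """Retry only listed codes, and only while attempts remain."""
    return status in RETRYABLE and attempt < max_attempts


# Values from the policy above: 4 attempts means at most 3 retries.
print(backoff_ceilings(4, 0.1, 1.0, 2))  # [0.1, 0.2, 0.4]
print(should_retry("UNAVAILABLE", attempt=1, max_attempts=4))  # True
print(should_retry("INTERNAL", attempt=1, max_attempts=4))     # False
```

With these values the cap of 1s is never reached; it would only matter at a fifth retry, where 0.1 × 2⁴ = 1.6s would be clamped to 1s.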

Deadlines (Non-Negotiable)

Always set a deadline on every call. No exceptions. A call without a deadline will wait forever if the downstream service hangs, consuming thread/goroutine, connection, and memory — all the way up the call chain.

Common mistake: Setting a 30-second deadline because “that seems safe.” Under failure, you have 30 seconds × concurrent callers × chain depth of resource leakage before anything gives up. 1-5 seconds is usually the right range. If your operation genuinely needs 30 seconds, you want an async job, not a synchronous RPC.

Hedging

Send the same request to multiple backends simultaneously, return the first response, cancel the rest. Trades higher server load for lower tail latency (p99/p99.9).

Use hedging when: latency is more important than cost, and operations are read-only/idempotent.
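
A minimal asyncio sketch of the idea: race the same read against several backends, keep the first answer, cancel the rest. The backend names and latencies are invented, call_backend stands in for a real idempotent RPC, and many implementations stagger the extra attempts by a short delay instead of sending them all at once.

```python
import asyncio


async def call_backend(name: str, latency_s: float) -> str:
    """Stand-in for an idempotent read RPC against one backend."""
    await asyncio.sleep(latency_s)
    return f"response from {name}"


async def hedged_call(backends: dict[str, float]) -> str:
    """Send the same request to every backend and keep the first result."""
    tasks = [
        asyncio.create_task(call_backend(name, latency))
        for name, latency in backends.items()
    ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # stop paying for the slower attempts
    # Let the cancellations finish before returning.
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()


result = asyncio.run(hedged_call({"slow": 0.2, "fast": 0.01}))
print(result)  # response from fast
```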

Health Checking Protocol

gRPC standardizes health checking via the grpc.health.v1.Health service (a spec, not just a convention). Load balancers and orchestration systems call its Check RPC, and the backend responds with SERVING, NOT_SERVING, or UNKNOWN. Prefer this over HTTP /health endpoints for gRPC services.


Advanced Patterns

Interceptors

Interceptors are middleware for gRPC. They wrap each call, both on the client and server side, without modifying the actual RPC handlers. The standard way to add cross-cutting concerns:

  • Auth: verify JWT/mTLS on every incoming call
  • Logging: log request/response metadata, status codes, latency
  • Tracing: inject/propagate OpenTelemetry trace context
  • Metrics: record RPC duration, error rates per method
  • Rate limiting: per-client or per-method throttling

ELI5: Interceptors are like security guards and timekeepers at a building entrance — every person (RPC call) passes through them automatically. The guards check ID (auth interceptor), the timekeepers note arrival time (metrics), and a camera takes a photo (logging). The office workers inside don’t need to know any of this happens.
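
The wrapping idea can be sketched with plain function composition. The Handler signature, the token check, and get_user are all hypothetical and much simpler than a real gRPC interceptor API, but the shape is the same: each layer receives the next handler and decides whether and how to call it.

```python
import time
from typing import Callable

Handler = Callable[[dict, dict], dict]  # (request, metadata) -> response


def auth_interceptor(next_handler: Handler) -> Handler:
    """Reject calls whose metadata lacks the expected (hypothetical) token."""
    def wrapped(request: dict, metadata: dict) -> dict:
        if metadata.get("authorization") != "Bearer valid-token":
            raise PermissionError("UNAUTHENTICATED")
        return next_handler(request, metadata)
    return wrapped


def timing_interceptor(log: list) -> Callable[[Handler], Handler]:
    """Record the duration of every call, including calls that fail."""
    def decorator(next_handler: Handler) -> Handler:
        def wrapped(request: dict, metadata: dict) -> dict:
            start = time.perf_counter()
            try:
                return next_handler(request, metadata)
            finally:
                log.append(time.perf_counter() - start)
        return wrapped
    return decorator


def get_user(request: dict, metadata: dict) -> dict:
    """The actual handler: it knows nothing about auth or metrics."""
    return {"id": request["user_id"], "email": "user@example.com"}


latencies: list = []
handler = timing_interceptor(latencies)(auth_interceptor(get_user))
print(handler({"user_id": 1}, {"authorization": "Bearer valid-token"}))
```

Order matters: timing wraps auth here, so rejected calls are still measured. Swapping the two would hide unauthenticated traffic from the latency log.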

gRPC-Web

Browsers can’t make native gRPC calls because they don’t have direct access to HTTP/2 trailers (the fetch API hides them). gRPC-Web is a slightly different wire protocol that works within browser constraints, mediated by a proxy (usually Envoy) that translates gRPC-Web ↔ gRPC on the server side.

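This adds operational complexity, and gRPC-Web only supports unary and server-streaming calls, so client streaming and bidirectional streaming are not available from the browser. For browser-facing APIs, REST usually wins unless you have a strong reason (heavy server streaming, enforced schema sharing between frontend and backend).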

Connect Protocol

Buf’s Connect is a newer HTTP-compatible RPC protocol that works natively with HTTP/1.1, HTTP/2, and HTTP/3 without needing gRPC-Web or special proxies. Its libraries also speak the gRPC protocol, so Connect clients can talk to gRPC servers and vice versa. Worth watching, because it solves the browser support problem more cleanly.

Reflection

gRPC servers can expose a Reflection service that lets clients discover what services and methods exist at runtime. This is how grpcurl works without needing the .proto file. Enable reflection in development/staging, consider disabling in production for security.

When NOT to Use gRPC

  • Public API (browser clients, external developers): REST is more accessible; gRPC is a higher barrier to entry
  • Simple CRUD with infrequent calls: the protobuf toolchain overhead isn’t worth it
  • Teams unfamiliar with generated code workflows: the cognitive overhead of managing .proto files and codegen is real
  • Services that need to be called from a shell script: curl your REST endpoint, not grpcurl

Protobuf Alternatives

| Format | Serialization | Schema | Zero-copy | Best for |
|---|---|---|---|---|
| Protobuf | Binary, 3-10x smaller than JSON | Yes (.proto) | No | General RPC, gRPC |
| FlatBuffers | Binary, zero-copy | Yes (.fbs) | Yes | Games, embedded, max perf |
| Cap’n Proto | Binary, zero-copy | Yes (.capnp) | Yes | Low-latency, same author as Protobuf v2 |
| MessagePack | Binary, ~2x smaller than JSON | No | No | Drop-in JSON replacement |
| Avro | Binary | Yes (.avsc, written in JSON) | No | Kafka, data pipelines |
| Thrift | Binary | Yes (.thrift) | No | Legacy Facebook services |

Unless you’re building a game engine or need memory-mapped binary data, Protobuf is the right default. The ecosystem, tooling, and gRPC integration aren’t matched by anything else.


Decision Table: gRPC vs REST

| Situation | Choose |
|---|---|
| Public API, external developers | REST |
| Browser-first frontend | REST |
| Internal microservices, high throughput | gRPC |
| Polyglot team, type safety critical | gRPC |
| Streaming (server push, bidirectional) | gRPC |
| Team new to the stack | REST (lower ops overhead) |
| Existing REST API, adding streaming | Consider gRPC for new services |
| Mobile app backend | REST (simpler, better tooling) |
| Data pipeline between internal services | gRPC or Protobuf over Kafka |
| Need request-level load balancing | gRPC (with L7 LB or mesh) |

The real answer is not one or the other — it’s REST at the boundary (external-facing), gRPC in the interior (service mesh). That’s the architecture most large companies converge on after operating both at scale.