gRPC & Protocol Buffers
REST+JSON is the default choice for APIs because it’s familiar, curl-friendly, and works in every browser. That’s fine for public-facing APIs. But when you’re building 50 microservices talking to each other thousands of times per second, “curl-friendly” stops being a virtue and starts being a tax you pay on every request.
gRPC is what happens when you stop treating internal service communication like an HTTP endpoint and start treating it like a function call — because that’s what it actually is.
Why gRPC Exists
The REST+JSON Problem
REST over JSON has four quiet costs that most people don’t think about until they’re running at scale:
Text parsing overhead. JSON is human-readable, which means it’s not machine-optimal. Parsing {"user_id": 12345, "active": true} requires tokenizing text, type inference, and allocating strings. A binary format that says “field 1 is a varint with value 12345, field 2 is a bool with value true” is decoded with a handful of byte reads and bit shifts, with no tokenizing and no string keys to allocate.
No streaming. HTTP/1.1 is request-response. You ask, you wait, you get one answer. If you want a live feed of events, you’re hacking it with Server-Sent Events or WebSockets — neither of which feels like a natural extension of REST semantics.
No type safety. JSON has no schema enforcement at the wire level. The server says a field is an integer, the client receives it as a float, someone adds a new field and forgets to update the documentation. OpenAPI/Swagger helps, but it’s optional and often stale.
Schema discovery. With REST, there’s no standard way for a client to ask “what endpoints exist and what do they accept?” Swagger docs help but require manual maintenance. gRPC ships a standard reflection service that servers can enable.
ELI5: REST+JSON for internal services is like two developers communicating via sticky notes — readable, flexible, but slow to write and easy to misread. gRPC is like using a shared type-safe function library — both sides compile against the same contract, so typos get caught at build time, not 2 AM on-call.
What gRPC Actually Is
gRPC is three things bundled together:
- HTTP/2 transport: multiplexed, binary, bi-directional
- Protocol Buffers serialization: binary, schema-first, 3-10x smaller than JSON
- Code generation: protoc compiles .proto files into client + server stubs in 10+ languages
Google built an internal version called Stubby in the early 2000s. It ran essentially every Google service-to-service call. In 2015, they rebuilt it on HTTP/2 (instead of a proprietary protocol) and open-sourced it as gRPC. The design was proven at Google scale long before most companies get anywhere near the problem.
When gRPC shines:
- Internal microservice communication
- Polyglot environments (Go services calling Python services calling Java services)
- Streaming workloads: real-time feeds, file uploads, bidirectional sync
- High-throughput, low-latency APIs where serialization overhead is measurable
- Anywhere you want type safety enforced at the compiler level
Protocol Buffers (Protobuf)
Schema Definition
Everything starts with a .proto file. This is the contract between client and server — both sides compile against it, and the compiler generates the boring boilerplate.
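    syntax = "proto3";

    package user.v1;

    // Placeholder import path (an assumption): the Go plugins need some go_package.
    option go_package = "example.com/demo/gen/user/v1;userv1";

    service UserService {
      rpc GetUser(GetUserRequest) returns (GetUserResponse);
      rpc ListUsers(ListUsersRequest) returns (stream User);
    }

    message GetUserRequest {
      int64 user_id = 1;
    }

    message GetUserResponse {
      User user = 1;
    }

    // Referenced by ListUsers above; left empty here for brevity.
    message ListUsersRequest {}

    message User {
      int64 id = 1;
      string email = 2;
      bool is_active = 3;
      repeated string roles = 4;
    }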
Run protoc --go_out=. --go-grpc_out=. user.proto (with the protoc-gen-go and protoc-gen-go-grpc plugins installed) and you get type-safe client and server code in Go. Other languages (Python, Java, Rust, C++, etc.) use the same .proto file with their own plugins.
ELI5: A .proto file is like a legal contract both sides sign before talking. Once both parties compile against the same .proto, the compiler guarantees they’re speaking the same language. If you change the contract and regenerate, any code that no longer matches it fails to build, which is much better than finding out at runtime.
Binary Serialization: Why It’s Faster
Protobuf uses a binary wire format. Each field is encoded as:
[field_number << 3 | wire_type] [value]
Field numbers, not field names, appear on the wire. The string "user_id" never travels. Instead, field 1 (a varint) carries the value 12345.
| Format | Size for {"user_id": 12345, "active": true} | Parse time (relative) |
|---|---|---|
| JSON | 34 bytes (31 minified) | 1x (baseline) |
| XML | ~70 bytes | ~2x slower |
| Protobuf | 5 bytes | ~2-10x faster (varies by language and payload) |
Protobuf payloads are typically 3-10x smaller than JSON and several times faster to parse; the exact speedup depends heavily on language, library, and message shape. At 100k RPS, that difference is measurable in CPU budget.
ELI5: JSON is like shipping a package and writing “This box contains: shoes, size 42, color blue” on the outside in full sentences. Protobuf is like using a barcode that both sender and receiver know means “item #3, size 42, blue.” The barcode is tiny and scanned instantly; the sentence takes time to read and parse.
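To make the wire format concrete, here is a minimal hand-rolled sketch of the tag-plus-varint rule described above. It is not the real protobuf library; the helper names (encode_varint, encode_varint_field) are made up for illustration, and it only handles non-negative varint fields.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Encode one varint-typed field: the tag, then the value."""
    wire_type_varint = 0
    tag = (field_number << 3) | wire_type_varint
    return encode_varint(tag) + encode_varint(value)


# user_id = 12345 as field 1, active = true as field 2
protobuf_bytes = encode_varint_field(1, 12345) + encode_varint_field(2, 1)
json_bytes = json.dumps({"user_id": 12345, "active": True}).encode("utf-8")

print(protobuf_bytes.hex(" "))               # 08 b9 60 10 01
print(len(protobuf_bytes), len(json_bytes))  # 5 34
```

The field names never appear in the output: one tag byte per field, two bytes for 12345, one byte for the bool.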
Field Numbers and Wire Compatibility
This is the part people get wrong. Field numbers are forever. Once you ship field 1 as user_id, that number is burned for the lifetime of the message.
Schema evolution rules (memorize these):
| Action | Safe? | Why |
|---|---|---|
| Add a new field with a new number | Yes | Old clients ignore unknown fields |
| Remove a field | Yes, but… | Mark as reserved so the number is never reused |
| Rename a field | Yes | Names don’t appear on the wire |
| Change a field’s type | Rarely | Only compatible pairs (int32 → int64 is ok; int32 → string is not) |
| Reuse a deleted field number | Never | Old clients will misinterpret the data |
| Change a field number | Never | This is the same as deleting one and adding another |
    message User {
      reserved 5, 6;        // old fields, numbers retired forever
      reserved "old_name";  // old field name, never reuse

      int64 id = 1;
      string email = 2;
    }
ELI5: Field numbers are like apartment numbers in a building. If apartment 5 used to be a storage room and you tear it down, you don’t renumber 6, 7, 8… to 5, 6, 7. You leave apartment 5 permanently retired, or mail will get delivered to the wrong place. Old clients don’t know the layout changed.
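Why is adding a field safe? Because a decoder can walk past any field number it doesn't recognize. The sketch below is a toy decoder written for this article (not the protobuf library) that handles only varint and length-delimited fields; the payload and the "tier" field number 7 are invented for the example.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Return (value, next_position) for the varint starting at pos."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_known_fields(buf: bytes, known: dict[int, str]) -> dict[str, object]:
    """Decode varint and length-delimited fields, skipping unknown numbers."""
    message: dict[str, object] = {}
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited (string, bytes, sub-message)
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if field_number in known:
            message[known[field_number]] = value
        # Unknown field numbers are parsed past and then ignored.
    return message


# A newer server sends id=7, email="a@b.c", plus a new field 7 ("gold").
payload = bytes([0x08, 0x07, 0x12, 0x05]) + b"a@b.c" + bytes([0x3A, 0x04]) + b"gold"
old_client_schema = {1: "id", 2: "email"}
print(decode_known_fields(payload, old_client_schema))
```

The same mechanism explains the "never reuse" rule: the old client maps numbers to meanings from its own copy of the schema, so a recycled number gets decoded under the old meaning without any error.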
proto2 vs proto3
proto3 (2016+) is what you should use. proto2 had required fields, explicit default values, and other complexity that caused more problems than it solved. In proto3: all fields are optional, zero values are the default, and required is gone. The simplification was intentional — Google’s experience showed that required fields made schema evolution painful.
The Four RPC Patterns
This is gRPC’s biggest differentiator from REST. Not just request-response — four distinct communication patterns.
    service ChatService {
      // Unary: one request, one response
      rpc SendMessage(SendMessageRequest) returns (SendMessageResponse);

      // Server streaming: one request, stream of responses
      rpc StreamMessages(StreamRequest) returns (stream Message);

      // Client streaming: stream of requests, one response
      rpc UploadFile(stream FileChunk) returns (UploadResponse);

      // Bidirectional: stream in both directions simultaneously
      rpc Chat(stream ChatMessage) returns (stream ChatMessage);
    }
Unary
One request, one response. Functionally identical to a REST endpoint. Use this for anything where you have a single question and expect a single answer: fetching a record, creating a resource, running a calculation.
Real-world example: GetUser(user_id=123) → User{...}
Server Streaming
Client sends one request, server responds with a stream. The connection stays open and the server pushes responses as they’re available.
When to use: Live feed of events, large paginated result sets (stream 10k records instead of paginating), real-time price updates, log tailing.
Real-world example: SubscribeToTrades(symbol="AAPL") → stream of Trade{...} events until the client disconnects.
ELI5: Server streaming is like calling a sports hotline that reads you every score update live instead of just today’s final score. You call once, stay on the line, and they push updates to you as they happen.
Client Streaming
Client sends a stream of requests, server responds once at the end.
When to use: File uploads, batch inserts, telemetry aggregation where you’re sending many small measurements and want one summary back.
Real-world example: Stream 10,000 LogEntry messages → server responds with IngestResponse{records_written: 10000}
Bidirectional Streaming
Both sides send independent streams simultaneously. Either side can close their stream independently.
When to use: Real-time chat, collaborative editing sync, interactive games, any protocol that feels more like a conversation than a transaction.
Real-world example: Chat application where both client and server send ChatMessage streams in parallel.
ELI5: Bidirectional streaming is like a phone call where both people can talk at the same time. Unary is like a walkie-talkie exchange: you ask one question, say “over,” and wait for the full answer. Server streaming is a press conference: one question, and the speaker talks until done. Client streaming is dictating a letter: you talk, they wait, then reply with “got it.”
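If the four patterns still blur together, it can help to look at them as plain function shapes. This sketch uses ordinary Python functions and generators with no gRPC involved; the function names and payloads are invented for illustration. Python's grpcio handlers have a similar shape (iterators in, generators out), but treat that as a loose analogy rather than the real API.

```python
from typing import Iterable, Iterator


def get_user(user_id: int) -> dict:
    """Unary: one request in, one response out."""
    return {"id": user_id, "email": f"user{user_id}@example.com"}


def stream_messages(room: str, count: int) -> Iterator[str]:
    """Server streaming: one request in, many responses out."""
    for i in range(count):
        yield f"{room}: message {i}"


def upload_file(chunks: Iterable[bytes]) -> int:
    """Client streaming: many requests in, one summary response out."""
    return sum(len(chunk) for chunk in chunks)


def chat(incoming: Iterable[str]) -> Iterator[str]:
    """Bidirectional: responses are produced while requests are still arriving."""
    for message in incoming:
        yield f"echo: {message}"


print(get_user(123))
print(list(stream_messages("general", 2)))
print(upload_file([b"abc", b"de"]))
print(list(chat(iter(["hi", "bye"]))))
```

One simplification to keep in mind: this chat replies once per incoming message, in lockstep. A real bidirectional stream has no such coupling; either side can send whenever it wants.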
gRPC Under the Hood
Why HTTP/2 Matters
gRPC runs on HTTP/2, and HTTP/2 gives it three things REST couldn’t have on HTTP/1.1:
Multiplexing. Multiple RPC calls share a single TCP connection without blocking each other. Compare to HTTP/1.1 where each request needed its own connection (or waited in line).
Binary framing. HTTP/2 speaks binary. No text parsing. Fits naturally with Protobuf’s binary serialization.
Header compression (HPACK). gRPC metadata (authentication tokens, trace IDs) is compressed across calls. The first call sends full headers; subsequent calls on the same connection reference entries in a shared header table instead of resending them.
Trailers: The Unusual Part
gRPC uses HTTP/2 trailers — headers sent after the response body — to convey the final status code. This is unusual because most HTTP tooling ignores trailers entirely.
    HTTP/2 Response Flow:
      HEADERS frame  → response metadata (content-type, etc.)
      DATA frame(s)  → response body (protobuf-encoded)
      HEADERS frame  → trailers: grpc-status=0, grpc-message=""
This design lets gRPC stream data AND still report a final status (success or error) after all the data is sent. You can’t put the status in the initial headers because you don’t know if the streaming will succeed until it’s done.
ELI5: Trailers are like the “end of letter” footer that says “everything I told you was accurate.” The status code comes after the data because for streaming calls, the server doesn’t know if everything succeeded until all the data is sent. It’s the difference between signing a receipt before or after the boxes are unloaded.
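Here is a small simulation of that ordering. It models frames as plain tuples rather than real HTTP/2, and the respond function and its "empty record fails" rule are invented for the example. The point is only that the status is decided after the data has been produced.

```python
from typing import Iterator


def respond(records: list[str]) -> Iterator[tuple[str, object]]:
    """Yield (frame_type, payload) pairs in the order they go on the wire."""
    # The HTTP status is sent up front, before the outcome is known.
    yield ("HEADERS", {":status": "200", "content-type": "application/grpc"})
    status, message = "0", ""  # OK unless something fails mid-stream
    try:
        for record in records:
            if not record:
                raise ValueError("empty record")
            yield ("DATA", record.encode("utf-8"))
    except ValueError as exc:
        status, message = "13", str(exc)  # INTERNAL
    # Trailers go last, so they can describe what happened during streaming.
    yield ("HEADERS", {"grpc-status": status, "grpc-message": message})


for frame in respond(["a", "", "c"]):
    print(frame)
```

Notice that the failing call still starts with an HTTP 200. That is why tooling that only looks at the HTTP status line tends to report gRPC errors as successes.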
Deadlines, Metadata, Status Codes
Deadlines are how gRPC propagates timeouts across service boundaries. When Service A calls Service B with a 500ms deadline and Service B calls Service C, Service C receives the remaining deadline (say 380ms), provided B passes its call context along to the outgoing call (some language implementations do this for you, others need it done explicitly). If C takes too long, the whole chain gets cancelled. Always set deadlines: the default is no timeout, which means resource leaks under failure.
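The idea is easier to see with the network stripped away. In this sketch the three "services" are plain functions in one process, and all names are hypothetical. The deadline is an absolute point on the monotonic clock that is set once at the edge and handed down unchanged.

```python
import time


class DeadlineExceeded(Exception):
    """Raised when the shared deadline has already passed."""


def remaining(deadline: float) -> float:
    """Seconds left before an absolute deadline on the monotonic clock."""
    return deadline - time.monotonic()


def service_c(deadline: float) -> str:
    if remaining(deadline) <= 0:
        raise DeadlineExceeded("C: no time left")
    return "result from C"


def service_b(deadline: float, work_seconds: float) -> str:
    time.sleep(work_seconds)  # B's own work eats into the shared budget
    if remaining(deadline) <= 0:
        raise DeadlineExceeded("B: no time left to call C")
    return service_c(deadline)  # pass the same absolute deadline downstream


def service_a(timeout_seconds: float, b_work_seconds: float) -> str:
    deadline = time.monotonic() + timeout_seconds  # set once, at the edge
    return service_b(deadline, b_work_seconds)


print(service_a(timeout_seconds=0.5, b_work_seconds=0.01))
```

Across real machines the clocks differ, so gRPC sends the remaining time as a relative timeout (the grpc-timeout header) and each hop rebuilds its own local deadline from it.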
Metadata is gRPC’s equivalent of HTTP headers. Key-value pairs attached to a call: auth tokens, trace IDs, request IDs. Sent as HTTP/2 headers (and HPACK-compressed).
gRPC status codes are not HTTP status codes. Know the important ones:
| gRPC Status | HTTP Analog | When it occurs |
|---|---|---|
| OK (0) | 200 | Success |
| CANCELLED (1) | 499 | Client cancelled the request |
| DEADLINE_EXCEEDED (4) | 504 | Timeout hit (server or propagated) |
| NOT_FOUND (5) | 404 | Resource doesn’t exist |
| ALREADY_EXISTS (6) | 409 | Conflict/duplicate |
| PERMISSION_DENIED (7) | 403 | Authenticated but not allowed |
| UNAUTHENTICATED (16) | 401 | Not authenticated |
| RESOURCE_EXHAUSTED (8) | 429 | Rate limited or quota exceeded |
| INTERNAL (13) | 500 | Unexpected server error |
| UNAVAILABLE (14) | 503 | Service temporarily down (safe to retry) |
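If you front gRPC services with a REST gateway, this table usually ends up as a lookup somewhere. A minimal sketch of that lookup follows; the GrpcStatus enum and to_http helper are written for this article and cover only the codes listed above, with unlisted codes falling back to 500 as an assumption.

```python
from enum import IntEnum


class GrpcStatus(IntEnum):
    OK = 0
    CANCELLED = 1
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    ALREADY_EXISTS = 6
    PERMISSION_DENIED = 7
    RESOURCE_EXHAUSTED = 8
    INTERNAL = 13
    UNAVAILABLE = 14
    UNAUTHENTICATED = 16


HTTP_ANALOG = {
    GrpcStatus.OK: 200,
    GrpcStatus.CANCELLED: 499,
    GrpcStatus.DEADLINE_EXCEEDED: 504,
    GrpcStatus.NOT_FOUND: 404,
    GrpcStatus.ALREADY_EXISTS: 409,
    GrpcStatus.PERMISSION_DENIED: 403,
    GrpcStatus.RESOURCE_EXHAUSTED: 429,
    GrpcStatus.INTERNAL: 500,
    GrpcStatus.UNAVAILABLE: 503,
    GrpcStatus.UNAUTHENTICATED: 401,
}


def to_http(status: GrpcStatus, default: int = 500) -> int:
    """Translate a gRPC status into its closest HTTP analog."""
    return HTTP_ANALOG.get(status, default)


print(to_http(GrpcStatus.NOT_FOUND))      # 404
print(to_http(GrpcStatus(16)))            # 401
```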
gRPC vs REST — The Real Comparison
| Dimension | REST + JSON | gRPC + Protobuf |
|---|---|---|
| Performance | Baseline | 2-10x faster serialization |
| Type safety | OpenAPI (optional, often stale) | Enforced at compile time |
| Streaming | Hacked (SSE, WebSocket) | Native (4 patterns) |
| Browser support | Native | gRPC-Web proxy required |
| Schema evolution | Manual discipline | Built into protobuf rules |
| Debugging | curl-friendly, human-readable | Needs grpcurl or Postman gRPC |
| Codegen | Optional (swagger-codegen) | Built-in, first-class |
| Ecosystem tooling | Massive | Growing, smaller |
| Learning curve | Low | Medium (proto files, codegen workflow) |
| Error model | HTTP status codes | gRPC status codes |
ELI5: REST is a car everyone knows how to drive — comfortable, familiar, works everywhere. gRPC is a sports car that goes 3x faster and handles better, but you need a special license, it doesn’t fit in every parking garage (no native browser support), and when it breaks you need a different mechanic.
The decision framework:
- External API (browsers, third parties, mobile apps) → REST. Always. gRPC-Web exists but adds operational complexity for marginal gains.
- Internal service-to-service → seriously consider gRPC, especially if you have streaming needs or high throughput.
- Mixed: expose REST externally via a gateway, use gRPC internally. This is the common pattern in mature microservice architectures.
Common mistake: Defaulting to REST for internal services because it’s familiar, then discovering 18 months later that JSON parsing is 8% of your CPU budget. The switching cost is high: every client and server on an API has to move to protobuf together, or you run both protocols side by side until the last caller has migrated.
Load Balancing gRPC
This is the part that catches people by surprise.
The HTTP/2 Connection Problem
HTTP/2 multiplexes all requests over a single long-lived connection. This is great for performance. It’s terrible for load balancing.
A traditional L4 load balancer (TCP-level) sees one TCP connection from Client → LB → Backend A. All 10,000 RPC calls that connection carries go to Backend A. Backends B and C sit idle. You’ve accidentally built a single-server system.
    L4 Load Balancer (WRONG for gRPC):
    Client ─── TCP conn ──► LB ─── TCP conn ──► Backend A (100% load)
                                                Backend B (0% load)
                                                Backend C (0% load)
L7 Load Balancing (Correct)
An L7 load balancer understands HTTP/2 frames and can distribute individual RPC calls across backends, even though they share one TCP connection from client to LB.
    L7 Load Balancer (correct):
    Client ─── HTTP/2 conn ──► LB ─┬─ HTTP/2 conn ──► Backend A
                                   ├─ HTTP/2 conn ──► Backend B
                                   └─ HTTP/2 conn ──► Backend C
    (each RPC call routed to a different backend)
Options:
| Approach | How it works | When to use |
|---|---|---|
| L7 proxy (Envoy, nginx) | Proxy understands HTTP/2 frames | Standard service mesh setup |
| Client-side LB | Client knows all backends, picks one per call | Small services, no mesh |
| Service mesh (Istio, Linkerd) | Sidecar proxy handles it transparently | K8s environments |
| Look-aside LB | External balancer tells client where to connect (xDS) | Large-scale, Google-style |
ELI5: A regular load balancer is like a phone operator who connects your call to one receptionist and leaves it there forever. All your follow-up questions go to that same person even if they’re overwhelmed. An L7 load balancer is like a smart operator who routes each individual question to whoever is free right now.
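The difference is easy to simulate. The sketch below is a toy model with round-robin picking and invented client and backend names; it makes no claim about how any particular load balancer chooses backends. It just counts where calls land when the choice is made per connection versus per call.

```python
from collections import Counter
from itertools import cycle


def l4_balance(connections: dict[str, int], backends: list[str]) -> Counter:
    """Pick a backend once per TCP connection; every call on it follows."""
    picker = cycle(backends)
    load: Counter = Counter({backend: 0 for backend in backends})
    for _client, call_count in connections.items():
        load[next(picker)] += call_count
    return load


def l7_balance(connections: dict[str, int], backends: list[str]) -> Counter:
    """Pick a backend once per RPC call, regardless of connection."""
    picker = cycle(backends)
    load: Counter = Counter({backend: 0 for backend in backends})
    for _client, call_count in connections.items():
        for _ in range(call_count):
            load[next(picker)] += 1
    return load


connections = {"client-1": 10_000}  # one long-lived HTTP/2 connection
backends = ["A", "B", "C"]
print(dict(l4_balance(connections, backends)))  # {'A': 10000, 'B': 0, 'C': 0}
print(dict(l7_balance(connections, backends)))  # {'A': 3334, 'B': 3333, 'C': 3333}
```

With many short-lived clients the L4 numbers even out on their own. The problem bites hardest when a few services hold a few long-lived connections, which is exactly the typical internal gRPC setup.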
Common mistake: Deploying gRPC behind a plain L4 load balancer (an AWS NLB, for example) and wondering why one pod has 100% CPU and the others are idle. AWS ALB is an L7 balancer and has supported gRPC since 2020, but you must explicitly configure it by setting the target group’s protocol version to gRPC. Nginx needs grpc_pass, not proxy_pass.
Error Handling and Resilience
Retries
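gRPC has a built-in retry policy you can configure in the service config (no code changes needed). The retryPolicy block sits inside a methodConfig entry that names the services or methods it applies to: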
    {
      "retryPolicy": {
        "maxAttempts": 4,
        "initialBackoff": "0.1s",
        "maxBackoff": "1s",
        "backoffMultiplier": 2,
        "retryableStatusCodes": ["UNAVAILABLE", "DEADLINE_EXCEEDED"]
      }
    }
Only retry idempotent operations or codes known to be safe. UNAVAILABLE usually means the request never reached the application, but that is not guaranteed, so non-idempotent calls still need care. Never retry INTERNAL or ALREADY_EXISTS blindly.
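What do those numbers mean in wall-clock terms? The sketch below works out the schedule implied by the config above: maxAttempts counts the first try, and each retry waits up to an exponentially growing ceiling capped at maxBackoff. The function names are made up, and the uniform jitter is my reading of gRPC's retry design rather than a quote from it, so treat the exact randomization as an assumption.

```python
import random


def backoff_ceilings(
    max_attempts: int, initial: float, maximum: float, multiplier: float
) -> list[float]:
    """Upper bound of the wait before each retry (attempts 2, 3, ...)."""
    retries = max_attempts - 1  # the first attempt is not a retry
    return [min(initial * multiplier ** n, maximum) for n in range(retries)]


def jittered_delay(ceiling: float, rng: random.Random) -> float:
    """Pick a random delay in [0, ceiling] so clients don't retry in lockstep."""
    return rng.uniform(0, ceiling)


# Values from the retryPolicy above: 4 attempts, 0.1s initial, 1s max, x2.
ceilings = backoff_ceilings(max_attempts=4, initial=0.1, maximum=1.0, multiplier=2)
print(ceilings)  # ceilings of roughly 0.1s, 0.2s, 0.4s

rng = random.Random(42)
print([round(jittered_delay(c, rng), 3) for c in ceilings])
```

So the worst case adds about 0.7 seconds of waiting on top of the attempts themselves, which has to fit inside the call's deadline.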
Deadlines (Non-Negotiable)
Always set a deadline on every call. No exceptions. A call without a deadline will wait forever if the downstream service hangs, consuming thread/goroutine, connection, and memory — all the way up the call chain.
Common mistake: Setting a 30-second deadline because “that seems safe.” Under failure, you have 30 seconds × concurrent callers × chain depth of resource leakage before anything gives up. 1-5 seconds is usually the right range. If your operation genuinely needs 30 seconds, you want an async job, not a synchronous RPC.
Hedging
Send the same request to multiple backends simultaneously, return the first response, cancel the rest. Trades higher server load for lower tail latency (p99/p99.9).
Use hedging when: latency is more important than cost, and operations are read-only/idempotent.
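A rough sketch of the idea using threads in place of network calls. The backend names and latencies are invented, and call_backend is a stand-in for a real read-only RPC. The first response wins, and an event tells the slower calls to stop.

```python
import concurrent.futures
import threading


def call_backend(name: str, latency: float, cancelled: threading.Event) -> str:
    """Hypothetical backend call: waits `latency` seconds unless cancelled."""
    if cancelled.wait(timeout=latency):
        return f"{name}: cancelled"
    return f"{name}: ok"


def hedged_call(backends: dict[str, float]) -> str:
    """Send the same read to every backend and return the first response."""
    cancelled = threading.Event()
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(backends)) as pool:
        futures = [
            pool.submit(call_backend, name, latency, cancelled)
            for name, latency in backends.items()
        ]
        done, _pending = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED
        )
        cancelled.set()  # tell the slower calls to stop working
        return next(iter(done)).result()


print(hedged_call({"slow": 0.3, "fast": 0.01}))  # fast: ok
```

The cancellation step is what keeps the cost bounded: without it, every hedged request does the full work on every backend even after the answer has been returned.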
Health Checking Protocol
gRPC standardizes health checking via the grpc.health.v1.Health service (a spec, not just a convention). Load balancers and orchestration systems call its Check RPC, and the backend responds with SERVING, NOT_SERVING, or UNKNOWN. Prefer this over HTTP /health endpoints for gRPC services.
Advanced Patterns
Interceptors
Interceptors are middleware for gRPC. They wrap each call, both on the client and server side, without modifying the actual RPC handlers. The standard way to add cross-cutting concerns:
- Auth: verify JWT/mTLS on every incoming call
- Logging: log request/response metadata, status codes, latency
- Tracing: inject/propagate OpenTelemetry trace context
- Metrics: record RPC duration, error rates per method
- Rate limiting: per-client or per-method throttling
ELI5: Interceptors are like security guards and timekeepers at a building entrance — every person (RPC call) passes through them automatically. The guards check ID (auth interceptor), the timekeepers note arrival time (metrics), and a camera takes a photo (logging). The office workers inside don’t need to know any of this happens.
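Mechanically, an interceptor is just a function that takes a handler and returns a wrapped handler. The sketch below shows that shape with plain Python callables; it is not the grpcio interceptor API, and the handler signature, the "Bearer demo-token" check, and the chain helper are all invented for illustration.

```python
import time
from typing import Callable

Handler = Callable[[dict, dict], dict]  # (request, metadata) -> response


def auth_interceptor(next_handler: Handler) -> Handler:
    """Reject calls without the expected token before the handler runs."""
    def wrapped(request: dict, metadata: dict) -> dict:
        if metadata.get("authorization") != "Bearer demo-token":
            return {"status": "UNAUTHENTICATED"}
        return next_handler(request, metadata)
    return wrapped


def timing_interceptor(log: list[str]) -> Callable[[Handler], Handler]:
    """Record status and latency for every call into `log`."""
    def make(next_handler: Handler) -> Handler:
        def wrapped(request: dict, metadata: dict) -> dict:
            start = time.perf_counter()
            response = next_handler(request, metadata)
            elapsed_ms = (time.perf_counter() - start) * 1000
            log.append(f"{response['status']} in {elapsed_ms:.2f} ms")
            return response
        return wrapped
    return make


def get_user(request: dict, metadata: dict) -> dict:
    """The actual RPC handler: knows nothing about auth or metrics."""
    return {"status": "OK", "user": {"id": request["user_id"]}}


def chain(handler: Handler, *interceptors: Callable[[Handler], Handler]) -> Handler:
    """Wrap handler so the first interceptor listed runs outermost."""
    for interceptor in reversed(interceptors):
        handler = interceptor(handler)
    return handler


call_log: list[str] = []
handler = chain(get_user, timing_interceptor(call_log), auth_interceptor)
print(handler({"user_id": 1}, {"authorization": "Bearer demo-token"}))
print(handler({"user_id": 1}, {}))
print(call_log)
```

Order matters: timing sits outside auth here, so rejected calls still show up in the metrics.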
gRPC-Web
Browsers can’t make native gRPC calls because they don’t have direct access to HTTP/2 trailers (the fetch API hides them). gRPC-Web is a slightly different wire protocol that works within browser constraints, mediated by a proxy (usually Envoy) that translates gRPC-Web ↔ gRPC on the server side.
This adds operational complexity. For browser-facing APIs, REST usually wins unless you have a strong reason (heavy bidirectional streaming, enforced schema sharing between frontend and backend).
Connect Protocol
Buf’s Connect is a newer HTTP-compatible RPC protocol that works natively with HTTP/1.1, HTTP/2, and HTTP/3 without needing gRPC-Web or special proxies. It’s backward compatible with gRPC (connect clients can talk to gRPC servers and vice versa). Worth watching — it solves the browser support problem more cleanly.
Reflection
gRPC servers can expose a Reflection service that lets clients discover what services and methods exist at runtime. This is how grpcurl works without needing the .proto file. Enable reflection in development/staging, consider disabling in production for security.
When NOT to Use gRPC
- Public API (browser clients, external developers): REST is more accessible; gRPC is a higher barrier to entry
- Simple CRUD with infrequent calls: the protobuf toolchain overhead isn’t worth it
- Teams unfamiliar with generated code workflows: the cognitive overhead of managing .proto files and codegen is real
- Services that need to be called from a shell script: curl your REST endpoint, not grpcurl
Protobuf Alternatives
| Format | Serialization | Schema | Zero-copy | Best for |
|---|---|---|---|---|
| Protobuf | Binary, 3-10x smaller than JSON | Yes (.proto) | No | General RPC, gRPC |
| FlatBuffers | Binary, zero-copy | Yes (.fbs) | Yes | Games, embedded, max perf |
| Cap’n Proto | Binary, zero-copy | Yes (.capnp) | Yes | Low-latency, same author as Protobuf v2 |
| MessagePack | Binary, ~2x smaller than JSON | No | No | Drop-in JSON replacement |
| Avro | Binary | Yes (.avsc, written in JSON) | No | Kafka, data pipelines |
| Thrift | Binary | Yes (.thrift) | No | Legacy Facebook services |
Unless you’re building a game engine or need memory-mapped binary data, Protobuf is the right default. The ecosystem, tooling, and gRPC integration aren’t matched by anything else.
Decision Table: gRPC vs REST
| Situation | Choose |
|---|---|
| Public API, external developers | REST |
| Browser-first frontend | REST |
| Internal microservices, high throughput | gRPC |
| Polyglot team, type safety critical | gRPC |
| Streaming (server push, bidirectional) | gRPC |
| Team new to the stack | REST (lower ops overhead) |
| Existing REST API, adding streaming | Consider gRPC for new services |
| Mobile app backend | REST (simpler, better tooling) |
| Data pipeline between internal services | gRPC or Protobuf over Kafka |
| Need request-level load balancing | gRPC (with L7 LB or mesh) |
The real answer is not one or the other — it’s REST at the boundary (external-facing), gRPC in the interior (service mesh). That’s the architecture most large companies converge on after operating both at scale.