gRPC: Testing Plan and Risks

3.7 Testing Plan

Contract Tests

  • Backward compatibility: add new field to .proto — old client still works correctly
  • Forward compatibility: old server handles messages with new unknown fields gracefully
  • Breaking changes detection: removed field, changed type, reused field number — must be caught before merge by breaking-change checks (e.g. buf breaking in CI), since protoc alone does not flag most of them
  • Reserved fields: using a reserved field number — protoc reports compilation error
  • Proto linting: use buf lint or protolint to enforce style and compatibility rules
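The reserved-fields check above hinges on marking deleted fields in the schema. A minimal illustrative fragment (message and field names are hypothetical) — re-declaring a field with a reserved number or name makes protoc fail the build:

```proto
syntax = "proto3";

message User {
  reserved 2, 4;        // numbers of deleted fields — may never be reused
  reserved "email";     // deleted field name, also blocked from reuse
  string name = 1;
  int32 age = 3;
  // string email = 2;  // re-adding this would be a protoc compile error
}
```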

Serialization Tests

  • Encode/decode roundtrip: message -> bytes -> message = identical values
  • Default values: unset fields return default values (0 for int, "" for string, false for bool)
  • Unknown fields: preserved during deserialization, forwarded correctly
  • Large payloads: messages near max size limit (default 4 MB) — handled without error
  • Edge cases: empty strings, zero values, maximum int values, unicode text
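The roundtrip property is easiest to see at the wire level. Below is a stdlib-only sketch of protobuf's base-128 varint encoding (not the real protobuf library) that demonstrates the message -> bytes -> message invariant on integers:

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # set continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(data: bytes) -> int:
    """Decode a varint back to the original int."""
    value = shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:          # continuation bit clear: last byte
            return value
        shift += 7
    raise ValueError("truncated varint")

# Roundtrip property, including the zero and boundary edge cases above
for n in (0, 1, 127, 128, 300, 2**32 - 1):
    assert decode_varint(encode_varint(n)) == n
```

The value 300 encodes to the two bytes `0xAC 0x02`, matching the protobuf wire-format documentation.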

RPC Behavior Tests

  • Unary: send request, receive correct response, verify all fields
  • Server streaming: send request, receive correct sequence of messages, verify proper stream completion
  • Client streaming: send multiple messages, receive correct aggregated response
  • Bidirectional: messages flow in both directions, verify correct ordering
  • Cancellation: client cancels mid-stream — server receives cancellation signal, stops processing
  • Deadline exceeded: slow server response — client gets DEADLINE_EXCEEDED status
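The deadline test above can be prototyped without a live server. In this stdlib sketch a future stands in for the RPC: in real gRPC the client would pass a per-call timeout and receive status DEADLINE_EXCEEDED instead of a TimeoutError:

```python
import concurrent.futures
import time

def slow_handler() -> str:
    time.sleep(0.5)  # simulates a server that responds too slowly
    return "response"

with concurrent.futures.ThreadPoolExecutor() as pool:
    call = pool.submit(slow_handler)
    try:
        call.result(timeout=0.1)   # deadline far shorter than handler latency
        status = "OK"
    except concurrent.futures.TimeoutError:
        status = "DEADLINE_EXCEEDED"

print(status)  # DEADLINE_EXCEEDED
```

The assertion pattern is the same as in a real test: make the deadline shorter than the known handler latency and assert on the resulting status.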

Error Handling Tests

  • Each status code returned for the correct failure scenario
  • Rich error details accessible from client (BadRequest, RetryInfo fields)
  • Metadata present in error responses (request-id, trace headers)
  • Invalid request data returns INVALID_ARGUMENT with helpful message
  • Missing auth token returns UNAUTHENTICATED
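A table-driven test can pin each failure scenario to its status code. This sketch uses plain strings for the codes and a hypothetical `check_request` validator with an assumed `user_id` field, but the mapping it asserts is the one listed above:

```python
from dataclasses import dataclass

OK = "OK"
INVALID_ARGUMENT = "INVALID_ARGUMENT"
UNAUTHENTICATED = "UNAUTHENTICATED"

@dataclass
class Status:
    code: str
    message: str = ""

def check_request(metadata: dict, payload: dict) -> Status:
    """Map common failure scenarios onto the expected status codes."""
    if not metadata.get("authorization"):
        return Status(UNAUTHENTICATED, "missing auth token")
    if not payload.get("user_id"):
        return Status(INVALID_ARGUMENT, "user_id is required")
    return Status(OK)

# Each scenario must yield exactly the status code the contract promises
assert check_request({}, {"user_id": "42"}).code == UNAUTHENTICATED
assert check_request({"authorization": "Bearer t"}, {}).code == INVALID_ARGUMENT
assert check_request({"authorization": "Bearer t"}, {"user_id": "42"}).code == OK
```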

Performance Tests

  • Latency: unary call p50 / p95 / p99 under normal load
  • Throughput: maximum RPCs per second before degradation
  • Streaming throughput: messages per second in server and bidirectional streams
  • Payload size impact: latency increase as message size grows
  • Connection scaling: performance with 1, 10, 100 concurrent connections
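Reporting p50 / p95 / p99 only needs the raw latency samples. A minimal nearest-rank percentile sketch (sample values are hypothetical; with few samples the high percentiles collapse to the maximum):

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile (p in [0, 100]) of latency samples."""
    ordered = sorted(samples)
    rank = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[rank]

latencies_ms = [12, 15, 11, 90, 14, 13, 250, 16, 12, 14]  # hypothetical samples
p50, p95, p99 = (percentile(latencies_ms, p) for p in (50, 95, 99))
print(p50, p95, p99)  # 14 250 250
```

In a real run the samples come from timing each unary call under the load generator, and the tail percentiles are the numbers worth alerting on.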

Connection Tests

  • Keepalive: connection stays alive during idle periods with ping/pong
  • Reconnect: client reconnects automatically after connection drop
  • Timeout: per-RPC deadlines are configurable and enforced — expired calls fail with DEADLINE_EXCEEDED, not a hang
  • Connection reuse: multiple RPCs share one TCP connection efficiently
  • Graceful shutdown: server drains active RPCs before stopping
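The keepalive behavior above is driven by channel options. The option keys below are real gRPC core channel arguments (as passed via the `options` parameter of a Python gRPC channel); the values are illustrative and should be tuned per deployment:

```python
# Channel options controlling keepalive pings and message size limits.
KEEPALIVE_OPTIONS = [
    ("grpc.keepalive_time_ms", 30_000),          # ping the server every 30 s when idle
    ("grpc.keepalive_timeout_ms", 10_000),       # wait up to 10 s for the ping ack
    ("grpc.keepalive_permit_without_calls", 1),  # ping even with no active RPCs
    ("grpc.max_receive_message_length", 4 * 1024 * 1024),  # the 4 MB default, explicit
]

assert dict(KEEPALIVE_OPTIONS)["grpc.keepalive_time_ms"] == 30_000
```

A keepalive test then idles the connection longer than `grpc.keepalive_time_ms` and asserts the channel stays READY. Note that servers enforce a minimum ping interval, so overly aggressive client values get the connection closed with GOAWAY.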

Security Tests

  • mTLS: mutual TLS authentication between client and server
  • Auth metadata: token in metadata, verified by server interceptor
  • Unauthorized request: missing or invalid token returns UNAUTHENTICATED
  • Certificate rotation: new certificates accepted without restart
  • Channel encryption: all data encrypted in transit (no plaintext)
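The auth-metadata check is the easiest of these to unit-test in isolation. This sketch shows the core logic a server auth interceptor runs before invoking the handler (the token store and header layout are assumptions for the example):

```python
import hmac

VALID_TOKENS = {"s3cr3t-token"}  # hypothetical token store for the sketch

def verify_auth(metadata: dict[str, str]) -> bool:
    """Return True only for a well-formed, valid bearer token."""
    header = metadata.get("authorization", "")
    if not header.startswith("Bearer "):
        return False
    token = header[len("Bearer "):]
    # constant-time comparison, as a real token check should use
    return any(hmac.compare_digest(token, t) for t in VALID_TOKENS)

assert verify_auth({"authorization": "Bearer s3cr3t-token"})
assert not verify_auth({"authorization": "Bearer wrong"})
assert not verify_auth({})  # missing token -> reject with UNAUTHENTICATED
```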

Observability Tests

  • Structured logging: each RPC call produces a log entry with method, status, duration, request_id
  • Trace propagation: OpenTelemetry trace context passes through interceptors to downstream services
  • Span creation: each RPC creates a span with correct parent-child relationship
  • Error logging: failed RPCs log status code, error message, and metadata
  • Metrics export: request count, error rate, and latency histograms are exported to Prometheus
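A test for the structured-logging requirement can assert on the emitted JSON rather than grepping text. A stdlib sketch of one log line per RPC (the method path and field names are illustrative):

```python
import json

def rpc_log_entry(method: str, status: str, duration_ms: float, request_id: str) -> str:
    """Serialize one RPC call as a single structured JSON log line."""
    return json.dumps(
        {
            "method": method,
            "status": status,
            "duration_ms": round(duration_ms, 1),
            "request_id": request_id,
        },
        sort_keys=True,
    )

line = rpc_log_entry("/users.UserService/GetUser", "OK", 12.34, "req-123")
entry = json.loads(line)
assert entry["status"] == "OK" and entry["method"].endswith("GetUser")
```

In practice this logic lives in a server interceptor so every RPC is logged uniformly, and `request_id` doubles as the join key to traces.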

Backpressure Tests

  • Flow control: fast sender does not overwhelm slow receiver
  • Slow consumer: server streaming to slow client — flow control activates, sender pauses
  • Buffer limits: message exceeds max size (default 4 MB) — returns RESOURCE_EXHAUSTED
  • Memory usage: streaming large datasets does not cause OOM
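The flow-control behavior above can be modeled with a bounded queue: HTTP/2 gives the receiver a finite window, and a sender that fills it must pause. A stdlib analogy (queue capacity stands in for the flow-control window):

```python
import queue

buffer = queue.Queue(maxsize=2)  # receiver window of 2 messages
buffer.put("msg-1")
buffer.put("msg-2")

try:
    buffer.put_nowait("msg-3")   # receiver is full: sender cannot proceed
    overflowed = False
except queue.Full:
    overflowed = True

assert overflowed                # fast sender was forced to stop

buffer.get()                     # slow consumer finally reads one message...
buffer.put("msg-3")              # ...which frees a slot and unblocks the sender
```

The real test drives a server stream into a deliberately slow client and asserts the server's send side blocks (or its buffered memory stays bounded) instead of growing without limit.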

3.8 Risks and Limitations

Infrastructure Risks

  • HTTP/2 dependency: gRPC requires HTTP/2. Some proxies and load balancers do not fully support it (e.g., older AWS ALB, some CDNs)
  • Browser limitations: browsers cannot speak native gRPC. You need a grpc-web proxy (e.g. Envoy) to translate browser requests into backend gRPC calls
  • Infrastructure complexity: need protoc toolchain, generated code management, HTTP/2-capable infrastructure across the stack

Development Risks

  • Binary debugging: you cannot read protobuf with curl or a browser. Need specialized tools: grpcurl, Postman gRPC, Kreya, or Evans
  • Tight coupling: client and server share the .proto contract. Schema changes need coordination across teams
  • Schema rigidity: strict types mean less flexibility than JSON. Adding optional context or metadata requires schema updates
  • Learning curve: teams familiar with REST need training on protobuf, streaming patterns, and gRPC tooling

Runtime Risks

  • Streaming pitfalls: streams cannot be load-balanced once started. Long-lived streams hold server resources and complicate scaling
  • Backpressure complexity: flow control is automatic but misconfigured buffer sizes cause hangs or OOM errors
  • Observability challenges: binary format makes logging harder. Need structured logging with decoded messages and proper tracing
  • Connection management: too few connections = bottleneck, too many = resource waste. Requires careful tuning