# Performance

## 11.1 Latency

### Sources of latency
| Source | Typical cost | Mitigation |
|---|---|---|
| Network round-trip | 10–200 ms | CDN, edge caching, region selection |
| DNS resolution | 50–100 ms | DNS prefetch, cached resolution, connection keep-alive |
| TLS handshake | 1–3 RTT | TLS session resumption, HTTP/3 |
| Server processing | Variable | DB indexes, caching, async processing |
| Serialization | 1–10 ms | Binary format (protobuf), sparse fieldsets |
### Reducing latency
- Use CDN for static assets and cacheable API responses.
- Use edge caching to serve common requests without hitting the origin.
- Keep API responses small — sparse fieldsets, pagination.
- Use HTTP/2 or HTTP/3 to reduce per-request connection overhead by multiplexing requests over a single connection.
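To see how these costs compound, here is a rough back-of-the-envelope calculation in Python. The numbers are midpoints of the ranges in the table above, purely illustrative:

```python
# Illustrative latency budget using midpoints of the ranges in the table above.
RTT_MS = 50    # one network round-trip
DNS_MS = 75    # uncached DNS lookup
TLS_RTTS = 2   # full TLS 1.2 handshake; TLS 1.3 needs only 1

def cold_request_ms():
    """New connection: DNS + TCP handshake + TLS handshake + request."""
    return DNS_MS + RTT_MS + TLS_RTTS * RTT_MS + RTT_MS

def warm_request_ms():
    """Reused keep-alive connection with cached DNS: one round-trip."""
    return RTT_MS

print(cold_request_ms(), warm_request_ms())  # 275 50
```

The gap between the cold and warm paths is why connection reuse, TLS session resumption, and DNS caching pay off before any server-side tuning.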
## 11.2 Throughput

Throughput is the number of requests per second the system can handle at acceptable latency.
### Load balancing

Distribute load across instances, each processing its share. Throughput scales roughly linearly with instance count if the service is stateless (no instance-local session state).
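As a concrete sketch, round-robin load balancing in Nginx across three stateless instances (`app1`..`app3` are placeholder hostnames):

```nginx
upstream api_backend {
    # Round-robin by default; each instance gets an equal share.
    server app1:8000;
    server app2:8000;
    server app3:8000;
}

server {
    listen 80;
    location / {
        proxy_pass http://api_backend;
    }
}
```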
### Batching
Process multiple items in one operation:
- DB batch inserts instead of one-by-one rows.
- Kafka consumer batch: process 100 messages per tick.
- Reduces per-item overhead significantly at high volume.
```python
# One batch insert — not 1000 individual INSERTs
await db.execute_many(
    "INSERT INTO events (type, payload) VALUES ($1, $2)",
    [(e.type, e.payload) for e in batch],
)
```
## 11.3 Payload Optimization
Large payloads cost bandwidth and processing time on both sides.
### Compression
Enable gzip or brotli on your server or reverse proxy.
- gzip: 60–80% size reduction for JSON.
- brotli: slightly better compression than gzip, supported by all modern browsers.
- The `Content-Encoding: gzip` response header tells the client how to decompress.
- Threshold: compress only responses larger than about 1 KB. Compressing small payloads wastes CPU for little gain.
```nginx
# Nginx — enable gzip for JSON responses
gzip on;
gzip_types application/json text/plain;
gzip_min_length 1024;
```
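The reduction figures above are easy to sanity-check with Python's standard library. The sample payload here is synthetic, with the repetitive structure typical of API responses:

```python
import gzip
import json

# Synthetic JSON payload: repeated keys and values, as in most list endpoints.
payload = json.dumps(
    [{"id": i, "type": "click", "status": "ok"} for i in range(1000)]
).encode()

compressed = gzip.compress(payload)
ratio = 1 - len(compressed) / len(payload)
print(f"{len(payload)} -> {len(compressed)} bytes ({ratio:.0%} smaller)")
```

Highly repetitive JSON compresses far better than the 60–80% rule of thumb; payloads with random IDs or encrypted blobs compress much worse.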
### Binary format (gRPC / protobuf)
Protobuf payloads are 3–10× smaller than equivalent JSON.
- No field names in binary payload — field numbers are used instead.
- Faster serialization due to schema-driven encoding.
- Use for high-volume internal service calls.
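A protobuf schema encodes field numbers, not names, on the wire, which is where much of the size saving comes from. A hypothetical schema for the events batched earlier:

```protobuf
syntax = "proto3";

message Event {
  // The field numbers (1, 2) are what appears in the binary payload,
  // not the string names "type" and "payload".
  string type = 1;
  bytes payload = 2;
}
```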
### Sparse fieldsets
Client requests only the fields it needs:
- REST: `?fields=id,name,email`
- GraphQL: the client specifies exact fields in the query.
Reduces payload size and server-side serialization work.
## 11.4 Performance Targets
| Metric | Target | Action when exceeded |
|---|---|---|
| p50 latency | < 50 ms | Optimize hot paths |
| p95 latency | < 300 ms | Add caching, optimize DB queries |
| p99 latency | < 1000 ms | Check tail-latency outliers |
| Error rate | < 0.1% | Reliability review |
| Throughput | Per SLO | Scale horizontally |
Always measure p95 and p99, not only the average. Averages hide the outliers that users actually experience.
### Key Rules
- Measure first. Do not optimize without profiling data.
- Enable compression at the reverse proxy level — not in application code.
- Set SLO targets for p95 and p99, not just average latency.
- Use binary serialization (protobuf) for internal high-volume calls.
- Apply sparse fieldsets on any endpoint that returns large objects.