Core Metrics

Response Time (Latency)

Time from the moment a request is sent until the full response is received.

Components

DNS resolution → TCP handshake → TLS handshake → Server processing → Network transfer

Each step can be a bottleneck. A slow DNS resolver adds latency even if the server is fast.

Percentiles (Critical Concept)

Never trust the average. Average hides outliers. Use percentiles.

Percentile	Meaning
p50	50% of requests are faster than this — the typical user
p90	90% of requests are faster — high-load users
p95	SLA baseline — the standard contract metric
p99	Worst-case tail latency — affects 1 in 100 users
Max	Single worst request — often a fluke, but worth monitoring

Why Tail Latency Matters

A p99 of 5 seconds means 1 in 100 users waits 5 seconds. On a system with 1M daily users, that is 10,000 painful experiences per day.

Average: 80ms  →  looks healthy
p99:     4200ms →  10,000 users/day have a broken experience

RPS — Requests Per Second

Number of requests the system processes per second.

Represents load intensity from the client perspective
In Locust: controlled by number of users + wait time between requests
RPS is an input metric — you set it. Latency is the output — the system responds.

Throughput

Amount of data transferred per second (bytes/sec), not request count.

Metric	Counts
RPS	Requests
Throughput	Bytes

A single file upload request may have low RPS but very high throughput. A health-check endpoint may have high RPS but near-zero throughput.

Concurrency

Number of active users or in-flight requests at any given moment.

Little's Law

Concurrency = RPS × Response Time (in seconds)

Example: 100 RPS × 0.5s latency = 50 concurrent requests in flight at all times. When latency increases, concurrency grows — consuming more connections, threads, memory.

Error Rate

Percentage of failed requests over total requests.

Error Type	Examples
HTTP 4xx	Bad request, unauthorized, not found
HTTP 5xx	Server error, gateway timeout
Timeout	Response took longer than client threshold
Network failure	Connection refused, reset

Healthy systems target < 1% error rate under expected load. Error rate should be the first metric to alert on.

Latency Distribution

A histogram of all response times over the test duration.

Shows whether the distribution is normal (bell curve) or bimodal (two peaks)
Two peaks often indicate two code paths: fast cache hit vs slow DB query
Long tail on the right = p99 problem worth investigating