Pitfalls, Real-World Scenarios & Heuristics

Common Pitfalls

Ignoring Percentiles

Using average response time as the success metric. Average hides the tail.

Average: 80ms — "looks great"
p99:     4200ms — 10,000 users/day have a broken experience

Fix: define SLOs on p95/p99. Never accept a test report that only shows averages.
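The gap between the two numbers is easy to demonstrate with a synthetic sample; the 80 ms / 4200 ms split from the example above is the assumption here:

```python
import random
import statistics

random.seed(0)  # deterministic synthetic sample

# Hypothetical latency sample: 99% of requests around 80 ms, 1% around 4200 ms.
latencies_ms = [random.gauss(80, 10) for _ in range(9900)]
latencies_ms += [random.gauss(4200, 300) for _ in range(100)]

avg = statistics.mean(latencies_ms)
p99 = statistics.quantiles(latencies_ms, n=100)[98]  # 99th percentile cut point

print(f"average: {avg:.0f} ms")  # looks fine
print(f"p99:     {p99:.0f} ms")  # the broken tail the average hides
```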


Unrealistic Load

Testing with 1000 users all hitting the same endpoint simultaneously. Real users are spread across pages, have sessions, use different features.

Fix: model actual user flows with task weights, think time, and session state.


No Think Time

Locust users hammering endpoints with zero delay generate 10–100× more RPS than the same number of real users, so the results are not comparable to production.

Fix: always set wait_time. Match production traffic patterns.


Testing Only Happy Paths

Load testing only successful flows misses the cost of error handling: retries, rollbacks, logging, notification emails — all add server load.

Fix: include realistic error scenarios: 404 lookups, validation failures, expired tokens.


No Monitoring

Running a load test with only Locust output means you see symptoms (slow/errors) but not causes (which resource is saturated).

Fix: always run with system metrics (Prometheus/Grafana, htop, DB slow query log).


Misinterpreting Averages

A small number of extreme outliers barely moves the average, so it stays misleadingly healthy. A system with a 10ms p50 and a 990ms p99 can have an average around 20ms: the slow 1% is nearly invisible in the mean.
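The arithmetic behind that example:

```python
# 99% of requests at 10 ms (the p50 region), 1% at 990 ms (the p99 region).
avg = 0.99 * 10 + 0.01 * 990
print(f"{avg:.1f} ms")  # 19.8 ms: the mean barely registers the slow 1%
```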


Real-World Scenarios

CRUD APIs

Focus: RPS + p95 latency per endpoint.

Key concern: DB query efficiency. Every endpoint hits the database, so test with realistic data volumes: 10M rows, not 1,000.


Realtime Systems (WebSocket, SSE)

Focus: concurrent connections + message latency.

Standard HTTP load testing does not apply. Use Locust with WebSocket support:

from locust import User, task, between
import time
import websocket  # third-party "websocket-client" package


class WsUser(User):
    wait_time = between(0.1, 0.5)

    def on_start(self) -> None:
        # One persistent connection per simulated user.
        self.ws = websocket.create_connection("ws://localhost:8080/ws")

    @task
    def send_message(self) -> None:
        start = time.perf_counter()
        self.ws.send('{"type": "ping"}')
        self.ws.recv()
        # Report the round trip to Locust so it shows up in the stats;
        # a plain User (unlike HttpUser) records nothing automatically.
        self.environment.events.request.fire(
            request_type="WS",
            name="ping",
            response_time=(time.perf_counter() - start) * 1000,
            response_length=0,
            exception=None,
        )

    def on_stop(self) -> None:
        self.ws.close()

Microservices

Focus: service-to-service latency, cascading failures.

One slow upstream service can degrade the entire call chain. Test with realistic inter-service latencies. Add artificial delays to dependencies to see how your service behaves when an upstream is slow (chaos engineering).
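A minimal chaos-style wrapper, sketched in plain Python with hypothetical names; a real setup would more likely inject the delay at the proxy or service-mesh layer:

```python
import random
import time


def call_upstream():
    # Stand-in for the real dependency call.
    return {"ok": True}


def call_upstream_with_chaos(delay_s=0.5, failure_rate=0.1):
    # Inject latency and occasional failures so the caller's timeouts,
    # retries, and fallbacks are exercised under load.
    time.sleep(delay_s)
    if random.random() < failure_rate:
        raise TimeoutError("injected upstream failure")
    return call_upstream()
```

Running a load test once with `delay_s=0` and once with a realistic upstream delay shows how far the slowdown cascades through the call chain.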


Large-Scale Systems

Focus: distributed load generation + cross-region observability.

Use Locust's distributed mode. Correlate load test timing with:

- CDN cache hit rates
- Load balancer request distribution
- Per-AZ latency


Engineering Heuristics (Senior Level)

| Heuristic | Why It Matters |
| --- | --- |
| Always analyze p95/p99, never average | Average hides the users with bad experience |
| Correlate load with system metrics | Symptoms are in Locust; causes are in the system |
| Use realistic scenarios, not endpoint hammering | Test the system the way users use it |
| Focus on bottlenecks, not averages | Fix the constraint, everything else follows |
| Measure before optimizing | Intuition is wrong 80% of the time |
| Optimize tail latency, not mean | p99 is what users feel, p50 is what engineers celebrate |
| Keep performance tests in CI | Performance regressions caught in hours, not after deploy |