Test Reliability and Flakiness
Causes of Flakiness
| Cause | Description | Example |
|---|---|---|
| Timing issues | Test proceeds before async operation completes | Assert before API response settles |
| Shared state | One test's side effect breaks another | Global DB row modified by two tests |
| Async behaviour | Event loop or queue processing order not guaranteed | WebSocket message arrives late |
| External dependencies | Third-party API rate limit or downtime | OAuth token refresh fails |
| Port conflicts | Two tests bind to the same port | Parallel integration tests |
| Clock dependency | Test relies on datetime.now() | Time-based expiry check |
| Random ordering | Test passes only in specific order | Order-dependent fixtures |
Solutions
Wait Strategies
Avoid arbitrary fixed time.sleep() in test bodies. Use polling or event-driven waits instead.
Polling with timeout:
import time
import logging

logger = logging.getLogger(__name__)


def wait_until(condition_fn, timeout: float = 5.0, interval: float = 0.2) -> None:
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition_fn():
            return
        logger.debug("Condition not met, retrying in %.1fs", interval)
        time.sleep(interval)
    raise TimeoutError(f"Condition not met within {timeout}s")
Usage:
def test_async_job_completes(api_client, job_id):
    wait_until(
        lambda: api_client.get(f"/jobs/{job_id}").json()["status"] == "done",
        timeout=10.0,
    )
Playwright explicit waits:
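page.wait_for_selector('[data-testid="result"]', timeout=5000)
# Python API: register the response wait before the action that triggers the request.
with page.expect_response(lambda r: r.url.endswith("/api/data") and r.status == 200):
    page.click('[data-testid="load"]')  # hypothetical trigger element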
Retry Logic
See 01-execution-strategies.md for retry configuration.
Apply retry only at the boundaries (network call, external service). Never retry assertion failures caused by business logic bugs.
Isolation
| Isolation technique | What it solves |
|---|---|
| Function-scoped DB fixture | Shared state between tests |
| Unique test data (UUID-based IDs) | Collision in parallel runs |
| Mocked clock (freezegun) | Time-dependent test behaviour |
| WireMock / httpx mock | External service instability |
| Separate DB schema per worker | Parallel DB state conflicts |
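Unique test data can be generated with the standard library alone. The sketch below assumes a hypothetical user payload; `unique_name`, `make_user_payload`, and the field names are illustrative and should be adapted to the system under test.

```python
import uuid


def unique_name(prefix: str = "user") -> str:
    """Return a name that is extremely unlikely to collide across tests or workers."""
    return f"{prefix}-{uuid.uuid4().hex}"


def make_user_payload() -> dict:
    # Hypothetical payload shape: each call yields fresh identifiers,
    # so parallel tests never compete for the same row.
    name = unique_name()
    return {"username": name, "email": f"{name}@example.test"}
```

Calling `make_user_payload()` inside each test (or from a function-scoped fixture) avoids the hardcoded IDs that cause collisions in parallel runs.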
Freeze time:
from freezegun import freeze_time


@freeze_time("2026-01-15 12:00:00")
def test_token_expires_after_one_hour(auth_service):
    token = auth_service.create_token(expires_in=3600)
    with freeze_time("2026-01-15 13:00:01"):
        assert auth_service.is_expired(token)
Flakiness Detection
Track flaky tests systematically:
| Method | Description |
|---|---|
| Run suite N times in CI | pytest --count=5 (pytest-repeat) |
| Randomise test order | pytest-randomly |
| Record failure rates | CI metrics over time |
| Quarantine known flaky tests | @pytest.mark.xfail(strict=False) |
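Recording failure rates can start as a small script over repeated-run results. The sketch below assumes outcomes have already been collected as one dict per run (test id mapped to pass or fail); `find_flaky` and this input shape are hypothetical, not the output format of any particular CI tool.

```python
from collections import defaultdict


def find_flaky(runs):
    """Return {test_id: failure_rate} for tests with mixed outcomes.

    `runs` is a list of dicts mapping test id -> True (passed) / False (failed),
    one dict per run of the suite on the same commit.
    """
    outcomes = defaultdict(list)
    for run in runs:
        for test_id, passed in run.items():
            outcomes[test_id].append(passed)

    flaky = {}
    for test_id, results in outcomes.items():
        failures = results.count(False)
        # Mixed outcomes on identical code indicate flakiness; a test that
        # always fails is broken, not flaky.
        if 0 < failures < len(results):
            flaky[test_id] = failures / len(results)
    return flaky


runs = [
    {"test_a": True, "test_b": True, "test_c": False},
    {"test_a": True, "test_b": False, "test_c": False},
    {"test_a": True, "test_b": True, "test_c": False},
    {"test_a": True, "test_b": True, "test_c": False},
]
print(find_flaky(runs))  # {'test_b': 0.25}
```

Here `test_c` is excluded because it fails on every run, which points to a real defect rather than flakiness.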
Quarantine Pattern
import pytest


@pytest.mark.xfail(
    reason="Flaky: external webhook delivery timing",
    strict=False,
    run=True,
)
def test_webhook_delivery_timing(api_client):
    ...
With strict=False, a pass is reported as XPASS and a failure as XFAIL; neither fails the run.
Because run=True (the default), the test still executes, which keeps it visible without blocking CI.
Flakiness Risk Register
| Pattern | Flakiness risk | Mitigation |
|---|---|---|
| time.sleep() in test | High | Replace with wait_until |
| Hardcoded IDs | High | Use UUIDs or generated values |
| Global fixture state | High | Scope fixtures to function |
| Ordered test dependency | High | Use explicit fixtures, not ordering |
| External HTTP in unit test | Medium | Mock at HTTP boundary |
| Async test without timeout | Medium | Always set asyncio.wait_for timeout |
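The last mitigation in the table can be sketched as follows. `wait_for_message` and `_demo` are hypothetical helpers; the only library call assumed is the standard `asyncio.wait_for`.

```python
import asyncio


async def wait_for_message(queue: asyncio.Queue, timeout: float = 2.0):
    """Return the next message, or raise if none arrives within `timeout`."""
    # Without wait_for, a missing message would hang the test run indefinitely.
    return await asyncio.wait_for(queue.get(), timeout=timeout)


async def _demo():
    queue = asyncio.Queue()
    await queue.put("hello")
    received = await wait_for_message(queue)

    # An empty queue now fails fast instead of hanging.
    try:
        await wait_for_message(queue, timeout=0.05)
        timed_out = False
    except asyncio.TimeoutError:
        timed_out = True
    return received, timed_out


result = asyncio.run(_demo())
```

A bounded wait turns a silent hang into an explicit timeout error that names the failing test.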