Skip to content

Test Reliability and Flakiness

Causes of Flakiness

Cause Description Example
Timing issues Test proceeds before async operation completes Assert before API response settles
Shared state One test's side effect breaks another Global DB row modified by two tests
Async behaviour Event loop or queue processing order not guaranteed WebSocket message arrives late
External dependencies Third-party API rate limit or downtime OAuth token refresh fails
Port conflicts Two tests bind to the same port Parallel integration tests
Clock dependency Test relies on datetime.now() Time-based expiry check
Random ordering Test passes only in specific order Order-dependent fixtures

Solutions

Wait Strategies

Avoid arbitrary fixed time.sleep() in test bodies. Use polling or event-driven waits instead.

Polling with timeout:

import time
import logging

logger = logging.getLogger(__name__)


def wait_until(condition_fn, timeout: float = 5.0, interval: float = 0.2) -> None:
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition_fn():
            return
        logger.debug("Condition not met, retrying in %.1fs", interval)
        time.sleep(interval)
    raise TimeoutError(f"Condition not met within {timeout}s")

Usage:

def test_async_job_completes(api_client, job_id):
    wait_until(
        lambda: api_client.get(f"/jobs/{job_id}").json()["status"] == "done",
        timeout=10.0,
    )

Playwright explicit waits:

page.wait_for_selector('[data-testid="result"]', timeout=5000)
page.wait_for_response(lambda r: r.url.endswith("/api/data") and r.status == 200)

Retry Logic

See 01-execution-strategies.md for retry configuration.

Apply retry only at the boundaries (network call, external service). Never retry assertion failures caused by business logic bugs.

Isolation

Isolation technique What it solves
Function-scoped DB fixture Shared state between tests
Unique test data (UUID-based IDs) Collision in parallel runs
Mocked clock (freezegun) Time-dependent test behaviour
WireMock / httpx mock External service instability
Separate DB schema per worker Parallel DB state conflicts

Freeze time:

from freezegun import freeze_time

@freeze_time("2026-01-15 12:00:00")
def test_token_expires_after_one_hour(auth_service):
    token = auth_service.create_token(expires_in=3600)
    with freeze_time("2026-01-15 13:00:01"):
        assert auth_service.is_expired(token)


Flakiness Detection

Track flaky tests systematically:

Method Description
Run suite N times in CI pytest --count=5 (pytest-repeat)
Randomise test order pytest-randomly
Record failure rates CI metrics over time
Quarantine known flaky tests @pytest.mark.xfail(strict=False)

Quarantine Pattern

@pytest.mark.xfail(
    reason="Flaky: external webhook delivery timing",
    strict=False,
    run=True,
)
def test_webhook_delivery_timing(api_client):
    ...

strict=False means: a pass is acceptable, a fail is not a hard failure. This keeps the test visible without blocking CI.


Flakiness Risk Register

Pattern Flakiness risk Mitigation
time.sleep() in test High Replace with wait_until
Hardcoded IDs High Use UUIDs or generated values
Global fixture state High Scope fixtures to function
Ordered test dependency High Use explicit fixtures, not ordering
External HTTP in unit test Medium Mock at HTTP boundary
Async test without timeout Medium Always set asyncio.wait_for timeout