Pytest Playbook — Flakiness Debugging

Workflow for tests that pass locally but fail randomly in CI.

Common Causes

Cause	Symptom
Shared mutable state	Passes alone, fails with other tests
Time-dependent logic	Fails around midnight or timezone boundaries
Network dependency	Timeout or DNS failure in CI
Order dependency	Fails only with specific seed
Environment drift	Locale, timezone, OS differ from local

Fast Triage Commands

# re-run only last failed test with full output
pytest --lf -x -vv
# show slowest 20 tests — often a clue for race conditions
pytest --durations=20
# run with short traceback for quick scan
pytest --maxfail=1 --tb=short

Detect Order Dependency

# run with two different random seeds
pytest --randomly-seed=42
pytest --randomly-seed=99

If one seed fails and another passes — tests have hidden coupling.

Isolation Checklist

No module-level mutable state (global dicts, class variables).
No writes to shared temp paths (use tmp_path fixture).
No reliance on test execution order.
No real network calls in unit tests.
No real time in assertions (use freezegun).

Stabilize Async Tests

import asyncio
import time


async def wait_until(predicate, timeout: float = 2.0, step: float = 0.05):
    """Poll predicate instead of arbitrary sleep."""
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        if predicate():
            return True
        await asyncio.sleep(step)
    raise TimeoutError(f"Condition not met within {timeout}s")

Rules: - Await explicit conditions, not arbitrary sleeps. - Close background tasks in fixture teardown. - Set asyncio_mode = "auto" to avoid missing markers.

CI-Specific Safeguards

Run -n auto only after tests are isolation-safe.
Run nightly with randomized order (pytest-randomly).
Fail on warnings: filterwarnings = ["error"].
Keep retry plugins as temporary mitigation only.

Debug Evidence to Capture

Random seed value (--randomly-seed).
Failed test node ID (full path).
Log snippets (caplog output).
Environment: python -V, OS, TZ.
Exact repro command.

Remediation Strategy

Reproduce locally with same seed and env.
Minimize test to smallest failing case.
Fix root cause (state leak, race, network call).
Add regression test.
Remove temporary rerun workaround.