# Logging, Reporting & Observability

## Structured Test Logging

Tests must produce logs that help diagnose failures without re-running.

### Logger Setup
```python
import logging
import sys


def configure_logging(level: str = "INFO") -> None:
    logging.basicConfig(
        level=getattr(logging, level.upper()),
        format="%(asctime)s %(levelname)-8s %(name)s: %(message)s",
        datefmt="%H:%M:%S",
        stream=sys.stdout,
    )
    # Silence noisy third-party loggers
    logging.getLogger("playwright").setLevel(logging.WARNING)
    logging.getLogger("urllib3").setLevel(logging.WARNING)
    logging.getLogger("httpx").setLevel(logging.WARNING)
```
### Logging Test Steps

```python
import logging

log = logging.getLogger(__name__)


class UsersClient:
    def create(self, data: dict) -> dict:
        log.debug("POST /users payload=%s", data)
        response = self._driver.post("/users", json=data)
        log.debug("Response %d: %s", response.status_code, response.text[:200])
        response.raise_for_status()
        result = response.json()
        log.info("User created: id=%s email=%s", result.get("id"), result.get("email"))
        return result
```
Log levels in tests:
- DEBUG — request/response bodies, selector values
- INFO — test step completion, resource IDs
- WARNING — retry attempts, slow operations
- ERROR — unexpected failures with context
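
A retry helper makes the WARNING and ERROR levels concrete. This is a minimal sketch; the `retry` helper, its signature, and its defaults are assumptions for illustration, not part of the framework above:

```python
import logging
import time

log = logging.getLogger(__name__)


def retry(fn, attempts: int = 3, delay: float = 0.5):
    """Call fn(), retrying on exception.

    Logs WARNING for each retry attempt and ERROR (with context)
    when all attempts are exhausted, matching the level guide above.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == attempts:
                log.error("Giving up after %d attempts: %s", attempts, exc)
                raise
            log.warning(
                "Attempt %d/%d failed (%s); retrying in %.1fs",
                attempt, attempts, exc, delay,
            )
            time.sleep(delay)
```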
### Request/Response Logging

Log every HTTP call for failure diagnosis:

```python
import logging

log = logging.getLogger(__name__)


class HttpDriver:
    def _log_request(self, method: str, url: str, **kwargs) -> None:
        log.debug(
            "%s %s | body=%s headers=%s",
            method.upper(),
            url,
            str(kwargs.get("json", ""))[:300],
            # Redact the Authorization header so tokens never reach the logs
            {k: v for k, v in kwargs.get("headers", {}).items() if k != "Authorization"},
        )

    def _log_response(self, response) -> None:
        log.debug(
            "Response %d %s | body=%s",
            response.status_code,
            response.url,
            response.text[:300],
        )
```
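
Wiring the two hooks into an actual request method might look like the sketch below. The `session` attribute (an httpx/requests-style client) and the `post` signature are assumptions for illustration; the logging methods are abridged from the snippet above:

```python
import logging

log = logging.getLogger(__name__)


class HttpDriver:
    def __init__(self, session, base_url: str):
        self._session = session
        self._base_url = base_url

    def post(self, path: str, **kwargs):
        url = self._base_url + path
        self._log_request("post", url, **kwargs)  # log before the call...
        response = self._session.post(url, **kwargs)
        self._log_response(response)              # ...and the outcome after
        return response

    # Abridged versions of the hooks shown above
    def _log_request(self, method, url, **kwargs):
        log.debug("%s %s | body=%s", method.upper(), url, str(kwargs.get("json", ""))[:300])

    def _log_response(self, response):
        log.debug("Response %d %s | body=%s", response.status_code, response.url, response.text[:300])
```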
## Reporting

### Allure Report

Allure is the standard for rich test reports in 2026.

```bash
uv add allure-pytest
uv run pytest --alluredir=test-results/allure-results
allure serve test-results/allure-results
```
```python
import allure


@allure.feature("User Management")
@allure.story("User Registration")
def test_register_with_valid_email(api_client):
    with allure.step("Create user payload"):
        payload = UserBuilder().build()
    with allure.step("POST /users"):
        response = api_client.post("/users", json=payload)
    with allure.step("Verify 201 Created"):
        assert response.status_code == 201
```
### HTML Report (lightweight)

Requires the pytest-html plugin:

```bash
uv run pytest --html=test-results/report.html --self-contained-html
```

### JUnit XML (for CI integration)

```bash
uv run pytest --junitxml=test-results/junit.xml
```

Most CI systems (GitHub Actions, Jenkins, GitLab) parse JUnit XML natively.
## Screenshots & Videos

### Screenshot on Failure (Playwright)

```python
# conftest.py
import pytest
from pathlib import Path


@pytest.hookimpl(tryfirst=True, hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()
    if report.when == "call" and report.failed:
        page = item.funcargs.get("page")
        if page:
            # Replace "/" and "::" so the nodeid is a valid filename on all platforms
            name = item.nodeid.replace("/", "_").replace("::", "_")
            path = Path("test-results/screenshots") / f"{name}.png"
            path.parent.mkdir(parents=True, exist_ok=True)
            page.screenshot(path=str(path), full_page=True)
```
### Video Recording

```python
# playwright.config.py equivalent in pytest-playwright
@pytest.fixture(scope="session")
def browser_context_args(browser_context_args):
    return {
        **browser_context_args,
        "record_video_dir": "test-results/videos",
        "record_video_size": {"width": 1280, "height": 720},
    }
```
## Observability Metrics

Track these to monitor framework health over time:

| Metric | Description | Target |
|---|---|---|
| Test duration (p95) | 95th percentile test execution time | < 30s per test |
| Suite duration | Total CI pipeline test stage time | < 10 min |
| Pass rate | Tests passing / total tests | > 99% on main |
| Flakiness rate | Failed tests that passed on rerun / total tests | < 1% |
| Coverage delta | Coverage change per PR | Never decrease |
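
The ratio metrics in the table reduce to a few lines of arithmetic. A minimal sketch (the function names are illustrative, and p95 here uses the nearest-rank method, one of several common percentile definitions):

```python
def p95(durations: list[float]) -> float:
    """95th-percentile duration via the nearest-rank method."""
    ranked = sorted(durations)
    # Integer ceiling of 0.95 * n, avoiding float rounding at the boundary
    idx = (95 * len(ranked) + 99) // 100 - 1
    return ranked[idx]


def pass_rate(passed: int, total: int) -> float:
    """Tests passing / total tests."""
    return passed / total if total else 0.0


def flakiness_rate(passed_on_rerun: int, total: int) -> float:
    """Failed tests that later passed on rerun / total tests."""
    return passed_on_rerun / total if total else 0.0
```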
### Collecting Metrics in CI

```yaml
# GitHub Actions
- name: Publish test metrics
  run: |
    uv run python scripts/collect_metrics.py \
      --junit test-results/junit.xml \
      --output test-results/metrics.json
```
Parse the JUnit XML to extract `total`, `passed`, `failed`, `duration_seconds`, and `pass_rate`, then feed the output into Datadog, Grafana, or a Slack notification.
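
The core of `scripts/collect_metrics.py` might look like the sketch below, using only the standard library. It assumes pytest's JUnit layout (a `<testsuites>` root wrapping a single `<testsuite>` whose attributes carry the counts); an `argparse` wrapper would handle the `--junit`/`--output` flags from the CI step above and `json.dump` the result:

```python
import xml.etree.ElementTree as ET


def collect_metrics(junit_path: str) -> dict:
    """Reduce a pytest JUnit XML file to the summary fields named above."""
    root = ET.parse(junit_path).getroot()
    # pytest writes <testsuites><testsuite .../></testsuites>; tolerate a bare root
    suite = root if root.tag == "testsuite" else root.find("testsuite")
    total = int(suite.get("tests", 0))
    failed = int(suite.get("failures", 0)) + int(suite.get("errors", 0))
    skipped = int(suite.get("skipped", 0))
    passed = total - failed - skipped
    return {
        "total": total,
        "passed": passed,
        "failed": failed,
        "duration_seconds": float(suite.get("time", 0.0)),
        "pass_rate": passed / total if total else 0.0,
    }
```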