Test Data Strategies & Isolation
Three Data Strategies
Static Data
Predefined, version-controlled. Loaded from YAML or JSON files. Good for: configuration data, lookup tables, reference values.
# data/fixtures/roles.yaml
roles:
  - name: "ADMIN"
    permissions: ["read", "write", "delete"]
  - name: "USER"
    permissions: ["read"]
  - name: "VIEWER"
    permissions: ["read"]
import yaml
import pytest

@pytest.fixture(scope="session")
def roles_data() -> dict:
    with open("data/fixtures/roles.yaml") as f:
        return yaml.safe_load(f)
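Risks:
- Static data drifts from the actual schema when the schema changes and the fixture file is not updated.
- Multiple tests that share the same static record can interfere with each other if any of them modifies it.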
Dynamic Data
Created at test runtime, unique per test run. Good for: users, orders, products — anything tests write to or modify.
import uuid

def unique_email() -> str:
    return f"test-{uuid.uuid4().hex[:8]}@example.com"

def unique_username() -> str:
    return f"user_{uuid.uuid4().hex[:6]}"
Unique data eliminates test pollution without cleanup. Even if cleanup fails, the next run creates a new unique record.
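How unique a truncated suffix is depends on its length and on how many records accumulate in the environment. The following sketch estimates this with the standard birthday-bound approximation; `collision_probability` is a hypothetical helper, and the record count of 1,000 is an arbitrary illustration.

```python
import math


def collision_probability(n_records: int, hex_chars: int) -> float:
    """Approximate chance that at least two of n_records random suffixes,
    each hex_chars hex digits long, are identical (birthday bound)."""
    space = 16 ** hex_chars  # number of distinct suffixes
    return 1.0 - math.exp(-n_records * (n_records - 1) / (2 * space))


p8 = collision_probability(1_000, 8)  # suffix length used by unique_email()
p6 = collision_probability(1_000, 6)  # suffix length used by unique_username()
print(f"8 hex chars: {p8:.4%}, 6 hex chars: {p6:.2%}")
```

With 1,000 coexisting records, the 8-character suffix collides with a probability of roughly 0.01%, while the 6-character suffix is closer to 3%. A longer suffix is a cheap fix if data is allowed to accumulate.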
Randomized Data
Faker-based realistic data for edge case coverage and fuzzing.
from faker import Faker

faker = Faker()

def realistic_user() -> dict:
    return {
        "name": faker.name(),
        "email": faker.email(),
        "address": faker.address(),
        "phone": faker.phone_number(),
    }
Risk: randomized tests are non-deterministic by default, so a failure may not reproduce on the next run. Mitigation: seed Faker with a fixed value when reproducing failures.
from faker import Faker

faker = Faker()
Faker.seed(42)  # class-level seed shared by all Faker instances; reproducible in CI
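A fixed seed gives up the varied coverage that randomized data provides. A common compromise is to pick a fresh seed per run, log it, and allow an override when replaying a failure. The sketch below assumes a hypothetical `TEST_SEED` environment variable and uses the standard library's `random.Random` as a stand-in for Faker, so it runs without extra dependencies.

```python
import os
import random


def resolve_seed(env=None) -> int:
    """Use TEST_SEED (hypothetical variable name) if set, else pick a fresh seed."""
    env = os.environ if env is None else env
    raw = env.get("TEST_SEED")
    if raw is not None:
        return int(raw)
    return random.SystemRandom().randrange(2**32)


def sample(seed: int, count: int = 3) -> list:
    """Generate a few values from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.randrange(10_000) for _ in range(count)]


seed = resolve_seed({"TEST_SEED": "42"})
print(f"test data seed: {seed}")  # log it so a failing run can be replayed

# With Faker installed, the same value would be passed to Faker.seed(seed).
first_run = sample(seed)
second_run = sample(seed)  # same seed, same data
```

Logging the seed on every run is what makes this work: a failing run can be replayed by exporting the logged value.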
Data Isolation
Principle: Each Test Owns Its Data
✓ Test creates its own user → modifies it → asserts → deletes it
✗ Test relies on user created by a previous test
✗ Test reads a shared user modified by another test
Isolation Techniques
Unique identifiers — the simplest approach. Each test creates data with a UUID-suffixed identifier. Collision probability is negligible. No cleanup required.
Transactional rollback — for database-touching tests. Wrap each test in a DB transaction, roll back after assertion. Fast. No cleanup code needed. Requires DB access from test process.
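import pytest
from sqlalchemy.orm import Session  # assumes SQLAlchemy; `engine` is a fixture defined elsewhere

@pytest.fixture
def db_session(engine):
    connection = engine.connect()
    transaction = connection.begin()  # outer transaction wraps the whole test
    session = Session(bind=connection)
    yield session
    session.close()
    transaction.rollback()  # discard everything the test wrote
    connection.close()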
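The rollback idea can be tried without any project setup. This is a minimal sketch using the standard library's `sqlite3` and an in-memory database instead of SQLAlchemy; `run_and_roll_back` and `insert_user` are hypothetical names chosen for illustration.

```python
import sqlite3


def run_and_roll_back(conn: sqlite3.Connection, body) -> None:
    """Run body(conn), then discard every change it made."""
    try:
        body(conn)
    finally:
        conn.rollback()  # runs whether the body passed or raised


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT)")
conn.commit()  # the schema is committed; only test writes are rolled back


def insert_user(c: sqlite3.Connection) -> None:
    c.execute("INSERT INTO users (email) VALUES (?)", ("a@example.com",))
    # The row is visible inside the open transaction.
    assert c.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 1


run_and_roll_back(conn, insert_user)
rows_after = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

After the rollback the table is empty again, so the next test starts from the same state without any delete statements.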
Dedicated test namespace — use a test-only tenant or prefix.
All test data lives under test_* prefix. A scheduled job deletes it.
Useful when DB access from tests is not available.
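The scheduled job for a test namespace mostly needs a safe selection rule. The sketch below shows one possible rule under stated assumptions: the record shape (`name`, `created_at`) and the 24-hour age limit are hypothetical, and `stale_test_records` is an illustrative helper rather than an existing API.

```python
from datetime import datetime, timedelta, timezone

TEST_PREFIX = "test_"  # prefix named in the text above


def stale_test_records(records, now, max_age=timedelta(hours=24)):
    """Select records a cleanup job could delete: only names under the
    test prefix, and only those older than max_age."""
    cutoff = now - max_age
    return [
        r for r in records
        if r["name"].startswith(TEST_PREFIX) and r["created_at"] < cutoff
    ]


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
records = [
    {"name": "test_order_1", "created_at": now - timedelta(days=2)},     # old test data
    {"name": "test_order_2", "created_at": now - timedelta(minutes=5)},  # may still be in use
    {"name": "real_customer", "created_at": now - timedelta(days=30)},   # outside the namespace
]
to_delete = [r["name"] for r in stale_test_records(records, now)]
```

The age limit keeps the job from deleting data that a currently running test still depends on, and the prefix check keeps it away from anything outside the test namespace.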
Cleanup Strategies
Strategy 1: Teardown in Fixture
@pytest.fixture
def created_user(users_client) -> dict:
    user = users_client.create(UserBuilder().build())
    yield user
    users_client.delete(user["id"])  # always runs, even on failure
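Because the fixture uses yield, pytest runs the code after it as teardown even when the test fails. Teardown is skipped only if the fixture itself raises before reaching yield.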
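When a fixture creates several records, each one needs its own cleanup, including those created before a later setup step fails. One possible approach is to register a cleanup callback right after each creation with `contextlib.ExitStack`. In this sketch `FakeUsersClient` is a hypothetical in-memory stand-in for `users_client`, and the generator is driven by hand where pytest would normally drive it.

```python
from contextlib import ExitStack


class FakeUsersClient:
    """Hypothetical in-memory stand-in for the document's users_client."""

    def __init__(self):
        self.users = {}
        self._next_id = 1

    def create(self, payload: dict) -> dict:
        user = {"id": self._next_id, **payload}
        self.users[user["id"]] = user
        self._next_id += 1
        return user

    def delete(self, user_id: int) -> None:
        self.users.pop(user_id, None)  # tolerate users the test already deleted


def created_users(client, count):
    """Generator shaped like a yield fixture: create, hand over, clean up."""
    with ExitStack() as stack:
        users = []
        for i in range(count):
            user = client.create({"email": f"user{i}@example.com"})
            stack.callback(client.delete, user["id"])  # registered right after creation
            users.append(user)
        yield users
    # Leaving the with-block runs the callbacks in reverse order.


client = FakeUsersClient()
gen = created_users(client, 3)
users = next(gen)  # setup phase
count_during = len(client.users)
next(gen, None)  # teardown phase
count_after = len(client.users)
```

With `@pytest.fixture` on a function of this shape, pytest would perform the two `next` calls itself. Because each callback is registered immediately, an exception raised while creating a later record still unwinds the earlier ones.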
Strategy 2: Cleanup Before, Not After
@pytest.fixture
def clean_test_users(users_client) -> None:
    users_client.delete_all_where(email_contains="@test.com")
Run the cleanup before the test instead of after it. Advantage: leftovers from a previous run, including one that crashed before its teardown, are removed too.
Strategy 3: Unique Data (No Cleanup Needed)
import uuid

def test_user_can_update_profile(api_client):
    email = f"test-{uuid.uuid4().hex[:8]}@example.com"
    user = api_client.users.create({"email": email})
    # ... test body
    # No cleanup: unique email, will not conflict with any future test
Preferred for fast iteration. Accept minor data accumulation in test environments.
Data Isolation Decision Guide
| Situation | Strategy |
|---|---|
| Tests modify data | Unique IDs or transactional rollback |
| Tests read shared reference data | Static fixtures (session-scoped) |
| Tests need realistic-looking data | Randomized data with Faker |
| CI environment with ephemeral DB | No cleanup needed |
| Shared staging environment | Unique prefix + periodic cleanup job |