Metamorphic Testing
Concept
Metamorphic testing is used when there is no easy way to determine the exact expected output, but you know the relationship between different inputs and their outputs.
Instead of checking one answer, you check that outputs are consistent when inputs change in a predictable way.
Input A → Output A
Input B → Output B
If A and B are related, then A_output and B_output must satisfy a known relation (metamorphic relation).
When to Use
| Scenario | Why Exact Output Is Unknown |
|---|---|
| Search / ranking | Too many valid orderings |
| Machine learning models | Ground truth is expensive |
| Numerical computations | Floating-point results vary |
| Sorting with complex keys | Output depends on tie-breaking |
| Recommendation engines | Personalized, non-deterministic |
ISTQB Advanced Level (2025) includes metamorphic testing as a formal technique.
Metamorphic Relations
A metamorphic relation (MR) is a property that must hold between related inputs and outputs.
| MR | Description |
|---|---|
| Subset | Narrower query → result is a subset of broader query |
| Monotonicity | More specific input → fewer or equal results |
| Consistency | Same query in different case → same result |
| Permutation | Reordering catalog → same items returned |
| Addition | Adding unrelated data → original results unaffected |
Example: Product Search
def search_products(query: str, catalog: list[str]) -> list[str]:
return [p for p in catalog if query.lower() in p.lower()]
CATALOG = ["Python Book", "FastAPI Guide", "Python Course", "Go Manual"]
class TestMetamorphicSearch:
def test_narrower_query_returns_subset(self) -> None:
broad = search_products("Python", CATALOG)
narrow = search_products("Python Book", CATALOG)
assert all(item in broad for item in narrow)
assert len(narrow) <= len(broad)
def test_case_insensitivity(self) -> None:
lower = search_products("python", CATALOG)
upper = search_products("PYTHON", CATALOG)
assert lower == upper
def test_larger_catalog_keeps_old_results(self) -> None:
original = search_products("Python", CATALOG)
extended = [*CATALOG, "Rust Handbook"]
new_results = search_products("Python", extended)
assert all(item in new_results for item in original)
def test_empty_query_returns_all(self) -> None:
result = search_products("", CATALOG)
assert result == CATALOG
Sorting Example
def sort_by_price(items: list[dict]) -> list[dict]:
return sorted(items, key=lambda x: x["price"])
def test_sort_order_preserved_after_adding_item() -> None:
items = [{"name": "A", "price": 30}, {"name": "B", "price": 10}]
sorted_original = sort_by_price(items)
items_with_middle = [*items, {"name": "C", "price": 20}]
sorted_extended = sort_by_price(items_with_middle)
original_names = [i["name"] for i in sorted_original] # ["B", "A"]
extended_names = [i["name"] for i in sorted_extended]
# MR: Adding an unrelated middle element keeps relative order of old elements.
assert extended_names.index(original_names[0]) < extended_names.index(original_names[1])
def test_sort_is_stable_across_equivalent_prices() -> None:
items = [{"name": "A", "price": 10}, {"name": "B", "price": 10}]
result1 = sort_by_price(items)
result2 = sort_by_price(items)
assert result1 == result2
def test_search_results_as_set_are_permutation_invariant() -> None:
original_catalog = ["Python Book", "FastAPI Guide", "Python Course", "Go Manual"]
permuted_catalog = ["Go Manual", "Python Course", "FastAPI Guide", "Python Book"]
original = search_products("Python", original_catalog)
permuted = search_products("Python", permuted_catalog)
assert set(original) == set(permuted)
Combining with Property-Based Testing
The hypothesis library extends metamorphic testing with automatic input generation:
from hypothesis import given, strategies as st
@given(st.text(min_size=1))
def test_search_nonempty_query_subset_of_empty(query: str) -> None:
full = search_products("", CATALOG)
filtered = search_products(query, CATALOG)
assert all(item in full for item in filtered)
Risks
| Risk | Description | Mitigation |
|---|---|---|
| Weak MR | Relation too loose to catch real bugs | Define MRs from domain invariants, not implementation |
| Missing MR | Not all relationships covered | List all known invariants before writing tests |
| Test depends on order | Output order matters for subset check | Use set comparison when order is irrelevant |