Testing & Debugging

11 min read 2296 words

Table of Contents

pytest Fundamentals & Architecture
Fixtures Deep Dive
Parametrize & Test Organization
Mocking & Patching
Advanced pytest Features
Debugging Techniques
Property-Based Testing
Test Architecture Decisions
Summary: Decision Table

Testing is the only way to have confidence your code does what you think it does. Debugging is what you do when that confidence was wrong. Both are skills — not chores — and the gap between junior and senior engineers is mostly here.

pytest Fundamentals & Architecture

pytest won the Python testing wars. Not because it’s newer, but because it’s genuinely better designed.

Why pytest beats unittest:

Concern	unittest	pytest
Test functions	Must be methods in a class	Plain functions
Assertions	`self.assertEqual(a, b)`	`assert a == b`
Failure messages	Generic	Introspects the expression
Fixtures	`setUp/tearDown` on the class	Injected by name, composable
Plugins	None built-in	1000+ plugins
Parametrize	Verbose	`@pytest.mark.parametrize`

The assert rewriting is the killer feature. When assert x == y fails, pytest rewrites the bytecode to show you the actual values — not just “AssertionError.”

Test discovery rules — pytest finds tests by:

Start from current dir (or args)
Recurse into directories not excluded by norecursedirs
Match files: test_*.py or *_test.py
Inside files: functions named test_*, methods named test_* in classes named Test* (no __init__)

# pytest finds this
def test_something():
    assert 1 + 1 == 2

# pytest finds this class but not test_helper (not Test* prefix)
class TestPayment:
    def test_charge(self):
        ...
    def helper(self):   # not collected
        ...

conftest.py is pytest’s dependency injection hook. It’s automatically loaded for the directory it lives in and all subdirectories. No import needed. Use it for:

Shared fixtures
Custom hooks (pytest_configure, pytest_collection_modifyitems)
Plugins local to that subtree

Configuration via pyproject.toml:

[tool.pytest.ini_options]
testpaths = ["tests"]
addopts = "-v --tb=short"
markers = [
    "slow: marks tests as slow (deselect with '-m not slow')",
    "integration: requires external services",
]

ELI5: conftest.py is like a staff room at a school. Any teacher (test file) in that building can use whatever’s in the staff room without having to carry it from home. Rooms on higher floors inherit the staff room below.

Fixtures Deep Dive

Fixtures solve the setup/teardown problem without inheritance hierarchies. They’re functions that provide resources to tests — and pytest resolves them automatically by matching argument names.

@pytest.fixture
def db_connection():
    conn = create_connection("sqlite:///:memory:")
    yield conn          # test runs here
    conn.close()        # always runs, even if test fails

def test_user_insert(db_connection):   # pytest injects it
    db_connection.execute("INSERT INTO users ...")
    ...

Scope controls how often the fixture runs:

Scope	Created once per…	Use for
`function` (default)	test function	Small, cheap resources
`class`	test class	Shared state within a class
`module`	test file	Expensive setup shared across tests
`package`	package directory	Database seeding
`session`	entire test run	External services, DB connections

Higher scope fixtures can only depend on same or higher scope. You can’t inject a function-scoped fixture into a session-scoped one.

yield vs request.addfinalizer:

yield fixtures are cleaner and recommended. request.addfinalizer is for when you need to register cleanup conditionally:

@pytest.fixture
def temp_dir(request):
    d = tempfile.mkdtemp()
    def cleanup():
        shutil.rmtree(d)
    request.addfinalizer(cleanup)   # runs even if setup raises mid-way
    return d

Parametrized fixtures run every test that uses the fixture once per parameter value:

@pytest.fixture(params=["sqlite", "postgres"])
def database(request):
    return create_db(request.param)

def test_query(database):   # runs twice: once per db
    assert database.query("SELECT 1")

Factory fixtures are the pattern for “give me N of these”:

@pytest.fixture
def make_user():
    created = []
    def _make(name="alice", role="user"):
        u = User.create(name=name, role=role)
        created.append(u)
        return u
    yield _make
    for u in created:
        u.delete()

def test_admin_access(make_user):
    admin = make_user(role="admin")
    assert admin.can_delete()

autouse=True runs the fixture for every test in scope without being listed as an argument. Use sparingly:

@pytest.fixture(autouse=True)
def reset_settings():
    original = settings.copy()
    yield
    settings.update(original)

ELI5: Fixtures are like mise en place in cooking. You prep your ingredients once (or once per dish), use them when needed, and clean up afterward. autouse is the chef who silently cleans the counter after every plate without being asked.

Common mistake: Using autouse=True for things that only some tests need. This slows down your entire suite and makes test behavior surprising. Only autouse truly global concerns like resetting global state.

Parametrize & Test Organization

@pytest.mark.parametrize is how you test many inputs without copy-pasting test functions:

@pytest.mark.parametrize("input,expected", [
    ("hello", 5),
    ("", 0),
    ("hi", 2),
])
def test_length(input, expected):
    assert len(input) == expected

Multiple decorators create a cartesian product:

@pytest.mark.parametrize("x", [1, 2])
@pytest.mark.parametrize("y", [10, 20])
def test_multiply(x, y):
    # runs: (1,10), (1,20), (2,10), (2,20)
    assert x * y > 0

pytest.param for readable test IDs and special marks:

@pytest.mark.parametrize("val,expected", [
    pytest.param(0, True, id="zero-is-falsy"),
    pytest.param(1, False, id="one-is-truthy"),
    pytest.param(None, True, id="none-is-falsy", marks=pytest.mark.xfail),
])
def test_falsy(val, expected):
    assert not val == expected

Custom markers let you slice test runs:

# pyproject.toml: register your markers first (suppress warnings)
@pytest.mark.slow
@pytest.mark.integration
def test_full_pipeline():
    ...

# Run only fast tests in CI
# pytest -m "not slow"

skip and xfail:

@pytest.mark.skip(reason="TODO: fix after migration")
def test_legacy_behavior(): ...

@pytest.mark.skipif(sys.platform == "win32", reason="Unix only")
def test_file_permissions(): ...

@pytest.mark.xfail(strict=True)  # fail if it accidentally PASSES
def test_known_bug(): ...

Test organization — when to use classes:

Use a class when tests share setup/state and form a logical group. Don’t use classes just to organize — modules do that. Use directories for large test suites mirroring your source layout.

Mocking & Patching

Mocking replaces real dependencies with controllable fakes during tests. It’s powerful and widely misused.

The core rule: patch where it’s used, not where it’s defined.

# myapp/orders.py
from myapp.email import send_email

def place_order(order):
    send_email(order.customer, "Order received")

# Wrong: patches the original module, but orders.py already imported it
@patch("myapp.email.send_email")

# Right: patches the name as orders.py sees it
@patch("myapp.orders.send_email")

patch as decorator vs context manager:

# Decorator: mock lives for the whole test
@patch("myapp.orders.send_email")
def test_order_sends_email(mock_send):
    place_order(fake_order)
    mock_send.assert_called_once()

# Context manager: precise scope
def test_order():
    with patch("myapp.orders.send_email") as mock_send:
        place_order(fake_order)
    mock_send.assert_called_once()

MagicMock vs Mock:

MagicMock supports magic/dunder methods automatically. Use it when you’re mocking something that will be used as a context manager, iterator, or with arithmetic operators.

mock = Mock()
mock.__enter__ = ...   # must set manually on Mock

magic = MagicMock()
magic.__enter__.return_value = magic   # already works

spec=True is underused and important:

mock = Mock(spec=PaymentService)
mock.charge_card(100)       # OK
mock.chrage_card(100)       # AttributeError — catches the typo!

Without spec, mocks accept any attribute access or call, silently. spec makes mocks fail loudly when you misuse them.

side_effect for dynamic behavior:

mock.side_effect = ValueError("network error")   # raises on call
mock.side_effect = [1, 2, 3]                     # returns 1, then 2, then 3
mock.side_effect = lambda x: x * 2              # function

ELI5: Mocks are like stunt doubles in movies. The real actor (your database, email server) is replaced by someone who just does what you tell them to. spec=True is casting a stunt double that can only do what the original actor can — so you notice if the script asks them to fly.

When NOT to mock:

Don’t mock what you don’t own (third-party libraries). If you must, wrap the third-party code and mock your wrapper.
Don’t mock simple value objects or data classes. Just use real ones.
Don’t mock your own internals to test your own internals — that’s testing the mock, not the code.

Common mistake: Over-mocking. If every line of a test is setting up mocks, you’re testing that your code calls things in a certain order — which is implementation, not behavior. Test behavior.

Advanced pytest Features

pytest.raises — the right way to test exceptions:

def test_invalid_input():
    with pytest.raises(ValueError, match="cannot be negative"):
        calculate(-1)

The match parameter is a regex against the exception message. Without it you only assert the exception type, not what it says.

tmp_path — built-in temporary directory per test:

def test_write_file(tmp_path):
    f = tmp_path / "output.txt"
    f.write_text("hello")
    assert f.read_text() == "hello"

capsys — capture stdout/stderr:

def test_print_output(capsys):
    my_function_that_prints()
    captured = capsys.readouterr()
    assert "Expected output" in captured.out

monkeypatch — safer than patch for env vars and attributes:

def test_env_var(monkeypatch):
    monkeypatch.setenv("API_KEY", "test-key")
    monkeypatch.setattr(module, "TIMEOUT", 0.1)
    # automatically reverted after test

Key plugins:

Plugin	Purpose	Install
`pytest-xdist`	Parallel execution	`-n auto`
`pytest-cov`	Coverage	`--cov=myapp --cov-report=html`
`pytest-asyncio`	Async tests	`@pytest.mark.asyncio`
`pytest-benchmark`	Perf regression	`benchmark(func, arg)`
`pytest-mock`	`mocker` fixture (cleaner `patch`)	`mocker.patch(...)`

Async tests with pytest-asyncio:

@pytest.mark.asyncio
async def test_async_api():
    result = await fetch_data()
    assert result.status == 200

ELI5: pytest-xdist is like hiring multiple workers to paint a house instead of one. Each worker gets their own set of rooms (tests) and they all work at the same time. You need to make sure they don’t both try to paint the same room (shared state).

Debugging Techniques

pdb command reference — the basics you must know:

Command	What it does
`n`	Next line (step over)
`s`	Step into function call
`c`	Continue until next breakpoint
`l`	List source code around current line
`p expr`	Print expression
`pp expr`	Pretty-print expression
`bt`	Backtrace (call stack)
`up` / `down`	Move up/down the call stack
`b line`	Set breakpoint
`q`	Quit

breakpoint() is the Python 3.7+ way. It respects PYTHONBREAKPOINT:

# Use ipdb instead of pdb everywhere
PYTHONBREAKPOINT=ipdb.set_trace python myscript.py

# Disable all breakpoints (useful in CI)
PYTHONBREAKPOINT=0 python myscript.py

Post-mortem debugging — inspect after a crash without re-running:

import pdb, traceback, sys

try:
    problematic_function()
except Exception:
    pdb.post_mortem()   # drops you into pdb at the crash point

Or from the command line: python -m pdb -c continue script.py — runs the script, drops into pdb on exception.

logging vs print debugging:

Never use print in real code. logging wins because:

You can leave it in (log levels let you filter)
You can redirect output without code changes
You get timestamps, module names, and line numbers for free
Libraries use it too — you can see the whole picture

import logging
log = logging.getLogger(__name__)

log.debug("processing item: %s", item)   # not f-string — lazy eval
log.warning("rate limit hit, retrying in %ds", delay)

python -X dev turns on extra warnings — ResourceWarning, DeprecationWarning, asyncio debug mode. Run your test suite with it occasionally.

faulthandler — essential for debugging hangs and segfaults:

import faulthandler
faulthandler.enable()   # dumps traceback on SIGSEGV, SIGFPE, etc.

# Or enable in pytest: -p faulthandler (on by default in modern pytest)

On a hung process: kill -SIGUSR1 <pid> will dump stack traces if faulthandler is enabled.

ELI5: faulthandler is a black box recorder on an airplane. If the plane crashes (segfault) or goes missing (hang), you still get a recording of the last known state. Without it, you get nothing.

Common mistake: Using print statements and then forgetting to remove them before committing. Set up logging from the start. If you really need a quick print during local dev, use # noqa and grep for it before committing.

Property-Based Testing

Unit tests check examples you thought of. Property-based tests generate examples you didn’t.

hypothesis basics:

from hypothesis import given, settings
from hypothesis import strategies as st

@given(st.integers(), st.integers())
def test_addition_commutative(a, b):
    assert a + b == b + a   # must hold for ALL integers

@given(st.text())
def test_encode_decode(s):
    assert s.encode("utf-8").decode("utf-8") == s

Hypothesis generates hundreds of examples, then shrinks failing cases to the minimal example. If add(1000000, -999999) fails, it’ll find that add(1, 0) also fails and report that.

Common strategies:

st.integers(min_value=0, max_value=100)
st.text(alphabet=st.characters(whitelist_categories=("L",)))
st.lists(st.integers(), min_size=1)
st.one_of(st.none(), st.integers())
st.builds(User, name=st.text(), age=st.integers(min_value=0))

Stateful testing — test sequences of operations:

from hypothesis.stateful import RuleBasedStateMachine, rule, invariant

class QueueMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.queue = MyQueue()
        self.model = []

    @rule(item=st.integers())
    def enqueue(self, item):
        self.queue.push(item)
        self.model.append(item)

    @rule()
    def dequeue(self):
        if self.model:
            assert self.queue.pop() == self.model.pop(0)

    @invariant()
    def length_matches(self):
        assert len(self.queue) == len(self.model)

ELI5: Unit tests are like checking if your recipe works with the three ingredients you tried. Hypothesis is like having a kitchen assistant who tries every combination they can find, then when something fails, hands you the simplest possible broken version. You didn’t have to think of the weird edge cases — the computer found them.

Test Architecture Decisions

The testing pyramid: lots of unit tests, fewer integration tests, very few end-to-end tests. This isn’t about religious adherence — it’s economics. Unit tests are cheap to run, fast to diagnose. E2E tests are expensive, slow, and flaky.

Testing trophy (Kent C. Dodds’ variation): integration tests in the middle may be most valuable per dollar — they test real behavior without full system cost.

Level	Speed	Confidence	Cost	When to write
Unit	Fast (ms)	Low-medium	Cheap	Pure functions, complex logic
Integration	Medium (sec)	High	Medium	Service boundaries, DB queries
E2E	Slow (min)	Very high	Expensive	Critical user paths only

What to test:

Business logic, edge cases, error paths
Public interfaces (not private implementation)
Things that have broken before
Anything complex enough to misread

What NOT to test:

Getter/setter boilerplate
Third-party libraries (they have their own tests)
Framework plumbing
Things that would require mocking everything interesting

Test isolation — each test must be independent. If test B passes only when test A runs first, you have hidden coupling. Use fixtures to set up state fresh each time.

CI/CD strategy: Fast tests first. Run unit tests on every push (seconds). Run integration tests on PR merge (minutes). Run E2E only before release or on schedule. This gives you fast feedback without slow pipelines.

ELI5: The testing pyramid is like a building’s foundations. You need lots of strong bricks at the bottom (unit tests) to hold up fewer, heavier floors (integration, E2E). Try to build the pyramid upside-down and it falls.

Common mistake: Writing tests that test implementation rather than behavior. If renaming an internal variable breaks your tests, your tests are wrong. Tests should survive refactoring.

Summary: Decision Table

You want to…	Use
Test many input/output pairs	`@pytest.mark.parametrize`
Share setup across tests	`@pytest.fixture`
Share setup across test files	`conftest.py`
Replace a dependency	`unittest.mock.patch` or `pytest-mock`
Ensure mock matches real interface	`Mock(spec=MyClass)`
Test exception is raised	`pytest.raises(ExcType, match=r"...")`
Test edge cases you haven’t thought of	`hypothesis` + `@given`
Run tests in parallel	`pytest-xdist` (`-n auto`)
Measure coverage	`pytest-cov`
Debug a crash interactively	`breakpoint()` or `pdb.post_mortem()`
Debug a hang or segfault	`faulthandler.enable()`
Log instead of print	`logging.getLogger(__name__)`
Test async code	`pytest-asyncio` + `@pytest.mark.asyncio`
Skip a test conditionally	`@pytest.mark.skipif(condition, reason=...)`
Mark a test as expected to fail	`@pytest.mark.xfail(strict=True)`

pytest Fundamentals & Architecture#

Fixtures Deep Dive#

Parametrize & Test Organization#

Mocking & Patching#

Advanced pytest Features#

Debugging Techniques#

Property-Based Testing#

Test Architecture Decisions#

Summary: Decision Table#