Testing & Debugging
Testing is the only way to have confidence your code does what you think it does. Debugging is what you do when that confidence was wrong. Both are skills — not chores — and the gap between junior and senior engineers is mostly here.
pytest Fundamentals & Architecture
pytest won the Python testing wars. Not because it’s newer, but because it’s genuinely better designed.
Why pytest beats unittest:
| Concern | unittest | pytest |
|---|---|---|
| Test functions | Must be methods in a class | Plain functions |
| Assertions | self.assertEqual(a, b) | assert a == b |
| Failure messages | Generic | Introspects the expression |
| Fixtures | setUp/tearDown on the class | Injected by name, composable |
| Plugins | None built-in | 1000+ plugins |
| Parametrize | Verbose | @pytest.mark.parametrize |
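Assert rewriting is the killer feature. pytest rewrites the assert statements in your test modules at import time (it transforms the AST before the module is compiled), so when assert x == y fails, the report shows the actual values of x and y, not just “AssertionError.”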
Test discovery rules — pytest finds tests by:
- Start from the current directory (or the paths given as arguments)
- Recurse into directories not excluded by norecursedirs
- Match files: test_*.py or *_test.py
- Inside files: functions named test_*, and methods named test_* in classes named Test* (with no __init__ method)
# pytest finds this
def test_something():
    assert 1 + 1 == 2

# pytest collects this class (Test* prefix) and its test_* methods
class TestPayment:
    def test_charge(self):
        ...

    def helper(self):  # not collected: name lacks the test_ prefix
        ...
conftest.py is pytest’s dependency injection hook. It’s automatically loaded for the directory it lives in and all subdirectories. No import needed. Use it for:
- Shared fixtures
- Custom hooks (pytest_configure, pytest_collection_modifyitems)
- Plugins local to that subtree
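As a rough sketch of what such a conftest.py might hold, the file below defines one collection hook and one shared fixture. The --runslow option and the api_base_url fixture are assumptions invented for illustration; only the hook names and the slow marker come from the text.

```python
# conftest.py (illustrative sketch; --runslow and api_base_url are hypothetical)
import pytest


def pytest_addoption(parser):
    # Register a command-line flag that opts in to slow tests.
    parser.addoption("--runslow", action="store_true", default=False,
                     help="run tests marked as slow")


def pytest_collection_modifyitems(config, items):
    # Without --runslow, attach a skip marker to every test marked "slow".
    if config.getoption("--runslow"):
        return
    skip_slow = pytest.mark.skip(reason="need --runslow option to run")
    for item in items:
        if "slow" in item.keywords:
            item.add_marker(skip_slow)


@pytest.fixture
def api_base_url():
    # Shared fixture: visible to every test file in this directory and below.
    return "http://localhost:8000"
```

Tests in the same directory tree can then request api_base_url by name, and slow tests are skipped unless the flag is passed.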
Configuration via pyproject.toml:
[tool.pytest.ini_options]
testpaths = ["tests"]
addopts = "-v --tb=short"
markers = [
"slow: marks tests as slow (deselect with '-m not slow')",
"integration: requires external services",
]
ELI5: conftest.py is like a staff room at a school. Any teacher (test file) in that building can use whatever’s in the staff room without having to carry it from home. Rooms on higher floors inherit the staff room below.
Fixtures Deep Dive
Fixtures solve the setup/teardown problem without inheritance hierarchies. They’re functions that provide resources to tests — and pytest resolves them automatically by matching argument names.
import pytest

@pytest.fixture
def db_connection():
    conn = create_connection("sqlite:///:memory:")
    yield conn  # test runs here
    conn.close()  # teardown: runs after the test, even if the test fails

def test_user_insert(db_connection):  # pytest injects it by argument name
    db_connection.execute("INSERT INTO users ...")
    ...
Scope controls how often the fixture runs:
| Scope | Created once per… | Use for |
|---|---|---|
| function (default) | test function | Small, cheap resources |
| class | test class | Shared state within a class |
| module | test file | Expensive setup shared across tests |
| package | package directory | Database seeding |
| session | entire test run | External services, DB connections |
A fixture can only depend on fixtures of the same or broader scope. You can’t inject a function-scoped fixture into a session-scoped one; pytest raises a ScopeMismatch error if you try.
yield vs request.addfinalizer:
yield fixtures are cleaner and recommended. request.addfinalizer is for when you need to register cleanup conditionally:
import shutil
import tempfile

@pytest.fixture
def temp_dir(request):
    d = tempfile.mkdtemp()

    def cleanup():
        shutil.rmtree(d)

    request.addfinalizer(cleanup)  # once registered, runs even if later setup code raises
    return d
Parametrized fixtures run every test that uses the fixture once per parameter value:
@pytest.fixture(params=["sqlite", "postgres"])
def database(request):
    return create_db(request.param)

def test_query(database):  # runs twice: once per db
    assert database.query("SELECT 1")
Factory fixtures are the pattern for “give me N of these”:
@pytest.fixture
def make_user():
    created = []

    def _make(name="alice", role="user"):
        u = User.create(name=name, role=role)
        created.append(u)
        return u

    yield _make
    for u in created:  # teardown: delete everything the test created
        u.delete()

def test_admin_access(make_user):
    admin = make_user(role="admin")
    assert admin.can_delete()
autouse=True runs the fixture for every test in scope without being listed as an argument. Use sparingly:
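@pytest.fixture(autouse=True)
def reset_settings():
    original = settings.copy()  # assumes settings is a dict-like global
    yield
    settings.clear()  # drop keys the test added; update() alone would keep them
    settings.update(original)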
ELI5: Fixtures are like mise en place in cooking. You prep your ingredients once (or once per dish), use them when needed, and clean up afterward. autouse is the chef who silently cleans the counter after every plate without being asked.
Common mistake: Using autouse=True for things that only some tests need. This slows down your entire suite and makes test behavior surprising. Reserve autouse for truly global concerns like resetting global state.
Parametrize & Test Organization
@pytest.mark.parametrize is how you test many inputs without copy-pasting test functions:
@pytest.mark.parametrize("input,expected", [
    ("hello", 5),
    ("", 0),
    ("hi", 2),
])
def test_length(input, expected):
    assert len(input) == expected
Multiple decorators create a cartesian product:
@pytest.mark.parametrize("x", [1, 2])
@pytest.mark.parametrize("y", [10, 20])
def test_multiply(x, y):
    # runs once per (x, y) combination: (1, 10), (1, 20), (2, 10), (2, 20)
    assert x * y > 0
pytest.param for readable test IDs and special marks:
@pytest.mark.parametrize("val,expected", [
    pytest.param(0, True, id="zero-is-falsy"),
    pytest.param(1, False, id="one-is-truthy"),
    pytest.param(None, True, id="none-is-falsy", marks=pytest.mark.xfail),  # mark applies to this case only
])
def test_falsy(val, expected):
    # Parentheses matter: "not val == expected" parses as not (val == expected)
    assert (not val) == expected
Custom markers let you slice test runs:
# pyproject.toml: register your markers first (suppresses unknown-marker warnings)
@pytest.mark.slow
@pytest.mark.integration
def test_full_pipeline():
    ...

# Run only fast tests in CI
# pytest -m "not slow"
skip and xfail:
import sys

@pytest.mark.skip(reason="TODO: fix after migration")
def test_legacy_behavior(): ...

@pytest.mark.skipif(sys.platform == "win32", reason="Unix only")
def test_file_permissions(): ...

@pytest.mark.xfail(strict=True)  # fail if it accidentally PASSES
def test_known_bug(): ...
Test organization — when to use classes:
Use a class when tests share setup/state and form a logical group. Don’t use classes just to organize — modules do that. Use directories for large test suites mirroring your source layout.
Mocking & Patching
Mocking replaces real dependencies with controllable fakes during tests. It’s powerful and widely misused.
The core rule: patch where it’s used, not where it’s defined.
# myapp/orders.py
from myapp.email import send_email

def place_order(order):
    send_email(order.customer, "Order received")

# Wrong: patches the original module, but orders.py already holds its own reference
@patch("myapp.email.send_email")
# Right: patches the name as orders.py sees it
@patch("myapp.orders.send_email")
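The rule can be checked with a self-contained sketch. It builds two throwaway in-memory modules, demo_email and demo_orders (hypothetical stand-ins for myapp.email and myapp.orders), and patches each location in turn:

```python
import sys
import types
from unittest.mock import patch

# Build two throwaway modules in memory so the example runs on its own.
# demo_email / demo_orders are hypothetical stand-ins for myapp.email / myapp.orders.
email_mod = types.ModuleType("demo_email")
exec("def send_email(to, subject):\n    return 'real email sent'", email_mod.__dict__)
sys.modules["demo_email"] = email_mod

orders_mod = types.ModuleType("demo_orders")
exec(
    "from demo_email import send_email\n"
    "def place_order(customer):\n"
    "    return send_email(customer, 'Order received')\n",
    orders_mod.__dict__,
)
sys.modules["demo_orders"] = orders_mod

# Patching the defining module leaves the name inside demo_orders untouched.
with patch("demo_email.send_email", return_value="fake"):
    patched_at_definition = orders_mod.place_order("alice")

# Patching the name where it is looked up replaces what place_order calls.
with patch("demo_orders.send_email", return_value="fake"):
    patched_at_use = orders_mod.place_order("alice")

print(patched_at_definition)  # real email sent
print(patched_at_use)         # fake
```

The from-import copies the function reference into demo_orders at import time, which is why only the second patch has any effect on place_order.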
patch as decorator vs context manager:
from unittest.mock import patch

# Decorator: mock lives for the whole test
@patch("myapp.orders.send_email")
def test_order_sends_email(mock_send):
    place_order(fake_order)
    mock_send.assert_called_once()

# Context manager: precise scope
def test_order():
    with patch("myapp.orders.send_email") as mock_send:
        place_order(fake_order)
    mock_send.assert_called_once()
MagicMock vs Mock:
MagicMock supports magic/dunder methods automatically. Use it when you’re mocking something that will be used as a context manager, iterator, or with arithmetic operators.
mock = Mock()
mock.__enter__ = Mock(return_value=mock)  # must set manually on Mock
mock.__exit__ = Mock(return_value=False)  # a with-statement needs both
magic = MagicMock()
magic.__enter__.return_value = magic  # already works
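A short sketch of the difference in practice. read_first_line is a hypothetical function under test that uses its argument as a context-manager factory, much like open:

```python
from unittest.mock import MagicMock, Mock


def read_first_line(opener, path):
    # Code under test: uses whatever opener returns as a context manager.
    with opener(path) as handle:
        return handle.readline()


# MagicMock preconfigures __enter__/__exit__, so the with-statement just works.
fake_open = MagicMock()
fake_open.return_value.__enter__.return_value.readline.return_value = "first line\n"
first_line = read_first_line(fake_open, "data.txt")

# A plain Mock has no dunder support, so the same with-statement fails.
plain_error = None
try:
    read_first_line(Mock(), "data.txt")
except (AttributeError, TypeError) as exc:  # exact type varies by Python version
    plain_error = exc

print(repr(first_line))
print(type(plain_error).__name__)
```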
spec is underused and important:
mock = Mock(spec=PaymentService)
mock.charge_card(100) # OK
mock.chrage_card(100) # AttributeError — catches the typo!
Without spec, mocks accept any attribute access or call, silently. spec makes mocks fail loudly when you misuse them.
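A minimal sketch of the three levels of strictness, assuming a hypothetical PaymentService class with a single charge_card method. Note that spec only checks attribute names; create_autospec from unittest.mock additionally checks call signatures:

```python
from unittest.mock import Mock, create_autospec


class PaymentService:
    # Hypothetical interface used only for this illustration.
    def charge_card(self, amount):
        raise NotImplementedError


# No spec: the typo is accepted silently.
loose = Mock()
loose.chrage_card(100)

# spec: unknown attribute names raise AttributeError.
strict = Mock(spec=PaymentService)
strict.charge_card(100)
try:
    strict.chrage_card(100)
    typo_caught = False
except AttributeError:
    typo_caught = True

# spec does not validate arguments, so this wrong call still passes.
strict.charge_card()

# create_autospec also enforces the real method signature.
auto = create_autospec(PaymentService, instance=True)
auto.charge_card(100)
try:
    auto.charge_card()  # missing the required amount argument
    signature_checked = False
except TypeError:
    signature_checked = True
```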
side_effect for dynamic behavior:
mock.side_effect = ValueError("network error") # raises on call
mock.side_effect = [1, 2, 3] # returns 1, then 2, then 3
mock.side_effect = lambda x: x * 2 # function
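One common use of the list form is simulating a dependency that fails a few times and then recovers. The sketch below assumes a hypothetical fetch_with_retry helper; exceptions placed in the side_effect list are raised rather than returned:

```python
from unittest.mock import Mock


def fetch_with_retry(fetch, attempts=3):
    # Hypothetical code under test: retry on ConnectionError, re-raise the last one.
    last_error = None
    for _ in range(attempts):
        try:
            return fetch()
        except ConnectionError as exc:
            last_error = exc
    raise last_error


# First two calls raise, the third returns a value.
flaky = Mock(side_effect=[ConnectionError("down"), ConnectionError("down"), "payload"])
result = fetch_with_retry(flaky)

# A dependency that never recovers exhausts the attempts.
dead = Mock(side_effect=ConnectionError("still down"))
try:
    fetch_with_retry(dead, attempts=2)
    gave_up = False
except ConnectionError:
    gave_up = True
```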
ELI5: Mocks are like stunt doubles in movies. The real actor (your database, email server) is replaced by someone who just does what you tell them to. spec is casting a stunt double that can only do what the original actor can, so you notice if the script asks them to fly.
When NOT to mock:
- Don’t mock what you don’t own (third-party libraries). If you must, wrap the third-party code and mock your wrapper.
- Don’t mock simple value objects or data classes. Just use real ones.
- Don’t mock your own internals to test your own internals — that’s testing the mock, not the code.
Common mistake: Over-mocking. If every line of a test is setting up mocks, you’re testing that your code calls things in a certain order — which is implementation, not behavior. Test behavior.
Advanced pytest Features
pytest.raises — the right way to test exceptions:
def test_invalid_input():
    with pytest.raises(ValueError, match="cannot be negative"):
        calculate(-1)
The match parameter is a regular expression that is searched (re.search) against the string form of the exception. Without it you only assert the exception type, not what it says.
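Because match is a regex, messages containing characters such as parentheses need escaping. A small sketch, with calculate written here as a hypothetical stand-in for the function under test:

```python
import re

import pytest


def calculate(value):
    # Hypothetical function under test.
    if value < 0:
        raise ValueError(f"value cannot be negative (got {value})")
    return value * 2


def test_negative_rejected():
    # match does a regex search, so a fragment of the message is enough.
    with pytest.raises(ValueError, match=r"cannot be negative") as excinfo:
        calculate(-1)
    # excinfo.value is the raised exception, available for further checks.
    assert "-1" in str(excinfo.value)


def test_literal_message():
    # re.escape makes the parentheses literal instead of a regex group.
    with pytest.raises(ValueError, match=re.escape("(got -1)")):
        calculate(-1)
```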
tmp_path — built-in temporary directory per test:
def test_write_file(tmp_path):
    f = tmp_path / "output.txt"
    f.write_text("hello")
    assert f.read_text() == "hello"
capsys — capture stdout/stderr:
def test_print_output(capsys):
    my_function_that_prints()
    captured = capsys.readouterr()
    assert "Expected output" in captured.out
monkeypatch — safer than patch for env vars and attributes:
def test_env_var(monkeypatch):
    monkeypatch.setenv("API_KEY", "test-key")
    monkeypatch.setattr(module, "TIMEOUT", 0.1)
    # automatically reverted after test
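The automatic revert can be seen outside a fixture too. This sketch uses pytest.MonkeyPatch.context() (available in pytest 6.2 and later) so the undo point is explicit; DEMO_API_KEY and api_key are hypothetical names chosen for the example:

```python
import os

import pytest


def api_key():
    # Hypothetical code under test: reads configuration from the environment.
    return os.environ.get("DEMO_API_KEY", "unset")


def test_env_override_is_reverted():
    before = api_key()
    with pytest.MonkeyPatch.context() as mp:
        mp.setenv("DEMO_API_KEY", "test-key")
        assert api_key() == "test-key"
    # Leaving the context undoes every change made through mp.
    assert api_key() == before
```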
Key plugins:
| Plugin | Purpose | Usage |
|---|---|---|
| pytest-xdist | Parallel execution | -n auto |
| pytest-cov | Coverage | --cov=myapp --cov-report=html |
| pytest-asyncio | Async tests | @pytest.mark.asyncio |
| pytest-benchmark | Perf regression | benchmark(func, arg) |
| pytest-mock | mocker fixture (cleaner patch) | mocker.patch(...) |
Async tests with pytest-asyncio:
@pytest.mark.asyncio
async def test_async_api():
    result = await fetch_data()
    assert result.status == 200
ELI5: pytest-xdist is like hiring multiple workers to paint a house instead of one. Each worker gets their own set of rooms (tests) and they all work at the same time. You need to make sure they don’t both try to paint the same room (shared state).
Debugging Techniques
pdb command reference — the basics you must know:
| Command | What it does |
|---|---|
| n | Next line (step over) |
| s | Step into function call |
| c | Continue until next breakpoint |
| l | List source code around current line |
| p expr | Print expression |
| pp expr | Pretty-print expression |
| bt | Backtrace (call stack) |
| up / down | Move up/down the call stack |
| b line | Set breakpoint |
| q | Quit |
breakpoint() is the Python 3.7+ way. It respects PYTHONBREAKPOINT:
# Use ipdb instead of pdb everywhere
PYTHONBREAKPOINT=ipdb.set_trace python myscript.py
# Disable all breakpoints (useful in CI)
PYTHONBREAKPOINT=0 python myscript.py
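Under the hood, breakpoint() simply calls sys.breakpointhook, and the default hook is what consults PYTHONBREAKPOINT. A small sketch of that indirection, where recording_hook is a hypothetical stand-in for a debugger entry point:

```python
import sys

hook_calls = []


def recording_hook(*args, **kwargs):
    # Stand-in for a debugger: just record that it was invoked.
    hook_calls.append((args, kwargs))


original_hook = sys.breakpointhook
sys.breakpointhook = recording_hook
try:
    breakpoint()  # dispatches to recording_hook instead of opening pdb
finally:
    sys.breakpointhook = original_hook  # always restore the previous hook

print(len(hook_calls))  # 1
```

Replacing the hook in code bypasses the environment variable entirely, so the variable is usually the less invasive choice.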
Post-mortem debugging — inspect after a crash without re-running:
import pdb

try:
    problematic_function()
except Exception:
    pdb.post_mortem()  # opens pdb on the exception being handled, at the frame where it was raised
Or from the command line: python -m pdb -c continue script.py — runs the script, drops into pdb on exception.
logging vs print debugging:
Don’t use print for diagnostics in real code. logging wins because:
- You can leave it in (log levels let you filter)
- You can redirect output without code changes
- You get timestamps, module names, and line numbers for free
- Libraries use it too — you can see the whole picture
import logging
log = logging.getLogger(__name__)
log.debug("processing item: %s", item) # not f-string — lazy eval
log.warning("rate limit hit, retrying in %ds", delay)
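The "lazy eval" comment can be made concrete. In this sketch, Expensive is a hypothetical object that counts how often it is rendered to a string; with the logger set to INFO, the %s form never formats the filtered debug message, while the f-string formats it before logging is even called:

```python
import logging


class Expensive:
    # Hypothetical object whose string conversion we want to avoid when possible.
    def __init__(self):
        self.renders = 0

    def __str__(self):
        self.renders += 1
        return "expensive repr"


log = logging.getLogger("demo.lazy")
log.setLevel(logging.INFO)  # DEBUG messages are filtered out

item = Expensive()

log.debug("processing item: %s", item)  # filtered, so never formatted
renders_after_lazy = item.renders

log.debug(f"processing item: {item}")  # f-string is built before the call
renders_after_eager = item.renders

print(renders_after_lazy, renders_after_eager)  # 0 1
```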
python -X dev enables Python Development Mode: it shows warnings that are hidden by default (ResourceWarning, DeprecationWarning), turns on asyncio debug mode, and enables faulthandler. Run your test suite with it occasionally.
faulthandler — essential for debugging hangs and segfaults:
import faulthandler
faulthandler.enable() # dumps traceback on SIGSEGV, SIGFPE, etc.
# Or enable in pytest: -p faulthandler (on by default in modern pytest)
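On a hung process: kill -USR1 <pid> will dump stack traces, but only if the process registered that signal beforehand with faulthandler.register(signal.SIGUSR1). faulthandler.enable() alone covers only fatal signals, and register() is not available on Windows.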
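A minimal sketch of the signal-triggered dump on a POSIX system. Writing to a temporary file is an assumption made here so the output can be inspected; a real service would more likely dump to stderr or a log file:

```python
import faulthandler
import signal
import tempfile

# Dump target for this demo only; sys.stderr is the usual choice.
dump_file = tempfile.TemporaryFile(mode="w+")

# Tie SIGUSR1 to a traceback dump of every thread (POSIX only).
# The process keeps running after the dump.
faulthandler.register(signal.SIGUSR1, file=dump_file, all_threads=True)

# Same effect as running `kill -USR1 <pid>` from a shell.
signal.raise_signal(signal.SIGUSR1)

dump_file.seek(0)
dump = dump_file.read()
print(dump)

# Remove the handler again so the demo leaves no global state behind.
faulthandler.unregister(signal.SIGUSR1)
```

The dump lists each thread's stack, most recent call first, which is usually enough to see where a hung process is stuck.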
ELI5: faulthandler is a black box recorder on an airplane. If the plane crashes (segfault) or goes missing (hang), you still get a recording of the last known state. Without it, you get nothing.
Common mistake: Using # noqa and grep for it before committing.
Property-Based Testing
Unit tests check examples you thought of. Property-based tests generate examples you didn’t.
hypothesis basics:
from hypothesis import given
from hypothesis import strategies as st

@given(st.integers(), st.integers())
def test_addition_commutative(a, b):
    assert a + b == b + a  # must hold for ALL integers

@given(st.text())
def test_encode_decode(s):
    assert s.encode("utf-8").decode("utf-8") == s
Hypothesis generates many examples (100 per test by default), then shrinks any failing case to a minimal example. If add(1000000, -999999) fails, it’ll find that add(1, 0) also fails and report that.
Common strategies:
st.integers(min_value=0, max_value=100)
st.text(alphabet=st.characters(whitelist_categories=("L",)))
st.lists(st.integers(), min_size=1)
st.one_of(st.none(), st.integers())
st.builds(User, name=st.text(), age=st.integers(min_value=0))
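Strategies compose, so st.builds can feed st.lists to generate whole collections of objects. A sketch under stated assumptions: the User dataclass and sort_by_age helper are hypothetical and exist only to give the property something to check.

```python
from dataclasses import dataclass

from hypothesis import given
from hypothesis import strategies as st


@dataclass
class User:
    # Hypothetical model for illustration.
    name: str
    age: int


def sort_by_age(people):
    # Hypothetical function under test.
    return sorted(people, key=lambda u: u.age)


# Compose strategies: build User objects from text and bounded integers.
users = st.builds(
    User,
    name=st.text(min_size=1),
    age=st.integers(min_value=0, max_value=130),
)


@given(st.lists(users))
def test_sort_by_age_properties(people):
    result = sort_by_age(people)
    assert len(result) == len(people)  # nothing lost or added
    assert all(a.age <= b.age for a, b in zip(result, result[1:]))  # ordered
    assert sort_by_age(result) == result  # sorting twice changes nothing
```

Calling test_sort_by_age_properties() directly, or letting pytest collect it, runs the property against the generated lists.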
Stateful testing — test sequences of operations:
from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, rule, invariant

class QueueMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.queue = MyQueue()
        self.model = []  # plain list acts as the reference model

    @rule(item=st.integers())
    def enqueue(self, item):
        self.queue.push(item)
        self.model.append(item)

    @rule()
    def dequeue(self):
        if self.model:
            assert self.queue.pop() == self.model.pop(0)

    @invariant()
    def length_matches(self):
        assert len(self.queue) == len(self.model)

TestQueue = QueueMachine.TestCase  # lets pytest collect and run the state machine
ELI5: Unit tests are like checking if your recipe works with the three ingredients you tried. Hypothesis is like having a kitchen assistant who tries every combination they can find, then when something fails, hands you the simplest possible broken version. You didn’t have to think of the weird edge cases — the computer found them.
Test Architecture Decisions
The testing pyramid: lots of unit tests, fewer integration tests, very few end-to-end tests. This isn’t about religious adherence — it’s economics. Unit tests are cheap to run, fast to diagnose. E2E tests are expensive, slow, and flaky.
Testing trophy (Kent C. Dodds’ variation): integration tests in the middle may be most valuable per dollar — they test real behavior without full system cost.
| Level | Speed | Confidence | Cost | When to write |
|---|---|---|---|---|
| Unit | Fast (ms) | Low-medium | Cheap | Pure functions, complex logic |
| Integration | Medium (sec) | High | Medium | Service boundaries, DB queries |
| E2E | Slow (min) | Very high | Expensive | Critical user paths only |
What to test:
- Business logic, edge cases, error paths
- Public interfaces (not private implementation)
- Things that have broken before
- Anything complex enough to misread
What NOT to test:
- Getter/setter boilerplate
- Third-party libraries (they have their own tests)
- Framework plumbing
- Things that would require mocking everything interesting
Test isolation — each test must be independent. If test B passes only when test A runs first, you have hidden coupling. Use fixtures to set up state fresh each time.
CI/CD strategy: Fast tests first. Run unit tests on every push (seconds). Run integration tests on PR merge (minutes). Run E2E only before release or on schedule. This gives you fast feedback without slow pipelines.
ELI5: The testing pyramid is like a building’s foundations. You need lots of strong bricks at the bottom (unit tests) to hold up fewer, heavier floors (integration, E2E). Try to build the pyramid upside-down and it falls.
Common mistake: Writing tests that test implementation rather than behavior. If renaming an internal variable breaks your tests, your tests are wrong. Tests should survive refactoring.
Summary: Decision Table
| You want to… | Use |
|---|---|
| Test many input/output pairs | @pytest.mark.parametrize |
| Share setup across tests | @pytest.fixture |
| Share setup across test files | conftest.py |
| Replace a dependency | unittest.mock.patch or pytest-mock |
| Ensure mock matches real interface | Mock(spec=MyClass) |
| Test exception is raised | pytest.raises(ExcType, match=r"...") |
| Test edge cases you haven’t thought of | hypothesis + @given |
| Run tests in parallel | pytest-xdist (-n auto) |
| Measure coverage | pytest-cov |
| Debug a crash interactively | breakpoint() or pdb.post_mortem() |
| Debug a hang or segfault | faulthandler.enable() |
| Log instead of print | logging.getLogger(__name__) |
| Test async code | pytest-asyncio + @pytest.mark.asyncio |
| Skip a test conditionally | @pytest.mark.skipif(condition, reason=...) |
| Mark a test as expected to fail | @pytest.mark.xfail(strict=True) |