Iterators, Generators & Functional Patterns

Table of Contents

The Iterator Protocol
Generators — Lazy Iterators Made Easy
Coroutines via Generators (Legacy)
itertools — The Power Tools
Closures and Scope
functools Power Tools
Functional Programming Patterns in Python
Advanced Patterns
Summary: When to Use What

Python’s iteration model is one of its best-designed subsystems. Once you fully understand iterators and generators, you stop writing code that loads everything into memory first and think in terms of lazy pipelines instead. This note covers the mechanics, the gotchas, and the patterns that separate journeyman Python from expert Python.

The Iterator Protocol

Two methods, one contract: __iter__() returns the iterator object itself, __next__() returns the next value or raises StopIteration. That’s the whole protocol.

The distinction between iterable and iterator trips people up constantly:

Concept	Contract	Can re-iterate?	Example
Iterable	has `__iter__()`	Yes	`list`, `str`, `dict`, custom class
Iterator	has `__iter__()` + `__next__()`	No — one pass	`zip`, `map`, file objects, generators
Sequence	`__getitem__` + `__len__`	Yes	`list`, `tuple`, `str`

An iterable is a factory that produces iterators. A list is iterable: every time you write for x in my_list, Python calls iter(my_list) and gets a fresh list_iterator. A generator object is already an iterator: calling iter() on it just returns itself, so consuming it once drains it permanently.

ELI5: An iterable is a book. An iterator is your bookmark. The book can be re-read by many people. The bookmark is yours and moves forward only. When you finish, you need a new bookmark, not a new book.
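As a quick check of the distinction described above, the following sketch compares a list with a generator expression. The variable names are illustrative only.

```python
numbers = [1, 2, 3]

# Each iter() call on a list hands back a new, independent iterator.
first = iter(numbers)
second = iter(numbers)
next(first)                      # advance only the first "bookmark"
first_rest = list(first)         # [2, 3]
second_all = list(second)        # [1, 2, 3]

# A generator is its own iterator, so iter() returns the same object.
gen = (n * 2 for n in numbers)
same_object = iter(gen) is gen   # True
drained = list(gen)              # [2, 4, 6]
empty = list(gen)                # [] because the single pass is used up
```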

# How a for loop actually works
it = iter(some_iterable)          # calls __iter__
while True:
    try:
        value = next(it)          # calls __next__
    except StopIteration:
        break
    # loop body here (outside the try, so a StopIteration
    # raised by the body is not mistaken for exhaustion)

Building a custom iterator. Here’s a fibonacci iterator that shows all the mechanics:

class Fibonacci:
    def __init__(self, limit):
        self.limit = limit
        self.a, self.b = 0, 1
        self.count = 0

    def __iter__(self):
        return self          # iterator returns itself

    def __next__(self):
        if self.count >= self.limit:
            raise StopIteration
        value = self.a
        self.a, self.b = self.b, self.a + self.b
        self.count += 1
        return value

list(Fibonacci(8))  # [0, 1, 1, 2, 3, 5, 8, 13]
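The Fibonacci class above is its own iterator, so one instance supports a single pass. If re-iteration matters, one possible variant (a sketch, with the hypothetical name FibonacciSequence) makes the class an iterable whose __iter__ is a generator method, so every pass gets fresh state:

```python
class FibonacciSequence:
    """Iterable (not iterator): each __iter__ call starts a fresh pass."""

    def __init__(self, limit):
        self.limit = limit

    def __iter__(self):
        # A generator method: calling it returns a new generator each time.
        a, b = 0, 1
        for _ in range(self.limit):
            yield a
            a, b = b, a + b

fibs = FibonacciSequence(8)
first_pass = list(fibs)
second_pass = list(fibs)   # same values again; state lives in each generator
```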

Sentinel pattern. iter(callable, sentinel) is underused. It calls the callable repeatedly until the return value equals the sentinel:

import io
f = io.StringIO("line1\nline2\nSTOP\nline4\n")
for line in iter(f.readline, "STOP\n"):
    print(line, end="")
# prints line1 and line2 only

This also works great for reading fixed-size chunks from binary streams: iter(lambda: f.read(4096), b"").
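A minimal sketch of the chunked-read idiom, using an in-memory io.BytesIO as a stand-in for a real binary file and a chunk size of 4 so the result is easy to see:

```python
import io

# Hypothetical in-memory stream standing in for a real binary file.
stream = io.BytesIO(b"abcdefghij")

# read(4) returns b"" at end of stream, which equals the sentinel and ends the loop.
chunks = list(iter(lambda: stream.read(4), b""))
# chunks == [b"abcd", b"efgh", b"ij"]
```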

Generators — Lazy Iterators Made Easy

A generator function has at least one yield. Calling it doesn’t execute the body — it returns a generator object. The body runs only when you call next() on that object.

ELI5: A generator function is a recipe card that sits on your counter. Nothing happens when you write the recipe. You only cook one step at a time when someone asks “what’s next?” and you pause between steps.

Generator state machine:

CREATED ──next()──► RUNNING ──yield──► SUSPENDED
                       │                    │
                    return/               next()
                    exception               │
                       │                    ▼
                     CLOSED ◄──────── RUNNING

The key: local variables and execution position are preserved between next() calls. That’s what “suspended” means — the stack frame stays alive.

def countdown(n):
    print("Starting")
    while n > 0:
        yield n
        n -= 1
    print("Done")

gen = countdown(3)    # nothing printed yet — CREATED
next(gen)             # prints "Starting", returns 3
next(gen)             # returns 2
next(gen)             # returns 1
next(gen)             # prints "Done", raises StopIteration
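The states in the diagram can be observed directly with inspect.getgeneratorstate from the standard library. The generator below (two_steps, an illustrative name) is a sketch; RUNNING is only visible from inside the generator body, so it does not appear in the list.

```python
import inspect

def two_steps():
    yield "first"
    yield "second"

gen = two_steps()
states = [inspect.getgeneratorstate(gen)]      # before any next()
next(gen)
states.append(inspect.getgeneratorstate(gen))  # paused at the first yield
next(gen)
states.append(inspect.getgeneratorstate(gen))  # paused at the second yield
try:
    next(gen)                                  # body finishes
except StopIteration:
    pass
states.append(inspect.getgeneratorstate(gen))
# states == ["GEN_CREATED", "GEN_SUSPENDED", "GEN_SUSPENDED", "GEN_CLOSED"]
```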

Generator expressions are the lazy equivalent of list comprehensions:

# List comp: builds full list in memory immediately
squares_list = [x**2 for x in range(10_000_000)]  # ~80 MB for the list alone, plus the int objects

# Generator expr: produces one value at a time
squares_gen = (x**2 for x in range(10_000_000))   # ~200 bytes

# Both work in for loops, sum(), max(), etc.
total = sum(x**2 for x in range(10_000_000))  # no intermediate list

yield from delegates iteration to a sub-generator, forwarding values transparently in both directions. It’s the clean way to compose generators:

def flatten(nested):
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)   # recurse without manual loop
        else:
            yield item

list(flatten([1, [2, [3, 4]], 5]))  # [1, 2, 3, 4, 5]

Without yield from you’d need to loop over the sub-generator and re-yield each value manually, which also breaks .send() and .throw() propagation.
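To illustrate the two-way forwarding, here is a small sketch in which the outer generator never touches the sent values itself. The names averager and delegator are made up for this example.

```python
def averager():
    """Sub-generator: receives numbers via send(), returns their mean when sent None."""
    total, count = 0, 0
    while True:
        value = yield
        if value is None:
            break
        total += value
        count += 1
    return total / count if count else 0.0

def delegator(results):
    # yield from forwards send() to averager and captures its return value.
    mean = yield from averager()
    results.append(mean)

results = []
gen = delegator(results)
next(gen)            # prime: runs to the first yield inside averager
gen.send(10)
gen.send(20)
try:
    gen.send(None)   # averager returns; delegator records the mean and finishes
except StopIteration:
    pass
# results == [15.0]
```

The return value of the sub-generator becomes the value of the yield from expression, which a manual re-yield loop cannot provide without extra plumbing.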

Common mistake: forgetting that a generator is one-pass. If you consume it, it’s gone.

gen = (x for x in range(5))
list(gen)   # [0, 1, 2, 3, 4]
list(gen)   # [] — already exhausted

Coroutines via Generators (Legacy)

Before async/await, coroutines were implemented by overloading yield. Knowing this helps you read pre-3.5 code and generator-based async frameworks such as early Tornado or Twisted's inlineCallbacks.

def accumulator():
    total = 0
    while True:
        value = yield total     # yield sends out total, receives new value
        if value is None:
            break
        total += value

gen = accumulator()
next(gen)           # prime the generator (advance to first yield)
gen.send(10)        # sends 10, returns 10
gen.send(20)        # sends 20, returns 30
gen.send(5)         # returns 35
gen.close()         # raises GeneratorExit inside the generator

The coroutine interface:

.send(value) — resumes and injects a value at the yield expression
.throw(exc) — raises an exception at the current yield
.close() — throws GeneratorExit, triggers cleanup
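The accumulator example only exercises .send() and .close(). The sketch below, with a hypothetical worker generator that records what happens in a list, shows where .throw() and .close() land inside the generator body:

```python
def worker(log):
    try:
        while True:
            try:
                item = yield
                log.append(f"got {item}")
            except ValueError as exc:
                # .throw(ValueError(...)) surfaces here, at the paused yield
                log.append(f"recovered from {exc}")
    finally:
        # .close() raises GeneratorExit at the yield; finally still runs
        log.append("cleanup")

log = []
gen = worker(log)
next(gen)                            # prime
gen.send("a")
gen.throw(ValueError("bad input"))   # handled inside; generator keeps running
gen.send("b")
gen.close()
# log == ["got a", "recovered from bad input", "got b", "cleanup"]
```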

Why it was replaced: The mental model is confusing (generators doing double duty), and the two-role design made it hard to compose async code cleanly. PEP 492 (Python 3.5) introduced async def / await as first-class syntax. Today: use generators for iteration, use async def for concurrency.

Era	Syntax	PEP	Use case
Classic generators	`yield`	255 (2001)	Lazy iterators
Enhanced generators	`yield` + `.send()`	342 (2005)	Coroutines (legacy)
`yield from`	`yield from`	380 (2011)	Generator composition
Native coroutines	`async def` / `await`	492 (2015)	Async I/O

itertools — The Power Tools

itertools is C code wrapped in a Python-friendly API. Everything in it is lazy (returns iterators), and the C implementation makes it significantly faster than equivalent Python loops.

ELI5: itertools is like a LEGO set for data pipelines. Each piece does one small thing, but snapped together they handle almost any iteration problem without loading data into memory.

Infinite iterators — use with islice or takewhile, never bare in a for loop:

from itertools import count, cycle, repeat

count(10, 2)       # 10, 12, 14, 16, ...
cycle("ABC")       # A, B, C, A, B, C, ...
repeat(7, 3)       # 7, 7, 7 (stop after 3)

Terminating iterators:

from itertools import chain, islice, takewhile, dropwhile, groupby, zip_longest

list(chain([1,2], [3,4], [5]))               # [1,2,3,4,5]
list(islice(count(), 5))                     # [0,1,2,3,4]
list(takewhile(lambda x: x < 4, [1,2,3,4,5]))  # [1,2,3]
list(dropwhile(lambda x: x < 4, [1,2,3,4,5]))  # [4,5]

groupby gotcha — it groups consecutive equal keys, not all equal keys. If the input isn’t sorted by the key, you get multiple groups for the same key value:

from itertools import groupby

data = [("a", 1), ("a", 2), ("b", 3), ("a", 4)]

# WRONG — unsorted, "a" appears in two separate groups
for key, group in groupby(data, key=lambda x: x[0]):
    print(key, list(group))
# a [('a', 1), ('a', 2)]
# b [('b', 3)]
# a [('a', 4)]   ← separate "a" group!

# RIGHT — sort first
for key, group in groupby(sorted(data, key=lambda x: x[0]), key=lambda x: x[0]):
    print(key, list(group))

Combinatorics:

from itertools import product, permutations, combinations, combinations_with_replacement

list(product("AB", repeat=2))           # AA AB BA BB
list(permutations("ABC", 2))            # AB AC BA BC CA CB
list(combinations("ABC", 2))            # AB AC BC
list(combinations_with_replacement("AB", 2))  # AA AB BB

Useful recipes (from the official docs and more-itertools):

from itertools import pairwise, batched  # Python 3.10+, 3.12+

list(pairwise([1, 2, 3, 4]))   # [(1,2), (2,3), (3,4)]
list(batched([1,2,3,4,5], 2))  # [(1,2), (3,4), (5,)]

For older Python, these are trivial to implement, or just pip install more-itertools.
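As a rough sketch of what such fallbacks could look like, modelled on the recipes in the itertools documentation (the _compat names are invented here, and the walrus operator requires Python 3.8+):

```python
from itertools import islice, tee

def pairwise_compat(iterable):
    """Overlapping pairs: s -> (s0, s1), (s1, s2), ..."""
    a, b = tee(iterable)
    next(b, None)          # advance the second copy by one
    return zip(a, b)

def batched_compat(iterable, n):
    """Tuples of up to n items; the final tuple may be shorter."""
    if n < 1:
        raise ValueError("n must be at least one")
    it = iter(iterable)
    # islice pulls at most n items; an empty tuple means the input is exhausted
    while batch := tuple(islice(it, n)):
        yield batch

pairs_out = list(pairwise_compat([1, 2, 3, 4]))       # [(1, 2), (2, 3), (3, 4)]
batches_out = list(batched_compat([1, 2, 3, 4, 5], 2))  # [(1, 2), (3, 4), (5,)]
```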

Closures and Scope

Python resolves names with the LEGB rule — Local, Enclosing, Global, Built-in — searched in that order. A closure is a function that captures variables from its enclosing scope.

def make_multiplier(n):
    def multiply(x):
        return x * n    # n is a free variable, captured from enclosing scope
    return multiply

double = make_multiplier(2)
double(5)   # 10 — n=2 is "closed over"

The late binding trap. This is one of the most common Python bugs:

# BROKEN — all lambdas share the same variable i
funcs = [lambda: i for i in range(5)]
[f() for f in funcs]   # [4, 4, 4, 4, 4]

# WHY: i is looked up when the lambda is CALLED, not when it's created
# By then, the loop finished and i == 4

# FIX — default argument captures current value at definition time
funcs = [lambda i=i: i for i in range(5)]
[f() for f in funcs]   # [0, 1, 2, 3, 4]

ELI5: The broken version is like writing “bring me whatever is in box i” on a note. By the time you read the note, someone moved box i to contain 4. The fixed version is like writing “bring me the value 3” — it’s captured right now.

nonlocal keyword — required when you want to reassign (not just mutate) a variable in the enclosing scope:

def make_counter():
    count = 0
    def increment():
        nonlocal count    # without this, count += 1 creates a local variable
        count += 1
        return count
    return increment

c = make_counter()
c()   # 1
c()   # 2

Without nonlocal, count += 1 is count = count + 1. Python sees the assignment and treats count as a local, then raises UnboundLocalError because count is read before it is assigned.

Common mistake: using global when you mean nonlocal. global reaches all the way to the module level. nonlocal reaches to the nearest enclosing function scope.

functools Power Tools

from functools import partial, reduce, lru_cache, cache, singledispatch, total_ordering, wraps

partial — freeze some arguments:

from functools import partial

def power(base, exp):
    return base ** exp

square = partial(power, exp=2)
cube = partial(power, exp=3)

square(4)   # 16
cube(3)     # 27

lru_cache / cache — memoize function results. cache (3.9+) is lru_cache(maxsize=None) — unbounded:

from functools import cache

@cache
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

fib(100)   # instant, not O(2^n) recursion

ELI5: lru_cache is a sticky note on the function: “if you’ve asked me this before, here’s the answer from last time.” LRU (Least Recently Used) means it forgets old answers when the cache fills up.
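To see the eviction behaviour, a small sketch with a deliberately tiny cache (maxsize=2) and the cache_info() counters that lru_cache attaches to the wrapped function:

```python
from functools import lru_cache

@lru_cache(maxsize=2)
def square(n):
    return n * n

square(1)   # miss
square(2)   # miss
square(1)   # hit; 1 becomes the most recently used
square(3)   # miss; cache is full, so 2 (least recently used) is evicted
square(2)   # miss again, because 2 was evicted

info = square.cache_info()
# info.hits == 1, info.misses == 4, info.currsize == 2
```

With cache (or maxsize=None) nothing is ever evicted, which is fine for bounded inputs but can grow without limit on long-running processes.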

reduce — fold a sequence into a single value:

from functools import reduce
reduce(lambda acc, x: acc + x, [1, 2, 3, 4])   # 10
reduce(lambda acc, x: acc * x, range(1, 6))     # 120 (5!)

singledispatch — generic functions based on first argument type:

from functools import singledispatch

@singledispatch
def process(data):
    raise TypeError(f"Unsupported: {type(data)}")

@process.register(str)
def _(data):
    return data.upper()

@process.register(list)
def _(data):
    return [process(x) for x in data]

process("hello")        # "HELLO"
process(["a", "b"])     # ["A", "B"]

wraps — preserves function metadata when writing decorators:

from functools import wraps

def debug(func):
    @wraps(func)       # without this, wrapper.__name__ would be "wrapper"
    def wrapper(*args, **kwargs):
        print(f"Calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper

Tool	Use case
`partial`	Freeze args, create specialized versions
`cache` / `lru_cache`	Memoize pure functions, DP problems
`reduce`	Fold sequences to single value
`singledispatch`	Type-based dispatch without class hierarchy
`total_ordering`	Define `__eq__` + one comparison, get the rest free
`wraps`	Decorator boilerplate — always use it
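total_ordering appears in the table but not in the examples above. A minimal sketch, using a hypothetical Version class that defines only __eq__ and __lt__:

```python
from functools import total_ordering

@total_ordering
class Version:
    def __init__(self, major, minor):
        self.major, self.minor = major, minor

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()

# <=, > and >= are filled in by the decorator
newer = Version(1, 3) >= Version(1, 2)      # True
greater = Version(2, 0) > Version(1, 9)     # True
```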

Functional Programming Patterns in Python

Python is not a functional language, but it borrows the useful ideas.

map, filter, zip are all lazy (return iterators in Python 3):

nums = [1, 2, 3, 4, 5]
list(map(lambda x: x**2, nums))       # [1, 4, 9, 16, 25]
list(filter(lambda x: x % 2, nums))   # [1, 3, 5]

In practice, comprehensions are more Pythonic than map/filter for simple cases:

# Prefer these
[x**2 for x in nums]
[x for x in nums if x % 2]

# Use map/filter when
# 1. The function already exists (no lambda needed)
list(map(str, nums))                          # cleaner than [str(x) for x in nums]
list(filter(str.isupper, ["A", "b", "C"]))   # cleaner with method refs

operator module — function equivalents of operators, great as key functions:

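from collections import namedtuple
from operator import itemgetter, attrgetter, methodcaller

# Sample data (illustrative only)
pairs = [("b", 2), ("a", 3), ("c", 1)]
User = namedtuple("User", "name")
objects = [User("zoe"), User("adam")]
strings = ["  x ", "y  "]

# Instead of lambda x: x[1]
sorted(pairs, key=itemgetter(1))

# Instead of lambda x: x.name
sorted(objects, key=attrgetter("name"))

# Instead of lambda x: x.strip()
list(map(methodcaller("strip"), strings))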

ELI5: operator.itemgetter(1) is a function factory. It gives you a function that does x[1]. Same for attrgetter with dot-access. They’re slightly faster than lambdas and read better in sort keys.

Why Python isn’t truly functional:

No tail call optimization — deep recursion still overflows the stack
Everything is mutable by default — you have to discipline yourself to avoid side effects
Statements (like if, for) aren’t expressions — you can’t use them inside lambdas
No built-in algebraic data types, and pattern matching arrived late: match (3.10+) does structural matching, but it is a statement, not an expression

Python borrows map, filter, reduce, closures, higher-order functions — but embraces side effects and mutability where it makes code clearer.

Advanced Patterns

Pipeline pattern — chain generators for streaming data processing:

def read_lines(filename):
    with open(filename) as f:
        yield from f

def grep(pattern, lines):
    return (line for line in lines if pattern in line)

def strip_lines(lines):
    return (line.strip() for line in lines)

def parse_numbers(lines):
    return (float(line) for line in lines)

# Compose the pipeline — nothing runs until you consume
pipeline = parse_numbers(strip_lines(grep("ERROR", read_lines("app.log"))))
total = sum(pipeline)

Each stage is lazy. The file is read one line at a time, filtered, stripped, and parsed without ever building an intermediate list. This pattern handles arbitrarily large files with constant memory.
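The laziness claim can be checked with a small in-memory sketch. The traced helper and the sample lines are invented for illustration; traced records each line at the moment a downstream stage actually pulls it.

```python
def traced(lines, seen):
    for line in lines:
        seen.append(line)      # record each line as it is actually pulled
        yield line

raw = ["ERROR 1.5", "INFO 9", "ERROR 2.5", "ERROR 4.0"]
seen = []
errors = (line for line in traced(raw, seen) if line.startswith("ERROR"))
values = (float(line.split()[1]) for line in errors)

before = list(seen)            # []: building the pipeline pulled nothing
first = next(values)           # pulls just enough input to produce one value
after_first = list(seen)       # ["ERROR 1.5"]
rest = sum(values)             # drains the remainder: 2.5 + 4.0
```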

Two-phase generator (the contextlib.contextmanager pattern):

from contextlib import contextmanager

@contextmanager
def managed_resource():
    resource = acquire()       # setup
    try:
        yield resource         # hand control to the with block
    finally:
        release(resource)      # teardown, always runs

The generator yields exactly once. Everything before yield is __enter__, everything after is __exit__. This is how contextlib.contextmanager works internally: it wraps a generator function, calls next() to enter, and on exit either calls next() again (normal completion) or uses .throw() to raise the with block's exception at the yield.

ELI5: The generator pauses at yield, lets your with block run, then wakes up to clean up. It’s a before-and-after pattern with the user’s code sandwiched in the middle.
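To make the mechanism concrete, here is a heavily simplified stand-in for the wrapper. SimpleGeneratorContext is a made-up name, and the real contextlib implementation handles more edge cases; treat this as a sketch of the idea only.

```python
class SimpleGeneratorContext:
    """Simplified stand-in for what contextlib.contextmanager builds."""

    def __init__(self, gen):
        self.gen = gen

    def __enter__(self):
        return next(self.gen)              # run setup up to the yield

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            try:
                next(self.gen)             # resume after the yield for teardown
            except StopIteration:
                return False
            raise RuntimeError("generator didn't stop")
        try:
            self.gen.throw(exc)            # raise the exception at the yield
        except StopIteration:
            return True                    # generator swallowed the exception
        except BaseException as raised:
            if raised is exc:
                return False               # same exception: let it propagate
            raise
        raise RuntimeError("generator didn't stop after throw()")

events = []

def resource():
    events.append("setup")
    try:
        yield "handle"
    finally:
        events.append("teardown")

with SimpleGeneratorContext(resource()) as handle:
    events.append(f"using {handle}")
# events == ["setup", "using handle", "teardown"]
```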

Infinite sequences with early termination:

from itertools import count, takewhile

def primes():
    """Infinite prime generator using trial division."""
    yield 2
    candidates = count(3, 2)
    primes_found = [2]
    for n in candidates:
        if all(n % p for p in primes_found if p*p <= n):
            primes_found.append(n)
            yield n

list(takewhile(lambda p: p < 50, primes()))   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

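Common mistake: assuming islice or takewhile will stop the generator cleanly. Both simply stop consuming; neither calls .close(), so the generator stays in SUSPENDED state, with any pending finally block unrun, until it is garbage-collected. takewhile also consumes and discards the first element that fails its predicate. Call .close() explicitly when you're done early.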
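One way to make the explicit .close() automatic is contextlib.closing, which calls .close() on exit from the with block. The ticker generator and the log list below are illustrative names for this sketch.

```python
from contextlib import closing
from itertools import islice

log = []

def ticker():
    try:
        n = 0
        while True:
            yield n
            n += 1
    finally:
        log.append("closed")     # runs when .close() is called

with closing(ticker()) as gen:
    first_three = list(islice(gen, 3))
    log.append("still open")     # islice stopped consuming but did not close gen
# first_three == [0, 1, 2]
# log == ["still open", "closed"]
```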

more-itertools fills the gaps that the stdlib doesn’t cover:

# pip install more-itertools
from more_itertools import chunked, windowed, interleave, partition

list(chunked([1,2,3,4,5], 2))           # [[1,2], [3,4], [5]]
list(windowed([1,2,3,4,5], 3))          # [(1,2,3), (2,3,4), (3,4,5)]
evens, odds = partition(lambda x: x%2, range(10))

Summary: When to Use What

Situation	Reach for
Consume a collection once	`for` loop or generator expression
Consume multiple times	`list`, `tuple` — materialize it
Infinite or very large sequence	Generator function
Compose multiple transformation stages	Generator pipeline
Standard iteration combinator	`itertools`
Cache expensive function calls	`functools.cache` / `lru_cache`
Freeze function arguments	`functools.partial`
Type-based dispatch	`functools.singledispatch`
Before/after resource management	`@contextmanager` generator
Sort/group by field or index	`operator.attrgetter` / `itemgetter`
Need `itertools` but it’s missing something	`more-itertools`

The underlying principle: prefer lazy over eager unless you need random access or multiple passes. Generator pipelines are not just memory-efficient — they’re the natural way to express data transformations where each step is independent.
