Iterators, Generators & Functional Patterns
Python’s iteration model is one of its best-designed subsystems. Once you fully understand iterators and generators, you stop writing code that loads everything into memory first and start thinking in terms of lazy pipelines instead. This note covers the mechanics, the gotchas, and the patterns that separate journeyman Python from expert Python.
The Iterator Protocol
Two methods, one contract: __iter__() returns the iterator object itself, __next__() returns the next value or raises StopIteration. That’s the whole protocol.
The distinction between iterable and iterator trips people up constantly:
| Concept | Contract | Can re-iterate? | Example |
|---|---|---|---|
| Iterable | has __iter__() | Yes | list, str, dict, custom class |
| Iterator | has __iter__() + __next__() | No — one pass | zip, map, file objects, generators |
| Sequence | __getitem__ + __len__ | Yes | list, tuple, str |
An iterable is a factory that produces iterators. A list is iterable: every time you write for x in my_list, Python calls iter(my_list) and gets a fresh list_iterator. A generator object is already an iterator: calling iter() on it just returns itself, so consuming it once drains it permanently.
ELI5: An iterable is a book. An iterator is your bookmark. The book can be re-read by many people. The bookmark is yours and moves forward only. When you finish, you need a new bookmark, not a new book.
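You can see the difference in a REPL. A list hands out a fresh iterator on every iter() call. A generator hands back itself, so a second pass finds it empty. The sample values are arbitrary:

```python
nums = [1, 2, 3]

# A list is an iterable: each iter() call creates a new, independent iterator
it1, it2 = iter(nums), iter(nums)
print(it1 is it2)            # False
print(next(it1), next(it1))  # 1 2
print(next(it2))             # 1  (it2 keeps its own position)

# A generator is an iterator: iter() returns the very same object
gen = (n * 10 for n in nums)
print(iter(gen) is gen)      # True
print(list(gen))             # [10, 20, 30]
print(list(gen))             # []  (already drained)
```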
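```python
# How a for loop actually works
some_iterable = ["a", "b", "c"]   # any iterable

it = iter(some_iterable)          # calls some_iterable.__iter__()
while True:
    try:
        value = next(it)          # calls it.__next__()
    except StopIteration:
        break                     # iterator exhausted: leave the loop
    print(value)                  # loop body goes here, outside the try
```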
Building a custom iterator. Here’s a Fibonacci iterator that shows all the mechanics:
```python
class Fibonacci:
    def __init__(self, limit):
        self.limit = limit
        self.a, self.b = 0, 1
        self.count = 0

    def __iter__(self):
        return self  # an iterator returns itself

    def __next__(self):
        if self.count >= self.limit:
            raise StopIteration
        value = self.a
        self.a, self.b = self.b, self.a + self.b
        self.count += 1
        return value

list(Fibonacci(8))  # [0, 1, 1, 2, 3, 5, 8, 13]
```
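Fibonacci.__iter__ returns self, so each instance is a one-pass iterator. A second list() call on the same object gives []. To make something re-iterable, split the roles. The container becomes an iterable whose __iter__ is a generator function, so every loop starts with fresh state. Here is a sketch (FibonacciRange is a hypothetical name):

```python
class FibonacciRange:
    """Re-iterable: each __iter__ call starts a brand-new generator."""

    def __init__(self, limit):
        self.limit = limit

    def __iter__(self):
        a, b = 0, 1                  # state lives in the generator, not on self
        for _ in range(self.limit):
            yield a
            a, b = b, a + b

fibs = FibonacciRange(8)
print(list(fibs))   # [0, 1, 1, 2, 3, 5, 8, 13]
print(list(fibs))   # the same list again: a second pass works
```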
Sentinel pattern. iter(callable, sentinel) is underused. It calls the callable repeatedly until the return value equals the sentinel:
```python
import io

f = io.StringIO("line1\nline2\nSTOP\nline4\n")
for line in iter(f.readline, "STOP\n"):
    print(line, end="")
# prints line1 and line2 only
```
This also works great for reading fixed-size chunks from binary streams: iter(lambda: f.read(4096), b"").
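Here is the chunked-read idiom on an in-memory binary stream. io.BytesIO stands in for a real file opened with "rb", and the 4-byte chunk size is tiny only so the output stays readable. functools.partial(f.read, 4) replaces the lambda; the two are interchangeable:

```python
import io
from functools import partial

f = io.BytesIO(b"abcdefghij")  # in-memory stand-in for open(path, "rb")

# f.read(4) returns b"" at end of stream, which equals the sentinel and stops iteration
chunks = list(iter(partial(f.read, 4), b""))
print(chunks)  # [b'abcd', b'efgh', b'ij']
```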
Generators — Lazy Iterators Made Easy
A generator function has at least one yield. Calling it doesn’t execute the body — it returns a generator object. The body runs only when you call next() on that object.
ELI5: A generator function is a recipe card that sits on your counter. Nothing happens when you write the recipe. You only cook one step at a time when someone asks “what’s next?” and you pause between steps.
Generator state machine:
```
CREATED ──next()──► RUNNING ──yield──► SUSPENDED
                     │  ▲                  │
                     │  └──────next()──────┘
    return/exception │
                     ▼
                   CLOSED
```
The key: local variables and execution position are preserved between next() calls. That’s what “suspended” means — the stack frame stays alive.
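You can watch these transitions with inspect.getgeneratorstate(), which reports the states as GEN_CREATED, GEN_RUNNING, GEN_SUSPENDED and GEN_CLOSED. In this sketch, the function two_steps is purely illustrative. It peeks at its own state through the module-level name g:

```python
import inspect

def two_steps():
    # g is the module-level name assigned below; while this body executes,
    # the generator is by definition in the RUNNING state
    print(inspect.getgeneratorstate(g))  # GEN_RUNNING
    yield 1
    yield 2

g = two_steps()
print(inspect.getgeneratorstate(g))  # GEN_CREATED
next(g)                              # prints GEN_RUNNING, returns 1
print(inspect.getgeneratorstate(g))  # GEN_SUSPENDED
g.close()                            # raises GeneratorExit at the paused yield
print(inspect.getgeneratorstate(g))  # GEN_CLOSED
```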
```python
def countdown(n):
    print("Starting")
    while n > 0:
        yield n
        n -= 1
    print("Done")

gen = countdown(3)  # nothing printed yet: CREATED
next(gen)           # prints "Starting", returns 3
next(gen)           # returns 2
next(gen)           # returns 1
next(gen)           # prints "Done", raises StopIteration
```
Generator expressions are the lazy equivalent of list comprehensions:
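```python
import sys

# List comp: builds the full list in memory immediately
squares_list = [x**2 for x in range(10_000_000)]
sys.getsizeof(squares_list)  # ~80 MB or more for the pointer array alone; the int objects add far more

# Generator expr: produces one value at a time
squares_gen = (x**2 for x in range(10_000_000))
sys.getsizeof(squares_gen)   # a few hundred bytes at most, independent of the range size

# Both work in for loops, sum(), max(), etc.
total = sum(x**2 for x in range(10_000_000))  # no intermediate list
```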
yield from delegates iteration to a sub-generator, forwarding values transparently in both directions. It’s the clean way to compose generators:
```python
def flatten(nested):
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)  # recurse without a manual loop
        else:
            yield item

list(flatten([1, [2, [3, 4]], 5]))  # [1, 2, 3, 4, 5]
```
Without yield from you’d need to loop over the sub-generator and re-yield each value manually, which also breaks .send() and .throw() propagation.
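Here is a minimal sketch of the two extras that yield from provides:

* Values sent into the outer generator reach the inner one.
* The inner generator's return value becomes the value of the yield from expression.

The names averager and wrapper are illustrative. Sending None as a stop signal is a convention chosen for this sketch:

```python
def averager():
    """Inner coroutine: receives numbers via send(), returns their mean."""
    total, count = 0.0, 0
    while True:
        value = yield
        if value is None:          # None is this sketch's "stop" signal
            return total / count if count else 0.0
        total += value
        count += 1

def wrapper(results):
    # yield from forwards send() to averager and captures its return value
    mean = yield from averager()
    results.append(mean)

results = []
w = wrapper(results)
next(w)              # prime: runs until averager's first bare yield
for x in (10, 20, 30):
    w.send(x)        # goes straight through to averager
try:
    w.send(None)     # averager returns, wrapper appends, then wrapper finishes
except StopIteration:
    pass
print(results)       # [20.0]
```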
Common mistake: forgetting that a generator is one-pass. If you consume it, it’s gone.
```python
gen = (x for x in range(5))
list(gen)  # [0, 1, 2, 3, 4]
list(gen)  # []: already exhausted
```
Coroutines via Generators (Legacy)
Before async/await, coroutines were implemented by overloading yield. Knowing this helps you read pre-3.5 code. Examples include generator-based asyncio coroutines (@asyncio.coroutine with yield from) and early Tornado (tornado.gen.coroutine).
```python
def accumulator():
    total = 0
    while True:
        value = yield total  # yield sends out total, receives the next value
        if value is None:
            break
        total += value

gen = accumulator()
next(gen)      # prime the generator (advance to the first yield), returns 0
gen.send(10)   # sends 10, returns 10
gen.send(20)   # sends 20, returns 30
gen.send(5)    # returns 35
gen.close()    # raises GeneratorExit inside the generator
```
The coroutine interface:
- .send(value): resumes and injects a value at the yield expression
- .throw(exc): raises an exception at the current yield
- .close(): throws GeneratorExit, triggers cleanup
Why it was replaced: The mental model is confusing (generators doing double duty), and the two-role design made it hard to compose async code cleanly. PEP 492 (Python 3.5) introduced async def / await as first-class syntax. Today: use generators for iteration, use async def for concurrency.
| Era | Syntax | PEP | Use case |
|---|---|---|---|
| Classic generators | yield | 255 (2001) | Lazy iterators |
| Enhanced generators | yield + .send() | 342 (2005) | Coroutines (legacy) |
| yield from | yield from | 380 (2011) | Generator composition |
| Native coroutines | async def / await | 492 (2015) | Async I/O |
itertools — The Power Tools
itertools is C code wrapped in a Python-friendly API. Everything in it is lazy (returns iterators), and the C implementation makes it significantly faster than equivalent Python loops.
ELI5: itertools is like a LEGO set for data pipelines. Each piece does one small thing, but snapped together they handle almost any iteration problem without loading data into memory.
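Infinite iterators never stop on their own. Bound them with islice or takewhile (or an explicit break), and never pass one straight to list() or to a for loop that has no way out: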
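```python
from itertools import count, cycle, islice, repeat

list(islice(count(10, 2), 4))  # [10, 12, 14, 16]
list(islice(cycle("ABC"), 6))  # ['A', 'B', 'C', 'A', 'B', 'C']
list(repeat(7, 3))             # [7, 7, 7]  (finite because times=3; repeat(7) alone is infinite)
```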
Terminating iterators:
```python
from itertools import chain, count, islice, takewhile, dropwhile, groupby, zip_longest

list(chain([1, 2], [3, 4], [5]))                    # [1, 2, 3, 4, 5]
list(islice(count(), 5))                            # [0, 1, 2, 3, 4]
list(takewhile(lambda x: x < 4, [1, 2, 3, 4, 5]))   # [1, 2, 3]
list(dropwhile(lambda x: x < 4, [1, 2, 3, 4, 5]))   # [4, 5]
list(zip_longest([1, 2, 3], "ab", fillvalue="-"))   # [(1, 'a'), (2, 'b'), (3, '-')]
```
groupby gotcha — it groups consecutive equal keys, not all equal keys. If the input isn’t sorted by the key, you get multiple groups for the same key value:
```python
from itertools import groupby

data = [("a", 1), ("a", 2), ("b", 3), ("a", 4)]

# WRONG: unsorted, so "a" appears in two separate groups
for key, group in groupby(data, key=lambda x: x[0]):
    print(key, list(group))
# a [('a', 1), ('a', 2)]
# b [('b', 3)]
# a [('a', 4)]  ← separate "a" group!

# RIGHT: sort by the same key first
for key, group in groupby(sorted(data, key=lambda x: x[0]), key=lambda x: x[0]):
    print(key, list(group))
# a [('a', 1), ('a', 2), ('a', 4)]
# b [('b', 3)]
```
Combinatorics:
```python
from itertools import product, permutations, combinations, combinations_with_replacement

# Each result is a tuple such as ('A', 'B'); shown joined below for brevity
list(product("AB", repeat=2))                 # AA AB BA BB
list(permutations("ABC", 2))                  # AB AC BA BC CA CB
list(combinations("ABC", 2))                  # AB AC BC
list(combinations_with_replacement("AB", 2))  # AA AB BB
```
Two former recipes from the official docs and more-itertools that are now built into itertools itself:
```python
from itertools import pairwise, batched  # pairwise: Python 3.10+, batched: 3.12+

list(pairwise([1, 2, 3, 4]))        # [(1, 2), (2, 3), (3, 4)]
list(batched([1, 2, 3, 4, 5], 2))   # [(1, 2), (3, 4), (5,)]
```
For older Python, these are trivial to implement, or just pip install more-itertools.
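Here are rough equivalents modeled on the itertools documentation recipes. They skip newer options such as the strict flag of batched, and they need Python 3.8+ for the := operator. The names pairwise_compat and batched_compat are hypothetical, chosen to avoid shadowing the stdlib versions:

```python
from itertools import islice, tee

def pairwise_compat(iterable):
    """(s0, s1), (s1, s2), ... like itertools.pairwise (3.10+)."""
    a, b = tee(iterable)
    next(b, None)            # advance the second copy by one element
    return zip(a, b)

def batched_compat(iterable, n):
    """Tuples of length n (the last may be shorter), like itertools.batched (3.12+)."""
    it = iter(iterable)
    while batch := tuple(islice(it, n)):  # empty tuple at exhaustion ends the loop
        yield batch

print(list(pairwise_compat([1, 2, 3, 4])))      # [(1, 2), (2, 3), (3, 4)]
print(list(batched_compat([1, 2, 3, 4, 5], 2))) # [(1, 2), (3, 4), (5,)]
```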
Closures and Scope
Python resolves names with the LEGB rule — Local, Enclosing, Global, Built-in — searched in that order. A closure is a function that captures variables from its enclosing scope.
```python
def make_multiplier(n):
    def multiply(x):
        return x * n  # n is a free variable, captured from the enclosing scope
    return multiply

double = make_multiplier(2)
double(5)  # 10: n=2 is "closed over"
```
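Here is what “closed over” means concretely. The captured value lives in a cell object attached to the inner function, and you can see it through __code__.co_freevars and __closure__. This is handy for debugging and rarely needed otherwise. make_multiplier is repeated so the snippet runs on its own:

```python
def make_multiplier(n):
    def multiply(x):
        return x * n
    return multiply

double = make_multiplier(2)
print(double.__code__.co_freevars)          # ('n',): names captured from the enclosing scope
print(double.__closure__[0].cell_contents)  # 2: the value stored in that cell
```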
The late binding trap. This is one of the most common Python bugs:
```python
# BROKEN: all lambdas share the same variable i
funcs = [lambda: i for i in range(5)]
[f() for f in funcs]  # [4, 4, 4, 4, 4]
# WHY: i is looked up when the lambda is CALLED, not when it's created.
# By then the loop has finished and i == 4.

# FIX: a default argument captures the current value at definition time
funcs = [lambda i=i: i for i in range(5)]
[f() for f in funcs]  # [0, 1, 2, 3, 4]
```
ELI5: The broken version is like writing “bring me whatever is in box i” on a note. By the time you read the note, someone moved box i to contain 4. The fixed version is like writing “bring me the value 3” — it’s captured right now.
nonlocal keyword — required when you want to reassign (not just mutate) a variable in the enclosing scope:
```python
def make_counter():
    count = 0
    def increment():
        nonlocal count  # without this, count += 1 treats count as a new local variable
        count += 1
        return count
    return increment

c = make_counter()
c()  # 1
c()  # 2
```
Without nonlocal, count += 1 means count = count + 1. Python sees the assignment and treats count as local to increment. The read on the right-hand side then fails with UnboundLocalError, because that local has no value yet.
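Here is the counter with the nonlocal line removed (broken_counter is an illustrative name). Defining it works; the error only appears on the first call. The error message differs between Python versions, so the sketch checks only the exception type:

```python
def broken_counter():
    count = 0
    def increment():
        count += 1          # the assignment makes count local to increment
        return count
    return increment

c = broken_counter()        # defining and returning increment is fine
error = None
try:
    c()                     # fails here, on the first call
except UnboundLocalError as exc:
    error = exc
print(type(error).__name__)  # UnboundLocalError
```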
Common mistake: using global when you mean nonlocal. global reaches all the way to the module level; nonlocal reaches the nearest enclosing function scope.
functools Power Tools
```python
from functools import partial, reduce, lru_cache, cache, singledispatch, total_ordering, wraps
```
partial — freeze some arguments:
```python
from functools import partial

def power(base, exp):
    return base ** exp

square = partial(power, exp=2)
cube = partial(power, exp=3)
square(4)  # 16
cube(3)    # 27
```
lru_cache / cache — memoize function results. cache (3.9+) is lru_cache(maxsize=None) — unbounded:
```python
from functools import cache

@cache
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

fib(100)  # instant: each n is computed once, instead of O(2^n) recursion
```
ELI5: lru_cache is a sticky note on the function: “if you’ve asked me this before, here’s the answer from last time.” LRU (Least Recently Used) means that once the cache is full, it throws away the answer that was asked for least recently.
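An explicit maxsize gives you the bounded, evicting behavior, and cache_info() reports hits, misses and current size. Here is a sketch with a deliberately tiny cache, where square is a stand-in for an expensive pure function:

```python
from functools import lru_cache

@lru_cache(maxsize=2)
def square(n):
    return n * n

square(1)                 # miss
square(2)                 # miss; cache now holds 1 and 2
square(1)                 # hit; 1 becomes the most recently used entry
square(3)                 # miss; evicts 2, the least recently used
square(2)                 # miss again, because 2 was evicted
print(square.cache_info())
# CacheInfo(hits=1, misses=4, maxsize=2, currsize=2)
```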
reduce — fold a sequence into a single value:
```python
from functools import reduce

reduce(lambda acc, x: acc + x, [1, 2, 3, 4])  # 10
reduce(lambda acc, x: acc * x, range(1, 6))   # 120 (5!)
```
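reduce also accepts an optional third argument, the initializer, which becomes the starting accumulator. It also makes reduce safe on empty input, which otherwise raises TypeError. This sketch uses operator.add in place of the lambda:

```python
import operator
from functools import reduce

print(reduce(operator.add, [1, 2, 3, 4], 0))  # 10
print(reduce(operator.add, [], 0))            # 0: the initializer is returned for empty input

try:
    reduce(operator.add, [])                  # no initializer and nothing to fold
except TypeError as exc:
    print("TypeError:", exc)

# The accumulator can have a different type than the elements
print(reduce(lambda acc, word: acc + len(word), ["ab", "cde"], 0))  # 5
```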
singledispatch — generic functions based on first argument type:
```python
from functools import singledispatch

@singledispatch
def process(data):
    raise TypeError(f"Unsupported: {type(data)}")

@process.register(str)
def _(data):
    return data.upper()

@process.register(list)
def _(data):
    return [process(x) for x in data]

process("hello")      # "HELLO"
process(["a", "b"])   # ["A", "B"]
```
wraps — preserves function metadata when writing decorators:
```python
from functools import wraps

def debug(func):
    @wraps(func)  # without this, wrapper.__name__ would be "wrapper"
    def wrapper(*args, **kwargs):
        print(f"Calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper
```
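Applying the decorator shows what wraps buys you. The wrapper reports the original name and docstring, and it exposes the undecorated function as __wrapped__. In this usage sketch, add is an illustrative function, and debug is repeated so the snippet runs on its own:

```python
from functools import wraps

def debug(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        print(f"Calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper

@debug
def add(a, b):
    """Return a + b."""
    return a + b

add(2, 3)                     # prints "Calling add", returns 5
print(add.__name__)           # add  (would be "wrapper" without @wraps)
print(add.__doc__)            # Return a + b.
print(add.__wrapped__(2, 3))  # 5: calls the original, skipping the print
```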
| Tool | Use case |
|---|---|
| partial | Freeze args, create specialized versions |
| cache / lru_cache | Memoize pure functions, DP problems |
| reduce | Fold a sequence to a single value |
| singledispatch | Type-based dispatch without a class hierarchy |
| total_ordering | Define __eq__ + one comparison, get the rest free |
| wraps | Decorator boilerplate: always use it |
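The table lists total_ordering without showing it in action. In this minimal sketch with a hypothetical Version class, you define __eq__ and __lt__, and the decorator supplies __le__, __gt__ and __ge__. Returning NotImplemented for foreign types lets Python fall back to its normal comparison rules:

```python
from functools import total_ordering

@total_ordering
class Version:
    def __init__(self, major, minor):
        self.major, self.minor = major, minor

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()

print(Version(1, 2) <= Version(1, 10))          # True: __le__ generated from __lt__ and __eq__
print(Version(2, 0) > Version(1, 9))            # True: __gt__ generated
print(max(Version(1, 0), Version(0, 9)).major)  # 1
```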
Functional Programming Patterns in Python
Python is not a functional language, but it borrows the useful ideas.
map, filter, zip are all lazy (return iterators in Python 3):
```python
nums = [1, 2, 3, 4, 5]
list(map(lambda x: x**2, nums))       # [1, 4, 9, 16, 25]
list(filter(lambda x: x % 2, nums))   # [1, 3, 5]
```
In practice, comprehensions are more Pythonic than map/filter for simple cases:
```python
# Prefer these
[x**2 for x in nums]
[x for x in nums if x % 2]

# Use map/filter when the function already exists (no lambda needed)
list(map(str, nums))                         # cleaner than [str(x) for x in nums]
list(filter(str.isupper, ["A", "b", "C"]))   # ['A', 'C']: method references read cleanly
```
operator module — function equivalents of operators, great as key functions:
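```python
from collections import namedtuple
from operator import itemgetter, attrgetter, methodcaller

# Sample data for illustration
pairs = [("b", 2), ("a", 1), ("c", 3)]
Item = namedtuple("Item", "name price")
objects = [Item("pear", 3), Item("apple", 5)]
strings = ["  x ", " y"]

sorted(pairs, key=itemgetter(1))            # instead of lambda x: x[1]
sorted(objects, key=attrgetter("name"))     # instead of lambda x: x.name
list(map(methodcaller("strip"), strings))   # instead of lambda x: x.strip(); ['x', 'y']
```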
ELI5: operator.itemgetter(1) is a function factory. It gives you a function that does x[1]. attrgetter does the same with dot-access. They’re slightly faster than lambdas and read better in sort keys.
Why Python isn’t truly functional:
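- No tail call optimization: deep recursion hits the interpreter's recursion limit and raises RecursionError (sys.getrecursionlimit() defaults to 1000)
- The core containers (list, dict, set) are mutable, and nothing stops a function from mutating its arguments, so avoiding side effects takes discipline
- Statements (like if, for) aren’t expressions, so you can’t use them inside lambdas
- No algebraic data types and no exhaustive pattern matching (match exists in 3.10+, but it is structural, and nothing checks that every case is covered)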
Python borrows map, filter, reduce, closures, higher-order functions — but embraces side effects and mutability where it makes code clearer.
Advanced Patterns
Pipeline pattern — chain generators for streaming data processing:
```python
def read_lines(filename):
    with open(filename) as f:
        yield from f

def grep(pattern, lines):
    return (line for line in lines if pattern in line)

def strip_lines(lines):
    return (line.strip() for line in lines)

def parse_numbers(lines):
    # Assumes each matching line ends with a number, e.g. "ERROR timeout 3.5";
    # float() on the whole line would fail because it still contains "ERROR".
    return (float(line.rsplit(maxsplit=1)[-1]) for line in lines)

# Compose the pipeline: nothing runs until you consume it
pipeline = parse_numbers(strip_lines(grep("ERROR", read_lines("app.log"))))
total = sum(pipeline)
```
Each stage is lazy. The file is read one line at a time, filtered, stripped, and parsed without ever building an intermediate list. This pattern handles arbitrarily large files with constant memory.
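To see that interleaving, record each step in a list. The stage names and the in-memory list standing in for the log file are illustrative. Every value flows through the whole chain before the next one is read:

```python
log = []

def source(items):
    for item in items:
        log.append(f"read {item}")
        yield item

def double(values):
    for v in values:
        log.append(f"double {v}")
        yield v * 2

pipeline = double(source([1, 2, 3]))
print(log)             # []: composing the stages runs nothing
total = sum(pipeline)  # 12
print(log)
# ['read 1', 'double 1', 'read 2', 'double 2', 'read 3', 'double 3']
```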
Two-phase generator (the contextlib.contextmanager pattern):
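```python
from contextlib import contextmanager

def acquire():               # hypothetical setup step (open a file, take a lock, ...)
    return {"open": True}

def release(resource):       # hypothetical matching cleanup step
    resource["open"] = False

@contextmanager
def managed_resource():
    resource = acquire()     # setup
    try:
        yield resource       # hand control to the with block
    finally:
        release(resource)    # teardown, always runs
```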
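The generator yields exactly once. Everything before yield runs on __enter__, and everything after it runs on __exit__. This is how contextlib.contextmanager works internally. It wraps the generator function and calls next() to enter. On a normal exit it calls next() again. If the with block raised, it uses .throw() to re-raise that exception at the yield, which is why the finally clause runs in both cases.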
ELI5: The generator pauses at yield, lets your with block run, then wakes up to clean up. It’s a before-and-after pattern with the user’s code sandwiched in the middle.
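To check the “teardown always runs” behavior, here is a self-contained variant where the with body raises. The events list and the ValueError are illustrative. The finally clause runs, and the exception still reaches the caller because the generator doesn't suppress it:

```python
from contextlib import contextmanager

events = []

@contextmanager
def tracked():
    events.append("setup")
    try:
        yield "resource"
    finally:
        events.append("teardown")   # runs on normal exit and on exceptions

try:
    with tracked() as r:
        events.append(f"using {r}")
        raise ValueError("boom")
except ValueError:
    events.append("caught outside")

print(events)
# ['setup', 'using resource', 'teardown', 'caught outside']
```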
Infinite sequences with early termination:
```python
from itertools import count, takewhile

def primes():
    """Infinite prime generator using trial division."""
    yield 2
    candidates = count(3, 2)
    primes_found = [2]
    for n in candidates:
        if all(n % p for p in primes_found if p * p <= n):
            primes_found.append(n)
            yield n

list(takewhile(lambda p: p < 50, primes()))
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
```
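Common mistake: using islice and assuming it closes the generator. It stops consuming, but it doesn’t call .close(), and neither does takewhile. The generator stays SUSPENDED, holding any open file or lock in its frame, until it is closed or garbage-collected. When a generator holds a resource and you stop early, call .close() explicitly or wrap the generator in contextlib.closing().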
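Here is a sketch of the explicit-close approach. contextlib.closing() calls .close() on exit, which raises GeneratorExit at the paused yield. The generator's finally block therefore runs immediately instead of whenever the object is collected. read_records is a hypothetical stand-in for a generator that holds a resource:

```python
import inspect
from contextlib import closing
from itertools import islice

cleaned_up = []

def read_records():
    try:
        n = 0
        while True:
            yield n
            n += 1
    finally:
        cleaned_up.append(True)   # e.g. close a file or release a lock

gen = read_records()
print(list(islice(gen, 3)))            # [0, 1, 2]
print(inspect.getgeneratorstate(gen))  # GEN_SUSPENDED: islice stopped pulling but did not close it
print(cleaned_up)                      # []: the finally block has not run yet

with closing(read_records()) as records:
    print(list(islice(records, 3)))    # [0, 1, 2]
print(inspect.getgeneratorstate(records))  # GEN_CLOSED
print(cleaned_up)                          # [True]: closing() triggered the finally block
```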
more-itertools fills the gaps that the stdlib doesn’t cover:
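```python
# pip install more-itertools
from more_itertools import chunked, windowed, interleave, partition

list(chunked([1, 2, 3, 4, 5], 2))      # [[1, 2], [3, 4], [5]]
list(windowed([1, 2, 3, 4, 5], 3))     # [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
list(interleave([1, 2, 3], "abc"))     # [1, 'a', 2, 'b', 3, 'c']
evens, odds = partition(lambda x: x % 2, range(10))  # items with a falsy result come first
list(evens), list(odds)                # ([0, 2, 4, 6, 8], [1, 3, 5, 7, 9])
```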
Summary: When to Use What
| Situation | Reach for |
|---|---|
| Consume a collection once | for loop or generator expression |
| Consume multiple times | list, tuple — materialize it |
| Infinite or very large sequence | Generator function |
| Compose multiple transformation stages | Generator pipeline |
| Standard iteration combinator | itertools |
| Cache expensive function calls | functools.cache / lru_cache |
| Freeze function arguments | functools.partial |
| Type-based dispatch | functools.singledispatch |
| Before/after resource management | @contextmanager generator |
| Sort/group by field or index | operator.attrgetter / itemgetter |
| Need an iterator tool that itertools lacks | more-itertools |
The underlying principle: prefer lazy over eager unless you need random access or multiple passes. Generator pipelines are not just memory-efficient — they’re the natural way to express data transformations where each step is independent.