Object Model & Data Model
Python’s data model is the contract between your code and the interpreter. Master it and everything clicks — descriptors, super(), property, @classmethod, slots, all of it. Skip it and you cargo-cult patterns you don’t understand.
Everything Is an Object
In Python, “everything is an object” isn’t marketing copy — it’s a precise statement. Functions, classes, modules, None, integers, types themselves — all are instances of some class, all have an identity, a type, and a value.
>>> type(int) # int is an instance of type
<class 'type'>
>>> type(type) # type is an instance of itself
<class 'type'>
>>> isinstance(int, object) # and, like every value, an instance of object
True
The four identity and equality tools you must know:
| Tool | What it gives you | When to use |
|---|---|---|
| id(x) | Memory address (CPython) | Debugging identity |
| type(x) | The exact class, no inheritance | Type dispatch |
| x is y | Same object in memory | None checks, singletons |
| x == y | Calls __eq__, may be overridden | Value equality |
is and == are not interchangeable. Use is only for None, True, False, and sentinel objects. For everything else, use ==.
CPython interning: the trap
CPython caches small integers (-5 to 256) and many short strings as a performance optimization. This creates a subtle trap:
a = 256; b = 256
a is b # True: both names point at the same cached object
a = int("257"); b = int("257") # built at runtime, so the compiler cannot merge the two constants
a is b # False on CPython: two separate objects
ELI5: Integers -5 to 256 are like pre-printed forms in a government office: everyone gets the same sheet because they’re so common. Bigger numbers are printed fresh each time. is checks if you got the same sheet of paper; == checks if the paper says the same thing.
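Common mistake: Testing if x is 1 or if x is "hello". This works by coincidence whenever CPython happens to reuse a cached small integer or an interned string literal, and it fails for equal values built at runtime. CPython 3.8+ also emits a SyntaxWarning for is with a literal.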
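A minimal sketch of where is belongs and where it does not. The _MISSING sentinel and the lookup helper are hypothetical names made up for illustration; they are not part of any library.

```python
# Sentinel: a unique object that no caller can pass in by accident.
_MISSING = object()  # hypothetical name


def lookup(mapping, key, default=_MISSING):
    """Return mapping[key]; raise KeyError unless a default was supplied."""
    value = mapping.get(key, _MISSING)
    if value is _MISSING:  # identity check: the right tool for sentinels
        if default is _MISSING:
            raise KeyError(key)
        return default
    return value


config = {"debug": None}
found_none = lookup(config, "debug")         # a stored None is a real value
fallback = lookup(config, "verbose", False)  # key absent, default used

# Two equal lists are still two different objects.
first, second = [1, 2], [1, 2]
equal_values = first == second  # True
same_object = first is second   # False
```

The sentinel is needed here because None is a legitimate stored value, so None cannot double as the "nothing was passed" marker.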
The Attribute Lookup Chain
When you write obj.attr, Python doesn’t just look in one place. It runs a multi-step algorithm that most people don’t know:
1. type(obj).__mro__ → search for a DATA DESCRIPTOR in the class hierarchy
2. obj.__dict__ → instance dictionary
3. type(obj).__mro__ → search for non-data descriptor or plain class attribute
4. type(obj).__getattr__(obj, 'attr') → fallback if defined
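The four steps above can be observed directly. This is a small sketch with made-up class names (DataDesc, NonDataDesc, Demo); it forces an entry into the instance dict for each name and checks which source wins.

```python
class DataDesc:
    """Defines __get__ and __set__, so it is a data descriptor."""

    def __get__(self, obj, objtype=None):
        return "data descriptor"

    def __set__(self, obj, value):
        raise AttributeError("read-only")


class NonDataDesc:
    """Defines only __get__, so it is a non-data descriptor."""

    def __get__(self, obj, objtype=None):
        return "non-data descriptor"


class Demo:
    a = DataDesc()
    b = NonDataDesc()
    c = "class attribute"

    def __getattr__(self, name):  # step 4: only reached when steps 1-3 fail
        return f"fallback for {name}"


d = Demo()
# Write straight into the instance dict so no descriptor __set__ is involved.
d.__dict__.update(a="instance", b="instance", c="instance")

results = (d.a, d.b, d.c, d.missing)
```

Only a survives the instance-dict entry, because only a is a data descriptor. b and c are shadowed, and missing falls through to __getattr__.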
__getattribute__ intercepts everything
Every attribute access on an object calls type(obj).__getattribute__(obj, name). You almost never override this directly — it’s the engine running the lookup chain above. Override __getattr__ instead, which is only called when normal lookup fails.
class Strict:
    def __getattr__(self, name):
        raise AttributeError(f"No dynamic attributes allowed: {name}")
ELI5: __getattribute__ is the receptionist who handles every visitor. __getattr__ is the lost-and-found office, only called when the receptionist can’t find where you’re supposed to go.
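To see that __getattr__ really is a fallback, here is a short sketch. Tracker and its fallback_calls list are hypothetical names used only to record when the hook fires.

```python
class Tracker:
    def __init__(self):
        self.present = 1
        self.fallback_calls = []  # records every name that reached __getattr__

    def __getattr__(self, name):
        # Runs only after normal lookup has already failed.
        self.fallback_calls.append(name)
        return None


t = Tracker()
value_present = t.present           # found in the instance dict, hook not called
calls_after_hit = list(t.fallback_calls)
value_absent = t.absent             # normal lookup fails, hook is called
calls_after_miss = list(t.fallback_calls)
```

One caveat with this pattern: if __getattr__ itself reads an attribute that was never set, it calls itself again, so set everything it depends on in __init__.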
Data descriptors vs non-data descriptors
This is where lookup priority gets non-obvious:
| Type | Has __get__ | Has __set__/__delete__ | Priority vs instance dict |
|---|---|---|---|
| Data descriptor | Yes | Yes | Higher than instance dict |
| Non-data descriptor | Yes | No | Lower than instance dict |
| Plain class attr | No | No | Lower than instance dict |
property is a data descriptor (it has __get__ and __set__). This is why setting an instance attribute with the same name as a property doesn’t shadow it — the property wins.
class Foo:
    @property
    def x(self):
        return 42

f = Foo()
f.__dict__['x'] = 99 # force write to instance dict
f.x # still 42: data descriptor wins
ELI5: Imagine a hotel safe (data descriptor) and a drawer in the room (instance dict). The hotel safe always takes priority because it’s controlled by the hotel. A note left in the drawer doesn’t override the safe.
Descriptors Deep Dive
Descriptors are the mechanism under property, classmethod, staticmethod, and Django/SQLAlchemy fields. Once you understand them, you stop treating these as magic.
The protocol
class Descriptor:
    def __set_name__(self, owner, name): # called at class creation
        self.name = name

    def __get__(self, obj, objtype=None): # obj is None when accessed from class
        if obj is None:
            return self
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value): # makes this a DATA descriptor
        obj.__dict__[self.name] = value

    def __delete__(self, obj):
        del obj.__dict__[self.name]
__set_name__ is called once, when the owning class is created (right after its body has executed). This is how a descriptor knows its own attribute name without being told.
Real example: type-checked attribute
class Typed:
    def __set_name__(self, owner, name):
        self.name = name
        self.private = f"_{name}"

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private, None)

    def __set__(self, obj, value):
        if not isinstance(value, self.expected_type):
            raise TypeError(
                f"{self.name} must be {self.expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
        setattr(obj, self.private, value)

class Int(Typed):
    expected_type = int

class Person:
    age = Int() # descriptor instance lives on the CLASS

    def __init__(self, age):
        self.age = age # calls Int.__set__

p = Person(30) # fine
p.age = "old" # TypeError
Non-data descriptor: lazy property pattern
class lazy:
    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = self.func(obj)
        # write to instance dict; next access bypasses descriptor
        obj.__dict__[self.func.__name__] = value
        return value

class Circle:
    def __init__(self, r):
        self.r = r

    @lazy
    def area(self):
        return 3.14159 * self.r ** 2
This works because lazy has no __set__ (non-data descriptor), so after the first call the instance dict entry shadows it.
ELI5: A lazy property is like a package that’s only assembled when you first open the box. After that, the assembled item sits there — you never rebuild it. A regular property is like a factory that rebuilds it every time you ask.
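The standard library ships the same idea as functools.cached_property (Python 3.8+). A brief sketch, where the computations counter is an assumption added only to make the caching visible:

```python
from functools import cached_property


class Circle:
    def __init__(self, r):
        self.r = r
        self.computations = 0  # illustrative counter, not needed in real code

    @cached_property
    def area(self):
        self.computations += 1
        return 3.14159 * self.r ** 2


c = Circle(2)
first = c.area   # computed and stored in the instance dict
second = c.area  # read from the instance dict, method body not run again
cached_in_instance = "area" in c.__dict__
```

Deleting the attribute (del c.area) drops the cached value, so the next access recomputes it. That is handy when r changes.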
Method Resolution Order (MRO)
Python uses C3 linearization to compute the MRO. You don’t need to memorize the algorithm, but you need to understand what it guarantees:
- A class always comes before its parents
- A class’s direct bases keep the left-to-right order from its class statement
- If no ordering can satisfy both rules, TypeError is raised when the class is defined
class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass
D.__mro__
# (<class 'D'>, <class 'B'>, <class 'C'>, <class 'A'>, <class 'object'>)
The diamond is handled: A appears once, after both B and C.
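The failure case is easy to trigger. In this sketch (X, Y, Z, and W are throwaway names), listing a parent before its own subclass asks for an ordering that C3 cannot produce:

```python
class X:
    pass


class Y(X):
    pass


# X is listed before Y, but Y subclasses X and so must come before it.
try:
    class Z(X, Y):
        pass
except TypeError as exc:
    mro_error = exc
else:
    mro_error = None


class W(Y, X):  # subclass first: both rules can be satisfied
    pass


w_order = [cls.__name__ for cls in W.__mro__]
```

The error is raised by the class statement itself, so a broken hierarchy fails at import time rather than at first use.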
super() is next-in-MRO, not parent
This is the single most misunderstood thing about super():
class B(A):
    def method(self):
        super().method() # NOT "call A.method"
        # "call the next class after B in the MRO of the actual instance"
If the actual instance is a D, and D.__mro__ is [D, B, C, A], then super() inside B.method calls C.method, not A.method.
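A runnable version of that claim, as a sketch: each method appends its class name to a module-level calls list (an illustrative device, not a recommended pattern), so the order super() follows becomes visible.

```python
calls = []


class A:
    def method(self):
        calls.append("A")


class B(A):
    def method(self):
        calls.append("B")
        super().method()  # next after B in the MRO of type(self)


class C(A):
    def method(self):
        calls.append("C")
        super().method()


class D(B, C):
    pass


D().method()
order_for_d = list(calls)  # B hands off to C, not to A

calls.clear()
B().method()
order_for_b = list(calls)  # for a plain B instance, the next class is A
```

The same super() call in B.method reaches a different class depending on the type of the instance it runs on.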
Cooperative multiple inheritance
The pattern that makes super() work across diamond hierarchies:
class A:
    def setup(self, **kwargs):
        pass # root of the chain: object has no setup(), so do NOT call super() here

class B(A):
    def setup(self, b_param=None, **kwargs):
        self.b = b_param
        super().setup(**kwargs)

class C(A):
    def setup(self, c_param=None, **kwargs):
        self.c = c_param
        super().setup(**kwargs)

class D(B, C):
    def setup(self, **kwargs):
        super().setup(**kwargs)

Every class in the chain must accept **kwargs and pass them along. Only the root stops, because the next class after it is object, which has no setup(). If any other class in the chain swallows **kwargs without calling super(), the chain breaks.
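With __init__ the chain can run all the way to object, because object does define __init__. A minimal sketch, assuming hypothetical classes Base, Named, Aged, and Person:

```python
class Base:
    def __init__(self, **kwargs):
        # By now every keyword should have been consumed; object.__init__
        # rejects leftovers, so a misspelled keyword fails loudly here.
        super().__init__(**kwargs)


class Named(Base):
    def __init__(self, name=None, **kwargs):
        self.name = name
        super().__init__(**kwargs)


class Aged(Base):
    def __init__(self, age=None, **kwargs):
        self.age = age
        super().__init__(**kwargs)


class Person(Named, Aged):
    pass


p = Person(name="Ada", age=36)

try:
    Person(name="Ada", agee=36)  # typo: no class consumes "agee"
except TypeError:
    typo_rejected = True
else:
    typo_rejected = False
```

Each class peels off the keywords it owns and forwards the rest, so the classes never need to know about each other’s parameters.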
ELI5: MRO is like a relay race order decided before the race starts. super() doesn’t say “pass the baton to my parent”; it says “pass to whoever is next in the relay order.” That might not be your direct parent.
Common mistake: Writing super(ClassName, self) in Python 3. Just write super() — Python 3 fills in the class and instance automatically from the surrounding context.
__slots__
What it does
By default, every instance stores its attributes in a __dict__ (a full Python dictionary). __slots__ replaces that with a fixed-size C-level struct:
class Point:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y
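A quick sketch of what "fixed" means in practice: a slotted instance has no __dict__, and assigning a name outside __slots__ fails.

```python
class Point:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
has_dict = hasattr(p, '__dict__')  # no per-instance dictionary exists

try:
    p.z = 3  # 'z' is not declared in __slots__
except AttributeError:
    rejected = True
else:
    rejected = False
```

Declared attributes still behave normally: p.x = 10 works as usual.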
Memory comparison
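A __dict__ on CPython has historically cost ~200-300 bytes before you store anything. Newer releases have shrunk that, so treat the figures here as rough and measure on your own interpreter. A slot costs roughly 8 bytes per attribute (one pointer on a 64-bit build).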
| 1M instances of Point(x, y) | Memory |
|---|---|
| With __dict__ | ~350 MB |
| With __slots__ | ~56 MB |
Trade-offs
| With __slots__ | Without __slots__ |
|---|---|
| Fixed attribute set | Dynamic attributes |
| ~6x memory savings | Flexible |
| Slightly faster access | Compatible with __dict__-based tools |
| Complicates inheritance | Simple inheritance |
| No __weakref__ by default | Weak references work |
Inheritance complication: If a parent doesn’t define __slots__, subclasses get __dict__ anyway. You need __slots__ at every level for full savings.
class Base:
    __slots__ = () # empty slots, no __dict__ allocated

class Point(Base):
    __slots__ = ('x', 'y') # now truly no __dict__
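You can check whether a hierarchy leaks a __dict__ with hasattr. In this sketch, PlainBase, Leaky, SlottedBase, and Tight are illustrative names:

```python
class PlainBase:  # no __slots__, so instances carry a __dict__
    pass


class Leaky(PlainBase):
    __slots__ = ('x',)  # the slot exists, but __dict__ is inherited anyway


class SlottedBase:
    __slots__ = ()


class Tight(SlottedBase):
    __slots__ = ('x',)


leaky_has_dict = hasattr(Leaky(), '__dict__')
tight_has_dict = hasattr(Tight(), '__dict__')

leaky = Leaky()
leaky.anything = 1  # allowed: lands in the inherited __dict__
```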
ELI5: __dict__ is like carrying an expandable backpack for each object: flexible but heavy. __slots__ is a clipboard with labeled fields: rigid but much lighter. When you have a million clipboards, the weight difference matters.
When to use: Only when you’re creating millions of instances of the same shape and memory is measurably a problem. Profile first. Don’t add __slots__ defensively.
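One way to profile first is the standard library’s tracemalloc module. The sketch below is a rough measurement, not a benchmark: PlainPoint, SlottedPoint, and bytes_for are hypothetical names, and the absolute numbers depend on your CPython version and platform.

```python
import tracemalloc


class PlainPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class SlottedPoint:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y


def bytes_for(cls, count=20_000):
    """Approximate bytes still allocated after building `count` instances."""
    tracemalloc.start()
    instances = [cls(i, i) for i in range(count)]  # kept alive while measuring
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return current


plain_bytes = bytes_for(PlainPoint)
slotted_bytes = bytes_for(SlottedPoint)
saving_ratio = plain_bytes / slotted_bytes
```

Both measurements include the list and the integers, so the ratio understates the per-instance difference. It is still enough to decide whether __slots__ is worth the inheritance complications for your workload.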
Dunder Protocol Methods
The data model lets your objects participate in Python syntax. Don’t implement protocol methods unless you’re building a type that fits that protocol.
Container protocol
| Method | Triggered by | Notes |
|---|---|---|
| __len__ | len(x), bool(x) if no __bool__ | Return int >= 0 |
| __getitem__ | x[key], iteration fallback | Raise IndexError to end the iteration fallback; raise KeyError for a missing mapping key |
| __setitem__ | x[key] = val | |
| __delitem__ | del x[key] | |
| __contains__ | val in x | If not defined, in falls back to scanning with __iter__ |
| __iter__ | for item in x, iter(x) | Return an iterator |
| __reversed__ | reversed(x) | Optional; falls back to __len__+__getitem__ |
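The fallbacks in the table mean that two methods already buy a lot of syntax. A small sketch, with Deck as a made-up example class:

```python
class Deck:
    """Implements only __len__ and __getitem__."""

    def __init__(self, cards):
        self._cards = list(cards)

    def __len__(self):
        return len(self._cards)

    def __getitem__(self, index):
        return self._cards[index]  # the list raises IndexError past the end


deck = Deck(["A", "K", "Q"])

as_list = list(deck)              # iteration fallback: __getitem__(0), (1), ...
has_king = "K" in deck            # no __contains__, so Python scans
backwards = list(reversed(deck))  # no __reversed__: uses __len__ + __getitem__
empty_is_falsy = not Deck([])     # no __bool__: truthiness comes from __len__
```

Defining __iter__ and __contains__ explicitly is still worth it when a scan would be slow or when the container is not integer-indexed.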
Numeric protocol
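Python tries the left operand first (__add__), then the right operand’s reflected method (__radd__). If both return NotImplemented, Python raises TypeError. One exception: if the right operand’s type is a subclass of the left operand’s type and overrides the reflected method, the reflected method is tried first.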
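class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        if isinstance(other, Vector):
            return Vector(self.x + other.x, self.y + other.y)
        return NotImplemented # NOT raise TypeError; lets Python try the other operand

    def __radd__(self, other): # handles: 0 + vector (the start value used by sum())
        if other == 0:
            return self
        return NotImplemented

    def __iadd__(self, other): # handles +=; mutate self and return it
        if not isinstance(other, Vector):
            return NotImplemented
        self.x += other.x
        self.y += other.y
        return self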
ELI5: __add__ is “can you handle self + other?” and __radd__ is “can you handle other + self?” when other doesn’t know how to add your type. NotImplemented means “I can’t do this, ask the other side”; it’s not an exception.
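Here is the dispatch in action, as a sketch with a hypothetical Money class. sum() starts from the integer 0, so the first addition is 0 + Money(...), which only works through __radd__.

```python
class Money:
    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        if isinstance(other, Money):
            return Money(self.cents + other.cents)
        return NotImplemented  # let Python try the other operand

    def __radd__(self, other):
        if other == 0:  # sum() starts from 0; treat it as "no money"
            return self
        return NotImplemented


total = sum([Money(150), Money(250)])

try:
    Money(1) + "one"  # neither side can handle it
except TypeError:
    mixed_add_failed = True
else:
    mixed_add_failed = False
```

Because both sides decline with NotImplemented, it is Python, not your class, that raises the final TypeError with a consistent message.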
__repr__ vs __str__
| Method | Called by | Purpose | Rule |
|---|---|---|---|
| __repr__ | repr(), REPL display, !r format | Developer view, should be unambiguous | Should look like a constructor call if possible |
| __str__ | print(), str(), !s format | User view, readable | Falls back to __repr__ if not defined |
If you only implement one, implement __repr__. __str__ falls back to __repr__, not the reverse.
class Point:
    def __repr__(self):
        return f"Point({self.x!r}, {self.y!r})" # unambiguous, reproducible

    def __str__(self):
        return f"({self.x}, {self.y})" # clean for display
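A short usage sketch showing which method each context picks. The __init__ is added here only so the example runs on its own.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Point({self.x!r}, {self.y!r})"

    def __str__(self):
        return f"({self.x}, {self.y})"


p = Point(1, 2)

developer_view = repr(p)    # 'Point(1, 2)'
user_view = str(p)          # '(1, 2)'
formatted = f"{p} / {p!r}"  # plain {} uses __str__, !r uses __repr__
in_container = str([p])     # containers show their items with __repr__
```

The last line is why __repr__ is the one to implement first: lists, dicts, and most debugging output never call __str__ on their contents.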
__hash__ and __eq__ consistency rule
Python requires that objects that compare equal have the same hash. It cannot verify this for you, so keeping the two methods in sync is your job.
# If you define __eq__, Python SETS __hash__ = None (unhashable) automatically
# You must explicitly define __hash__ to keep your objects hashable
class Point:
    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented # let Python try the other operand
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y)) # must use same fields as __eq__
Common mistake: Defining __eq__ and then being surprised that your objects can’t be used as dict keys. Python is protecting you from a hash table invariant violation.
ELI5: __hash__ and __eq__ are like the library catalog system: books on the same shelf (same hash bucket) might not be the same book, but two books that ARE the same must always be in the same place. If you redefine “same book” (__eq__) without updating the shelving rule (__hash__), the catalog breaks.
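Both halves of the rule can be checked in a few lines. OnlyEq and EqAndHash are hypothetical classes for this sketch:

```python
class OnlyEq:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        if not isinstance(other, OnlyEq):
            return NotImplemented
        return self.x == other.x


class EqAndHash:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        if not isinstance(other, EqAndHash):
            return NotImplemented
        return self.x == other.x

    def __hash__(self):
        return hash(self.x)  # same field that __eq__ compares


hash_disabled = OnlyEq.__hash__ is None  # set implicitly by defining __eq__

try:
    {OnlyEq(1): "a"}
except TypeError:
    unhashable = True
else:
    unhashable = False

# A different but equal object finds the same dict entry.
found = {EqAndHash(1): "a"}[EqAndHash(1)]
```

If the fields used by __hash__ can change after the object is placed in a dict or set, lookups break silently, so hash only on fields you treat as immutable.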
Object Creation: __new__ vs __init__
Most Python developers write __init__ and never think about __new__. Here’s when that matters:
obj = MyClass(args)
# Python actually does:
# 1. obj = MyClass.__new__(MyClass, args) ← allocates + creates the instance
# 2. MyClass.__init__(obj, args) ← initializes it
# 3. returns obj
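The sequence can be made visible with a log. In this sketch, Tracked, Odd, and the log list are illustrative names. The second class shows a detail the outline above skips: __init__ runs only if __new__ returns an instance of the class.

```python
log = []


class Tracked:
    def __new__(cls, *args, **kwargs):
        log.append("__new__")
        return super().__new__(cls)  # object.__new__ takes only the class

    def __init__(self, value):
        log.append("__init__")
        self.value = value


t = Tracked(5)
order = list(log)


class Odd:
    def __new__(cls):
        return 42  # not an instance of Odd

    def __init__(self):
        log.append("Odd.__init__")  # never reached


result = Odd()
odd_init_ran = "Odd.__init__" in log
```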
When you need __new__
Immutable types: You can’t change an immutable object in __init__ because by then it’s already created. __new__ is your only chance.
class UpperStr(str):
    def __new__(cls, value):
        return super().__new__(cls, value.upper())
    # can't do this in __init__; str is already immutable by then
Singletons:
class Singleton:
    _instance = None

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance
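One caveat with this pattern: __new__ hands back the same object, but __init__ still runs on every call. A sketch with a hypothetical Settings class makes the effect visible:

```python
class Settings:
    _instance = None

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self, name="default"):
        self.name = name  # runs on EVERY Settings(...) call


a = Settings("first")
b = Settings("second")

same_object = a is b
name_seen_by_a = a.name  # overwritten by the second call
```

If that re-initialisation is unwanted, guard the body of __init__ or do the one-time setup inside __new__. A module-level instance is often the simpler alternative.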
ELI5: __new__ is the architect who designs and builds the building. __init__ is the interior decorator who furnishes it. For most buildings, you only care about the furniture. But if the building itself has weird shape constraints (like “must be made of marble” i.e., immutable), the architect needs explicit instructions.
Copy Semantics
Assignment is binding, not copying
a = [1, 2, 3]
b = a # b points to the SAME list
b.append(4)
print(a) # [1, 2, 3, 4] — a is affected
Shallow vs deep copy
import copy
original = [[1, 2], [3, 4]]
shallow = copy.copy(original) # new list, same inner lists
deep = copy.deepcopy(original) # new list, new inner lists
shallow[0].append(99)
print(original[0]) # [1, 2, 99] — shallow copy shares inner objects
deep[0].append(100)
print(original[0]) # still [1, 2, 99]: the deep copy is fully independent
| Operation | Creates new outer? | Creates new inner? | Use when |
|---|---|---|---|
| = | No | No | Always binding |
| copy.copy() | Yes | No | Flat containers, performance-sensitive |
| copy.deepcopy() | Yes | Yes | Nested mutable structures |
| list[:], dict.copy() | Yes | No | Idiomatic shallow copy |
The mutable default argument trap
# WRONG: default list is created ONCE at function definition time
def add_item(item, lst=[]):
    lst.append(item)
    return lst

add_item(1) # [1]
add_item(2) # [1, 2] ← surprise

# RIGHT
def add_item(item, lst=None):
    if lst is None:
        lst = []
    lst.append(item)
    return lst
ELI5: A mutable default argument is like putting a communal notepad in your office: every call shares the same notepad. What the previous caller wrote is still there. Use None and create a fresh notepad inside the function.
Common mistake: This also bites in class definitions — class Foo: items = [] means all instances share the same list unless you assign self.items = [] in __init__.
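The class-level version of the trap, as a sketch. Shared and Separate are made-up names:

```python
class Shared:
    items = []  # one list, stored on the class

    def add(self, item):
        self.items.append(item)  # mutates the class-level list


class Separate:
    def __init__(self):
        self.items = []  # a fresh list per instance

    def add(self, item):
        self.items.append(item)


s1, s2 = Shared(), Shared()
s1.add("x")
leaked = s2.items  # s2 sees s1's item

p1, p2 = Separate(), Separate()
p1.add("x")
isolated = p2.items  # p2 is unaffected
```

The leak happens because self.items.append(...) only reads the attribute, finds it on the class, and mutates it in place. No instance attribute is ever created.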
Summary: When to Reach for What
| You want to… | Use |
|---|---|
| Validate attributes on assignment | Data descriptor or property |
| Compute an attribute once and cache it | Non-data descriptor (lazy property) |
| Save memory for millions of simple instances | __slots__ |
| Create an immutable type subclass | Override __new__ |
| Make objects work with +, *, len(), in | Implement the relevant dunder protocol |
| Control attribute lookup globally | Override __getattribute__ (rarely) |
| Handle missing attributes gracefully | Override __getattr__ |
| Debug MRO issues | Print ClassName.__mro__ |
| Make objects hashable after defining __eq__ | Also define __hash__ using same fields |
| Avoid shared mutable state in defaults | Use None sentinel, create inside function |