Object Model & Data Model

Python’s data model is the contract between your code and the interpreter. Master it and everything clicks — descriptors, super(), property, @classmethod, slots, all of it. Skip it and you cargo-cult patterns you don’t understand.

Everything Is an Object

In Python, “everything is an object” isn’t marketing copy — it’s a precise statement. Functions, classes, modules, None, integers, types themselves — all are instances of some class, all have an identity, a type, and a value.

>>> type(int)        # int is an instance of type
<class 'type'>
>>> type(type)       # type is an instance of itself
<class 'type'>
>>> issubclass(int, object)  # and a subclass of object
True

The four identity and equality tools you must know:

Tool	What it gives you	When to use
`id(x)`	Memory address (CPython)	Debugging identity
`type(x)`	The exact class, no inheritance	Type dispatch
`x is y`	Same object in memory	`None` checks, singletons
`x == y`	Calls `__eq__`, may be overridden	Value equality

`is` and `==` are not interchangeable. Use `is` only for None, True, False, and sentinel objects. For everything else, use `==`.
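A minimal sketch of that rule. The `_MISSING` sentinel and the `lookup` helper are hypothetical names used only for illustration; the idea is that identity checks suit unique marker objects, while values are compared with `==`.

```python
_MISSING = object()  # hypothetical sentinel: a unique object used only as a marker


def lookup(mapping, key, default=_MISSING):
    """Return mapping[key]; tell 'no default given' apart from 'default is None'."""
    value = mapping.get(key, _MISSING)
    if value is _MISSING:          # identity check against the sentinel itself
        if default is _MISSING:
            raise KeyError(key)
        return default
    return value


a = [1, 2]
b = [1, 2]
print(a == b)   # True: same value
print(a is b)   # False: two distinct list objects

print(lookup({"k": None}, "k"))       # None: a stored None is a real value
print(lookup({}, "k", default=0))     # 0: explicit default
```

Using `None` as the marker would not work here, because `None` is a legitimate stored value and a legitimate default.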

CPython interning: the trap

CPython caches small integers (-5 to 256) and many short strings as a performance optimization. This creates a subtle trap:

a = 256
b = 256
a is b   # True on CPython: both names refer to the same cached object

a = 257
b = 257
a is b   # unreliable: False when typed line by line in the REPL, often True in a script

ELI5: Integers -5 to 256 are like pre-printed forms in a government office — everyone gets the same sheet because they’re so common. Bigger numbers are printed fresh each time. is checks if you got the same sheet of paper; == checks if the paper says the same thing.

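Common mistake: Testing if x is 1 or if x is "hello". This can appear to work because of CPython's small-integer cache, string interning, and the compiler sharing equal constants within one compiled block. All three are implementation details. The check breaks as soon as the value is computed at runtime, and CPython 3.8+ emits a SyntaxWarning for is with a literal.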
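A small sketch of why identity on values is unreliable. The values are built at runtime so the compiler cannot share constants; the `is` result for the integers is a CPython implementation detail, so no output is promised for it. `sys.intern` is the standard-library way to ask for one canonical string object.

```python
import sys

n = 256
x = n + 1          # computed at runtime, outside the small-integer cache
y = n + 1
print(x == y)      # True: equal values
print(x is y)      # implementation detail; do not rely on either answer

s1 = "".join(["data", " ", "model"])   # built at runtime
s2 = "".join(["data", " ", "model"])
print(s1 == s2)                          # True: equal values
print(sys.intern(s1) is sys.intern(s2))  # True: interning returns one canonical object
```

Even with explicit interning, `==` remains the right comparison for values; interning is a memory and speed optimization, not a correctness tool.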

The Attribute Lookup Chain

When you write obj.attr, Python doesn’t just look in one place. It runs a multi-step algorithm that most people don’t know:

1. type(obj).__mro__  → search for a DATA DESCRIPTOR in the class hierarchy
2. obj.__dict__       → instance dictionary
3. type(obj).__mro__  → search for non-data descriptor or plain class attribute
4. type(obj).__getattr__(obj, 'attr')  → fallback if defined
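The four steps can be observed directly. This is a minimal sketch; `DataDesc`, `NonDataDesc`, and `Demo` are hypothetical names invented for the demonstration.

```python
class DataDesc:
    """Defines __get__ and __set__, so it is a data descriptor."""

    def __get__(self, obj, objtype=None):
        return "data descriptor"

    def __set__(self, obj, value):
        raise AttributeError("read-only")


class NonDataDesc:
    """Defines only __get__, so it is a non-data descriptor."""

    def __get__(self, obj, objtype=None):
        return "non-data descriptor"


class Demo:
    a = DataDesc()
    b = NonDataDesc()
    c = "class attribute"

    def __getattr__(self, name):          # step 4: only runs when steps 1-3 fail
        return f"__getattr__ fallback for {name!r}"


fresh = Demo()
print(fresh.a)        # data descriptor      (step 1)
print(fresh.b)        # non-data descriptor  (step 3)
print(fresh.c)        # class attribute      (step 3)
print(fresh.missing)  # __getattr__ fallback for 'missing'  (step 4)

shadowed = Demo()
vars(shadowed).update(a="instance dict", b="instance dict", c="instance dict")
print(shadowed.a)     # data descriptor: step 1 still beats the instance dict
print(shadowed.b)     # instance dict:   step 2 beats the non-data descriptor
print(shadowed.c)     # instance dict:   step 2 beats the plain class attribute
```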

`__getattribute__` intercepts everything

Every attribute access on an object calls type(obj).__getattribute__(obj, name). You almost never override this directly — it’s the engine running the lookup chain above. Override __getattr__ instead, which is only called when normal lookup fails.

class Strict:
    def __getattr__(self, name):
        raise AttributeError(f"No dynamic attributes allowed: {name}")

ELI5: __getattribute__ is the receptionist who handles every visitor. __getattr__ is the lost-and-found office — only called when the receptionist can’t find where you’re supposed to go.

Data descriptors vs non-data descriptors

This is where lookup priority gets non-obvious:

Type	Has `__get__`	Has `__set__`/`__delete__`	Priority vs instance dict
Data descriptor	Yes	Yes	Higher than instance dict
Non-data descriptor	Yes	No	Lower than instance dict
Plain class attr	No	No	Lower than instance dict

property is a data descriptor (it has __get__ and __set__). This is why setting an instance attribute with the same name as a property doesn’t shadow it — the property wins.

class Foo:
    @property
    def x(self): return 42

f = Foo()
f.__dict__['x'] = 99   # force write to instance dict
f.x                    # still 42 — data descriptor wins

ELI5: Imagine a hotel safe (data descriptor) and a drawer in the room (instance dict). The hotel safe always takes priority because it’s controlled by the hotel. A note left in the drawer doesn’t override the safe.

Descriptors Deep Dive

Descriptors are the mechanism under property, classmethod, staticmethod, and Django/SQLAlchemy fields. Once you understand them, you stop treating these as magic.

The protocol

class Descriptor:
    def __set_name__(self, owner, name):   # called at class creation
        self.name = name

    def __get__(self, obj, objtype=None):  # obj is None when accessed from class
        if obj is None:
            return self
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):         # makes this a DATA descriptor
        obj.__dict__[self.name] = value

    def __delete__(self, obj):
        del obj.__dict__[self.name]

__set_name__ is called once when the owning class is created, right after its body has executed. This is how a descriptor learns its own attribute name without being told.

Real example: type-checked attribute

class Typed:
    def __set_name__(self, owner, name):
        self.name = name
        self.private = f"_{name}"

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private, None)

    def __set__(self, obj, value):
        if not isinstance(value, self.expected_type):
            raise TypeError(
                f"{self.name} must be {self.expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
        setattr(obj, self.private, value)

class Int(Typed):
    expected_type = int

class Person:
    age = Int()   # descriptor instance lives on the CLASS

    def __init__(self, age):
        self.age = age  # calls Int.__set__

p = Person(30)   # fine
p.age = "old"    # TypeError

Non-data descriptor: lazy property pattern

class lazy:
    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = self.func(obj)
        # write to instance dict — next access bypasses descriptor
        obj.__dict__[self.func.__name__] = value
        return value

class Circle:
    def __init__(self, r): self.r = r

    @lazy
    def area(self):
        return 3.14159 * self.r ** 2

This works because lazy has no __set__ (non-data descriptor), so after the first call the instance dict entry shadows it.

ELI5: A lazy property is like a package that’s only assembled when you first open the box. After that, the assembled item sits there — you never rebuild it. A regular property is like a factory that rebuilds it every time you ask.
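The standard library ships the same pattern as `functools.cached_property` (Python 3.8+). The sketch below uses it with a hypothetical `computations` counter, added only to show that the function body runs once per instance.

```python
import math
from functools import cached_property


class Circle:
    def __init__(self, r):
        self.r = r
        self.computations = 0     # hypothetical counter, for demonstration only

    @cached_property
    def area(self):
        self.computations += 1
        return math.pi * self.r ** 2


c = Circle(2)
first = c.area
second = c.area
print(c.computations)        # 1: the second access read the instance dict
print("area" in vars(c))     # True: the cached value lives on the instance

del c.area                   # drop the cached value
third = c.area               # recomputed on next access
print(c.computations)        # 2
```

Because the cache is just an instance-dict entry, it goes stale if `r` changes later; deleting the attribute is the way to invalidate it.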

Method Resolution Order (MRO)

Python uses C3 linearization to compute the MRO. You don’t need to memorize the algorithm, but you need to understand what it guarantees:

A class always comes before its parents
Base classes keep the left-to-right order in which the class statement lists them
If those two rules cannot both be satisfied, TypeError is raised when the class is created

class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass

D.__mro__
# (<class 'D'>, <class 'B'>, <class 'C'>, <class 'A'>, <class 'object'>)

The diamond is handled: A appears once, after both B and C.
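A short sketch of both outcomes: the diamond linearizes cleanly, while a base order that contradicts the hierarchy is rejected. `Broken` is a hypothetical class that exists only to trigger the error.

```python
class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass

print([cls.__name__ for cls in D.__mro__])
# ['D', 'B', 'C', 'A', 'object']

mro_error = None
try:
    # Listing A before its own subclass B asks for "A before B",
    # but "a class comes before its parents" demands "B before A".
    class Broken(A, B):
        pass
except TypeError as exc:
    mro_error = exc

print(type(mro_error).__name__)   # TypeError
```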

`super()` is next-in-MRO, not parent

This is the single most misunderstood thing about super():

class B(A):
    def method(self):
        super().method()   # NOT "call A.method"
                           # "call the next class after B in the MRO of the actual instance"

If the actual instance is a D, and D.__mro__ is (D, B, C, A, object), then super() inside B.method calls C.method, not A.method.
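The claim is easy to check by having each method report its own class. A minimal sketch, with `method` returning a list purely so the call order is visible:

```python
class A:
    def method(self):
        return ["A"]

class B(A):
    def method(self):
        return ["B"] + super().method()   # next after B in type(self).__mro__

class C(A):
    def method(self):
        return ["C"] + super().method()

class D(B, C):
    def method(self):
        return ["D"] + super().method()


print(D().method())   # ['D', 'B', 'C', 'A']: B's super() reached C
print(B().method())   # ['B', 'A']: same B code, different MRO, different target
```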

Cooperative multiple inheritance

The pattern that makes super() work across diamond hierarchies:

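class A:
    def setup(self, **kwargs):
        # A is the root that defines setup(), so the chain ends here:
        # object has no setup(), and super().setup() would raise AttributeError
        if kwargs:
            raise TypeError(f"unexpected arguments: {sorted(kwargs)}")

class B(A):
    def setup(self, b_param=None, **kwargs):
        self.b = b_param
        super().setup(**kwargs)

class C(A):
    def setup(self, c_param=None, **kwargs):
        self.c = c_param
        super().setup(**kwargs)

class D(B, C):
    def setup(self, **kwargs):
        super().setup(**kwargs)

D().setup(b_param=1, c_param=2)   # runs D, B, C, then A

Every class except the root must accept **kwargs, consume its own parameters, and pass the rest along with super(). The root class (A here) ends the chain, because object has no setup() method to call. If any intermediate class swallows **kwargs without calling super(), the chain breaks and the classes after it in the MRO never run.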

ELI5: MRO is like a relay race order decided before the race starts. super() doesn’t say “pass the baton to my parent” — it says “pass to whoever is next in the relay order.” That might not be your direct parent.

Common mistake: Writing super(ClassName, self) in Python 3. Just write super() — Python 3 fills in the class and instance automatically from the surrounding context.

`__slots__`

What it does

By default, every instance stores its attributes in a __dict__ (a full Python dictionary). __slots__ replaces that with a fixed-size C-level struct:

class Point:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y

Memory comparison

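A __dict__ can cost ~200-300 bytes per instance before you store anything, though the figure depends heavily on the CPython version (key-sharing dictionaries in newer releases reduce it). A slot costs roughly 8 bytes per attribute on a 64-bit build (one pointer). Treat the table below as illustrative and measure on your own interpreter.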

1M instances of Point(x, y)	Memory
With `__dict__`	~350 MB
With `__slots__`	~56 MB
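Figures like these vary by interpreter version, so it is worth measuring locally. The sketch below uses `tracemalloc` from the standard library; `PlainPoint`, `SlotPoint`, and `bytes_for` are hypothetical names, and the instance count is kept small so it runs quickly.

```python
import tracemalloc


class PlainPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class SlotPoint:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


def bytes_for(cls, count=100_000):
    """Return bytes still allocated after building `count` instances of cls."""
    tracemalloc.start()
    points = [cls(i, i) for i in range(count)]
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del points
    return current


plain_bytes = bytes_for(PlainPoint)
slot_bytes = bytes_for(SlotPoint)
print(f"plain: {plain_bytes / 1e6:.1f} MB, slots: {slot_bytes / 1e6:.1f} MB")

p = SlotPoint(1, 2)
print(hasattr(p, "__dict__"))   # False: no per-instance dictionary
try:
    p.z = 3                     # not a declared slot
except AttributeError:
    print("cannot add attribute z")
```

The measured total includes the list and the integer objects, so the ratio is more meaningful than the absolute numbers.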

Trade-offs

With `__slots__`	Without `__slots__`
Fixed attribute set	Dynamic attributes
~6x memory savings	Flexible
Slightly faster access	Compatible with `__dict__`-based tools
Complicates inheritance	Simple inheritance
No `__weakref__` by default	Weak references work

Inheritance complication: If a parent doesn’t define __slots__, subclasses get __dict__ anyway. You need __slots__ at every level for full savings.

class Base:
    __slots__ = ()          # empty slots, no __dict__ allocated

class Point(Base):
    __slots__ = ('x', 'y')  # now truly no __dict__

ELI5: __dict__ is like carrying an expandable backpack for each object — flexible but heavy. __slots__ is a clipboard with labeled fields — rigid but much lighter. When you have a million clipboards, the weight difference matters.

When to use: Only when you’re creating millions of instances of the same shape and memory is measurably a problem. Profile first. Don’t add __slots__ defensively.

Dunder Protocol Methods

The data model lets your objects participate in Python syntax. Don’t implement protocol methods unless you’re building a type that fits that protocol.

Container protocol

Method	Triggered by	Notes
`__len__`	`len(x)`, `bool(x)` if no `__bool__`	Return int >= 0
`__getitem__`	`x[key]`, iteration fallback	Raise `IndexError` to end fallback iteration; raise `KeyError` for missing mapping keys
`__setitem__`	`x[key] = val`
`__delitem__`	`del x[key]`
`__contains__`	`val in x`	Falls back to `__iter__` scan
`__iter__`	`for item in x`, `iter(x)`	Return an iterator
`__reversed__`	`reversed(x)`	Optional; falls back to `__len__`+`__getitem__`
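The fallbacks in the table mean two methods already buy a lot of behavior. A minimal sketch with a hypothetical `Playlist` class that implements only `__len__` and `__getitem__`:

```python
class Playlist:
    def __init__(self, tracks):
        self._tracks = list(tracks)

    def __len__(self):
        return len(self._tracks)

    def __getitem__(self, index):
        return self._tracks[index]   # the inner list raises IndexError past the end


p = Playlist(["intro", "verse", "outro"])

print(len(p))               # 3
print(p[0])                 # intro
print(list(p))              # ['intro', 'verse', 'outro']  iteration via __getitem__
print("verse" in p)         # True                         membership via the same fallback
print(list(reversed(p)))    # ['outro', 'verse', 'intro']  __len__ + __getitem__
print(bool(Playlist([])))   # False                        truthiness via __len__
```

Adding `__iter__` and `__contains__` later is an optimization or a semantic refinement, not a requirement.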

Numeric protocol

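Python tries the left operand first (__add__), then the right with the reflected method (__radd__). If both return NotImplemented, Python raises TypeError. One exception: when the right operand's type is a subclass of the left operand's type and overrides the reflected method, the reflected method is tried first.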

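class Vector:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        if isinstance(other, Vector):
            return Vector(self.x + other.x, self.y + other.y)
        return NotImplemented    # do NOT raise TypeError; return NotImplemented

    def __radd__(self, other):   # handles: 0 + vector (useful for sum())
        if other == 0:           # sum() starts from the int 0
            return self
        return NotImplemented

    def __iadd__(self, other):   # handles +=; must return the result (usually self)
        if not isinstance(other, Vector):
            return NotImplemented
        self.x += other.x
        self.y += other.y
        return self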

ELI5: __add__ is “can you handle self + other?” and __radd__ is “can you handle other + self?” when other doesn’t know how to add your type. NotImplemented means “I can’t do this, ask the other side” — it’s not an exception.

`__repr__` vs `__str__`

Method	Called by	Purpose	Rule
`__repr__`	`repr()`, REPL display, `!r` format	Developer view, should be unambiguous	Should look like a constructor call if possible
`__str__`	`print()`, `str()`, `!s` format	User view, readable	Falls back to `__repr__` if not defined

If you only implement one, implement __repr__. __str__ falls back to __repr__, not the reverse.

class Point:
    def __repr__(self):
        return f"Point({self.x!r}, {self.y!r})"  # unambiguous, reproducible

    def __str__(self):
        return f"({self.x}, {self.y})"            # clean for display
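A quick sketch of the fallback direction and of where each method is used. `OnlyRepr` and `Both` are hypothetical classes for the demonstration.

```python
class OnlyRepr:
    def __repr__(self):
        return "OnlyRepr()"


class Both:
    def __repr__(self):
        return "Both()"

    def __str__(self):
        return "both, for humans"


print(str(OnlyRepr()))             # OnlyRepr()        str() fell back to __repr__
print(str(Both()))                 # both, for humans
print(repr(Both()))                # Both()
print(str([Both()]))               # [Both()]          containers show items with repr
print(f"{Both()!r} / {Both()}")    # Both() / both, for humans
```

The container case is the practical reason to prioritize `__repr__`: logs and debugger output usually show objects nested inside lists and dicts.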

`__hash__` and `__eq__` consistency rule

Python requires, but cannot verify, that objects that compare equal have the same hash.

# If you define __eq__, Python SETS __hash__ = None (unhashable) automatically
# You must explicitly define __hash__ to keep your objects hashable

class Point:
    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented       # let Python try the other operand
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))   # must use same fields as __eq__

Common mistake: Defining __eq__ and then being surprised that your objects can’t be used as dict keys. Python is protecting you from a hash table invariant violation.

ELI5: __hash__ and __eq__ are like the library catalog system — books on the same shelf (same hash bucket) might not be the same book, but two books that ARE the same must always be in the same place. If you redefine “same book” (__eq__) without updating the shelving rule (__hash__), the catalog breaks.
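The automatic `__hash__ = None` can be observed directly. The sketch below also shows one way to get a consistent pair generated for you, `dataclass(frozen=True)` from the standard library; `EqOnly` is a hypothetical class and `Point` here is a separate illustration, not the class above.

```python
from dataclasses import dataclass


class EqOnly:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        if not isinstance(other, EqOnly):
            return NotImplemented
        return self.x == other.x


print(EqOnly.__hash__ is None)    # True: defining __eq__ removed hashability

unhashable = False
try:
    {EqOnly(1)}                   # sets and dict keys need a hash
except TypeError:
    unhashable = True
print(unhashable)                 # True


@dataclass(frozen=True)           # generates __eq__ and a matching __hash__
class Point:
    x: int
    y: int


lookup = {Point(1, 2): "found"}
print(lookup[Point(1, 2)])        # found: equal fields, equal hash
```

Freezing matters: a hash built from fields that can change after insertion would leave the object stranded in the wrong bucket.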

Object Creation: `__new__` vs `__init__`

Most Python developers write __init__ and never think about __new__. Here’s when that matters:

obj = MyClass(args)
# Python actually does:
# 1. obj = MyClass.__new__(MyClass, args)   ← allocates + creates the instance
# 2. MyClass.__init__(obj, args)            ← initializes it (only if __new__ returned an instance of MyClass)
# 3. returns obj

When you need `__new__`

Immutable types: You can’t change an immutable object in __init__ because by then it’s already created. __new__ is your only chance.

class UpperStr(str):
    def __new__(cls, value):
        return super().__new__(cls, value.upper())
        # can't do this in __init__ — str is already immutable by then

Singletons:

class Singleton:
    _instance = None

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance
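One caveat with this singleton follows from the two-step creation sequence: `__init__` still runs on every call, because `__new__` keeps returning an instance of the class. A sketch with a hypothetical `Config` class and an `init_calls` counter added for the demonstration:

```python
class Config:
    _instance = None
    init_calls = 0                 # hypothetical counter, for demonstration only

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self, name="default"):
        type(self).init_calls += 1
        self.name = name           # overwrites state on every construction


a = Config("first")
b = Config("second")
print(a is b)               # True: one shared instance
print(a.name)               # second: the later __init__ call replaced the state
print(Config.init_calls)    # 2
```

If re-initialization is not wanted, one option is to guard the body of `__init__`, or to expose a module-level instance instead of a singleton class.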

ELI5: __new__ is the architect who designs and builds the building. __init__ is the interior decorator who furnishes it. For most buildings, you only care about the furniture. But if the building itself has weird shape constraints (like “must be made of marble” i.e., immutable), the architect needs explicit instructions.

Copy Semantics

Assignment is binding, not copying

a = [1, 2, 3]
b = a           # b points to the SAME list
b.append(4)
print(a)        # [1, 2, 3, 4] — a is affected

Shallow vs deep copy

import copy

original = [[1, 2], [3, 4]]

shallow = copy.copy(original)    # new list, same inner lists
deep = copy.deepcopy(original)   # new list, new inner lists

shallow[0].append(99)
print(original[0])   # [1, 2, 99] — shallow copy shares inner objects

deep[0].append(99)
print(original[0])   # [1, 2] — deep copy is fully independent

Operation	Creates new outer?	Creates new inner?	Use when
`=`	No	No	Never copies; use when you want a second name for the same object
`copy.copy()`	Yes	No	Flat containers, performance-sensitive
`copy.deepcopy()`	Yes	Yes	Nested mutable structures
`list[:]`, `dict.copy()`	Yes	No	Idiomatic shallow copy
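The table rows can be checked with identity tests. A small sketch; the `copies` dictionary is just a convenient way to loop over the idioms.

```python
import copy

inner = [1, 2]
original = [inner, [3, 4]]

copies = {
    "slice": original[:],
    "list()": list(original),
    "copy.copy": copy.copy(original),
    "copy.deepcopy": copy.deepcopy(original),
}

for name, candidate in copies.items():
    new_outer = candidate is not original
    shares_inner = candidate[0] is inner
    print(f"{name:14} new outer: {new_outer}  shares inner: {shares_inner}")
# Every idiom creates a new outer list; only copy.deepcopy stops sharing the inner one.

settings = {"tags": ["a"]}
clone = settings.copy()                      # shallow, like the list idioms
print(clone["tags"] is settings["tags"])     # True
```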

The mutable default argument trap

# WRONG — default list is created ONCE at function definition time
def add_item(item, lst=[]):
    lst.append(item)
    return lst

add_item(1)  # [1]
add_item(2)  # [1, 2] ← surprise

# RIGHT
def add_item(item, lst=None):
    if lst is None:
        lst = []
    lst.append(item)
    return lst

ELI5: A mutable default argument is like putting a communal notepad in your office — every call shares the same notepad. What the previous caller wrote is still there. Use None and create a fresh notepad inside the function.

Common mistake: This also bites in class definitions — class Foo: items = [] means all instances share the same list unless you assign self.items = [] in __init__.
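A sketch of the class-level version of the trap and two fixes. `SharedBag`, `OwnBag`, and `DataBag` are hypothetical names; `field(default_factory=list)` is the standard `dataclasses` way to get a fresh list per instance.

```python
from dataclasses import dataclass, field


class SharedBag:
    items = []                     # one list, stored on the class

    def add(self, item):
        self.items.append(item)    # mutates the shared class-level list


class OwnBag:
    def __init__(self):
        self.items = []            # a fresh list per instance

    def add(self, item):
        self.items.append(item)


@dataclass
class DataBag:
    items: list = field(default_factory=list)   # factory runs once per instance


a, b = SharedBag(), SharedBag()
a.add(1)
print(b.items)    # [1]: b sees a's item

c, d = OwnBag(), OwnBag()
c.add(1)
print(d.items)    # []

e, f = DataBag(), DataBag()
e.items.append(1)
print(f.items)    # []
```

Dataclasses refuse a bare `items: list = []` with a ValueError at class creation, which catches this mistake early.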

Summary: When to Reach for What

You want to…	Use
Validate attributes on assignment	Data descriptor or `property`
Compute an attribute once and cache it	Non-data descriptor (lazy property)
Save memory for millions of simple instances	`__slots__`
Create an immutable type subclass	Override `__new__`
Make objects work with `+`, `*`, `len()`, `in`	Implement the relevant dunder protocol
Control attribute lookup globally	Override `__getattribute__` (rarely)
Handle missing attributes gracefully	Override `__getattr__`
Debug MRO issues	Print `ClassName.__mro__`
Make objects hashable after defining `__eq__`	Also define `__hash__` using same fields
Avoid shared mutable state in defaults	Use `None` sentinel, create inside function
