Container Storage
Data loss is career-ending. If you don’t understand where your container’s bytes live and what survives a restart vs. what vanishes, you will lose data in production.
The Fundamental Rule
Container writable layers are ephemeral. When a container is removed, its writable layer is deleted. Everything the container wrote to its own filesystem rather than to a mount (logs, uploads, database files) is gone.
```text
Container removed → writable layer deleted → data lost
  volumes       survive ✓
  bind mounts   survive ✓
  tmpfs         gone ✗
```
ELI5: A container’s filesystem is like a whiteboard in a hotel meeting room. You can write whatever you want during your stay, but when you check out, housekeeping erases it. If you want to keep your notes, you need to save them to your own notebook (volume) or the hotel’s filing cabinet (bind mount) before checkout.
Storage Options Compared
| Type | Managed by Docker? | Persists after docker rm? | Shared between containers? | Host path visible? | Use case |
|---|---|---|---|---|---|
| Named Volume | Yes | Yes | Yes | /var/lib/docker/volumes/ | Databases, app state, production data |
| Anonymous Volume | Yes | Only as an orphan (removed by `docker rm -v`, or automatically if the container was started with `--rm`) | No | Auto-generated path | Temporary persistent data |
| Bind Mount | No | Yes (it’s the host filesystem) | Yes | Wherever you point it | Config files, source code, dev workflows |
| tmpfs | No | No (RAM only) | No | Not on disk at all | Secrets, sensitive temp data, /tmp |
Decision framework: Need persistent data? → Named volume. Need to inject host files? → Bind mount. Need in-memory scratch space? → tmpfs. Not sure? → Named volume. It’s the safest default.
Volumes Deep Dive
Named Volumes
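```shell
# Create explicitly
docker volume create mydata

# Or create implicitly on first use
# (the mysql image will not start without a root password setting)
docker run -d -e MYSQL_ROOT_PASSWORD=change-me -v mydata:/var/lib/mysql mysql:8

# Inspect
docker volume inspect mydata
# → "Mountpoint": "/var/lib/docker/volumes/mydata/_data"
```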
Docker manages the lifecycle. The volume survives container removal. Multiple containers can mount the same volume (but concurrent writes need application-level coordination).
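Docker does not lock shared volumes, so the applications have to coordinate their own writes. Below is a minimal sketch of one common approach: take an exclusive POSIX advisory lock (`fcntl.flock`) around each append. It assumes every writer is cooperative Python code and that the volume uses the default `local` driver on a single host. Containers share the host kernel, so the lock is visible across them. Advisory locking on NFS-backed volumes behaves differently, so don't assume this carries over. `append_record` and the file path are hypothetical.

```python
import fcntl
import os
import tempfile


def append_record(path: str, record: str) -> None:
    """Append one line to a file shared between containers, holding an
    exclusive advisory lock so cooperating writers cannot interleave."""
    with open(path, "a", encoding="utf-8") as f:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)  # blocks until no other holder
        try:
            f.write(record + "\n")
            f.flush()
            os.fsync(f.fileno())  # make the write durable before releasing the lock
        finally:
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)


if __name__ == "__main__":
    # Hypothetical stand-in for a file on a shared named volume.
    shared = os.path.join(tempfile.mkdtemp(), "events.log")
    for i in range(3):
        append_record(shared, f"event {i}")
    with open(shared, encoding="utf-8") as f:
        print(f.read().splitlines())  # ['event 0', 'event 1', 'event 2']
```

Advisory locks only order processes that all take the lock. A writer that skips `flock` is not blocked.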
Volume Drivers
Default driver stores data on the local filesystem. Volume drivers extend this to external storage:
| Driver | Backend | Use Case |
|---|---|---|
| local (default) | Host filesystem | Single-node, dev/test |
| local (with `--opt type=nfs`) | NFS server | Shared storage across hosts |
| rexray/ebs | AWS EBS | Persistent block storage in AWS |
| portworx | Portworx cluster | Multi-cloud persistent storage |
| local-persist | Local with custom path | Named volume at specific host location |
```shell
# Create NFS-backed volume
docker volume create --driver local \
  --opt type=nfs \
  --opt o=addr=nfs-server.local,rw \
  --opt device=:/exports/data \
  nfs-data
```
Common mistake: Using node-local storage in a multi-node cluster (the `local` driver in Swarm, `hostPath` or local PersistentVolumes in Kubernetes) and expecting volumes to follow containers across nodes. Local volumes are stuck on one node. Use NFS, cloud block storage, or a distributed storage system.
Bind Mounts
```shell
# Map host directory into container
docker run -v /host/path:/container/path nginx

# or the more explicit --mount syntax:
docker run --mount type=bind,source=/host/path,target=/container/path nginx
```
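Key behaviors:
- If host path exists: container sees host contents (anything the image had at that path is hidden)
- If host path doesn’t exist: with `-v`, Docker creates it as an empty directory (owned by root); with `--mount`, Docker returns an error instead
- Container can modify host files (unless mounted with the `:ro` read-only flag, e.g. `-v /host/path:/container/path:ro`)
- File ownership and permissions are the host's numeric UID/GID: without user-namespace remapping, a container process running as a different UID may be denied access, and files it creates appear on the host owned by that UID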
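The container sees host ownership and the `:ro` flag directly. A service can therefore inspect its mounts at startup and fail with a clear message, instead of crashing on its first write. This is a minimal standard-library sketch. `check_mount` and the paths are hypothetical, and `os.ST_RDONLY` is available only on Unix.

```python
import os
import tempfile


def check_mount(path: str) -> dict:
    """Report how the current process sees a (bind-)mounted directory."""
    st = os.stat(path)
    vfs = os.statvfs(path)
    return {
        # True when the filesystem containing `path` is mounted read-only (e.g. `:ro`)
        "read_only_mount": bool(vfs.f_flag & os.ST_RDONLY),
        # Numeric owner exactly as stored on the host
        "owner_uid": st.st_uid,
        "process_uid": os.getuid(),
        # Whether this process may write here (permission bits and read-only flag)
        "writable": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    # A temp dir stands in for something like /container/path.
    print(check_mount(tempfile.mkdtemp()))
```

Inside a container started with `-v /host/path:/container/path:ro`, `read_only_mount` should be `True` for `/container/path`. A UID mismatch typically shows up as an `owner_uid` that differs from `process_uid` together with `writable: False`.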
ELI5: A bind mount is like a window between two rooms. Both rooms can see and touch the same objects on the windowsill. If someone in Room A (container) moves something, Room B (host) sees the change immediately, and vice versa.
Bind Mount vs Volume: When to Use Which
| Scenario | Use Volume | Use Bind Mount |
|---|---|---|
| Database files | ✅ Docker manages it | ❌ Permission issues, fragile |
| Application config files | ❌ Overkill | ✅ Edit on host, container picks up changes |
| Source code in development | ❌ | ✅ Edits on the host trigger hot reload |
| Log files (need host access) | ❌ | ✅ Host log collection agents can read them |
| Shared data between containers | ✅ Clean, managed | ⚠️ Works but less portable |
| CI/CD build artifacts | ❌ | ✅ Host needs the output |
Common mistake: Using bind mounts for database storage in production. Bind mounts expose raw host paths, making containers non-portable. The host path might not exist on another node, permissions might differ, and there’s no lifecycle management. Use named volumes for any data you care about.
tmpfs Mounts
```shell
docker run --tmpfs /tmp:rw,noexec,nosuid,size=100m myapp
# or
docker run --mount type=tmpfs,target=/run/secrets,tmpfs-size=64m myapp
```
Stored in host memory (pages can be swapped out if the host has swap enabled). Never written to the container's writable layer or to a volume. Gone when the container stops.
When to use:
- Secrets that should never touch disk (API keys, tokens)
- Temporary files that need fast I/O
- `/tmp` for applications that write temp data aggressively
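Why this matters for security: If your container writes secrets to the writable layer, those secrets are stored on the host disk under `/var/lib/docker/`. They stay there for as long as the container exists, which includes stopped containers until you remove or prune them, and `docker commit` captures them. Even after removal, the files are deleted but not securely erased. With tmpfs, secrets exist only in memory and vanish when the container stops.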
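An application can enforce this itself by refusing to write a secret unless the target directory sits on a tmpfs mount. The sketch below is Linux-only and minimal. It parses the text of `/proc/self/mounts`, where the longest matching mount point wins, and it ignores edge cases such as escaped spaces in mount paths. The function names and the `/run/secrets` path are hypothetical.

```python
import os
from typing import Optional


def mount_fstype(path: str, mounts_text: str) -> Optional[str]:
    """Filesystem type of the mount containing `path`, given the text of
    /proc/self/mounts. Escaped characters in paths are not handled."""
    path = os.path.normpath(path)
    best_point, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        # `path` is on this mount if it is the mount point or lies beneath it.
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Longest mount point wins; at equal length the later (stacked) entry wins.
            if len(mount_point) >= len(best_point):
                best_point, best_type = mount_point, fstype
    return best_type


def write_secret(directory: str, name: str, value: bytes, mounts_text: str) -> str:
    """Write `value` to directory/name with mode 0600, refusing anything but tmpfs."""
    directory = os.path.realpath(directory)  # resolve symlinks before checking
    if mount_fstype(directory, mounts_text) != "tmpfs":
        raise RuntimeError(f"{directory} is not on tmpfs; refusing to write a secret there")
    target = os.path.join(directory, name)
    # O_EXCL: fail instead of overwriting; 0o600: owner read/write only (before umask)
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(value)
    return target


if __name__ == "__main__":
    try:
        with open("/proc/self/mounts", encoding="utf-8") as f:
            print("/tmp is on:", mount_fstype("/tmp", f.read()))
    except FileNotFoundError:
        print("/proc/self/mounts not found (not Linux)")
```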
Kubernetes Storage
The Abstraction Stack
```text
PersistentVolumeClaim (PVC)  ← Developer says "I need 10Gi fast storage"
      ↓
PersistentVolume (PV)        ← Admin/dynamic provisioner provides actual storage
      ↓
StorageClass                 ← Defines the type of storage (gp3, io2, nfs, etc.)
      ↓
CSI Driver                   ← Translates K8s requests to storage backend API calls
      ↓
Actual Storage               ← AWS EBS, GCP PD, Azure Disk, NFS, Ceph, etc.
```
ELI5: Imagine ordering food at a restaurant. You (developer) write your order on a slip (PVC: “I want pasta”). The waiter (StorageClass) knows which kitchen (storage backend) makes pasta. The cook in that kitchen (CSI driver) prepares it and delivers a plate (PV). You don’t need to know which kitchen it is or how they cook; you just get your pasta.
PersistentVolume and PVC
```yaml
# PVC: what the developer asks for
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-storage
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: gp3-encrypted
```
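A claim only reserves storage, and a pod still has to reference it by name. The sketch below builds the matching Pod manifest as a plain Python dict and prints it as JSON, which `kubectl apply -f` accepts as well as YAML. `db-storage` is the claim defined above. The pod name, image, mount path and the function `pod_with_claim` are hypothetical.

```python
import json


def pod_with_claim(pod_name: str, image: str, claim_name: str, mount_path: str) -> dict:
    """Build a Pod manifest that mounts an existing PersistentVolumeClaim."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "containers": [
                {
                    "name": "app",
                    "image": image,
                    # Where the claimed volume appears inside the container.
                    "volumeMounts": [{"name": "data", "mountPath": mount_path}],
                }
            ],
            # Pod-level volume entry: binds the name "data" to the PVC.
            "volumes": [
                {"name": "data", "persistentVolumeClaim": {"claimName": claim_name}}
            ],
        },
    }


if __name__ == "__main__":
    manifest = pod_with_claim("app", "nginx:1.27", "db-storage", "/data")
    print(json.dumps(manifest, indent=2))  # e.g. pipe into: kubectl apply -f -
```

The `name` in `volumeMounts` must match an entry in `volumes`. That indirection lets one pod mount several claims at different paths.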
Access Modes
| Mode | Abbreviation | What It Means | Supported By |
|---|---|---|---|
| ReadWriteOnce | RWO | One node can mount read-write | Most block storage (EBS, GCP PD) |
| ReadOnlyMany | ROX | Many nodes mount read-only | NFS, cloud file storage |
| ReadWriteMany | RWX | Many nodes mount read-write | NFS, EFS, CephFS, Portworx |
| ReadWriteOncePod | RWOP | Exactly one pod can mount read-write | CSI volumes only (beta in K8s 1.27, GA in 1.29) |
Common mistake: Requesting RWX access mode with AWS EBS. EBS is block storage — it can only attach to one node at a time (RWO). Use EFS (NFS-based) for RWX in AWS.
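This mismatch can be caught before deployment. The sketch below encodes the access-mode table above as a lookup and flags requests a backend cannot serve. The mapping is a deliberate simplification: it leaves out ReadWriteOncePod and driver-specific features such as multi-attach, so it is not an authoritative support matrix. Check your CSI driver's documentation for the real one. `BACKEND_MODES` and `unsupported_modes` are hypothetical names.

```python
from typing import List

# Illustrative and deliberately incomplete, following the access-mode table.
BACKEND_MODES = {
    "ebs": {"ReadWriteOnce"},  # block device: attaches to one node at a time
    "efs": {"ReadWriteOnce", "ReadOnlyMany", "ReadWriteMany"},  # NFS-based file storage
    "nfs": {"ReadWriteOnce", "ReadOnlyMany", "ReadWriteMany"},
}


def unsupported_modes(backend: str, requested: List[str]) -> List[str]:
    """Return the requested access modes the backend cannot provide.
    Unknown backends support nothing, so every mode is flagged."""
    supported = BACKEND_MODES.get(backend, set())
    return [mode for mode in requested if mode not in supported]


if __name__ == "__main__":
    print(unsupported_modes("ebs", ["ReadWriteMany"]))  # ['ReadWriteMany']
    print(unsupported_modes("efs", ["ReadWriteMany"]))  # []
```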
StorageClass
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
  iops: "3000"
  throughput: "125"
reclaimPolicy: Retain                     # Don't delete the volume when PVC is deleted
volumeBindingMode: WaitForFirstConsumer   # Don't provision until pod is scheduled
allowVolumeExpansion: true                # Allow PVC resize
```
Reclaim Policies
| Policy | What Happens When PVC Is Deleted | Use Case |
|---|---|---|
| Retain | PV and data preserved, manual cleanup needed | Production databases — never auto-delete data |
| Delete | PV and underlying storage deleted | Dev/test — clean up automatically |
| Recycle | rm -rf /volume/* then reuse | Deprecated. Don’t use. |
Think of it this way: Retain = hotel keeps your luggage after checkout until you come back for it. Delete = hotel throws your luggage away when you check out. Recycle = hotel empties your suitcase and gives it to the next guest (gross, deprecated).
Common mistake: Using Delete reclaim policy for production databases. One accidental kubectl delete pvc and your data is gone. Always use Retain for anything you can’t regenerate.
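This mistake is easy to make because a StorageClass that omits `reclaimPolicy` defaults to `Delete`, and dynamically provisioned PVs inherit their class's policy. The sketch below is a pre-deploy check over StorageClass manifests, already parsed into dicts. The function names and the rule "anything not Retain is risky" are assumptions to adapt. A PV that already exists can have its policy switched in place with `kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'`.

```python
from typing import List


def effective_reclaim_policy(storage_class: dict) -> str:
    """A StorageClass without reclaimPolicy defaults to Delete."""
    return storage_class.get("reclaimPolicy", "Delete")


def risky_classes(storage_classes: List[dict]) -> List[str]:
    """Names of StorageClasses whose dynamically provisioned PVs would not
    be retained once their PVC is deleted."""
    return [
        sc["metadata"]["name"]
        for sc in storage_classes
        if effective_reclaim_policy(sc) != "Retain"
    ]


if __name__ == "__main__":
    classes = [
        {"metadata": {"name": "gp3-encrypted"}, "reclaimPolicy": "Retain"},
        {"metadata": {"name": "gp2"}},  # no reclaimPolicy, so Delete
    ]
    print(risky_classes(classes))  # ['gp2']
```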
StatefulSets and Stable Storage
StatefulSets give pods stable identities and persistent storage:
```text
mysql-0 → pvc-mysql-0 → pv-xxx (10Gi on node A)
mysql-1 → pvc-mysql-1 → pv-yyy (10Gi on node B)
mysql-2 → pvc-mysql-2 → pv-zzz (10Gi on node C)
```
Each pod always gets the same PVC/PV, even after rescheduling. This is essential for databases, message queues, and any stateful workload.
Key behaviors:
- With the default `podManagementPolicy: OrderedReady`, pods are created in order (0, 1, 2…) and removed in reverse order when scaling down (…2, 1, 0)
- Each pod gets a stable network identity: `<name>-<ordinal>.<headless-service>`
- By default, PVCs are NOT deleted when the StatefulSet is deleted (data safety)
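These identities are predictable. Each PVC created from a `volumeClaimTemplates` entry is named `<template-name>-<statefulset-name>-<ordinal>`, so `pvc-mysql-0` in the diagram corresponds to a template named `pvc`. The small sketch below derives pod, PVC, and DNS names. The template name `data`, the headless service `mysql-headless` and the namespace are hypothetical. The fully qualified DNS form assumes the default `cluster.local` cluster domain.

```python
from typing import List, Tuple


def statefulset_identities(
    sts_name: str, template: str, service: str, namespace: str, replicas: int
) -> List[Tuple[str, str, str]]:
    """(pod name, PVC name, pod DNS name) for each ordinal of a StatefulSet."""
    rows = []
    for ordinal in range(replicas):
        pod = f"{sts_name}-{ordinal}"
        pvc = f"{template}-{pod}"  # <template>-<statefulset>-<ordinal>
        # Assumes the default cluster domain, cluster.local.
        dns = f"{pod}.{service}.{namespace}.svc.cluster.local"
        rows.append((pod, pvc, dns))
    return rows


if __name__ == "__main__":
    for row in statefulset_identities("mysql", "data", "mysql-headless", "default", 3):
        print(*row)
```

A replacement `mysql-1` pod computes to the same PVC name, so the StatefulSet controller reattaches the existing claim rather than creating a new one.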
Storage Driver Performance
The storage driver (overlay2, etc.) affects container I/O performance for writes to the container’s writable layer.
| Operation | Performance | Why |
|---|---|---|
| Read from image layer | Fast | Direct from host filesystem |
| First write to image file | Slow (CoW) | Entire file copied to writable layer |
| Subsequent writes to same file | Normal | Already in writable layer |
| Write to volume/bind mount | Full speed | Bypasses storage driver entirely |
Why this matters: Databases, log files, uploads — anything write-heavy MUST use volumes, not the container’s writable layer. The copy-on-write penalty for the first write can be huge for large files (think: modifying a 500MB SQLite database copies the entire file).
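A toy model makes the asymmetry concrete. It tracks byte counts only, with no real I/O. It assumes overlay2-style behavior, where the first write to a file from the image layer copies the whole file up into the writable layer. `ToyOverlay` and its methods are invented for illustration.

```python
class ToyOverlay:
    """Toy copy-on-write filesystem: read-only image layer plus a writable layer.
    Tracks file sizes in bytes; no real data is stored."""

    def __init__(self, image_files: dict):
        self.lower = dict(image_files)  # image layer: path -> size
        self.upper = {}                 # writable layer: path -> size
        self.bytes_copied_up = 0

    def write(self, path: str, offset: int, length: int) -> None:
        if path not in self.upper:
            # First write: copy the entire file up from the image layer.
            size = self.lower.get(path, 0)
            self.upper[path] = size
            self.bytes_copied_up += size
        # Later writes touch only the writable-layer copy.
        self.upper[path] = max(self.upper[path], offset + length)


if __name__ == "__main__":
    mib = 1024 * 1024
    fs = ToyOverlay({"/app/data.db": 500 * mib})
    fs.write("/app/data.db", offset=0, length=4096)     # a 4 KiB change...
    print(fs.bytes_copied_up // mib, "MiB copied")      # ...costs 500 MiB of copying
    fs.write("/app/data.db", offset=4096, length=4096)  # already copied up
    print(fs.bytes_copied_up // mib, "MiB copied")      # still 500
```

With a volume, neither write would trigger copy-up: the volume's directory is mounted straight into the container's filesystem tree, outside the overlay layers.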
Backup Strategies
Docker Volumes
```shell
# Backup: run a temp container that mounts the volume and tars it
# (stop containers that write to mydata first, or the archive may be inconsistent)
docker run --rm -v mydata:/data -v "$(pwd)":/backup alpine \
  tar czf /backup/mydata-backup.tar.gz -C /data .

# Restore
docker run --rm -v mydata:/data -v "$(pwd)":/backup alpine \
  sh -c "cd /data && tar xzf /backup/mydata-backup.tar.gz"
```
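The helper container only runs ordinary `tar`, so the archive format can be reasoned about and tested without Docker. The sketch below does the same round trip with Python's `tarfile`. It operates on plain directories that stand in for the volume's mountpoint and the backup target. `backup_dir` and `restore_dir` are hypothetical names.

```python
import os
import tarfile
import tempfile


def backup_dir(src: str, archive: str) -> None:
    """Python equivalent of `tar czf archive -C src .` (gzip, paths relative to src)."""
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname=".")


def restore_dir(archive: str, dest: str) -> None:
    """Python equivalent of `cd dest && tar xzf archive`."""
    os.makedirs(dest, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths, links escaping dest, device files.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)  # older Pythons: only extract archives you trust


if __name__ == "__main__":
    work = tempfile.mkdtemp()
    volume = os.path.join(work, "volume")  # stands in for /data
    os.makedirs(os.path.join(volume, "db"))
    with open(os.path.join(volume, "db", "data.txt"), "w") as f:
        f.write("hello")
    archive = os.path.join(work, "mydata-backup.tar.gz")  # stands in for /backup/...
    backup_dir(volume, archive)
    restore_dir(archive, os.path.join(work, "restored"))
    with open(os.path.join(work, "restored", "db", "data.txt")) as f:
        print(f.read())  # hello
```

The archive is written outside the directory being archived, just as the `docker run` version writes to `/backup` rather than into `/data`. That keeps the backup from trying to include itself.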
Kubernetes PV Backup
| Method | How | Pros | Cons |
|---|---|---|---|
| Volume snapshots | VolumeSnapshot CRD + CSI driver | Fast, storage-level, point-in-time | CSI driver must support it |
| Velero | Backup operator (PVs + K8s resources) | Full cluster backup, scheduled | Adds complexity |
| Application-level | pg_dump, mysqldump, etc. | Consistent, application-aware | Slower, app-specific |
| Storage-level | AWS EBS snapshots, GCP disk snapshots | Fast, infrastructure-native | Cloud-specific |
Decision framework: Application-level backup for consistency (databases). Volume snapshots for speed (everything else). Velero for disaster recovery (full cluster). Never rely on just one method.
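The volume-snapshot route involves two objects. First, a `VolumeSnapshot` points at the PVC. Later, a new PVC names that snapshot in its `dataSource`. The sketch below builds both manifests as dicts for the `snapshot.storage.k8s.io/v1` API. It assumes the snapshot CRDs and snapshot controller are installed and that a `VolumeSnapshotClass` exists; the name `ebs-snapclass` is hypothetical, as are the function names. It reuses the `db-storage` claim and the `gp3-encrypted` StorageClass from earlier.

```python
import json


def snapshot_manifest(name: str, pvc: str, snapshot_class: str) -> dict:
    """VolumeSnapshot of an existing PVC (the CSI driver must support snapshots)."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc},
        },
    }


def restore_pvc_manifest(name: str, snapshot: str, storage_class: str, size: str) -> dict:
    """New PVC pre-populated from a VolumeSnapshot via dataSource."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            # Must be at least the size of the snapshotted volume.
            "resources": {"requests": {"storage": size}},
            "dataSource": {
                "name": snapshot,
                "kind": "VolumeSnapshot",
                "apiGroup": "snapshot.storage.k8s.io",
            },
        },
    }


if __name__ == "__main__":
    snap = snapshot_manifest("db-storage-snap", "db-storage", "ebs-snapclass")
    restored = restore_pvc_manifest("db-storage-restored", "db-storage-snap", "gp3-encrypted", "10Gi")
    # Wrap both in a v1 List so one `kubectl apply -f -` can take them.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [snap, restored]}, indent=2))
```

A snapshot taken under a running database is crash-consistent at best. That is why the decision framework above pairs snapshots with application-level dumps.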
Key Takeaways for Interviews
- “Volume vs bind mount?” → Volumes are Docker-managed, portable, production-ready. Bind mounts are host paths, for dev/config. Use volumes for data you care about.
- “How does K8s persistent storage work?” → PVC requests storage → StorageClass provisions PV via CSI driver → Pod mounts PV. StatefulSets for stable per-pod storage.
- “RWO vs RWX?” → RWO = one node (block storage). RWX = many nodes (file storage like NFS/EFS). Choose based on workload access pattern.
- “What happens when you delete a PVC?” → Depends on reclaim policy. Retain = data preserved. Delete = data gone. Use Retain in production.
- “How do you handle database storage in containers?” → StatefulSet + PVC with Retain policy + volume snapshots for backup + application-level backup for consistency.