Container Storage

Data loss is career-ending. If you don’t understand where your container’s bytes live and what survives a restart vs. what vanishes, you will lose data in production.

The Fundamental Rule

Container writable layers are ephemeral. When a container is removed, its writable layer is deleted. Everything written inside the container (logs, uploads, database files) is gone.

Container removed → writable layer deleted → data lost
                    volumes survive ✓
                    bind mounts survive ✓
                    tmpfs gone ✗

ELI5: A container’s filesystem is like a whiteboard in a hotel meeting room. You can write whatever you want during your stay, but when you check out, housekeeping erases it. If you want to keep your notes, you need to save them to your own notebook (volume) or the hotel’s filing cabinet (bind mount) before checkout.
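The rule is easy to confirm by hand. The following is a minimal sketch, assuming the public `alpine` image; the container name `scratch-demo`, the volume name `notes`, and the `DOCKER` override variable (there only so the commands can be dry-run with `DOCKER=echo`) are illustrative and not part of Docker itself.

```shell
# Sketch: the writable layer dies with the container, a named volume does not.
# scratch-demo and notes are hypothetical names.
demo_ephemeral_layer() {
  docker_bin="${DOCKER:-docker}"

  # 1. Write into the container's writable layer, then remove the container.
  "$docker_bin" run --name scratch-demo alpine sh -c 'echo hello > /note.txt'
  "$docker_bin" rm scratch-demo

  # 2. A fresh container starts from the image again, so /note.txt is gone
  #    and this cat is expected to fail.
  "$docker_bin" run --rm alpine cat /note.txt

  # 3. Same write, but into a named volume mounted at /data.
  "$docker_bin" run --rm -v notes:/data alpine sh -c 'echo hello > /data/note.txt'

  # 4. A different container mounting the same volume still sees the file.
  "$docker_bin" run --rm -v notes:/data alpine cat /data/note.txt
}
```

Calling `demo_ephemeral_layer` on a host with Docker runs the five commands in order; clean up afterwards with `docker volume rm notes`.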

Storage Options Compared

Type	Managed by Docker?	Persists after `docker rm`?	Shared between containers?	Host path visible?	Use case
Named Volume	Yes	Yes	Yes	`/var/lib/docker/volumes/`	Databases, app state, production data
Anonymous Volume	Yes	Yes, unless the container was started with `--rm` or is removed with `docker rm -v`	No	Auto-generated path	Temporary persistent data
Bind Mount	No	Yes (it’s host filesystem)	Yes	Wherever you point it	Config files, source code, dev workflows
tmpfs	No	No (RAM only)	No	Not on disk at all	Secrets, sensitive temp data, /tmp

Decision framework: Need persistent data? → Named volume. Need to inject host files? → Bind mount. Need in-memory scratch space? → tmpfs. Not sure? → Named volume. It’s the safest default.
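As a rough illustration, the framework can be written as a lookup that turns a storage need into the matching `--mount` argument. The need labels (`persistent`, `host-files`, `scratch`) and the function name are invented for this sketch; only the `type=volume`, `type=bind`, and `type=tmpfs` mount specs are Docker syntax.

```shell
# Sketch: map a storage need to a --mount argument.
# Usage: mount_arg_for NEED SOURCE TARGET
mount_arg_for() {
  need="$1"
  src="$2"
  target="$3"
  case "$need" in
    host-files) echo "type=bind,source=$src,target=$target" ;;
    scratch)    echo "type=tmpfs,target=$target" ;;
    # "persistent" and anything unrecognized fall back to a named volume,
    # the safest default.
    *)          echo "type=volume,source=$src,target=$target" ;;
  esac
}
```

For example, `docker run --mount "$(mount_arg_for persistent mydata /var/lib/mysql)" mysql:8` would mount the named volume `mydata`.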

Volumes Deep Dive

Named Volumes

# Create explicitly
docker volume create mydata

# Or create implicitly on first use
docker run -v mydata:/var/lib/mysql mysql:8

# Inspect
docker volume inspect mydata
# → Mountpoint: /var/lib/docker/volumes/mydata/_data

Docker manages the lifecycle. The volume survives container removal. Multiple containers can mount the same volume (but concurrent writes need application-level coordination).
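Because the volume outlives its containers, cleanup is a separate, explicit step. The sketch below is one possible helper, not a Docker feature: it lists the containers (running or stopped) that reference a volume and removes the volume only when none do. The function name and the `DOCKER` override variable are assumptions made for illustration.

```shell
# Sketch: remove a named volume only if no container references it.
# docker volume rm already refuses in-use volumes; the pre-check just
# reports which containers are holding it.
remove_volume_if_unused() {
  volume="$1"
  docker_bin="${DOCKER:-docker}"
  users="$("$docker_bin" ps -a --filter "volume=$volume" --format '{{.Names}}')"
  if [ -n "$users" ]; then
    echo "volume $volume is still used by: $users" >&2
    return 1
  fi
  "$docker_bin" volume rm "$volume"
}
```

Usage: `remove_volume_if_unused mydata`.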

Volume Drivers

Default driver stores data on the local filesystem. Volume drivers extend this to external storage:

Driver	Backend	Use Case
local (default)	Host filesystem	Single-node, dev/test
local with NFS options (`type=nfs`)	NFS server	Shared storage across hosts
rexray/ebs	AWS EBS	Persistent block storage in AWS
portworx	Portworx cluster	Multi-cloud persistent storage
local-persist	Local with custom path	Named volume at specific host location

# Create NFS-backed volume
docker volume create --driver local \
  --opt type=nfs \
  --opt o=addr=nfs-server.local,rw \
  --opt device=:/exports/data \
  nfs-data

Common mistake: Using the local driver in a multi-node Swarm/K8s cluster and expecting volumes to follow containers across nodes. Local volumes are stuck on one node. Use NFS, cloud block storage, or a distributed storage system.

Bind Mounts

# Map host directory into container
docker run -v /host/path:/container/path nginx
# or the more explicit --mount syntax:
docker run --mount type=bind,source=/host/path,target=/container/path nginx

Key behaviors:

If host path exists: container sees host contents
If host path doesn’t exist: `-v` creates it as an empty directory (owned by root); `--mount` refuses to start the container and reports an error instead
Container can modify host files (unless :ro read-only flag)
File permissions follow host UID/GID mapping
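The read-only and UID/GID behaviors can be combined in one invocation. This is a sketch under assumptions: the function name, the `DOCKER` override variable, and the example paths are illustrative, while `--user` and the `readonly` mount option are standard `docker run` features.

```shell
# Sketch: run a container with a read-only bind mount and the caller's
# UID/GID, so any files the process creates on writable mounts are owned
# by the host user rather than root.
# Usage: run_with_config HOST_DIR CONTAINER_DIR IMAGE
run_with_config() {
  host_dir="$1"
  container_dir="$2"
  image="$3"
  # --mount fails fast if HOST_DIR is missing instead of creating it.
  "${DOCKER:-docker}" run --rm \
    --user "$(id -u):$(id -g)" \
    --mount "type=bind,source=$host_dir,target=$container_dir,readonly" \
    "$image"
}
```

Usage: `run_with_config /srv/conf /etc/app myapp`. Note that some images expect to start as root, so `--user` may need to be dropped for those.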

ELI5: A bind mount is like a window between two rooms. Both rooms can see and touch the same objects on the windowsill. If someone in Room A (container) moves something, Room B (host) sees the change immediately, and vice versa.

Bind Mount vs Volume: When to Use Which

Scenario	Use Volume	Use Bind Mount
Database files	✅ Docker manages it	❌ Permission issues, fragile
Application config files	❌ Overkill	✅ Edit on host, container picks up changes
Source code in development	❌	✅ Host edits appear instantly, enabling hot reload
Log files (need host access)	❌	✅ Host log collection agents can read them
Shared data between containers	✅ Clean, managed	⚠️ Works but less portable
CI/CD build artifacts	❌	✅ Host needs the output

Common mistake: Using bind mounts for database storage in production. Bind mounts expose raw host paths, making containers non-portable. The host path might not exist on another node, permissions might differ, and there’s no lifecycle management. Use named volumes for any data you care about.

tmpfs Mounts

docker run --tmpfs /tmp:rw,noexec,nosuid,size=100m myapp
# or
docker run --mount type=tmpfs,target=/run/secrets,tmpfs-size=64m myapp

Stored in host memory (pages may go to swap if the host has any). Never written to the container’s writable layer or to a volume. Gone when the container stops.

When to use:

Secrets that should never touch disk (API keys, tokens)
Temporary files that need fast I/O
/tmp for applications that write temp data aggressively

Why this matters for security: If your container writes secrets to the writable layer, those secrets sit on disk under /var/lib/docker/ and stay there after the container stops, until the container itself is removed. With tmpfs, secrets only exist in memory and vanish when the container stops.
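One way to check that a path really is memory-backed is to look at the mount table from inside the container. The sketch assumes the `alpine` image and reuses the `/run/secrets` path from the example above; the function name and `DOCKER` override variable are illustrative.

```shell
# Sketch: mount a size-capped tmpfs and print its /proc/mounts entry
# from inside the container.
show_tmpfs_mount() {
  "${DOCKER:-docker}" run --rm \
    --tmpfs /run/secrets:rw,noexec,nosuid,size=64m \
    alpine grep ' /run/secrets ' /proc/mounts
}
```

The printed line should show `tmpfs` as the filesystem type along with the `noexec` and `nosuid` options.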

Kubernetes Storage

The Abstraction Stack

PersistentVolumeClaim (PVC)  ← Developer says "I need 10Gi fast storage"
         ↓
PersistentVolume (PV)        ← Admin/dynamic provisioner provides actual storage
         ↓
StorageClass                 ← Defines the type of storage (gp3, io2, nfs, etc.)
         ↓
CSI Driver                   ← Translates K8s requests to storage backend API calls
         ↓
Actual Storage               ← AWS EBS, GCP PD, Azure Disk, NFS, Ceph, etc.

ELI5: Imagine ordering food at a restaurant. You (developer) write your order on a slip (PVC: “I want pasta”). The waiter (StorageClass) knows which kitchen (storage backend) makes pasta. The cook (CSI driver) actually prepares it and sends out a plate (PV). You don’t need to know which kitchen or how they cook; you just get your pasta.

PersistentVolume and PVC

# PVC — what the developer asks for
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-storage
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: gp3-encrypted
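A claim does nothing until a pod uses it. The sketch below prints a throwaway Pod manifest that mounts the `db-storage` claim; the pod name `claim-test`, the `busybox` image, and the `/data` mount path are assumptions chosen for illustration.

```shell
# Sketch: print a minimal Pod that mounts the db-storage claim.
render_claim_test_pod() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: claim-test
spec:
  containers:
    - name: shell
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data            # must match the volume name below
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: db-storage   # the PVC defined above
EOF
}
```

Usage: `render_claim_test_pod | kubectl apply -f -`, then `kubectl get pvc db-storage` to watch the claim move from Pending to Bound.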

Access Modes

Mode	Abbreviation	What It Means	Supported By
ReadWriteOnce	RWO	One node can mount read-write	Most block storage (EBS, GCP PD)
ReadOnlyMany	ROX	Many nodes mount read-only	NFS, cloud file storage
ReadWriteMany	RWX	Many nodes mount read-write	NFS, EFS, CephFS, Portworx
ReadWriteOncePod	RWOP	Exactly one pod can mount read-write	CSI drivers only (beta in K8s 1.27, stable in 1.29)

Common mistake: Requesting RWX access mode with AWS EBS. EBS is block storage — it can only attach to one node at a time (RWO). Use EFS (NFS-based) for RWX in AWS.

StorageClass

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
  iops: "3000"
  throughput: "125"
reclaimPolicy: Retain      # Don't delete the volume when PVC is deleted
volumeBindingMode: WaitForFirstConsumer  # Don't provision until pod is scheduled
allowVolumeExpansion: true  # Allow PVC resize

Reclaim Policies

Policy	What Happens When PVC Is Deleted	Use Case
Retain	PV and data preserved, manual cleanup needed	Production databases — never auto-delete data
Delete	PV and underlying storage deleted	Dev/test — clean up automatically
Recycle	`rm -rf /volume/*` then reuse	Deprecated. Don’t use.

Think of it this way: Retain = hotel keeps your luggage after checkout until you come back for it. Delete = hotel throws your luggage away when you check out. Recycle = hotel empties your suitcase and gives it to the next guest (gross, deprecated).

Common mistake: Using Delete reclaim policy for production databases. One accidental kubectl delete pvc and your data is gone. Always use Retain for anything you can’t regenerate.
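A PV copies its reclaim policy from the StorageClass when it is provisioned, so editing the class later does not protect volumes that already exist. The sketch below patches an existing PV directly; the function names and the `KUBECTL` override variable are illustrative, while the `kubectl` subcommands and the `persistentVolumeReclaimPolicy` field are standard.

```shell
# Sketch: look up the PV bound to a claim, then switch that PV to Retain.
pv_for_claim() {
  "${KUBECTL:-kubectl}" get pvc "$1" -o jsonpath='{.spec.volumeName}'
}

retain_pv() {
  pv_name="$1"
  "${KUBECTL:-kubectl}" patch pv "$pv_name" \
    -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
}
```

Usage: `retain_pv "$(pv_for_claim db-storage)"`, followed by `kubectl get pv` to confirm the RECLAIM POLICY column.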

StatefulSets and Stable Storage

StatefulSets give pods stable identities and persistent storage:

mysql-0 → pvc-mysql-0 → pv-xxx (10Gi on node A)
mysql-1 → pvc-mysql-1 → pv-yyy (10Gi on node B)  
mysql-2 → pvc-mysql-2 → pv-zzz (10Gi on node C)

Each pod always gets the same PVC/PV, even after rescheduling. This is essential for databases, message queues, and any stateful workload.

Key behaviors:

Pods are created/deleted in order (0, 1, 2… and 2, 1, 0)
Each pod gets a stable network identity: <name>-<ordinal>.<headless-service>
PVCs are NOT deleted by default when the StatefulSet is deleted or scaled down (data safety)
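The per-pod claims come from `volumeClaimTemplates`. The sketch below prints a three-replica manifest; the headless Service `mysql`, the Secret `mysql-root`, and the template name `data` are assumptions, and the `gp3-encrypted` class is the one from the StorageClass example. The names in the diagram above are schematic: with a template called `data`, the generated claims are `data-mysql-0`, `data-mysql-1`, and `data-mysql-2`.

```shell
# Sketch: print a StatefulSet whose pods each get their own 10Gi claim.
render_mysql_statefulset() {
  cat <<'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql              # headless Service, assumed to exist
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-root   # hypothetical Secret
                  key: password
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
  volumeClaimTemplates:
    - metadata:
        name: data                # claims become data-mysql-0, -1, -2
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gp3-encrypted
        resources:
          requests:
            storage: 10Gi
EOF
}
```

Usage: `render_mysql_statefulset | kubectl apply -f -`, then `kubectl get pvc` to see one claim per pod.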

Storage Driver Performance

The storage driver (overlay2, etc.) affects container I/O performance for writes to the container’s writable layer.

Operation	Performance	Why
Read from image layer	Fast	Direct from host filesystem
First write to image file	Slow (CoW)	Entire file copied to writable layer
Subsequent writes to same file	Normal	Already in writable layer
Write to volume/bind mount	Full speed	Bypasses storage driver entirely

Why this matters: Databases, log files, uploads — anything write-heavy MUST use volumes, not the container’s writable layer. The copy-on-write penalty for the first write can be huge for large files (think: modifying a 500MB SQLite database copies the entire file).
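To see how much a container has written to its writable layer, `docker diff` and `docker ps -s` are enough. The helper below is a sketch; the function name and the `DOCKER` override variable are illustrative.

```shell
# Sketch: report what a container has written to its writable layer.
writable_layer_report() {
  container="$1"
  docker_bin="${DOCKER:-docker}"
  # Paths added (A), changed (C), or deleted (D) relative to the image.
  "$docker_bin" diff "$container"
  # SIZE column: data in the writable layer, with the virtual size
  # (image plus writable layer) in parentheses.
  "$docker_bin" ps -a -s --filter "name=$container"
}
```

Usage: `writable_layer_report web`. Paths that live on a volume or bind mount do not appear in the `docker diff` output, which is a quick check that write-heavy data is bypassing the storage driver.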

Backup Strategies

Docker Volumes

# Backup: run a temp container that mounts the volume read-only and tars it
docker run --rm -v mydata:/data:ro -v "$(pwd)":/backup alpine \
  tar czf /backup/mydata-backup.tar.gz -C /data .

# Restore: unpack the archive into the volume
docker run --rm -v mydata:/data -v "$(pwd)":/backup alpine \
  tar xzf /backup/mydata-backup.tar.gz -C /data
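A backup that has never been read back is only a hope. The sketch below is a minimal sanity check on the archive produced by the backup command; the function name is illustrative, and it checks only that the archive is readable, not that the data inside is application-consistent.

```shell
# Sketch: sanity-check a backup archive before trusting it.
verify_backup() {
  archive="$1"
  if [ ! -s "$archive" ]; then
    echo "missing or empty: $archive" >&2
    return 1
  fi
  # tar exits non-zero if the archive is truncated or not valid gzip/tar.
  listing="$(tar tzf "$archive")" || return 1
  count="$(printf '%s\n' "$listing" | wc -l | tr -d ' ')"
  echo "ok: $count entries in $archive"
}
```

Usage: `verify_backup mydata-backup.tar.gz` right after the backup step, before any old copies are rotated out.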

Kubernetes PV Backup

Method	How	Pros	Cons
Volume snapshots	VolumeSnapshot CRD + CSI driver	Fast, storage-level, point-in-time	CSI driver must support it
Velero	Backup operator (PVs + K8s resources)	Full cluster backup, scheduled	Adds complexity
Application-level	pg_dump, mysqldump, etc.	Consistent, application-aware	Slower, app-specific
Storage-level	AWS EBS snapshots, GCP disk snapshots	Fast, infrastructure-native	Cloud-specific

Decision framework: Application-level backup for consistency (databases). Volume snapshots for speed (everything else). Velero for disaster recovery (full cluster). Never rely on just one method.
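For the volume snapshot route, the request is a small manifest. The sketch below prints one for a given claim; the function name, the `-snap` naming suffix, and the snapshot class name in the usage line are assumptions, and the cluster must have the snapshot CRDs and a CSI driver that supports snapshots installed.

```shell
# Sketch: print a VolumeSnapshot request for a PVC.
# Usage: render_volume_snapshot CLAIM SNAPSHOT_CLASS
render_volume_snapshot() {
  claim="$1"
  snapshot_class="$2"
  cat <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: ${claim}-snap
spec:
  volumeSnapshotClassName: ${snapshot_class}
  source:
    persistentVolumeClaimName: ${claim}
EOF
}
```

Usage: `render_volume_snapshot db-storage csi-snapclass | kubectl apply -f -`, where `csi-snapclass` stands in for whatever VolumeSnapshotClass the cluster provides.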

Key Takeaways for Interviews

“Volume vs bind mount?” → Volumes are Docker-managed, portable, production-ready. Bind mounts are host paths, for dev/config. Use volumes for data you care about.
“How does K8s persistent storage work?” → PVC requests storage → StorageClass provisions a PV via the CSI driver → PVC binds to the PV → Pod mounts the PVC. StatefulSets for stable per-pod storage.
“RWO vs RWX?” → RWO = one node (block storage). RWX = many nodes (file storage like NFS/EFS). Choose based on workload access pattern.
“What happens when you delete a PVC?” → Depends on reclaim policy. Retain = data preserved. Delete = data gone. Use Retain in production.
“How do you handle database storage in containers?” → StatefulSet + PVC with Retain policy + volume snapshots for backup + application-level backup for consistency.
