Container Storage

Data loss is career-ending. If you don’t understand where your container’s bytes live and what survives a restart vs. what vanishes, you will lose data in production.


The Fundamental Rule

Container writable layers are ephemeral. When a container is removed, its writable layer is deleted. Everything written to the container's own filesystem rather than to a mount (logs, uploads, database files) is gone.

Container removed → writable layer deleted → data lost
                    volumes survive ✓
                    bind mounts survive ✓
                    tmpfs gone ✗

ELI5: A container’s filesystem is like a whiteboard in a hotel meeting room. You can write whatever you want during your stay, but when you check out, housekeeping erases it. If you want to keep your notes, you need to save them before checkout. You can use the hotel’s filing cabinet, which the hotel manages for you (named volume). Or you can use your own notebook that you brought from home (bind mount, a host path you control).


Storage Options Compared

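| Type | Managed by Docker? | Persists after docker rm? | Shared between containers? | Host path visible? | Use case |
| --- | --- | --- | --- | --- | --- |
| Named Volume | Yes | Yes | Yes | /var/lib/docker/volumes/<name>/_data | Databases, app state, production data |
| Anonymous Volume | Yes | Yes, unless removed with docker rm -v or the container was started with --rm | Rarely (only via --volumes-from) | Random ID under /var/lib/docker/volumes/ | Temporary persistent data |
| Bind Mount | No | Yes (it’s the host filesystem) | Yes | Wherever you point it | Config files, source code, dev workflows |
| tmpfs | No | No (memory only; gone when the container stops) | No | Not on disk at all | Secrets, sensitive temp data, /tmp |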

Decision framework: Need persistent data? → Named volume. Need to inject host files? → Bind mount. Need in-memory scratch space? → tmpfs. Not sure? → Named volume. It’s the safest default.
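The decision framework reads naturally as a short, ordered series of checks where the first match wins. Below is a minimal Python sketch. `choose_storage` and its keyword flags are hypothetical names for illustration, not part of any Docker API.

```python
from enum import Enum


class Storage(Enum):
    NAMED_VOLUME = "named volume"
    BIND_MOUNT = "bind mount"
    TMPFS = "tmpfs"


def choose_storage(*, persistent: bool = False,
                   inject_host_files: bool = False,
                   in_memory_scratch: bool = False) -> Storage:
    """Apply the decision framework top to bottom; the first match wins."""
    if persistent:
        return Storage.NAMED_VOLUME
    if inject_host_files:
        return Storage.BIND_MOUNT
    if in_memory_scratch:
        return Storage.TMPFS
    # Not sure? A named volume is the safest default.
    return Storage.NAMED_VOLUME


if __name__ == "__main__":
    print(choose_storage(inject_host_files=True).value)  # bind mount
```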


Volumes Deep Dive

Named Volumes

# Create explicitly
docker volume create mydata

# Or create implicitly on first use (the mysql image exits unless a root-password variable is set)
docker run -d -e MYSQL_ROOT_PASSWORD=changeme -v mydata:/var/lib/mysql mysql:8

# Inspect
docker volume inspect mydata
# → Mountpoint: /var/lib/docker/volumes/mydata/_data

Docker manages the lifecycle. The volume survives container removal. Multiple containers can mount the same volume (but concurrent writes need application-level coordination).
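Scripts that need a volume's host path can read it from `docker volume inspect`. That command prints a JSON array with one object per volume; `docker volume inspect --format '{{ .Mountpoint }}' mydata` also works for a single value. The sketch below assumes the Docker CLI is on PATH. `parse_mountpoints` and `volume_mountpoint` are hypothetical helpers, and they rely only on the `Name` and `Mountpoint` fields.

```python
import json
import subprocess
from typing import Dict


def parse_mountpoints(inspect_json: str) -> Dict[str, str]:
    """Map volume name -> host Mountpoint from `docker volume inspect` output."""
    return {vol["Name"]: vol["Mountpoint"] for vol in json.loads(inspect_json)}


def volume_mountpoint(name: str) -> str:
    """Ask the Docker CLI (assumed to be on PATH) where one volume lives on the host."""
    result = subprocess.run(
        ["docker", "volume", "inspect", name],
        check=True, capture_output=True, text=True,
    )
    return parse_mountpoints(result.stdout)[name]


if __name__ == "__main__":
    # Trimmed sample of the JSON shape; real output carries more fields.
    sample = '[{"Driver": "local", "Name": "mydata", "Mountpoint": "/var/lib/docker/volumes/mydata/_data"}]'
    print(parse_mountpoints(sample)["mydata"])
```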

Volume Drivers

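The built-in local driver stores data on the host’s filesystem. It can also mount NFS and other filesystems through mount options. Third-party volume plugins, installed separately, extend this to other external storage:

| Driver | Backend | Use Case |
| --- | --- | --- |
| local (default) | Host filesystem | Single-node, dev/test |
| local with --opt type=nfs | NFS server | Shared storage across hosts |
| rexray/ebs (plugin) | AWS EBS | Persistent block storage in AWS |
| portworx (plugin) | Portworx cluster | Multi-cloud persistent storage |
| local-persist (plugin) | Local disk, custom path | Named volume at a specific host location |
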
# Create NFS-backed volume
docker volume create --driver local \
  --opt type=nfs \
  --opt o=addr=nfs-server.local,rw \
  --opt device=:/exports/data \
  nfs-data
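NFS support comes from the built-in local driver, so the command needs just three --opt values:

- the filesystem type;
- the mount options, including the server address;
- the exported path, prefixed with a colon.

The hypothetical helper below assembles that argv so the mapping is explicit. It only builds the list; it does not run Docker.

```python
import shlex
from typing import List


def nfs_volume_create_args(name: str, server: str, export: str,
                           read_only: bool = False) -> List[str]:
    """Build the `docker volume create` argv for an NFS-backed local-driver volume."""
    mode = "ro" if read_only else "rw"
    return [
        "docker", "volume", "create", "--driver", "local",
        "--opt", "type=nfs",                  # filesystem type passed to mount
        "--opt", f"o=addr={server},{mode}",   # mount options: NFS server + access mode
        "--opt", f"device=:{export}",         # exported path, prefixed with ':'
        name,
    ]


if __name__ == "__main__":
    args = nfs_volume_create_args("nfs-data", "nfs-server.local", "/exports/data")
    print(shlex.join(args))  # pass `args` to subprocess.run() to actually create it
```

Returning a list rather than a shell string avoids quoting problems when the list is later handed to `subprocess.run`.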

Common mistake: Using the local driver in a multi-node Swarm cluster and expecting volumes to follow containers across nodes. Local volumes are stuck on one node. The same is true of hostPath and local volumes in Kubernetes. Use NFS, cloud block storage, or a distributed storage system.


Bind Mounts

# Map host directory into container
docker run -v /host/path:/container/path nginx
# or, with the more explicit --mount syntax:
docker run --mount type=bind,source=/host/path,target=/container/path nginx

Key behaviors:

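  • If the host path exists: the container sees the host’s contents, and anything the image had at the container path is hidden
  • If the host path doesn’t exist: with -v, Docker creates it as an empty directory (owned by root); with --mount, docker run fails with an error
  • The container can modify host files (unless mounted read-only with :ro, or readonly with --mount)
  • Permissions are checked against numeric UIDs/GIDs: without user-namespace remapping, a container process running as UID 1000 has exactly the access the host grants UID 1000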
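Permission problems with bind mounts usually come down to numeric IDs: the kernel compares the container process's UID/GID with the host file's owner and mode bits. The Python check below is simplified and rests on several assumptions:

- there is no user-namespace remapping;
- ACLs, supplementary groups, SELinux/AppArmor, and read-only mounts are ignored.

`container_can_write` is a hypothetical helper.

```python
import os
import stat
import tempfile


def container_can_write(host_path: str, uid: int, gid: int) -> bool:
    """Simplified: may a process with this numeric UID/GID write to host_path?

    Applies only the classic owner/group/other mode bits. Assumes no user-namespace
    remapping and ignores ACLs, supplementary groups, SELinux/AppArmor, and :ro.
    """
    if uid == 0:
        return True  # root bypasses mode bits in this simplified model
    st = os.stat(host_path)
    is_dir = stat.S_ISDIR(st.st_mode)
    # Directories also need the execute (search) bit to create files inside them.
    if st.st_uid == uid:
        need = stat.S_IWUSR | (stat.S_IXUSR if is_dir else 0)
    elif st.st_gid == gid:
        need = stat.S_IWGRP | (stat.S_IXGRP if is_dir else 0)
    else:
        need = stat.S_IWOTH | (stat.S_IXOTH if is_dir else 0)
    return (st.st_mode & need) == need


if __name__ == "__main__":
    host_dir = tempfile.mkdtemp()
    os.chmod(host_dir, 0o755)
    # Result depends on who owns host_dir on this machine.
    print(container_can_write(host_dir, uid=1000, gid=1000))
```

A typical fix is to chown the host directory to the UID the image runs as, rather than widening permissions with 0o777.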

ELI5: A bind mount is like a window between two rooms. Both rooms can see and touch the same objects on the windowsill. If someone in Room A (container) moves something, Room B (host) sees the change immediately, and vice versa.

Bind Mount vs Volume: When to Use Which

| Scenario | Use Volume | Use Bind Mount |
| --- | --- | --- |
| Database files | ✅ Docker manages it | ❌ Permission issues, fragile |
| Application config files | ❌ Overkill | ✅ Edit on host; the container sees the change immediately (the app may still need a reload) |
| Source code in development | | ✅ Edits show up instantly, enabling hot reload |
| Log files (need host access) | | ✅ Host log collection agents can read them |
| Shared data between containers | ✅ Clean, managed | ⚠️ Works but less portable |
| CI/CD build artifacts | | ✅ Host needs the output |

Common mistake: Using bind mounts for database storage in production. Bind mounts expose raw host paths, making containers non-portable. The host path might not exist on another node, permissions might differ, and there’s no lifecycle management. Use named volumes for any data you care about.


tmpfs Mounts

docker run --tmpfs /tmp:rw,noexec,nosuid,size=100m myapp
# or
docker run --mount type=tmpfs,target=/run/secrets,tmpfs-size=64m myapp

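Stored in memory. If the host has swap enabled, the kernel can still page it out to disk. Never written to the container’s writable layer or to a volume. Gone when the container stops. tmpfs mounts are only available on Linux hosts.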

When to use:

  • Secrets that should never touch disk (API keys, tokens)
  • Temporary files that need fast I/O
  • /tmp for applications that write temp data aggressively

Why this matters for security: If your container writes secrets to the writable layer, those secrets sit on disk under /var/lib/docker/ for as long as the container exists, including while it is stopped. They are also captured by docker commit or docker export. With tmpfs, secrets stay in memory and vanish when the container stops.
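Inside a running Linux container, you can confirm that a secrets directory really is tmpfs before writing to it. Read /proc/mounts, whose lines have the form `device mountpoint fstype options dump pass`. The helper names below are hypothetical. In this sketch the longest matching mount point wins, and escaped characters in paths (such as \040 for a space) are not decoded.

```python
import os
from typing import Optional


def mount_fstype(path: str, mounts_text: str) -> Optional[str]:
    """Return the fstype of the mount covering `path`, given /proc/mounts-style text.

    The longest matching mount point wins; on ties the later line wins, since a
    later mount on the same point shadows the earlier one.
    """
    best_len, best_type = -1, None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if len(mount_point) >= best_len:
                best_len, best_type = len(mount_point), fstype
    return best_type


def is_on_tmpfs(path: str) -> bool:
    """Linux only: check the live mount table of the current process."""
    with open("/proc/mounts") as f:
        return mount_fstype(os.path.realpath(path), f.read()) == "tmpfs"
```

An application could call `is_on_tmpfs("/run/secrets")` at startup and refuse to write credentials if it returns False.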


Kubernetes Storage

The Abstraction Stack

PersistentVolumeClaim (PVC)  ← Developer says "I need 10Gi fast storage"
         ↓
PersistentVolume (PV)        ← Admin/dynamic provisioner provides actual storage
         ↓
StorageClass                 ← Defines the type of storage (gp3, io2, nfs, etc.)
         ↓
CSI Driver                   ← Translates K8s requests to storage backend API calls
         ↓
Actual Storage               ← AWS EBS, GCP PD, Azure Disk, NFS, Ceph, etc.

ELI5: Imagine ordering food at a restaurant. You (developer) write your order on a slip (PVC: “I want pasta”). The waiter (StorageClass) knows which kitchen (storage backend) makes pasta. A cook in that kitchen (CSI driver) prepares it and delivers a plate (PV). You don’t need to know which kitchen or how they cook; you just get your pasta.

PersistentVolume and PVC

# PVC — what the developer asks for
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-storage
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: gp3-encrypted

Access Modes

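| Mode | Abbreviation | What It Means | Supported By |
| --- | --- | --- | --- |
| ReadWriteOnce | RWO | Mounted read-write by a single node (pods on that node can share it) | Most block storage (EBS, GCP PD) |
| ReadOnlyMany | ROX | Mounted read-only by many nodes | NFS, cloud file storage |
| ReadWriteMany | RWX | Mounted read-write by many nodes | NFS, EFS, CephFS, Portworx |
| ReadWriteOncePod | RWOP | Mounted read-write by exactly one pod in the cluster | CSI volumes only (beta in K8s 1.27, GA in 1.29) |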

Common mistake: Requesting RWX access mode with AWS EBS. EBS is block storage — it can only attach to one node at a time (RWO). Use EFS (NFS-based) for RWX in AWS.
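The RWX-on-EBS mistake can be caught before a manifest is applied by comparing the requested modes against what the backend supports. The capability map below is illustrative only and loosely follows the table above; check your CSI driver's documentation for the real list. `unsupported_modes` is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Set

# Illustrative capabilities only (abbreviations as in the access-modes table);
# consult your CSI driver's documentation for the authoritative list.
SUPPORTED_MODES: Dict[str, Set[str]] = {
    "ebs": {"RWO", "RWOP"},          # block storage: one node at a time
    "efs": {"RWO", "ROX", "RWX"},    # NFS-based file storage
    "nfs": {"RWO", "ROX", "RWX"},
}


def unsupported_modes(backend: str, requested: Iterable[str]) -> List[str]:
    """Return the requested access modes the backend cannot satisfy (empty = OK)."""
    supported = SUPPORTED_MODES[backend]
    return [mode for mode in requested if mode not in supported]


if __name__ == "__main__":
    for backend in ("ebs", "efs"):
        bad = unsupported_modes(backend, ["RWX"])
        print(backend, "OK" if not bad else f"cannot provide {bad}")
```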

StorageClass

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
  iops: "3000"
  throughput: "125"
reclaimPolicy: Retain      # Don't delete the volume when PVC is deleted
volumeBindingMode: WaitForFirstConsumer  # Don't provision until a pod using the PVC is scheduled (volume lands in that pod's zone)
allowVolumeExpansion: true  # Allow PVC resize

Reclaim Policies

| Policy | What Happens When the PVC Is Deleted | Use Case |
| --- | --- | --- |
| Retain | PV moves to Released; data is preserved; cleanup or re-binding is manual | Production databases: never auto-delete data |
| Delete | PV and the underlying storage are deleted | Dev/test: clean up automatically |
| Recycle | Basic scrub (rm -rf /volume/*), then the PV is reused | Deprecated. Don’t use. |

Think of it this way: Retain = hotel keeps your luggage after checkout until you come back for it. Delete = hotel throws your luggage away when you check out. Recycle = hotel empties your suitcase and gives it to the next guest (gross, deprecated).

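Common mistake: Using the Delete reclaim policy for production databases. Dynamically provisioned PVs inherit their StorageClass’s reclaimPolicy, which defaults to Delete. One accidental kubectl delete pvc and your data is gone. Always use Retain for anything you can’t regenerate.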
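The difference between the two live policies fits in a few lines. The toy model below is not the Kubernetes API; the class and function are hypothetical. It mirrors the documented behavior:

- Retain leaves the PV in the Released phase with its data intact.
- Delete removes both the PV object and the backing storage.

Recycle is deliberately rejected because it is deprecated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersistentVolume:
    name: str
    reclaim_policy: str          # "Retain" or "Delete"
    phase: str = "Bound"         # bound to a PVC
    storage_exists: bool = True  # does the backing disk/share still exist?


def on_pvc_deleted(pv: PersistentVolume) -> Optional[PersistentVolume]:
    """Toy model of what happens to a bound PV when its PVC is deleted.

    Returns the PV if the object survives, or None if it was deleted.
    """
    if pv.reclaim_policy == "Retain":
        pv.phase = "Released"      # data kept; an admin must clean up or re-bind it
        return pv
    if pv.reclaim_policy == "Delete":
        pv.storage_exists = False  # backing storage deleted...
        return None                # ...and the PV object is removed too
    raise ValueError(f"unsupported reclaim policy: {pv.reclaim_policy!r}")
```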

StatefulSets and Stable Storage

StatefulSets give pods stable identities and persistent storage:

mysql-0 → pvc-mysql-0 → pv-xxx (10Gi on node A)
mysql-1 → pvc-mysql-1 → pv-yyy (10Gi on node B)  
mysql-2 → pvc-mysql-2 → pv-zzz (10Gi on node C)

Each pod always gets the same PVC/PV, even after rescheduling. This is essential for databases, message queues, and any stateful workload.

Key behaviors:

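  • With the default OrderedReady pod management policy, pods are created in order (0, 1, 2…) and removed in reverse order (…2, 1, 0) when scaling down
  • Each pod gets a stable network identity: <name>-<ordinal>.<headless-service>
  • By default, PVCs are NOT deleted when the StatefulSet is deleted or scaled down (data safety)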
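The stable names are deterministic, which is what lets a rescheduled pod find its old claim. Kubernetes names each claim `<template>-<statefulset>-<ordinal>`. The sketch below derives those names, assuming a volumeClaimTemplate named `data` and the default `cluster.local` cluster domain (both assumptions). The function names are hypothetical.

```python
from typing import Iterator, List, Tuple


def statefulset_identities(sts: str, service: str, namespace: str,
                           claim_template: str, replicas: int,
                           cluster_domain: str = "cluster.local") -> Iterator[Tuple[str, str, str]]:
    """Yield (pod name, PVC name, pod DNS name) per ordinal, in creation order."""
    for ordinal in range(replicas):
        pod = f"{sts}-{ordinal}"
        pvc = f"{claim_template}-{pod}"   # <template>-<statefulset>-<ordinal>
        dns = f"{pod}.{service}.{namespace}.svc.{cluster_domain}"
        yield pod, pvc, dns


def scale_down_order(current: int, target: int) -> List[int]:
    """Ordinals removed, highest first, when scaling from `current` to `target` replicas."""
    return list(range(current - 1, target - 1, -1))


if __name__ == "__main__":
    for pod, pvc, dns in statefulset_identities("mysql", "mysql", "default", "data", 3):
        print(pod, pvc, dns)
```

The PVC name depends only on the template, the StatefulSet name, and the ordinal. So a recreated mysql-1 looks up data-mysql-1 and reattaches the same volume.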

Storage Driver Performance

The storage driver (overlay2, etc.) affects container I/O performance for writes to the container’s writable layer.

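| Operation | Performance | Why |
| --- | --- | --- |
| Read from image layer | Fast | Served directly from the layer’s files on the host filesystem |
| First write to an image file | Slow (copy-on-write) | Entire file is copied up to the writable layer |
| Subsequent writes to the same file | Normal | File already lives in the writable layer |
| Write to volume/bind mount | Full host-filesystem speed | Bypasses the storage driver entirely |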

Why this matters: Databases, log files, uploads, and anything else write-heavy MUST use volumes, not the container’s writable layer. The copy-on-write penalty for the first write can be huge for large files. Think of a 500MB SQLite database that shipped in the image: the first write to it copies the entire 500MB file into the writable layer.
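Copy-up is easy to see in a toy model. The class below is a simplified simulation of a two-layer overlay, not how overlay2 is actually implemented. The first write to a file that exists only in the image layer copies the whole file up; later writes to that file cost nothing extra. `OverlaySim` is a hypothetical name.

```python
from typing import Dict


class OverlaySim:
    """Toy model of a read-only image layer plus a writable container layer."""

    def __init__(self, image_layer: Dict[str, bytes]) -> None:
        self.lower = dict(image_layer)            # read-only image files
        self.upper: Dict[str, bytearray] = {}     # container's writable layer
        self.bytes_copied_up = 0

    def read(self, path: str) -> bytes:
        # Reads come from the writable layer if the file was copied up, else the image.
        if path in self.upper:
            return bytes(self.upper[path])
        return self.lower[path]

    def write(self, path: str, offset: int, data: bytes) -> None:
        if path not in self.upper:
            # Copy-up: the WHOLE file is copied before even one byte changes.
            original = self.lower.get(path, b"")
            self.upper[path] = bytearray(original)
            self.bytes_copied_up += len(original)
        buf = self.upper[path]
        if offset > len(buf):
            buf.extend(b"\0" * (offset - len(buf)))  # writing past the end: zero-fill the gap
        buf[offset:offset + len(data)] = data


if __name__ == "__main__":
    fs = OverlaySim({"/app/data.sqlite": b"\0" * 500_000})
    fs.write("/app/data.sqlite", 0, b"x")   # first write: copies 500,000 bytes
    fs.write("/app/data.sqlite", 1, b"y")   # second write: no further copying
    print(fs.bytes_copied_up)               # 500000
```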


Backup Strategies

Docker Volumes

# Backup: run a temp container that mounts the volume read-only and tars it
# (stop containers that write to the volume first, or the archive may be inconsistent)
docker run --rm -v mydata:/data:ro -v "$(pwd)":/backup alpine \
  tar czf /backup/mydata-backup.tar.gz -C /data .

# Restore
docker run --rm -v mydata:/data -v "$(pwd)":/backup alpine \
  sh -c "cd /data && tar xzf /backup/mydata-backup.tar.gz"
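For scripting outside a container, Python's standard tarfile module produces the same kind of archive as `tar czf ... -C /data .`. The helper names below are hypothetical. The sketch applies the `data` extraction filter when the running Python supports it; that filter rejects absolute paths and members that would escape the destination.

```python
import tarfile
import tempfile
from pathlib import Path


def backup_dir(src: str, archive: str) -> None:
    """Like `tar czf ARCHIVE -C SRC .`: gzip-compressed, member paths relative to SRC."""
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname=".")


def restore_dir(archive: str, dest: str) -> None:
    """Like `cd DEST && tar xzf ARCHIVE`."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")  # refuse unsafe members
        else:
            tar.extractall(dest)  # older Pythons without extraction filters


if __name__ == "__main__":
    src = Path(tempfile.mkdtemp())
    (src / "notes.txt").write_text("keep me")
    archive = str(Path(tempfile.mkdtemp()) / "mydata-backup.tar.gz")
    backup_dir(str(src), archive)
    restore_dir(archive, str(Path(tempfile.mkdtemp()) / "restored"))
```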

Kubernetes PV Backup

| Method | How | Pros | Cons |
| --- | --- | --- | --- |
| Volume snapshots | VolumeSnapshot CRD + CSI driver | Fast, storage-level, point-in-time | CSI driver must support it |
| Velero | Backup operator (PVs + K8s resources) | Full cluster backup, scheduled | Adds complexity |
| Application-level | pg_dump, mysqldump, etc. | Consistent, application-aware | Slower, app-specific |
| Storage-level | AWS EBS snapshots, GCP disk snapshots | Fast, infrastructure-native | Cloud-specific |

Decision framework: Application-level backup for consistency (databases). Volume snapshots for speed (everything else). Velero for disaster recovery (full cluster). Never rely on just one method.


Key Takeaways for Interviews

  1. “Volume vs bind mount?” → Volumes are Docker-managed, portable, production-ready. Bind mounts are host paths, for dev/config. Use volumes for data you care about.
  2. “How does K8s persistent storage work?” → PVC requests storage → StorageClass provisions a PV via the CSI driver → the PVC binds to that PV → Pod mounts the volume by referencing the PVC. StatefulSets for stable per-pod storage.
  3. “RWO vs RWX?” → RWO = one node (block storage). RWX = many nodes (file storage like NFS/EFS). Choose based on workload access pattern.
  4. “What happens when you delete a PVC?” → Depends on reclaim policy. Retain = data preserved. Delete = data gone. Use Retain in production.
  5. “How do you handle database storage in containers?” → StatefulSet + PVC with Retain policy + volume snapshots for backup + application-level backup for consistency.