Interview Scenarios

What interviewers actually ask at the senior/principal DevOps level: not trivia, but design questions, debugging scenarios, and trade-off discussions that test depth of understanding.


Design Questions

“Design a CI/CD pipeline for containerized microservices”

Expected answer structure:

Source → Build → Test → Scan → Push → Deploy → Verify
| Stage | What Happens | Tools |
| --- | --- | --- |
| Source | PR triggers pipeline, lint Dockerfiles | GitHub Actions, GitLab CI |
| Build | Multi-stage Docker build, BuildKit cache | Docker/Kaniko/BuildKit |
| Test | Unit tests in container, integration tests with Compose | pytest, Jest in containers |
| Scan | Image vulnerability scan, SBOM generation | Trivy, Docker Scout, Syft |
| Push | Tag with git SHA + semver, push to registry | ECR, GCR, Harbor |
| Deploy | GitOps sync (ArgoCD) or rolling deploy | ArgoCD, Helm, Kustomize |
| Verify | Smoke tests, canary analysis, auto-rollback | Argo Rollouts, Flagger |

Key points to mention:

  • Image tags: use git SHA (immutable), never :latest in production
  • Build cache: mount cache layers for npm/pip to speed up builds
  • Security gate: fail pipeline if HIGH/CRITICAL CVEs found
  • Secrets: never in Dockerfile, use CI secrets manager + BuildKit secret mounts
  • Multi-arch: build for amd64 + arm64 if targeting Graviton
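The tagging, secret-handling, and security-gate points above can be sketched as a small shell script. This is a minimal sketch under stated assumptions, not a definitive pipeline: it assumes Docker with BuildKit and Trivy are installed on the CI runner, and the registry path, the `NPM_TOKEN` variable, the `npm_token` secret id, and the function names `image_ref` and `build_scan_push` are hypothetical placeholders.

```shell
#!/usr/bin/env bash
set -eu

# Build an immutable image reference from a repository path and a git SHA.
image_ref() {
  local repo="$1" sha="$2"
  printf '%s:%s\n' "$repo" "$sha"
}

# Build, scan, and push one image from the current git checkout.
# Assumes docker (BuildKit enabled) and trivy are on PATH.
build_scan_push() {
  local repo="$1" sha ref
  sha="$(git rev-parse HEAD)"
  ref="$(image_ref "$repo" "$sha")"

  # The secret is only visible to RUN --mount=type=secret steps
  # and is not written into an image layer.
  DOCKER_BUILDKIT=1 docker build \
    --secret id=npm_token,env=NPM_TOKEN \
    --tag "$ref" .

  # Security gate: trivy exits non-zero if HIGH/CRITICAL findings exist,
  # which stops the script (set -e) before the push.
  trivy image --exit-code 1 --severity HIGH,CRITICAL "$ref"

  docker push "$ref"
}
```

A CI job would call something like `build_scan_push registry.example.com/team/app` from inside the repository checkout. Running the script on its own only defines the functions.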

“Design container orchestration for a stateful application (PostgreSQL)”

Expected answer:

StatefulSet + PVC (Retain policy) + Headless Service
| Component | Why |
| --- | --- |
| StatefulSet (not Deployment) | Stable pod identity, ordered create/delete, stable storage |
| PVC per replica with Retain policy | Data survives pod rescheduling, no accidental deletion |
| Headless Service | Direct DNS to each pod: postgres-0.postgres-svc |
| Anti-affinity | Spread replicas across nodes/AZs |
| Resource requests = limits | Guaranteed QoS, predictable performance |
| Backup CronJob | pg_dump to S3 + volume snapshots |
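The components in the table might translate into manifests along these lines. This is an illustrative sketch, not a production setup: the image tag, storage size, resource figures, and the `postgres-credentials` Secret are assumptions, and the manifest does not configure replication between the three pods (an operator would normally handle that).

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres-svc
spec:
  clusterIP: None              # headless: DNS returns individual pod records
  selector:
    app: postgres
  ports:
    - port: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres-svc    # yields names such as postgres-0.postgres-svc
  replicas: 3                  # assumption: replication is set up separately
  selector:
    matchLabels:
      app: postgres
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain        # keep PVCs if the StatefulSet is deleted
    whenScaled: Retain         # keep PVCs when scaling down
  template:
    metadata:
      labels:
        app: postgres
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: postgres
              topologyKey: kubernetes.io/hostname   # one replica per node
      containers:
        - name: postgres
          image: postgres:16   # assumed version
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials   # hypothetical Secret
                  key: password
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          resources:           # requests == limits gives Guaranteed QoS
            requests: { cpu: "1", memory: 2Gi }
            limits: { cpu: "1", memory: 2Gi }
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi      # assumed size
```

The StatefulSet retention policy protects the claims themselves. Whether the underlying volume survives deletion of a claim is governed separately by the reclaim policy of the PersistentVolume or its StorageClass.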

Bonus points:

  • Mention operators (CloudNativePG, Zalando Postgres Operator) for production databases
  • Discuss when NOT to containerize databases (very high I/O, regulatory requirements)
  • Mention that most teams use managed databases (RDS, Cloud SQL) and only containerize stateless services

Interview tip: If asked “should you run databases in containers?” — the answer is nuanced. For development and testing, always. For production, it depends on team expertise, I/O requirements, and operational maturity. Managed services (RDS) are simpler but more expensive. Operators (CloudNativePG) make it viable but require K8s expertise.


“Design a multi-tenant container platform”

Expected answer covers:

| Layer | How to Isolate |
| --- | --- |
| Namespace | One namespace per tenant |
| Network | NetworkPolicy: deny all, allow only within tenant namespace |
| Resources | ResourceQuotas per namespace (CPU, memory, pods, PVCs) |
| RBAC | Per-tenant ServiceAccount, scoped to their namespace |
| Runtime | gVisor RuntimeClass for untrusted workloads |
| Storage | Per-tenant StorageClass with quotas |
| Registry | Per-tenant image repository or admission control for allowed images |
| Observability | Per-tenant labels, Prometheus multi-tenancy, Loki tenant ID |

Key considerations to discuss:

  • Namespace-level isolation is NOT as strong as separate clusters. Kernel is shared.
  • For strict isolation (financial services, healthcare), consider separate clusters per tenant
  • Cost allocation: use labels + Kubecost for per-tenant billing
  • Noisy neighbor: ResourceQuotas prevent one tenant from consuming all resources
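The network and resource layers above could look roughly like this for one tenant. This is a sketch under assumptions: the `tenant-a` namespace, the object names, and all quota figures are hypothetical, and the DNS exception assumes cluster DNS runs in `kube-system`.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tenant-isolation
  namespace: tenant-a
spec:
  podSelector: {}              # applies to every pod in the namespace
  policyTypes: [Ingress, Egress]
  ingress:
    - from:
        - podSelector: {}      # only pods in this same namespace
  egress:
    - to:
        - podSelector: {}      # traffic within the tenant namespace
    - to:                      # DNS lookups to the cluster DNS pods
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - { protocol: UDP, port: 53 }
        - { protocol: TCP, port: 53 }
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota
  namespace: tenant-a
spec:
  hard:                        # assumed figures, tune per tenant
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
    persistentvolumeclaims: "10"
---
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc                 # assumes nodes have the gVisor runtime installed
```

Untrusted pods would then opt in with `runtimeClassName: gvisor` in their pod spec. The NetworkPolicy only takes effect if the cluster's CNI plugin enforces policies.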

Debugging Scenarios

“A container keeps restarting. How do you debug it?”

Step-by-step approach:

# 1. Check the restart reason
kubectl describe pod <pod>
# Look for: OOMKilled, CrashLoopBackOff, Error, exit code

# 2. Check logs from the crashed container
kubectl logs <pod> --previous

# 3. Check events
kubectl get events --sort-by=.metadata.creationTimestamp | grep <pod>

# 4. Common causes by exit code:
| Exit Code | Meaning | Common Cause |
| --- | --- | --- |
| 0 | Success | App finished (shouldn’t happen for long-running); check CMD |
| 1 | Application error | Unhandled exception, missing config |
| 137 | SIGKILL (128 + 9) | OOM killed or kubectl delete --force |
| 139 | SIGSEGV (128 + 11) | Segmentation fault, native code bug |
| 143 | SIGTERM (128 + 15) | Graceful shutdown (normal for pod deletion) |

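Key insight: Exit code 137 is most often an OOM kill, but confirm it: kubectl describe pod shows Reason: OOMKilled when the container was killed for exceeding its memory limit. If that reason is absent, the SIGKILL came from somewhere else, such as a force delete or a shutdown that outlasted the termination grace period. For a real OOM, the fix is to raise the memory limit or fix the leak. Don’t just increase the limit blindly; profile the app first.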
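The 128 + signal convention behind these exit codes can be sketched as a small helper. This is an illustrative function only; the name `explain_exit_code` is hypothetical, and it recognizes just the three signals discussed here.

```shell
#!/usr/bin/env bash
# Explain a container exit code using the 128 + signal convention.
explain_exit_code() {
  local code="$1" sig name
  if [ "$code" -eq 0 ]; then
    echo "exited normally"
  elif [ "$code" -gt 128 ]; then
    # Codes above 128 mean the process was terminated by signal (code - 128).
    sig=$((code - 128))
    case "$sig" in
      9)  name=SIGKILL ;;
      11) name=SIGSEGV ;;
      15) name=SIGTERM ;;
      *)  name="not mapped in this sketch" ;;
    esac
    echo "killed by signal $sig ($name)"
  else
    echo "application error (exit $code)"
  fi
}

explain_exit_code 137   # prints: killed by signal 9 (SIGKILL)
explain_exit_code 143   # prints: killed by signal 15 (SIGTERM)
```

The exit code of the last crash can be read from the pod status, for example with `kubectl get pod <pod> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'`, and passed to the helper.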


“Pods can’t communicate across nodes”

Debugging checklist:

# 1. Can pods on the SAME node communicate?
kubectl exec pod-on-node-a -- curl other-pod-on-node-a-ip:port
# If yes → cross-node networking issue
# If no → CNI plugin broken on that node

# 2. Check CNI pods are running
kubectl get pods -n kube-system | grep -E 'calico|cilium|flannel|weave'
# Any CrashLoopBackOff? Check their logs (kubectl logs -n kube-system <cni-pod>) before restarting them.

# 3. Check node firewall rules (defaults shown; verify against your CNI config)
# VXLAN: UDP 4789 (Calico VXLAN), UDP 8472 (Flannel and Cilium on Linux)
# Calico BGP: TCP 179
# WireGuard: UDP 51820 (Calico), UDP 51871 (Cilium)

# 4. Check kube-proxy mode
kubectl get configmap kube-proxy -n kube-system -o yaml | grep mode
# iptables or IPVS — check respective rules on nodes

# 5. Check node-to-node connectivity
kubectl debug node/<node> -it --image=busybox -- ping <other-node-ip>

“DNS resolution is slow in the cluster”

Root cause is usually the ndots:5 default:

# Default /etc/resolv.conf in a K8s pod:
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

A lookup for api.example.com has only 2 dots, fewer than ndots:5, so the resolver walks the search list before trying the name as written:

  1. api.example.com.default.svc.cluster.local → NXDOMAIN
  2. api.example.com.svc.cluster.local → NXDOMAIN
  3. api.example.com.cluster.local → NXDOMAIN
  4. api.example.com → SUCCESS

Three unnecessary DNS queries before the real one, and often twice that, since A and AAAA records are typically looked up separately. At scale, this hammers CoreDNS.
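The lookup order can be reproduced with a short helper. This is a simplified sketch of glibc-style resolver behavior (other resolvers, such as musl, differ in detail), and the function name `candidate_names` is hypothetical. It lists the names in the order they would be tried; a real resolver stops at the first successful answer.

```shell
#!/usr/bin/env bash
# Print, in order, the names a glibc-style resolver would try.
# Usage: candidate_names NAME NDOTS SEARCH_DOMAIN...
candidate_names() {
  local name="$1" ndots="$2" domain dots
  shift 2
  case "$name" in
    *.) echo "$name"; return ;;    # trailing dot: absolute name, no search list
  esac
  # Count the dots in the name.
  dots=$(( $(printf '%s' "$name" | tr -cd '.' | wc -c) ))
  if [ "$dots" -ge "$ndots" ]; then
    echo "$name"                    # enough dots: literal name is tried first
    for domain in "$@"; do echo "$name.$domain"; done
  else
    for domain in "$@"; do echo "$name.$domain"; done
    echo "$name"                    # too few dots: literal name is tried last
  fi
}

candidate_names api.example.com 5 \
  default.svc.cluster.local svc.cluster.local cluster.local
```

With `ndots` set to 5 the literal name comes last, after the three search-domain attempts. Calling the same function with `2` instead of `5` puts `api.example.com` first, which is why lowering ndots helps for external names.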

Fixes:

# Option 1: Reduce ndots in pod spec
dnsConfig:
  options:
  - name: ndots
    value: "2"

# Option 2: Use FQDN with trailing dot (bypasses search domains)
# curl http://api.example.com.    ← trailing dot = absolute, no search

# Option 3: Scale CoreDNS
# kubectl scale deployment coredns -n kube-system --replicas=5

# Option 4: NodeLocal DNS Cache (DaemonSet that caches DNS on each node)

Trade-off Discussions

“Monolith in a container vs microservices?”

| Factor | Monolith Container | Microservices |
| --- | --- | --- |
| Complexity | Low: one Dockerfile, one deploy | High: many services, dependencies, network |
| Team size | Small team (<10 devs) | Large team (>10 devs, multiple squads) |
| Scaling | Scale entire app (wasteful) | Scale individual services (efficient) |
| Debugging | Simple (one process) | Hard (distributed tracing needed) |
| Deploy speed | Slow (deploy everything) | Fast (deploy one service) |
| Data consistency | Easy (one database) | Hard (eventual consistency, sagas) |
| Starting a new project | Start here | Migrate when you hit scaling/team problems |

Principal-level answer: “Start with a monolith in a container. It gets you containerization benefits (reproducible builds, consistent environments, CI/CD) without microservice complexity. Extract services when you have a clear need: a component needs independent scaling, teams need independent deploy cycles, or a bounded context is well-defined. Most premature microservice decompositions create distributed monoliths that are worse than the original.”

“Managed Kubernetes vs self-managed?”

| Factor | Managed (EKS/GKE/AKS) | Self-managed (kubeadm/k3s) |
| --- | --- | --- |
| Control plane | Provider manages (HA, upgrades, patching) | You manage everything |
| Cost | $72-150/month for control plane + node costs | Only node costs |
| Customization | Limited (can’t modify API server flags easily) | Full control |
| Upgrade effort | Push-button (provider handles) | You plan and execute |
| Support | Provider SLA, support tickets | Community or paid support |
| Expertise needed | Moderate | High |
Principal-level answer: “Managed K8s for 95% of cases. The control plane management alone is worth the cost — HA etcd, API server upgrades, certificate rotation. Self-managed only when you need custom API server configuration (unusual admission controllers, specific authentication), air-gapped environments, or edge deployments where cloud isn’t available.”

“When NOT to use containers?”

  • High-performance computing (HPC) where direct hardware access matters (GPU passthrough has overhead)
  • Windows-native applications that don’t containerize well (COM, registry-dependent)
  • Extremely latency-sensitive workloads where any kernel overhead matters (HFT)
  • Embedded/IoT with very limited resources (though podman/containerd work on ARM)
  • Regulatory environments where VM isolation is mandated by compliance

Rapid Fire Questions

| Question | Key Points |
| --- | --- |
| ENTRYPOINT vs CMD? | ENTRYPOINT = the executable. CMD = default arguments. Together: ENTRYPOINT ["python"] + CMD ["app.py"]. CMD can be overridden at runtime. |
| ADD vs COPY? | COPY copies files. ADD also extracts tars and fetches URLs. Always use COPY unless you specifically need tar extraction. |
| Expose vs publish? | EXPOSE is documentation only. -p (publish) creates the actual port mapping with iptables. |
| Docker stop vs kill? | stop = SIGTERM → wait → SIGKILL. kill = SIGKILL immediately. Always try stop first. |
| Image vs container? | Image = template (read-only layers). Container = running instance (image + writable layer + process). |
| Swarm vs K8s? | Swarm: simpler, Docker-native, limited ecosystem. K8s: complex, massive ecosystem, industry standard. |
| What is kube-proxy? | Implements Service networking. Modes: iptables (default, O(n) rules) or IPVS (hash table, better at scale). |
| What does kubelet do? | Node agent. Ensures containers in pods are running. Manages pod lifecycle, health checks, resource reporting. |
| etcd? | Distributed key-value store. THE source of truth for all K8s cluster state. Back it up or lose everything. |
| What is a CRD? | Custom Resource Definition. Extends K8s API with custom resource types. Operators use CRDs. |

Certification Exam Domains Reference

Docker Certified Associate (DCA)

| Domain | Weight |
| --- | --- |
| Orchestration | 25% |
| Image Creation, Management & Registry | 20% |
| Installation & Configuration | 15% |
| Networking | 15% |
| Security | 15% |
| Storage & Volumes | 10% |

Certified Kubernetes Administrator (CKA)

| Domain | Weight |
| --- | --- |
| Troubleshooting | 30% |
| Cluster Architecture, Installation & Configuration | 25% |
| Services & Networking | 20% |
| Workloads & Scheduling | 15% |
| Storage | 10% |

Certified Kubernetes Security Specialist (CKS)

| Domain | Topics |
| --- | --- |
| Cluster Setup | API server flags, kubelet security, audit logging |
| Cluster Hardening | RBAC, ServiceAccount, PSS |
| System Hardening | AppArmor, SELinux, seccomp |
| Microservice Vulnerabilities | Image scanning, pod security, secrets |
| Supply Chain Security | Image signing, admission controllers |
| Runtime Security | Falco, audit logs, monitoring |

Study Recommendations

| Resource | Best For |
| --- | --- |
| Bret Fisher’s Docker Mastery (Udemy) | Docker deep dive, endorsed by Docker Inc. |
| KodeKloud CKA/CKS paths | Hands-on labs with browser-based practice |
| “Kubernetes the Hard Way” (Kelsey Hightower) | Understanding K8s internals by building from scratch |
| CKA Exam Guide (GitHub) | Exam-aligned study with practice questions |
| Killer.sh | CKA/CKAD/CKS practice exams (realistic environment) |