Interview Scenarios

What interviewers actually ask at senior/principal DevOps level. Not trivia — design questions, debugging scenarios, and trade-off discussions that test depth of understanding.

Design Questions

“Design a CI/CD pipeline for containerized microservices”

Expected answer structure:

Source → Build → Test → Scan → Push → Deploy → Verify

Stage	What Happens	Tools
Source	PR triggers pipeline, lint Dockerfiles	GitHub Actions, GitLab CI
Build	Multi-stage Docker build, BuildKit cache	Docker/Kaniko/BuildKit
Test	Unit tests in container, integration tests with Compose	pytest, Jest in containers
Scan	Image vulnerability scan, SBOM generation	Trivy, Docker Scout, Syft
Push	Tag with git SHA + semver, push to registry	ECR, GCR, Harbor
Deploy	GitOps sync (ArgoCD) or rolling deploy	ArgoCD, Helm, Kustomize
Verify	Smoke tests, canary analysis, auto-rollback	Argo Rollouts, Flagger

Key points to mention:

Image tags: use git SHA (immutable), never :latest in production
Build cache: mount cache layers for npm/pip to speed up builds
Security gate: fail pipeline if HIGH/CRITICAL CVEs found
Secrets: never in Dockerfile, use CI secrets manager + BuildKit secret mounts
Multi-arch: build for amd64 + arm64 if targeting Graviton
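
The tagging and security-gate points above can be sketched in a few lines. This is a minimal illustration rather than a drop-in pipeline step: the function names `image_tags`, `blocking_cves`, and `should_fail` are hypothetical, and the report layout (`Results` → `Vulnerabilities` → `Severity`) is assumed to match the JSON that Trivy produces, so check your scanner's actual output before relying on it.

```python
"""Sketch: immutable image tags and a HIGH/CRITICAL CVE gate."""

BLOCKING = {"HIGH", "CRITICAL"}


def image_tags(repo: str, git_sha: str, semver: str) -> list:
    """Return the tags to push: a shortened git SHA plus the release version."""
    return [f"{repo}:{git_sha[:12]}", f"{repo}:{semver}"]


def blocking_cves(report: dict) -> list:
    """Collect the IDs of HIGH/CRITICAL findings from a Trivy-style report."""
    found = []
    for result in report.get("Results", []):
        # "Vulnerabilities" may be missing or null when a target is clean.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in BLOCKING:
                found.append(vuln.get("VulnerabilityID", "unknown"))
    return found


def should_fail(report: dict) -> bool:
    """True when the pipeline should stop before the push stage."""
    return bool(blocking_cves(report))
```

In a CI job, a script built on this would exit non-zero when `should_fail` returns True, which stops the run before the Push stage. Note that `:latest` never appears among the generated tags.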

“Design container orchestration for a stateful application (PostgreSQL)”

Expected answer:

StatefulSet + PVC (Retain policy) + Headless Service

Component	Why
StatefulSet (not Deployment)	Stable pod identity, ordered create/delete, stable storage
PVC per replica (volumeClaimTemplates), PV reclaim policy Retain	Data survives pod rescheduling and is not deleted when the claim is removed
Headless Service	Direct DNS to each pod: `postgres-0.postgres-svc`
Anti-affinity	Spread replicas across nodes/AZs
Resource requests = limits	Guaranteed QoS, predictable performance
Backup CronJob	pg_dump to S3 + volume snapshots
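
To make the "stable pod identity" row concrete, here is a small sketch of the DNS names a StatefulSet's pods get behind a headless Service. The helper name `pod_dns_names` is hypothetical, and the `cluster.local` suffix is the common default cluster domain, which is an assumption; some clusters use a different one.

```python
def pod_dns_names(statefulset: str, service: str, namespace: str,
                  replicas: int, cluster_domain: str = "cluster.local") -> list:
    """Fully qualified per-pod names for a StatefulSet behind a headless Service.

    Pods are named <statefulset>-<ordinal>, with ordinals starting at 0.
    """
    return [
        f"{statefulset}-{i}.{service}.{namespace}.svc.{cluster_domain}"
        for i in range(replicas)
    ]


if __name__ == "__main__":
    for name in pod_dns_names("postgres", "postgres-svc", "default", 3):
        print(name)
```

Because the names do not change when a pod is rescheduled, replication config can point replicas at `postgres-0` without tracking pod IPs.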

Bonus points:

Mention operators (CloudNativePG, Zalando Postgres Operator) for production databases
Discuss when NOT to containerize databases (very high I/O, regulatory requirements)
Mention that most teams use managed databases (RDS, Cloud SQL) and only containerize stateless services

Interview tip: If asked “should you run databases in containers?” — the answer is nuanced. For development and testing, always. For production, it depends on team expertise, I/O requirements, and operational maturity. Managed services (RDS) are simpler but more expensive. Operators (CloudNativePG) make it viable but require K8s expertise.

“Design a multi-tenant container platform”

Expected answer covers:

Layer	How to Isolate
Namespace	One namespace per tenant
Network	NetworkPolicy: deny all, allow only within tenant namespace
Resources	ResourceQuotas per namespace (CPU, memory, pods, PVCs)
RBAC	Per-tenant ServiceAccount, scoped to their namespace
Runtime	gVisor RuntimeClass for untrusted workloads
Storage	Per-tenant StorageClass with quotas
Registry	Per-tenant image repository or admission control for allowed images
Observability	Per-tenant labels, Prometheus multi-tenancy, Loki tenant ID

Key considerations to discuss:

Namespace-level isolation is NOT as strong as separate clusters. Kernel is shared.
For strict isolation (financial services, healthcare), consider separate clusters per tenant
Cost allocation: use labels + Kubecost for per-tenant billing
Noisy neighbor: ResourceQuotas prevent one tenant from consuming all resources
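
As a rough sketch of the Namespace, Network, and Resources rows, the following generates the per-tenant objects as plain dictionaries that can be dumped to JSON and applied with `kubectl apply -f`. The function name `tenant_manifests`, the `tenant-<name>` namespace convention, and the default quota values are all illustrative assumptions.

```python
import json


def tenant_manifests(tenant: str, cpu: str = "4", memory: str = "8Gi",
                     pods: int = 20) -> list:
    """Build Namespace, NetworkPolicy, and ResourceQuota objects for one tenant."""
    ns = f"tenant-{tenant}"  # hypothetical naming convention
    meta = lambda name: {"name": name, "namespace": ns}
    return [
        {"apiVersion": "v1", "kind": "Namespace",
         "metadata": {"name": ns, "labels": {"tenant": tenant}}},
        # Selects every pod in the namespace and allows nothing.
        {"apiVersion": "networking.k8s.io/v1", "kind": "NetworkPolicy",
         "metadata": meta("default-deny"),
         "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]}},
        # A bare podSelector in from/to matches pods in this namespace only.
        {"apiVersion": "networking.k8s.io/v1", "kind": "NetworkPolicy",
         "metadata": meta("allow-same-namespace"),
         "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"],
                  "ingress": [{"from": [{"podSelector": {}}]}],
                  "egress": [{"to": [{"podSelector": {}}]}]}},
        {"apiVersion": "v1", "kind": "ResourceQuota",
         "metadata": meta("tenant-quota"),
         "spec": {"hard": {"requests.cpu": cpu, "requests.memory": memory,
                           "pods": str(pods)}}},
    ]


if __name__ == "__main__":
    print(json.dumps({"apiVersion": "v1", "kind": "List",
                      "items": tenant_manifests("acme")}, indent=2))
```

One caveat worth raising in the interview: with egress denied by default, pods also lose access to cluster DNS, so a real setup would need an additional egress rule toward the DNS pods.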

Debugging Scenarios

“A container keeps restarting. How do you debug it?”

Step-by-step approach:

# 1. Check the restart reason
kubectl describe pod <pod>
# Look for: OOMKilled, CrashLoopBackOff, Error, exit code

# 2. Check logs from the crashed container
kubectl logs <pod> --previous

# 3. Check events
kubectl get events --sort-by=.metadata.creationTimestamp | grep <pod>

# 4. Common causes by exit code:

Exit Code	Meaning	Common Cause
0	Success	App finished (shouldn’t happen for long-running) — check CMD
1	Application error	Unhandled exception, missing config
137	SIGKILL (128 + 9)	OOM killed or `kubectl delete --force`
139	SIGSEGV (128 + 11)	Segmentation fault — native code bug
143	SIGTERM (128 + 15)	Graceful shutdown (normal for pod deletion)

Key insight: Exit code 137 is almost always OOM kill. Check kubectl describe pod for Reason: OOMKilled. Fix: increase memory limit or fix the memory leak. Don’t just increase the limit blindly — profile the app first.
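
The "128 + signal number" arithmetic in the table can be checked with a short helper. This is a sketch that assumes a POSIX system (signal numbers differ or are missing on Windows), and `describe_exit_code` is a hypothetical name. An application can also choose an exit status above 128 on its own, so treat the result as a strong hint rather than proof.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a container exit code into a human-readable hint."""
    if code == 0:
        return "exited successfully"
    if code > 128:
        number = code - 128  # convention: 128 + signal number
        try:
            name = signal.Signals(number).name
        except ValueError:
            name = f"signal {number}"  # not a known signal on this platform
        return f"killed by {name}"
    return f"application error (exit status {code})"


if __name__ == "__main__":
    for code in (0, 1, 137, 139, 143):
        print(code, describe_exit_code(code))
```

The exit code alone cannot distinguish an OOM kill from any other SIGKILL, which is why the `Reason: OOMKilled` field in `kubectl describe pod` remains the deciding evidence.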

“Pods can’t communicate across nodes”

Debugging checklist:

# 1. Can two different pods on the SAME node communicate?
kubectl exec <pod-a-on-node-1> -- curl <pod-b-on-node-1-ip>:<port>
# If yes → cross-node networking issue
# If no → CNI plugin broken on that node

# 2. Check CNI pods are running
kubectl get pods -n kube-system | grep -E 'calico|cilium|flannel|weave'
# Any CrashLoopBackOff? Check their logs first, then restart them.

# 3. Check node firewall rules
# VXLAN: UDP 4789 (Calico VXLAN), UDP 8472 (Flannel default on Linux)
# Calico BGP: TCP 179
# WireGuard: UDP 51820 (Calico), UDP 51871 (Cilium)

# 4. Check kube-proxy mode
kubectl get configmap kube-proxy -n kube-system -o yaml | grep mode
# iptables or IPVS — check respective rules on nodes

# 5. Check node-to-node connectivity
kubectl debug node/<node> -it --image=busybox -- ping <other-node-ip>

“DNS resolution is slow in the cluster”

Root cause is usually the ndots:5 default:

# Default /etc/resolv.conf in a K8s pod:
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

A lookup for api.example.com has only 2 dots, fewer than the ndots:5 threshold, so the resolver walks the search list before trying the name as-is:

api.example.com.default.svc.cluster.local → NXDOMAIN
api.example.com.svc.cluster.local → NXDOMAIN
api.example.com.cluster.local → NXDOMAIN
api.example.com → SUCCESS

Three wasted DNS queries before the real one, and often double that when the resolver asks for both A and AAAA records. At scale, this hammers CoreDNS.
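
The lookup order can be reproduced with a simplified model of the resolver's search-list logic. This sketch only captures the ordering rule (a name with fewer dots than `ndots` goes through the search list first), not caching, timeouts, or A/AAAA pairing; `candidate_queries` is a hypothetical helper name.

```python
def candidate_queries(name: str, search: list, ndots: int = 5) -> list:
    """Names a resolver would try, in order, stopping at the first success."""
    if name.endswith("."):
        return [name]  # trailing dot = absolute name, search list is skipped
    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        return [absolute] + expanded  # enough dots: try the name as-is first
    return expanded + [absolute]      # too few dots: search domains first


if __name__ == "__main__":
    search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
    for query in candidate_queries("api.example.com", search):
        print(query)
```

Calling it with `ndots=2` or with a trailing dot on the name puts `api.example.com.` first, which is the reasoning behind lowering `ndots` or using fully qualified names.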

Fixes:

# Option 1: Reduce ndots in pod spec
dnsConfig:
  options:
  - name: ndots
    value: "2"

# Option 2: Use FQDN with trailing dot (bypasses search domains)
# curl http://api.example.com.    ← trailing dot = absolute, no search

# Option 3: Scale CoreDNS
# kubectl scale deployment coredns -n kube-system --replicas=5

# Option 4: NodeLocal DNS Cache (DaemonSet that caches DNS on each node)

Trade-off Discussions

“Monolith in a container vs microservices?”

Factor	Monolith Container	Microservices
Complexity	Low — one Dockerfile, one deploy	High — many services, dependencies, network
Team size	Small team (<10 devs)	Large team (>10 devs, multiple squads)
Scaling	Scale entire app (wasteful)	Scale individual services (efficient)
Debugging	Simple (one process)	Hard (distributed tracing needed)
Deploy speed	Slow (deploy everything)	Fast (deploy one service)
Data consistency	Easy (one database)	Hard (eventual consistency, sagas)
Starting a new project	Start here	Migrate when you hit scaling/team problems

Principal-level answer: “Start with a monolith in a container. It gets you containerization benefits (reproducible builds, consistent environments, CI/CD) without microservice complexity. Extract services when you have a clear need: a component needs independent scaling, teams need independent deploy cycles, or a bounded context is well-defined. Most premature microservice decompositions create distributed monoliths that are worse than the original.”

“Managed Kubernetes vs self-managed?”

Factor	Managed (EKS/GKE/AKS)	Self-managed (kubeadm/k3s)
Control plane	Provider manages (HA, upgrades, patching)	You manage everything
Cost	$72-150/month for control plane + node costs	Only node costs
Customization	Limited (can’t modify API server flags easily)	Full control
Upgrade effort	Push-button (provider handles)	You plan and execute
Support	Provider SLA, support tickets	Community or paid support
Expertise needed	Moderate	High

Principal-level answer: “Managed K8s for 95% of cases. The control plane management alone is worth the cost — HA etcd, API server upgrades, certificate rotation. Self-managed only when you need custom API server configuration (unusual admission controllers, specific authentication), air-gapped environments, or edge deployments where cloud isn’t available.”

“When NOT to use containers?”

High-performance computing (HPC) where direct hardware access matters (GPU passthrough has overhead)
Windows-native applications that don’t containerize well (COM, registry-dependent)
Extremely latency-sensitive workloads where any kernel overhead matters (HFT)
Embedded/IoT with very limited resources (though podman/containerd work on ARM)
Regulatory environments where VM isolation is mandated by compliance

Rapid Fire Questions

Question	Key Points
ENTRYPOINT vs CMD?	ENTRYPOINT = the executable. CMD = default arguments. Together: `ENTRYPOINT ["python"]` + `CMD ["app.py"]`. CMD can be overridden at runtime.
ADD vs COPY?	COPY copies files. ADD also extracts tars and fetches URLs. Always use COPY unless you specifically need tar extraction.
Expose vs publish?	EXPOSE is documentation only. `-p` (publish) creates the actual port mapping with iptables.
Docker stop vs kill?	stop = SIGTERM → wait → SIGKILL. kill = SIGKILL immediately. Always try stop first.
Image vs container?	Image = template (read-only layers). Container = running instance (image + writable layer + process).
Swarm vs K8s?	Swarm: simpler, Docker-native, limited ecosystem. K8s: complex, massive ecosystem, industry standard.
What is kube-proxy?	Implements Service networking. Modes: iptables (default, O(n) rules) or IPVS (hash table, better at scale).
What does kubelet do?	Node agent. Ensures containers in pods are running. Manages pod lifecycle, health checks, resource reporting.
etcd?	Distributed key-value store. THE source of truth for all K8s cluster state. Back it up or lose everything.
What is a CRD?	Custom Resource Definition. Extends K8s API with custom resource types. Operators use CRDs.
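
For the ENTRYPOINT vs CMD row, the combination rule for exec-form instructions can be modelled in a few lines. This is a simplified sketch: it ignores shell form and the `--entrypoint` override, and `container_argv` is a hypothetical name rather than anything Docker exposes.

```python
from typing import List, Optional


def container_argv(entrypoint: List[str], cmd: List[str],
                   run_args: Optional[List[str]] = None) -> List[str]:
    """Final process argv: ENTRYPOINT plus either runtime args or CMD.

    Arguments given after the image name in `docker run` replace CMD
    entirely; ENTRYPOINT stays in place.
    """
    args = run_args if run_args else cmd
    return entrypoint + args


if __name__ == "__main__":
    # ENTRYPOINT ["python"] + CMD ["app.py"]
    print(container_argv(["python"], ["app.py"]))
    # docker run <image> worker.py --debug
    print(container_argv(["python"], ["app.py"], ["worker.py", "--debug"]))
```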

Certification Exam Domains Reference

Docker Certified Associate (DCA)

Domain	Weight
Orchestration	25%
Image Creation, Management & Registry	20%
Installation & Configuration	15%
Networking	15%
Security	15%
Storage & Volumes	10%

Certified Kubernetes Administrator (CKA)

Domain	Weight
Troubleshooting	30%
Cluster Architecture, Installation & Configuration	25%
Services & Networking	20%
Workloads & Scheduling	15%
Storage	10%

Certified Kubernetes Security Specialist (CKS)

Domain	Topics
Cluster Setup	API server flags, kubelet security, audit logging
Cluster Hardening	RBAC, ServiceAccount, PSS
System Hardening	AppArmor, SELinux, seccomp
Microservice Vulnerabilities	Image scanning, pod security, secrets
Supply Chain Security	Image signing, admission controllers
Runtime Security	Falco, audit logs, monitoring

Study Recommendations

Resource	Best For
Bret Fisher’s Docker Mastery (Udemy)	Docker deep dive, endorsed by Docker Inc.
KodeKloud CKA/CKS paths	Hands-on labs with browser-based practice
“Kubernetes the Hard Way” (Kelsey Hightower)	Understanding K8s internals by building from scratch
CKA Exam Guide (GitHub)	Exam-aligned study with practice questions
Killer.sh	CKA/CKAD/CKS practice exams (realistic environment)
