Interview Scenarios
What interviewers actually ask at senior/principal DevOps level. Not trivia — design questions, debugging scenarios, and trade-off discussions that test depth of understanding.
Design Questions
“Design a CI/CD pipeline for containerized microservices”
Expected answer structure:
Source → Build → Test → Scan → Push → Deploy → Verify
| Stage | What Happens | Tools |
|---|---|---|
| Source | PR triggers pipeline, lint Dockerfiles | GitHub Actions, GitLab CI |
| Build | Multi-stage Docker build, BuildKit cache | Docker/Kaniko/BuildKit |
| Test | Unit tests in container, integration tests with Compose | pytest, Jest in containers |
| Scan | Image vulnerability scan, SBOM generation | Trivy, Docker Scout, Syft |
| Push | Tag with git SHA + semver, push to registry | ECR, GCR, Harbor |
| Deploy | GitOps sync (ArgoCD) or rolling deploy | ArgoCD, Helm, Kustomize |
| Verify | Smoke tests, canary analysis, auto-rollback | Argo Rollouts, Flagger |
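The Push stage's tagging rule (git SHA plus semver) can be sketched as a small helper. This is an illustrative sketch only: the function name `image_refs`, the registry host, the repository name, and the choice to shorten the SHA to 12 characters are all assumptions, not part of any particular CI system.

```python
import re
from typing import Optional

# Hypothetical helper: builds the image references a pipeline would push.
SHA_PATTERN = re.compile(r"[0-9a-f]{7,40}")
SEMVER_PATTERN = re.compile(r"\d+\.\d+\.\d+")


def image_refs(registry: str, repo: str, git_sha: str,
               version: Optional[str] = None) -> list[str]:
    """Return one immutable SHA tag, plus a semver tag for release builds."""
    if not SHA_PATTERN.fullmatch(git_sha):
        raise ValueError(f"not a git SHA: {git_sha!r}")
    base = f"{registry}/{repo}"
    refs = [f"{base}:{git_sha[:12]}"]  # shortened SHA, an arbitrary choice
    if version is not None:
        if not SEMVER_PATTERN.fullmatch(version):
            raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
        refs.append(f"{base}:{version}")
    return refs


if __name__ == "__main__":
    # Placeholder registry, repository, and SHA values.
    for ref in image_refs("registry.example.com", "shop/api",
                          "0123456789abcdef0123456789abcdef01234567", "1.4.2"):
        print(ref)
```

Deployments would reference the SHA tag, since it never moves; the semver tag is a human-friendly alias for releases.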
Key points to mention:
- Image tags: use git SHA (immutable), never :latest in production
- Build cache: mount cache layers for npm/pip to speed up builds
- Security gate: fail pipeline if HIGH/CRITICAL CVEs found
- Secrets: never in Dockerfile, use CI secrets manager + BuildKit secret mounts
- Multi-arch: build for amd64 + arm64 if targeting Graviton
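As a rough illustration of the security gate, the sketch below scans a vulnerability report and lists the findings that should block the pipeline. It assumes the report has the shape of Trivy's JSON output (`Results[].Vulnerabilities[]` with `VulnerabilityID`, `PkgName`, and `Severity` fields); the function name and the sample data are made up for the example.

```python
# Severities that should fail the pipeline, per the key point above.
BLOCKING_SEVERITIES = {"HIGH", "CRITICAL"}


def blocking_findings(report: dict) -> list[str]:
    """List findings whose severity should stop the build."""
    findings = []
    # "Results" and "Vulnerabilities" may be missing or null in a clean report.
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in BLOCKING_SEVERITIES:
                findings.append(
                    f"{vuln.get('VulnerabilityID')} "
                    f"({vuln.get('Severity')}) in {vuln.get('PkgName')}"
                )
    return findings


if __name__ == "__main__":
    # Fabricated sample report; the IDs are placeholders, not real CVEs.
    sample = {
        "Results": [
            {"Target": "app", "Vulnerabilities": [
                {"VulnerabilityID": "CVE-0000-0001", "PkgName": "libfoo", "Severity": "CRITICAL"},
                {"VulnerabilityID": "CVE-0000-0002", "PkgName": "libbar", "Severity": "LOW"},
            ]},
        ]
    }
    found = blocking_findings(sample)
    for line in found:
        print("BLOCKING:", line)
    raise SystemExit(1 if found else 0)
```

In practice the scanner can usually enforce this itself (Trivy accepts `--severity HIGH,CRITICAL --exit-code 1`); a script like this is mainly useful when the gate needs custom exceptions or reporting.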
“Design container orchestration for a stateful application (PostgreSQL)”
Expected answer:
StatefulSet + PVC (Retain policy) + Headless Service
| Component | Why |
|---|---|
| StatefulSet (not Deployment) | Stable pod identity, ordered create/delete, stable storage |
| PVC per replica (volumeClaimTemplates), PV reclaim policy Retain | Data survives pod rescheduling; deleting a PVC does not delete the underlying volume |
| Headless Service | Direct DNS to each pod: postgres-0.postgres-svc |
| Anti-affinity | Spread replicas across nodes/AZs |
| Resource requests = limits | Guaranteed QoS, predictable performance |
| Backup CronJob | pg_dump to S3 + volume snapshots |
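To make the stable-identity point concrete, here is a small sketch that lists the DNS names a headless Service gives each StatefulSet pod. The names `postgres`, `postgres-svc`, and `default` are example values, and `cluster.local` is assumed as the cluster domain (it is configurable per cluster).

```python
def statefulset_pod_dns(name: str, service: str, namespace: str,
                        replicas: int, cluster_domain: str = "cluster.local") -> list[str]:
    """Return the per-pod DNS names for a StatefulSet behind a headless Service.

    Pods are named <statefulset>-<ordinal>, and each one resolves at
    <pod>.<service>.<namespace>.svc.<cluster-domain>.
    """
    return [
        f"{name}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


if __name__ == "__main__":
    for host in statefulset_pod_dns("postgres", "postgres-svc", "default", 3):
        print(host)
```

Because these names survive rescheduling, a replica can keep pointing at the primary (for example `postgres-0`) by name rather than by pod IP.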
Bonus points:
- Mention operators (CloudNativePG, Zalando Postgres Operator) for production databases
- Discuss when NOT to containerize databases (very high I/O, regulatory requirements)
- Mention that most teams use managed databases (RDS, Cloud SQL) and only containerize stateless services
Interview tip: If asked “should you run databases in containers?” — the answer is nuanced. For development and testing, always. For production, it depends on team expertise, I/O requirements, and operational maturity. Managed services (RDS) are simpler but more expensive. Operators (CloudNativePG) make it viable but require K8s expertise.
“Design a multi-tenant container platform”
Expected answer covers:
| Layer | How to Isolate |
|---|---|
| Namespace | One namespace per tenant |
| Network | NetworkPolicy: deny all, allow only within tenant namespace |
| Resources | ResourceQuotas per namespace (CPU, memory, pods, PVCs) |
| RBAC | Per-tenant ServiceAccount, scoped to their namespace |
| Runtime | gVisor RuntimeClass for untrusted workloads |
| Storage | Per-tenant StorageClass with quotas |
| Registry | Per-tenant image repository or admission control for allowed images |
| Observability | Per-tenant labels, Prometheus multi-tenancy, Loki tenant ID |
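The Network row can be illustrated by generating the policy per tenant. The sketch below builds a NetworkPolicy manifest as a Python dict and prints it as JSON, which `kubectl apply -f -` accepts. The policy name `tenant-isolation` and the namespace `tenant-a` are placeholders.

```python
import json


def tenant_isolation_policy(namespace: str) -> dict:
    """Deny all traffic for every pod in the namespace except same-namespace traffic."""
    # An empty podSelector inside a peer matches all pods in the policy's own namespace.
    same_namespace = [{"podSelector": {}}]
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "tenant-isolation", "namespace": namespace},
        "spec": {
            "podSelector": {},  # applies to every pod in the namespace
            "policyTypes": ["Ingress", "Egress"],
            "ingress": [{"from": same_namespace}],
            "egress": [{"to": same_namespace}],
        },
    }


if __name__ == "__main__":
    print(json.dumps(tenant_isolation_policy("tenant-a"), indent=2))
```

One caveat with this sketch: restricting egress this tightly also blocks DNS lookups to CoreDNS in kube-system, so a real policy would need an additional egress rule for port 53.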
Key considerations to discuss:
- Namespace-level isolation is NOT as strong as separate clusters. Kernel is shared.
- For strict isolation (financial services, healthcare), consider separate clusters per tenant
- Cost allocation: use labels + Kubecost for per-tenant billing
- Noisy neighbor: ResourceQuotas prevent one tenant from consuming all resources
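The noisy-neighbor point rests on how quota admission works: a new object is rejected if it would push the namespace's usage past a hard limit. A simplified model, with hypothetical resource keys and plain integer units (millicores, MiB) instead of Kubernetes quantity strings:

```python
def admits(quota: dict, used: dict, request: dict) -> tuple[bool, list[str]]:
    """Return (allowed, exceeded_resources) for a request against a namespace quota.

    Resources that have no entry in the quota are not constrained.
    """
    exceeded = [
        resource
        for resource, amount in request.items()
        if resource in quota and used.get(resource, 0) + amount > quota[resource]
    ]
    return (not exceeded, exceeded)


if __name__ == "__main__":
    # Example numbers only.
    quota = {"cpu_millicores": 4000, "memory_mib": 8192, "pods": 10}
    used = {"cpu_millicores": 3500, "memory_mib": 4096, "pods": 9}
    new_pod = {"cpu_millicores": 1000, "memory_mib": 512, "pods": 1}
    print(admits(quota, used, new_pod))
```

The tenant that hits its quota is the one whose pod gets rejected; other tenants' namespaces are unaffected, which is the isolation property worth stating in the interview.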
Debugging Scenarios
“A container keeps restarting. How do you debug it?”
Step-by-step approach:
# 1. Check the restart reason
kubectl describe pod <pod>
# Look for: OOMKilled, CrashLoopBackOff, Error, exit code
# 2. Check logs from the crashed container
kubectl logs <pod> --previous
# 3. Check events
kubectl get events --sort-by=.metadata.creationTimestamp | grep <pod>
# 4. Common causes by exit code:
| Exit Code | Meaning | Common Cause |
|---|---|---|
| 0 | Success | App finished (shouldn’t happen for long-running) — check CMD |
| 1 | Application error | Unhandled exception, missing config |
| 137 | SIGKILL (128 + 9) | OOM killed or kubectl delete --force |
| 139 | SIGSEGV (128 + 11) | Segmentation fault — native code bug |
| 143 | SIGTERM (128 + 15) | Graceful shutdown (normal for pod deletion) |
Key insight: Exit code 137 is almost always OOM kill. Check kubectl describe pod for Reason: OOMKilled. Fix: increase memory limit or fix the memory leak. Don’t just increase the limit blindly; profile the app first.
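The 128 + N convention in the exit-code table is easy to decode programmatically. A minimal sketch, assuming a POSIX system (signal names come from Python's `signal` module); the function name is illustrative:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a container exit code into a short human-readable description."""
    if code == 0:
        return "exited successfully"
    if code > 128:
        signum = code - 128  # shells and runtimes report 128 + signal number
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"  # number not known on this platform
        return f"killed by {name}"
    return f"application error (exit status {code})"


if __name__ == "__main__":
    for code in (0, 1, 137, 139, 143):
        print(code, "->", describe_exit_code(code))
```

The signal name alone does not say who sent it: a SIGKILL can come from the OOM killer or from a forced delete, so the pod's Reason field is still needed to tell them apart.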
“Pods can’t communicate across nodes”
Debugging checklist:
# 1. Can pods on the SAME node communicate?
kubectl exec <pod-a-on-node-a> -- curl <pod-b-on-node-a-ip>:<port>
# If yes → cross-node networking issue
# If no → CNI plugin broken on that node
# 2. Check CNI pods are running
kubectl get pods -n kube-system | grep -E 'calico|cilium|flannel|weave'
# Any CrashLoopBackOff? Check their logs (kubectl logs -n kube-system <cni-pod>) before restarting them.
# 3. Check node firewall rules
# VXLAN: UDP 4789 (Calico VXLAN), UDP 8472 (Flannel and Cilium defaults)
# Calico BGP: TCP 179
# WireGuard: UDP 51820 (Calico), UDP 51871 (Cilium)
# 4. Check kube-proxy mode
kubectl get configmap kube-proxy -n kube-system -o yaml | grep mode
# iptables or IPVS — check respective rules on nodes
# 5. Check node-to-node connectivity
kubectl debug node/<node> -it --image=busybox -- ping <other-node-ip>
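The checklist is essentially a decision tree over three observations. The sketch below encodes one possible reading of it; the mapping from results to likely fault is an interpretation for illustration, not a definitive diagnosis, and the function name is hypothetical.

```python
def likely_fault(same_node_ok: bool, cross_node_ok: bool, node_to_node_ok: bool) -> str:
    """Suggest where to look next, given three connectivity test results.

    same_node_ok:    pod-to-pod traffic works on a single node
    cross_node_ok:   pod-to-pod traffic works between nodes
    node_to_node_ok: the nodes themselves can reach each other
    """
    if not same_node_ok:
        return "CNI plugin or pod networking is broken on that node"
    if cross_node_ok:
        return "pod networking looks healthy; check Services, DNS, or NetworkPolicy"
    if not node_to_node_ok:
        return "underlying node network: routing, security groups, or host firewall"
    return "overlay path: encapsulation ports blocked or CNI agent unhealthy"


if __name__ == "__main__":
    # Same-node works, cross-node fails, nodes can ping each other.
    print(likely_fault(same_node_ok=True, cross_node_ok=False, node_to_node_ok=True))
```

Walking through the tree out loud in this order (pod, node, overlay) is usually what an interviewer is listening for.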
“DNS resolution is slow in the cluster”
Root cause is usually the ndots:5 default:
# Default /etc/resolv.conf in a K8s pod:
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
A lookup for api.example.com (2 dots, fewer than ndots:5, so the search list is tried first) triggers:
1. api.example.com.default.svc.cluster.local → NXDOMAIN
2. api.example.com.svc.cluster.local → NXDOMAIN
3. api.example.com.cluster.local → NXDOMAIN
4. api.example.com → SUCCESS
Three unnecessary DNS queries before the real one, and often twice that because resolvers typically ask for A and AAAA records separately. At scale, this hammers CoreDNS.
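The search-list behaviour can be modelled in a few lines. This is a simplified sketch of how a glibc-style resolver orders its attempts, not a real resolver: it only produces the list of names in the order they would be tried, and the function name is made up.

```python
def candidate_queries(name: str, search: list[str], ndots: int) -> list[str]:
    """Return the names a resolver would try, in order, for a lookup."""
    if name.endswith("."):
        return [name]  # trailing dot: absolute name, search list is skipped
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + expanded  # enough dots: try the name as-is first
    return expanded + [name]      # too few dots: walk the search list first


if __name__ == "__main__":
    search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
    for ndots in (5, 2):
        print(f"ndots:{ndots}")
        for query in candidate_queries("api.example.com", search, ndots):
            print("  ", query)
```

A real resolver stops at the first successful answer, so with a lower ndots value the external name resolves on the first attempt, while a short in-cluster name such as a bare Service name still goes through the search list.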
Fixes:
# Option 1: Reduce ndots in pod spec
dnsConfig:
  options:
    - name: ndots
      value: "2"
# Option 2: Use FQDN with trailing dot (bypasses search domains)
# curl http://api.example.com. ← trailing dot = absolute, no search
# Option 3: Scale CoreDNS
# kubectl scale deployment coredns -n kube-system --replicas=5
# Option 4: NodeLocal DNS Cache (DaemonSet that caches DNS on each node)
Trade-off Discussions
“Monolith in a container vs microservices?”
| Factor | Monolith Container | Microservices |
|---|---|---|
| Complexity | Low — one Dockerfile, one deploy | High — many services, dependencies, network |
| Team size | Small team (<10 devs) | Large team (>10 devs, multiple squads) |
| Scaling | Scale entire app (wasteful) | Scale individual services (efficient) |
| Debugging | Simple (one process) | Hard (distributed tracing needed) |
| Deploy speed | Slow (deploy everything) | Fast (deploy one service) |
| Data consistency | Easy (one database) | Hard (eventual consistency, sagas) |
| Starting a new project | Start here | Migrate when you hit scaling/team problems |
Principal-level answer: “Start with a monolith in a container. It gets you containerization benefits (reproducible builds, consistent environments, CI/CD) without microservice complexity. Extract services when you have a clear need: a component needs independent scaling, teams need independent deploy cycles, or a bounded context is well-defined. Most premature microservice decompositions create distributed monoliths that are worse than the original.”
“Managed Kubernetes vs self-managed?”
| Factor | Managed (EKS/GKE/AKS) | Self-managed (kubeadm/k3s) |
|---|---|---|
| Control plane | Provider manages (HA, upgrades, patching) | You manage everything |
| Cost | $72-150/month for control plane + node costs | Only node costs |
| Customization | Limited (can’t modify API server flags easily) | Full control |
| Upgrade effort | Push-button (provider handles) | You plan and execute |
| Support | Provider SLA, support tickets | Community or paid support |
| Expertise needed | Moderate | High |
Principal-level answer: “Managed K8s for 95% of cases. The control plane management alone is worth the cost — HA etcd, API server upgrades, certificate rotation. Self-managed only when you need custom API server configuration (unusual admission controllers, specific authentication), air-gapped environments, or edge deployments where cloud isn’t available.”
“When NOT to use containers?”
- High-performance computing (HPC) where direct hardware access matters (GPU passthrough has overhead)
- Windows-native applications that don’t containerize well (COM, registry-dependent)
- Extremely latency-sensitive workloads (HFT) where even the small overhead of cgroups, network namespaces, and overlay networking matters
- Embedded/IoT with very limited resources (though podman/containerd work on ARM)
- Regulatory environments where VM isolation is mandated by compliance
Rapid Fire Questions
| Question | Key Points |
|---|---|
| ENTRYPOINT vs CMD? | ENTRYPOINT = the executable. CMD = default arguments. Together: ENTRYPOINT ["python"] + CMD ["app.py"]. CMD can be overridden at runtime. |
| ADD vs COPY? | COPY copies files. ADD also extracts tars and fetches URLs. Always use COPY unless you specifically need tar extraction. |
| Expose vs publish? | EXPOSE is documentation only. -p (publish) creates the actual port mapping with iptables. |
| Docker stop vs kill? | stop = SIGTERM → wait (10 s grace period by default) → SIGKILL. kill = SIGKILL immediately by default. Always try stop first. |
| Image vs container? | Image = template (read-only layers). Container = running instance (image + writable layer + process). |
| Swarm vs K8s? | Swarm: simpler, Docker-native, limited ecosystem. K8s: complex, massive ecosystem, industry standard. |
| What is kube-proxy? | Implements Service networking. Modes: iptables (default, O(n) rules) or IPVS (hash table, better at scale). |
| What does kubelet do? | Node agent. Ensures containers in pods are running. Manages pod lifecycle, health checks, resource reporting. |
| etcd? | Distributed key-value store. THE source of truth for all K8s cluster state. Back it up or lose everything. |
| What is a CRD? | Custom Resource Definition. Extends K8s API with custom resource types. Operators use CRDs. |
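The ENTRYPOINT vs CMD answer is easier to remember as a rule for assembling the final command line. A minimal model of the exec-form case follows (shell form and the `--entrypoint` override are deliberately left out); the function name is illustrative.

```python
from typing import Optional


def container_argv(entrypoint: list[str], cmd: list[str],
                   run_args: Optional[list[str]] = None) -> list[str]:
    """Combine exec-form ENTRYPOINT and CMD the way the runtime does.

    Arguments given after the image name at run time replace CMD;
    ENTRYPOINT stays in place.
    """
    args = list(run_args) if run_args else list(cmd)
    return list(entrypoint) + args


if __name__ == "__main__":
    # ENTRYPOINT ["python"] + CMD ["app.py"], as in the table above.
    print(container_argv(["python"], ["app.py"]))
    # Same image started with extra arguments: CMD is replaced.
    print(container_argv(["python"], ["app.py"], ["-m", "http.server"]))
```

With an empty ENTRYPOINT the same rule still holds: CMD (or the run-time arguments) becomes the whole command.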
Certification Exam Domains Reference
Docker Certified Associate (DCA)
| Domain | Weight |
|---|---|
| Orchestration | 25% |
| Image Creation, Management & Registry | 20% |
| Installation & Configuration | 15% |
| Networking | 15% |
| Security | 15% |
| Storage & Volumes | 10% |
Certified Kubernetes Administrator (CKA)
| Domain | Weight |
|---|---|
| Troubleshooting | 30% |
| Cluster Architecture, Installation & Configuration | 25% |
| Services & Networking | 20% |
| Workloads & Scheduling | 15% |
| Storage | 10% |
Certified Kubernetes Security Specialist (CKS)
| Domain | Topics |
|---|---|
| Cluster Setup | API server flags, kubelet security, audit logging |
| Cluster Hardening | RBAC, ServiceAccount, PSS |
| System Hardening | AppArmor, SELinux, seccomp |
| Microservice Vulnerabilities | Image scanning, pod security, secrets |
| Supply Chain Security | Image signing, admission controllers |
| Runtime Security | Falco, audit logs, monitoring |
Study Recommendations
| Resource | Best For |
|---|---|
| Bret Fisher’s Docker Mastery (Udemy) | Docker deep dive, endorsed by Docker Inc. |
| KodeKloud CKA/CKS paths | Hands-on labs with browser-based practice |
| “Kubernetes the Hard Way” (Kelsey Hightower) | Understanding K8s internals by building from scratch |
| CKA Exam Guide (GitHub) | Exam-aligned study with practice questions |
| Killer.sh | CKA/CKAD/CKS practice exams (realistic environment) |