Matched signals
- CrashLoopBackOff
- Back-off restarting failed container
- Error: CrashLoopBackOff
- Restarting failed container
- Reason: CrashLoopBackOff
Kubernetes pod in CrashLoopBackOff
What this failure means
A Kubernetes pod is stuck in CrashLoopBackOff: the container starts, exits (usually with a non-zero code), and Kubernetes restarts it with an exponentially increasing back-off delay. The root cause is almost always inside the container itself.
Symptoms
Faultline looks for one or more of these log fragments:
- CrashLoopBackOff
- Back-off restarting failed container
- Error: CrashLoopBackOff
- Restarting failed container
- Reason: CrashLoopBackOff
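To check a log for the same fragments by hand, a fixed-string grep is usually enough. The sketch below is only an approximation of the idea, not Faultline's actual matcher; the function name has_crashloop_signal is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Faultline): succeeds if the given log
# file contains any of the fragments listed above.
has_crashloop_signal() {
    # -F: patterns are fixed strings, not regexes; -q: exit status only.
    # "Error: CrashLoopBackOff" and "Reason: CrashLoopBackOff" are already
    # covered by the plain "CrashLoopBackOff" substring.
    grep -qF \
        -e 'CrashLoopBackOff' \
        -e 'Back-off restarting failed container' \
        -e 'Restarting failed container' \
        -- "$1"
}
```

For example, has_crashloop_signal build.log && echo "matched" prints "matched" only when at least one fragment is present in build.log.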
Diagnosis
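The pod is in CrashLoopBackOff because its container keeps exiting shortly after it starts, and Kubernetes restarts it with an exponentially increasing back-off delay. The root cause is almost always inside the container itself, so the diagnosis starts with the previous container's logs and its last termination state.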
Fix steps
- Read the container’s last log output before it exited: kubectl logs <pod> -n <namespace> --previous.
- Describe the pod for exit code and reason: kubectl describe pod <pod> -n <namespace>, then look at Last State under the container section.
- Interpret the exit code: 1 = the application exited with an error; 137 = the process was killed with SIGKILL (128 + 9), most often by the OOM killer, in which case the reason shows OOMKilled; 126 = the entrypoint was found but is not executable; 127 = the entrypoint was not found.
- For missing configuration: check that all required ConfigMaps, Secrets, and volume mounts exist and have the expected keys.
- For application startup errors: reproduce with docker run --env-file .env <image> to isolate the failure from Kubernetes.
- For OOM: increase the container’s resources.limits.memory or reduce the application’s memory footprint.
- If the container exits too quickly to inspect, keep it alive by temporarily patching the command to sleep: kubectl patch deployment <name> --patch '{"spec":{"template":{"spec":{"containers":[{"name":"<n>","command":["sleep","3600"]}]}}}}', then use kubectl exec to inspect it. This only works if the image contains a sleep binary; revert the patch afterwards.
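The exit-code reading in the steps above can be sketched as a small lookup. This is a minimal illustration rather than an exhaustive table; the function name explain_exit_code is hypothetical, and the "128 + signal number" rule is the usual shell convention for processes killed by a signal.

```shell
#!/bin/sh
# Hypothetical helper: map a container exit code to its most likely cause.
explain_exit_code() {
    code=$1
    case "$code" in
        1)   echo "application error" ;;
        126) echo "command found but not executable" ;;
        127) echo "command not found" ;;
        137) echo "killed by SIGKILL (often the OOM killer)" ;;
        *)
            # Codes above 128 conventionally mean "killed by signal N",
            # where N = code - 128 (for example 143 = SIGTERM).
            if [ "$code" -gt 128 ] 2>/dev/null; then
                echo "killed by signal $((code - 128))"
            else
                echo "application-specific exit code"
            fi
            ;;
    esac
}

# Usage against a cluster (assumes the failing container is the first one
# listed in the pod status):
# code=$(kubectl get pod <pod> -n <namespace> \
#     -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}')
# explain_exit_code "$code"
```

An exit code of 137 on its own does not prove an OOM kill; confirm it against the reason shown under Last State in the kubectl describe output.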
Validation
- kubectl get pod <pod> -n <namespace> -w
- kubectl logs <pod> -n <namespace>
Why it matters
CrashLoopBackOff is one of the most common Kubernetes failure modes after a bad deploy. Kubernetes keeps restarting the container, doubling the delay between attempts up to a ceiling of about 5 minutes, so the service stays unavailable or degraded until the underlying cause is fixed. Root causes include: missing environment variables or secrets, a misconfigured entrypoint, an application startup error, an OOM kill from an undersized memory limit, or an image built for a different architecture than the node.
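To make the back-off concrete, the sketch below computes the delay before each restart, assuming the kubelet's documented default schedule: a 10-second base that doubles on every restart and is capped at 300 seconds. The function name backoff_delay is hypothetical, and actual timings can differ with cluster version and configuration.

```shell
#!/bin/sh
# Hypothetical helper: delay in seconds before the Nth restart (N >= 1),
# assuming a 10 s base that doubles each time and is capped at 300 s.
backoff_delay() {
    n=$1
    delay=10
    i=1
    # Double the delay once per earlier restart, stopping at the cap.
    while [ "$i" -lt "$n" ] && [ "$delay" -lt 300 ]; do
        delay=$((delay * 2))
        i=$((i + 1))
    done
    if [ "$delay" -gt 300 ]; then
        delay=300
    fi
    echo "$delay"
}

# Usage: print the delays before the first seven restarts.
# for n in 1 2 3 4 5 6 7; do backoff_delay "$n"; done
# prints 10, 20, 40, 80, 160, 300, 300 (one value per line)
```

Under these assumptions the ceiling is reached at the sixth restart, a little over five minutes after the first crash.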
Prevention
- Give the application time to initialise: use a startup probe, or set initialDelaySeconds on the liveness and readiness probes, so that the kubelet does not kill a container that is still starting. Only a failing liveness or startup probe triggers a restart; a failing readiness probe just withholds traffic.
- Test the container image locally with docker run before promoting to a cluster.
- Set resources.requests and resources.limits based on profiled memory usage, not guesses.
- Watch event/restart history with kubectl get events --sort-by=.lastTimestamp -n <namespace> after each deploy.
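As a complement to watching events, a post-deploy check can flag pods whose restart count is already climbing. The sketch below filters the default kubectl get pods table; it assumes the standard column order (NAME, READY, STATUS, RESTARTS, AGE), and the function name pods_over_restart_limit and the default threshold of 3 are illustrative choices.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin
# and print the name of every pod whose RESTARTS value exceeds the limit.
pods_over_restart_limit() {
    limit=${1:-3}
    # RESTARTS is field 4. Newer kubectl versions print it as "7 (30s ago)",
    # but whitespace splitting still leaves the count alone in field 4.
    awk -v limit="$limit" '($4 + 0) > (limit + 0) { print $1 }'
}

# Usage against a cluster:
# kubectl get pods -n <namespace> --no-headers | pods_over_restart_limit 3
```

An empty result means no pod in the namespace has exceeded the limit at the moment of the check.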
Try it locally
kubectl logs <pod> -n <namespace> --previous
kubectl describe pod <pod> -n <namespace>
kubectl get pod <pod> -n <namespace>
kubectl logs <pod> -n <namespace>
How Faultline detects it
Use faultline explain k8s-crashloopbackoff to see the full playbook.
faultline analyze build.log
faultline explain k8s-crashloopbackoff
Generated from playbooks/bundled/log/deploy/k8s-crashloopbackoff.yaml. Do not edit directly.