Kubernetes pod in CrashLoopBackOff

What this failure means

A pod is in CrashLoopBackOff when one of its containers repeatedly starts, exits (usually with a non-zero code), and is restarted by Kubernetes after an exponentially increasing back-off delay. The root cause is almost always in the container itself or its configuration, not in the cluster.

Symptoms

Faultline looks for one or more of these log fragments:

CrashLoopBackOff
Back-off restarting failed container
Error: CrashLoopBackOff
Restarting failed container
Reason: CrashLoopBackOff
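
As a rough illustration, the fragments above can be searched for with grep. This is a minimal sketch and not Faultline's actual matcher; the function name has_crashloop_signature is an assumption made for the example.

```shell
# Hypothetical helper: succeeds if the given log file contains any of the
# CrashLoopBackOff fragments listed above. Not Faultline's own matcher.
has_crashloop_signature() {
  # -E: extended regex; -q: quiet, report the result via exit status only.
  # "Error: CrashLoopBackOff" and "Reason: CrashLoopBackOff" are already
  # covered by the bare "CrashLoopBackOff" alternative.
  grep -Eq 'CrashLoopBackOff|Back-off restarting failed container|Restarting failed container' "$1"
}

# Usage: has_crashloop_signature build.log && echo "crash loop detected"
```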

Diagnosis

Fix steps

Read the container’s last log output before it exited: kubectl logs <pod> -n <namespace> --previous.
Describe the pod for exit code and reason: kubectl describe pod <pod> -n <namespace>, then look at Last State under the container section.
Interpret the exit code: 1 = the application exited with an error; 137 = the process was killed with SIGKILL (128 + 9), most often by the OOM killer; 126 = the entrypoint was found but is not executable; 127 = the entrypoint was not found.
For missing configuration: check that all required ConfigMaps, Secrets, and volume mounts exist and have the expected keys.
For application startup errors: reproduce with docker run --env-file .env <image> to isolate the failure from Kubernetes.
For OOM: increase the container’s resources.limits.memory or reduce the application’s memory footprint.
Keep the container alive for inspection by temporarily overriding its command with sleep: kubectl patch deployment <name> -n <namespace> --patch '{"spec":{"template":{"spec":{"containers":[{"name":"<n>","command":["sleep","3600"]}]}}}}' then kubectl exec into the pod to inspect it. Revert the patch once you are done.
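
The exit-code step above can be scripted. The following is a minimal sketch: the helper names explain_exit_code and last_exit_code are hypothetical, and the JSONPath query assumes the crashing container is the first one listed in the pod status.

```shell
# Hypothetical helper: translate a container exit code into the likely
# cause, following the mapping in the steps above.
explain_exit_code() {
  case "$1" in
    1)   echo "application error" ;;
    126) echo "entrypoint not executable" ;;
    127) echo "entrypoint not found" ;;
    137) echo "killed by SIGKILL, often OOM" ;;
    *)   echo "unrecognised exit code" ;;
  esac
}

# Hypothetical helper: print the exit code of the previous run of the
# first container in a pod. Arguments: pod name, namespace.
last_exit_code() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
}

# Usage: explain_exit_code "$(last_exit_code <pod> <namespace>)"
```

For multi-container pods, the index in the JSONPath expression would need to point at the container that is actually crashing.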

Validation

kubectl get pod <pod> -n <namespace> -w
kubectl logs <pod> -n <namespace>
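
One way to confirm that the pod has stopped restarting is to compare its restart count before and after a wait. This is a sketch under assumptions: the helper names restart_count and restarts_are_stable are hypothetical, and only the first container in the pod is checked.

```shell
# Hypothetical helper: print the restart count of the first container.
# Arguments: pod name, namespace.
restart_count() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{.status.containerStatuses[0].restartCount}'
}

# Succeeds if the restart count did not change over the wait interval.
# Arguments: pod name, namespace, seconds to wait (default 360).
restarts_are_stable() {
  before="$(restart_count "$1" "$2")"
  sleep "${3:-360}"
  after="$(restart_count "$1" "$2")"
  [ "$before" = "$after" ]
}

# Usage: restarts_are_stable <pod> <namespace> && echo "no new restarts"
```

Because the back-off delay can grow to about 5 minutes, a wait shorter than that may miss a restart; the 360-second default is chosen with that in mind.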

Why it matters

CrashLoopBackOff is one of the most common Kubernetes failure modes after a bad deploy. Kubernetes keeps restarting the container indefinitely, with the delay between restarts growing to a back-off ceiling of about 5 minutes, so the service stays unavailable or degraded until the underlying fault is fixed. Root causes include: missing environment variables or secrets, a misconfigured entrypoint, an application startup error, OOMKill from an undersized memory limit, or an incompatible image for the node architecture.
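
To make the back-off ceiling concrete, the sketch below computes the delay before each restart. It assumes the documented kubelet defaults of a 10-second initial delay that doubles on every restart and is capped at 300 seconds; the function name backoff_delay is hypothetical.

```shell
# Print the delay in seconds before restart number N (1-based), assuming
# a 10 s initial delay that doubles each time and is capped at 300 s.
backoff_delay() {
  delay=10
  i=1
  while [ "$i" -lt "$1" ]; do
    delay=$((delay * 2))
    if [ "$delay" -ge 300 ]; then
      delay=300
      break
    fi
    i=$((i + 1))
  done
  echo "$delay"
}

# Usage: for n in 1 2 3 4 5 6 7; do backoff_delay "$n"; done
# prints 10, 20, 40, 80, 160, 300, 300 (one value per line)
```

Under these assumptions the ceiling is reached on the sixth restart, a little over five minutes after the first crash.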

Prevention

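Give the application time to initialise before probes act on it: set initialDelaySeconds on the readiness probe, and if you also use a liveness probe, add a startup probe or a generous initialDelaySeconds, because a failing liveness probe restarts the container.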
Test the container image locally with docker run before promoting to a cluster.
Set resources.requests and resources.limits based on profiled memory usage, not guesses.
Watch event/restart history with kubectl get events --sort-by=.lastTimestamp -n <namespace> after each deploy.
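
The event check above can be narrowed to back-off events only. This is a minimal sketch: the function name backoff_events is hypothetical, and it assumes the kubelet records these events with the reason BackOff.

```shell
# Hypothetical helper: list back-off events in a namespace, oldest first.
# Assumes the events carry reason "BackOff". Argument: namespace.
backoff_events() {
  kubectl get events -n "$1" \
    --field-selector reason=BackOff \
    --sort-by=.lastTimestamp
}

# Usage: backoff_events <namespace>
```

An empty result shortly after a deploy suggests that no container has entered back-off yet; it does not by itself prove the rollout is healthy.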

Try it locally

kubectl logs <pod> -n <namespace> --previous
kubectl describe pod <pod> -n <namespace>
kubectl get pod <pod> -n <namespace>
kubectl logs <pod> -n <namespace>

How Faultline detects it

Use faultline explain k8s-crashloopbackoff to see the full playbook.

faultline analyze build.log
faultline explain k8s-crashloopbackoff

Generated from playbooks/bundled/log/deploy/k8s-crashloopbackoff.yaml. Do not edit directly.
