Troubleshooting Failed Cron Jobs: A Debugging Guide

Your monitoring alerts you that a cron job failed. Now what? The job ran fine for months and you haven't changed anything, yet suddenly it's failing. Finding the root cause requires systematic investigation.

This guide covers the most common cron job failures and how to diagnose them.

Step 1: Verify the Job Actually Ran

Before debugging the job itself, confirm whether it ran at all. A job that didn't execute has different causes than a job that ran and failed.

Check the system cron logs:

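# Debian/Ubuntu
grep CRON /var/log/syslog | tail -50

# CentOS/RHEL
grep CRON /var/log/cron | tail -50

# systemd journal (the unit is "cron" on Debian/Ubuntu, "crond" on CentOS/RHEL)
journalctl -u cron --since "1 hour ago"

# macOS
log show --predicate 'process == "cron"' --last 1h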
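To pull out just your job's runs, a small helper can filter the log for cron's CMD lines. This is a sketch that assumes the Debian-style syslog format, CRON[pid]: (user) CMD (command); last_cron_run is a hypothetical name, and other distributions format these lines differently.

```shell
# Hypothetical helper: print the most recent cron log line that ran a
# command containing PATTERN (matched as a fixed string, not a regex).
# Assumes Debian-style lines such as:
#   Jan 10 02:15:01 host CRON[12345]: (backup) CMD (/usr/local/bin/backup.sh)
last_cron_run() {
  pattern=$1
  logfile=${2:-/var/log/syslog}
  grep 'CRON' "$logfile" | grep 'CMD' | grep -F -- "$pattern" | tail -n 1
}
```

Usage: last_cron_run backup.sh (reading /var/log/syslog usually requires root or membership in the adm group). Empty output means no matching run in that file; older entries may be in rotated files such as /var/log/syslog.1.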

Look for your job's command. If it's not in the logs, the job never started. Causes include:

Crontab syntax error - A typo can keep the entry, or every job in that crontab file, from running
Cron service not running - Check with systemctl status cron (the service is crond on CentOS/RHEL)
User doesn't have cron access - Check /etc/cron.allow and /etc/cron.deny
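The access rule behind the last item is simple enough to check mechanically. The crontab(1) man page on Debian-family systems describes it this way: if /etc/cron.allow exists, the user must be listed in it; otherwise, if /etc/cron.deny exists, the user must not be listed there; if neither file exists, the outcome is site-dependent. A minimal sketch of that rule follows; cron_access is a hypothetical name, and other cron implementations may differ in the details.

```shell
# Hypothetical helper: report whether USER may use cron, following the
# cron.allow / cron.deny precedence described in crontab(1).
# Optional 2nd/3rd arguments override the file paths (handy for testing).
cron_access() {
  user=$1
  allow=${2:-/etc/cron.allow}
  deny=${3:-/etc/cron.deny}
  if [ -f "$allow" ]; then
    # cron.allow exists: only users listed in it are allowed
    if grep -qx -- "$user" "$allow"; then echo allowed; else echo denied; fi
  elif [ -f "$deny" ]; then
    # no cron.allow: everyone except users listed in cron.deny
    if grep -qx -- "$user" "$deny"; then echo denied; else echo allowed; fi
  else
    # neither file: depends on how cron was built and configured
    echo site-default
  fi
}
```

Usage: cron_access cronuser.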

If the job did run, move to the next step.

Step 2: Check the Exit Code

Every command exits with a code. Zero means success, non-zero means failure. If your job logged its exit code (it should), check what it was:

# Common exit codes
# 0 - Success
# 1 - General error
# 2 - Misuse of a shell builtin (e.g., invalid arguments)
# 126 - Command found but not executable (often a permissions problem)
# 127 - Command not found
# 128+n - Killed by signal n (e.g., 137 = killed by SIGKILL)

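Exit code 137 (128 + 9) means the process was killed with SIGKILL, most often by the OOM killer or because it exceeded a memory limit. Exit code 143 (128 + 15) means it received SIGTERM, for example from a supervisor, a system shutdown, or a timeout wrapper. Note that GNU timeout itself exits with 124 when it stops a command, unless it is run with --preserve-status.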
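When a log contains only the number, a small lookup function can translate it. A sketch, with describe_exit_code as a hypothetical name; codes outside this list are defined by the program itself, and 124 is only a hint, since GNU timeout uses it but any program could.

```shell
# Hypothetical helper: translate a shell exit status into a short description.
describe_exit_code() {
  code=$1
  case $code in
    0)   echo "success" ;;
    1)   echo "general error" ;;
    2)   echo "misuse of shell builtin or bad usage" ;;
    124) echo "possibly timed out (GNU timeout exits with 124)" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found" ;;
    *)
      if [ "$code" -gt 128 ] && [ "$code" -le 255 ]; then
        # 128+n: terminated by signal n; kill -l maps the number to a name
        sig=$((code - 128))
        echo "killed by signal $sig ($(kill -l "$sig" 2>/dev/null || echo unknown))"
      else
        echo "application-specific error"
      fi
      ;;
  esac
}
```

Usage: describe_exit_code 137.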

Step 3: Run the Command Manually

The most effective debugging step: run the exact cron command as the cron user.

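# Run the script as the cron user with an environment close to cron's:
# only HOME, LOGNAME, SHELL=/bin/sh and a minimal PATH, executed via /bin/sh
# (replace /home/cronuser with that user's real home directory)
sudo -u cronuser env -i \
  HOME=/home/cronuser LOGNAME=cronuser SHELL=/bin/sh PATH=/usr/bin:/bin \
  /bin/sh -c '/path/to/your/script.sh'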

This surfaces problems that only appear in cron's minimal environment:

Missing PATH entries
Missing environment variables
Permission issues for the cron user

Watch for error messages that wouldn't appear when running as yourself.
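If you reproduce cron's environment often, the env -i approach can be wrapped in a function. A sketch assuming the common cron defaults (SHELL=/bin/sh, PATH=/usr/bin:/bin, plus HOME and LOGNAME); run_like_cron is a hypothetical name. Call it from a shell that is already running as the cron user (for example, one started with sudo -u cronuser -s), because sudo cannot invoke a shell function directly.

```shell
# Hypothetical helper: run a command string the way cron does, through
# /bin/sh -c, with an emptied environment apart from the variables cron sets.
run_like_cron() {
  env -i \
    HOME="$HOME" \
    LOGNAME="$(id -un)" \
    SHELL=/bin/sh \
    PATH=/usr/bin:/bin \
    /bin/sh -c "$1"
}
```

Usage: run_like_cron '/path/to/your/script.sh'. Any variables your crontab sets itself (such as PATH or MAILTO lines) would need to be added to the env -i list.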

Step 4: Check Recent System Changes

Jobs that have worked for months rarely break at random. Usually something changed. Common culprits:

Package Updates

# Debian/Ubuntu - recent installs and upgrades
grep -E " (install|upgrade) " /var/log/dpkg.log | tail -20

# CentOS/RHEL (use "dnf history list" on newer releases)
yum history list

A Python upgrade can break scripts. A database client update can change behavior. An update to OpenSSL or the CA certificate bundle can break API calls.
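To narrow the package history to the window when the job started failing, filter dpkg.log by date. A sketch assuming dpkg's usual log layout (date, time, action, package, old version, new version); recent_pkg_changes is a hypothetical name, and rotated logs (dpkg.log.1, *.gz) are not searched.

```shell
# Hypothetical helper: list dpkg installs/upgrades on or after a date.
# SINCE must be YYYY-MM-DD so plain string comparison orders dates correctly.
# Expected line format:
#   2024-05-02 10:31:07 upgrade example-pkg:amd64 1.0-1 1.0-2
recent_pkg_changes() {
  since=$1
  log=${2:-/var/log/dpkg.log}
  awk -v since="$since" '
    $1 >= since && ($3 == "install" || $3 == "upgrade") {
      print $1, $2, $3, $4, $5, "->", $6
    }' "$log"
}
```

Usage: recent_pkg_changes 2024-05-01, using the date of the last successful run.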

Configuration Management

If you use Ansible, Puppet, Chef, or similar tools, check recent runs:

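# Example: Ansible writes this log on the control node, and only if
# log_path is set in ansible.cfg
tail -100 /var/log/ansible.log

# Check if crontabs were modified
ls -la /var/spool/cron/crontabs/    # Debian/Ubuntu
ls -la /var/spool/cron/             # CentOS/RHEL
ls -la /etc/crontab /etc/cron.d/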

Config management can overwrite environment files, change permissions, or modify crontabs.
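Rather than eyeballing ls timestamps, find -mtime can list cron-related files changed in the last few days. A sketch; recently_modified is a hypothetical name, and reading /var/spool/cron generally requires root.

```shell
# Hypothetical helper: list regular files under the given paths that were
# modified within the last DAYS days (find counts age in whole 24-hour periods).
recently_modified() {
  days=$1
  shift
  find "$@" -type f -mtime "-$days" 2>/dev/null
}
```

Usage: recently_modified 7 /var/spool/cron /etc/cron.d /etc/crontab.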

Disk Space

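df -h   # free space per filesystem
df -i   # free inodes; many small files can exhaust these while space remains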

Full disks cause strange failures: logs can't be written, temp files can't be created, and databases can't operate.
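A job can check free space itself before doing heavy work. A sketch using POSIX df -P output; check_disk and its 90% default threshold are hypothetical choices.

```shell
# Hypothetical helper: warn (and return 1) when the filesystem holding PATH
# is at or above THRESHOLD percent full. Default threshold: 90.
check_disk() {
  path=$1
  threshold=${2:-90}
  # df -P prints one header line, then one line per filesystem;
  # field 5 is the capacity, e.g. "42%"
  used=$(df -P "$path" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
  if [ "$used" -ge "$threshold" ]; then
    echo "WARNING: filesystem holding $path is ${used}% full" >&2
    return 1
  fi
  echo "$path: ${used}% used"
}
```

Usage: near the start of a job script, check_disk /var/lib/app 95 || exit 1 (the path is a placeholder).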

Memory Pressure

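dmesg | grep -i "out of memory" | tail -10

# dmesg may require root; on systemd hosts journalctl -k shows kernel messages too
journalctl -k | grep -i "out of memory" | tail -10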

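The OOM killer terminates processes when memory runs out. It picks the process with the highest oom_score, which is driven mostly by memory usage, so a batch job that loads a large dataset is a likely victim, while long-running services are sometimes shielded with a negative oom_score_adj.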
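To see which processes the OOM killer chose, extract the "Killed process" lines. A sketch assuming the kernel's usual wording, Killed process PID (name), which has varied slightly across kernel versions; oom_victims is a hypothetical name.

```shell
# Hypothetical helper: read kernel log text on stdin and print "PID NAME"
# for every process the OOM killer terminated.
oom_victims() {
  grep -oE 'Killed process [0-9]+ \([^)]*\)' |
    sed -E 's/Killed process ([0-9]+) \((.*)\)/\1 \2/'
}
```

Usage: journalctl -k --since "1 day ago" | oom_victims, or dmesg | oom_victims.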

Step 5: Examine Dependencies

Cron jobs rarely work in isolation. They connect to databases, call APIs, read files, and write output. Each dependency can fail.

Database Connectivity

# Test database connection
mysql -u user -p -h hostname -e "SELECT 1"
# or
psql -h hostname -U user -d database -c "SELECT 1"

Database issues: connection limits reached, password changed, firewall rules modified.

Network and API Access

# Test network connectivity
curl -I https://api.example.com

API issues: SSL certificate expired, endpoint changed, rate limits hit, authentication expired.

File System Access

# Check file permissions
ls -la /path/to/required/files
ls -la /path/to/output/directory

# Check if directories exist
test -d /expected/directory && echo "exists" || echo "missing"

File issues: permission changes, directory removed, mount point not mounted.
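These checks can also run at the top of the job, so it fails with a clear message instead of a confusing one later. A sketch; check_dir is a hypothetical name, and the writability test is only meaningful when run as the cron user.

```shell
# Hypothetical helper: confirm a directory exists and is writable by the
# current user; print a short status and return non-zero on failure.
check_dir() {
  dir=$1
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  fi
  if [ ! -w "$dir" ]; then
    echo "not writable: $dir"
    return 1
  fi
  echo "ok: $dir"
}
```

An unmounted mount point still exists as an empty directory, so check_dir alone won't catch it. For that case, mountpoint -q (from util-linux) exits non-zero when nothing is mounted at the given path.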

Step 6: Check Resource Limits

Cron jobs run with system-imposed limits that can cause silent failures.

Open File Limits

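# Check current limits (run this from inside the job, e.g. temporarily add
# it to the script; your interactive shell may have different limits)
ulimit -a

# Check configured limits for the cron user
grep cronuser /etc/security/limits.conf /etc/security/limits.d/*.conf 2>/dev/null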

Jobs that open many files or network connections can hit limits.
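On Linux, /proc lets you compare a running job's open file descriptors with the limit that process actually received, rather than the limits of your own shell. A sketch; fd_usage is a hypothetical name, and inspecting another user's process requires root.

```shell
# Hypothetical helper (Linux only): show how many file descriptors PID has
# open and its soft "Max open files" limit, both read from /proc.
fd_usage() {
  pid=$1
  open=$(ls "/proc/$pid/fd" | wc -l)
  # /proc/PID/limits row: "Max open files  <soft>  <hard>  files"
  soft=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
  echo "pid $pid: $open open file descriptors (soft limit $soft)"
}
```

Usage: fd_usage "$(pgrep -f script.sh | head -n 1)" while the job is running.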

Memory Limits

If using systemd, check for memory limits:

systemctl show cron | grep Memory    # the unit is "crond" on CentOS/RHEL

Container environments (Docker, Kubernetes) impose their own limits.
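Inside a container, or a systemd unit with MemoryMax= set, the effective limit lives in the process's cgroup. A sketch for cgroup v2 systems only; cgroup_memory_limit is a hypothetical name, and it prints "unknown" on cgroup v1 or when the file isn't visible. Calling it from inside the job (for example, logging its output at startup) shows the limit that applies to the job itself.

```shell
# Hypothetical helper (cgroup v2): print the memory limit of the calling
# process's cgroup in bytes, "max" if unlimited, or "unknown".
cgroup_memory_limit() {
  # On cgroup v2, /proc/self/cgroup holds a single line "0::/some/path"
  # ("self" is the awk process here, which inherits the caller's cgroup)
  cg=$(awk -F: '$1 == "0" { print $3 }' /proc/self/cgroup 2>/dev/null || true)
  f="/sys/fs/cgroup${cg}/memory.max"
  if [ -n "$cg" ] && [ -r "$f" ]; then
    cat "$f"
  else
    echo unknown
  fi
}
```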

Step 7: Review Recent Code Changes

If the job's script changed recently, review the diff:

git log --oneline -10 -- /path/to/script.sh
git diff HEAD~1 -- /path/to/script.sh

Common code problems:

New dependency not installed in production
Hard-coded path that doesn't exist in production
Environment variable expected but not set
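The last item can be caught up front: check that every variable the script needs is set before doing any work, so a missing value produces a clear error instead of a half-finished run. A sketch; require_env and the variable names in the usage line are hypothetical.

```shell
# Hypothetical helper: return non-zero (and name the culprits on stderr) if
# any of the given environment variables is unset or empty.
require_env() {
  missing=0
  for name in "$@"; do
    # printenv only sees exported variables, which is what child processes get
    if [ -z "$(printenv "$name")" ]; then
      echo "missing required environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: require_env DATABASE_URL API_TOKEN || exit 1 at the top of the script.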

Common Failure Patterns

The "Works On My Machine" Failure

Symptom: Script runs fine manually but fails in cron

Cause: Different environment. Cron has minimal PATH, no shell profile, different working directory.

Fix: Use absolute paths for everything. Explicitly set required environment variables. Don't rely on shell aliases or functions.
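One way to apply that fix consistently is to launch jobs through a small wrapper that pins the PATH and working directory rather than inheriting them. A sketch; run_in_clean_env and the PATH value are assumptions to adapt to your system.

```shell
# Hypothetical wrapper: run a command with an explicit PATH and working
# directory instead of whatever the caller (cron, a shell, CI) provides.
run_in_clean_env() {
  workdir=$1
  shift
  (
    PATH=/usr/local/bin:/usr/bin:/bin
    export PATH
    # cron normally starts jobs in the user's home directory; be explicit
    cd "$workdir" || exit 1
    exec "$@"
  )
}
```

Usage: run_in_clean_env /opt/app ./bin/report.sh (both paths are placeholders).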

The Midnight Failure

Symptom: Job fails specifically at midnight or day boundaries

Cause: Date arithmetic issues. Logs rolling over. Backup processes competing for resources.

Fix: Stagger jobs away from the top of the hour (for example, 7 0 * * * instead of 0 0 * * *). Check date handling in scripts. Review what else runs at the same time.
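For the date-handling part of that fix, one common pattern is to read the clock once at startup and reuse the value, so a job that starts at 23:59 and finishes after midnight doesn't mix two dates. A sketch; RUN_DATE and output_file are hypothetical names.

```shell
# Read the date once, when the job starts. Calling date again later
# (for example when naming the output file) could return the next day
# if the job runs past midnight.
RUN_DATE=$(date +%Y-%m-%d)

# Hypothetical naming scheme: DIR/PREFIX-RUN_DATE.csv
output_file() {
  printf '%s/%s-%s.csv\n' "$1" "$2" "$RUN_DATE"
}
```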

The "First of the Month" Failure

Symptom: Monthly job fails but weekly/daily jobs work fine

Cause: Often data-related. Monthly aggregations process more data. Reports cover different date ranges. More concurrent users during business hours.

Fix: Check for timeouts. Increase resource allocations for monthly jobs. Verify the job can handle a full month's data.
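To make a long monthly run fail predictably rather than hang, you can put an explicit time limit on it. A sketch using GNU coreutils timeout, which exits with 124 when the limit is hit; run_with_limit is a hypothetical name.

```shell
# Hypothetical wrapper: run a command under GNU timeout and say so clearly
# when the limit was the reason it stopped. LIMIT uses timeout's syntax
# (e.g. 30, 90m, 2h).
run_with_limit() {
  limit=$1
  shift
  status=0
  timeout "$limit" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "timed out after $limit: $*" >&2
  fi
  return "$status"
}
```

Usage: run_with_limit 2h /path/to/monthly-report.sh (placeholder path). A limit comfortably above the normal runtime but below the interval until the next run keeps overlapping runs from piling up.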

The Gradual Degradation

Symptom: Job gets slower over time, eventually times out

Cause: Growing data. Missing database indexes. Accumulating temp files. Log files getting huge.

Fix: Add monitoring for job duration, not just success/failure. Set up alerts for jobs that take longer than expected.
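Recording duration can start as simply as timing the job and printing a warning past a threshold you choose. A sketch; timed_run and the threshold are hypothetical, date +%s is supported by GNU and BSD date but is not POSIX, and in practice you would send the warning to your monitoring rather than stderr.

```shell
# Hypothetical wrapper: run a command, print its exit status and duration,
# and warn on stderr if it took longer than MAX_SECONDS.
timed_run() {
  max_seconds=$1
  shift
  start=$(date +%s)
  status=0
  "$@" || status=$?
  elapsed=$(( $(date +%s) - start ))
  echo "job=$1 status=$status duration=${elapsed}s"
  if [ "$elapsed" -gt "$max_seconds" ]; then
    echo "WARNING: $1 took ${elapsed}s (expected at most ${max_seconds}s)" >&2
  fi
  return "$status"
}
```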

Building a Debugging Checklist

When a job fails, work through this list:

Did the job run? (Check cron logs)
What was the exit code? (Check job logs)
Can you reproduce manually? (Run as cron user)
What changed recently? (System, config, code)
Are dependencies working? (Database, APIs, files)
Are resources available? (Disk, memory, limits)

Document what you find. The next failure might have the same cause, and future-you will thank present-you for the notes.

Preventing Future Failures

Once you've fixed the immediate issue, prevent recurrence:

Add more logging to capture details that would have helped
Add monitoring for the failure mode you discovered
Document the fix so others can find it
Consider automation to detect the condition before it causes failure
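For the first item, a wrapper that timestamps the start, the end, and the exit code of every run gives you the basics without changing each script. A sketch; log_run and the log path in the usage line are hypothetical.

```shell
# Hypothetical wrapper: append start/end timestamps (UTC), the command's own
# output, and its exit code to LOGFILE, then return that exit code.
log_run() {
  logfile=$1
  shift
  echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) start: $*" >> "$logfile"
  status=0
  "$@" >> "$logfile" 2>&1 || status=$?
  echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) end: $* exit=$status" >> "$logfile"
  return "$status"
}
```

Usage: log_run /var/log/backup-job.log /usr/local/bin/backup.sh.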

Most cron job failures are preventable with proper monitoring and logging. Set up monitoring with Cronzy to catch failures before they become incidents.
