Best Practices for Reliable Cron Jobs
Cron jobs appear simple on the surface. Write a script, add a line to crontab, and walk away. But production systems tell a different story. Jobs hang indefinitely, overlap with themselves, fail without anyone noticing, or break in ways that take days to discover.
Across thousands of monitored jobs, the same patterns keep appearing. Here are the practices that separate reliable cron jobs from the ones that wake you up at 3 AM.
1. Always Log Exit Codes
The most common cron failure mode is silent failure. Your job runs, encounters an error, and exits without anyone knowing. By the time you notice, the damage is done.
Bad approach - no visibility:
```
0 2 * * * /path/to/backup.sh
```

Better approach - log every execution:

```bash
#!/bin/bash
/path/to/script.sh
EXIT_CODE=$?
echo "$(date '+%Y-%m-%d %H:%M:%S'): Script exited with code $EXIT_CODE" >> /var/log/myjob.log
exit $EXIT_CODE
```

Now you have a timestamped record of every execution and whether it succeeded (exit code 0) or failed (non-zero). When something goes wrong, you can trace back to exactly when it started.
For important jobs, consider structured logging with JSON output that can be parsed by log aggregation tools.
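As a sketch of that idea, a small wrapper can emit one JSON line per run; the log path and job names below are illustrative, not part of any standard setup:

```bash
#!/bin/sh
# Append one JSON line per execution so log aggregation tools can parse it.
# LOG path is illustrative - point it at your real log location.
LOG="${LOG:-/tmp/myjob.json}"

run_logged() {
  "$@"
  code=$?
  printf '{"ts":"%s","job":"%s","exit_code":%d}\n' \
    "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$1" "$code" >> "$LOG"
  return $code
}

# In a real crontab entry this would wrap your actual script, e.g.:
#   run_logged /path/to/backup.sh
run_logged true
```

Because each record carries a timestamp, a job name, and an exit code, you can filter for failures with a single query instead of grepping free-form text.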
2. Use Locking to Prevent Overlap
What happens when your job runs longer than expected? If it takes 2 hours but runs every hour, you get overlapping executions. Two processes fight over the same files, corrupt data, or consume resources until the server crashes.
The solution is locking. The flock command provides a clean way to ensure only one instance runs:
```bash
flock -n /tmp/myjob.lock /path/to/script.sh || echo "Job already running, skipping"
```

The -n flag means non-blocking: if the lock is held, exit immediately instead of waiting. This prevents queue buildup when a job gets stuck.
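If you'd rather keep the locking inside the script itself, flock's file-descriptor form works too; the lock path here is illustrative:

```bash
#!/bin/bash
# Hold an exclusive lock on fd 200 for the lifetime of the script;
# the kernel releases the lock automatically when the process exits.
exec 200>/tmp/myjob.lock
if ! flock -n 200; then
  echo "Job already running, skipping" >&2
  exit 0
fi

echo "doing work"   # the real job body goes here, under the lock
```

This keeps the crontab entry clean and guarantees the lock covers the whole script, even if someone runs it by hand.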
For critical jobs, you might want to alert on lock contention rather than silently skipping:
```bash
flock -n /tmp/myjob.lock /path/to/script.sh || (echo "Lock held - possible stuck job" | mail -s "Cron Lock Alert" [email protected])
```

3. Set Appropriate Timeouts
Jobs that hang indefinitely are worse than jobs that fail. A failed job triggers alerts. A hanging job consumes resources silently and might delay dependent jobs.
Use the timeout command to enforce a maximum runtime:
```bash
timeout 3600 /path/to/long-running-job.sh
```

This kills the job after 1 hour (3600 seconds). Choose a timeout that's comfortably longer than normal execution but short enough to catch problems quickly.
For jobs that need cleanup on timeout, use the --signal flag to send a catchable signal first:
```bash
timeout --signal=TERM --kill-after=60 3600 /path/to/job.sh
```

This sends SIGTERM first, giving your script 60 seconds to clean up before SIGKILL.
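Because timeout reports a kill with a distinctive exit status of 124, a wrapper can tell "timed out" apart from "failed on its own". The command below is a stand-in chosen to force a timeout quickly:

```bash
#!/bin/bash
# GNU timeout exits 124 when the time limit was hit; otherwise it
# passes through the wrapped command's own exit code.
timeout 1 sleep 5   # demo command guaranteed to exceed the limit
code=$?

if [ "$code" -eq 124 ]; then
  echo "job timed out"
elif [ "$code" -ne 0 ]; then
  echo "job failed with code $code"
else
  echo "job succeeded"
fi
```

Routing the two cases to different alerts makes stuck jobs immediately distinguishable from ordinary failures in your logs.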
4. Handle Errors Gracefully
Bash scripts often continue after errors, producing corrupt output or partial results. Use set -e to exit immediately on any error:
```bash
#!/bin/bash
set -e
set -o pipefail

# Script exits on first error
process_data
transform_results
upload_to_server
```

The pipefail option ensures errors in pipelines aren't masked. Without it, broken_command | sort exits 0 even if the first command fails.
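The effect is easy to demonstrate in a shell; the same pipeline's exit status flips once pipefail is enabled:

```bash
#!/bin/bash
# Without pipefail, a pipeline reports the exit status of its LAST stage.
set +o pipefail
false | sort > /dev/null
echo "without pipefail: $?"   # prints 0 - the failure is masked

# With pipefail, any failing stage fails the whole pipeline.
set -o pipefail
false | sort > /dev/null
echo "with pipefail: $?"      # prints 1
```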
For cleanup on errors, use a trap:
```bash
#!/bin/bash
set -e
set -o pipefail

cleanup() {
  rm -f /tmp/workfile.$$
  echo "Cleanup completed"
}
trap cleanup EXIT

# Your script here
create_workfile
process_workfile
upload_results
```

The cleanup function runs whether the script succeeds or fails, preventing orphaned temporary files.
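For sharper diagnostics, bash also supports an ERR trap, which fires on the failing command before the EXIT trap runs, so error reporting and cleanup compose naturally. The script body here is a minimal stand-in:

```bash
#!/bin/bash
set -o pipefail

# ERR fires on the command that failed and can report where it happened;
# EXIT still runs afterwards for cleanup.
trap 'echo "Error on line $LINENO" >&2' ERR
trap 'echo "Cleanup completed"' EXIT

echo "step 1 ok"
# A failing command below would print the error line, then the cleanup:
# false
```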
5. Monitor Everything
Here's the uncomfortable truth: you don't know if your cron jobs are working unless you're actively watching them. "No errors in the logs" doesn't mean success - it might mean the job never ran at all.
External monitoring catches the failures that logging misses:
- Crontab gets corrupted during a config management run
- Server reboots and cron service doesn't start
- Disk fills up and the job can't write its output
- Network issue prevents the job from reaching its target
With monitoring in place, you get alerts when jobs miss their expected schedule. Cronzy's grace period feature handles timing variations automatically - your daily job that takes 5-10 minutes won't trigger false alarms.
The pattern is simple: ping on success, get alerted on silence. It takes 30 seconds to add and catches failures that would otherwise go unnoticed for days.
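In crontab form, the pattern is a one-line change. The URL below is a placeholder for whatever check-in endpoint your monitoring service issues:

```bash
# Ping only on success: && skips the check-in when the job fails,
# and the resulting silence is what triggers the alert.
# (URL is a placeholder for your job's check-in endpoint.)
0 2 * * * /path/to/backup.sh && curl -fsS --retry 3 https://example.com/ping/YOUR-JOB-ID > /dev/null
```

The -f flag makes curl treat HTTP errors as failures and --retry 3 rides out transient network blips, so the ping itself doesn't become a source of false alarms.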
6. Use Full Paths
Cron runs with a minimal environment. The PATH is stripped down, environment variables from your shell profile don't exist, and relative paths resolve from a different working directory.
This job works in your terminal:
```bash
python script.py
```

This version works in cron:

```bash
/usr/bin/python3 /home/user/scripts/script.py
```

Always use absolute paths for:
- The interpreter (/usr/bin/python3, not python)
- The script itself
- Any files the script reads or writes
- Any commands the script calls
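Most cron implementations also accept variable assignments at the top of the crontab, which keeps individual entries shorter. The values below are illustrative:

```bash
# Assignments at the top of a crontab apply to every entry below them.
SHELL=/bin/bash
PATH=/usr/local/bin:/usr/bin:/bin

0 2 * * * python3 /home/user/scripts/script.py
```

Setting SHELL and PATH once is easier to audit than repeating absolute paths in every entry, though the script's own internal commands still benefit from full paths.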
If your script relies on environment variables, source them explicitly:
```bash
0 2 * * * source /home/user/.env && /usr/bin/python3 /home/user/scripts/job.py
```

7. Test in Production (Safely)
Scripts that work perfectly on your laptop fail in production. Different package versions, different paths, different permissions, different network access. You need to test in the actual environment.
But "test in production" doesn't mean "push and pray." Here's a safer approach:
Start with short intervals. Run hourly instead of daily while testing. You'll catch failures faster and iterate more quickly.
Run manually first. SSH into the production server and execute the exact command from crontab. Watch for errors, check the output, verify the result.
Have a rollback plan. Know how to disable the job quickly if something goes wrong. Keep the previous version of your script available.
Verify the result. Don't just check that the job ran - check that it produced the correct output. Does the backup file exist and have the right size? Did the data sync correctly?
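That last check can itself be a small script; the path and size threshold below are illustrative values, not part of any standard tool:

```bash
#!/bin/sh
# verify_backup FILE MIN_BYTES - succeed only if FILE exists and is
# at least MIN_BYTES long. Path and threshold are illustrative.
verify_backup() {
  file=$1
  min=$2
  if [ ! -f "$file" ]; then
    echo "FAIL: backup missing: $file" >&2
    return 1
  fi
  size=$(wc -c < "$file")
  if [ "$size" -lt "$min" ]; then
    echo "FAIL: backup too small ($size bytes)" >&2
    return 1
  fi
  echo "OK: $file ($size bytes)"
}

# Example: verify_backup "/backups/db-$(date +%F).tar.gz" 1048576
```

Run as a follow-up cron entry or as the last step of the job itself, it turns "the job ran" into "the job produced what we expected".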
8. Document Your Jobs
Six months from now, someone will look at your crontab and wonder what each job does. That someone might be you.
Add comments directly in crontab:
```bash
# Database backup - runs daily at 2 AM, uploads to S3
# Owner: [email protected]
# Last updated: 2026-01-15
0 2 * * * /opt/scripts/db_backup.sh && curl -s https://api.cronzy.io/ping/abc123 > /dev/null
```

For systems with many jobs, maintain a separate document listing:
- What each job does
- Who owns it
- What happens if it fails
- Dependencies on other jobs
- How to manually run or verify it
This documentation saves hours of investigation when something breaks.
The Defense-in-Depth Approach
No single practice makes cron jobs reliable. Reliability comes from layers:
- Logging catches problems during development
- Locking prevents resource contention
- Timeouts ensure jobs don't hang forever
- Error handling makes failures visible
- Monitoring catches everything else
Each layer catches failures the others miss. A job might have perfect error handling but never run because crontab got corrupted. Monitoring catches that. A job might run but hang indefinitely. Timeouts catch that.
Start with monitoring - it's the safety net that makes all other practices visible. When your jobs are monitored, you can see which ones need better error handling, which ones run too long, and which ones fail silently.
Start monitoring your cron jobs with Cronzy and get visibility into what's actually happening with your automated systems.