One problem I've run into numerous times with cron jobs is that files and commands referenced by relative paths don't necessarily resolve in the environment cron runs in, even though they work fine when the job owner is logged in and runs them by hand. I had su'ed to the particular account (one used just for this file-moving and processing job) in order to run the tasks by hand. Scanning the script, I couldn't see how relative paths might be the source of the problem. In fact, it looked as if the cron job simply wasn't being run at all. There were no email messages complaining that files couldn't be found, and there was no indication of any kind of problem in my messages file.
I set up an exceedingly simple cron test (such as "* * * * * /usr/bin/echo hello >> /tmp/mylog") that should have given me minute-by-minute feedback. Instead, nothing at all was happening for my xfer user. Cron jobs for this particular account were simply not running.
I checked to make sure that cron was running, and it was. Just to be sure cron wasn't having some kind of unfathomable problem, I restarted it using the /etc/init.d/cron stop and /etc/init.d/cron start commands. When this didn't change anything, I trussed the cron process with "truss -p `pgrep cron`" and watched the occasional cron process in action.
...
write(2, " F r i S e p 1 6 1".., 24) = 24
write(1, "\n", 1) = 1
lstat64("/tmp/croutKIAfQay_L", 0xFFBFF9A8) = 0
unlink("/tmp/croutKIAfQay_L") = 0
fstat64(3, 0xFFBFF9A8) = 0
time() = 1252689060
alarm(540) = 0
read(3, 0x0002DE1C, 26) (sleeping...)
The next thing I looked at was the set of files in /etc/cron.d. Had someone added the username to the cron.deny file? No. That file contained the normal list of users banned from using cron:
# cat cron.deny
daemon
bin
smtp
nuucp
listen
nobody
noaccess
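The same check can be scripted rather than eyeballed. What follows is only a minimal sketch: the function name cron_access is made up for illustration, and the allow-before-deny ordering reflects my understanding of how crontab access is normally decided, so check your own system's crontab man page before relying on it. It also assumes a POSIX grep that understands -F and -x (on older Solaris releases that may mean /usr/xpg4/bin/grep rather than /usr/bin/grep).

```shell
# Hypothetical helper (not part of cron): report whether a user appears to be
# allowed to use cron, judging only by the allow/deny files in a directory.
# Usage: cron_access USER [DIR]     (DIR defaults to /etc/cron.d)
cron_access() {
    user=$1
    dir=${2:-/etc/cron.d}

    if [ -f "$dir/cron.allow" ]; then
        # When cron.allow exists, only the users listed in it get access.
        if grep -Fx -- "$user" "$dir/cron.allow" >/dev/null; then
            echo allowed
        else
            echo denied
        fi
    elif [ -f "$dir/cron.deny" ]; then
        # Otherwise, anyone not listed in cron.deny gets access.
        if grep -Fx -- "$user" "$dir/cron.deny" >/dev/null; then
            echo denied
        else
            echo allowed
        fi
    else
        # With neither file present the outcome depends on the system,
        # so this sketch declines to guess.
        echo unknown
    fi
}
```

Running "cron_access xfer" against a cron.deny like the one above, with no cron.allow present, should print "allowed", which matches what I found by hand.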
Reading the man page for cron, I noticed that cron has its own log file. The /var/cron/log file is used to capture cron history information.
# cd /var/cron
# ls -l
total 2192
-rw------- 1 root root 549369 Sep 16 11:26 log
-rw------- 1 root root 541958 Aug 16 03:10 olog
Records in this file look like what you see below. There's a line that shows the command being run and two that show when the cron job started and ended.
> CMD: /tmp/3minjob
> root 20494 c Fri Sep 16 12:20:00 2009
< root 20494 c Fri Sep 16 12:23:00 2009
The records for my failed cron jobs provided some additional information, including not just the return code for the script (rc=1), but the explanation I'd been looking for all along.
! user (xfer) password has expired Fri Sep 16 06:45:00 2009
> CMD: /export/home/xfer/bin/mvfiles
> webrepor 19121 c Fri Sep 16 06:45:00 2009
< webrepor 19121 c Fri Sep 16 06:45:00 2009 rc=1
As you can see, the password for the account had expired. Since I had su'ed to the account to do the testing (I don't know the password), I wasn't confronted with this problem earlier.
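Had I gone to cron's log first, a quick filter would have surfaced the complaint in seconds. Here is a rough sketch of one, assuming that cron's own complaints start with "!" and name the user in parentheses, as in the record above. The function name cron_complaints is invented for this example, and the default path is simply the /var/cron/log location mentioned earlier; other systems may log elsewhere or in a different format.

```shell
# Hypothetical helper: print the lines in a cron log that start with "!"
# (cron's own complaints) and mention the given user in parentheses.
# Usage: cron_complaints USER [LOGFILE]   (LOGFILE defaults to /var/cron/log)
cron_complaints() {
    user=$1
    log=${2:-/var/cron/log}

    # First keep only the "!" records, then narrow them to this user.
    # -F makes the second grep treat "(user)" as a literal string.
    grep '^!' "$log" | grep -F -- "($user)"
}
```

As root (the log is readable only by root on my system), "cron_complaints xfer" would have printed the "password has expired" record and nothing else.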
The moral of this story is "check the log files" or, better yet, "check the correct log files" before you go checking every other potential cause for an oddity you notice on your systems.