- Python Automation Cookbook
- Jaime Buelta
- 2021-06-30 14:52:49
Setting up a cron job
Cron is an old-fashioned but reliable way of executing commands. It has been around since the 1970s in Unix, and it's an old favorite in system administration to perform maintenance tasks such as freeing up disk space, rotating log files, making backups, and other common, repetitive operations.
This recipe is specific to Unix and Unix-like operating systems, so it will work on Linux and macOS. While it's possible to schedule a task in Windows, the approach is very different and uses Task Scheduler, which won't be described here. If you have access to a Linux server, this can be a good way of scheduling periodic tasks.
The main advantages are as follows:
- It's present in virtually all Unix or Linux systems and configured to run automatically.
- It's easy to use, although a little deceptive at first.
- It's well known. Almost anyone involved with admin tasks will have a general idea of how to use it.
- It allows for easy periodic commands, with good precision.
However, it also has some disadvantages, including the following:
- By default, it may not give much feedback, so retrieving the output and logging both the execution and any errors is critical.
- The task should be as self-contained as possible to avoid problems with environment variables, such as using the wrong Python interpreter or running from an unexpected working directory.
- It is Unix-specific.
- Only fixed periodic times are available.
- It doesn't control how many tasks run at the same time. Each time the scheduled time arrives, it starts a new task, whether or not the previous one has finished. For example, a task that takes 1 hour to complete and is scheduled to run once every 45 minutes will have 15 minutes of overlap where two tasks are running.
Don't underestimate this last effect. Running multiple expensive tasks at the same time can hurt performance, and overlapping expensive tasks may slow each other down so much that none of them ever finishes! Allow ample time for your tasks to finish and keep an eye on them. Keep in mind that any other program running on the same host, such as web servers, databases, and email, may also have its performance affected. Check how loaded the host where the task will execute is, to avoid surprises.
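Cron itself won't stop a new run from starting while the previous one is still going. One common workaround, which isn't part of this recipe, is for the script to take a non-blocking exclusive lock on a file and skip the run if the lock is already held. The following is a minimal sketch assuming a Unix system (it relies on the fcntl module); try_acquire_lock, run_exclusively, and the lock file path are hypothetical names chosen for illustration.

```python
import fcntl
import os
import tempfile

# Hypothetical lock file location; any path writable by the cron user works
DEFAULT_LOCK = os.path.join(tempfile.gettempdir(), 'cron_py.lock')


def try_acquire_lock(path=DEFAULT_LOCK):
    """Return an open file holding an exclusive lock on path, or None if it's taken."""
    lock_file = open(path, 'w')
    try:
        # LOCK_NB makes flock fail immediately instead of waiting for the lock
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        lock_file.close()
        return None
    return lock_file


def run_exclusively(task, path=DEFAULT_LOCK):
    """Run task() only if no other run holds the lock; return None when skipped."""
    lock_file = try_acquire_lock(path)
    if lock_file is None:
        return None  # a previous run is still going, so skip this one
    with lock_file:  # closing the file releases the lock
        return task()
```

In cron.py, the final call could then become run_exclusively(lambda: main(args.n1, args.n2, args.output)). If the process dies, the operating system closes its file descriptors, which also releases the lock.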
Getting ready
We will produce a script called cron.py:
import argparse
import sys
from datetime import datetime
import configparser


def main(number, other_number, output):
    result = number * other_number
    # sep=' ' produces timestamps like [2020-01-15 22:22:31.436912]
    print(f'[{datetime.utcnow().isoformat(sep=" ")}] The result is {result}',
          file=output)


if __name__ == '__main__':
    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('--config', '-c', type=argparse.FileType('r'),
                        help='config file',
                        default='/etc/automate.ini')
    # Append mode ('a') adds a new line on each run instead of overwriting
    parser.add_argument('-o', dest='output', type=argparse.FileType('a'),
                        help='output file',
                        default=sys.stdout)
    args = parser.parse_args()
    if args.config:
        config = configparser.ConfigParser()
        config.read_file(args.config)
        # Transforming values into integers
        args.n1 = int(config['ARGUMENTS']['n1'])
        args.n2 = int(config['ARGUMENTS']['n2'])
    main(args.n1, args.n2, args.output)
Note the following details:
- The config file is, by default, /etc/automate.ini. Reuse the config.ini file from the previous recipe.
- A timestamp has been added to the output. This makes it explicit when the task was run.
- The result is appended to the file, as the output file is opened in append mode ('a').
- The ArgumentDefaultsHelpFormatter class, passed as formatter_class, automatically adds information about default values when printing the help with the -h argument.
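The script expects an ARGUMENTS section with n1 and n2 entries. As a sketch of that format, the following parses INI text with configparser. The values 5 and 7 are assumptions, chosen only because they multiply to the 35 shown in the output below, and read_arguments is a hypothetical helper. It uses the section's getint() method as an alternative to calling int() on each value.

```python
import configparser

# Hypothetical contents of /etc/automate.ini; the values are assumptions
SAMPLE_CONFIG = """
[ARGUMENTS]
n1 = 5
n2 = 7
"""


def read_arguments(text):
    """Return (n1, n2) as integers from INI-formatted text."""
    config = configparser.ConfigParser()
    config.read_string(text)
    section = config['ARGUMENTS']
    # getint() converts the stored string, like int(section['n1'])
    return section.getint('n1'), section.getint('n2')
```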
Check that the task is producing the expected result and that you can log to a known file:
$ python3 cron.py
[2020-01-15 22:22:31.436912] The result is 35
$ python3 cron.py -o /path/automate.log
$ cat /path/automate.log
[2020-01-15 22:28:08.833272] The result is 35
How to do it...
- Obtain the full path of the Python interpreter. With your virtual environment activated, this is the interpreter inside the virtual environment:
$ which python
/your/path/.venv/bin/python
- Prepare the command for the cron job using full paths. Check that it runs without any problems by executing it a couple of times:
$ /your/path/.venv/bin/python /your/path/cron.py -o /path/automate.log
$ /your/path/.venv/bin/python /your/path/cron.py -o /path/automate.log
- Check that the result is being added correctly to the result file:
$ cat /path/automate.log
[2020-01-15 22:28:08.833272] The result is 35
[2020-01-15 22:28:10.510743] The result is 35
- Edit the crontab file and add a line that runs the task once every 5 minutes:
$ crontab -e
*/5 * * * * /your/path/.venv/bin/python /your/path/cron.py -o /path/automate.log
Note that crontab -e opens the crontab file in your default command-line editor.
If you haven't set up a default command-line editor, it is likely to be Vim. This can be disconcerting if you don't have experience with Vim. Press i to start inserting text and Esc when you're done. Then, save the file and exit by typing :wq and pressing Enter. For more information about Vim, refer to this introduction: https://null-byte.wonderhowto.com/how-to/intro-vim-unix-text-editor-every-hacker-should-be-familiar-with-0174674. For information on how to change the default command-line editor, refer to the following link: https://www.a2hosting.com/kb/developer-corner/linux/setting-the-default-text-editor-in-linux.
- Check the crontab contents. Note that crontab -l only displays the contents; it doesn't open them for editing:
$ crontab -l
*/5 * * * * /your/path/.venv/bin/python /your/path/cron.py -o /path/automate.log
- Wait and check the result file to see how the task is being executed:
$ tail -F /path/automate.log
[2020-01-17 21:20:00.611540] The result is 35
[2020-01-17 21:25:01.174835] The result is 35
[2020-01-17 21:30:00.886452] The result is 35
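To confirm the schedule from the log rather than by eye, you can compute the time between consecutive entries. This is a sketch that assumes each line starts with a bracketed timestamp, as cron.py writes it; run_intervals is a hypothetical helper, and the sample lines are the ones from the tail -F output above. datetime.fromisoformat() (Python 3.7+) accepts either a space or a T between the date and the time.

```python
from datetime import datetime


def run_intervals(lines):
    """Return the seconds elapsed between consecutive timestamped log lines."""
    stamps = [datetime.fromisoformat(line[1:line.index(']')])
              for line in lines if line.startswith('[')]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


# Sample lines taken from the tail -F output in this recipe
LOG_LINES = [
    '[2020-01-17 21:20:00.611540] The result is 35',
    '[2020-01-17 21:25:01.174835] The result is 35',
    '[2020-01-17 21:30:00.886452] The result is 35',
]
```

With a real log, the lines can come straight from the file: with open('/path/automate.log') as log: print(run_intervals(log)). For a */5 schedule, every interval should be close to 300 seconds.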
How it works...
A crontab line consists of a schedule describing how often to run the task (the first five elements), followed by the command to run. Each of the five schedule elements refers to a different unit of time. Most of them are stars, meaning any value:
* * * * *
| | | | |
| | | | +---- Day of the Week (range: 0-7, 0 and 7 both standing for Sunday, 1 for Monday)
| | | +------ Month of the Year (range: 1-12)
| | +-------- Day of the Month (range: 1-31)
| +---------- Hour (range: 0-23)
+------------ Minute (range: 0-59)
Therefore, our line, */5 * * * *, means every time the minute is divisible by 5, in every hour of every day of every month.
Here are some examples:
30 15 * * * means "every day at 15:30"
30 * * * * means "every hour, at 30 minutes"
0,30 * * * * means "every hour, at 0 minutes and 30 minutes"
*/30 * * * * means "every half hour"
0 0 * * 1 means "every Monday at 00:00"
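To make the field semantics concrete, here is a simplified matcher for the five-field schedule. It's a sketch, not a full cron implementation: it supports only *, */n, single numbers, a-b ranges, and comma-separated lists of these, and it ignores cron's rule that when both day-of-month and day-of-week are restricted, a match on either one is enough. field_matches and schedule_matches are hypothetical names.

```python
from datetime import datetime


def field_matches(expr: str, value: int, first: int = 0) -> bool:
    """Check one cron field; first is the field's lowest value (0 or 1)."""
    for part in expr.split(','):
        if part == '*':
            return True
        if part.startswith('*/'):
            # Steps count from the field's lowest value, e.g. 0, 5, 10... for minutes
            if (value - first) % int(part[2:]) == 0:
                return True
        elif '-' in part:
            low, high = (int(bound) for bound in part.split('-'))
            if low <= value <= high:
                return True
        elif int(part) == value:
            return True
    return False


def schedule_matches(schedule: str, when: datetime) -> bool:
    """Check a five-field schedule such as '*/5 * * * *' against a datetime."""
    minute, hour, day, month, weekday_expr = schedule.split()
    weekday = when.isoweekday() % 7  # cron numbering: 0 = Sunday, 1 = Monday
    weekday_ok = (field_matches(weekday_expr, weekday)
                  or (weekday == 0 and field_matches(weekday_expr, 7)))
    return (field_matches(minute, when.minute)
            and field_matches(hour, when.hour)
            and field_matches(day, when.day, first=1)
            and field_matches(month, when.month, first=1)
            and weekday_ok)
```

For example, schedule_matches('0 0 * * 1', datetime(2020, 1, 20, 0, 0)) is true because January 20, 2020 was a Monday.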
Do not try to guess too much. Use a cheat sheet such as https://crontab.guru/ for examples and tweaks. Most common usages are described there directly. You can also edit an expression and get a plain-language description of when it will run.
After the schedule comes the command that executes the task, as prepared in step 2 of the How to do it… section.
Note that the task is described with the full path of every related file: the interpreter, the script, and the output file. This removes all ambiguity related to paths and reduces the chances of errors. Cron runs jobs with a minimal environment, typically with a restricted PATH and the user's home directory as the working directory, so a very common error is for cron to be unable to find one or more of these three elements.
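The same full-path discipline can be automated. The following sketch, using a hypothetical build_cron_line helper, assembles a crontab entry from sys.executable (the interpreter currently running, which is the virtual environment's interpreter when it's active) and absolute versions of the script and output paths, quoting each one for the shell. It also rejects the % character, because crontab treats an unescaped % in the command as a newline.

```python
import os
import shlex
import sys


def build_cron_line(schedule, script, output):
    """Return a crontab entry that runs script with full paths everywhere."""
    parts = [sys.executable, os.path.abspath(script),
             '-o', os.path.abspath(output)]
    command = ' '.join(shlex.quote(part) for part in parts)
    if '%' in command:
        # cron turns an unescaped % into a newline, breaking the command
        raise ValueError('escape % characters before using this in crontab')
    return f'{schedule} {command}'


if __name__ == '__main__':
    # Run from the directory containing cron.py, with the virtualenv active
    print(build_cron_line('*/5 * * * *', 'cron.py', 'automate.log'))
```

The printed line can be pasted into crontab -e as is.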
There's more...
The description of the default output (standard output) can be a bit verbose. When calling python3 cron.py -h, it gets displayed as:
-o OUTPUT output file (default: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)
This is the representation of the standard output (stdout) file object. How default values are displayed is controlled by the formatter_class argument of ArgumentParser, so you can use a custom formatter, inheriting from one of the available default ones, to tweak the display of the value. Refer to the documentation at https://docs.python.org/3/library/argparse.html#formatter-class.
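As a sketch of that approach, the following formatter subclasses ArgumentDefaultsHelpFormatter and prints stdout instead of the file object's representation. Note the assumption it rests on: it overrides _get_help_string, an internal (underscore-prefixed) method rather than documented public API, so it may need adjusting in future Python versions. StdoutFriendlyFormatter and build_parser are hypothetical names, and the type= arguments are left out because they don't affect the help text.

```python
import argparse
import sys


class StdoutFriendlyFormatter(argparse.ArgumentDefaultsHelpFormatter):
    """Show '(default: stdout)' instead of the repr of sys.stdout."""

    def _get_help_string(self, action):
        # _get_help_string is an internal argparse hook, not public API
        if action.default is sys.stdout:
            return (action.help or '') + ' (default: stdout)'
        return super()._get_help_string(action)


def build_parser():
    parser = argparse.ArgumentParser(formatter_class=StdoutFriendlyFormatter)
    parser.add_argument('--config', '-c', help='config file',
                        default='/etc/automate.ini')
    parser.add_argument('-o', dest='output', help='output file',
                        default=sys.stdout)
    return parser
```

With this formatter, the -o line of the -h output should end with (default: stdout), while other options keep the standard default display.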
If a cron job produces any output that isn't redirected to a file, such as an error message, cron will normally send it to you as local system mail. This will show up as a message in the terminal like this:
You have mail.
$
This can be read with mail:
$ mail
Mail version 8.1 6/6/93. Type ? for help.
"/var/mail/jaime": 1 message 1 new
>N 1 jaime@Jaimes-iMac-5K Fri Jun 17 21:15 20/914 "Cron <jaime@Jaimes-iM"
? 1
Message 1:
...
/usr/local/Cellar/python/3.8.1/Frameworks/Python.framework/Versions/3.8/Resources/Python.app/Contents/MacOS/Python: can't open file 'cron.py': [Errno 2] No such file or directory
In the next recipe, we will explore methods to capture the errors independently so that the task can run smoothly.
See also
- The Adding command-line options recipe in Chapter 1, Let's Begin Our Automation Journey, to understand the basic concepts of command-line options.
- The Capturing errors and problems recipe, next in this chapter, to learn how to store events happening during the execution.