Book title: Python Automation Cookbook
Author: Jaime Buelta
Updated: 2024-12-21 01:38:32

Preparing a task

It all starts with precisely defining the work that needs to be executed, and designing it so that it doesn't require human intervention to run.

Ideally, the task should have the following characteristics:

1. Single, clear entry point: There is no confusion about how to start the task.
2. Clear parameters: If there are any parameters, they should be as explicit as possible.
3. No interactivity: Stopping the execution to request information from the user is not possible.
4. The result should be stored: So that it can be checked at a different time than when the task runs.
5. Clear result: When we oversee the execution of a program ourselves, we can accept more verbose results, such as unlabeled data or extra debugging information. However, for an automated task, the final result should be as concise and to the point as possible.
6. Errors should be logged: So that we can analyze what went wrong.

A command-line program has a lot of those characteristics already. It always has a clear entry point, with defined parameters, and the result can be stored, even if just in text format. It can be improved further by adding a config file that clarifies the parameters and an output file that stores the result.

Note that point 6 (errors should be logged) is the objective of the Capturing errors and problems recipe and will be covered there.

To avoid interactivity, do not use any function that waits for user input, such as input(). Remember to remove debugger breakpoints, too (for example, calls to breakpoint() or pdb.set_trace())!
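
To see why a prompt is a problem, the following sketch simulates an unattended run by connecting standard input to an empty source (subprocess.DEVNULL). That is an assumption standing in for whatever your scheduler provides. Nobody can answer the input() call, so it raises EOFError and the task crashes instead of producing a result:

```python
import subprocess
import sys

# Child process that asks for a value, as an interactive script would
INTERACTIVE_CODE = "number = int(input('Enter a number: '))"

# Simulate an unattended run: stdin is an empty source, so nobody can answer
completed = subprocess.run(
    [sys.executable, '-c', INTERACTIVE_CODE],
    stdin=subprocess.DEVNULL,
    capture_output=True,
    text=True,
)

print(completed.returncode)            # 1 (uncaught exception)
print('EOFError' in completed.stderr)  # True
```

If standard input were instead a terminal that nobody is watching, the task would simply wait forever, which is arguably worse.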

Getting ready

We'll start by following a structure in which a main function will serve as the entry point, and all parameters are supplied to it.

This is the same basic structure that was presented in the Adding command-line arguments recipe in Chapter 1, Let's Begin Our Automation Journey.

The definition of a main function with all of the explicit arguments covers points 1 (single, clear entry point) and 2 (clear parameters). Point 3 (no interactivity) is not difficult to achieve.

To improve points 2 (clear parameters) and 4 (the result should be stored), we'll look at retrieving the configuration from a file and storing the result in another. Another option is to send a notification, such as an email, which will be covered later in this chapter.

How to do it...

Prepare the following command-line program, which multiplies two numbers, and save it as prepare_task_step1.py:

```python
import argparse


def main(number, other_number):
    result = number * other_number
    print(f'The result is {result}')


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-n1', type=int, help='A number', default=1)
    parser.add_argument('-n2', type=int, help='Another number', default=1)
    args = parser.parse_args()
    main(args.n1, args.n2)
```

Run prepare_task_step1.py to multiply two numbers:

```
$ python3 prepare_task_step1.py -n1 3 -n2 7
The result is 21
```

Update the file to accept a config file that contains both arguments, and save it as prepare_task_step3.py. Note that, when a config file is given, its values override any command-line parameters:

```python
import argparse
import configparser


def main(number, other_number):
    result = number * other_number
    print(f'The result is {result}')


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-n1', type=int, help='A number', default=1)
    parser.add_argument('-n2', type=int, help='Another number', default=1)
    parser.add_argument('--config', '-c', type=argparse.FileType('r'),
                        help='config file')
    args = parser.parse_args()
    if args.config:
        config = configparser.ConfigParser()
        config.read_file(args.config)
        # Transforming values into integers
        args.n1 = int(config['ARGUMENTS']['n1'])
        args.n2 = int(config['ARGUMENTS']['n2'])
    main(args.n1, args.n2)
```

Create the config file, config.ini. Note the ARGUMENTS section and the n1 and n2 values:
```
[ARGUMENTS]
n1=5
n2=7
```

Run the command with the config file. Note that the config file overrides the command-line parameters, as described in step 3:

```
$ python3 prepare_task_step3.py -c config.ini
The result is 35
$ python3 prepare_task_step3.py -c config.ini -n1 2 -n2 3
The result is 35
```
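
The recipe lets the config file win. If you'd rather have explicit command-line flags take precedence, one possible approach (a sketch, not the recipe's method) is to read the config file first and pass its values to parser.set_defaults(), so they only replace the built-in defaults. The parse_arguments() function and its argv parameter are hypothetical, added so the parsing can be called from code:

```python
import argparse
import configparser


def parse_arguments(argv=None):
    # First pass: only look for the config file, ignore everything else
    pre_parser = argparse.ArgumentParser(add_help=False)
    pre_parser.add_argument('--config', '-c', type=argparse.FileType('r'),
                            help='config file')
    pre_args, remaining = pre_parser.parse_known_args(argv)

    # Second pass: the full parser, inheriting the --config option
    parser = argparse.ArgumentParser(parents=[pre_parser])
    parser.add_argument('-n1', type=int, help='A number', default=1)
    parser.add_argument('-n2', type=int, help='Another number', default=1)

    if pre_args.config:
        config = configparser.ConfigParser()
        config.read_file(pre_args.config)
        pre_args.config.close()
        # Config values replace the built-in defaults only;
        # flags given explicitly on the command line still win
        parser.set_defaults(n1=config.getint('ARGUMENTS', 'n1'),
                            n2=config.getint('ARGUMENTS', 'n2'))

    return parser.parse_args(remaining)
```

With the config.ini file from step 4 (n1=5, n2=7), parse_arguments(['-c', 'config.ini', '-n1', '2']) gives n1=2 and n2=7, while the recipe's version would use 5 and 7. Which behavior is better depends on the task: letting flags win is often what users expect, while the recipe's approach keeps the config file as the single source of truth.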

Add a parameter to store the result in a file, and save it as prepare_task_step6.py:

```python
import argparse
import configparser
import sys


def main(number, other_number, output):
    result = number * other_number
    print(f'The result is {result}', file=output)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-n1', type=int, help='A number', default=1)
    parser.add_argument('-n2', type=int, help='Another number', default=1)
    parser.add_argument('--config', '-c', type=argparse.FileType('r'),
                        help='config file')
    parser.add_argument('-o', dest='output', type=argparse.FileType('w'),
                        help='output file',
                        default=sys.stdout)
    args = parser.parse_args()
    if args.config:
        config = configparser.ConfigParser()
        config.read_file(args.config)
        # Transforming values into integers
        args.n1 = int(config['ARGUMENTS']['n1'])
        args.n2 = int(config['ARGUMENTS']['n2'])
    main(args.n1, args.n2, args.output)
```

Run the script to check that it sends the output to the defined file. Note that nothing is displayed outside the result files:

```
$ python3 prepare_task_step6.py -n1 3 -n2 5 -o result.txt
$ cat result.txt
The result is 15
$ python3 prepare_task_step6.py -c config.ini -o result2.txt
$ cat result2.txt
The result is 35
```

How it works…

Note that the argparse module allows us to define files as parameters with the argparse.FileType type, and it opens them automatically. This is very handy: if the file can't be opened (for example, because the path points to an invalid location), argparse reports an error and stops the script.
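
As a quick check of that behavior, the sketch below passes a path that is assumed not to exist. argparse prints the problem to standard error and raises SystemExit with status 2, so the task stops before main() runs:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--config', '-c', type=argparse.FileType('r'),
                    help='config file')

try:
    # Hypothetical path, assumed not to exist
    parser.parse_args(['-c', '/nonexistent/dir/config.ini'])
except SystemExit as exc:
    # argparse has already printed a "can't open ..." message to stderr
    exit_code = exc.code
else:
    exit_code = 0

print(exit_code)  # 2
```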

Remember to open the file in the correct mode. In step 6, the config file is opened in read mode (r) and the output file in write mode (w), which overwrites the file if it already exists. You may also find append mode (a) useful; it adds new data at the end of an existing file instead of overwriting it.
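
For example, a variant of prepare_task_step6.py that opens the output file in append mode keeps one line per run instead of only the last result. This is a sketch; the run() wrapper and its argv parameter are hypothetical, added so the parsing can be exercised from code:

```python
import argparse
import sys


def main(number, other_number, output):
    result = number * other_number
    print(f'The result is {result}', file=output)


def run(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument('-n1', type=int, help='A number', default=1)
    parser.add_argument('-n2', type=int, help='Another number', default=1)
    # 'a' adds to the end of an existing file (creating it if needed)
    parser.add_argument('-o', dest='output', type=argparse.FileType('a'),
                        help='output file',
                        default=sys.stdout)
    args = parser.parse_args(argv)
    main(args.n1, args.n2, args.output)
    # Close real files, but never close the terminal's standard output
    if args.output is not sys.stdout:
        args.output.close()
```

Calling run(['-n1', '3', '-n2', '5', '-o', 'result.txt']) and then run(['-n1', '2', '-n2', '2', '-o', 'result.txt']) leaves both results in result.txt, one per line, after whatever the file already contained.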

The configparser module allows us to use config files with ease. As demonstrated in step 3, parsing the file is simple:

```python
config = configparser.ConfigParser()
config.read_file(file)  # file is an already opened file object
```

The config can then be accessed like a dictionary. Its keys are the sections of the config file, and each section is, in turn, a dictionary-like object containing its config values. So, the value n2 in the ARGUMENTS section is accessed as config['ARGUMENTS']['n2'].
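
A minimal sketch of that dictionary-like access, using read_string() to parse the same content as config.ini without needing the file on disk:

```python
import configparser

# Same content as the config.ini file used in this recipe
CONFIG_TEXT = """
[ARGUMENTS]
n1=5
n2=7
"""

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)

print(config.sections())          # ['ARGUMENTS']
print('ARGUMENTS' in config)      # True
print(config['ARGUMENTS']['n2'])  # 7
print(dict(config['ARGUMENTS']))  # {'n1': '5', 'n2': '7'}
```

By default, option names are case-insensitive (they are stored in lowercase), so config['ARGUMENTS']['N2'] returns the same value. Section names, however, are case-sensitive.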

Note that the values are always stored as strings, so they need to be converted into other types, such as integers, when required.

If you need to obtain Boolean values, do not use value = bool(config['section']['option']), as any non-empty string is converted into True no matter what; for instance, the string False is truthy, as it's not empty. Representing False with an empty string is a bad option as well, as it is very confusing. Use the .getboolean() method instead, for example, value = config.getboolean('section', 'option') or value = config['section'].getboolean('option'). There are similar getint() and getfloat() methods for integer and float values.
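
The following sketch shows the difference. The OPTIONS section and its verbose, retries, and ratio values are hypothetical, chosen only for illustration:

```python
import configparser

config = configparser.ConfigParser()
# Hypothetical section, parsed from a string for illustration
config.read_string("""
[OPTIONS]
verbose = False
retries = 3
ratio = 0.5
""")

raw = config['OPTIONS']['verbose']
print(repr(raw))                                # 'False' (a string)
print(bool(raw))                                # True: non-empty string
print(config.getboolean('OPTIONS', 'verbose'))  # False
print(config['OPTIONS'].getint('retries') + 1)  # 4
print(config.getfloat('OPTIONS', 'ratio') * 2)  # 1.0
```

getboolean() accepts '1', 'yes', 'true', and 'on' as True and '0', 'no', 'false', and 'off' as False (case-insensitively), and raises ValueError for anything else, so a typo in the config file is reported instead of silently becoming True.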

Python 3 allows us to pass a file parameter to the print function, which then writes to that file instead of the terminal. Step 6 shows how to use it to redirect all of the printed information to a file.
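
Because print() only needs an object with a write() method, the main() function from prepare_task_step6.py can be checked without touching the disk by passing an in-memory io.StringIO buffer. This is a sketch for illustration; io.StringIO is not used in the recipe itself:

```python
import io
import sys


def main(number, other_number, output):
    result = number * other_number
    print(f'The result is {result}', file=output)


# Any object with a write() method works as the file argument
buffer = io.StringIO()
main(3, 5, buffer)
print(repr(buffer.getvalue()))  # 'The result is 15\n'

# sys.stdout is the same target that print() uses by default
main(3, 5, sys.stdout)          # The result is 15
```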

Note that the default value of the -o parameter is sys.stdout, which prints the value to the terminal (standard output). This means that calling the script without an -o parameter displays the information on the screen, which is helpful when developing and debugging the script:

```
$ python3 prepare_task_step6.py -c config.ini
The result is 35
$ python3 prepare_task_step6.py -c config.ini -o result.txt
$ cat result.txt
The result is 35
```

There's more...

Please refer to the full documentation of configparser in the official Python documentation: https://docs.python.org/3/library/configparser.html.

In most cases, this configuration parser should be good enough, but if more power is needed, you can use YAML files as configuration files. YAML files (https://learn.getgrav.org/advanced/yaml) are very common as configuration files. They are well structured and can be parsed directly, taking various data types into account (for example, numbers are read as integers, not strings):

Add PyYAML to the requirements.txt file:
```
PyYAML==5.3
```
Install the requirements in the virtual environment:
```
$ pip install -r requirements.txt
```

Create the prepare_task_yaml.py file:

```python
import argparse
import sys

import yaml


def main(number, other_number, output):
    result = number * other_number
    print(f'The result is {result}', file=output)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-n1', type=int, help='A number', default=1)
    parser.add_argument('-n2', type=int, help='Another number', default=1)
    parser.add_argument('-c', dest='config', type=argparse.FileType('r'),
                        help='config file in YAML format',
                        default=None)
    parser.add_argument('-o', dest='output', type=argparse.FileType('w'),
                        help='output file',
                        default=sys.stdout)
    args = parser.parse_args()
    if args.config:
        # SafeLoader is enough for plain configuration data
        config = yaml.load(args.config, Loader=yaml.SafeLoader)
        # No need to transform values: YAML already parses them as integers
        args.n1 = config['ARGUMENTS']['n1']
        args.n2 = config['ARGUMENTS']['n2']
    main(args.n1, args.n2, args.output)
```

Note that the PyYAML yaml.load() function expects a Loader parameter (omitting it triggers a warning in PyYAML 5.x and is an error from PyYAML 6.0). This is to avoid arbitrary code execution if the YAML file comes from an untrusted source. Always use yaml.SafeLoader unless you need YAML features that it doesn't support. Never use loaders other than yaml.SafeLoader if any part of the data in a YAML file comes from an untrusted source (for example, user input). Refer to this article for more information: https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation.

Define the config file, config.yaml:
```
ARGUMENTS:
    n1: 7
    n2: 4
```

Then, run the following:

```
$ python3 prepare_task_yaml.py -c config.yaml
The result is 28
```

There's also the possibility of setting a default config file, as well as a default output file. This can be handy to create a task that requires no input parameters.
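
A possible sketch, assuming hypothetical default names config.ini and result.txt in the working directory. Plain string paths are used instead of argparse.FileType here, because argparse applies the type function to string defaults, so a FileType('r') default would fail at parse time whenever the file is missing. ConfigParser.read() silently skips files that can't be opened, so the fallback values of 1 are used when there's no config file. The run() wrapper is also hypothetical:

```python
import argparse
import configparser

# Hypothetical default locations, relative to the working directory
DEFAULT_CONFIG = 'config.ini'
DEFAULT_OUTPUT = 'result.txt'


def main(number, other_number, output):
    result = number * other_number
    print(f'The result is {result}', file=output)


def run(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument('--config', '-c', default=DEFAULT_CONFIG,
                        help='config file (default: %(default)s)')
    parser.add_argument('-o', dest='output', default=DEFAULT_OUTPUT,
                        help='output file (default: %(default)s)')
    args = parser.parse_args(argv)

    config = configparser.ConfigParser()
    # read() ignores files that can't be opened, so a missing file is fine
    config.read(args.config)
    # fallback is used when the section or the option is missing
    n1 = config.getint('ARGUMENTS', 'n1', fallback=1)
    n2 = config.getint('ARGUMENTS', 'n2', fallback=1)

    with open(args.output, 'w') as output:
        main(n1, n2, output)
```

Calling run([]) then reads config.ini if it exists and always writes result.txt, so the task can be scheduled without any arguments.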

As a general rule, try to avoid creating too many input and configuration parameters if the task has a very specific objective in mind. Try to limit the input parameters to the values that actually change between executions of the task. A parameter that never changes is probably fine being defined as a constant. A high number of parameters will make config files or command-line arguments complicated and will increase maintenance in the long run. On the other hand, if your objective is to create a very flexible tool to be used in very different situations, then creating more parameters is probably a good idea. Try to find your own proper balance!
