Exercises - Error Handling

This chapter suggested several edits to collate.py. Suppose our script now reads as follows:

"""
Combine multiple word count CSV-files
into a single cumulative count.
"""

import csv
import argparse
from collections import Counter
import logging

import utilities as util


ERRORS = {
    'not_csv_suffix' : '{fname}: File must end in .csv',
    }


def update_counts(reader, word_counts):
    """Update word counts with data from another reader/file."""
    for word, count in csv.reader(reader):
        word_counts[word] += int(count)


def main(args):
    """Run the command line program."""
    word_counts = Counter()
    logging.info('Processing files...')
    for fname in args.infiles:
        logging.debug(f'Reading in {fname}...')
        if fname[-4:] != '.csv':
            msg = ERRORS['not_csv_suffix'].format(fname=fname)
            raise OSError(msg)
        with open(fname, 'r') as reader:
            logging.debug('Computing word counts...')
            update_counts(reader, word_counts)
    util.collection_to_csv(word_counts, num=args.num)


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument('infiles', type=str, nargs='*',
                        help='Input file names')
    parser.add_argument('-n', '--num',
                        type=int, default=None,
                        help='Output n most frequent words')
    args = parser.parse_args()
    main(args)

The following exercises will ask you to make further edits to collate.py.

1) Set the logging level

Define a new command-line flag for collate.py called --verbose (or -v) that changes the logging level from WARNING (the default) to DEBUG (the noisiest level).

Hint: the following command changes the logging level to DEBUG:

logging.basicConfig(level=logging.DEBUG)

Once finished, running collate.py with and without the -v flag should produce the following output:

$ python bin/collate.py results/dracula.csv \
  results/moby_dick.csv -n 5
the,22559
and,12306
of,10446
to,9192
a,7629
$ python bin/collate.py results/dracula.csv \
  results/moby_dick.csv -n 5 -v
INFO:root:Processing files...
DEBUG:root:Reading in results/dracula.csv...
DEBUG:root:Computing word counts...
DEBUG:root:Reading in results/moby_dick.csv...
DEBUG:root:Computing word counts...
the,22559
and,12306
of,10446
to,9192
a,7629
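The mapping from a command-line flag to a logging level can be sketched as follows. This is a minimal illustration rather than the reference solution, and the helper names build_parser and choose_level are hypothetical; they do not appear in collate.py.

```python
import argparse
import logging


def build_parser():
    """Build a parser with a -v/--verbose flag (hypothetical helper)."""
    parser = argparse.ArgumentParser(description='Verbose flag sketch')
    # store_true makes args.verbose False unless the flag is given.
    parser.add_argument('-v', '--verbose', action='store_true',
                        help='Change logging level from WARNING to DEBUG')
    return parser


def choose_level(verbose):
    """Return DEBUG when verbose is set, otherwise WARNING."""
    return logging.DEBUG if verbose else logging.WARNING


# Parse explicit argument lists so the sketch runs without a terminal.
quiet = build_parser().parse_args([])
noisy = build_parser().parse_args(['-v'])
print(choose_level(quiet.verbose) == logging.WARNING)
print(choose_level(noisy.verbose) == logging.DEBUG)
```

In a real script the chosen level would be passed to logging.basicConfig before the first logging call.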

2) Send the logging output to file

In Exercise 1, logging information is printed to the screen when the verbose flag is set. By default the logging module writes to standard error, so these messages clutter the terminal and, if we redirect both standard output and standard error from collate.py to a CSV file, they end up in that file alongside the words and their counts.

  1. Edit collate.py so that the logging information is sent to a log file called collate.log instead. (HINT: logging.basicConfig has an argument called filename.)

  2. Create a new command-line option -l or --logfile so that the user can specify a different name for the log file if they don’t like the default name of collate.log.
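As a rough sketch of the filename argument in action, the example below writes one log message to a temporary file and reads it back. The function name configure_logging and the file name demo.log are assumptions made for this illustration only.

```python
import logging
import os
import tempfile


def configure_logging(logfile, verbose=False):
    """Send log records to logfile instead of the screen."""
    level = logging.DEBUG if verbose else logging.WARNING
    # force=True (Python 3.8+) replaces handlers set up earlier, which
    # keeps this demo repeatable; a script that configures logging
    # only once does not need it.
    logging.basicConfig(level=level, filename=logfile, force=True)


# Demo: log to a temporary file rather than collate.log.
log_path = os.path.join(tempfile.mkdtemp(), 'demo.log')
configure_logging(log_path, verbose=True)
logging.debug('Reading in example.csv...')
logging.shutdown()  # flush and close the file handler

with open(log_path, 'r') as reader:
    log_text = reader.read()
print(log_text)
```

A --logfile option would then only need to supply the value passed as logfile, with collate.log as its default.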

3) Handling exceptions

  1. Modify the script collate.py so that it catches any exceptions that are raised when it tries to open files and records them in the log file.

    When you are finished, the program should collate all the files it can, rather than halting as soon as it encounters a problem.

  2. Modify your first solution to handle nonexistent files and permission problems separately.
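The general pattern of logging a problem and moving on to the next file can be sketched as below. The function read_available is a hypothetical stand-in, not part of collate.py, and it only reads file contents rather than counting words. FileNotFoundError and PermissionError are both subclasses of OSError, so they need their own except clauses placed before any more general one.

```python
import logging
import os
import tempfile


def read_available(fnames):
    """Return the contents of every readable file, logging the rest."""
    contents = {}
    for fname in fnames:
        try:
            with open(fname, 'r') as reader:
                contents[fname] = reader.read()
        except FileNotFoundError:
            # The loop continues with the next file after logging.
            logging.warning(f'{fname} not processed: File does not exist')
        except PermissionError:
            logging.warning(f'{fname} not processed: No read permission')
    return contents


# Demo with one real file and one missing file.
workdir = tempfile.mkdtemp()
good = os.path.join(workdir, 'good.csv')
missing = os.path.join(workdir, 'missing.csv')
with open(good, 'w') as writer:
    writer.write('the,1\n')

result = read_available([missing, good])
print(result)
```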

4) Testing error handling

In our suggested solution to the previous exercise, we modified collate.py to handle different types of errors associated with reading input files. Suppose the main function in collate.py now reads:

def main(args):
    """Run the command line program."""
    log_lev = logging.DEBUG if args.verbose else logging.WARNING
    logging.basicConfig(level=log_lev, filename=args.logfile)
    word_counts = Counter()
    logging.info('Processing files...')
    for fname in args.infiles:
        try:
            logging.debug(f'Reading in {fname}...')
            if fname[-4:] != '.csv':
                msg = ERRORS['not_csv_suffix'].format(
                    fname=fname)
                raise OSError(msg)
            with open(fname, 'r') as reader:
                logging.debug('Computing word counts...')
                update_counts(reader, word_counts)
        except FileNotFoundError:
            msg = f'{fname} not processed: File does not exist'
            logging.warning(msg)
        except PermissionError:
            msg = f'{fname} not processed: No read permission'
            logging.warning(msg)
        except Exception as error:
            msg = f'{fname} not processed: {error}'
            logging.warning(msg)
    util.collection_to_csv(word_counts, num=args.num)

  1. It is difficult to write a simple unit test for the code that reads input files, because main is a long function that requires command-line arguments as input. Edit collate.py so that the lines responsible for processing a single input file appear in their own function, shown below. Once you are done, main should call process_file in place of the existing code:

def process_file(fname, word_counts):
    """Read file and update word counts"""
    logging.debug(f'Reading in {fname}...')
    if fname[-4:] != '.csv':
        msg = ERRORS['not_csv_suffix'].format(
            fname=fname)
        raise OSError(msg)
    with open(fname, 'r') as reader:
        logging.debug('Computing word counts...')
        update_counts(reader, word_counts)

  2. Add a unit test to test_zipfs.py that uses pytest.raises to check that the new collate.process_file function raises an OSError if the input file does not end in .csv. Run pytest to check that the new test passes.

  3. Add a unit test to test_zipfs.py that uses pytest.raises to check that the new collate.process_file function raises a FileNotFoundError if the input file does not exist. Run pytest to check that the new test passes.

  4. Use the coverage library (see the section on test coverage) to check that the relevant commands in process_file (specifically raise OSError and open(fname, 'r')) were indeed tested.
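For readers who have not used pytest.raises before, the sketch below shows its general shape as a context manager. It assumes pytest is installed. The function require_csv_suffix is a hypothetical stand-in used so that the example is self-contained; the exercise itself asks for tests of collate.process_file.

```python
import pytest


def require_csv_suffix(fname):
    """Hypothetical stand-in: reject names that do not end in .csv."""
    if not fname.endswith('.csv'):
        raise OSError(f'{fname}: File must end in .csv')


def test_rejects_wrong_suffix():
    """The check should raise OSError for a non-CSV file name."""
    # The test passes only if the block raises OSError with a message
    # matching the regular expression given in match.
    with pytest.raises(OSError, match='must end in .csv'):
        require_csv_suffix('dracula.txt')


def test_missing_file():
    """Opening a file that does not exist raises FileNotFoundError."""
    with pytest.raises(FileNotFoundError):
        open('no_such_directory/no_such_file.csv', 'r')
```

If the code inside the with block does not raise the named exception, pytest.raises makes the test fail, so a passing test is evidence that the error path was exercised.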

5) Error catalogs

In the section on writing useful error messages, we started to define an error catalog called ERRORS.

  1. Recalling PEP 8 and our coding style guidelines, explain why we have used capital letters for the name of the catalog.

  2. Python has three ways to format strings: the % operator, the str.format method, and f-strings (where the “f” stands for “format”). Look up the documentation for each and explain why we have to use str.format rather than f-strings for formatting error messages in our catalog/lookup table.

  3. There’s a good chance we will eventually want to use the error messages we’ve defined in other scripts besides collate.py. To avoid duplication, move ERRORS to the utilities module that was created in an earlier section.
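The difference in when the two formatting styles fill in their placeholders can be seen in the small experiment below. It is a sketch to explore with, not a complete answer to the question above, and the function names describe and build_catalog_with_fstring are hypothetical.

```python
ERRORS = {
    'not_csv_suffix': '{fname}: File must end in .csv',
}


def describe(fname):
    """Fill in the stored template once the file name is known."""
    return ERRORS['not_csv_suffix'].format(fname=fname)


def build_catalog_with_fstring():
    """Try to build the same catalog with an f-string."""
    # An f-string is evaluated as soon as this line runs, and no
    # variable called fname exists at that point.
    return {'not_csv_suffix': f'{fname}: File must end in .csv'}


print(describe('dracula.txt'))

try:
    build_catalog_with_fstring()
except NameError as error:
    print(f'NameError: {error}')
```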

6) Tracebacks

Run the following code:

try:
    1/0
except Exception as e:
    help(e.__traceback__)

  1. What kind of object is e.__traceback__?

  2. What useful information can you get from it?
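Besides calling help, one way to explore the object is with the standard library's traceback and types modules, as in this sketch. The function name divide_by_zero_traceback is hypothetical.

```python
import traceback
import types


def divide_by_zero_traceback():
    """Trigger a ZeroDivisionError and return its traceback object."""
    try:
        1 / 0
    except ZeroDivisionError as error:
        return error.__traceback__


tb = divide_by_zero_traceback()
print(isinstance(tb, types.TracebackType))

# extract_tb turns the traceback into a list of FrameSummary entries,
# one per stack frame, each with a file name, line number,
# function name, and source line (when the source is available).
summary = traceback.extract_tb(tb)
for frame in summary:
    print(frame.filename, frame.lineno, frame.name, frame.line)
```

Each entry records where the program was when the exception occurred, which is the same information Python prints when an uncaught exception ends a script.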