STAT 39000: Project 5 — Fall 2021

Testing Python: part II

Motivation: Code, especially newly written code, is refactored, updated, and improved frequently. It is for these reasons that testing code is imperative. Testing code is a good way to ensure that code is working as intended. When a change is made to code, you can run a suite a tests, and feel confident (or at least more confident) that the changes you made are not introducing new bugs. While methods of programming like TDD (test-driven development) are popular in some circles, and unpopular in others, what is agreed upon is that writing good tests is a useful skill and a good habit to have.

Context: This is the second in a series of two projects that explore writing unit tests, and doc tests. In The Data Mine, we will focus on using pytest, and mypy, while writing code to manipulate and work with data.

Scope: Python, testing, pytest, mypy

Learning Objectives
  • Write and run unit tests using pytest.

  • Include and run doc tests in your docstrings, using doctest.

  • Gain familiarity with mypy, and explain why static type checking can be useful.

  • Comprehend what a function is, and the components of a function in Python.

Make sure to read about, and use the template found here, and the important information about projects submissions here.

Dataset(s)

The following questions will use the following dataset(s):

  • /depot/datamine/data/apple/health/2021/*

Questions

At this end of this project you will have only 3 files to submit:

  • watch_data.py

  • test_watch_data.py

  • firstname-lastname-project05.ipynb

Make sure that the output from running the cells is displayed and saved in your notebook before submitting.

Question 1

First, setup your workspace in Brown. Create a new folder called project05 in your $HOME directory. Make sure your apple_watch_parser package from the previous project is in your $HOME/project05 directory. In addition, create your new notebook, firstname-lastname-project05.ipynb in your $HOME/project05 directory. Great!

An updated apple_watch_parser package will be made available for this project on Saturday, September 25, in /depot/datamine/data/apple/health/apple_watch_parser02. To copy this over to your $HOME/project05 directory, run the following in a terminal on Brown.

cp -R /depot/datamine/data/apple/health/apple_watch_parser02 $HOME/project05/apple_watch_parser

Note that this updated package is not required by any means to complete this project.

Now, in $HOME/project05/apple_watch_parser/test_watch_data.py, modify the test_time_difference function to use parametrizing to test the time difference between 100 sets of times.

The following is an example of how to generate a list of 100 datetimes 1 day and 1 hour apart.

import pytz
import datetime

start_time = datetime.datetime.now(pytz.utc)
one_day = datetime.timedelta(days=1, hours=1)

list_of_datetimes = [start_time+one_day*i for i in range(100)]

An example of how to convert a datetime to a string in the same format our time_difference function expects is below.

import pytz
import datetime

my_datetime = datetime.datetime.now(pytz.utc)
my_string = my_datetime.strftime('%Y-%m-%d %H:%M:%S %z')

See the very first example here for how to parametrize a test for a function accepting 2 arguments instead of 1.

The zip function in Python will be particularly useful. Note in first example here, that the second argument to the @pytest.mark.parametrize() decorator is a list of tuples. zip accepts n lists of m elements, and returns a list of m tuples, where each tuple contains the elements of the lists in the same order.

zip([1,2,3], [5,5,5], [9 for i in range(3)])

You do not need to manually calculate the expected result for each combination of datetime’s that you will pass to the time_difference function. Since you know exactly how many seconds you put between the datetime’s you automatically generated, you can just use those values for the expected result. For example, if you generated 100 datetimes that are each 1 day apart, you will know that the expected difference is 86400 seconds, and could pass a list of the value 86400 repeated 100 times to the test_time_difference function’s third argument (the expected results).

Run the pytest tests from a bash cell in your notebook.

%%bash

cd $HOME/project05
python -m pytest

Relevant topics: parametrizing tests

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 2

Read and understand the filter_elements method in the WatchData class. This is an example of a function that accepts another function as an argument. A term for functions that accept other functions as arguments are higher-order functions. Think about the filter_elements method, and list at least 2 reasons why this may be a good idea.

The docstring contains an explanation of the filter_elements method. In addition, the module provides a function called example_filter that can be used with the filter_elements method.

Use the example_filter function to filter the WatchData class, and print the first 5 results.

Remember to import and use the package, make sure that the notebook is in the same directory as the apple_watch_parser package.

from apple_watch_parser import watch_data

dat = watch_data.WatchData('/depot/datamine/data/apple/health/2021/')
print(dat)

When passing a function as an argument to another function, you should not include the opening and closing parentheses in the argument. For example, the following is not correct.

dat.filter_elements(example_filter())

Why? Because the example_filter() part will try to evaluate the function and will essentially be translated into the output of running example_filter(), and we don’t want it to. We want to pass the function itself, so that the filter_elements method can use the example_filter function internally.

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 3

Write your own *_filter function in a Python code cell in your notebook (like example_filter) that can be used with the filter_elements method. Be sure to include a Google style docstring (no doctests are needed).

Does it work as intended? Print the first 5 results when using your filter.

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 4

In the previous project, we did not test out the filter_elements method in our WatchData class. Testing this method is complicated for two main reasons.

  1. The method accepts any function following a set of rules (described in our docstring) as an argument. This (a *_filter function) may not be something that is available after immediately importing the WatchData class — normally there wouldn’t be an example_filter function in the module for you to use, as this would be a function that a user of the library would create for their own purposes.

  2. In order to be able to test the filter_elements method, we would need a dataset that is similarly structured as the intended dataset (Apple Watch exports), that we know the expected output for, so we can test.

pytest supports writing fixtures that can be used to solve these problems.

To address problem (1):

  • Remove the example_filter function from the watch_data.py module, and instead, modify the test_watch_data.py file and add the example_filter function to the test_watch_data.py module as a pytest fixture. Read this and 2 or 3 following sections. In addition, see this stackoverflow answer to better understand how to create a fixture that is a function that can accept arguments.

You may need to import lxml and other libraries in your test_watch_data.py file. For safety, you can just add the following.

import watch_data
import pytest
from pathlib import Path
import os
import lxml.etree
import pytz
import datetime

Why do we need to do something like the stackoverflow post describes? The reason is, by default, pytest will assume that the argument, element, to the example_filter function is a fixture itself, and won’t work! This is the workaround.

In the example in the stackoverflow post, the _method(a,b) function is the equivalent of the example_filter function.

As a side note, sometimes helper functions (functions defined and used inside of another function) are called helper functions, and it is good practice to name them starting with an underscore — just like the _method(a,b) function in the stackoverflow post.

You can start by cutting the example_filter function from watch_data.py and paste it in test_watch_data.py. Then, to make it a fixture, wrap it in another function just like in the stackoverflow post.

To address problem (2):

  • Create a new test_data directory in your apple_watch_parser package. So, $HOME/project05/apple_watch_parser/test_data should now exist. Add /depot/datamine/data/apple/health/2021/sample.xml to this directory, and rename it to export.xml. So, $HOME/project05/apple_watch_parser/test_data/export.xml should now exist.

    sample.xml is a small sample of the the watch data that we can use for out tests. It is small enough to be portable, yet is similar enough to the intended types of datasets that it will be a good way to test our WatchData class and its methods. Since we renamed it to export.xml, it will work with our WatchData class.

  • Create a test_filter_elements function in your test_watch_data.py module. Use this library (already installed), to handle properly copying the test_data/export.xml file to a temporary directory for the test. Examples 2 and 3 here will be particularly helpful.

    You may be wondering why we would want to use this library for our test rather than just hard-coding the path to our test files in our test function(s). The reason is the following. What if one of your functions had a side-effect that modified your test data? Then, any other tests you run using the same data would be tainted and potentially fail! Bad news. This package allows for a systematic way to first copy our test data to a temporary location, and then run our test using the data in that temporary location.

    In addition, if you have many test function that work on the same dataset, you can do something like the following to re-use the code over and over again.

    export_xml_decorator = pytest.mark.datafiles(...)
    
    @export_xml_decorator
    def test_1(datafiles):
        pass
    
    @export_xml_decorator
    def test_2(datafiles):
        pass

    Each of the tests, test_1 and test_2, will work on the same example dataset, but will have a fresh copy of the dataset each time. Very cool!

    The decorator, @pytest.mar.datafiles() is expecting a path to the test data, export.xml. To get the absolute path to the test data, $HOME/project05/apple_watch_parser/test_data/export.xml, you can use the pathlib library.

    test_watch_data.py
    import watch_data # since watch_data.py is in the same directory as test_watch_data.py, we can import it directly
    from pathlib import Path
    
    # To get the path of the watch_data Python module
    this_module_path = Path(watch_data.__file__).resolve().parent
    print(this_module_path) # $HOME/project05/apple_watch_parser
    
    # To get the test_data folders absolute path, we could then do
    print(this_module_path / 'test_data') # $HOME/project05/apple_watch_parser/test_data
    
    # To get the test_data/export.xml absolute path, we could then do ...?
    # HINT: The answer to this question is _exactly_ what should be passed to the `@pytest.mark.datafiles()` decorator.
    @pytest.mark.datafiles(answer_here)
    def test_filter_elements(datafiles, example_filter_fixture): # replace example_filter_fixture with the name of your fixture function
        pass

Okay, great! Your test_watch_data.py module should now have 2 additional functions, "symbolically" something like this:

# from https://stackoverflow.com/questions/44677426/can-i-pass-arguments-to-pytest-fixtures
@pytest.fixture
def my_fixture():

  def _method(a, b):
    return a*b

  return _method

@pytest.mark.datafiles(answer_here)
def test_filter_elements(datafiles, my_fixture):
    pass

Fill in the test_filter_elements function with at least 1 assert statements that tests the filter_elements function. It could be as simple as comparing the length of the output when using the example_filter function as our filter. test_data/example.xml should return 2 elements using our example_filter function as the filter.

As a reminder, to run pytest from a bash cell in your notebook (which should be in the same directory as your apple_watch_parser directory, or $HOME/project05/apple_watch_parser/firstname-lastname-project05.ipynb), you can run the following.

%%bash

cd $HOME/project05
python -m pytest

If you get an error that says pytest.mark.datafiles isn’t defined, or something similar, do not worry, this can be ignored. Alternatively, if you add a file called pytest.ini to your $HOME/project05 directory, with the following contents, this warning will go away.

pytest.ini
[pytest]
markers =
    datafiles: mark a test as a datafiles.
Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 5

Create an additional method in the WatchData class in the watch_data.py module that does something interesting or useful with the data. Be sure to include a Google style docstring (no doctests are needed). In addition, write 1 or more pytest tests for your new method that uses fixtures. Make sure your test passes (you can run your pytest tests from a bash cell in your notebook).

If you are up for a bigger challenge, design your new method to be similar to filter_elements in that a user can write their own functions or classes that can be passed to it (as arguments) in order to accomplish something useful that they may want to be customized.

We will count the use of the @pytest.mark.datafiles() decorator as a fixture, if you decide to not complete the "bigger challenge".

Make sure to run the pytest tests from a bash cell in your notebook.

%%bash

cd $HOME/project05
python -m pytest
Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Please make sure to double check that your submission is complete, and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure what you think you submitted, was what you actually submitted.