TDM 30100: Project 3 — 2023

Motivation: Documentation is one of the most critical parts of a project. Many tools are designed specifically to help document a project, and each has its own set of pros and cons. Depending on the scope and scale of the project, different tools will be more or less appropriate. For documenting Python code, however, you can’t go wrong with tools like Sphinx or pdoc.

Context: This is the second project in a 3-project series where we explore thoroughly documenting Python code, while solving data-driven problems.

Scope: Python, documentation

Learning Objectives
  • Use Sphinx to document a set of Python code.

  • Use pdoc to document a set of Python code.

  • Write and use code that serializes and deserializes data.

  • Learn the pros and cons of various serialization formats.

Make sure to read about and use the template found here, and read the important information about project submissions here.

Dataset(s)

The following questions will use the following dataset(s):

  • /anvil/projects/tdm/data/apple/health/watch_dump.xml

Questions

Please use Firefox for this project. While other browsers like Chrome and Edge may work, we are providing instructions that are specific to Firefox and you may need to do a bit of research before getting another browser to work.

Before you begin, open Firefox, and where you would normally put a URL, type the following, followed by enter/return.

about:config

Search for network.cookie.sameSite.laxByDefault, and change the value to false, and close the tab. (This was set to false in my browser, so don’t be concerned if yours isn’t true by default. Just ensure it is set to false before starting the project.)

Question 1 (2 pt)

  1. Create a new directory in your $HOME directory called project03: $HOME/project03

  2. Create a new copy of the project template in a Jupyter notebook in your project03 folder called project03.ipynb.

  3. Create a module called firstname_lastname_project03.py in your $HOME/project03 directory, with the contents of the previous project.

  4. Write a module-level docstring for your project03 module.

  5. Write a function-level docstring for the get_records_for_date function.

You may be concerned that this project will leave your Jupyter notebook looking empty. This is intended, as the majority of the deliverables for this project will be the documentation generated by bash code you will write soon. Additionally, we will explicitly specify what the deliverables are step-by-step in each question, so you will know exactly what to submit.

First, start by creating your new directory and copying in the template. While the deliverables say this has to have a path of $HOME/project03, you can put it anywhere you want, just note that you will have to update your code to reflect the location you choose and your final submission should not contain files unrelated to this specific project.

Next, copy the code you wrote in the previous project into a new python file in your project 3 directory called firstname_lastname_project03.py. If you didn’t finish the previous project, feel free to copy in the below code to get up-to-date. Then fill in a module-level docstring for the module along with a function-level docstring for the get_records_for_date function, both using Google style docstrings.
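
As a reference for the Google style mentioned above, here is a minimal sketch of a function-level docstring. The function scale_values and its parameters are hypothetical and are not part of the project; only the layout of the Args, Returns, and Raises sections is the point.

```python
def scale_values(values: list, factor: float = 1.0) -> list:
    """Multiply every value in a list by a constant factor.

    Args:
        values (list): The numbers to scale.
        factor (float): The multiplier applied to each number. Defaults to 1.0.

    Returns:
        list: A new list containing each input value multiplied by ``factor``.

    Raises:
        TypeError: If ``values`` is not a list.
    """
    # Validate the input in the same style as get_records_for_date
    if not isinstance(values, list):
        raise TypeError('values must be a list')

    return [value * factor for value in values]


# Hypothetical usage: scale three numbers by 2.0
print(scale_values([1, 2, 3], factor=2.0))
```

A module-level docstring follows the same idea, but sits at the very top of the file and describes the module as a whole.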

Make sure you change "firstname" and "lastname" to your first and last name.

This is simply the code from the previous project that you wrote, along with all the docstrings you wrote. If you did not complete the previous project or get things working for whatever reason, feel free to use the code below. Otherwise, copy and paste your code from the previous project.

"""
This module is for project 3 for TDM 30100.

**Serialization:** Serialization is the process of taking a set or subset of data and transforming it into a specific file format that is designed for transmission over a network, storage, or some other specific use-case.

**Deserialization:** Deserialization is the opposite process from serialization where the serialized data is reverted back into its original form.

The following are some common serialization formats:

- JSON
- Bincode
- MessagePack
- YAML
- TOML
- Pickle
- BSON
- CBOR
- Parquet
- XML
- Protobuf

**JSON:** One of the more widespread serialization formats, JSON has the advantages that it is human-readable and has an excellent set of optimized tools written to serialize and deserialize it. In addition, it has first-rate support in browsers. A disadvantage is that it is not a fantastic format storage-wise (it takes up lots of space), and parsing large JSON files can use a lot of memory.

**MessagePack:** MessagePack is a non-human-readable file format (binary) that is extremely fast to serialize and deserialize, and is extremely efficient space-wise. It has excellent tooling in many different languages. It is still not the *most* space efficient, or *fastest* to serialize/deserialize, and remains impossible to work with in its serialized form.

Generally, each format is either *human-readable* or *not*. Human readable formats are able to be read by a human when opened up in a text editor, for example. Non human-readable formats are typically in some binary format and will look like random nonsense when opened in a text editor.

"""
import lxml
import lxml.etree
from datetime import datetime, date


def get_records_for_date(tree: lxml.etree._ElementTree, for_date: date) -> list:
    """
    insert function-level docstring here
    """

    if not isinstance(tree, lxml.etree._ElementTree):
        raise TypeError('tree must be an lxml.etree')

    if not isinstance(for_date, date):
        raise TypeError('for_date must be a datetime.date')

    results = []
    for record in tree.xpath('/HealthData/Record'):
        if for_date == datetime.strptime(record.attrib.get('startDate'), '%Y-%m-%d %X %z').date():
            results.append(record)

    return results
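
The module docstring above contrasts human-readable and binary serialization formats. The following is a small sketch of that round trip using two modules from the Python standard library, json and pickle. The record dictionary is made up for illustration and is not taken from the watch data.

```python
import json
import pickle

# Hypothetical record, invented for this illustration
record = {"type": "HeartRate", "value": 83, "unit": "count/min"}

# Serialize: JSON produces human-readable text, pickle produces binary bytes
json_text = json.dumps(record)
pickle_bytes = pickle.dumps(record)

# Deserialize: both formats revert to an equal Python dictionary
restored_from_json = json.loads(json_text)
restored_from_pickle = pickle.loads(pickle_bytes)

print(json_text)          # readable in any text editor
print(pickle_bytes[:20])  # looks like nonsense when viewed as text
print(restored_from_json == record, restored_from_pickle == record)
```

Note that pickle should only be used to load data you trust, since unpickling can execute arbitrary code.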

Next, in a bash cell in your project03.ipynb notebook, run the following, replacing "Firstname Lastname" with your name. This code will initialize a new Sphinx project inside your project03 directory, and we will explore the actual contents and purpose of the files generated throughout this project. Before moving on though, be sure to read through this page of the official Sphinx documentation to understand exactly what all of the arguments in this command do.

%%bash

cd $HOME/project03
python3 -m sphinx.cmd.quickstart ./docs -q -p project03 -a "Firstname Lastname" -v 1.0.0 --sep

What do all of these arguments do? Check out this page of the official documentation.
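
As a quick summary to read alongside the official documentation, the sketch below rebuilds the same command as a Python list with one comment per argument. It only prints the command and does not run Sphinx; the descriptions are paraphrased, so treat the official page as the authority.

```python
import shlex

quickstart_args = [
    "python3", "-m", "sphinx.cmd.quickstart",
    "./docs",                    # root path: where the documentation project is created
    "-q",                        # quiet mode: skip the interactive wizard (needs -p, -a, and -v)
    "-p", "project03",           # project name
    "-a", "Firstname Lastname",  # author name(s)
    "-v", "1.0.0",               # project version
    "--sep",                     # keep the source and build directories separate
]

# Join the arguments back into a single shell-safe command line
command = shlex.join(quickstart_args)
print(command)
```

The --sep flag is the reason the generated docs directory contains separate source and build folders.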

You should be left with a newly created docs directory within your project03 directory: $HOME/project03/docs. The directory structure should look similar to the following.

contents
project03(1)
├── 39000_f2021_project03_solutions.ipynb(2)
├── docs(3)
│   ├── build (4)
│   ├── make.bat
│   ├── Makefile (5)
│   └── source (6)
│       ├── conf.py (7)
│       ├── index.rst (8)
│       ├── _static
│       └── _templates
└── kevin_amstutz_project03.py(9)

5 directories, 6 files
1 Our module folder (named project03)
2 Your project notebook (probably named something like firstname_lastname_project03.ipynb)
3 Your documentation folder
4 Your empty build folder where generated documentation will be stored (inside docs)
5 The Makefile used to run the commands that generate your documentation (inside docs)
6 Your source folder. This folder contains all hand-typed documentation (inside docs)
7 Your conf.py file. This file contains the configuration for your documentation. (inside source)
8 Your index.rst file. This file (and all files ending in .rst) is written in reStructuredText — a Markdown-like syntax. (inside source)
9 Your module. This is the module containing the code from the previous project, with nice, clean docstrings. (also given above)

Please make the following modifications:

  1. To Makefile:

    # replace
    SPHINXOPTS    ?=
    SPHINXBUILD   ?= sphinx-build
    SOURCEDIR     = source
    BUILDDIR      = build
    
    # with the following
    SPHINXOPTS    ?=
    SPHINXBUILD   ?= python3 -m sphinx.cmd.build
    SOURCEDIR     = source
    BUILDDIR      = build
  2. To conf.py:

    # CHANGE THE FOLLOWING CONTENT FROM:
    
    # -- Path setup --------------------------------------------------------------
    
    # If extensions (or modules to document with autodoc) are in another directory,
    # add these directories to sys.path here. If the directory is relative to the
    # documentation root, use os.path.abspath to make it absolute, like shown here.
    #
    # import os
    # import sys
    # sys.path.insert(0, os.path.abspath('.'))
    
    # TO:
    
    # -- Path setup --------------------------------------------------------------
    
    # If extensions (or modules to document with autodoc) are in another directory,
    # add these directories to sys.path here. If the directory is relative to the
    # documentation root, use os.path.abspath to make it absolute, like shown here.
    #
    import os
    import sys
    sys.path.insert(0, os.path.abspath('../..'))
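
To see why '../..' is the right relative path, recall that conf.py lives in docs/source, and Sphinx executes it with the current directory set to the directory containing it. Going up two levels therefore lands in the project03 folder that holds your module. The sketch below shows that resolution using a hypothetical home directory; your actual path will differ.

```python
import posixpath

# Hypothetical location of conf.py's directory; substitute your own $HOME
conf_dir = "/home/username/project03/docs/source"

# '../..' climbs from docs/source to docs, then from docs to project03
module_dir = posixpath.normpath(posixpath.join(conf_dir, "../.."))
print(module_dir)
```

With that directory on sys.path, Python code run during the build can import firstname_lastname_project03.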

Finally, with the modifications above having been made, run the following command in a bash cell in your Jupyter notebook to generate your documentation.

%%bash

cd $HOME/project03/docs
make html

Once the build completes, your module folder's structure should look something like the following.

structure
project03
├── 39000_f2021_project03_solutions.ipynb
├── docs
│   ├── build
│   │   ├── doctrees
│   │   │   ├── environment.pickle
│   │   │   └── index.doctree
│   │   └── html
│   │       ├── genindex.html
│   │       ├── index.html
│   │       ├── objects.inv
│   │       ├── search.html
│   │       ├── searchindex.js
│   │       ├── _sources
│   │       │   └── index.rst.txt
│   │       └── _static
│   │           ├── alabaster.css
│   │           ├── basic.css
│   │           ├── custom.css
│   │           ├── doctools.js
│   │           ├── documentation_options.js
│   │           ├── file.png
│   │           ├── jquery-3.5.1.js
│   │           ├── jquery.js
│   │           ├── language_data.js
│   │           ├── minus.png
│   │           ├── plus.png
│   │           ├── pygments.css
│   │           ├── searchtools.js
│   │           ├── underscore-1.13.1.js
│   │           └── underscore.js
│   ├── make.bat
│   ├── Makefile
│   └── source
│       ├── conf.py
│       ├── index.rst
│       ├── _static
│       └── _templates
└── kevin_amstutz_project03.py

9 directories, 29 files

Finally, let’s take a look at the results! In the left-hand pane in the Jupyter Lab interface, navigate to yourpath/project03/docs/build/html/, and right click on the index.html file and choose Open in New Browser Tab. You should now be able to see your documentation in a new tab. It should look something like the following.

Figure 1. Resulting Sphinx output

Make sure you are able to generate the documentation before you proceed, otherwise, you will not be able to continue to modify, regenerate, and view your documentation.
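
If you want a quick check from a Python cell that the build produced output, a small sketch like the following works. The helper name docs_were_built is hypothetical, and the path assumes the default $HOME/project03 location used in this project.

```python
from pathlib import Path


def docs_were_built(project_dir: Path) -> bool:
    """Return True if the Sphinx HTML index page exists under project_dir."""
    return (project_dir / "docs" / "build" / "html" / "index.html").is_file()


# Assumes the project lives in $HOME/project03; adjust if you chose another location
print(docs_were_built(Path.home() / "project03"))
```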

Items to submit
  • Directory for project 3, containing an ipynb file and a python file as described above.

  • Module and function level docstrings where appropriate in the python file.

  • Documentation generated by Sphinx, as instructed above.

Question 2 (3 pts)

  1. Write a function called get_avg_heart_rate to get the average heart rate for a given date from our watch data.

  2. Write a function called get_median_heart_rate to find the median heart rate for a given date from our watch data.

  3. Write a function called graph_heart_rate to create a box-and-whisker plot of heart rate for a given date from our watch data.

  4. Give each function an appropriate docstring.

  5. Run each function for April 4th, 2019 in your Jupyter notebook to prove they work. Ensure you add them to your firstname_lastname_project03.py module.

  6. Regenerate your documentation, and view the results in a new tab.

While you could redefine all of your logic to get data for a given date, it would be much easier to simply reuse the function you wrote in the previous project within your new functions.
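
Reusing get_records_for_date requires a datetime.date value for the day of interest. The sketch below builds one for April 4th, 2019 and shows how a timestamp string compares against it. The start_date string is a made-up example in the format implied by the function's strptime call, and the explicit %H:%M:%S is used here in place of the locale-dependent %X.

```python
from datetime import date, datetime

# The date used throughout Question 2
for_date = date(2019, 4, 4)

# Hypothetical startDate attribute value, invented for illustration
start_date = "2019-04-04 08:15:30 -0400"

# Parse the timestamp, then keep only the calendar date for the comparison
parsed = datetime.strptime(start_date, "%Y-%m-%d %H:%M:%S %z")
print(parsed.date() == for_date)
```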

Feel free to use library functions for the above functions (e.g. statistics for mean and median, and matplotlib for plotting).
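
As a rough illustration of the statistics module mentioned above, the sketch below computes summary values for a short list of made-up heart rates. The numbers are hypothetical, and extracting the real values from the records is left to your own functions; plotting is also left out here.

```python
import statistics

# Hypothetical heart rate values in beats per minute, invented for illustration
heart_rates = [72.0, 80.0, 83.0, 95.0, 110.0]

average = statistics.mean(heart_rates)
median = statistics.median(heart_rates)

# Quartile cut points can help sanity-check a box-and-whisker plot
quartiles = statistics.quantiles(heart_rates, n=4)

print(f"Average: {average:.2f}")
print(f"Median : {median:.2f}")
print(quartiles)
```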

You can test your code using the following code in your Jupyter notebook:

# Assumes tree is the parsed watch data and for_date is a datetime.date,
# for example date(2019, 4, 4)
date_records = get_records_for_date(tree, for_date)
print(f"Average: {get_avg_heart_rate(date_records):.2f}")
print(f"Median : {get_median_heart_rate(date_records):.2f}")
graph_heart_rate(date_records)

# This should output values in a format similar to the following:
# Average: 86.25
# Median : 83.00

The box-and-whisker plot should reflect what you see in the average and median measures. Feel free to write an extra function to get standard deviations or quartiles for a more thorough check that your work is correct.

Items to submit
  • 3 functions, named and as described above, including function-level docstrings.

  • Outputs of running the functions on April 4th, 2019.

  • Documentation generated by Sphinx, as instructed above.

Question 3 (3 pts)

  1. Create your own README.rst file in the docs/source folder.

  2. Regenerate your documentation, and take a picture of the resulting webpage.

One of the most important documents in any package or project is the README file. This file is so important that version control companies like GitHub and GitLab will automatically display it below the repository's contents. This file contains things like instructions on how to install the package, usage examples, lists of dependencies, license links, etc. Check out some popular GitHub repositories for projects like numpy, pytorch, or any other repository you’ve come across that you believe does a good job explaining the project.

In the docs/source folder, create a new file called README.rst. Choose 5 of the following "types" of reStructuredText from this webpage, and create a fake README. The content can be Lorem Ipsum type of content as long as it demonstrates 5 of the types of reStructuredText.

  • Inline markup

  • Lists and quote-like blocks

  • Literal blocks

  • Doctest blocks

  • Tables

  • Hyperlinks

  • Sections

  • Field lists

  • Roles

  • Images

  • Footnotes

  • Citations

  • Etc.

Make sure to include at least 1 section. This counts as 1 of your 5 types of reStructuredText.
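
To give a feel for the syntax, the sketch below assembles a tiny README as a Python string containing three of the types: a section, inline markup, and a list. The title and wording are placeholders, and it is only a starting point; you still need to choose and write your own five types.

```python
import textwrap

title = "Example Project"  # hypothetical project title

# A section title is underlined with "=" characters at least as long as the title
underline = "=" * len(title)

readme_text = textwrap.dedent(f"""\
    {title}
    {underline}

    This package is *not real*; it only demonstrates ``reStructuredText``.

    - First list item
    - Second list item
    """)

print(readme_text)

# To save it from a notebook cell, one option would be:
# open("docs/source/README.rst", "w").write(readme_text)
```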

Once complete, add a reference to your README to the index.rst file. To add a reference to your README.rst file, open the index.rst file in an editor and add "README" as follows.

index.rst
.. project3 documentation master file, created by
   sphinx-quickstart on Wed Sep  1 09:38:12 2021.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to project3's documentation!
====================================

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   README

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

Make sure "README" is aligned with ":caption:" — it should be 3 spaces from the left before the "R" in "README".

In a new bash cell in your notebook, regenerate your documentation.

%%bash

cd $HOME/project03/docs
make html

Check out the resulting index.html page, and click on the links. Pretty great!

Things should look similar to the following images.

Figure 2. Sphinx output
Figure 3. Sphinx output

Items to submit

When you submit your assignment, make sure that the .ipynb is viewable from within Gradescope. If it says something like (Large file hidden), you can submit the screenshots as PNGs (or any image format that works) as separate files on the assignment and then reference their names in the .ipynb. The bottom line is that we should be able to see each screenshot in Gradescope, without having to download your project first. This is because asking our TAs to download hundreds of projects would be a bit rude. Please post any clarifying questions on Piazza and we can answer them.

For this project, please submit the following files:

  • The .ipynb file with:

      • all functions throughout the project, demonstrated to be working as expected

      • every different bash command used to call Sphinx at least once

      • screenshots whenever we asked for them in a question

  • Screenshots of each section of your webpage documentation (NOT inside your Jupyter notebook).

Please make sure to double check that your submission is complete, and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure that what you think you submitted is what you actually submitted.

In addition, please review our submission guidelines before submitting your project.